There are several ways to check the status of your jobs in the queue.  Below are a few Slurm commands to make use of; the Linux 'man' command provides detailed documentation for each of them.


NOTE: Slurm commands run against whatever cluster is set as the default on the server you're logged into.  For most servers at CCR, this is the ub-hpc cluster.  To run these commands against the faculty cluster, use the '-M' option followed by the cluster name:  -M faculty


squeue - Show the State of Jobs in the Queue

squeue <flags>

  • -u username - show only jobs belonging to the specified user
  • -j jobid - show only the specified job
  • -p partition - show only jobs in the specified partition
  • -q qos - show only jobs with the specified quality of service (QOS)


Example:
[ccruser@vortex:/ifs/user/ccruser]$ squeue -u cdc
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
92311 debug test cdc R 0:08 2 d09n29s02,d16n02
88915 general-c GPU_test cdc PD 0:00 1 (Priority)
91716 general-c hello_te cdc PD 0:00 2 (Priority)
91791 general-c hello_te cdc PD 0:00 2 (Priority)
91792 general-c hello_te cdc PD 0:00 2 (Priority)
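Output like the above is easy to post-process with standard tools.  As a minimal sketch (using a hard-coded copy of the sample output in place of a live squeue call), this tallies jobs per state from the ST column:

```shell
# Sample squeue output, copied from the example above; on a real login node
# you would pipe the output of `squeue -u $USER` instead of echoing this variable.
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
92311 debug test cdc R 0:08 2 d09n29s02,d16n02
88915 general-c GPU_test cdc PD 0:00 1 (Priority)
91716 general-c hello_te cdc PD 0:00 2 (Priority)
91791 general-c hello_te cdc PD 0:00 2 (Priority)'
# Skip the header line (NR>1), count the 5th column (ST), sort for stable output
echo "$sample" | awk 'NR>1 {count[$5]++} END {for (s in count) print s, count[s]}' | sort
```

This prints one line per state (here: 3 pending, 1 running), which is handy when you have many jobs queued.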




squeue or stimes - Show the Estimated Start Time of a Job


squeue <flags> --start
Shows only jobs in the PD (pending) state

Example:


[ccruser@vortex:/ifs/user/ccruser]$ squeue -u cdc --start
JOBID PARTITION NAME USER ST START_TIME NODES NODELIST(REASON)
88915 general-c GPU_test cdc PD 2013-07-09T13:09:40 1 (Priority)
91487 general-c hello_te cdc PD N/A 2 (Priority)


[ccruser@vortex:/ifs/user/ccruser]$ squeue -j 91487 --start
JOBID PARTITION NAME USER ST START_TIME NODES NODELIST(REASON)
91487 general-c hello_te cdc PD N/A 2 (Priority)
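The START_TIME column reads N/A until the scheduler has computed an estimate for that job, so it can be useful to filter those rows out.  A minimal sketch, again substituting the sample output above for a live squeue call:

```shell
# Sample `squeue --start` output, copied from the example above
sample='JOBID PARTITION NAME USER ST START_TIME NODES NODELIST(REASON)
88915 general-c GPU_test cdc PD 2013-07-09T13:09:40 1 (Priority)
91487 general-c hello_te cdc PD N/A 2 (Priority)'
# Print jobid and estimated start only for jobs that have an estimate
echo "$sample" | awk 'NR>1 && $6 != "N/A" {print $1, $6}'
```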



Or use stimes for more detailed information:

stimes <flags> - accepts the same flags as squeue



Example:


[ccruser@vortex:/]$ stimes -u xdtas
JOBID USER PARTITION JOB_NAME REQUEST_TIME NODES CPUS REASON_FOR_WAIT PRIORITY JOB_STARTS_IN
4753092 xdtas general-compute xdmod.benchm 14:00 16 192 (Resources) 1.16 2.11 days
4753045 xdtas general-compute xdmod.app.md 2:00 8 96 (Resources) 1.12 2.11 days
4753053 xdtas general-compute xdmod.app.ch 2:00 8 96 (Resources) 1.12 2.11 days
4753060 xdtas general-compute xdmod.app.as 43:00 8 96 (Resources) 1.12 2.11 days
4753098 xdtas general-compute xdmod.benchm 30:00 8 96 (Resources) 1.11 2.11 days
4753068 xdtas general-compute xdmod.app.md 3:00 4 48 (Resources) 1.09 2.11 days
4753085 xdtas general-compute xdmod.benchm 9:00 4 48 (Resources) 1.08 2.11 days
4753099 xdtas general-compute xdmod.app.as 55:00 4 48 (Resources) 1.08 2.11 days
4753114 xdtas general-compute xdmod.app.ch 3:00 4 48 (Resources) 1.08 2.11 days
4753123 xdtas general-compute xdmod.benchm 15:00 4 48 (Resources) 1.08 2.11 days
4753052 xdtas general-compute xdmod.app.ch 3:00 2 24 (Resources) 1.08 1.56 days
4753155 xdtas general-compute xdmod.benchm 2:00 4 48 (Resources) 1.07 1.56 days
4753070 xdtas general-compute xdmod.app.ch 4:00 1 12 (Resources) 1.07 1.40 days
4753115 xdtas general-compute xdmod.benchm 7:00 2 24 (Resources) 1.07 1.57 days
4753121 xdtas general-compute xdmod.app.md 4:00 2 24 (Resources) 1.06 1.57 days
4753122 xdtas general-compute xdmod.benchm 8:00 2 24 (Resources) 1.06 1.57 days
4753134 xdtas general-compute xdmod.app.as 1:03:00 2 24 (Resources) 1.06 1.58 days
4753164 xdtas general-compute xdmod.app.md 6:00 1 12 (Resources) 1.05 1.40 days




squeue - Show Jobs Running on Compute Nodes

squeue --nodelist=f16n35,f16n37
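Note that squeue and sinfo often print node lists in compressed form, such as d16n[02-03].  On a real cluster, `scontrol show hostnames 'd16n[02-03]'` expands such a list into individual hostnames.  As a rough illustration of what that expansion means, here is a bash sketch (an assumption for teaching purposes, not a replacement for scontrol) that handles a single simple bracketed range:

```shell
# Expand a compressed Slurm hostlist like "d16n[02-03]" into individual names.
# Minimal sketch: handles one numeric range; use `scontrol show hostnames`
# on a real cluster for full hostlist syntax.
expand_hostlist() {
  local list=$1
  if [[ $list =~ ^(.*)\[([0-9]+)-([0-9]+)\]$ ]]; then
    local prefix=${BASH_REMATCH[1]} start=${BASH_REMATCH[2]} end=${BASH_REMATCH[3]}
    local width=${#start}   # preserve zero-padding, e.g. 02, 03
    for ((i=10#$start; i<=10#$end; i++)); do
      printf '%s%0*d\n' "$prefix" "$width" "$i"
    done
  else
    echo "$list"   # plain hostname passes through unchanged
  fi
}
expand_hostlist 'd16n[02-03]'
```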




squeue - Job States
  • R - Job is running on compute nodes
  • PD - Job is pending, waiting for compute nodes to become available
  • CG - Job is completing

squeue - Job Reasons
  • (Resources) - Job is waiting for compute nodes to become available
  • (Priority) - Jobs with higher priority are waiting for compute nodes.  See the knowledge base article on job priority for more information
  • (ReqNodeNotAvail) - The compute nodes requested by the job are not available for a variety of reasons, including:
    • cluster downtime
    • nodes offline
    • temporary scheduling backlog
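These reason codes can also be tallied directly from squeue output.  A minimal sketch, using sample rows modeled on the earlier examples (job 91800 and its reason are hypothetical, added for illustration) rather than a live squeue call:

```shell
# Sample squeue output; jobs 88915/91716/92311 are from the examples above,
# job 91800 with reason (Resources) is hypothetical
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
88915 general-c GPU_test cdc PD 0:00 1 (Priority)
91716 general-c hello_te cdc PD 0:00 2 (Priority)
91800 general-c hello_te cdc PD 0:00 4 (Resources)
92311 debug test cdc R 0:08 2 d09n29s02,d16n02'
# For pending jobs (ST == PD), the 8th column holds the wait reason
echo "$sample" | awk 'NR>1 && $5 == "PD" {count[$8]++} END {for (r in count) print r, count[r]}' | sort
```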



sinfo - Show the State of Nodes

sinfo -p partition


Example:
[ccruser@vortex:/ifs/user/ccruser]$ sinfo -p debug
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up 1:00:00 1 alloc k05n26
debug up 1:00:00 3 idle d09n29s02,d16n[02-03]
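Each sinfo row groups nodes by state, with the count in the NODES column, so summing that column per state gives a quick availability picture.  A minimal sketch using a hard-coded copy of the sample output above:

```shell
# Sample sinfo output, copied from the example above; pipe `sinfo -p debug`
# instead on a real login node
sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up 1:00:00 1 alloc k05n26
debug up 1:00:00 3 idle d09n29s02,d16n[02-03]'
# Sum the NODES column (4) grouped by the STATE column (5)
echo "$sample" | awk 'NR>1 {n[$5]+=$4} END {for (s in n) print s, n[s]}' | sort
```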




snodes - Show Node State and Feature Details

snodes all <cluster>/<partition>


Example:
[ccruser@vortex:/ifs/user/ccruser]$ snodes all general-compute | more
HOSTNAMES STATE CPUS S:C:T CPUS(A/I/O/T) CPU_LOAD MEMORY GRES PARTITION FEATURES
d07n04s01 alloc 8 2:4:1 8/0/0/8 8.02 24000 (null) general-compute* IB,CPU-L5630
d07n04s02 alloc 8 2:4:1 8/0/0/8 7.97 24000 (null) general-compute* IB,CPU-L5630

...
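The CPUS(A/I/O/T) column breaks each node's CPUs into Allocated/Idle/Other/Total, so summing the second field tells you how many CPUs are free.  A minimal sketch; the first row is copied from the example above, and the second (a mix-state node) is hypothetical, added so the idle count is nonzero:

```shell
# snodes-style output: first row from the example above, second row hypothetical
sample='HOSTNAMES STATE CPUS S:C:T CPUS(A/I/O/T) CPU_LOAD MEMORY GRES PARTITION FEATURES
d07n04s01 alloc 8 2:4:1 8/0/0/8 8.02 24000 (null) general-compute* IB,CPU-L5630
d07n04s02 mix 8 2:4:1 6/2/0/8 5.97 24000 (null) general-compute* IB,CPU-L5630'
# Split the A/I/O/T field (column 5) on "/" and sum the Idle part (c[2])
echo "$sample" | awk 'NR>1 {split($5, c, "/"); idle += c[2]} END {print idle}'
```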




sinfo and snodes - Node States

  • idle - all cores are available on the compute node
    • no jobs are running on the compute node
  • mix - at least one core is available on the compute node
    • the compute node has one or more jobs running on it
  • alloc - all cores on the compute node are assigned to jobs



How to check the status of running jobs


How to cancel running or pending jobs