There are several ways to check the status of your jobs in the queue. The SLURM commands below cover the most common cases. Use the Linux 'man' command (for example, 'man squeue') for full details on these commands.
NOTE: Slurm commands run against the default cluster of the server you're logged into. For most servers at CCR, this is the ub-hpc cluster. To run these commands against the faculty cluster, add the '-M cluster' option, for example: -M faculty
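As a minimal sketch of the '-M' option, the wrapper below runs squeue for one user against a named cluster. The function name squeue_on_cluster is hypothetical and only wraps the flags described in this article.

```shell
# Hypothetical helper: list one user's jobs on a specific cluster.
# $1 = cluster name (for example faculty), $2 = username.
squeue_on_cluster() {
    squeue -M "$1" -u "$2"
}

# Usage on a login node (username is a placeholder):
#   squeue_on_cluster faculty username
```

Without the '-M' option, the same squeue call would query the default cluster of the server you're logged into.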
squeue - Show the State of Jobs in the Queue
squeue <flags>
- -u username
- -j jobid
- -p partition
- -q qos
Example:
[ccruser@vortex:/ifs/user/ccruser]$ squeue -u cdc
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
92311 debug test cdc R 0:08 2 d09n29s02,d16n02
88915 general-c GPU_test cdc PD 0:00 1 (Priority)
91716 general-c hello_te cdc PD 0:00 2 (Priority)
91791 general-c hello_te cdc PD 0:00 2 (Priority)
91792 general-c hello_te cdc PD 0:00 2 (Priority)
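When a user has many jobs, a per-state tally can be easier to read than the full listing. The sketch below assumes the default column layout shown in the example, where ST is the fifth field, and a job name without spaces. The function name count_job_states is hypothetical.

```shell
# Tally jobs by state code from squeue output read on stdin.
# Only lines whose first field starts with a digit (a job ID) are counted,
# so header lines are skipped.
count_job_states() {
    awk '$1 ~ /^[0-9]/ && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on a login node (username is a placeholder):
#   squeue -u username | count_job_states
```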
squeue or stimes - Show the Estimated Start Time of a Job
squeue <flags> --start
This shows only jobs in the PD (pending) state.
Example:
[ccruser@vortex:/ifs/user/ccruser]$ squeue -u cdc --start
JOBID PARTITION NAME USER ST START_TIME NODES NODELIST(REASON)
88915 general-c GPU_test cdc PD 2013-07-09T13:09:40 1 (Priority)
91487 general-c hello_te cdc PD N/A 2 (Priority)
[ccruser@vortex:/ifs/user/ccruser]$ squeue -j 91487 --start
JOBID PARTITION NAME USER ST START_TIME NODES NODELIST(REASON)
91487 general-c hello_te cdc PD N/A 2 (Priority)
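A start time of N/A means no estimate is shown for that job yet. The sketch below picks out those jobs from 'squeue --start' output, assuming the column layout shown in the example, where START_TIME is the sixth field. The function name jobs_without_estimate is hypothetical.

```shell
# Print the IDs of pending jobs that have no estimated start time.
# Reads 'squeue --start' output on stdin; header lines are skipped because
# their first field does not start with a digit.
jobs_without_estimate() {
    awk '$1 ~ /^[0-9]/ && $6 == "N/A" { print $1 }'
}

# Usage on a login node (username is a placeholder):
#   squeue -u username --start | jobs_without_estimate
```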
Alternatively, use stimes for more detailed information:
stimes <flags> (same flags as squeue)
Example:
[ccruser@vortex:/]$ stimes -u xdtas
JOBID USER PARTITION JOB_NAME REQUEST_TIME NODES CPUS REASON_FOR_WAIT PRIORITY JOB_STARTS_IN
4753092 xdtas general-compute xdmod.benchm 14:00 16 192 (Resources) 1.16 2.11 days
4753045 xdtas general-compute xdmod.app.md 2:00 8 96 (Resources) 1.12 2.11 days
4753053 xdtas general-compute xdmod.app.ch 2:00 8 96 (Resources) 1.12 2.11 days
4753060 xdtas general-compute xdmod.app.as 43:00 8 96 (Resources) 1.12 2.11 days
4753098 xdtas general-compute xdmod.benchm 30:00 8 96 (Resources) 1.11 2.11 days
4753068 xdtas general-compute xdmod.app.md 3:00 4 48 (Resources) 1.09 2.11 days
4753085 xdtas general-compute xdmod.benchm 9:00 4 48 (Resources) 1.08 2.11 days
4753099 xdtas general-compute xdmod.app.as 55:00 4 48 (Resources) 1.08 2.11 days
4753114 xdtas general-compute xdmod.app.ch 3:00 4 48 (Resources) 1.08 2.11 days
4753123 xdtas general-compute xdmod.benchm 15:00 4 48 (Resources) 1.08 2.11 days
4753052 xdtas general-compute xdmod.app.ch 3:00 2 24 (Resources) 1.08 1.56 days
4753155 xdtas general-compute xdmod.benchm 2:00 4 48 (Resources) 1.07 1.56 days
4753070 xdtas general-compute xdmod.app.ch 4:00 1 12 (Resources) 1.07 1.40 days
4753115 xdtas general-compute xdmod.benchm 7:00 2 24 (Resources) 1.07 1.57 days
4753121 xdtas general-compute xdmod.app.md 4:00 2 24 (Resources) 1.06 1.57 days
4753122 xdtas general-compute xdmod.benchm 8:00 2 24 (Resources) 1.06 1.57 days
4753134 xdtas general-compute xdmod.app.as 1:03:00 2 24 (Resources) 1.06 1.58 days
4753164 xdtas general-compute xdmod.app.md 6:00 1 12 (Resources) 1.05 1.40 days
squeue - Show Jobs Running on Compute Nodes
squeue --nodelist=f16n35,f16n37
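The '--nodelist' option takes a comma-separated list of node names. As a small sketch, the hypothetical helper below builds that list from separate arguments before calling squeue.

```shell
# Hypothetical helper: show jobs running on the given compute nodes.
# All arguments are joined with commas to form the --nodelist value.
jobs_on_nodes() {
    list=$(IFS=,; printf '%s' "$*")
    squeue --nodelist="$list"
}

# Usage on a login node:
#   jobs_on_nodes f16n35 f16n37
```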
squeue - Job States
- R - Job is running on compute nodes
- PD - Job is pending (waiting for compute nodes)
- CG - Job is completing
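In a script, the state codes above can be turned into readable text with a simple lookup. This sketch covers only the three codes listed here; squeue reports other codes as well, which 'man squeue' documents. The function name describe_job_state is hypothetical.

```shell
# Map a squeue state code to a short description.
# Only the codes covered in this article are handled explicitly.
describe_job_state() {
    case "$1" in
        R)  echo "running on compute nodes" ;;
        PD) echo "pending, waiting for compute nodes" ;;
        CG) echo "completing" ;;
        *)  echo "not covered here; see 'man squeue'" ;;
    esac
}

# Usage:
#   describe_job_state PD
```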
squeue - Job Reasons
- (Resources) - Job is waiting for compute nodes to become available
- (Priority) - Jobs with higher priority are ahead of this job in the queue for compute nodes. See the knowledge base article on job priority for details.
- (ReqNodeNotAvail) - The compute nodes requested by the job are not available for a variety of reasons, including:
  - cluster downtime
  - nodes offline
  - temporary scheduling backlog
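To see at a glance why jobs are waiting, the reasons can be tallied from squeue output. The sketch below assumes the default column layout shown earlier, with ST as the fifth field and the reason in parentheses at the end of the line. Anything after a comma inside the parentheses is dropped so that longer reason strings group under one name. The function name count_pending_reasons is hypothetical.

```shell
# Tally the wait reasons of pending (PD) jobs from squeue output on stdin.
count_pending_reasons() {
    awk '$5 == "PD" && match($0, /\([^()]*\)[ \t]*$/) {
             r = substr($0, RSTART, RLENGTH)
             gsub(/[() \t]/, "", r)   # strip parentheses and whitespace
             sub(/,.*/, "", r)        # keep only the leading reason name
             n[r]++
         }
         END { for (r in n) print r, n[r] }' | sort
}

# Usage on a login node (username is a placeholder):
#   squeue -u username | count_pending_reasons
```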
sinfo - Show the State of Nodes
sinfo -p partition
Example:
[ccruser@vortex:/ifs/user/ccruser]$ sinfo -p debug
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up 1:00:00 1 alloc k05n26
debug up 1:00:00 3 idle d09n29s02,d16n[02-03]
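For partitions with many rows of output, the node counts can be summed per state. This sketch assumes the default sinfo layout shown in the example, where NODES is the fourth field and STATE is the fifth. The function name nodes_per_state is hypothetical.

```shell
# Sum node counts per state from sinfo output read on stdin.
# Only lines with a numeric NODES field are counted, which skips the header.
nodes_per_state() {
    awk '$4 ~ /^[0-9]+$/ { n[$5] += $4 }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on a login node (partition is a placeholder):
#   sinfo -p partition | nodes_per_state
```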
snodes - Show Node State and Feature Details
snodes all <cluster>/<partition>
Example:
[ccruser@vortex:/ifs/user/ccruser]$ snodes all general-compute | more
HOSTNAMES STATE CPUS S:C:T CPUS(A/I/O/T) CPU_LOAD MEMORY GRES PARTITION FEATURES
d07n04s01 alloc 8 2:4:1 8/0/0/8 8.02 24000 (null) general-compute* IB,CPU-L5630
d07n04s02 alloc 8 2:4:1 8/0/0/8 7.97 24000 (null) general-compute* IB,CPU-L5630
...
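The CPUS(A/I/O/T) column gives allocated, idle, other, and total cores for each node. As a sketch, the function below adds up idle and total cores across all nodes in snodes output, assuming the column layout shown in the example, where that value is the fifth field. The function name idle_cores is hypothetical.

```shell
# Print "<idle cores> <total cores>" summed over snodes output on stdin.
# Lines whose fifth field is not in A/I/O/T form (such as the header) are skipped.
idle_cores() {
    awk '$5 ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/ {
             split($5, c, "/")
             idle += c[2]
             total += c[4]
         }
         END { print idle + 0, total + 0 }'
}

# Usage on a login node:
#   snodes all general-compute | idle_cores
```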
sinfo and snodes - Node States
- idle - all cores are available on the compute node
  - no jobs are running on the compute node
- mix - at least one core is available on the compute node
  - compute node has one or more jobs running on it
- alloc - all cores on the compute node are assigned to jobs
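The three states can be expressed as a rule on a node's allocated and total core counts. The sketch below is illustrative only and ignores any other states that sinfo may report. The function name node_state is hypothetical.

```shell
# Classify a node from its allocated and total core counts.
# $1 = allocated cores, $2 = total cores.
node_state() {
    if [ "$1" -eq 0 ]; then
        echo idle     # no cores in use
    elif [ "$1" -lt "$2" ]; then
        echo mix      # some cores in use, at least one free
    else
        echo alloc    # every core assigned to jobs
    fi
}

# Usage:
#   node_state 3 8
```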
How to check the status of running jobs