There are several ways to check the status of your jobs in the queue. Below are a few SLURM commands you can use. The Linux 'man' command (for example, 'man squeue') provides much more detail on each of them.



squeue - Show the State of Jobs in the Queue

squeue <flags>

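  • -u username - show only jobs owned by this user
  • -j jobid - show only the job with this ID
  • -p partition - show only jobs in this partition
  • -q qos - show only jobs submitted under this QOS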


Example:
[ccruser@vortex:/ifs/user/ccruser]$ squeue -u cdc
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  92311     debug     test      cdc   R       0:08      2 d09n29s02,d16n02
  88915 general-c GPU_test      cdc  PD       0:00      1 (Priority)
  91716 general-c hello_te      cdc  PD       0:00      2 (Priority)
  91791 general-c hello_te      cdc  PD       0:00      2 (Priority)
  91792 general-c hello_te      cdc  PD       0:00      2 (Priority)
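
The ST column in the listing above lends itself to a quick tally of running versus pending jobs. The following is a minimal sketch, not a CCR-provided tool: count_states is a hypothetical helper name, and it assumes the default squeue column layout shown above, with the state code in the fifth whitespace-separated field (a job name containing spaces would break that assumption).

```shell
# count_states: hypothetical helper, not part of SLURM.
# Reads default-format squeue output on stdin and prints "STATE COUNT"
# for each state. Assumes the state code is the 5th column.
count_states() {
    awk 'NR > 1 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Demo on the sample listing from the example above. On the cluster the
# input would come from squeue itself:  squeue -u cdc | count_states
count_states <<'EOF'
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  92311     debug     test      cdc   R       0:08      2 d09n29s02,d16n02
  88915 general-c GPU_test      cdc  PD       0:00      1 (Priority)
  91716 general-c hello_te      cdc  PD       0:00      2 (Priority)
  91791 general-c hello_te      cdc  PD       0:00      2 (Priority)
  91792 general-c hello_te      cdc  PD       0:00      2 (Priority)
EOF
```

If the column assumption is a concern, squeue can print just the state codes with no header via 'squeue -u cdc -h -o %t', which can then be piped to 'sort | uniq -c'.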




squeue or stimes - Show the Estimated Start Time of a Job


squeue <flags> --start
With --start, squeue lists only jobs in the PD (pending) state.
 
Example:
[ccruser@vortex:/ifs/user/ccruser]$ squeue -u cdc --start
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)
  88915 general-c GPU_test      cdc  PD  2013-07-09T13:09:40      1 (Priority)
  91487 general-c hello_te      cdc  PD                  N/A      2 (Priority)


[ccruser@vortex:/ifs/user/ccruser]$ squeue -j 91487 --start
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)
  91487 general-c hello_te      cdc  PD                  N/A      2 (Priority)
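
A START_TIME of N/A generally means the scheduler has no estimate for that job yet. As a rough sketch of how to pick those jobs out of a longer listing, the hypothetical helper below (jobs_without_estimate is an illustrative name, not a SLURM command) prints the IDs of jobs whose START_TIME is N/A. It assumes the column layout shown above, with START_TIME in the sixth field.

```shell
# jobs_without_estimate: hypothetical helper, not part of SLURM.
# Reads `squeue --start` output on stdin and prints the job IDs whose
# START_TIME (assumed to be the 6th column) is N/A.
jobs_without_estimate() {
    awk 'NR > 1 && $6 == "N/A" { print $1 }'
}

# Demo on the sample listing from the example above. On the cluster:
#   squeue -u cdc --start | jobs_without_estimate
jobs_without_estimate <<'EOF'
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)
  88915 general-c GPU_test      cdc  PD  2013-07-09T13:09:40      1 (Priority)
  91487 general-c hello_te      cdc  PD                  N/A      2 (Priority)
EOF
```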



Alternatively, use stimes for more detailed information:

stimes <flags>   (accepts the same flags as squeue)


Example:


[ccruser@vortex:/]$ stimes -u xdtas
JOBID       USER      PARTITION        JOB_NAME      REQUEST_TIME NODES  CPUS REASON_FOR_WAIT    PRIORITY   JOB_STARTS_IN
4753092     xdtas     general-compute  xdmod.benchm         14:00    16   192 (Resources)            1.16     2.11 days
4753045     xdtas     general-compute  xdmod.app.md          2:00     8    96 (Resources)            1.12     2.11 days
4753053     xdtas     general-compute  xdmod.app.ch          2:00     8    96 (Resources)            1.12     2.11 days
4753060     xdtas     general-compute  xdmod.app.as         43:00     8    96 (Resources)            1.12     2.11 days
4753098     xdtas     general-compute  xdmod.benchm         30:00     8    96 (Resources)            1.11     2.11 days
4753068     xdtas     general-compute  xdmod.app.md          3:00     4    48 (Resources)            1.09     2.11 days
4753085     xdtas     general-compute  xdmod.benchm          9:00     4    48 (Resources)            1.08     2.11 days
4753099     xdtas     general-compute  xdmod.app.as         55:00     4    48 (Resources)            1.08     2.11 days
4753114     xdtas     general-compute  xdmod.app.ch          3:00     4    48 (Resources)            1.08     2.11 days
4753123     xdtas     general-compute  xdmod.benchm         15:00     4    48 (Resources)            1.08     2.11 days
4753052     xdtas     general-compute  xdmod.app.ch          3:00     2    24 (Resources)            1.08     1.56 days
4753155     xdtas     general-compute  xdmod.benchm          2:00     4    48 (Resources)            1.07     1.56 days
4753070     xdtas     general-compute  xdmod.app.ch          4:00     1    12 (Resources)            1.07     1.40 days
4753115     xdtas     general-compute  xdmod.benchm          7:00     2    24 (Resources)            1.07     1.57 days
4753121     xdtas     general-compute  xdmod.app.md          4:00     2    24 (Resources)            1.06     1.57 days
4753122     xdtas     general-compute  xdmod.benchm          8:00     2    24 (Resources)            1.06     1.57 days
4753134     xdtas     general-compute  xdmod.app.as       1:03:00     2    24 (Resources)            1.06     1.58 days
4753164     xdtas     general-compute  xdmod.app.md          6:00     1    12 (Resources)            1.05     1.40 days




squeue - Show Jobs Running on Compute Nodes

squeue --nodelist=f16n35,f16n37
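
The --nodelist value is a comma-separated list of node names. When the names come from a script or another command, a small helper can build that list. This is only a sketch: join_nodes is a hypothetical name, and the demo prints the resulting command rather than running it.

```shell
# join_nodes: hypothetical helper that joins its arguments with commas,
# the separator used in the --nodelist value above.
join_nodes() {
    list=""
    for node in "$@"; do
        list="${list:+$list,}$node"   # add a comma only between names
    done
    printf '%s\n' "$list"
}

# Print the command that would be run for the two nodes used above.
echo "squeue --nodelist=$(join_nodes f16n35 f16n37)"
```

The short form of --nodelist is -w, so 'squeue -w f16n35,f16n37' is equivalent.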




squeue - Job States
  • R - Job is running on compute nodes
  • PD - Job is pending (waiting for compute nodes)
  • CG - Job is completing

squeue - Job Reasons
  • (Resources) - Job is waiting for compute nodes to become available
  • (Priority) - Jobs with higher priority are ahead of this job in the queue. See the knowledge base article on job priority for details
  • (ReqNodeNotAvail) - The compute nodes requested by the job are not available for a variety of reasons, including:
    • cluster downtime
    • nodes offline
    • temporary scheduling backlog
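
The state codes and reasons listed above can be turned into a small lookup for use in scripts. This is a sketch only: explain_state and explain_reason are hypothetical helper names, and they cover just the codes described in this article. The ReqNodeNotAvail pattern matches on the prefix, on the assumption that some SLURM versions append extra detail inside the parentheses.

```shell
# explain_state / explain_reason: hypothetical helpers, not part of SLURM.
# They cover only the codes described in this article.
explain_state() {
    case "$1" in
        R)  echo "running on compute nodes" ;;
        PD) echo "pending, waiting for compute nodes" ;;
        CG) echo "completing" ;;
        *)  echo "not covered in this article (see 'man squeue')" ;;
    esac
}

explain_reason() {
    case "$1" in
        "(Resources)")       echo "waiting for compute nodes to become available" ;;
        "(Priority)")        echo "higher-priority jobs are ahead in the queue" ;;
        "(ReqNodeNotAvail"*) echo "requested nodes are not available" ;;
        *)                   echo "not covered in this article (see 'man squeue')" ;;
    esac
}

# Demo with values taken from the squeue example earlier in this article.
explain_state PD
explain_reason "(Priority)"
```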



sinfo - Show the State of Nodes

sinfo -p partition


Example:
[ccruser@vortex:/ifs/user/ccruser]$ sinfo -p debug
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug        up    1:00:00      1  alloc k05n26
debug        up    1:00:00      3   idle d09n29s02,d16n[02-03]
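
To see at a glance how many nodes in a partition are in each state, the NODES column can be summed per STATE. The helper below is a minimal sketch under the assumption that sinfo prints the default columns shown above (NODES fourth, STATE fifth); nodes_by_state is a hypothetical name, not a SLURM command.

```shell
# nodes_by_state: hypothetical helper, not part of SLURM.
# Reads default-format sinfo output on stdin and sums the NODES column
# (assumed 4th) for each STATE (assumed 5th).
nodes_by_state() {
    awk 'NR > 1 && NF >= 5 { n[$5] += $4 }
         END { for (s in n) print s, n[s] }' | sort
}

# Demo on the sample listing from the example above. On the cluster:
#   sinfo -p debug | nodes_by_state
nodes_by_state <<'EOF'
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug        up    1:00:00      1  alloc k05n26
debug        up    1:00:00      3   idle d09n29s02,d16n[02-03]
EOF
```

To list only the nodes in one state, sinfo also accepts a state filter, for example 'sinfo -p debug -t idle'.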




snodes - Show Node State and Feature Details

snodes all <cluster>/<partition>


Example:
[ccruser@vortex:/ifs/user/ccruser]$ snodes all general-compute | more
HOSTNAMES  STATE    CPUS S:C:T    CPUS(A/I/O/T)   CPU_LOAD MEMORY   GRES     PARTITION          FEATURES
d07n04s01  alloc    8    2:4:1    8/0/0/8         8.02     24000    (null)   general-compute*   IB,CPU-L5630
d07n04s02  alloc    8    2:4:1    8/0/0/8         7.97     24000    (null)   general-compute*   IB,CPU-L5630
...




sinfo and snodes - Node States

  • idle - all cores are available on the compute node
    • no jobs are running on the compute node
  • mix - at least one core is available on the compute node
    • compute node has one or more jobs running on it
  • alloc - all cores on the compute node are assigned to jobs
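
The three states above follow from the core counts in the CPUS(A/I/O/T) column (allocated/idle/other/total) of the snodes output. The sketch below shows that relationship; classify_node is a hypothetical helper, and it deliberately ignores the "other" count and any additional states (such as nodes that are down or drained), which are not covered in this article.

```shell
# classify_node: hypothetical helper, not part of SLURM.
# Takes a CPUS(A/I/O/T) value such as "8/0/0/8" and prints idle, mix,
# or alloc according to the definitions above.
classify_node() {
    alloc=${1%%/*}      # first field: allocated cores
    total=${1##*/}      # last field: total cores
    if [ "$alloc" -eq 0 ]; then
        echo idle       # no cores in use
    elif [ "$alloc" -lt "$total" ]; then
        echo mix        # some, but not all, cores in use
    else
        echo alloc      # every core assigned to a job
    fi
}

# Demo with the value from the snodes example above.
classify_node 8/0/0/8    # prints: alloc
```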



How to check the status of running jobs


How to cancel running or pending jobs