In addition to CCR's large production cluster and visualization hardware, which are freely available to UB and affiliated researchers, CCR maintains a wide variety of project-specific storage systems and compute clusters.  Faculty (PI) partitions are resources purchased by faculty for use by a specific research group; these nodes are not available to all CCR users.  If you are interested in purchasing your own equipment, please see this page for more details.


The faculty partitions are grouped into clusters according to department.  When issuing a SLURM command, users must specify the cluster name, and in most cases a partition must be specified as well.  If no cluster is specified, the default is the CCR cluster, UB-HPC.
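For example, with no cluster flag squeue queries the default UB-HPC cluster, while the -M flag targets a PI cluster (the chemistry cluster is used here as an example, assuming your group has access to it):

squeue -u $USER

squeue -M chemistry -u $USER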



Cluster      Partitions
chemistry    amurkin, beta, coppens, ezurek, jbb6, jochena, mdupuis2, technetium, valhalla, webmo-class
mae          compbio, copernicus, davidsal, dsmackay, estellec, gbc, geosolver, isecc, ped3, planex, psingla, thales, vidia
physics      ciaranwi, pzhang3, srrappoc, wjzheng



Important!

The following instructions assume basic familiarity with using SLURM on the CCR cluster.  If you are a new user, please read the Introductory SLURM page first.



Specifying a Cluster and Partition

Use the following SLURM flags to specify a cluster:

-M cluster_name

or

--clusters=cluster_name


Use the following SLURM flags to specify a partition and qos within a cluster:

--partition=partition_name

--qos=qos_name


See the list of QOS names here
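For example, to submit a job script to the ezurek partition of the chemistry cluster (the qos name here is assumed to match the partition name; verify against the QOS list for your partition's actual value):

sbatch --clusters=chemistry --partition=ezurek --qos=ezurek test_script.sh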


Submitting Jobs to a PI Partition

Use a cluster flag and a partition flag when submitting a job.



vi test_script.sh


#!/bin/sh
#SBATCH --partition=partition_name --qos=qos_name
#SBATCH --clusters=cluster_name
#SBATCH --account=account_name
#SBATCH --time=00:15:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
##SBATCH --mem=24000            # the extra "#" leaves this directive disabled
#SBATCH --job-name="hello_test"
#SBATCH --output=test.out
#SBATCH --mail-user=username@buffalo.edu
#SBATCH --mail-type=ALL

# Replace this placeholder with your own program; srun launches it on the allocated nodes.
srun hostname
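Because the cluster, partition, and qos flags are set inside the script, the job can be submitted without repeating them on the command line:

sbatch test_script.sh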

 

Viewing the Status of Jobs in a Partition or Cluster

Use a cluster flag to view jobs running in any partition within the cluster.  The partition flag will show only the jobs in that specific partition.

squeue -M cluster_name

squeue --clusters=cluster_name

squeue -M cluster_name -p partition_name -q qos_name

squeue --clusters=cluster_name --partition=partition_name --qos=qos_name

 

The following will show jobs on all clusters and partitions:

squeue -M all


For example, to view the status of all jobs belonging to user cdc on any cluster and partition:

squeue -M all -u cdc

 

The graphical monitor slurmjobvis can be used to view jobs running on a faculty partition:

slurmjobvis jobid cluster_name


Example:

slurmjobvis 14527 chemistry

 


[cdc@vortex:~]$ squeue -M all -u cdc
CLUSTER: chemistry
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               319   coppens hello_te      cdc  R       0:33      2 m24n40s[01-02]
               320    ezurek hello_te      cdc PD       0:00      1 (Priority)

CLUSTER: mae
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

CLUSTER: physics
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               606   pzhang3 hello_te      cdc PD       0:00      4 (Priority)

CLUSTER: ub-hpc
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1248400     debug hello_te      cdc  R       0:16      3 d07n33s01,d16n[02-03]
           1248401 general-c hello_te      cdc PD       0:00      8 (Priority)
[cdc@vortex:~]$ 

[cdc@vortex:~]$ squeue -M chemistry -u cdc
CLUSTER: chemistry
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               319   coppens hello_te      cdc  R       0:55      2 m24n40s[01-02]
               320    ezurek hello_te      cdc PD       0:00      1 (Priority)
[cdc@vortex:~]$ squeue -M chemistry -p ezurek -u cdc
CLUSTER: chemistry
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               320    ezurek hello_te      cdc PD       0:00      1 (Priority)
[cdc@vortex:~]$ 


Cancelling a Job

Use a cluster flag when cancelling a job:

 

scancel -M cluster_name jobid

 

scancel --clusters=cluster_name jobid

 


[cdc@vortex:~]$ scancel -M chemistry 320
[cdc@vortex:~]$ scancel -M physics 606


Status of Nodes

Use the sinfo or snodes commands to check the status of nodes in a cluster or partition:

 

sinfo -M cluster_name

 

sinfo -M cluster_name -p partition_name

 

snodes all cluster_name/all

 

snodes all cluster_name/partition_name

 

snodes all cluster_name/partition_name idle
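For example, to list only the idle nodes in the coppens partition of the chemistry cluster (names taken from the table above):

snodes all chemistry/coppens idle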


[cdc@vortex:~]$ sinfo -M chemistry
CLUSTER: chemistry
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
jochena        up 125-00:00:     19    mix d11n[13,19-20],d12n[14-15],f15n[03-15,17]
jochena        up 125-00:00:      5   idle f15n[16,18,23-25]
ezurek         up 125-00:00:     26    mix d11n[01-12,17-18],d12n[02-13]
jbb6           up 28-00:00:0      2  alloc f15n[37-38]
jbb6           up 28-00:00:0      2   idle f15n[39-40]
coppens        up 28-00:00:0      1  alloc m24n34s01
coppens        up 28-00:00:0     15   idle m24n34s02,m24n35s[01-02],m24n36s[01-02],
m24n37s[01-02],m24n38s[01-02],m24n39s[01-02],m24n40s[01-02],m24n41s[01-02]
webmo-class    up 125-00:00:      2  alloc f15n[33,35]
webmo-class    up 125-00:00:      2   down f15n[34,36]
amurkin        up 3-00:00:00      1   idle d11n15
[cdc@vortex:~]$ 

[cdc@vortex:~]$ sinfo -M chemistry -p coppens
CLUSTER: chemistry
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
coppens      up 28-00:00:0      1  alloc m24n34s01
coppens      up 28-00:00:0     15   idle m24n34s02,m24n35s[01-02],m24n36s[01-02],
m24n37s[01-02],m24n38s[01-02],m24n39s[01-02],m24n40s[01-02],m24n41s[01-02]
[cdc@vortex:~]$ 


Access a PI Partition using salloc

The salloc and srun commands do not accept any cluster flags, so they always default to the CCR cluster, UB-HPC.  To use them on a PI cluster, point the SLURM_CONF environment variable at that cluster's configuration file.  Here is an example for the mae cluster:

 

export SLURM_CONF=/util/academic/slurm/conf/mae/slurm.conf

 


 

This sets the default SLURM cluster to mae, so both salloc and srun will now use the mae cluster rather than the CCR UB-HPC cluster.  While SLURM_CONF points at a PI cluster, the -M ub-hpc or --clusters=ub-hpc flag must be used to submit jobs to the CCR cluster.  To return to the original cluster default, unset the variable:

 

unset SLURM_CONF
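Putting it together, a minimal interactive session on the mae cluster might look like this (the gbc partition is used purely as an example, with the qos assumed to match the partition name; you may also need an --account flag, as in the batch script above):

export SLURM_CONF=/util/academic/slurm/conf/mae/slurm.conf

salloc --partition=gbc --qos=gbc --nodes=1 --time=01:00:00

srun hostname

exit

unset SLURM_CONF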