In addition to CCR's large production cluster and visualization hardware, which are freely available to UB and affiliated researchers, CCR maintains a wide variety of project-specific storage systems and compute clusters.  If you are a UB faculty researcher interested in buying your own HPC computing or storage resources, we encourage you to contact us.  The machine room space, cooling, and networking available at CCR, along with our staff's expertise in system administration and programming, allow UB researchers to devote their time to research rather than cluster and systems maintenance.  If the purchased systems fit well into CCR's overall architecture, the additional staff time needed to maintain them is minimal and there are no additional maintenance costs.  If the systems are special purpose or otherwise outside CCR's core expertise, then some provision must be made for additional staff time and maintenance costs.  As always, CCR is open to discussing your research needs; contact us and we will be happy to consult with you.


Faculty (PI) partitions are resources purchased by faculty for use by a specific research group.  These nodes are not available to all CCR users.


The faculty partitions are grouped in clusters according to department.  When issuing a SLURM command, users must specify the cluster name.  In most cases, a partition must be specified as well.  If a cluster is not specified, then the default will be the CCR cluster, UB-HPC.



Cluster     Partitions
chemistry   amurkin, beta, coppens, ezurek, jbb6, jochena, mdupuis2, technetium, valhalla, webmo-class
mae         compbio, davidsal, isecc, ped3, planex, psingla
physics     pzhang3, srrappoc, wjzheng



Important!

The following instructions assume a basic familiarity with using SLURM on the CCR cluster.  If you are a new user, please read the material on the Introductory SLURM page first.



Specifying a Cluster and Partition

Use the following SLURM flags to specify a cluster:

-M cluster_name

or

--clusters=cluster_name


Use the following SLURM flags to specify a partition and QOS within a cluster:

-p partition_name

-q qos_name

or

--partition=partition_name

--qos=qos_name


See the list of QOS names here
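
For example, the same flags can be combined directly on the sbatch command line rather than inside a batch script.  The chemistry cluster and ezurek partition below are only placeholder names taken from the table above, the QOS is assumed to match the partition name (confirm this against the QOS list), and job_script.sh stands in for your own batch script:

sbatch --clusters=chemistry --partition=ezurek --qos=ezurek job_script.sh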


Submitting Jobs to a PI Partition

Use the cluster flag along with the partition and QOS flags when submitting a job.


vi test_script.sh


#!/bin/sh

#SBATCH --partition=partition_name --qos=qos_name

#SBATCH --clusters=cluster_name

#SBATCH --account=account_name

#SBATCH --time=00:15:00

#SBATCH --nodes=2

#SBATCH --ntasks-per-node=8

##SBATCH --mem=24000

#SBATCH --job-name="hello_test"

#SBATCH --output=test.out

#SBATCH --mail-user=username@buffalo.edu

#SBATCH --mail-type=ALL
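
# The job's commands go below the #SBATCH directives.  The two lines here are
# only a minimal placeholder for a "hello_test" job; replace them with your
# own application commands.
echo "Running on nodes: $SLURM_JOB_NODELIST"
srun /bin/hostname


Once the placeholder names in the directives are filled in, submit the script with sbatch.  Because the cluster, partition, QOS, and account are given as #SBATCH directives, no extra flags are needed on the command line:

sbatch test_script.sh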

 

Viewing the Status of Jobs in a Partition or Cluster

Use a cluster flag to view jobs running in any partition within the cluster.  The partition flag will show only the jobs in that specific partition.

squeue -M cluster_name

squeue --clusters=cluster_name

squeue -M cluster_name -p partition_name -q qos_name

squeue --clusters=cluster_name --partition=partition_name --qos=qos_name

 

The following will show jobs on all clusters and partitions:

squeue -M all


To view the status of my own jobs (user cdc) across all clusters and partitions, I would use the following command:

squeue -M all -u cdc

 

The graphical monitor slurmjobvis can be used to view jobs running on a faculty partition:

slurmjobvis jobid cluster


Example:

slurmjobvis 14527 chemistry

 


[cdc@rush:~]$ squeue -M all -u cdc
CLUSTER: chemistry
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               319   coppens hello_te      cdc  R       0:33      2 m24n40s[01-02]
               320    ezurek hello_te      cdc PD       0:00      1 (Priority)

CLUSTER: mae
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

CLUSTER: physics
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               606   pzhang3 hello_te      cdc PD       0:00      4 (Priority)

CLUSTER: ub-hpc
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1248400     debug hello_te      cdc  R       0:16      3 d07n33s01,d16n[02-03]
           1248401 general-c hello_te      cdc PD       0:00      8 (Priority)
[cdc@rush:~]$ 

[cdc@rush:~]$ squeue -M chemistry -u cdc
CLUSTER: chemistry
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               319   coppens hello_te      cdc  R       0:55      2 m24n40s[01-02]
               320    ezurek hello_te      cdc PD       0:00      1 (Priority)
[cdc@rush:~]$ squeue -M chemistry -p ezurek -u cdc
CLUSTER: chemistry
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               320    ezurek hello_te      cdc PD       0:00      1 (Priority)
[cdc@rush:~]$ 


Cancelling a Job

Use the cluster flag when canceling a job:

scancel -M cluster_name jobid

scancel --clusters=cluster_name jobid


[cdc@rush:~]$ scancel -M chemistry 320
[cdc@rush:~]$ scancel -M physics 606
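
To cancel all of your own jobs on a particular cluster at once, scancel also accepts a user flag.  Replace cdc below with your own username:

scancel -M chemistry -u cdc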


Status of Nodes

Use the sinfo or snodes commands to check the status of nodes in a cluster or partition:

sinfo -M cluster_name

sinfo -M cluster_name -p partition_name

snodes all cluster_name/all

snodes all cluster_name/partition_name

snodes all cluster_name/partition_name idle
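
For example, to list only the idle nodes in a single PI partition with the snodes wrapper (chemistry and coppens are simply the cluster and partition names from the table above; any other pair works the same way):

snodes all chemistry/coppens idle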


[cdc@rush:~]$ sinfo -M chemistry
CLUSTER: chemistry
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
jochena        up 125-00:00:     19    mix d11n[13,19-20],d12n[14-15],f15n[03-15,17]
jochena        up 125-00:00:      5   idle f15n[16,18,23-25]
ezurek         up 125-00:00:     26    mix d11n[01-12,17-18],d12n[02-13]
jbb6           up 28-00:00:0      2  alloc f15n[37-38]
jbb6           up 28-00:00:0      2   idle f15n[39-40]
coppens        up 28-00:00:0      1  alloc m24n34s01
coppens        up 28-00:00:0     15   idle m24n34s02,m24n35s[01-02],m24n36s[01-02],
m24n37s[01-02],m24n38s[01-02],m24n39s[01-02], m24n40s[01-02],m24n41s[01-02]
webmo-class    up 125-00:00:      2  alloc f15n[33,35]
webmo-class    up 125-00:00:      2   down f15n[34,36]
amurkin        up 3-00:00:00      1   idle d11n15
[cdc@rush:~]$ 

[cdc@rush:~]$ sinfo -M chemistry -p coppens
CLUSTER: chemistry
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
coppens      up 28-00:00:0      1  alloc m24n34s01
coppens      up 28-00:00:0     15   idle m24n34s02,m24n35s[01-02],m24n36s[01-02],
m24n37s[01-02],m24n38s[01-02],m24n39s[01-02],m24n40s[01-02],m24n41s[01-02]
[cdc@rush:~]$ 


Access a PI Partition using salloc

The salloc and srun commands do not accept the cluster flags; they default to the CCR cluster, UB-HPC.  To use a PI cluster interactively, point SLURM at that cluster's configuration file by setting the SLURM_CONF environment variable.  Here is an example for the mae cluster:

export SLURM_CONF=/util/academic/slurm/conf/mae/slurm.conf


This sets the default SLURM cluster to mae, so both salloc and srun will now use the mae cluster rather than the CCR UB-HPC cluster.  While SLURM_CONF points to a PI cluster, the -M ub-hpc or --clusters=ub-hpc flag must be used to submit jobs to the CCR cluster.  You can return to the original default by unsetting the variable:

unset SLURM_CONF
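
Putting this together, an interactive session on the mae cluster might look like the following sketch.  The planex partition, QOS, and resource requests are placeholders; substitute your own group's partition and requirements:

export SLURM_CONF=/util/academic/slurm/conf/mae/slurm.conf
# request a one-node interactive allocation in the PI partition (placeholder names)
salloc --partition=planex --qos=planex --nodes=1 --time=01:00:00
# ...run srun commands inside the allocation, then release it...
# restore the default UB-HPC cluster
unset SLURM_CONF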