During the June 30, 2020 maintenance downtime the academic (UB-HPC) cluster partition layout will change.
Currently, the cluster nodes are separated into many partitions: debug, viz, general-compute, skylake, cascade, gpu, and largemem. After the downtime, the general-compute partition will contain all nodes from the skylake, cascade, gpu, and largemem partitions, and those partitions and their associated QOS values will be deleted. The debug and viz partitions will remain unchanged.
Why are we doing this?
There are historical reasons for the separation into many partitions. However, improvements to the scheduling software over time, our ability to tag resources so that users can request specific hardware, and our ability to monitor cluster usage in fine detail have led us to merge most of these partitions into one. This should improve the efficiency of the scheduler, encourage users to request only the resources they need, and decrease the wait times for some jobs.
- Any jobs pending in the queue at the start of the downtime that are directed to any of the deleted partitions will be removed. You will need to resubmit them.
- You will need to update your scripts to remove references to the deleted partitions and QOS values. Most jobs should now be directed to the general-compute partition; since this is the default partition on the academic cluster, you do not need to specify it.
- If you have access to any priority boost QOS values (nih, mri, supporters), you may use them on the general-compute partition.
- If you care what type of CPU your jobs run on, you will need to update your scripts to include the Slurm --constraint directive and specify the CPU type. More details
- Slurm features are a way to specify exact hardware requests for your jobs. This is especially useful for requesting CPU and GPU types and fast network interconnects, and even for singling out hardware purchased under different grants (NIH, MRI). More details
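As a sketch of what an updated script might look like after the downtime, the batch script below omits the partition (general-compute is the default) and uses --constraint to pin the job to a CPU type. The feature tag CPU-Gold-6130 and the program name are illustrative only; check your cluster's documentation for the exact feature tags available.

```shell
#!/bin/bash
# No --partition line needed: general-compute is the default
# partition on the academic cluster after the June 30 downtime.
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8

# Request a specific CPU type via a Slurm feature tag.
# NOTE: "CPU-Gold-6130" is an illustrative tag, not a confirmed one;
# consult the cluster documentation for the real feature names.
#SBATCH --constraint=CPU-Gold-6130

srun ./my_program   # "my_program" is a placeholder for your executable
```

Multiple features can be combined in one --constraint expression (for example with "&" for AND), which is how you would request both a CPU type and a fast interconnect in a single job.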
You can start using Slurm features right NOW! Partition and QOS values will still be required for jobs submitted to the skylake, cascade, gpu, and largemem partitions until after the June 30th downtime.
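Until the downtime, a job targeting one of the soon-to-be-deleted partitions can already use a feature request, but must still carry the partition and QOS flags. The sketch below assumes the QOS value matches the partition name (skylake here), and again uses a hypothetical feature tag:

```shell
#!/bin/bash
# Pre-downtime submission: partition and QOS are still required
# for skylake, cascade, gpu, and largemem jobs.
#SBATCH --partition=skylake
#SBATCH --qos=skylake              # assumed to match the partition name
#SBATCH --constraint=CPU-Gold-6130 # illustrative feature tag
#SBATCH --time=01:00:00
#SBATCH --nodes=1

srun ./my_program   # placeholder executable name
```

After the downtime, the same job would drop the --partition and --qos lines and rely on the --constraint feature alone.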