The batch scheduler enforces a number of scheduling policies that you should keep in mind when submitting jobs. The policies attempt to balance the competing objectives of reasonable queue wait times and efficient system utilization. The details of these policies differ slightly on each system. Exceptions to the limits cannot be granted, as they adversely affect other users' jobs as well as the scheduler. However, we may be able to offer suggestions on how to get the best throughput for your jobs within the confines of the policies. Please email ccr-help for assistance.
Each system differs in the number of processors (cores) and the amount of memory and disk available per node. We commonly find jobs waiting in the queue that cannot run on the system where they were submitted because their resource requests exceed the limits of the available hardware. Jobs never migrate between clusters or partitions, so please pay attention to these limits. CCR currently has many standard (general-compute) nodes and only a small number of large memory and GPU nodes, so your jobs are likely to wait in the queue much longer for a large memory or GPU node than for a standard node. Users often inadvertently request more memory than is available on a standard node and end up waiting for one of the scarce large memory nodes, so check your requests carefully. More details on specifying node types in your job script can be found here.
Walltime limits per job
Walltime limits vary per system. The academic (ub-hpc) and industry clusters have a maximum walltime of 72 hours. The PI clusters have a maximum walltime of 30 days (unless the owner has requested a shorter time). If your jobs require longer run times, you will need to use some form of checkpointing, so that a job can pick up where it left off and continue its calculations. More details about checkpointing can be found here.
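As a rough illustration of the limits above, the sketch below checks a requested walltime before submission. It assumes the scheduler accepts Slurm-style time strings (MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM, D-HH:MM:SS); the function names and the cluster keys `academic`, `industry`, and `pi` are hypothetical labels chosen for this example, and a PI cluster owner may have set a shorter limit than the 30 days used here.

```python
# Limits stated above: 72 hours on the academic and industry clusters,
# 30 days on the PI clusters (an owner may have requested less).
# The dictionary keys are hypothetical labels, not scheduler names.
MAX_WALLTIME_SECONDS = {
    "academic": 72 * 3600,
    "industry": 72 * 3600,
    "pi": 30 * 24 * 3600,
}


def walltime_to_seconds(walltime: str) -> int:
    """Convert a Slurm-style time string to seconds.

    Accepted forms: MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM, D-HH:MM:SS.
    """
    text = walltime.strip()
    days = 0
    if "-" in text:
        # With a day count, the remaining fields are hours[:minutes[:seconds]].
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if not 1 <= len(parts) <= 3:
            raise ValueError(f"unrecognized walltime: {walltime!r}")
        hours, minutes, seconds = (parts + [0, 0])[:3]
    else:
        parts = [int(p) for p in text.split(":")]
        if len(parts) == 1:        # MM
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:      # MM:SS
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:      # HH:MM:SS
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unrecognized walltime: {walltime!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def fits_walltime_limit(walltime: str, cluster: str) -> bool:
    """Return True if the request does not exceed the cluster's maximum."""
    return walltime_to_seconds(walltime) <= MAX_WALLTIME_SECONDS[cluster]
```

For example, `fits_walltime_limit("4-00:00:00", "academic")` is false because four days exceeds 72 hours, so a job of that length would need to be split into checkpointed segments.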
Limits per user
These limits are applied separately on each system and also vary by partition. Jobs submitted in excess of these limits are rejected by the scheduler.
|Cluster||Partition||Limit (number of jobs allowed per user, queued and running)|
|Industry||compute (industry users only)||1000|
|Industry||scavenger (academic users)||1000|
|PI clusters (MAE, Physics, Chemistry)||All PI partition users||1000|
|PI clusters (MAE, Physics, Chemistry)||scavenger (academic users)||1000|
Short jobs for debugging
A small number of nodes are set aside in the debug partition of the academic (ub-hpc) cluster with a walltime limit of 1 hour. Users are permitted to submit 4 jobs to the debug partition at one time. These nodes are intended for testing codes and setting up job parameters before running at a larger scale. They are not intended for long-term use. If we believe this policy is being abused, the user will be warned and, if excessive use continues, access to the debug partition will be blocked.
GPU jobs
All GPU nodes are reserved for jobs that request GPUs, meaning you must specify --gres=gpu:1 (or a higher count) in your job script. Users are permitted to submit 32 jobs to the GPU partition at one time. If users are found not to be utilizing the GPU properly in their scripts, they may be blocked from submitting jobs to the GPU partition. More details on specifying node types in your job script can be found here.
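One way to catch a missing GPU request before submitting is to scan the job script for the directive. The sketch below is a hypothetical helper, not a CCR tool: it assumes Slurm-style `#SBATCH` directives, recognizes only the `--gres=gpu[:type][:count]` form written with an equals sign, and stops at the first executable command, since directives placed after that point are not processed by the scheduler.

```python
import re

# Matches "#SBATCH --gres=gpu", "#SBATCH --gres=gpu:2" and
# "#SBATCH --gres=gpu:<type>:2". Group 1 captures the count, if given.
_GPU_GRES = re.compile(r"^#SBATCH\s+--gres=gpu(?::[A-Za-z][\w.-]*)?(?::(\d+))?$")


def requested_gpus(script_text: str) -> int:
    """Return the GPU count requested via --gres in a job script (0 if none)."""
    count = 0
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # First executable command: later #SBATCH lines are ignored.
            break
        match = _GPU_GRES.match(stripped)
        if match:
            # A gres request without an explicit count asks for one GPU.
            count = int(match.group(1) or 1)
    return count


# Hypothetical job script used only to exercise the helper.
example_script = """#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1
./my_gpu_app
"""

if __name__ == "__main__":
    print(requested_gpus(example_script))
```

A result of 0 means the script would not be eligible for the GPU nodes as written.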
Large memory jobs
The largemem partition of the academic cluster offers nodes with 32 cores and either 256GB or 512GB of memory. Users are permitted to submit 32 jobs to the largemem partition at one time. Wait times are generally long for these nodes, so we encourage users to make sure they actually need this much RAM before submitting to this partition. If users are found not to be utilizing all the memory requested in their scripts, they may be blocked from submitting jobs to the largemem partition. More details on specifying node types in your job script can be found here.
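To see how a memory request maps onto node types, the sketch below picks the smallest node class that can satisfy a per-node request. The 256GB and 512GB sizes come from the text above; the standard-node memory is not stated on this page, so it is passed in as a parameter, and the 128GB used in the usage note is purely a made-up value. The function name and return labels are hypothetical, and the memory actually available to jobs on a node is typically somewhat less than its nominal size.

```python
from typing import Optional

# Nominal largemem node sizes from the text above, smallest first.
LARGEMEM_NODE_SIZES_GB = (256, 512)


def smallest_node_class(requested_gb: float, standard_node_gb: float) -> Optional[str]:
    """Return a label for the smallest node class that fits the request.

    standard_node_gb is an assumption supplied by the caller; check the
    hardware specifications of the cluster you are submitting to.
    Returns None when no node class is large enough.
    """
    if requested_gb <= standard_node_gb:
        return "standard"
    for size_gb in LARGEMEM_NODE_SIZES_GB:
        if requested_gb <= size_gb:
            return f"largemem-{size_gb}GB"
    return None
```

With a hypothetical 128GB standard node, a 100GB request fits on a standard node, while a 130GB request can only be satisfied by one of the scarce largemem nodes and will likely wait much longer.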
Scavenger partitions
There are scavenger partitions on the industry cluster as well as on all the faculty clusters. These partitions provide a way for users to access idle nodes. Jobs submitted to a scavenger partition run only when there are no other pending or running jobs in the compute partitions on these clusters. Once a user with access to that cluster submits a job requesting resources, jobs in the scavenger partition are stopped and re-queued. This means that if you're running a job in the scavenger partition on the industry cluster and an industry user submits a job requiring the resources you're consuming, your job will be stopped. Use of the scavenger partition is a privilege and requires advanced setup and knowledge. If your jobs are found to cause problems on any of the private cluster nodes and we receive complaints from the owners of those nodes, your access to the scavenger partitions will be removed. More details on using idle nodes in the scavenger partitions can be found here.
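Because a scavenger job can be stopped and re-queued at any time, it helps if the application can resume from saved state instead of starting over. The following is a minimal checkpoint-and-resume sketch, not a CCR-provided mechanism: the file layout, the function names, and the `stop_after` parameter (which only simulates the job being stopped) are all hypothetical, and the arithmetic stands in for real work.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the checkpoint.

    The rename is atomic, so a job stopped mid-write never leaves a
    half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def run(path, n_steps, checkpoint_every=10, stop_after=None):
    """Run up to n_steps of placeholder work, resuming from a checkpoint.

    stop_after simulates the scheduler stopping the job after that many
    steps in this run; work done since the last checkpoint is lost.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # stopped: nothing is saved at this point
        state["total"] += state["next_step"]  # placeholder for real work
        state["next_step"] += 1
        done_this_run += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state once all steps are done
    return state
```

When the re-queued job starts again, calling `run` with the same checkpoint path repeats only the steps completed after the last save. The checkpoint interval is a trade-off: saving more often costs I/O time but loses less work when the job is stopped.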
Fairshare
To keep any one user or group from monopolizing the system when others need the same resources, the scheduler imposes what are known as fair-share limits. If a user or group uses large amounts of computing resources over a period of a month, any new jobs they submit during that period will have reduced priority.
Job priority
The priority of a job is influenced by many factors, including the processor count requested, the length of time the job has been waiting, and how much other computing has been done by the user and their group over the last month (fairshare). However, having the highest priority does not necessarily mean that a job will run immediately, as there must also be enough processors and memory available to run it. See more details on how priority is calculated and how to check the priority of your pending job(s).
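The interplay between priority and available resources can be illustrated with a toy model. The sketch below combines the three factors named above into a weighted sum and then checks whether a job could start on the currently free resources. It is only an illustration: the weights, the 0-to-1 factor values, and all names are hypothetical and do not reflect how the scheduler on any CCR cluster is actually configured.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    name: str
    cores: int
    mem_gb: int
    age_factor: float        # 0..1, grows the longer the job has waited
    fairshare_factor: float  # 0..1, shrinks as recent user/group usage grows
    size_factor: float       # 0..1, derived from the processor count requested


# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS = {"age": 1000, "fairshare": 2000, "size": 500}


def priority(job: PendingJob) -> float:
    """Weighted sum of the job's factors (toy model)."""
    return (
        WEIGHTS["age"] * job.age_factor
        + WEIGHTS["fairshare"] * job.fairshare_factor
        + WEIGHTS["size"] * job.size_factor
    )


def can_start_now(job: PendingJob, free_cores: int, free_mem_gb: int) -> bool:
    """A job can start only if enough processors and memory are free."""
    return job.cores <= free_cores and job.mem_gb <= free_mem_gb


# Two made-up pending jobs: the larger one ranks higher but needs more
# resources than are free in this example.
big = PendingJob("big", cores=64, mem_gb=256,
                 age_factor=0.9, fairshare_factor=0.8, size_factor=0.5)
small = PendingJob("small", cores=8, mem_gb=32,
                   age_factor=0.2, fairshare_factor=0.3, size_factor=0.1)
ranked = sorted([small, big], key=priority, reverse=True)
```

In this example `big` has the higher priority, yet with only 16 cores and 64GB free it cannot start, while `small` could. The model also shows why heavy recent usage, which lowers the fair-share factor, pushes a user's new jobs down the ranking.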