Is your batch job pending in the queue with the reason QOSMaxNodePerJobLimit?
This happens because jobs on the GPU partition of the UB-HPC cluster are restricted to a single node. A job that requests more than one node is not rejected at submission; it sits in the queue as a pending job with this reason. You must cancel the job and resubmit it requesting only one node.
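As a rough sketch of the resubmission, the batch script below requests exactly one node. The partition and QOS values, the job name, the time limit, and the GPU count are placeholders and assumptions, not values taken from this page; replace them with the ones that apply to your account. The commented commands at the top are standard Slurm commands for finding and cancelling the pending job.

```shell
#!/bin/bash
# Find and cancel the stuck job first (standard Slurm commands):
#   squeue -u "$USER" -t PENDING -o "%i %P %r"   (job ID, partition, pending reason)
#   scancel JOBID                                (JOBID is the ID shown by squeue)
# Then resubmit this script with: sbatch this_script.sh

# Request exactly one node; this is the setting the limit applies to.
#SBATCH --nodes=1
# Placeholder values below are assumptions; use the ones for your account.
#SBATCH --partition=PARTITION_NAME
#SBATCH --qos=QOS_NAME
#SBATCH --gpus-per-node=1
#SBATCH --time=01:00:00
#SBATCH --job-name=single-node-gpu

# SLURM_JOB_NUM_NODES is set by Slurm inside a running job; default to 1 elsewhere.
nodes="${SLURM_JOB_NUM_NODES:-1}"
echo "Nodes allocated: ${nodes}"
```

If the job was submitted with a command-line option such as -N 2 or --nodes=2, that option overrides the value in the script, so check the sbatch command line as well as the script itself.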
If you have a valid reason for needing more than one GPU node per job, you may be granted an exception to this restriction. To be considered, please submit a help ticket with a justification for your use case, including scaling results. Please also describe the type of research this is for, the software you will be using, any previous experience using GPU servers, and examples of jobs you have previously run on the CCR GPU nodes.