Is your batch job pending in the queue with the reason QOSMaxNodePerJobLimit?
This happens because jobs on the GPU partition of the UB-HPC cluster are restricted to one node each. A job that requests more than one node is not rejected at submission; it sits in the queue as a pending job with this reason and will not run as submitted. You must cancel the job and resubmit it requesting only one node.
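As a rough sketch of the check, cancel, and resubmit steps, the snippet below picks out the jobs that are pending for this reason. It assumes the standard Slurm commands squeue, scancel, and sbatch are available on the login node. The helper name node_limit_jobs and the script name my_job.sh are made up for illustration, and the Slurm commands are left as comments because they only work on the cluster.

```shell
#!/bin/bash
# Hypothetical helper: print the IDs of jobs whose pending reason is
# QOSMaxNodePerJobLimit. Reads "JOBID REASON" pairs on stdin, one job per line.
node_limit_jobs() {
    awk '$2 == "QOSMaxNodePerJobLimit" { print $1 }'
}

# Typical use on a login node (commented out because it needs Slurm):
#   squeue -u "$USER" -h -t PENDING -o "%i %r" | node_limit_jobs
#   scancel <jobid>               # cancel each job ID printed above
#   sbatch --nodes=1 my_job.sh    # my_job.sh is a placeholder script name
```

Options given on the sbatch command line take precedence over #SBATCH lines in the script, so --nodes=1 here overrides a larger node count left in the script. Editing the #SBATCH --nodes line in the script itself works just as well.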
If you have a valid reason for needing more than one GPU node per job, you may be granted an exception to this restriction. To be considered, please submit a help ticket with a justification for your use case. Explain what type of research you will be doing and what software you will use for it. If you have previous experience using GPU servers, or examples of jobs you have already run on the CCR GPU nodes, please include that information in your justification.