RESOLVED: 8/5/20: Fisbatch and SSH to nodes not working

UPDATE - 8/7/20:  A patch has been applied to our systems so that these issues are now resolved.  Please contact CCR Help if you continue to have problems.


8/6/20:  This is a bug in the Slurm job scheduler.  We are waiting for the vendor to release a fix and will update our systems when it is available.  Until then, please use the workaround below.  Unfortunately, there is currently no workaround for software that requires SSH to launch.



Due to updates performed during the August 4, 2020 downtime, SSH access to compute nodes and the fisbatch script are not working on the clusters.  Users with jobs running on a node cannot SSH into that node, and users running the fisbatch script will get an error.  These errors may look like:


FISBATCH -- Connecting to head node (cpn-u28-38)

Access denied: user ccruser (uid=12345) has no active jobs on this node.

Authentication failed.


or


Access denied: user ccruser(uid=12345) has no active jobs on this node.

Authentication failed.


Fisbatch users should use the 'salloc' command to request compute node resources.  Details on how to do so can be found here:  

How to Submit an Interactive Job (see the section following the fisbatch section)

Once your job starts, open a shell on the allocated node:
srun --jobid=[your_jobid] --pty /bin/bash
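A minimal sketch of that workflow, assuming Slurm's default salloc behavior: salloc starts a new shell where you ran it and sets SLURM_JOB_ID to the new job's ID. The helper name attach_to_allocation and the resource values in the example session are hypothetical, not CCR-provided commands or recommended settings.

```shell
#!/bin/sh
# Hypothetical helper (not a CCR-provided command): open a shell on the
# compute node of the allocation that salloc granted.
# Assumes it runs inside the shell started by salloc, where Slurm sets
# SLURM_JOB_ID to the ID of the new job.
attach_to_allocation() {
  if [ -z "${SLURM_JOB_ID:-}" ]; then
    echo "SLURM_JOB_ID is not set; start an allocation with salloc first" >&2
    return 1
  fi
  srun --jobid="$SLURM_JOB_ID" --pty /bin/bash
}

# Example session (placeholder resource values, adjust to your needs):
#   salloc --nodes=1 --ntasks=1 --time=01:00:00
#   attach_to_allocation
```

If the shell you are in was not started by salloc, the helper refuses to run rather than calling srun with an empty job ID.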



If you have a running job and want to connect to its node, use 'srun' instead of ssh:

srun --jobid=[your_jobid] --pty /bin/bash 
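If you are not sure which value to use for [your_jobid], squeue can list it. A small sketch, using a hypothetical helper name my_running_jobs, that prints only the IDs of your running jobs on the default cluster, one per line:

```shell
#!/bin/sh
# Hypothetical helper: list the IDs of your currently running jobs.
my_running_jobs() {
  # -u: only your jobs; -h: no header line;
  # -t RUNNING: skip pending jobs; -o '%i': print only the job ID
  squeue -u "$USER" -h -t RUNNING -o '%i'
}
```

With exactly one running job, you could then connect with: srun --jobid="$(my_running_jobs)" --pty /bin/bash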



If you're running on an alternate cluster (for example, faculty), specify the cluster in your srun command:

srun --clusters=faculty --jobid=[your_jobid] --pty /bin/bash



If you've been allocated multiple nodes and want to choose which node to log in to:
srun --clusters=[cluster_name] --jobid=[your_jobid] --pty --nodes=1 --nodelist=[hostname] /bin/bash
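The srun variants above differ only in which options are added. A minimal sketch, as a hypothetical shell function job_shell (not a CCR-provided command), that assembles the same srun command from a job ID plus an optional cluster and node name; setting DRY_RUN=1 prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: job_shell JOBID [CLUSTER] [NODE]
# Builds the srun command used to open a shell inside a running job.
job_shell() {
  if [ -z "${1:-}" ]; then
    echo "usage: job_shell JOBID [CLUSTER] [NODE]" >&2
    return 2
  fi
  local jobid="$1" cluster="${2:-}" node="${3:-}"
  # Build the argument list in the positional parameters so that
  # each option stays a separate word.
  set -- srun
  if [ -n "$cluster" ]; then
    set -- "$@" "--clusters=$cluster"
  fi
  set -- "$@" "--jobid=$jobid" --pty
  if [ -n "$node" ]; then
    set -- "$@" --nodes=1 "--nodelist=$node"
  fi
  set -- "$@" /bin/bash
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # show the command without running it
  else
    "$@"                 # run srun with the assembled options
  fi
}
```

For example, DRY_RUN=1 job_shell 12345 faculty cpn-u28-38 prints the command for node cpn-u28-38 of job 12345 on the faculty cluster; drop DRY_RUN=1 to run it. Pass an empty string as the cluster to pick a node on the default cluster. The job ID and node name here are only illustrative.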



Thank you for your patience while we work through this issue.

