RESOLVED: 8/5/20: Fisbatch and SSH to nodes not working
Dori Sajdak started a topic over 4 years ago
UPDATE - 8/7/20: A patch has been applied to our systems so that these issues are now resolved. Please contact CCR Help if you continue to have problems.
8/6/20: This is a bug discovered in the Slurm job scheduler. We are waiting for the vendor to fix it and will update our systems at that time. For now, please use the workaround below. Unfortunately, there is no workaround at this time for software that requires SSH to launch.
Due to updates performed during the August 4, 2020 downtime, several things are not working on the clusters. Users with jobs running on a node will not be able to SSH into that node. Users attempting to use the fisbatch script will get an error. These errors may look like:
FISBATCH -- Connecting to head node (cpn-u28-38)
Access denied: user ccruser (uid=12345) has no active jobs on this node.
Authentication failed.
or
Access denied: user ccruser(uid=12345) has no active jobs on this node.
Authentication failed.
Fisbatch users should use the 'salloc' command to request compute node resources. Details on how to do so can be found here:
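As a rough illustration only, a 'salloc' request might be put together as in the sketch below. The node count, task count, and time limit are placeholder assumptions, not CCR recommendations, and your account may need additional options such as a partition or QOS. The sketch only prints the command so you can review it before running it yourself.

```shell
# Placeholder values (assumptions): adjust to your own job's needs.
nodes=1
tasks_per_node=1
walltime="01:00:00"

# Build the salloc request from standard Slurm options and print it for review.
# Copy the printed command to the prompt to actually request the allocation.
salloc_cmd="salloc --nodes=$nodes --ntasks-per-node=$tasks_per_node --time=$walltime"
echo "$salloc_cmd"
```

What happens after the allocation is granted depends on site configuration. The 'srun' commands in this post are one way to get a shell on the allocated node.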
If you have a job running and want to connect to its node, use 'srun' instead of ssh:
srun --jobid=[your_jobid] --pty /bin/bash
If you're running on an alternate cluster, specify the cluster in your srun command:
srun --clusters=faculty --jobid=[your_jobid] --pty /bin/bash
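If you switch between clusters often, the two forms above could be wrapped in a small shell function. This is only a sketch: 'attach_cmd' is a hypothetical name, not a CCR-provided tool, and it prints the command rather than running it. You can look up your job ID with 'squeue -u $USER'.

```shell
# Hypothetical helper (not part of Slurm or CCR tooling).
# Prints the srun command for opening a shell inside a running job.
# Usage: attach_cmd JOBID [CLUSTER]
attach_cmd() {
    jobid="$1"
    cluster="${2:-}"
    if [ -z "$jobid" ]; then
        echo "usage: attach_cmd JOBID [CLUSTER]" >&2
        return 1
    fi
    if [ -n "$cluster" ]; then
        # Alternate cluster: include --clusters, as in the second command above.
        echo "srun --clusters=$cluster --jobid=$jobid --pty /bin/bash"
    else
        echo "srun --jobid=$jobid --pty /bin/bash"
    fi
}
```

For example, 'attach_cmd 123456 faculty' prints the faculty-cluster form with a made-up job ID of 123456. Copy the printed line to the prompt to run it.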
Thank you for your patience while we work through this issue.