June 2022: Monthly Maintenance Downtime (6/21/22 - NOTE: date change)
Dori Sajdak started a topic over 2 years ago
Date of downtime: Tuesday, June 21, 2022 (NOTE: this date differs from our regular schedule of the last Tuesday of the month)
Approximate time of outage: 7am-5pm
Resources affected by downtime:
UB-HPC cluster (all partitions)
Industry cluster (all partitions)
Faculty cluster (all partitions)
Portals: WebMO, OnDemand, ColdFront (briefly offline for updates)
What will be done:
Apply a new OS image to all cluster nodes and reboot them
Minor Slurm update to version 21.08.8-2
OS updates on the front-end login nodes (vortex1/2, transfer)
OnDemand OS and software update to version 2.0.26
The Industry and Academic (UB-HPC) clusters will be merged. IMPORTANT POINTS:
All queued jobs on both UB-HPC (academic) and industry clusters will be deleted.
Academic cluster users: You will need to specify your partition and QOS. Previously, if you didn't specify these, the cluster defaults were used. The UB-HPC cluster will no longer have defaults set, so both are now required in batch scripts and salloc commands.
Academic users who use the Industry cluster scavenger partition: All nodes will be available to you in the UB-HPC cluster scavenger partition.
Industry cluster business users: Change your Slurm scripts to use:
#SBATCH --cluster=ub-hpc
#SBATCH --partition=industry
#SBATCH --qos=industry
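For academic (UB-HPC) users, a minimal batch script that sets the now-required partition and QOS might look like the sketch below. The partition and QOS name `general-compute` is an assumption used for illustration only; substitute the partition and QOS your group is actually allowed to use. The node, task, and time requests are arbitrary placeholders.

```shell
#!/bin/bash
# Sketch of an academic (UB-HPC) batch script after the cluster merge.
# ASSUMPTION: "general-compute" is a placeholder partition/QOS name;
# replace it with the partition and QOS assigned to your group.
#SBATCH --cluster=ub-hpc
#SBATCH --partition=general-compute
#SBATCH --qos=general-compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Slurm sets SLURM_JOB_PARTITION inside a running job; fall back to "unset"
# so the script can also be dry-run with plain bash outside of Slurm.
msg="partition=${SLURM_JOB_PARTITION:-unset}"
echo "$msg"
```

Interactive jobs need the same information on the command line, for example `salloc --partition=general-compute --qos=general-compute` (again with your own partition and QOS in place of the assumed names).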
If you're getting errors like these, you're not specifying the right combination of cluster, account, partition, and qos:
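One way to rule out a missing directive before submitting is a quick pre-flight check. The helper below, `check_slurm_directives`, is hypothetical (not a CCR-provided tool) and only looks for the long-form `--partition` and `--qos` options; it does not validate that the values are a permitted combination for your account.

```shell
#!/bin/bash
# Hypothetical helper: report any of the now-required #SBATCH directives
# (--partition, --qos) that are missing from a batch script.
# Only long-form options are checked; short forms such as -p/-q are not.
check_slurm_directives() {
    local script="$1"
    local missing=0
    local opt
    for opt in partition qos; do
        # Matches "#SBATCH --partition=..." as well as "#SBATCH --qos ..."
        if ! grep -Eq "^#SBATCH[[:space:]]+--${opt}([=[:space:]]|$)" "$script"; then
            echo "missing #SBATCH --${opt} in ${script}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_slurm_directives myjob.sh && sbatch myjob.sh` submits the script only when both directives are present.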