June 2022: Monthly Maintenance Downtime (6/21/22 - NOTE: date change)

Date of downtime: Tuesday, June 21, 2022  NOTE: The date has changed from the usual last Tuesday of the month.

Approximate time of outage: 7am-5pm

Resources affected by downtime:

UB-HPC cluster (all partitions) 

Industry cluster (all partitions)

Faculty cluster (all partitions)

Portals: WebMO, OnDemand, ColdFront (short time offline for updates)

What will be done:  

  • Apply new OS image and reboot all cluster nodes
  • Slurm minor update to version 21.08.8-2
  • OS updates of front-end login nodes (vortex1/2, transfer)
  • OnDemand OS and software update to version 2.0.26
  • The Industry and Academic (UB-HPC) clusters will be merged.  IMPORTANT POINTS:
    • All queued jobs on both UB-HPC (academic) and industry clusters will be deleted.  
    • Academic cluster users:  You must now specify a partition and QOS.  Previously, if you didn't specify these, the cluster defaults were used.  The UB-HPC cluster will no longer set defaults, so both are required for batch scripts and salloc.
    • Academic users who use the Industry cluster scavenger partition:  All nodes will be in the UB-HPC cluster scavenger partition for your use.
    • Industry cluster business users:  Change your Slurm scripts to use:
      • #SBATCH --cluster=ub-hpc
      • #SBATCH --partition=industry
      • #SBATCH --qos=industry
    • If you're getting errors like these, you're not specifying the right combination of cluster, account, partition, and qos:
      • salloc: error: Job submit/allocate failed: Invalid qos specification
      • salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specified
      • sbatch: error: Batch job submission failed: Invalid partition or qos specification
      • sinfo: error: 'industry' can't be reached now, or it is an invalid entry for --cluster.  Use 'sacctmgr list clusters' to see available clusters.
    • Use the 'slimits' command to see what you have access to
    • If you previously set the industry cluster as your default cluster in your .bashrc file, you will need to remove that setting.
  • ColdFront: update to the latest version, time permitting.  If not, this update will be done Wednesday, June 22.
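
Because partition and QOS will be required in batch scripts after the merge, it may help to check existing scripts before resubmitting them. The following is a minimal sketch, not a CCR or Slurm tool: check_slurm_script is a hypothetical helper name, and it assumes the options are given in long form on #SBATCH lines (it does not detect the short -p/-q forms or options passed on the sbatch command line). The sample script it checks uses the Industry settings listed above.

```shell
#!/bin/bash
# Sketch: warn when a batch script lacks an explicit partition or QOS.
# check_slurm_script is a hypothetical helper, not part of Slurm.
check_slurm_script() {
    local script="$1" missing=0 opt
    for opt in partition qos; do
        # Match "#SBATCH --partition=..." or "#SBATCH --partition ..."
        # (long-form directives only).
        if ! grep -Eq "^#SBATCH[[:space:]]+--${opt}([[:space:]=]|\$)" "$script"; then
            echo "missing: #SBATCH --${opt}"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check a sample script that uses the Industry settings from this notice.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/bash
#SBATCH --cluster=ub-hpc
#SBATCH --partition=industry
#SBATCH --qos=industry
hostname
EOF
check_slurm_script "$sample" && echo "ok: partition and QOS are set"
rm -f "$sample"
```

The function returns a non-zero status when either directive is missing, so it can be used in a loop over several scripts. It only confirms that the directives are present; use 'slimits' to confirm that the values are ones you have access to.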