October 2024: Monthly Maintenance Downtime (10/29/24)

Date of downtime: Tuesday, October 29, 2024


Approximate time of outage: 7:00 am to 5:00 pm


Resources affected by downtime:

UB-HPC cluster (all partitions) 

Faculty cluster (all partitions)

Portals: OnDemand, ColdFront, IDM


What will be done:  

  • Reboot of all cluster nodes
  • Updates to the front-end login nodes (login1/2, vortex-future) and OnDemand
  • OnDemand update to version 3.1.9, which adds the long-requested ability to keep the cluster app active for longer than just a few minutes
  • Slurm upgrade to version 24.05.4
  • Compute node operating system updates, including updated NVIDIA drivers
  • Identity management portal software update
  • Infrastructure service updates


Specific effects on CCR users:

We do not anticipate that any of these updates will cause system problems. However, because the operating system on the compute nodes is being updated, users' workflows may be affected. Please report any suspected problems to CCR Help, with enough detail that we can attempt to reproduce them.
