October 2024: Monthly Maintenance Downtime (10/29/24)
Dori Sajdak started a topic 2 months ago
Date of downtime: Tuesday, October 29, 2024
Approximate time of outage: 7am-5pm
Resources affected by downtime:
UB-HPC cluster (all partitions)
Faculty cluster (all partitions)
Portals: OnDemand, ColdFront, IDM
What will be done:
Reboot of all cluster nodes
Updates of front-end login nodes (login1/2, vortex-future) and OnDemand
OnDemand update to version 3.1.9 - this includes the long-requested feature of keeping the cluster app alive/active for longer than just a few minutes
Slurm upgrade to version 24.05.4
Compute node operating system updates - this includes an update to NVIDIA drivers
Identity management portal software update
Infrastructure services updates
Specific effects on CCR users:
We do not anticipate that any of these updates will cause system problems. However, the operating system updates on the compute nodes may affect users' workflows. Please report any suspected problems to CCR Help, with enough detail that we can attempt to replicate them.
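If you do need to report a problem after the downtime, it may help to include the software versions seen on the node where the problem occurred. The following is only a rough sketch of one way to gather them, not an official CCR tool: the names `COMMANDS`, `run_command`, `collect_versions`, and `format_report` are made up for illustration, and it assumes the standard `sinfo`, `nvidia-smi`, and `uname` commands are on your PATH (on nodes without GPUs, `nvidia-smi` will simply be reported as not available).

```python
import shutil
import subprocess

# Hypothetical set of checks chosen for illustration. The commands themselves
# are the standard Slurm, NVIDIA, and Unix tools, not anything CCR-specific.
COMMANDS = {
    "slurm": ["sinfo", "--version"],
    "nvidia_driver": ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    "kernel": ["uname", "-r"],
}


def run_command(argv):
    """Return the command's trimmed stdout, or a short note if it is missing or fails."""
    if shutil.which(argv[0]) is None:
        return "not available"
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"failed: {exc}"
    if result.returncode != 0:
        return f"failed: exit code {result.returncode}"
    return result.stdout.strip()


def collect_versions(runner=run_command):
    """Run every check and return a mapping of check name to its output."""
    return {name: runner(argv) for name, argv in COMMANDS.items()}


def format_report(versions):
    """Format the collected versions as plain text, one 'name: value' line per check."""
    return "\n".join(f"{name}: {value}" for name, value in versions.items())


if __name__ == "__main__":
    print(format_report(collect_versions()))
```

Running this inside the job or session where the problem appears, and pasting the output into your message to CCR Help along with the job ID and the exact error, should make the issue easier to reproduce.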