July/August 2023: CENTERWIDE Downtime (8/8 - 8/9/23)

8/9/23 3:45pm STATUS:  Downtime is complete and systems are back online.  Two minor issues remain: (1) the compile nodes are offline and will be brought back within the next 24-48 hours; (2) the Slurm job emails, which normally include detailed job information, will contain only basic information until that service is updated.  We apologize for the inconvenience.


8/9/23 8am STATUS:  Services such as the identity management server, ColdFront, Lake Effect Horizon dashboard, and Globus are all back online.


8/8/23 3:30pm STATUS: Tasks planned for day 1 of the downtime have been completed and we're on track for the rest of the maintenance schedule.  If we complete our tasks early, we will bring the systems back online for users.



THIS IS A CENTERWIDE DOWNTIME


Date of downtime: August 8-9, 2023


Approximate time of outage: 7am-5pm


Resources affected by downtime:

ALL CCR RESOURCES!!!

  • UB-HPC cluster (all partitions)
  • Faculty cluster (all partitions)
  • Portals: WebMO, OnDemand, ColdFront, Identity Management Portal, Lake Effect dashboard
  • Anything using Vast storage (/user, /projects, /util, /vscratch mounts)


Virtual machines running in the Lake Effect research cloud will not be affected unless they mount the Vast storage.  However, the OpenStack Horizon dashboard may be unavailable.


What will be done:  

  • Reboot all cluster nodes
  • Update front-end login nodes (vortex1/2) and OnDemand
  • Update infrastructure services
  • Migrate Vast storage to new core switch
  • Migrate Slurm controllers to new networking


Jobs will be held in queue during the maintenance downtime and will run after the updates are complete.  
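
As a rough sketch, you can confirm that your jobs are still queued by running Slurm's squeue command from a login node once the systems are back.  The Python snippet below assumes squeue is on your PATH.  The helper names (parse_squeue, pending_jobs, my_queue) are hypothetical and are not CCR tools, and the reason text Slurm reports for held jobs depends on how the site has configured the maintenance.

```python
import subprocess


def parse_squeue(text):
    """Parse lines of the form 'jobid|state|reason' into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two '|' only, so the reason text stays intact.
        job_id, state, reason = line.split("|", 2)
        jobs.append({"id": job_id, "state": state, "reason": reason})
    return jobs


def pending_jobs(jobs):
    """Return only the jobs that are still waiting in the queue."""
    return [job for job in jobs if job["state"] == "PENDING"]


def my_queue():
    """List the current user's jobs (assumes squeue is available on PATH)."""
    result = subprocess.run(
        # %i = job ID, %T = job state, %r = reason the job is in that state
        ["squeue", "--me", "--noheader", "--format=%i|%T|%r"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)
```

Calling pending_jobs(my_queue()) on a login node would list the jobs that have not started yet, together with the reason Slurm gives for each.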


If you have any questions or concerns, please e-mail ccr-help_at_buffalo.edu.