Downtime Schedule for 2016-17 Academic Year

To keep the center updated and secure, we are implementing a new downtime schedule for the 2016-2017 academic year.  Although this means more center-wide downtimes, it ensures that we don't attempt to update too many things at once and cause cascading problems and extended outages.  We have tried to space out the center-wide downtimes, which affect the majority of our equipment, and to limit the downtimes that affect the most services to one major product or hardware update each.  Operating system updates will happen during those downtimes if time allows.  This schedule will also allow us to update SLURM (the job scheduler) and other products on a more regular basis.

We've also moved some of the downtime dates to accommodate the academic calendar.  For example, we will be able to fit in two major updates, separated by 4 weeks, in the period after classes end in December and before they start in January.  Though this means the faculty clusters will be affected by several downtimes in a row, we believe these fall during periods of lower usage.

We hope that knowing this schedule in advance will help you plan your research computing usage.  If this schedule severely impacts your work, please submit a ticket to CCR help to discuss it.


Please note: This schedule can and likely will change to respond to vendor software updates or unplanned problems.  We will update this page as soon as any decisions are made that affect the schedule.


Types of downtimes:

Center-wide – all services at CCR affected

Cluster-wide – all clusters affected but no other services

Cluster-specific – some, but not all, clusters affected but no other services

Cloud – all cloud services affected but no other CCR services


Services to be Updated:

FreeIPA – LDAP & authentication servers

Foreman – installation servers; clusters affected

DNS – 1-2 times per year; done during center-wide downtimes only

Isilon Storage – 1-2 times per year; center-wide

Arista Networking – once per year; center-wide

GPFS Storage – once per year; center-wide

SLURM – all clusters affected; aim to update 4 times/year; dependent on software release schedule

Cloud – affects the CCR research cloud only; no other services affected


Downtime Dates & Tentative Update Plan

8/9-10 (completed) – Center-wide: SLURM updates; Isilon update; UB-HPC, Industry & PI cluster OS updates & reboots

8/29 – classes start

9/6 – Cluster-specific: UB-HPC & Industry reboots; FreeIPA server updates

10/4 – Cluster-specific: UB-HPC & Industry reboots; Foreman updates; mount Isilon on rush, presto & viz nodes using NFSv4 (workaround for locking problems discovered after the Isilon updates in August)

10/25 (moved from 11/1; emergency patch required) – Cluster-wide: GPFS updates; UB-HPC, Industry & PI cluster OS updates & reboots

11/7 – Cloud: Lake Effect Cloud downtime; all cloud services affected but no other CCR services

12/20 (moved from 12/6) – Cluster-wide: GPFS updates, SLURM update to latest 16.05.7; UB-HPC, Industry & PI cluster reboots

12/20-1/30 – winter recess

1/17 (moved from 1/3) – Center-wide: Isilon storage updates, SLURM update, FreeIPA server updates, DNS updates, operating system updates on core infrastructure servers, nodes, and front end servers

1/30 – classes start

2/7 – Cluster-specific: UB-HPC & Industry reboots

3/21 (moved from 3/7) – Cluster-wide: Core infrastructure OS & software updates, UB-HPC, Industry & PI cluster OS updates & reboots

3/20-25 – spring break

4/4 – Cluster-specific: UB-HPC & Industry reboots; FreeIPA server updates CANCELLED

5/2 – Cluster-specific: UB-HPC & Industry reboots; FreeIPA server updates (tentative); Isilon server updates

5/12 – last day of classes

5/20 – last day of exams

6/6 – Cluster-wide: SLURM update to 17.05.x; UB-HPC, Industry & PI cluster OS updates & reboots

See the downtime schedule for 2017-18 for more dates.
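
As a rough planning aid, the dates above can be checked against a planned job window.  The sketch below is illustrative only and is not a CCR tool.  The function name overlapping_downtimes is hypothetical, the years are inferred from the 2016-17 academic year, and each downtime is assumed to occupy the whole calendar day(s) listed, since the schedule does not give start or end times.  The 4/4 entry, which is marked CANCELLED, is left out.

```python
from datetime import date, timedelta

# Dates transcribed from the schedule above. Years are inferred from the
# 2016-17 academic year; each downtime is ASSUMED to last the full listed day(s).
DOWNTIMES = [
    (date(2016, 8, 9), date(2016, 8, 10), "Center-wide"),
    (date(2016, 9, 6), date(2016, 9, 6), "Cluster-specific"),
    (date(2016, 10, 4), date(2016, 10, 4), "Cluster-specific"),
    (date(2016, 10, 25), date(2016, 10, 25), "Cluster-wide"),
    (date(2016, 11, 7), date(2016, 11, 7), "Cloud"),
    (date(2016, 12, 20), date(2016, 12, 20), "Cluster-wide"),
    (date(2017, 1, 17), date(2017, 1, 17), "Center-wide"),
    (date(2017, 2, 7), date(2017, 2, 7), "Cluster-specific"),
    (date(2017, 3, 21), date(2017, 3, 21), "Cluster-wide"),
    (date(2017, 5, 2), date(2017, 5, 2), "Cluster-specific"),
    (date(2017, 6, 6), date(2017, 6, 6), "Cluster-wide"),
]


def overlapping_downtimes(job_start, job_days, downtimes=DOWNTIMES):
    """Return every downtime that falls inside a job window.

    The window covers job_days calendar days, starting on job_start.
    """
    job_end = job_start + timedelta(days=job_days - 1)
    # Two inclusive date ranges overlap when each starts before the other ends.
    return [d for d in downtimes if d[0] <= job_end and d[1] >= job_start]


# Example: a 3-day job starting 1/15 runs into the 1/17 center-wide downtime.
for start, end, kind in overlapping_downtimes(date(2017, 1, 15), 3):
    print(f"{kind} downtime {start} to {end}")
```

Because the schedule can change, any list like this would need to be kept in step with the dates on this page.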