RESOLVED: 4/30/22: On-prem cloud issues affecting OnDemand & other services
Dori Sajdak
started a topic
over 2 years ago
This issue is being marked as resolved because all OpenStack services are back online. If any of your instances still have problems, please submit a ticket to ccr-help and include the instance ID(s) and the errors you're getting.
UPDATE 5/1 10AM - The OpenStack services have been restarted and the cloud is in a better state. Please report any other issues to CCR Help.
UPDATE 5/1 9:30AM - Additional issues were seen overnight. We're investigating now.
UPDATE: 10PM - A networking issue that made cloud services unavailable for a few hours has been resolved. It may take a few more hours for the OpenStack recovery processes to complete.
UPDATE: 6:15pm - This appears to be a network switch issue. A CCR system administrator is heading on site now to investigate.
We are aware of an issue with our on-premises cloud, which supports services such as OnDemand, ColdFront, Identity Management (accounts), Metrics, and other related services. This also affects the Industry cluster job scheduler: jobs that have already started will continue to run, but users will not be able to start new jobs or use Slurm tools to check job status. We are troubleshooting and hope to resolve this problem quickly. We apologize for the inconvenience.