BEAR Service Outage/Disruption for Essential Data Centre Maintenance – April 12th – April 16th 2018

Summary [April 12th – April 16th 2018]

IT Services, together with Estates engineers, will carry out essential maintenance on the power supplies to the University’s primary Data Centre during the second weekend in April. This entails a total power-down of the centre. They will also carry out the annual fire systems testing postponed from December, which likewise requires most of the equipment to be shut down. Inevitably, this means disruption to BEAR services for researchers.

At the same time, the Research Computing Team will take this opportunity to carry out a set of updates and fixes to the BEAR infrastructure. Because this work begins on Thursday 12th April, it extends the outage by roughly 24 hours. While we know any downtime is inconvenient, we can assure you that this work is vital to the long-term health and availability of our services.

Please note particularly that this work is in addition to the migration to the new research data centre set for May (more details to follow).

Details of Services Affected [April 12th – April 16th 2018]

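Some of our services are already resilient, so the Research Data Store (RDS), CaStLeS (replicated) Storage and BEAR DataShare will continue to operate throughout this period, running on the replicated copy in our secondary data centre. The only expected interruption is a brief loss of access to RDS and DataShare (no more than 15 minutes) while the switches in the secondary data centre are rebooted.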

There will be no BlueBEAR HPC or BEARCloud/CaStLeS (VM) services available from 3pm on Thursday 12th April. We expect normal service to be resumed by noon on Monday, 16th April.

While the systems are out of action, we will be:

  • upgrading the code on the Research Data Network (RDN) – there will be some periods of no connectivity for RDN-attached devices (some sequencers, mass spectrometers, electron microscopes etc.). Note that this is not the normal campus network and affects a very small number of specialist installations.
  • rebooting the switches located in the secondary data centre in order to pick up the changes to the RDN. This will cause a brief loss of access to RDS and DataShare, expected to last no more than 15 minutes.
  • making some small changes that are essential preparation for the migration to the new £5.4M research data centre.
  • completing the move of BEAR user home directories to new storage. As a result, some batch jobs submitted before the maintenance window may fail. All data currently in your home directory will remain available for some time after the changeover, but the path to $HOME will change. You will need to migrate any data you wish to keep from your old home directory to either the new location or an RDS project. (We sent notifications about this last year but the implementation has been delayed.) We will send more detailed information in March about the work and what you need to do to prepare for the changes we will make during the outage.
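Since the path to $HOME will change, job scripts that spell out the old home directory literally (rather than using $HOME) are one plausible place for problems to appear. The following is a minimal, unofficial sketch of how you might look for such lines in your own scripts. It is not a BEAR tool: the function names, the `*.sh` file pattern and the example path `/old/home/alice` are all hypothetical, and the real old and new paths will be given in the detailed information to follow.

```python
from pathlib import Path


def find_hardcoded_paths(script_text: str, old_home: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that mention the old home path literally.

    This is a plain substring match, so it can also flag longer paths that
    merely start with the same characters; treat the results as hints.
    """
    prefix = old_home.rstrip("/")
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if prefix in line:
            hits.append((number, line.strip()))
    return hits


def scan_job_scripts(directory: str, old_home: str, pattern: str = "*.sh") -> dict[str, list[tuple[int, str]]]:
    """Scan files matching `pattern` under `directory` for the old home path."""
    results = {}
    for path in sorted(Path(directory).rglob(pattern)):
        if not path.is_file():
            continue
        hits = find_hardcoded_paths(path.read_text(errors="replace"), old_home)
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__":
    # Hypothetical example: a job script that hard-codes an old home directory.
    example = "#!/bin/bash\ncd /old/home/alice/run1\npython $HOME/code/main.py\n"
    for number, line in find_hardcoded_paths(example, "/old/home/alice"):
        print(f"line {number}: {line}")
```

Lines that use $HOME or ~ are not flagged, because the shell resolves those when the job runs; only literal copies of the old path are reported.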