Multi-service Interruption - ECE Virtualization Platform – Memory Failure (Apr 25th 2014)

From ECE Information Technology Services
Revision as of 13:37, 28 April 2014 by Mberdan (talk | contribs)

This issue was resolved at ~4pm on April 25, 2014.

On the morning of April 25, 2014, ECE’s server virtualization platform experienced a hardware memory failure that destabilized several dependent virtual servers. As a result, faculty, researchers, students and staff experienced the following issues:

  1. Individuals whose MATLAB installations depend on ECE’s departmental licensing server could not use MATLAB.
  2. ECE’s e-mail system experienced numerous LDAP lookup failures, despite in-place LDAP load-balancing and redundancy. The mail system subsequently failed to deliver messages to @ece.ubc.ca mail accounts. Messages remained queued but undelivered for 3+ hours.
  3. The Electric Power & Energy Systems Lab software license server intermittently failed to issue licenses to requesting hosts.

The virtualization platform was restored to working order at ~4pm on April 25, 2014, and all dependent virtual machines and their hosted services were recovered. This resolved the issues above, with queued e-mail messages delivered up to 3 hours late.

Thank you for your patience while we addressed this issue. If you have any questions or comments about this outage, don’t hesitate to contact the IT Support Team at help@ece.ubc.ca or in MacLeod 105.