Kaiser Server Room Network Failure (Mar 2013)

From ECE Information Technology Services

Latest revision as of 01:48, 23 March 2013

At 2 am on 21 March 2013, a network switch in the Kaiser server room spontaneously failed. Service was restored at 4:15 am by power-cycling the switch.

As a result, the following services were unavailable during the outage:

  • Authentication to the UBC_ECE domain
  • The ability to change account passwords
  • ssh-linux5, ssh-linux6, ssh-linux7
  • Electronic Software Distribution
  • Graduate Application Data Store
  • Several software license servers
  • CMC CAD tools
  • Several research groups' servers hosted in the Kaiser server room

Update 22 March 2013

The same network switch failed again at 19:33 on 22 March. We have confirmed that the failures were due to failing hardware and have replaced the switch with another unit. Service was restored just before midnight.