Difference between revisions of "Database Failure (Apr 2012)"

From ECE Information Technology Services
(Announcement)
 
(Updated announcement: MySQL failover completed)
 
Line 1: Line 1:
 
Early in the morning on 2012-04-10, the machine hosting ECE's PostgreSQL and MySQL databases suffered a hardware failure.  As a result, many of our services became unavailable.
 
  
We are working to replicate the databases to a spare machine.  No data loss is expected, but the failover procedure will take some time.
+
We have replicated the databases to a spare machine.  We believe that there was no data loss as a result of this incident, but the failover procedure took some time.
  
 
The failover process for the PostgreSQL server was completed at 08:45 PDT.  The services that depend on PostgreSQL (namely Request Tracker, GADS, Meeting Room Booking System, and some MediaWiki-based websites) should be fully operational now.
 
  
The failover process for MySQL is still in progress.  The websites that rely on MySQL (the ECE department's website in particular) remain down.
+
The failover process for MySQL took longer because the last full backup was made a long time ago and we had to replay many incremental log files.  MySQL failover completed at 15:55 PDT.

Latest revision as of 15:58, 10 April 2012

Early in the morning on 2012-04-10, the machine hosting ECE's PostgreSQL and MySQL databases suffered a hardware failure. As a result, many of our services became unavailable.

We have replicated the databases to a spare machine. We believe that there was no data loss as a result of this incident, but the failover procedure took some time.

The failover process for the PostgreSQL server was completed at 08:45 PDT. The services that depend on PostgreSQL (namely Request Tracker, GADS, Meeting Room Booking System, and some MediaWiki-based websites) should be fully operational now.

The failover process for MySQL took longer because the last full backup was made a long time ago and we had to replay many incremental log files. MySQL failover completed at 15:55 PDT.