Database Failure (Apr 2012)

From ECE Information Technology Services
Revision as of 14:58, 10 April 2012 by Derekp (talk | contribs) (Updated announcement: MySQL failover completed)

Early in the morning on 2012-04-10, the machine hosting ECE's PostgreSQL and MySQL databases suffered a hardware failure. As a result, many of our services became unavailable.

We have replicated the databases to a spare machine. We believe that no data was lost as a result of this incident, but the failover procedure took some time.

The failover process for the PostgreSQL server was completed at 08:45 PDT. The services that depend on PostgreSQL (namely Request Tracker, GADS, Meeting Room Booking System, and some MediaWiki-based websites) should be fully operational now.

The failover process for MySQL took longer because the most recent full backup was old, so we had to replay many incremental log files to bring the restored copy up to date. MySQL failover completed at 15:55 PDT.