Email/login failures due to ldap server overload (Apr 25, 2013)

From ECE Information Technology Services
Revision as of 16:40, 26 April 2013 by Derekp (talk | contribs) (automount problems)

Around 3:30pm yesterday (April 24, 2013), the load on our LDAP server rose because of a large number of requests. This slowed LDAP responses to our mail and file servers. Today, the slowdown caused some logins, webmail and email access, and automounting of home directories on Linux machines to fail.

We have deployed an additional LDAP server to remedy the problem. As of 3:10pm the new server is up, and it appears to have reduced the overall load and resolved the problem. If you experience similar problems again (after 4pm), please let us know via help@ece.ubc.ca.

Thank you.

Update April 26 2013

Home directories (and therefore IMAP mail access) were intermittently inaccessible from April 25 17:00 until April 26 16:00, due to sysadmin error.

While rushing to deploy an additional LDAP server, a configuration file change that added some indexes to the LDAP database was accidentally pushed to the LDAP servers. However, the existing LDAP server was never given the command to rebuild its indexes. As a result, client machines searching for automount entries would erroneously get empty results, because the LDAP server was instructed by its configuration file to rely on an index that had not actually been built.
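
The failure mode described above can be illustrated with a small toy model. This is only a sketch and not our actual LDAP server software or configuration: the class ToyDirectoryNode, its methods, and the attribute name automountKey are hypothetical names chosen for illustration. It shows how a server that trusts a configured but unbuilt index returns empty results even though the data is present.

```python
class ToyDirectoryNode:
    """Toy model of a directory server that trusts its configured indexes.

    Hypothetical illustration only; not a real LDAP server API.
    """

    def __init__(self, entries):
        self.entries = entries            # list of dicts, one per entry
        self.configured_indexes = set()   # attributes the config says are indexed
        self.built_indexes = {}           # attribute -> {value: [entries]}

    def configure_index(self, attribute):
        # Models pushing the configuration change. Note that this does
        # NOT populate the index; it only tells the server to rely on it.
        self.configured_indexes.add(attribute)
        self.built_indexes.setdefault(attribute, {})

    def rebuild_index(self, attribute):
        # Models the rebuild step that was skipped on the original node.
        index = {}
        for entry in self.entries:
            if attribute in entry:
                index.setdefault(entry[attribute], []).append(entry)
        self.built_indexes[attribute] = index

    def search(self, attribute, value):
        if attribute in self.configured_indexes:
            # The server trusts the index, even if it was never built.
            return list(self.built_indexes[attribute].get(value, []))
        # No index configured: fall back to scanning every entry.
        return [e for e in self.entries if e.get(attribute) == value]


node = ToyDirectoryNode([{"automountKey": "alice", "location": "/home/alice"}])

before = node.search("automountKey", "alice")   # full scan finds the entry
node.configure_index("automountKey")            # config pushed, index not built
during = node.search("automountKey", "alice")   # empty: index is trusted but empty
node.rebuild_index("automountKey")              # the missing step
after = node.search("automountKey", "alice")    # entry is found again

print(len(before), len(during), len(after))     # prints: 1 0 1
```

In this model the data itself is never lost; only the lookup path changes, which is consistent with the problem disappearing once the index is rebuilt.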

Of the two machines in the pool of LDAP servers, only the original node was affected by this problem. The newly deployed machine was built using the new configuration file, and was therefore consistent. Since our load balancer randomly assigns incoming connections to the two nodes, LDAP queries for automount entries had approximately a 50-50 chance of erroneously returning empty results.
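
The 50-50 estimate can be checked with a short simulation. This is a rough sketch that assumes each connection is assigned to a node uniformly at random and independently of earlier connections; the function name empty_result_rate and its parameters are hypothetical, and the real load balancer may behave differently in detail.

```python
import random


def empty_result_rate(num_queries, healthy_nodes=1, broken_nodes=1, seed=0):
    """Estimate the fraction of queries that land on a broken node.

    Assumes uniform random assignment of each query to one node in the pool.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    pool = ["healthy"] * healthy_nodes + ["broken"] * broken_nodes
    failures = sum(
        1 for _ in range(num_queries) if rng.choice(pool) == "broken"
    )
    return failures / num_queries


# One healthy node and one broken node, as in this incident.
rate = empty_result_rate(100_000)
print(f"fraction of queries returning empty results: {rate:.3f}")
```

Under these assumptions the expected rate is broken_nodes / (healthy_nodes + broken_nodes), which is 1/2 for the two-node pool. This matches the intermittent nature of the failures: the same query could succeed on one attempt and fail on the next.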

We apologize for the inconvenience, and thank the users who reported the problem.