SEDNA users unable to log in to SEDNA

Incident Report for Sedna Systems

Postmortem

SEDNA Incident - Post Mortem

05 March 2019

Summary

On March 5th at approximately 1510 PST (2310 UTC), customers on our European cluster were unable to access SEDNA. Our Development Team was alerted that our event API had received a larger-than-expected volume of calls, which resulted in SEDNA becoming inaccessible. At 1557 PST (2357 UTC) the task queue was back to zero and customers regained full access to SEDNA. We sincerely apologize for the disruption to customers on our European cluster and have taken immediate action to prevent this specific issue from recurring.

Why this happened

Our event API was being heavily used, which overloaded the database until it eventually stopped responding to requests. This was caused by a combination of higher-than-normal mail volumes (due to a historical import) and traffic from a third-party integration. We are now in the process of improving the performance of the event API and the third-party integration, which we expect to prevent any recurrence.

Actions and opportunities for improvement

1. Improve the efficiency of our event API, which will result in better handling of higher mail ingestion volumes - IN PROGRESS

2. Improve database monitoring to ensure we are aware of this issue before any customers are impacted - COMPLETE

Posted Mar 06, 2019 - 19:51 GMT

Resolved

SEDNA is now back up and running. A likely cause was an overloaded database. An incident post-mortem may shed more light on other proximate causes. A public report will follow our internal post-mortem.

Posted Mar 06, 2019 - 00:33 GMT

Update

Login is still working only sporadically for customers on our European cluster. We continue to investigate. In addition, within the last 20 minutes, our queue to send and receive mail backed up. Those mail delays have been resolved.

Posted Mar 06, 2019 - 00:10 GMT

Investigating

Logging in to SEDNA is sporadically unavailable and very slow for all users on our European cluster. We believe we have found the cause of the incident and are working to resolve it. We will post our next update in 20 minutes (at 00:10 UTC).

Posted Mar 05, 2019 - 23:50 GMT

This incident affected: SEDNA - Private Hosting.