Outbound and Inbound Message Service Disruption

Incident Report for Sedna Systems

Postmortem

SEDNA Incident - 15 May 2018

Summary

On 15 May 2018 at 15:15 PDT, SEDNA experienced an outage on our North American cluster that prevented inbound and outbound messages from being delivered. No emails were lost. All messages sent to your mailboxes during the outage have been queued and will be delivered. You will see all new messages as they arrive.

We understand the seriousness of our responsibility to your communications. We are sorry for the disruption to your business caused by today’s outage. We review these types of incidents in depth, reflect on our systems, and adapt them in order to constantly improve our reliability.

Why this happened

An application code deployment included a database change. The change had previously run successfully in a test environment, but that environment does not hold nearly as much data as production, so it could not reproduce this kind of failure. In production, the database change did not complete in a timely manner, which caused a backlog of other database changes waiting their turn. Tasks such as sending and receiving email were stuck waiting for those database changes to complete.
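
The queueing effect described above can be illustrated with a minimal sketch. It is a simplified model, not SEDNA's actual database or deployment tooling: the `Job` type, the `completion_times` function, the job names, and every duration are hypothetical and chosen only for illustration. It assumes operations are served strictly one at a time, in arrival order.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    duration_ms: int  # how long the job occupies the shared database resource


def completion_times(jobs):
    """Return {job name: ms elapsed until that job finishes} for a strict FIFO queue."""
    clock_ms = 0
    finished = {}
    for job in jobs:
        # Each job can only start once every job ahead of it has finished.
        clock_ms += job.duration_ms
        finished[job.name] = clock_ms
    return finished


# Hypothetical durations: the same change is quick on a small test dataset
# and slow on a much larger production dataset.
test_env = [Job("schema_change", 2_000), Job("send_email", 100), Job("receive_email", 100)]
production = [Job("schema_change", 1_800_000), Job("send_email", 100), Job("receive_email", 100)]

if __name__ == "__main__":
    print("test environment:", completion_times(test_env))
    print("production:      ", completion_times(production))
```

In this model the email tasks themselves are just as fast in both environments. Only the time they spend waiting behind the slow change differs, which is why a small test dataset can hide the problem.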

Actions and opportunities for improvement

Changes involving the database are particularly difficult to recover from and require exceptional care. The biggest opportunity for improvement in this case is to perform a trial run of database changes on a database as similar to production as possible. This has now been established as policy. We also have action items to speed up monitoring and the alerting of our technical team, in support of our response procedures.

Posted May 16, 2018 - 23:38 BST

Resolved

SEDNA experienced an outage this afternoon at 3:15 PM PDT. Full service on SEDNA was restored as of 3:53 PM PDT.

No emails have been lost. All messages sent to your mailboxes during the outage have been queued and will be delivered. You will see all new messages as they arrive.

We understand the seriousness of our responsibility to your communications. We apologize for the disruption to your business caused by today’s outage. We review these types of incidents in depth, reflect on our systems, and adapt them in order to constantly improve our reliability.
Posted May 16, 2018 - 23:15 BST
This incident affected: SEDNA - www.sednanetwork.com.