Inability to access Sedna - customers on the Dublin 3 Cluster
Incident Report for Sedna Systems
Postmortem

Post Incident Report

Dublin 3 Cluster Application Incident

Date of Issue: 23 April 2024

Incident Reference: INC-20240423-1418

01 Summary

On 23 April 2024, the Sedna platform experienced a partial loss of service. A small number of customers were affected by an application outage and were unable to send or receive emails for approximately 15 minutes. This is not the level of quality and reliability we strive to offer you. We have conducted an internal investigation and are taking immediate steps to improve the platform’s availability.

02 Detailed description

At approximately 14:08 UTC on 23 April 2024, one of Sedna’s primary applications (Node) restarted. The restart was an automated action triggered by the system in response to an unhealthy state, and it resulted in approximately 15 minutes of downtime while the restart completed.

The restart was caused by a memory issue in the service, combined with an extraordinarily high workload on the system. The workload caused requests to back up, which eventually exhausted the system memory and triggered the restart. The restart is an automated action that allows the system to respond to such an event and recover quickly; however, systems can be unavailable while the restart is in progress.
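To illustrate the failure mode described above, the following is a minimal sketch, not a model of Sedna's actual system. It assumes a simple queue in which each waiting request holds a fixed amount of memory. The function name and all rates and sizes are hypothetical values chosen only for illustration.

```python
from typing import Optional


def ticks_until_restart(
    arrival_rate: int,
    service_rate: int,
    bytes_per_request: int,
    memory_limit_bytes: int,
    max_ticks: int = 10_000,
) -> Optional[int]:
    """Return the tick at which queued requests exceed the memory limit.

    Returns None if the limit is never exceeded within max_ticks.
    All parameters are hypothetical and exist only for illustration.
    """
    queued = 0
    for tick in range(1, max_ticks + 1):
        # Requests that arrive faster than they are served accumulate in the queue.
        queued = max(0, queued + arrival_rate - service_rate)
        # Each queued request is assumed to hold a fixed amount of memory.
        if queued * bytes_per_request > memory_limit_bytes:
            return tick
    return None


# Normal load: the service keeps up, so the queue never grows.
print(ticks_until_restart(80, 100, 1_000_000, 512_000_000))   # None

# Spike: arrivals exceed capacity, so the backlog eventually exhausts memory.
print(ticks_until_restart(120, 100, 1_000_000, 512_000_000))  # 26
```

In this simplified model, raising the memory limit delays exhaustion, while raising the service rate above the arrival rate (for example, by adding instances) prevents the backlog from forming at all.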

The first customer case surfacing symptoms of an application outage was raised with Customer Support at 14:13 UTC, at which point the Sedna Support team triggered a major incident with the engineering team to urgently investigate the issue. The Sedna team deployed a Status Page notification at 14:21 UTC notifying all Status Page followers of the incident under investigation. 

The service was fully operational at 14:23 UTC and all customers who reported an incident were informed of the incident closure on the same day.

03 Remediation and Prevention

An incident of this nature receives Sedna’s highest level of scrutiny to ensure we can provide our customers with full confidence in the system. Following the incident, the team conducted a retrospective to review the remediation taken and to define next steps to prevent similar issues from occurring in the future. The remediation and prevention details are as follows:

  • Remediation:

    • As of this report, the issue itself has been fully resolved.
    • Engineering has conducted a full review of the related code, logs, and telemetry data to reduce the likelihood of follow-on issues.

  • Prevention: We have put the following changes into effect to reduce the likelihood of the issue recurring:

    • Engineering has provisioned additional instances of the impacted application to help handle similar unexpected spikes in the future.
    • Engineering has increased the memory size of the Node application to provide additional headroom for similar unexpected spikes in the future.

04 What you can expect from SEDNA

We understand the critical nature of the services SEDNA provides your business. We will continue to communicate with customers to answer any questions and ensure we do our best to provide a seamless customer experience. We apologize for any issues these events may have caused. 

Please reach out directly to SEDNA Support (support@sedna.com) with any questions.

Posted Apr 29, 2024 - 11:49 BST

Resolved
This incident has been marked as resolved. For more details on this incident, see the linked Postmortem.
Posted Apr 29, 2024 - 11:45 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 24, 2024 - 10:40 BST
Monitoring
An issue has been detected in the database layer for the customer tenants affected in Dublin 3. The team has resolved the immediate issue. We are continuing to investigate the root cause. Customers should now have access to Sedna.
Posted Apr 23, 2024 - 15:33 BST
Investigating
We are aware of an issue affecting a number of customers on a specific infrastructure cluster (Dublin 3). Affected end users are unable to log into Sedna. We are investigating this as an emergency.
Posted Apr 23, 2024 - 15:21 BST
This incident affected: SEDNA - www.sednanetwork.com.