[Engage] Messages not being sent

Incident Report for Voyado

Postmortem

Summary

On the morning of March 11, 2025, an issue occurred in the Engage platform that delayed message send-outs for a subset of our customers. The incident was triggered by an unexpected event in our in-memory database setup, which temporarily disrupted the platform’s ability to process and send messages. The issue was resolved the same morning, and all affected send-outs were successfully delivered, either automatically or through manual resending.

Customer Impact

Approximately 54 customers experienced a temporary halt in their message send-outs for about one hour. Most messages were eventually sent out automatically once the issue resolved itself, but a smaller portion required manual resending by our team. No messages were lost.

Root Cause

The issue was caused by an unexpected failover in our in-memory database, which altered the primary-secondary configuration and triggered faulty callbacks in our system. The resulting state prevented messages from being processed as expected, which led to the delay.

Remediation & Mitigation

  • Our team identified the issue quickly through our monitoring and began troubleshooting.
  • A hotfix was implemented the same morning to fix the faulty callbacks that prevented messages from being executed and to reduce the risk of the same behavior recurring.
  • Messages stuck in the queue were either automatically processed or manually resent by our support team.

Next Steps

We recognize that similar issues related to our in-memory database have occurred in the past. Based on recent events, and as part of our continuous improvement and reliability work, we are reviewing our in-memory database setup to improve its resilience and behavior during failovers.

We appreciate your patience and understanding, and we remain committed to providing a stable and reliable platform experience.

Posted Apr 07, 2025 - 08:50 CEST

Resolved

This incident has been resolved.
Posted Mar 11, 2025 - 12:57 CET

Identified

All the messages that got stuck have been resent. We are rolling out a fix to remediate the cause.
Posted Mar 11, 2025 - 10:46 CET

Update

We have identified that the issue is only affecting a subset of customers.
We are continuing to investigate the issue and are preparing to deploy a fix.
Posted Mar 11, 2025 - 09:57 CET

Investigating

We are currently experiencing an issue where send-outs are not being sent. We are investigating this and working on a solution.
Posted Mar 11, 2025 - 09:24 CET
This incident affected: Engage (Messaging).