On August 1st, between approximately 14:25 and 17:30 CEST, Voyado Engage experienced an issue where email send-outs triggered by Automation workflows were delayed. A combination of several factors caused processing to fail in the internal message handler, which halted workflow execution and delayed email send-outs.
Tenants with Automation workflows triggering email send-outs during the incident window were affected. While no messages were lost, all send-outs were delayed until the issue was resolved and normal operations resumed.
Our investigation leads us to conclude that the issue was caused by a combination of several factors, in which the services in charge of processing internal messaging became overloaded and unable to process their commands. This caused a full stop in the execution of Automations. A record-high number of messages were in queue, putting immediate stress on the system and exhausting its capacity. These services had not been fully restarted for a longer period than usual, which may have contributed to degraded performance.
The issue was resolved by incrementally restarting the affected services. As they were brought back online, message processing resumed and queues cleared automatically. No manual reprocessing was required, and all delayed send-outs were successfully delivered.
We have reviewed and improved the monitoring of queues for internal messaging, added monitoring that gives us better insight into infrastructure health during similar events in the future, and are continuously working on improving performance to ensure stability and quality.
We apologize for the inconvenience this may have caused and appreciate your patience as we worked to restore normal operations.