Engage: performance issues affecting the tracking pipeline (Abandoned Cart/Products of Interest) and reports

Incident Report for Voyado

Postmortem

Summary

Between June 4 at 14:39 and June 5 at 14:10, one of our backend services experienced performance issues that led to delays and partial unavailability. The issue has been resolved, and systems are now operating normally.

Customer Impact

During the incident, the following features were affected: ROD, Abandoned Cart, Register Status, Automation Overview, and Products of Interest. Customers may have experienced slow response times or temporary unavailability when using these features. No data loss occurred, and all services were restored by June 5 at 14:10.

Root Cause and Mitigation

The issue stemmed from an unexpected delay in the cleanup of background processes connected to secure storage systems, following a minor system update. The connections themselves succeeded, but their delayed termination degraded performance. We mitigated the issue by safely bypassing the affected cleanup process, which restored service stability. We have also raised a support case with Microsoft Azure to investigate further.
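
For readers who want a concrete picture of this failure mode, the following is a minimal illustrative sketch, not the actual fix deployed in Engage. It assumes a generic teardown step that has become slow and shows one common way to keep it from blocking the caller: run the cleanup in a background thread and wait only up to a deadline. The names close_with_deadline, slow_cleanup and fast_cleanup are hypothetical.

```python
import threading
import time


def close_with_deadline(cleanup, deadline_seconds):
    """Run cleanup() in a daemon thread and wait at most deadline_seconds.

    Returns True if cleanup finished within the deadline, False if the
    caller moved on while cleanup was still running in the background.
    """
    worker = threading.Thread(target=cleanup, daemon=True)
    worker.start()
    worker.join(timeout=deadline_seconds)
    return not worker.is_alive()


def slow_cleanup():
    # Hypothetical stand-in for a teardown step that has become slow.
    time.sleep(0.5)


def fast_cleanup():
    # Hypothetical stand-in for a healthy teardown step.
    pass


if __name__ == "__main__":
    print("fast cleanup finished in time:", close_with_deadline(fast_cleanup, 0.2))
    print("slow cleanup finished in time:", close_with_deadline(slow_cleanup, 0.05))
```

In a sketch like this, a False return value is a useful signal to log or alert on, since it indicates that teardown is taking longer than expected even though the connection itself worked.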

Next Steps

  • Improve automated testing to detect performance issues before deployment.
  • Review and update the logic used for secure storage connections in production environments.
  • Monitor Azure infrastructure updates for changes that may affect our services.
  • Continue to enhance logging and observability for faster diagnostics.
Posted Jun 10, 2025 - 13:43 CEST

Resolved

After deploying the fix and monitoring the system, we can confirm that the issue is resolved.
Posted Jun 05, 2025 - 16:07 CEST

Update

We are continuing to monitor for any further issues.
Posted Jun 05, 2025 - 14:35 CEST

Monitoring

The fix has now been implemented and the system is operating normally. We are continuing to monitor the situation closely.
Posted Jun 05, 2025 - 14:28 CEST

Identified

We have identified a solution and we are now working to deploy this change to production as soon as possible.
Posted Jun 05, 2025 - 13:22 CEST

Update

We are continuing to explore alternative solutions, though the root cause has not yet been fully identified. We are also engaging with our cloud provider to assist in the ongoing investigation. Further updates will be provided as new information becomes available.

Our monitoring shows that Abandoned Browse is functioning as expected, whereas Products of Interest is currently impacted.
Posted Jun 05, 2025 - 10:08 CEST

Update

We are actively working on the issue. We do not yet have a complete understanding of the root cause. Some abandoned cart requests are currently being processed successfully. We appreciate your patience and will continue to provide updates.
Posted Jun 05, 2025 - 09:31 CEST

Update

The root cause is still unknown. We’re continuing our investigation and will share updates as soon as we have more information. Apologies for the inconvenience.
Posted Jun 05, 2025 - 07:16 CEST

Update

We are actively investigating the issue but have not yet identified the root cause.
Posted Jun 04, 2025 - 22:16 CEST

Update

The issue remains under investigation. We will provide updates as they become available.
Posted Jun 04, 2025 - 20:10 CEST

Update

The issue is still under active investigation. We appreciate your patience while we work to resolve it.
Posted Jun 04, 2025 - 17:55 CEST

Investigating

We are currently investigating an issue that appears to be related to scaling. It may delay the triggering of abandoned cart/browse automations and is also affecting the loading of dashboards and reports in the UI.
Posted Jun 04, 2025 - 16:19 CEST
This incident affected: Engage (Automations, Other).