[Engage] Tracking Pipeline Degradation

Incident Report for Voyado

Postmortem

Summary

Between June 23 and June 26, 2025, some customers experienced incorrect triggering of Abandoned Cart (AC) emails: messages were sent even though the recipients' carts were empty. The issue was caused by an error in the logic that determines the most recent cart activity within our automation pipeline. Although a fix was deployed quickly, full resolution was delayed by the time required to reprocess historical data through our streaming systems.

Customer Impact

·  Some customers received Abandoned Cart emails that did not reflect their actual shopping activity.

·  The Product of Interest (POI) automation experienced delays and was temporarily paused during recovery efforts.

·  Email reliability and accuracy were temporarily impacted for a subset of users.

Root Cause and Mitigation

The incident was caused by non-deterministic logic in the abandoned_and_poi_pipeline that selects the latest cart tracking event. This led to outdated or incorrect data being interpreted as current, which in turn triggered abandoned cart emails for empty carts.
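The failure mode described above can be illustrated with a small sketch. It assumes a hypothetical event schema (`customer_id`, `event_time`, a unique per-event `sequence` offset, and `item_count`), since the actual fields of the tracking events are not described in this report. Selecting the latest event by timestamp alone is non-deterministic when two events share a timestamp, because the winner then depends on arrival order. Comparing on a total order such as `(event_time, sequence)` makes the result independent of ordering. This is an illustration of the general technique, not Voyado's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable


@dataclass(frozen=True)
class CartEvent:
    # Hypothetical schema; the real tracking event fields are not documented here.
    customer_id: str
    event_time: datetime
    sequence: int     # assumed unique, monotonically increasing ingest offset
    item_count: int   # number of items in the cart after this event


def latest_cart_events(events: Iterable[CartEvent]) -> Dict[str, CartEvent]:
    """Return the latest event per customer, breaking timestamp ties by sequence."""
    latest: Dict[str, CartEvent] = {}
    for ev in events:
        current = latest.get(ev.customer_id)
        # Compare on a total order so the result does not depend on the
        # order in which events happen to arrive.
        if current is None or (ev.event_time, ev.sequence) > (
            current.event_time,
            current.sequence,
        ):
            latest[ev.customer_id] = ev
    return latest


def should_send_abandoned_cart(latest: CartEvent) -> bool:
    # Only a latest event describing a non-empty cart may trigger an AC email.
    return latest.item_count > 0


if __name__ == "__main__":
    t = datetime(2025, 6, 23, 10, 0, tzinfo=timezone.utc)
    events = [
        CartEvent("c1", t, sequence=1, item_count=3),  # item added
        CartEvent("c1", t, sequence=2, item_count=0),  # cart emptied, same timestamp
    ]
    # Both arrival orders select the "cart emptied" event, so no email is sent.
    for order in (events, list(reversed(events))):
        chosen = latest_cart_events(order)["c1"]
        print(chosen.item_count, should_send_abandoned_cart(chosen))
```

The tie-breaker only guarantees determinism if `sequence` is unique per customer. If two events could share both timestamp and offset, a further key (for example a hypothetical event ID) would be needed.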

A hotfix correcting this logic was deployed to production on June 25. However, because the streaming job had to be restarted from a historical point (June 17), the large volume of data delayed full recovery. The pipeline did not catch up to the most recent checkpoint within the expected time frame, requiring manual intervention with larger compute clusters. The Abandoned Cart flow was restored by the afternoon of June 26, followed by the POI flow shortly thereafter.
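Why a replay from June 17 can take so long to catch up is easiest to see with a simple throughput model. While the job replays its backlog, live events keep arriving, so the backlog only shrinks by the difference between processing and ingest rates. With backlog $B$, live ingest rate $\lambda$ and processing rate $\mu$, the catch-up time is $T = B / (\mu - \lambda)$, and the job never catches up if $\mu \le \lambda$. The sketch below uses purely illustrative numbers, not Voyado's actual volumes, and assumes throughput scales linearly with cluster size.

```python
import math


def catch_up_hours(backlog_events: float, ingest_per_hour: float,
                   process_per_hour: float) -> float:
    """Hours until a replaying stream job reaches the live head of the stream.

    The backlog shrinks only by (processing rate - live ingest rate).
    Returns math.inf if the job cannot catch up at all.
    """
    net_drain = process_per_hour - ingest_per_hour
    if net_drain <= 0:
        return math.inf
    return backlog_events / net_drain


if __name__ == "__main__":
    # Illustrative only: 8 days of backlog at 1M events/hour.
    backlog = 8 * 24 * 1_000_000
    print(catch_up_hours(backlog, 1_000_000, 1_500_000))  # → 384.0
    # Doubling the (assumed linearly scaling) cluster cuts this to 96 hours.
    print(catch_up_hours(backlog, 1_000_000, 3_000_000))  # → 96.0
```

Because catch-up time depends on the *net* drain rate, a modest increase in compute can shorten recovery disproportionately, which is consistent with the decision to move to larger clusters.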

Next Steps

To prevent recurrence and improve incident response, we are taking the following actions:

·  Refactoring the event-fetching logic to ensure deterministic and reliable behavior.

·  Decoupling the Abandoned Cart and Product of Interest workflows to enable more targeted incident handling.

·  Improving on-call documentation and training to ensure faster, more informed decisions during incident resolution.

·  Enhancing our data pipeline observability and scaling strategies to better handle reprocessing of large datasets.

Posted Jul 04, 2025 - 11:42 CEST

Resolved

This incident has been resolved.
Posted Jun 26, 2025 - 16:47 CEST

Update

Abandoned Cart automations are now working as expected. We are still working on a fix for Product of Interest (POI) automations.
Posted Jun 26, 2025 - 15:01 CEST

Update

The fix currently being deployed is expected to be fully implemented in approximately 2 hours. Thank you for your patience.
Posted Jun 26, 2025 - 13:01 CEST

Update

We’re seeing that this issue affects the entire tracking pipeline, including the Abandoned Cart and Product of Interest automations. A fix is currently being deployed, and we are monitoring its progress. We sincerely apologize for any inconvenience this may cause.
Posted Jun 26, 2025 - 12:45 CEST

Identified

We are currently seeing a decrease in the number of Abandoned Cart automations being triggered. A fix is being deployed.
Posted Jun 26, 2025 - 11:35 CEST
This incident affected: Engage (Messaging, Other).