Between June 23 and June 26, 2025, some customers received Abandoned Cart (AC) emails even though their carts were empty. The issue was caused by an error in the logic that determines the most recent cart activity within our automation pipeline. A fix was deployed quickly, but full resolution was delayed by the time required to reprocess historical data through our streaming systems.
· Some customers received Abandoned Cart emails that did not reflect their actual shopping activity.
· The Product of Interest (POI) automation experienced delays and was temporarily paused during recovery efforts.
· Email reliability and accuracy were temporarily impacted for a subset of users.
The incident was caused by non-deterministic logic in the abandoned_and_poi_pipeline that selects the latest cart tracking event. This led to outdated or incorrect data being interpreted as current, which in turn triggered abandoned cart emails for empty carts.
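To illustrate the class of bug described above, here is a minimal sketch of a deterministic "latest event" selection. It is not the pipeline's actual code. It assumes that the non-determinism came from events sharing the same timestamp, which is one common cause, and the CartEvent type, its fields, and both function names are hypothetical. The idea is to order events by a composite key so that the result does not depend on arrival order.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CartEvent:
    # All fields are hypothetical; the real event schema is not shown in this report.
    customer_id: str
    event_time: int   # event timestamp, assumed to be epoch milliseconds
    sequence: int     # assumed unique, monotonically increasing ingest offset
    item_count: int   # number of items in the cart after this event


def latest_cart_event(events: Iterable[CartEvent]) -> Optional[CartEvent]:
    """Return the most recent event, breaking timestamp ties by sequence.

    Comparing on (event_time, sequence) gives a total order as long as
    sequence is unique, so the result is the same for any input order.
    """
    return max(events, key=lambda e: (e.event_time, e.sequence), default=None)


def should_send_abandoned_cart_email(events: Iterable[CartEvent]) -> bool:
    """Send only when the latest known cart state still contains items."""
    latest = latest_cart_event(events)
    return latest is not None and latest.item_count > 0
```

With a tie-breaker like this, two events that carry the same timestamp (for example, an "add item" followed immediately by a "clear cart") resolve to the same winner on every run, so a replay of historical data yields the same email decisions as the original processing.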
A hotfix correcting this logic was deployed to production on June 25. However, because the streaming job had to be restarted from a historical point (June 17), the large volume of data delayed full recovery. The pipeline did not catch up to the most recent checkpoint within the expected time frame, requiring manual intervention with larger compute clusters. The Abandoned Cart flow was restored by the afternoon of June 26, followed by the POI flow shortly thereafter.
To prevent recurrence and improve incident response, we are taking the following actions:
· Refactoring the event-fetching logic to ensure deterministic and reliable behavior.
· Decoupling the Abandoned Cart and Product of Interest workflows to enable more targeted incident handling.
· Improving on-call documentation and training to ensure faster, more informed decisions during incident resolution.
· Enhancing our data pipeline observability and scaling strategies to better handle reprocessing of large datasets.