[Elevate] Email recommendation degraded service

Incident Report for Voyado

Postmortem

Description and Impact

A recent update to the Email Recommendations service introduced a change intended to simplify configuration and improve caching. However, this inadvertently caused image files to be stored locally on individual servers rather than in shared storage. As a result, image requests frequently failed, triggering a surge in background jobs attempting to recreate missing images.

These jobs launched in an uncontrolled manner, consuming excessive CPU resources. Even with full auto-scaling in effect, all available server capacity was quickly saturated, which led to degraded performance and service outages. Most requests during this period failed with error responses, and any successful responses were noticeably delayed.
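
To illustrate the failure mode described above, the following is a minimal sketch of one general way to keep image-regeneration jobs from launching without limit: cap the number of concurrent jobs and collapse duplicate requests for the same image into a single job. It is not the Email Recommendations implementation. The class name `BoundedRegenerator`, the `regenerate` callable, and the `max_workers` value are all hypothetical, chosen only for illustration.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class BoundedRegenerator:
    """Hypothetical helper: runs at most `max_workers` regeneration jobs at
    once and collapses duplicate requests for the same image into one job."""

    def __init__(self, regenerate: Callable[[str], bytes], max_workers: int = 4):
        self._regenerate = regenerate
        # The pool size is the hard cap on concurrent jobs; extra jobs queue.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._in_flight: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def request(self, image_key: str) -> Future:
        """Return a future for the image, reusing any job already in flight."""
        with self._lock:
            existing = self._in_flight.get(image_key)
            if existing is not None:
                return existing  # a job for this image is already queued or running
            future = self._pool.submit(self._regenerate, image_key)
            self._in_flight[image_key] = future
        # Registered outside the lock: if the job has already finished, the
        # callback runs immediately in this thread and needs to take the lock.
        future.add_done_callback(lambda _f: self._forget(image_key))
        return future

    def _forget(self, image_key: str) -> None:
        with self._lock:
            self._in_flight.pop(image_key, None)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

With a guard of this kind, a burst of requests for the same missing image results in one regeneration job rather than one job per request, and the total CPU spent on regeneration is bounded by the pool size.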

We understand the inconvenience this caused and acted swiftly to resolve the situation.

Affected Area

Email Recommendations

Timeline

  • 2025-05-27 12:00 UTC – A new version of Email Recommendations that included the bug was deployed
  • 2025-05-27 18:50 UTC – Service degradation began
  • 2025-05-27 19:00 UTC – Issue detected and investigation started
  • 2025-05-27 21:00 UTC – Service fully restored

Actions Going Forward

  • Configuration has been corrected to ensure proper handling of image storage
  • New alerts have been added to detect high CPU usage in fully scaled environments at an earlier stage
  • Additional automated testing will be introduced to better catch similar issues before deployment
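
As an illustration of the alerting action above, the sketch below shows one possible condition for "high CPU while fully scaled": fire only when the service is already at its maximum replica count and CPU has stayed above a threshold for several consecutive samples. It is an assumption about how such a rule could be expressed, not the alert that was actually deployed. The `Sample` type, the function name, and the default threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One monitoring sample (hypothetical shape)."""
    replicas: int        # number of running instances at sample time
    cpu_percent: float   # average CPU utilisation across instances


def saturated_at_max_scale(
    samples: Sequence[Sample],
    max_replicas: int,
    cpu_threshold: float = 85.0,   # illustrative value, not a real setting
    min_consecutive: int = 3,      # illustrative value, not a real setting
) -> bool:
    """True when the most recent `min_consecutive` samples all show the
    service at its replica ceiling with CPU at or above the threshold."""
    if min_consecutive <= 0 or len(samples) < min_consecutive:
        return False
    recent = samples[-min_consecutive:]
    return all(
        s.replicas >= max_replicas and s.cpu_percent >= cpu_threshold
        for s in recent
    )
```

Requiring both conditions separates "busy but still able to scale out" from "busy with no headroom left", which is the state reached during this incident.
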
Posted May 28, 2025 - 10:57 CEST

Resolved

Service is back to normal.
Posted May 27, 2025 - 23:05 CEST

Investigating

We are currently seeing degraded service in the Email Recommendations service.
Posted May 27, 2025 - 21:00 CEST
This incident affected: Elevate (Email Recommendations).