Resolved -
We are still seeing normal operations and are therefore declaring this incident resolved.
Normal operations were restored at approximately 12:10 CEST.
We will keep working on the incident to minimize the risk of it happening again.
A post-mortem for this incident will be appended as soon as we have mapped out not only what happened, but also why.
Apr 16, 13:12 CEST
Monitoring -
The initial signs of improvement still hold. At the moment we can see normal response times throughout the solution.
We will keep working, but are changing the status of the incident to Monitoring.
If we see any degradation, we will revert this status.
Apr 16, 12:23 CEST
Update -
The aforementioned fix has been rolled out and we're seeing some initial signs of improvement.
We will keep working on achieving a full resolution.
Apr 16, 12:12 CEST
Update -
We are continuing to work hard on mitigating this disturbance. A new attempted fix is about to be rolled out. ETA: 30 min.
We can see that all parts of the platform are affected by the disturbance, even though we are not completely down. Some requests are getting through, some are slow, and some are not getting through the door, so to speak. We are, however, approaching this as if we were completely unavailable.
The slowness and unavailability affect all endpoints in our core APIs as well as the graphical user interface of Engage.
Apr 16, 11:36 CEST
Investigating -
Unfortunately, the applied fix was not enough to get us out of the woods.
We are seeing the same symptoms come back after the fix was applied.
Work is still ongoing with top priority.
Any information made available will be communicated as soon as possible.
Apr 16, 11:08 CEST
Update -
A first fix is currently being applied to try to mitigate some of the symptoms we can see through our monitoring.
As the application and its resources are updated, we hope to see a change for the better.
We are, however, looking into further actions.
Apr 16, 10:41 CEST
Identified -
We have identified a possible cause for the disturbance and are implementing a fix.
Apr 16, 10:24 CEST
Update -
We are continuing to investigate this issue.
Apr 16, 10:17 CEST
Update -
Unfortunately, we are still seeing major increases in response times from the application.
This leads to a traffic build-up, causing some requests to receive 503 responses.
Mitigating the incident is our top priority and we have all hands on deck.
No ETA at the moment. Information will be provided as soon as it's available.
Apr 16, 10:16 CEST
Investigating -
We have been experiencing longer response times since around 09:18.
Apr 16, 09:31 CEST