Past Incidents

Monday 30th May 2022

Infrastructure PAR: A hypervisor is currently unreachable

A hypervisor in the Paris zone is currently unreachable due to a connection loss. We are trying to restart it. Impacted applications are being automatically redeployed.

EDIT 22:46 UTC: The hypervisor is failing to reboot. We are continuing our investigation.

EDIT 00:06 UTC: The hypervisor has been back online for a few minutes. All services are available again. The cause of the extended downtime has been identified and will be fixed on similar hypervisors to allow a faster recovery next time.

Sunday 29th May 2022

Access logs Metrics / Access logs: ingestion issues

Ingestion of new access logs and metric points is currently experiencing an issue, leading to missing data points in metrics. Access log ingestion is on hold, and the pending logs will be processed later. The issue has been identified and we are working to fix it.

EDIT 21:04 UTC: Ingestion is now back to normal. Access logs will be processed over the next few hours.

Saturday 28th May 2022

No incidents reported

Friday 27th May 2022

Infrastructure [PAR] Server lost

We lost a server that hosts several components in the PAR zone.

UPDATE: All applications have been redeployed.

Thursday 26th May 2022

No incidents reported

Wednesday 25th May 2022

Reverse Proxies Some applications are unavailable

Some applications are experiencing issues. We are investigating.

UPDATE 14:57 UTC: Some Add-ons are inaccessible due to a faulty proxy. We're removing it from the pool to mitigate the issue.

UPDATE 14:59 UTC: Services are being reloaded to ensure the faulty proxy is removed from the pool.

UPDATE 15:10 UTC: Services are back online for redeployed apps. A faulty sentry caused abnormal behaviour in the API.

CALL FOR ACTION 15:23 UTC: Remaining applications are currently being redeployed. If you're impacted, we advise you to redeploy your app to speed up the recovery process.

Tuesday 24th May 2022

Deployments Issues with deployment not working correctly

We currently have issues with deployments. Deployments may fail with an error asking you to contact our support, accompanied by a stack trace. We are working on a fix.

EDIT 14:59 UTC - We have identified the faulty component, which is encountering an issue in the connection pooler.

EDIT 15:09 UTC - The deployment queue is being consumed and is catching up. The issue is mitigated.

EDIT 15:23 UTC - The incident is fixed.

Root cause: we found an issue in a messaging driver on a couple of isolated servers. We have removed this specific driver and fallen back on an alternative messaging layer. In the coming days, we will investigate this bug further and report the fix upstream.