Wednesday 20th January 2021

Infrastructure Investigating hypervisor issues

We are experiencing issues with hypervisors. We are investigating.

EDIT 15:45 UTC: Two hypervisors went down. The impacted services are:

  • Add-ons -> add-ons hosted on those servers are currently unavailable

  • Applications -> applications that were hosted on those servers should either have been redeployed already or be in the redeploy queue

  • Logs -> new logs won't be processed. This includes drains. You might only get old logs when using the CLI / Console

  • Shared RabbitMQ -> A node of the cluster is down, performance might be degraded

  • SSH -> No new SSH connections can be made to applications for now.

  • FS Bucket -> an FS Bucket server was on one of the servers. Those buckets are unreachable and may time out when writing / reading files

EDIT 15:54 UTC: Servers are currently rebooting.

EDIT 15:59 UTC: Servers rebooted and the services are currently starting. We are closely monitoring the situation.

EDIT 16:07 UTC: Services are still starting and we are double-checking impacted databases.

EDIT 16:11 UTC: Deployments might take a few minutes to start because of the long deployment queue.

EDIT 16:33 UTC: Most services should be back online, including applications and add-ons. The deployment queue is still processing.

EDIT 16:45 UTC: The deployment queue has been empty for a few minutes. All deployments should go through almost instantly.

EDIT 17:13 UTC: Deployment queue is back to normal.

EDIT 17:15 UTC: The incident is over.