Thursday 27th August 2020

Infrastructure: A hypervisor went down unexpectedly

A hypervisor went down unexpectedly (it was powered off).

This was caused by human error, partly due to a laggy UI (the low-level UI of a server manager used for a group of servers).

The person who triggered this realized the issue immediately and restarted the server, which had stopped responding to our monitoring for a total of 3 minutes.

Chronology:

14:01:30 UTC: The server goes down

14:04:30 UTC: The server responds to our monitoring again and starts restarting static VMs (add-ons and custom services)

14:07:05 UTC: The last static VM starts responding to our monitoring again
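
The outage windows follow directly from the timestamps above. As a quick check, here is a minimal Python sketch that computes them; the helper and variable names are illustrative only and do not correspond to any of our tooling.

```python
from datetime import datetime, timezone

# Date of the incident, taken from the report header.
DAY = "2020-08-27"


def at(hms: str) -> datetime:
    """Build a UTC datetime on the incident day from an HH:MM:SS string."""
    parsed = datetime.strptime(f"{DAY} {hms}", "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


# Timestamps copied from the chronology.
server_down = at("14:01:30")
server_back = at("14:04:30")
last_vm_back = at("14:07:05")

# Time during which the hypervisor itself did not respond to monitoring.
hypervisor_outage = server_back - server_down

# Time from the shutdown until the last static VM responded again.
longest_vm_outage = last_vm_back - server_down

print(hypervisor_outage)   # 0:03:00
print(longest_vm_outage)   # 0:05:35
```

The two results bound the impact on static VMs: at least 3 minutes, and a little under 6 minutes for the last VM to come back.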

Impact:

Customers with add-ons on this server will find connection errors in their application logs for those 3 to 6 minutes, and their applications most likely returned errors to end users during that time.

Customers whose applications ran a single instance that happened to be on that server will have experienced about 2 to 3 minutes of downtime before a new instance started responding on another server.