All systems are operational

Past Incidents

Sunday 30th August 2020

Cellar Cellar C1: Cluster unreachable

Our old Cellar cluster (cellar.services.clever-cloud.com), which still has some data nodes on Scaleway, is currently unreachable due to networking issues on Scaleway's side: https://status.scaleway.com/incident/956

We are monitoring the situation. Our new Cellar cluster (cellar-c2.services.clever-cloud.com) is still reachable and works fine.

EDIT 12:02 UTC: A reverse proxy node is somehow still able to communicate with the nodes on Scaleway. All cellar-c1 traffic has been routed through that reverse proxy and requests should be served as expected.

EDIT 12:34 UTC: The network issue does not seem to be on Scaleway's side per se, but rather on the side of Level3/CenturyLink, a more global networking provider.

EDIT 15:17 UTC: The incident on Level3/CenturyLink seems to be resolved. The cluster is now fully reachable.

PostgreSQL shared cluster postgresql-c1 cluster is unreachable

Postgresql-c1, an old PostgreSQL cluster still hosted on Scaleway, may currently be unreachable due to Level3/CenturyLink networking issues. Scaleway has opened an incident here: https://status.scaleway.com/incident/956

EDIT 15:17 UTC: The incident on Level3/CenturyLink seems to be resolved. The cluster is now fully reachable.

Infrastructure Peering issues with external services or to reach our services

Due to an outage of the Level3/CenturyLink networking provider, you might experience issues:

  • reaching our services: if your ISP uses this provider, you might experience timeouts reaching our infrastructure

  • reaching external services from our infrastructure: if you contact external services from our infrastructure, the peering routes might use this network provider and your requests might time out too.

This incident groups the previously opened incidents:

  • https://www.clevercloudstatus.com/incident/294

  • https://www.clevercloudstatus.com/incident/295

We do not have an ETA for the service to come back to normal.

EDIT 15:17 UTC: The incident on Level3/CenturyLink seems to be resolved. All connections, either incoming or outgoing to/from our services, should be working as expected. Please reach out to our support if not.

Saturday 29th August 2020

No incidents reported

Friday 28th August 2020

No incidents reported

Thursday 27th August 2020

Infrastructure A hypervisor went down unexpectedly

A hypervisor went down (electrically shut off) unexpectedly.

This was caused by human error, partly related to a laggy UI (the low-level UI of a server manager used for a group of servers).

The person who triggered this realized the issue immediately and restarted the server, which stopped responding to our monitoring for a total of 3 minutes.

Chronology:

14:01:30 UTC: The server goes down

14:04:30 UTC: The server responds to our monitoring again and starts restarting static VMs (add-ons and custom services)

14:07:05 UTC: The last static VM starts answering our monitoring again.

Impact:

Customers with add-ons on this server will find connection errors in their application logs during those 3 to 6 minutes, and those applications most likely responded with errors to end users during that time.

Customers with single-instance applications that happened to be on that server will have experienced about 2 to 3 minutes of downtime before a new instance started responding on another server.

Wednesday 26th August 2020

No incidents reported

Tuesday 25th August 2020

No incidents reported

Monday 24th August 2020

No incidents reported