Some systems are experiencing issues

Past Incidents

Friday 12th June 2020

Cellar Cellar Buckets are slow

We are investigating issues on Cellar add-ons. We are experiencing network issues.

EDIT 15:20 UTC: fixed.

Thursday 11th June 2020

Reverse Proxies [Montreal] Reverse proxy configuration fails to automatically update

On our MTL zone, reverse proxies are currently unable to update themselves when a configuration change happens (application deployment, added domain, ...). We are looking into it.

19:00 UTC: If you deploy your application, the new version will not be served. The reverse proxies will continue to show the old content, and the old instance will be kept until this incident is over.

19:12 UTC: The issue has been identified and we are fixing it.

19:20 UTC: The issue was caused by the configuration checker, which took much more time than usual before applying each configuration change. We have enabled a configuration option that disables those checks inside the program handling the configuration. The configuration is still checked by the reverse proxy itself, which is much faster.

Deployments should now be up-to-date.

Wednesday 10th June 2020

Infrastructure Paris zone: Network outage

It seems a global network outage happened for 1 minute, leading to a possible loss of connection to most of our services. The network seems to be back for now, but we are investigating and will provide further information.

15:22 UTC: We continue to investigate what has been impacted. Deployments are currently disabled while we recover from the event.

15:27 UTC: Deployments are now available.

15:56 UTC: The situation on the platform has stabilized. It seems the outage was between our two datacenters in the Paris zone. We are asking our hosting provider for more details.

16:05 UTC: Our network provider came back to us. The network outage lasted for 1 minute and 20 seconds. One of the links between those two datacenters was lost. The backup link should have been up 2 seconds after the loss of the first link, but for some reason it did not switch (or did not switch correctly). After a 1-minute timeout, all links were closed and reset, leading to a new link election, which takes ~20 seconds. From there, the connection was restored. Our network provider will continue to investigate why the initial backup link did not switch.
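The timeline above can be checked with a short sketch. It is a simplified model of the provider's account, not a description of their equipment: the constants come from the figures quoted in this update, and the function names are hypothetical.

```python
# Durations in seconds, taken from the figures quoted in the update above.
EXPECTED_FAILOVER_S = 2   # the backup link should have been up after ~2 s
LINK_TIMEOUT_S = 60       # timeout after which all links were closed and reset
LINK_ELECTION_S = 20      # approximate duration of the new link election


def outage_duration(failover_succeeded: bool) -> int:
    """Approximate outage duration in seconds under this simplified model."""
    if failover_succeeded:
        # Normal case: the backup link takes over almost immediately.
        return EXPECTED_FAILOVER_S
    # Failure case: wait for the timeout, then for the new link election.
    return LINK_TIMEOUT_S + LINK_ELECTION_S


def format_duration(seconds: int) -> str:
    """Render a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs} s"


if __name__ == "__main__":
    print("Expected:", format_duration(outage_duration(True)))
    print("Observed:", format_duration(outage_duration(False)))
```

Under these assumptions, the failed switch accounts for the whole 1 minute and 20 seconds: the 1-minute timeout plus the ~20-second election.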

Once the network started working again, our monitoring was able to check what was "down". The services that were down were restarted, but this should not have affected access to your applications (they were mostly internal services). Add-on connections from applications should have come back at the same time. If your application crashed because it could not reach its add-on, it should have been automatically redeployed once the deployment system was up again, which should have been shortly before 15:27 UTC.

We are sorry for the inconvenience this outage caused. The time of this incident has been changed from 15:06 UTC to 15:04 UTC to correctly reflect the date and time.

Tuesday 9th June 2020

No incidents reported

Monday 8th June 2020

Access logs Metrics unavailable

Metrics are currently unavailable.

An index node was restarted to upscale it. Its replica could not handle the resulting surge of requests and crashed a few seconds later. We are currently in the process of upscaling all index nodes to avoid such issues; these 2 nodes were the last remaining on the list.

Index nodes have to scan the whole dataset on start, so this will take close to an hour to resolve.

08:07 UTC: Incident is over.

Sunday 7th June 2020

No incidents reported

Saturday 6th June 2020

No incidents reported