Some systems are experiencing issues

Past Incidents

Monday 11th July 2022

Cellar [RETROACTIVE][PAR] Cellar unavailability

The service had trouble handling most requests between 11:24 and 11:28 UTC. We will investigate the issue further. The Cellar service is currently operational.

Sunday 10th July 2022

Infrastructure Roubaix: intermittent network failures

Starting at 09:31 UTC, we saw intermittent network failures in the Roubaix (RBX) zone hosted at OVH. The failures affect both the external and internal networks. You may have experienced timeouts when reaching your applications or add-ons.

Some applications are being redeployed with the Monitoring/Unreachable cause because the monitoring could no longer reach them.

Things seem to have been working fine again since 09:37 UTC. We continue to monitor the situation and will try to get more information from OVH.

EDIT 11:12 UTC: The issue has not occurred again. We will wait for any input from OVH and will add it here if we get any useful information.

Saturday 9th July 2022

No incidents reported

Friday 8th July 2022

Infrastructure Roubaix: a hypervisor has been lost

A hypervisor has been lost in the OVH Roubaix zone. We are investigating. Impacted services are FSBuckets and add-ons.

EDIT 15:32:00 UTC: The server is back online. We are making sure services are correctly restarted. Additional services were impacted: one application reverse proxy and one add-on reverse proxy were unavailable.

EDIT 15:48:00 UTC: We are still investigating the cause of the reboot. We opened a ticket with OVH to find out whether they performed any unplanned intervention on that machine.

EDIT 16:03:00 UTC: The machine is unreachable again. We are investigating.

EDIT 16:11:00 UTC: The machine is up again. We are starting to suspect a hardware issue.

EDIT 16:30:00 UTC: We will move all services off the machine to avoid further problems until we know more about the underlying issue. The FSBuckets server will be moved out around 19:00 UTC.

EDIT 19:59:00 UTC: Unfortunately, FSBuckets are going to require more time to move to another server. So far the server is working fine but OVH suspects an issue with the power supply.

EDIT 23:58:00 UTC: The FSBuckets migration is starting. FSBuckets will be set to read-only mode and applications will be redeployed to use the new server.

EDIT 2022-07-09 00:28:00 UTC: Buckets are fully migrated. The server is now empty and will be investigated further by OVH. This incident is now over.

Thursday 7th July 2022

[PAR] Network maintenance

Our network provider has scheduled network maintenance for Wednesday 6th July 2022 at 22:30 UTC. The maintenance should not have any visible impact other than a few seconds of network delay while the network links switch to the backup links.

EDIT 22:30 UTC: The maintenance is starting.

EDIT 22:55 UTC: The maintenance is over. There was no visible impact; links failed over in less than 100 ms each time.

Wednesday 6th July 2022

No incidents reported

Tuesday 5th July 2022

Access logs Ingestion queue issue

One of the queue storage servers reached its maximum disk capacity.

One of the partitions is corrupted. We are fixing it.

EDIT 17:10 UTC: The underlying issue has been fixed. The queue is currently being processed. Some events might have been lost during the cluster rebalance. Data points will take a few more hours to be up to date in the various dashboards.

EDIT: The queue is in sync.