Some systems are experiencing issues

Past Incidents

Wednesday 13th July 2022

Access logs Metrics maintenance

As part of our efforts to stabilize the Metrics infrastructure, we will perform maintenance on 13 July. Once it has started, some lag can be expected for a few hours.

Maintenance will start at 07:30 UTC

EDIT 07:30 UTC: Starting maintenance

EDIT 08:16 UTC: Maintenance is over, we are catching up with the lag

EDIT 08:30 UTC: Queries are currently disabled to speed up recovery

EDIT 09:17 UTC: Our maintenance triggered a major compaction on our storage layer. To speed up recovery, queries are still disabled.

EDIT 16:20 UTC: The major compaction is over. We are struggling to handle both read and write operations at the same time. We are working on it.

EDIT 20:23 UTC: Queries are still disabled. We are testing new configurations to resolve the issue.

EDIT 14 July 09:22 UTC: It's a brand new day, and we are still working on it.

EDIT 14 July 18:26 UTC: We are still struggling to handle both read and write operations at the same time. We are working on it. Happy French National Day.

EDIT 16 July 17:35 UTC: We found a performance issue that is triggered when the dotmap in the Console is accessed. We disabled some of the macros used to retrieve data so that other users can access metrics. Metrics and access logs are now accessible.

Tuesday 12th July 2022

No incidents reported

Monday 11th July 2022

Cellar [RETROACTIVE][PAR] Cellar unavailability

The service had trouble handling most requests between 11:24 and 11:28 UTC. We will investigate the issue further. The Cellar service is currently operational.

Sunday 10th July 2022

Infrastructure Roubaix: intermittent network failures

Starting at 09:31 UTC, we saw intermittent network failures on the Roubaix (RBX) zone hosted on OVH. The failures affected both the external and internal networks. Timeouts reaching your applications or add-ons might have happened.

Some applications are being redeployed for Monitoring/Unreachable because the monitoring could no longer see them.

Things seem to have been working fine again since 09:37 UTC. We continue to monitor the situation and will try to get more information from OVH.

EDIT 11:12 UTC: The issue has not occurred again. We will wait for any input from OVH and will add it here if we get any useful information.

Saturday 9th July 2022

No incidents reported

Friday 8th July 2022

Infrastructure Roubaix: a hypervisor has been lost

A hypervisor has been lost on the OVH Roubaix zone. We are investigating. Impacted services are FSBuckets and add-ons.

EDIT 15:32:00 UTC: The server is back online. We are making sure services are correctly restarted. Additional services were impacted: one application reverse proxy and one add-on reverse proxy were unavailable.

EDIT 15:48:00 UTC: We are still investigating the cause of the reboot. We opened a ticket with OVH to find out whether they performed any unplanned intervention on that machine.

EDIT 16:03:00 UTC: The machine is unreachable again. We are investigating.

EDIT 16:11:00 UTC: The machine is up again. We are starting to suspect a hardware issue.

EDIT 16:30:00 UTC: We will remove all services from the machine to avoid any further issues until we know more about the underlying problem. The FSBuckets server will be moved out around 19:00 UTC.

EDIT 19:59:00 UTC: Unfortunately, FSBuckets are going to require more time to move to another server. So far the server is working fine, but OVH suspects an issue with the power supply.

EDIT 23:58:00 UTC: The FSBuckets migration is starting. FSBuckets will be set to read-only and applications will be redeployed to use the new server.

EDIT 2022-07-09 00:28:00 UTC: Buckets are fully migrated. The server is now empty and will be investigated further by OVH. This incident is now over.

Thursday 7th July 2022

No incidents reported