Some systems are experiencing issues

Past Incidents

Tuesday 7th June 2022

Reverse Proxies [RETROACTIVE][PAR] An add-on reverse proxy was unreachable

An add-on reverse proxy was unreachable between 14:45 and 14:48 UTC. It has been restarted and is now serving requests as expected. Applications may have failed to reach their add-on during this time.

Pulsar Service instabilities

Our monitoring shows abnormal CPU usage on some Pulsar brokers; we are investigating.

EDIT: We stopped some components that were increasing the load on the cluster. It should be more stable now.

[PAR] Unique IP service planned unavailability, scheduled 1 year ago

The unique IP service will undergo a maintenance period for 30 minutes on June 7th starting at 20:00 UTC. During this time period, the service will be unavailable. Applications using the service will encounter timeouts or various errors when trying to use the service.

Applications will automatically be restarted once the maintenance is over.

EDIT 20:05 UTC: The maintenance is beginning

EDIT 20:28 UTC: The downtime was reduced to a few minutes, but multiple network cuts may have happened. Applications linked to this service are currently redeploying.

Monday 6th June 2022

Cellar Paris zone is experiencing network issues

[Times in UTC] 19:30: We are experiencing network issues in our Paris data center.

19:40: The culprit is a switch that has partially stopped responding. It is not broken enough for its routes to be removed automatically, so our DC contractor is moving to physically remove the switch. ETA is 30 minutes.

20:00: Cellar seems to be up again. We are still watching and waiting for a direct confirmation from our DC contractor.

00:00: Everything is back to normal

Sunday 5th June 2022

No incidents reported

Saturday 4th June 2022

Infrastructure Server lost

A hardware failure occurred on one of our servers (hv-par4-001). Applications are being redeployed on other servers. Add-ons are impacted.

Friday 3rd June 2022

MongoDB shared cluster issue

After an abnormal CPU load, one of the MongoDB processes did not restart.

EDIT: Trying to repair the database files.

EDIT: Database filesystem repaired.

EDIT 04/06: The MongoDB process has restarted. Some customers perform expensive queries on the cluster, which can cause the process to run out of memory (OOM).

EDIT 06/06 10:31:06 UTC: mongodb-c2 is still experiencing issues, we are working on it.

EDIT 06/06 11:24:00 UTC: Because of a replication recovery bug not fixed by MongoDB in pre-SSPL versions, we are restoring databases from the backups made overnight. Everything should be back up in the afternoon. For faster recovery, users can set up a new dedicated database from the previous backups.

EDIT 06/06 13:45:00 UTC: The restore process has begun; it will take a few hours. We will keep you posted.

EDIT 06/06 15:01:00 UTC: We have restored half of the customers' databases. We expect full recovery in a few hours.

EDIT 06/06 17:01:00 UTC: An issue occurred while restoring the databases. We are investigating.

EDIT 06/06 23:00:00 UTC: We restored all databases that were not above their usage quota. The cluster is now running, and we have improved how we export connection data, so applications will behave better when connecting.

Current state:

  • DBs have been imported from backups. Backups that were above the free quota were not imported.
  • Connection URIs have been updated to include the whole replica set. This will simplify and stabilize how applications connect to the cluster.
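The replica-set URI change above can be illustrated with a minimal sketch. The hostnames, port, database name, and replica-set name below are hypothetical examples, not the real cluster values:

```python
# Build a MongoDB connection URI that lists every replica-set member,
# so a driver can discover the topology and fail over when the current
# primary is unreachable. All hostnames, the database name, and the
# replica-set name here are hypothetical placeholders.

def replica_set_uri(hosts, database, replica_set):
    """Return a mongodb:// URI listing all replica-set members."""
    host_list = ",".join(hosts)
    return f"mongodb://{host_list}/{database}?replicaSet={replica_set}"

uri = replica_set_uri(
    ["node1.example.com:27017",
     "node2.example.com:27017",
     "node3.example.com:27017"],
    "mydb",
    "rs0",
)
print(uri)
# mongodb://node1.example.com:27017,node2.example.com:27017,node3.example.com:27017/mydb?replicaSet=rs0
```

Listing every member, rather than a single host, is what lets client drivers keep working through a primary election instead of failing hard.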

Thursday 2nd June 2022

No incidents reported

Wednesday 1st June 2022

No incidents reported