Friday 3rd June 2022

MongoDB shared cluster issue

After an abnormal CPU load, one of the MongoDB instances did not restart.

EDIT: Trying to repair the database files.

EDIT: Database filesystem repaired.

EDIT 04/06: The MongoDB process has restarted. Some customers run expensive queries on the MongoDB cluster, which can cause the process to run out of memory (OOM).

EDIT 06/06 10:31:06 UTC: mongodb-c2 is still experiencing issues; we are working on it.

EDIT 06/06 11:24:00 UTC: Because of a replication recovery bug that MongoDB has not fixed in pre-SSPL versions, we are restoring the databases from the backups made overnight. Everything should be back online this afternoon. For faster recovery, users can set up a new dedicated database from the previous backups.

EDIT 06/06 13:45:00 UTC: The restore process has begun; it will take a few hours. We will keep you posted.

EDIT 06/06 15:01:00 UTC: We have restored half of the customers' databases. We expect full recovery within a few hours.

EDIT 06/06 17:01:00 UTC: An issue occurred while restoring the databases. We are investigating.

EDIT 06/06 23:00:00 UTC: We have restored all databases that were not above the usage quota. The cluster is now running, and we have improved how we export connection data so that applications connect more reliably.

Current state:

  • Databases have been imported from the backups. Backups that were above the free quota were not imported.
  • Connection URIs have been updated to include the whole replica set. This will simplify and stabilize how applications connect to the cluster.