Some systems are experiencing issues

Past Incidents

Monday 6th July 2020

No incidents reported

Sunday 5th July 2020

No incidents reported

Saturday 4th July 2020

No incidents reported

Friday 3rd July 2020

MongoDB shared cluster high load issue

For the past 3 minutes, the MongoDB shared cluster has been experiencing high load, which is preventing new user connections. We are upscaling it.

EDIT 14:32 UTC - the cluster has been upscaled.

DNS resolution affecting certain ISPs

Some customers are experiencing issues resolving our clever-cloud.com domains, as well as their own domains that point to clever-cloud.com domains.

At the moment, we know the problem is affecting customers of the French ISP Orange.

08:28 UTC: We found that Orange's NS servers were indeed still using the faulty NS records from last night's incident. We have updated the zone on those name servers (which should never have been used in the first place), and hopefully Orange customers will be able to resolve our domains, and by extension their own domains, properly.

08:42 UTC: Looks like the propagation is quite fast and this indeed fixed the issue for affected customers.
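
If you want to check from your side whether your resolver is still serving stale data, a short script along the following lines can help. This is only an illustrative sketch using the dnspython library; the resolver IP is a placeholder to replace with your ISP's resolver.

    # Illustrative sketch: query NS and A records through a specific resolver
    # to see whether it still serves stale data. Requires the dnspython package.
    import dns.resolver

    RESOLVER_IP = "198.51.100.53"   # placeholder: put your ISP's resolver here
    DOMAIN = "clever-cloud.com"

    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [RESOLVER_IP]

    # Which name servers does this resolver currently return for the domain?
    for rr in resolver.resolve(DOMAIN, "NS"):
        print("NS:", rr.target)

    # Does the domain itself still resolve through this resolver?
    for rr in resolver.resolve(DOMAIN, "A"):
        print("A:", rr.address)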

Thursday 2nd July 2020

Reverse Proxies: Domain provider issues

Our domain provider briefly gave out an empty DNS zone file after a configuration change.

EDIT 20:04 UTC - fixed.

Wednesday 1st July 2020

High load on the MongoDB shared cluster

The MongoDB shared cluster hosting free MongoDB databases has a higher load than usual. The load started going up at 15:25 UTC, slowly reaching the point, about 30 minutes ago, where the cluster could no longer serve most requests as expected. Requests are also expected to have been failing since then because of timeouts or aborted connections.

The service has been restarted and we will monitor it closely, as well as add monitoring to better catch this kind of ramp-up.

Dedicated databases are not impacted by this issue. If you are impacted, you can migrate your free plan to a dedicated plan using the migration feature. You can find it in the "Migration" menu of your add-on.
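
While the cluster is overloaded, client connections may time out or be aborted. As a purely illustrative sketch (not an official recommendation), an application using pymongo can keep those failures bounded with short timeouts and a simple retry; the connection URI below is a placeholder for your add-on's connection string.

    # Illustrative sketch only: bound the time spent waiting on an overloaded
    # cluster and retry a few times instead of hanging. The URI is a placeholder.
    import time
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    MONGO_URI = "mongodb://user:password@host:27017/dbname"  # placeholder

    client = MongoClient(
        MONGO_URI,
        serverSelectionTimeoutMS=5000,  # fail fast if no node is reachable
        socketTimeoutMS=10000,          # abort queries stuck on a slow node
    )

    def find_with_retry(collection, query, attempts=3, delay=2.0):
        """Run a find() and retry a couple of times on transient errors."""
        for attempt in range(1, attempts + 1):
            try:
                return list(collection.find(query))
            except PyMongoError:
                if attempt == attempts:
                    raise
                time.sleep(delay)

    # Example usage:
    # docs = find_with_retry(client.dbname.mycollection, {"status": "active"})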

22:41 UTC: Load seems back to its normal state. The monitoring has been adjusted, so we should now receive an alert at the start of such an event instead.

23:19 UTC: The issue is back; the load is not as high as before, but it might make the cluster slow.

23:54 UTC: The users putting the most load on the cluster have been contacted so this issue can be avoided. Further action will be taken later today if the issue persists.

2020-07-01 06:25 UTC: The node crashed after hitting a fatal assertion and restarted.

06:38 UTC: The node is still unreachable for an unknown reason.

07:48 UTC: The cluster is currently being repaired. For an unknown reason, the nodes would not listen on their network interfaces.

09:32 UTC: The repair is halfway through. The cluster might be back up in about 1h30.

11:50 UTC: The repair is done and the node successfully restarted. You should now be able to connect to the cluster. We are now restarting the follower node so it can rejoin the cluster.

15:09 UTC: The leader node crashed again because of an assertion failure, which means it is now unreachable again while MongoDB reads its entire journal and rebuilds the indexes.

15:30 UTC: It usually takes about 1h30 for MongoDB to read the whole journal, so the node should be up again around 16:20 UTC.

16:34 UTC: It is taking longer than usual.

20:06 UTC: The restarts weren't successful. The secondary node successfully started at some point but was shut down to avoid any issue with the primary one. We'll try starting it again.

2020-07-02 09:15 UTC: The first node has been accessible now and again but keeps on crashing due to user activity. The second node failed to sync from the first node, so it cannot be used as primary right now. We are now trying to bring the first node back up without making it accessible to users, so we can at least get backups of every database. Once this is done, we will update you on the next steps. This process will take a while, as Mongo takes hours (literally) to come up after a crash.

12:00 UTC: The first node is finally back up (but incoming connections are shut off for now). We are now taking backups of all databases, you should see a new backup appear in your dashboard in the coming minutes / hours. Once this is done, we will start working on bringing the second node back in sync. Once the cluster is healthy, we will bring it back online.

14:30 UTC: Backups are over, customers who were using the free shared plan in production can create a new paid dedicated add-on and import the latest backup there. Meanwhile, we are now rebuilding the second node from the first one to make the cluster healthy again. Once it's over, we will bring the service back up (if everything goes well).
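
For customers going that route, the supported path is importing the dashboard backup into the new dedicated add-on. As an alternative illustration only, if the shared cluster is reachable, collections can also be copied over the wire with a short pymongo script such as the sketch below; both URIs and the database name are placeholders.

    # Illustrative sketch only: copy every collection from the shared cluster to
    # a new dedicated add-on. Both URIs and the database name are placeholders;
    # this loads each collection in memory, which is fine for small databases.
    from pymongo import MongoClient

    SOURCE_URI = "mongodb://user:password@shared-host:27017/dbname"     # placeholder
    TARGET_URI = "mongodb://user:password@dedicated-host:27017/dbname"  # placeholder
    DB_NAME = "dbname"                                                  # placeholder

    source_db = MongoClient(SOURCE_URI)[DB_NAME]
    target_db = MongoClient(TARGET_URI)[DB_NAME]

    for name in source_db.list_collection_names():
        docs = list(source_db[name].find())
        if docs:
            target_db[name].insert_many(docs)
        print(f"copied {len(docs)} documents from {name}")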

15:55 UTC: The second node is synced up and the service is available again. We are still monitoring things closely.

18:35 UTC: The service is working smoothly, no issues or anomalies to report.

Tuesday 30th June 2020

No incidents reported