A hypervisor has been lost in the Paris zone. We are investigating.
EDIT 06:04 UTC: The server experienced a hardware failure. It may not be able to come back. Applications on it were redeployed elsewhere. Custom services and add-ons are currently impacted.
EDIT 06:23 UTC: A public reverse proxy serving requests for domain.par.clever-cloud.com (22.214.171.124) was on this hypervisor. This IP was moved to another server. Between 05:23 and 05:35, it was unreachable.
EDIT 06:52 UTC: The ETA for the server to come back is 08:00 UTC.
EDIT 07:46 UTC: The faulty hardware has been replaced. The server will now be rebooted.
EDIT 07:57 UTC: Server is back online, we are making sure all services are up.
EDIT 09:10 UTC: Everything is now back to normal, and the incident is over. We will investigate the cause of the hardware failure further.
Thursday 9th June 2022
Mails: Mail delivery issues
Our mail provider is currently experiencing issues. You may notice delays in receiving emails for notifications, password resets, account signups, billing, and other services. You may also encounter errors such as "Bad request" when clicking links in those emails.
EDIT 13:55 UTC: Our provider now indicates that emails are being delivered again, albeit with some delay.
EDIT 16:15 UTC: Email delivery should now be working normally again. Our provider's incident is over.
Wednesday 8th June 2022
Reverse Proxies: [PAR] Random 503 Service Unavailable errors on public reverse proxies
We are seeing an unusual number of 503 errors on public reverse proxies. We are looking into it.
EDIT 21:28 UTC: The issue has been found and fixed. We are monitoring the situation.
EDIT 21:40 UTC: Everything seems to be back to normal. The issue affected a couple of applications, starting around 16:30 UTC. We will investigate further why the configuration was out of sync during that period.
Infrastructure: A hypervisor is down
16:13:00 UTC: A hypervisor has stopped responding. We are investigating why.
The system is redeploying the applications that were on it. Some reverse proxies are not responding.
16:24:00 UTC: At first glance, it seems that a network error is causing the hypervisor to appear down. We do not yet know whether it is a hardware or a software network issue.
16:28:00 UTC: The hypervisor seems to be back up again. We are making sure everything on it is responding well.
16:40:00 UTC: Everything has been checked and is responding correctly.
Some add-ons became unresponsive.
Logs were not served.
One public reverse proxy was unresponsive. Traffic should have been diverted to the other proxies. Applications may have been slightly slower.
Some custom services for customers were unresponsive.
Deployments: Deployments are experiencing issues
Deployments are currently experiencing various issues, we are investigating.
EDIT 14:55 UTC: The problem has been identified and fixed. Deployments have been working again for the last 10 minutes. Sorry for the inconvenience.
Tuesday 7th June 2022
Reverse Proxies: [RETROACTIVE][PAR] An add-on reverse proxy was unreachable
An add-on reverse proxy was unreachable between 14:45 and 14:48 UTC. It has been restarted and is now serving requests as expected. Applications may have failed to reach their add-ons during this time.
Our monitoring shows abnormal CPU usage on some Pulsar brokers, we are investigating.
EDIT: We stopped some components that were increasing the load on the cluster. It should be more stable now.
[PAR] Unique IP service planned unavailability
The unique IP service will undergo a maintenance period for 30 minutes on June 7th starting at 20:00 UTC. During this time period, the service will be unavailable. Applications using the service will encounter timeouts or various errors when trying to use the service.
Applications will automatically be restarted once the maintenance is over.
EDIT 20:05 UTC: The maintenance is starting.
EDIT 20:28 UTC: The downtime was reduced to a few minutes, but multiple brief network interruptions may have occurred. Applications linked to this service are currently redeploying.
Monday 6th June 2022
Cellar: Paris zone is experiencing network issues
[Times in UTC] 19:30: We are experiencing network issues in our Paris data center.
19:40: The culprit is a switch that partially stopped responding. It turns out that it is not broken enough for its routes to be removed automatically. Our DC contractor is on their way to physically remove the switch. ETA is 30 minutes.
20:00: Cellar seems to be up again. We are still monitoring and waiting for direct confirmation from our DC contractor.