Metrics / AccessLogsMetrics and access logs storage layer unreachbility
Our monitoring has detected failure on the storage layer of metrics and access logs.
We have found that a storage node has lost several disk.
We have remove faulty disks and restarted the storage node.
EDIT 16:00 UTC : The storage layer is restarted and we are consuming the ingestion lag
Infrastructure[RBX] A hypervisor has rebooted
2023-06-07 08:56 UTC: A hypervisor on the RBX zone has rebooted.
09:00: the machine has fully rebooted, it is restarting all its VMs. Applications VMs are redeploying on other hypervisors.
09:31: the checks are done, everything seems to be running fine as of now.
We will investigate to understand why this hypervisor rebooted in the first place.
Tuesday 6th June 2023
Reverse Proxies[JED] Load balancers metrics show abnormal response status code
Monitoring of load balancers is detecting an abnormal amount of http 404 status. We are investigating.
EDIT 13:00 UTC : We have located the root cause, we are applying a fix.
EDIT 14:20 UTC : The issue is resolved
Monday 5th June 2023
No incidents reported
Sunday 4th June 2023
Infrastructure[RBX] lost connectivity with an hypervisor
We lost connectivity with an hypervisor on RBX. Applications have been redeployed but some databases may not be reachable. We are investigating.
EDIT 03:58 UTC: server is back online. All databases should now be reachable.