Thursday 8th June 2023

Wednesday 7th June 2023

Metrics / AccessLogs Metrics and access logs storage layer unreachability

Our monitoring has detected a failure on the storage layer for metrics and access logs. We found that a storage node had lost several disks. We have removed the faulty disks and restarted the storage node.

EDIT 16:00 UTC : The storage layer has been restarted and we are working through the ingestion lag.

Infrastructure [RBX] A hypervisor has rebooted
  • 2023-06-07 08:56 UTC: A hypervisor on the RBX zone has rebooted.
  • 09:00: the machine has fully rebooted and is restarting all its VMs. Application VMs are being redeployed on other hypervisors.
  • 09:31: the checks are done and everything appears to be running normally.

We will investigate why this hypervisor rebooted in the first place.

Tuesday 6th June 2023

Reverse Proxies [JED] Load balancer metrics show abnormal response status codes

Our load balancer monitoring is detecting an abnormal number of HTTP 404 responses. We are investigating.

EDIT 13:00 UTC : We have located the root cause and are applying a fix.

EDIT 14:20 UTC : The issue is resolved.

Monday 5th June 2023

Sunday 4th June 2023

Infrastructure [RBX] Lost connectivity with a hypervisor

We lost connectivity with a hypervisor on RBX. Applications have been redeployed, but some databases may not be reachable. We are investigating.

EDIT 03:58 UTC: The server is back online. All databases should now be reachable.

Saturday 3rd June 2023

Friday 2nd June 2023

Metrics / AccessLogs Metrics/access logs storage layer issue

We are detecting errors on the storage layer that holds metrics and access log data. We are investigating.

EDIT: The ingestion lag has been caught up.