Tuesday 23rd January 2024

Metrics [Metrics] Requests timeouts

We are currently observing requests timeouts on the Metrics cluster. The issue has been identified and we are working towards the resolution. No data loss is to be expected. Various graphs (grafana, console, ..) might not properly load or render with various errors.

Edit Tue Jan 23 17:59:56 2024 UTC: A faulty configuration has been applied to a node to investigate a memory-leak. The configuration backfired on the whole cluster, making it unhealthy. The configuration have been rollback. The storage layer is currently under healing mode. To speed-up the recovery, query have been disabled.

Edit Tue Jan 23 19:51:21 2024 UTC: cluster is now healthy and recovering lag, which should last a few hours. Query will be opened when lag is resorbed.

Edit Wed Jan 24 00:04:59 2024 UTC: datalag is now ok. We are still reloading metrics's metadata, so query is still not available. Should be up in a few hours

Edit Wed Jan 24 01:54:22 2024 UTC: metadata lag is now ok, query is back online