Tuesday 8th October 2019

Access logs Metrics ingestion delay + slow read queries

We are experiencing an issue on the Metrics service, caused by an error while adding capacity to the storage cluster. We are working on it.

10:26 UTC: The ingestion issue is fixed, the system is now catching up.

10:33 UTC: The ingestion delay is almost back to normal.

10:36 UTC: There is still a bit of a lag, but it should come back to normal in a few minutes. Read performance is still a bit hit or miss but is coming back to normal as well. We will reopen the incident if it does not.

11:06 UTC: The ingestion lag is increasing. We are investigating. This may take a while.

11:30 UTC: The cause has been identified and partially fixed.

11:37 UTC: Lag is now <5s; we are currently working on a more permanent fix.

11:45 UTC: The issue is now fixed.