Metrics / AccessLogs
HBase cluster supporting metrics down
(All times in UTC)
At 22:50 we received an alert that access logs had stopped being consumed.
At 22:53 we received alerts that the HBase RegionServers had gone down.
Investigation showed that all the Hadoop NameNodes were in standby. At 23:33, after various checks, we promoted one back to active.
We then restarted all the HBase RegionServers and waited for the cluster to rebalance and heal.
At 00:04 we restarted the Warp 10 stores.
At 00:07 everything was back to normal.