Some systems are experiencing issues

Past Incidents

Friday 2nd June 2023

Access logs Metrics/access logs storage layer issue

We are detecting errors on the storage layer that holds metrics and access logs data. We are investigating.

EDIT: The lag has been caught up.

Thursday 1st June 2023

Cellar [RBXHDS] Cellar Load Balancers partially unavailable

2023-06-01 16:20 UTC : During the RBXHDS incident, one of the Cellar LBs lost its configuration. The configuration of each individual LB was not correctly monitored; only the availability of the service as a whole was.

2023-06-02 09:15 UTC : Following customer complaints, we found the LB misconfiguration and fixed it.

2023-06-02 09:28 UTC : Monitoring checks have been added to catch this kind of issue right away.
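The gap described above, where the service as a whole looked available while one LB was misconfigured, can be illustrated with a minimal sketch. This is not our actual monitoring code: the LB names, the `probe` callable and both function names are hypothetical, and a real probe would query each LB directly instead of reading a dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical probe type: takes an LB name and returns True if that LB
# serves the expected configuration. A real probe would query the LB itself.
Probe = Callable[[str], bool]


def service_available(lbs: List[str], probe: Probe) -> bool:
    """Aggregate check: the service counts as up if at least one LB is healthy."""
    return any(probe(lb) for lb in lbs)


def unhealthy_lbs(lbs: List[str], probe: Probe) -> List[str]:
    """Per-instance check: list every LB that fails its own probe."""
    return [lb for lb in lbs if not probe(lb)]


# Illustrative state (assumed): one LB out of three has lost its configuration.
state: Dict[str, bool] = {"lb-1": True, "lb-2": False, "lb-3": True}
lbs = list(state)
probe = state.__getitem__

aggregate_ok = service_available(lbs, probe)  # stays True despite the broken LB
failing = unhealthy_lbs(lbs, probe)           # names the LB that needs attention
print(aggregate_ok, failing)
```

In this sketch the aggregate check alone raises no alert, while the per-instance check reports the broken LB immediately.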

Infrastructure [RBXHDS] Load balancer metrics show abnormal response status codes

Monitoring of the load balancers is detecting an abnormal number of HTTP 404 responses. We are investigating.

EDIT 17:51 UTC : We have found the issue and the fix has been applied. Everything is operating normally.

Customer support Ticket center availability issue

We are currently aware of an issue impacting our Ticket center service. This may affect our customers' ability to open, view and reply to tickets opened with our support team.

EDIT 13:30 UTC: Our ticket center provider told us that the issue has been mitigated on their end and that it is now resolved. We will keep monitoring the situation for now, but we can confirm that the service has been operating normally for the last few minutes.

EDIT 14:47 UTC: We did not see any other issues. We consider this incident to be over.

Wednesday 31st May 2023

Access logs The metrics storage layer is unavailable

Our monitoring is detecting errors on the metrics / access logs storage layer. We are investigating.

EDIT 11:46 UTC : We have found the issue and fixed it. We are catching up on the lag.

EDIT 13:19 UTC: The lag has been consumed. Everything is operating normally.

Tuesday 30th May 2023

No incidents reported

Monday 29th May 2023

No incidents reported

Sunday 28th May 2023

Infrastructure [Montreal] Multiple hypervisors are unreachable

A hypervisor in the Montreal zone is unreachable. One of the FSBucket servers of the zone is hosted on it and is therefore unreachable too. This might impact PHP applications as well as any application using an FSBucket hosted on this server.

We are awaiting information from our infrastructure provider regarding this incident.

EDIT 19:53 UTC: It seems that multiple servers are impacted at the same time; we believe this is an issue with a specific OVH rack or room. Multiple services in the zone are thus impacted. We are looking at ways to mitigate the issues.

EDIT 20:05 UTC: The servers have been reachable again for a few minutes. We are currently making sure everything is fine. The OVH incident can be followed here: https://bare-metal-servers.status-ovhcloud.com/incidents/k664s90jxfj0

EDIT 20:15 UTC: Servers in the impacted rack could not reach each other until now, which may have prevented some services from working correctly. It seems that OVH fixed it before we could report it to them. We are continuing to make sure everything is working as expected.

EDIT 20:36 UTC: The incident is over. We are redeploying all the applications in the zone to be on the safe side.

Saturday 27th May 2023

No incidents reported