GitHub Status

Disruption with some GitHub services

Resolved · minor impact

Started: Mar 12, 1:54 AM
Resolved: Mar 12, 2:45 AM
Duration: 51m

Update Timeline

Resolved · Mar 12, 2:45 AM

Between 01:36 and 08:11 UTC on Thursday, March 12, GitHub.com experienced elevated error rates across Git operations, web requests, and related services. During a planned infrastructure upgrade, a configuration issue caused newly provisioned Kubernetes nodes to run an incompatible version of etcd, which disrupted cluster consensus across several production clusters. This led to intermittent 5XX errors on git push, git clone, and page loads. Deployments were paused for the duration of the incident.

Once the incompatible nodes were identified, they were removed and cluster consensus was restored. A validation deploy confirmed that all systems were healthy before normal operations resumed.

To prevent recurrence, we are adding programmatic enforcement of version compatibility during node replacements, implementing monitoring to detect split-brain conditions earlier, and updating our recovery tooling to reduce restoration time.
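The report does not say how the version-compatibility enforcement will work. As a minimal sketch, assuming a hypothetical policy that a replacement node must share the cluster's etcd major version and may be at most one minor version ahead of the oldest existing member (mirroring the common practice of upgrading etcd one minor version at a time), a pre-join gate might look like the following. The function names `parse_version`, `is_compatible`, and `admit_replacement_node` and the `max_minor_skew` limit are illustrative assumptions, not GitHub's actual tooling.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a version string such as "3.5.12" or "v3.5.12" into a tuple."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return major, minor, patch


def is_compatible(candidate: str, members: list[str], max_minor_skew: int = 1) -> bool:
    """Decide whether a replacement node's etcd version may join the cluster.

    Hypothetical policy (an assumption, not GitHub's rule): the candidate must
    share the major version of every existing member, must not be older than
    the oldest member's minor version, and may be at most `max_minor_skew`
    minor versions ahead of it.
    """
    new = parse_version(candidate)
    current = [parse_version(m) for m in members]
    if not current:
        return True  # bootstrapping an empty cluster: nothing to compare against
    if any(v[0] != new[0] for v in current):
        return False  # major-version mismatch is always rejected
    oldest_minor = min(v[1] for v in current)
    return oldest_minor <= new[1] <= oldest_minor + max_minor_skew


def admit_replacement_node(candidate: str, members: list[str]) -> None:
    """Refuse to provision a node whose etcd version fails the policy check."""
    if not is_compatible(candidate, members):
        raise RuntimeError(
            f"refusing to add node running etcd {candidate}; "
            f"existing members run {sorted(set(members))}"
        )


members = ["3.5.12", "3.5.12", "3.5.9"]
for candidate in ["3.5.15", "3.6.0", "3.7.0", "3.4.30"]:
    print(candidate, is_compatible(candidate, members))
```

With existing members on 3.5.x, candidates 3.5.15 and 3.6.0 pass while 3.7.0 and 3.4.30 are rejected. Running the check before a node is added, rather than after, is what lets it block an incompatible member before that member can affect consensus.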

Investigating · Mar 12, 2:44 AM

We've identified the root cause and are working on resolving the underlying issue. Some users may have encountered intermittent failures and errors. We're continuing to see reduced error rates.

Investigating · Mar 12, 2:13 AM

We are investigating elevated error rates. Error rates are now decreasing and we're continuing to monitor the situation.

Investigating · Mar 12, 1:54 AM

We are investigating reports of impacted performance for some GitHub services.
