Google started this week with a major outage that took down Gmail, Drive, and all other Workspace apps. As promised, Google now has a detailed explanation of the outage and the steps it will take to prevent future incidents.
At a high level, the issue is related to an ongoing effort to update Google’s account authentication system. As the effort progressed, previous components were “left in place.” Keeping those old components caused usage to be incorrectly reported as 0, and Google had set a grace period to delay the effect.
That grace period expired, and automated systems responded to the error as if it were real. With usage apparently at 0, the capacity of the identity management system was reduced. While safety checks were in place, they were not designed to address this specific problem.
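The failure mode described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Google's implementation: the names `QuotaState`, `enforce`, and `simulate`, the tick-based grace period, and the `floor` guard are all hypothetical and chosen only for illustration. It shows how a limit that tracks reported usage collapses once the grace period runs out and the bogus reading of 0 is treated as real.

```python
from dataclasses import dataclass


@dataclass
class QuotaState:
    limit: int        # capacity currently granted to the service (hypothetical unit)
    grace_ticks: int  # enforcement ticks left before reported usage is acted on


def enforce(state: QuotaState, reported_usage: int, floor: int = 0) -> QuotaState:
    """Run one enforcement tick of the toy quota system."""
    if state.grace_ticks > 0:
        # Still inside the grace period: keep the limit and count down.
        return QuotaState(state.limit, state.grace_ticks - 1)
    # Grace period over: the reported figure is treated as real usage.
    # `floor` is a hypothetical guard that refuses to shrink below a minimum.
    return QuotaState(max(reported_usage, floor), 0)


def simulate(initial_limit: int, grace_ticks: int, reported_usage: int,
             ticks: int, floor: int = 0) -> list[int]:
    """Return the limit after each tick for a constant reported usage."""
    state = QuotaState(initial_limit, grace_ticks)
    history = []
    for _ in range(ticks):
        state = enforce(state, reported_usage, floor)
        history.append(state.limit)
    return history


if __name__ == "__main__":
    # Usage is wrongly reported as 0; the limit holds during grace, then collapses.
    print(simulate(1000, grace_ticks=2, reported_usage=0, ticks=4))
    # Same scenario with the hypothetical floor guard in place.
    print(simulate(1000, grace_ticks=2, reported_usage=0, ticks=4, floor=200))
```

In this sketch the first run holds the limit at 1000 for two ticks and then drops it to 0, while the second run bottoms out at the floor of 200. The floor is only one possible kind of guard and is not a description of the safety checks Google had or plans to add.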
The issue began affecting users at 3:47 a.m. PT, and engineers were alerted minutes later. Workspace apps went down because they rely on the affected infrastructure to ensure that you are signed in, authenticated, and authorized to view content, such as messages and documents.
At 04:08, the root cause and a potential fix were identified, which led to quota enforcement being disabled in one datacenter at 04:22. This quickly improved the situation, and at 04:27 the same mitigation was applied to all datacenters, which returned error rates to normal levels by 04:33.
The company has laid out plans to review, improve, and evaluate its systems to prevent such issues. Google ended its post with an apology:
We would like to apologize for the scope of impact that this incident had on our customers and their businesses. We take any incident that affects the availability and reliability of our customers extremely seriously, particularly incidents that span multiple regions.
The full technical explanation can be found here.