Google explains the cause of the recent YouTube and Gmail outage

Google says the global authentication system outage that affected most of its consumer-facing services on Monday was caused by a bug in the automated quota management system that affected the Google User ID Service.

This worldwide system failure prevented users from logging into their accounts and authenticating to all Cloud services.

As a direct result, users were unable to access Gmail, YouTube, Google Drive, Google Maps, Google Calendar, and several other Google services for nearly an hour on Monday, December 14th.

While the services were down, users were unable to send emails via the Gmail mobile apps or receive email via POP3 in desktop clients, while YouTube visitors saw error messages stating “There was a problem with the server (503) - Tap to retry.”

Outage impact and root cause

“On Monday, December 14, 2020 from 03:46 to 04:33 US/Pacific, credential issuance and account metadata lookups for all Google user accounts failed,” Google explained. “As a result, we could not verify that user requests were authenticated and served 5xx errors on virtually all authenticated traffic.”

“The majority of authenticated services experienced similar control plane impact: elevated error rates across all Google Cloud Platform and Google Workspace APIs and Consoles.”

The root cause of the outage was a reduction in capacity for Google’s central identity management system, caused by a bug in the automated quota management system.

This made it impossible to verify that Google user requests were authenticated, and as a result errors were returned on all authentication attempts.

Image: Error messages during the outage

Global identity management system

Google’s User ID Service, which was at the core of Monday’s outage, stores unique identifiers for all Google accounts and manages authentication credentials for both OAuth tokens and cookies.

It also stores user account data in a distributed database, which uses Paxos protocols to coordinate updates during authentication.

Because the User ID Service rejects requests when it detects outdated data, for security reasons, all user-facing Google services that require Google OAuth access became unavailable once the service began encountering outdated data.
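
The rejection behavior described above can be illustrated with a minimal sketch. The names (`AccountRecord`, `StaleDataError`, `lookup`) and the 30-second freshness bound are hypothetical and do not come from Google's report; the sketch only shows a lookup that refuses to answer from data it cannot confirm is fresh, which a caller would surface as a server error.

```python
from dataclasses import dataclass

# Hypothetical freshness bound; Google's report does not state a value.
MAX_STALENESS_SECONDS = 30.0


class StaleDataError(Exception):
    """Raised when account data is too old to be trusted for authentication."""


@dataclass
class AccountRecord:
    user_id: str
    last_updated_at: float  # time of the last successful update, in seconds


def lookup(record: AccountRecord, now: float) -> str:
    """Return the user ID only if the record was refreshed recently enough."""
    age = now - record.last_updated_at
    if age > MAX_STALENESS_SECONDS:
        # Fail closed: refusing is safer than authenticating against old data.
        raise StaleDataError(f"record for {record.user_id} is {age:.0f}s old")
    return record.user_id
```

Failing closed in this way trades availability for safety: when updates stop arriving, every lookup eventually fails rather than risking a decision based on revoked or changed credentials.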

“Google uses an evolving suite of automation tools to manage the quota of various resources allocated for services,” the company said in an incident summary report published today.

“As part of an ongoing migration of the User ID Service to a new quota system, a change was made in October to register the User ID Service with the new quota system, but parts of the previous quota system were left in place, which incorrectly reported the usage for the User ID Service as 0.

“An existing grace period on enforcing quota restrictions delayed the impact, which eventually expired, triggering automated quota systems to decrease the quota allowed for the User ID Service and triggering this incident.”

Although safety checks were in place to prevent unintended quota changes, they did not cover the scenario in which zero load was reported for a single service.
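
A rough sketch of how such a gap can arise is shown below. It is an illustrative model only: the function names, the 50% limit on a single reduction, and the idea of checking against the current quota are assumptions, not details from Google's report. A check that only compares the new quota with reported usage passes trivially when usage is reported as 0, whereas treating a zero report from a live service as suspect would block the change.

```python
def naive_check(new_quota: int, reported_usage: int) -> bool:
    """Allow the change as long as the new quota is not below reported usage."""
    return new_quota >= reported_usage


def stricter_check(new_quota: int, reported_usage: int, current_quota: int) -> bool:
    """Also treat zero usage and steep cuts as suspect (hypothetical rules)."""
    if reported_usage == 0 and current_quota > 0:
        # A live service reporting no load more likely has broken reporting.
        return False
    if new_quota < current_quota * 0.5:
        # Hypothetical limit: never cut quota by more than half in one step.
        return False
    return new_quota >= reported_usage
```

With a reported usage of 0, `naive_check(10, 0)` approves a cut to almost nothing, while `stricter_check(10, 0, 1000)` rejects it.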

“As a result, the quota for the account database was reduced, which prevented the Paxos leader from writing,” Google said. “Shortly after, the majority of read operations became outdated, which resulted in errors on authentication lookups.”

Google said this massive outage also affected users and tools within the company, causing delays in the outage investigation and in posting status updates.

Gmail hit by a second outage within one day

Gmail suffered a second outage lasting about 7 hours in total after the authentication issues were resolved on Monday, an outage that affected a subset of Gmail users who experienced email delivery issues.

“The error message indicated that the email address did not exist, and as a result, the affected emails were never delivered,” Google said in another report published today. “Affected senders may have received a bounce email generated by an intermediate SMTP service.”

“In some cases, the full SMTP error message was quoted in the bounce email. The behavior of these notifications depended on the external SMTP clients connecting to the Google SMTP service.”

The root cause of this second outage was an ongoing migration to update the underlying configuration system of Gmail’s inbound SMTP service.

“A configuration change during this migration shifted the formatting behavior of a service option so that it incorrectly provided an invalid domain name, instead of the intended ‘gmail.com’ domain name, to the Google SMTP inbound service,” Google said.

As a result, the service incorrectly transformed lookups of certain email addresses ending in “@gmail.com” into lookups of non-existent email addresses.

“When Gmail’s user accounts service checked each of these non-existent email addresses, the service could not find a valid user, resulting in SMTP error code 550.”
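
The failure chain can be sketched in a few lines. Everything in this example is hypothetical: Google's report does not describe the exact transformation, so the sketch simply assumes the configured domain replaces the domain part of the address before the account lookup. `KNOWN_USERS`, `rewrite_address`, and `smtp_status` are made-up names; 250 and 550 are the standard SMTP reply codes for success and for an unavailable mailbox.

```python
KNOWN_USERS = {"alice@gmail.com"}  # hypothetical account directory


def rewrite_address(address: str, configured_domain: str) -> str:
    """Replace the domain part of the address with the configured domain."""
    local_part, _, _ = address.rpartition("@")
    return f"{local_part}@{configured_domain}"


def smtp_status(address: str, configured_domain: str) -> int:
    """Return 250 if the rewritten address maps to a user, otherwise 550."""
    looked_up = rewrite_address(address, configured_domain)
    return 250 if looked_up in KNOWN_USERS else 550
```

With the intended configuration, `smtp_status("alice@gmail.com", "gmail.com")` finds the account. With an invalid configured domain such as `invalid.example`, the same valid address is looked up under a name that does not exist and is rejected with 550.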
