On February 5th, for Safer Internet Day, our team launched its first public-facing system, called Password Checkup. Password Checkup allows users to check, in a privacy-preserving manner, whether their username and password match one of the more than 4B credentials exposed by third-party data breaches of which Google is aware. The launch's success vastly exceeded our wildest expectations, with over 650,000 users installing our Chrome extension in the first three weeks following the release.
Accounts that are exposed via a data breach are 11.6 times more likely to be compromised, in part because many Internet users reuse the same credentials on multiple sites. Password Checkup helps users mitigate this threat through a one-click, install-and-forget Chrome extension that warns them at login time if the username/password used for that site was publicly exposed in a breach (as shown above).
Given the sensitive nature of checking credentials, we designed Password Checkup to ensure Google never learns the username or password checked. As we’ll explain later, this is accomplished with an innovative protocol that combines k-anonymity, private set intersection, and computationally expensive hashing. We decided to release Password Checkup as early as possible, as an experimental Chrome extension, so we could work with the security community to ensure that its protocol strikes the right balance between security, privacy, and performance.
This blog post recounts the history of Password Checkup, from its inception to its launch, by exploring the following topics in turn.
- Origin story: How Password Checkup came into being.
- Design principles: A discussion of the guiding principles behind Password Checkup.
- Checking protocol: Explains how our innovative privacy-preserving protocol works, and some of the alternatives we considered.
- Implementation details: Provides key details on how Password Checkup is implemented, both on the server side and client side.
- Impact: Analyzes how Password Checkup helped improve users’ account security posture based on early post-launch metrics.
- Lessons learned: Reflects on the reasons behind Password Checkup’s success.
One of the ways we keep Google accounts safe is by proactively resetting reused passwords for accounts found in third-party data breaches.
While I can’t remember when Google started doing proactive resets of compromised passwords, I do remember vividly when we announced it publicly: it was in September 2014, when a dump of 5M “gmail credentials” surfaced on a Russian Bitcoin forum. Very soon after this leak, the press and our users started to worry that Google had been breached, which was obviously not the case. To ease our users’ minds, Borbala, Tadek, Mark and I decided to go public and let them know that Google was not breached, and that behind the scenes we were watching for those dumps and taking proactive steps to keep their accounts safe.
Proactive breached password reset is just one layer of Google defenses but an important one, because without it Google accounts exposed in third-party breaches would be 11.6 times more likely to be compromised. Over the last two years alone we were able to proactively re-secure over 110M accounts that had their passwords exposed in a breach.
Given the scale and effectiveness of proactively resetting passwords at Google, it was clear that users would benefit from such protection across the Internet. This is why we started researching how to notify users when any of their credentials are breached, regardless of the site they were used on, so they can reset them.
Designing Password Checkup
Drawing inspiration from the widely successful Safe Browsing malware and phishing API, we started researching how to develop a somewhat similar API to check for compromised passwords back in 2017. Talking to the Safe Browsing team and doing user research quickly led us to understand that if we were to be successful, we needed to create something that had a very clear value and a single purpose, and be privacy preserving. This section discusses the design decisions we made to achieve this objective.
Password Checkup goal
Very early on we converged on Password Checkup’s single purpose: to provide a privacy-preserving protocol for users to learn whether any of their online account credentials are included in a data breach, so they can proactively reset their passwords. We designed our protocol to support a range of potential applications, including detection at login time or through a password manager.
Building a system to achieve this goal was obviously easier said than done, and we ended up going through many iterations, removing nonessential features and design elements. For example, for the initial release, we decided not to show a warning when a weak password is used because it was less actionable and diluted the extension’s usefulness and purpose. We also iterated with Stanford and our internal privacy team to come up with the best privacy-preserving protocol and make the right tradeoffs to ensure the system required zero configuration.
Having zero configuration means that we had to err on the cautious side and offer very strong privacy guarantees by default. While I am sure this was the right decision for our initial release, it has the serious downside that Password Checkup does not work on mobile devices, as it consumes too many resources (network, CPU, RAM). Moving forward there are tradeoffs, discussed in the protocol description section, that can be made if we later decide to bring Password Checkup to low-power devices.
API Design principles
The guiding principle behind our API design was to provably limit what information is learned as much as possible, to ensure users’ peace of mind. This led us to ensure the system will fulfill the following requirements.
Google learns nothing: Google doesn’t learn anything about the credentials tested or who performed the check. This means that the API must be anonymous, that most of the computation happens on the client side, and that the API relies on a privacy-preserving information retrieval protocol. Obviously this also implies that the extension should collect only the minimal amount of anonymous usage statistics needed to evaluate success and ensure the system works correctly.
Hackers learn nothing: Hackers can’t abuse the API to build a database of stolen credentials that they can weaponize. Including this requirement in the protocol design was essential, because having an anonymous API means we have no control over who can query it. This requirement is mostly fulfilled by using a very strong and slow hashing function (Argon2) and striking the right tradeoff between computation speed and security, to ensure that CPU/GPU brute-forcing of the API is prohibitively expensive. (More on this later.)
Later on, to ensure that Password Checkup will be simple, fully automated, and have a clear value proposition, we added the following design principles while refining the system.
Actionable information: Warning fatigue is a real problem, and leads users to ignore important messages. To avoid exhausting our users, we decided that the API will report a match only when the exact credentials are in a data breach.
Keep it focused: As a corollary to the previous design choice, while having a strong password is important, we decided against covering this use-case as it is not as clear-cut as telling users their credentials are breached, and that they are in immediate danger. We felt that covering this use-case (and a related one) would dilute the message.
Being transparent: Good cryptography is open-source cryptography (Kerckhoffs’s principle) as it allows experts to vet protocols and review them for potential flaws. Accordingly, the protocol used by Password Checkup is described in a research paper which is currently under submission. We also designed the protocol from the get-go with the Stanford Applied Cryptography Group (thanks to Dan for all the help!) to ensure we had an independent, external oversight to reduce the risk of designing something flawed.
Release early: The protocol’s security and privacy rely on making the right tradeoffs in terms of what is shared (anonymous dataset size), how much computation is required, and what information is retrieved. There is no perfect answer, and finding the right balance for those tradeoffs requires a conversation with the security community. This is why we decided to go with an early release of an experimental Chrome extension, to get the much-needed feedback.
How the Password Checkup protocol works
Password Checkup’s technical foundation is its innovative protocol that guarantees users that Google will learn nothing about credentials queried by a user. At the same time, the protocol ensures a proof of work to prevent abusive clients from treating Password Checkup as an oracle to brute force passwords. Note that the protocol and its underlying technologies are discussed in greater detail in the paper under submission mentioned above, so we can get the feedback from the academic community on it. If the following section doesn’t answer your questions on how the protocol works, and you want early access to the paper to help us improve Password Checkup, let me know.
Password Checkup combines the following techniques to protect users’ privacy and prevent API brute force attacks.
K-anonymity is used to help prevent user tracking. We adopted it over oblivious transfer to avoid the infeasible computational overhead of querying over 4B records via a remote network. Each request to the API returns a blinded pool of breached credentials to the client, so it is impossible to know on the server side which one (if any) is a match.
Private set intersection is used to ensure the server doesn’t learn which credentials are checked and that hackers can’t use API responses to learn what credentials are in the database. Our private set intersection uses the elliptic curve NID_secp224r1.
Argon2: The slow hashing function Argon2 is used to prevent hackers from brute-forcing the API, and to protect passwords at rest along with encryption. Argon2 was selected because it allows us to independently tune the computational cost required to protect against CPU brute force and the memory cost required to prevent GPU brute force.
Prior to any client lookup, we have to create a database that contains all the known breached credentials. The algorithm we use to do so, illustrated above, works as follows.
Canonicalization: We normalize usernames to dedup the credentials and address the fact that sites use either the full email address or a username. This is done by lowercasing the username and stripping any email provider information. For example, email@example.com becomes email.
Hashing: Each credential (username/password pair) is hashed to a 16-byte digest using the slow hash function Argon2, configured with 256MB of memory and a time cost of 3. Using Argon2 ensures that the database is prohibitively expensive to brute-force, on either GPU or CPU, due to its high computational (~1 sec per hash) and memory costs. When ingesting a new data breach dump, we run a map-reduce on Google infrastructure to be able to do this one-off computation at scale. Hashing a database of 100M credentials requires roughly 1200 days of computation.
Blinding: Each of the hashed credentials is then blinded with a 224-bit secret key b by mapping the hash to the elliptic curve NID_secp224r1 and raising the resulting point to the power b. Blinding is mostly used to ensure requester anonymity and password secrecy via private set intersection, as detailed in the next section. The private set intersection protocol also serves as an additional layer of defense in the event that the Password Checkup database is breached.
Sharding: We use the first two bytes of the (unblinded) hashed credentials as sharding keys, to partition the database into evenly distributed buckets of randomly distributed hashed/blinded credentials. Sharding on the hashed credentials ensures client k-anonymity over all the valid credentials in the universe because, by construction, a cryptographic hash function is not invertible and leaks no statistical information about the data that was hashed. As a result, no one knows or can infer which credentials will be hashed to a given prefix unless they know the credentials and hash them themselves, which is a very slow process thanks to Argon2.
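To make the four steps above more concrete, here is a minimal Python sketch. It is an illustration under loud assumptions, not the production pipeline: `hashlib.scrypt` stands in for Argon2, exponentiation modulo the Mersenne prime 2^521 - 1 stands in for blinding on NID_secp224r1, the fixed salt is made up, and every function name is hypothetical. The toy group is not meant to be secure.

```python
import hashlib
import math
import secrets
from collections import defaultdict

# Toy stand-ins (assumptions): scrypt replaces Argon2, and exponentiation
# modulo the Mersenne prime 2**521 - 1 replaces secp224r1 point blinding.
P = 2**521 - 1
PUBLIC_SALT = b"password-checkup-demo"  # hypothetical fixed salt


def canonicalize(username: str) -> str:
    """Lowercase the username and strip any email provider part."""
    return username.strip().lower().split("@", 1)[0]


def slow_hash(username: str, password: str) -> bytes:
    """Hash a canonicalized credential to a 16-byte digest."""
    name = canonicalize(username).encode()
    # Length-prefix the username so ("ab", "c") and ("a", "bc") differ.
    material = len(name).to_bytes(4, "big") + name + password.encode()
    return hashlib.scrypt(material, salt=PUBLIC_SALT, n=2**12, r=8, p=1, dklen=16)


def new_blinding_key() -> int:
    """Pick a random exponent that is invertible modulo P - 1."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def blind(digest: bytes, key: int) -> int:
    """Map the digest to a non-zero group element and raise it to the key."""
    return pow(int.from_bytes(digest, "big") + 1, key, P)


def build_database(breached, server_key):
    """Return {2-byte prefix of the unblinded hash: set of blinded hashes}."""
    shards = defaultdict(set)
    for username, password in breached:
        digest = slow_hash(username, password)
        shards[digest[:2]].add(blind(digest, server_key))
    return shards


server_key = new_blinding_key()
database = build_database([("Alice@example.com", "hunter2")], server_key)
```

Note that the shard key is taken from the unblinded digest while the stored value is the blinded one, which mirrors the split between sharding and blinding described above.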
All in all, the database structured in the format described above uses ~1TB of storage that is split into 65,536 shards.
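As a quick sanity check on the one-off hashing cost quoted above, the arithmetic can be written out. This is only a back-of-the-envelope sketch: it assumes exactly one second per hash on a single core, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def hashing_days(credentials: int, seconds_per_hash: float = 1.0) -> float:
    """Single-core days needed to slow-hash every credential once."""
    return credentials * seconds_per_hash / SECONDS_PER_DAY


# 100M credentials at ~1 second each
print(f"{hashing_days(100_000_000):.0f} days")  # prints "1157 days"
```

This lands close to the roughly 1200 days mentioned above, and it is why the work has to be spread over many machines.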
Database construction tradeoff
Using the first two bytes of the (unblinded) hash causes a little bit of theoretical information leakage, because it reduces the universe of credentials the user is checking against to an estimated 357k to 476k valid credentials, down from the 23B to 31B unique credentials that are believed to exist. These numbers are derived from previous research that estimates there are 3.9B Internet users, that each user has 6–8 passwords, and that users use the same username everywhere.
In practice, this tradeoff maintains strong user k-anonymity while making the protocol practical by reducing the computation and network traffic to something reasonable, at least for desktops; the client still has to download 1MB for each lookup, which is already too much for a mobile device. This is why increasing the k-anonymity by using a single-byte prefix is not possible in practice, and why we believe that two bytes is the best possible balance between strong privacy and network usage.
At a high level (see the protocol diagram above), the lookup comprises three major steps:
- Client request: In this first step the client repeats the same hashing and blinding steps as we did on the server side to create the database. However, instead of using the server secret key b, the client generates its own blinding key for each request, called a in the algorithm description above. On top of being needed to ensure privacy, requiring the client to use Argon2 to hash the credentials has the added benefit of making it, by construction, very expensive for malicious clients to brute-force the API. This is very important because the API is anonymous, so we have no other effective way to rate limit its usage.
- Server response: Upon receiving the request with the blinded hash and the prefix, the server does two things. First, it creates a double-blinded hash H^ab by blinding the client-blinded hash H^a with its secret key b. Second, it retrieves S’, the set of all blinded credentials that match the prefix sent by the client. The double blinding is used to perform a Diffie-Hellman private set intersection that guarantees the client will learn nothing about the other leaked credentials returned as part of the response, while still being able to check whether their own credentials are part of it. Technically, this is possible thanks to the commutative property of ECDH (Elliptic Curve Diffie-Hellman). The size of the prefix, and therefore of S’, is where the tradeoff between privacy and network/computation cost is made. Returning the entire database at each lookup, which is the perfect-privacy case, is impossible as the database requires 1TB of storage. As discussed in detail earlier, using a prefix of two bytes is what we believe to be the best tradeoff between anonymity and network overhead.
- Client side verdict: Finally, the client is able to determine if the credentials are included in a password breach by completing the private set intersection protocol. This part, which involves unblinding H^ab to get H^b and checking if H^b is in the shard S’, is done on the client side to keep secret which hash is matched, if any.
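The anonymity-set estimate follows directly from the figures quoted above. A short worked calculation, where the function name is hypothetical and the inputs are the 3.9B users and 6 to 8 passwords per user mentioned in the text:

```python
INTERNET_USERS = 3.9e9


def anonymity_set(users: float, passwords_per_user: int, prefix_bytes: int = 2) -> float:
    """Credentials expected to share one hash prefix, assuming uniform hashing."""
    return users * passwords_per_user / 256 ** prefix_bytes


for per_user in (6, 8):
    universe = INTERNET_USERS * per_user
    per_prefix = anonymity_set(INTERNET_USERS, per_user)
    print(f"{per_user} passwords/user: {universe / 1e9:.1f}B credentials, "
          f"~{per_prefix / 1e3:.0f}k per prefix")
# 6 passwords/user: 23.4B credentials, ~357k per prefix
# 8 passwords/user: 31.2B credentials, ~476k per prefix
```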
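To see how sensitive the download size is to the prefix length, here is a rough estimate. It assumes credentials are spread uniformly across shards, so each extra prefix byte shrinks a shard by a factor of 256, and it scales from the 1MB per lookup reported for two bytes. The function name is hypothetical and the results are estimates, not measurements.

```python
TWO_BYTE_DOWNLOAD_MB = 1.0  # per-lookup download reported for a two-byte prefix


def download_mb(prefix_bytes: int) -> float:
    """Estimated per-lookup download, scaled from the two-byte figure."""
    return TWO_BYTE_DOWNLOAD_MB * 256.0 ** (2 - prefix_bytes)


for n in (1, 2, 3):
    print(f"{n}-byte prefix: {256 ** n:,} shards, ~{download_mb(n):g} MB per lookup")
```

Under these assumptions a single-byte prefix would mean downloading about 256MB per lookup, while a three-byte prefix would cut the download to a few kilobytes at the cost of a 256 times smaller anonymity set.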
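The three steps above can be sketched end to end in a few lines of Python. This is a toy model under loud assumptions, not the real protocol code: `hashlib.scrypt` stands in for Argon2, exponentiation modulo the Mersenne prime 2^521 - 1 stands in for blinding on NID_secp224r1 (and is not meant to be secure), usernames are assumed to be already canonicalized, and the `Server` class and function names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy stand-ins (assumptions): scrypt replaces Argon2, and exponentiation
# modulo the Mersenne prime 2**521 - 1 replaces secp224r1 point blinding.
P = 2**521 - 1


def slow_hash(username: str, password: str) -> bytes:
    material = f"{len(username)}:{username}{password}".encode()
    return hashlib.scrypt(material, salt=b"demo-salt", n=2**12, r=8, p=1, dklen=16)


def to_group(digest: bytes) -> int:
    return int.from_bytes(digest, "big") + 1  # non-zero element below P


def new_key() -> int:
    """Random exponent that is invertible modulo P - 1."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


class Server:
    def __init__(self, breached):
        self.b = new_key()  # server secret key b
        self.shards = {}
        for username, password in breached:
            digest = slow_hash(username, password)
            blinded = pow(to_group(digest), self.b, P)  # H^b
            self.shards.setdefault(digest[:2], set()).add(blinded)

    def respond(self, prefix: bytes, h_a: int):
        """Return H^ab and the shard S' matching the prefix."""
        return pow(h_a, self.b, P), self.shards.get(prefix, set())


def check(server: Server, username: str, password: str) -> bool:
    digest = slow_hash(username, password)
    a = new_key()  # fresh client blinding key for every request
    h_ab, shard = server.respond(digest[:2], pow(to_group(digest), a, P))
    # Unblind: (H^ab)^(1/a) = H^b, using the inverse of a modulo P - 1.
    h_b = pow(h_ab, pow(a, -1, P - 1), P)
    return h_b in shard


server = Server([("alice", "hunter2")])
print(check(server, "alice", "hunter2"))  # True
print(check(server, "alice", "correct horse"))  # False
```

The server only ever sees the two-byte prefix and H^a, and the membership test happens on the client, which is the property the protocol description relies on.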
| Operation | Median | | |
|---|---|---|---|
| Argon2 username hash | 0.1s | 0.3s | 0.3s |
| Argon2 credential hash | 4.4s | 9.9s | 12.7s |
| End-to-end API query | 8.6s | 18.5s | 26.2s |
Overall, as reported in the table above, it takes a median time of 8.6 seconds for a client to perform a lookup according to our anonymous telemetry. Half of this time is spent performing the Argon2 hashing, while the rest is mostly spent downloading data. This clearly shows that increasing the size of the shards or computation difficulty would be problematic for performance.
The extension also employs Dropbox’s open-source password strength meter zxcvbn, to measure if people use a strong password when changing their compromised one. So far, password strength is not surfaced to users because, as discussed in the next section, telemetry suggests that people already do the right thing and use a stronger password than before when changing it. It is therefore unclear if providing this information is useful to users, and if it is, how to do it in the best possible way.
All that is left before wrapping up this post is to answer the most important question: did Password Checkup make the world safer?
The answer is yes! Of the 6M credentials checked every week, 85k of them are detected as compromised (1.8%) and users reset about 25% of those. Moreover, 94% of the new passwords are as strong or stronger than the compromised one, as can be seen in the chart above. It shows the strength of the compromised password and the strength of the new password, according to Password Checkup anonymous telemetry that reports the strength scores given by the strength meter embedded in the extension (zxcvbn) upon password change.
Reading through users’ feedback and press articles, it is clear that Password Checkup had such a strong start because it solves an important problem in a simple, automated, and privacy-preserving way. These qualities made it easy for the user and the press to see the value of spending a minute to install Password Checkup.
This quest for simplicity explains why, in my view, the main challenge we overcame was not the technical implementation (even though the crypto is clearly complex) but how to provide the best possible user experience to our users. It took many iterations to get this experience right and we axed many features along the way in order to make the system as simple as we wanted it to be.
In a world cluttered with products that are constantly modified to do more, Password Checkup takes the opposite direction and embraces the philosophy of Saint-Exupéry, the author of “Le Petit Prince”, about perfection: “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” This philosophical stance, supported by very strong technical execution, is what I think is the recipe behind Password Checkup’s success.
Password Checkup is far from perfect and there is a lot to do before it reaches its full potential. Hopefully, one day passwords will be a thing of the past, but until then proactively resetting compromised credentials will remain an important defense that users need.