A decade of Have I Been Pwned

Troy Hunt is such a treasure. And for us web application developers, there is no excuse for not having protection against credential stuffing! While the best defense is likely two-factor [1], checking against Hunt's hashed password database is also very good and requires no extra work for users!

I don't have anything to back this up, but my guess is that the vast majority of compromised user accounts come from credential stuffing/password re-use. It's really surprising to me when I hear that huge companies don't do this check.[2] It's simple, easy, and takes about a day to set up.

If you're a young CTO or early-stage engineer working on a web app and have never been targeted with a credential stuffing attack, let me tell you: It's coming! It's just a matter of time before it's 1AM and your phone blows up; your site is getting hammered; you think it's a DDoS, but then realize most of the hits are on your login page, and then realize with a horrible feeling that some % of those hits are getting through the login page. You'll be up all night dealing with it, and then you have to make breach notifications, and that really sucks.

Troy Hunt's free database will save you that heartache (probably). Just do it.

1. https://cheatsheetseries.owasp.org/cheatsheets/Credential_St...

2. Like 23andMe. https://news.ycombinator.com/item?id=37794379

About a decade back, I was at an event that had an FBI employee presenting. During his presentation, he mentioned a story of a sys admin who had been arrested for taking a hashed PW database in his company, comparing the hashes against known compromised ones (perhaps from haveibeenpwned?), forcing a password reset for everyone who had reused a password that had separately been compromised, and sending an email to each employee explaining this.

One of the employees was apoplectic at the actions of the sys admin and had accused him of violating her privacy by doing this. While I do not recall which party initiated legal action against the sys admin that led to his arrest (i.e. the employee or the company), the bottom line of the story was that the FBI employee (and, by extension, whichever judge was involved in adjudicating the case) considered the act of a sys admin accessing password hashes placed under his care to be a criminal breach of privacy regardless of his intent being to improve his company's security against password stuffing attacks.

Assuming the FBI employee didn't just make the whole thing up (which I have no reason to believe - there are a lot of tech-stupid judges and, especially a decade ago, tech-stupid FBI employees), it might be prudent to pass this by your legal team before checking whether your employees' password hashes are in haveibeenpwned.

I would love to get a proper source on this. Seems a bit crazy, and wouldn't this be thrown out on appeal?

Unfortunately I have no source to give. The FBI employee was just giving an example of illegal behavior he knew of. He didn't cite jurisdiction or the names of people involved. Hell - even if he did, I likely wouldn't have remembered it as this was roughly 8 years ago I was in the audience for this (I know I said roughly a decade ago in my prior post - but I checked a receipt for the event and it was in 2015).

Quite likely Randal Schwartz.

"In July 1995, Schwartz was prosecuted in the case of State of Oregon vs. Randal Schwartz, which dealt with compromised computer security during his time as a system administrator for Intel. In the process of performing penetration testing, he cracked a number of passwords on Intel's systems. Schwartz was originally convicted on three felony counts, with one reduced to a misdemeanor, but on February 1, 2007, his arrest and conviction records were sealed through an official expungement, and he is legally no longer a felon." -- https://en.wikipedia.org/wiki/Randal_L._Schwartz

Important aspect: he had been fired and cracked passwords while no longer an employee, to try to get rehired:

"Rather ill-advisedly, the Perl-programming guru (who's written several books on the subject) tried to prove his worth by running a password cracking package after he'd left in order to produce evidence that security practices had deteriorated since his departure. Instead of re-hiring Schwartz, as he hoped, Intel called in the police and he was charged with hacking offences."

https://www.theregister.com/2007/03/05/intel_hacker_charges_...

Wikipedia is a little light on the details. How much time did he end up serving, and were there any repercussions for the other parties involved?

Really hard to believe without anything else to go by. This sounds like an old wives' tale, like people who add disclaimers saying they aren't lawyers when they comment on the internet because someone once told them they heard someone got in trouble.

Does it sound that unbelievable for the 2010s? There was quite a discrepancy between how the internet/computers were generally being used and the legality.

Like https://www.eff.org/deeplinks/2016/07/ever-use-someone-elses... > Last week, the Ninth Circuit Court of Appeals, in a case called United States v. Nosal, held 2-1 that using someone else’s password, even with their knowledge and permission, is a federal criminal offense.

Also, the courts only just legalized white-hat hacking last year. Before that, violating the terms of service was also potentially a federal crime. https://www.spiceworks.com/it-security/security-general/news...

Jobs can ask if you have ever been arrested outside of CA. (Note: not convicted of a crime).

Also you are going to spend a long time being arrested before the appeal goes out.

"In California, a criminal appeal can take several months to several years. The length of time depends on the complexity of the case and how quickly it moves through the appeals process."

sys admin accessing password hashes placed under his care

Parent commenter never mentioned anything about comparing stored password hashes. What you do is block bad passwords at password set time by hashing the prospective password and comparing with HIBP. A prospective password you haven't accepted or stored or transmitted off the application server - common sense says that's not a privacy violation - and many giant companies including my employer do this routinely.

[Edit] Oh yea I remember HIBP has an online API. Don't use this. Take the HIBP dumps that they make freely available and compare locally. If not for reasons of privacy, for reasons of simplicity and removing an unnecessary external business/legal/software dependency.
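A minimal sketch of the local comparison described above, for anyone who wants to see the shape of it. It assumes the downloaded dump is a text file with one uppercase SHA-1 hex digest and a count per line (`HASH:COUNT`); that layout and the function names are my assumptions, not something stated in this thread.

```python
import hashlib
from typing import Dict, Iterable


def sha1_hex(password: str) -> str:
    """Unsalted SHA-1 of the prospective password, as uppercase hex."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def load_breached(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'HASH:COUNT' lines (assumed dump layout) into a lookup table."""
    breached: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        digest, _, count = line.partition(":")
        breached[digest.upper()] = int(count) if count else 0
    return breached


def breach_count(password: str, breached: Dict[str, int]) -> int:
    """How many times the password appears in the local copy (0 = not found)."""
    return breached.get(sha1_hex(password), 0)


# Tiny stand-in for the real dump; in practice you would stream the file.
sample_dump = [sha1_hex("password") + ":12345", sha1_hex("hunter2") + ":17"]
table = load_breached(sample_dump)
```

At password-set time you would reject the candidate when `breach_count(candidate, table)` is non-zero. The real dump is far too large for an in-memory dict, so a production version would more likely do an indexed lookup in a database or a binary search over a sorted file.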

Ideally, but what if you're a new hire and the passwords already exist?

Be satisfied with fixing the new passwords going forward. Or gracefully force a new password for everyone, if circumstances permit that (circumstances including decision making authority; if you are the new CTO or CISO, and you're paranoid about reviewing the existing hashes, you should strongly consider the batched graceful forced reset!)

You can set a flag on login to use the password in memory rather than stored.

That's how you get the whole company to love you as a new CTO - force everyone to change their password, including people who have a strong non-reused password.

Your job as a CTO isn't to be loved by the entire company.

We’re evaluating different options in this thread. The right move is based on the circumstances and your judgement. I would support a new leader with the courage to close a security hole, maybe respect them even if I don’t love them.

By the way, I don’t feel paranoid to flag bad passwords on login (perhaps triggering an email OTP and forcing a password reset), personally. I responded to this thread because a commenter made an unfounded implication about using HIBP data to reduce vulnerability to credential stuffing.

Oh yea I remember HIBP has an online API. Don't use this.

That's not the greatest advice IMO. The API gets updated data more frequently, doesn't require that you transmit the password or a useable hashed form, and it's dead simple to consume. I'd argue that it's more effort to maintain an internal store and synchronization infrastructure, and you're less likely to accidentally breach anonymity and leak a weak hash by using the API than you are rolling your own query against the raw data.

It's also used by hundreds of bigcorps and government agencies who have way more pedantic lawyers than you're likely to have. If they couldn't find a good reason not to use it I doubt yours will.

Those are good arguments for using an online service. But your conclusion is premature and certainly cannot be made blanket like that in favor of using the API.

Just as many arguments can be made for an offline check. Or against an online check. From added latency via required uptime to added dependencies.

My point being: no. "It depends"

The FBI feeds data into Troy Hunt's database and FBI Director Christopher Wray gave Troy Hunt a medal for his work [1].

The Open Web Application Security Project's Application Security Verification Standard recommends that you do a hashed password check [2].

For bigger companies, sure, go talk to legal, but for young startups, my feeling is it's not worth the $200 or whatever your counsel will charge to say it's ok. I personally did not ask anyone (am cto), I just added the check.

1. https://twitter.com/troyhunt/status/1674132801837477888

2. See OWASP ASVS 4.0 2.1.7 https://github.com/OWASP/ASVS/blob/master/4.0/en/0x11-V2-Aut...

The whole situation did seem pretty exceptional when I heard it and I felt like I was being exposed to an alternate reality where lawyers made security worse for everyone.

That said I struggle to believe the sys admin had competent representation.

They forced a password reset. You can use HIBP data in a way that's less disruptive.

not a crime

Tell it to the judge.

It is worth it. That $200 gives you lots of credibility to stand on if something should arise and you need to prove diligence, which is not at all uncommon in these cases, if legal recourse is ever sought (unlikely if you do it from day 1, I think, but nevertheless).

Well this unlocked a new fear I didn't know I needed to have. I suppose this is the massive drawback to allowing dinosaurs to spearhead policy and govern laws.

For what it's worth, the average tech-smarts in the legal realm and within the FBI are significantly improved compared to 8 years ago. This is just from my personal observation.

That said, there are still tremendous gaps yet to be bridged in the understanding of many prosecutors and lawyers, as well as weird applications of the law that aren't intuitive to people whose life is technology.

For example (and I caveat this with IANAL): Did you know the physical medium you get Internet to your house determines what laws and processes the government can use to monitor your Internet traffic?

the physical medium you get Internet to your house determines what laws and processes the government can use to monitor your Internet traffic?

I did not! Do you have / know a good explanation of the details?

From my (non-lawyer) understanding, if you have a coax cable connected to a cable modem providing Internet to your residence, your privacy is governed by https://www.law.cornell.edu/uscode/text/47/551

Other means of Internet getting to your residence is covered by Title 3 of the ECPA which, historically, Feds have played fast and loose with getting data from.

Did you know the physical medium you get Internet to your house determines what laws and processes the government can use to monitor your Internet traffic?

That I did know, only because I was dumb enough to hitch my wagon to Comcast/Xfinity as a headend tech for years. Just affirmed the idea that all ISPs should be community owned.

Besides legal, I think it's important to realize that there is a very emotional response to discovering that your password is not good.

I know a company that started doing quarterly brute-forcing of passwords as a security check and the reaction to finding out that your password is not strong enough is....all sorts of emotions.

If you have a 10-12 character password that may have been strong at one point but now is not and your IT team is informing you, your reaction is NEVER, oh thank you for helping me out. It's not stupidity, it's human nature to feel attacked.

When a 12 character gets bruteforced, my initial reaction is to blame the system for allowing so many password attempts!

Like imagine how many failed attempts must've happened for a 12 character password to get bruteforced. Alarms should have been raised way before it became an issue.

what if it was a crappy 12 character password like 123456789012 and got bruteforced in 2 tries?

also, at one point it was popular to use l33t speak for passwords so there are many crappy 12+ char l33t passwords floating around that are trivial to guess, no brute forcing required.

As part of fixing security problems 20+ years ago we put together a migration process that included cracking passwords. First off we created an interface for updating your password, and that interface essentially ran through all the tests that the cracking software did, to better ensure you'd picked something good. Passwords were expired every 90 days (remember, this was 2001). The migration first set the expiration date so that people got used to the process and then, on occasion, we'd run the passwords through a brute force attack. To your point, the users were most unhappy when their password would get cracked and expired, but that's life. 2FA, keys, etc. really are an improvement over what we've had for such a long time.

Why are good deeds punished so much by authorities?

This is a good way to disincentivize prosocial behavior.

> Why are good deeds punished so much by authorities?

The problem can be "who defines good deeds?" There are so many things which seem good when presented one way, but can be harmful when viewed another way. Obviously, as presented above this seems like "an obvious good", but context matters, and clearly you don't get the whole context from a one paragraph summary.

Ultimately we have civil structures (government at every level) that try to codify "good" and "bad". Life is seldom that clean though, so inevitably every regulation and law is good for some, bad for others.

So, to answer your question, because "good" and "prosocial" are not universally true.

Accessing password hashes already in a DB is not the same as preventing, during account creation, the reuse of a password known to be compromised.

If I'm not mistaken it's all done using cryptographic schemes that leak neither the password nor the hash.

This is true. The story as written probably didn't happen with HIBP's database. Troy Hunt's database only includes SHA-1 hashes, and passwords in your own database will be hashed with a stronger algorithm (hopefully) and salted (hopefully), so you can't do a simple hash-to-hash comparison. The way to do a HIBP check is, when a user signs in, you hash their password in the way HIBP expects, and check that against either their API or against a local copy of HIBP's database, and if a hit is returned, you give them a nice message and direct them to the password reset flow. There's no easy way to use HIBP's data to identify users with compromised passwords until users actually try to log in.
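Roughly what that sign-in flow could look like, as a sketch only. PBKDF2 stands in for whatever salted, slow hash the site really uses, and `Account`, `make_account`, and `login` are hypothetical names I made up for illustration. The key point is that the HIBP-style SHA-1 is computed from the plaintext the user just typed, never from the stored hash.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Set


@dataclass
class Account:
    salt: bytes
    password_hash: bytes
    must_reset: bool = False


def slow_hash(password: str, salt: bytes) -> bytes:
    # Stand-in for the site's real salted password hash (argon2, bcrypt, ...).
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_account(password: str) -> Account:
    salt = os.urandom(16)
    return Account(salt=salt, password_hash=slow_hash(password, salt))


def login(account: Account, password: str, breached_sha1: Set[str]) -> str:
    """Return 'denied', 'reset_required', or 'ok'."""
    candidate = slow_hash(password, account.salt)
    if not hmac.compare_digest(candidate, account.password_hash):
        return "denied"
    # Only now, with the plaintext in memory, can the HIBP-style check run.
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    if sha1 in breached_sha1:
        account.must_reset = True  # send the user to the password reset flow
        return "reset_required"
    return "ok"
```

Here `breached_sha1` is a stand-in for either a local copy of the hashes or a call out to the API; the flag on the account is what lets you show the "nice message" and route the user to a reset.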

Would it matter which hash function was used to create the password database?

But there's more than just the issue of discovering the password itself.

What about the issue of discovering that a particular password hash comes from an employee at a certain company?

As I understand it, Troy Hunt downloads dumps of stolen passwords. He does not share the dumps. Instead he collects queries, like a search engine. Until people start sending him queries of hashes to check he does not necessarily know the locations of the people whose passwords were stolen.

However if he gets a series of hashes sent from some IP address belonging to a particular corporation, then arguably he now knows these are likely to be passwords belonging to employees at that corporation.

The API doesn't require the full hash, just a short prefix. They don't have enough information for your scenario to work.

https://www.troyhunt.com/understanding-have-i-been-pwneds-us...
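To make the prefix idea concrete, here is a small sketch. It assumes a 5-character hex prefix and a response made of `SUFFIX:COUNT` lines, which is how I understand the range search to work; treat both as assumptions and check the linked post. `fetch_range` is a hypothetical callable you would supply (an HTTP GET in real use), so nothing here touches the network.

```python
import hashlib
from typing import Callable, Tuple


def split_hash(password: str, prefix_len: int = 5) -> Tuple[str, str]:
    """SHA-1 the password and split it into (prefix sent out, suffix kept local)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def count_from_range(password: str, fetch_range: Callable[[str], str]) -> int:
    """Look up a password while only ever handing the prefix to fetch_range."""
    prefix, suffix = split_hash(password)
    for line in fetch_range(prefix).splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fake_fetch_range(prefix: str) -> str:
    # Canned response for demonstration: one filler suffix plus the real one.
    _, real_suffix = split_hash("password")
    return "0" * 35 + ":1\r\n" + real_suffix + ":42\r\n"
```

The service only sees the prefix, which is shared by many unrelated hashes; the comparison against the full suffix happens on your side.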

Hmm. Interesting. Shitty outcome if true, but AD/Azure AD has an extension (3rd party if I recall) that automatically checks for breached passwords and lets the user know and forces them to change their password.

I certainly believe a user was upset by it. We've gotten support tickets before from users accusing us of "snooping on their local machine" to find passwords... Like no, it was just in a breach, relax.

They're often now upset they've been called to task so it's just hard all around.

I'm pretty sure the password manager in Safari also checks this db, as I've been warned that some passwords have been discovered in breaches (even going back to the linked in breach).

I‘m flabbergasted how broken the system is.

It sounds like it was made up, should not be so hard to find the verdict.

Sorry, I don't understand the procedure. If the database contains hashed passwords (I haven't seen or downloaded the database), how can you know you're using the same salt and method as the one in the database?

For example, let's say Tumblr was hacked and with it my password `hunter2`. Tumblr used some naive HMAC-MD5 method with a salt, but my site uses argon2 with (obviously) a different salt. Even though my password is the same (`hunter2`) the resulting hashed passwords will be different. How is this at all effective at preventing credential stuffing?

One can only implement a HIBP check when one has access to the user's unhashed password. So, at login, registration, and password reset.

Yes, exactly, so that's why I was asking, you mentioned the database was of hashed passwords. The database then contains the source passwords? And you're preventing the user from using one of those passwords?

Sorry, I still don't understand the procedure you mentioned and I'm genuinely curious.

Oh, I see the issue. The HIBP database is SHA-1 hashed with no salt. It was created from unhashed passwords. You can't download the unhashed version (you could of course compute it, if you really wanted to; but there's no need).
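A quick illustration of why the salt matters here, as a sketch: PBKDF2 is just a stand-in for argon2 or whatever a real site uses, and the salts are made-up constants. The same password stored with two different salts gives two different hashes, while an unsalted SHA-1 of the same password is identical everywhere, which is what makes a shared lookup list possible.

```python
import hashlib


def salted_site_hash(password: str, salt: bytes) -> bytes:
    # Stand-in for a site's real salted hash; only the salt dependence matters here.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def unsalted_sha1(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


site_a = salted_site_hash("hunter2", b"salt-for-site-a!")
site_b = salted_site_hash("hunter2", b"salt-for-site-b!")

print(site_a == site_b)                                      # False: salts differ
print(unsalted_sha1("hunter2") == unsalted_sha1("hunter2"))  # True: no salt involved
```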

So, the procedure you need to implement is, on login/registration/pw reset, you SHA-1 hash the user's unhashed password and do an indexed lookup on your copy of HIBP's database. Or if you don't want to maintain that copy, you can use HIBP's API to do something similar.

Ah! Thanks a lot, it now makes sense. So at some point HIBP has the unhashed passwords, they obviously don’t make those public, good trick. How do you handle this from a UX perspective? Just tell the user that password is “not strong enough”?

Password managers that have HIBP integration are open about it - one says "this password appears in a list of compromised passwords"

The HIBP database only stores hashes of leaked passwords, but the source material is often (always?) plaintext passwords. If the hash of a password is in the HIBP database, the plaintext password is out there somewhere in a database of a malicious actor.

If the hash of a password is in the HIBP database, the plaintext password is out there somewhere in a database of a malicious actor.

My understanding is that this isn't true. These leaks are often just the password hashes.

There are some leaks where passwords are cracked and included in plaintext and there are some leaks where passwords are not cracked and included only as hashes. If the leak includes cracked passwords in plaintext then they will be added to HIBP and can be checked, otherwise they are not included and cannot be checked.

... and then realize with a horrible feeling that some % of those hits are getting through the login page.

The alternative is the exact same scenario, except that the percentage is several orders of magnitude lower, right?

The small subset of your users that explicitly opted-out of 2-factor authentication (if you allow that) and who try to choose "Password1!" with a second exclamation point when your site said "Error, your password has been seen 83,000 times in password dumps, please use a unique password" will still get hacked.

Or is your expectation that no one will attack every user on your webapp with a credential stuffing attempt if they see that the probability of success is 0.001% instead of 1%?

Wait, a thousand fold decrease is not worth it?

Your numbers literally turn a scenario where 200,000 accounts are hacked into one where 200 are exposed. Or one where 30 hacked accounts turn into 0 hacked accounts.

There is a point where a difference in quantity becomes a difference in quality. I far prefer the latter scenarios.

Anybody (like GP) that doesn’t understand that this is entirely the nature of security work, should not be making any material decisions about security.

The number of times I’ve seen DEVELOPERS neglect to implement materially useful security measures because “they’re not technically perfect!” is astounding.

The number of times I’ve seen purported security practitioners dismiss materially useful security measures because of some theoretical attack that nobody has ever seen in the wild in recorded history outside of stunt-hacking at Defcon is…probably higher

The bad feeling comes from knowing you could have reasonably done something to mitigate the harm. Don't let perfect be the enemy of good.

Remember that "identity theft" is marketing fluff. In a credential stuffing attack your business is the victim of fraud.

Yes, same scenario, but far fewer logins are successful. 3 orders of magnitude sounds right, but I don't know precise numbers. (Can others shed light?) Three orders of magnitude is a lot!

Besides 2FA, rate limiting your login endpoint (both by IP address and username) is a much more robust protection against this attack. Especially if you include temporary bans (e.g. “20 failed login attempts with the same IP, and/or same username, in the past minute = 15 minute ban for that IP and/or username”). A lot of API gateways, K8s ingresses, etc. make this dead simple, and if not it’s also super easy to add with a few lines of code and something like Redis to store counts of recent login attempts.

I do think checking against the HIBP DB is a good call too, but it doesn’t stop this attack overly well, rate limiting is a much better way to stop it.
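A rough in-memory sketch of that kind of limiter, using the numbers from the example (20 failures in a minute, 15 minute ban). The class and method names are hypothetical, and a real deployment would more likely keep the counters in something shared like Redis, as mentioned; this just shows the bookkeeping.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class LoginRateLimiter:
    """Temporary ban after too many failed logins for a key within a time window."""

    def __init__(
        self,
        max_failures: int = 20,
        window_s: float = 60.0,
        ban_s: float = 15 * 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_failures = max_failures
        self._window_s = window_s
        self._ban_s = ban_s
        self._clock = clock
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)
        self._banned_until: Dict[str, float] = {}

    def is_blocked(self, key: str) -> bool:
        until = self._banned_until.get(key)
        return until is not None and until > self._clock()

    def record_failure(self, key: str) -> None:
        now = self._clock()
        recent = self._failures[key]
        recent.append(now)
        # Forget failures that have fallen out of the window.
        while recent and recent[0] <= now - self._window_s:
            recent.popleft()
        if len(recent) >= self._max_failures:
            self._banned_until[key] = now + self._ban_s
            recent.clear()
```

You would call `is_blocked` before checking credentials and `record_failure` after a failed attempt, once with a key for the IP (say `"ip:203.0.113.7"`) and once with a key for the username (say `"user:alice"`), so that either dimension can trip the ban.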

Rate limiting definitely helps against credential stuffing in the form of trying a bunch of common passwords against random accounts.

But there's also "stuffing" with known breached username+password combinations – in which case it still helps, but I don't think as much? In the latter the attack is much more likely to succeed and there's a much smaller number of values being attempted, so the threshold of detection + blocking would have to be much lower...

The threshold is lower but in reality it still makes considerably more login attempts, many of them failed, than a normal client ever would. Credential stuffing attacks don't really limit themselves to a single account, even if it worked.

If you're a young CTO or early-stage engineer working on a web app

If you're working on a greenfield login/auth, please don't accept and store passwords in a database! Set up social OAuth, SSO, or magic link emails and make it someone else's problem.

If you do go down this route though, be sure to read up on what you're deploying, and understand what your libraries are doing (and more importantly, not doing).

You don't want to end up with a naive implementation of OAuth2 (like some big names had recently) which fails to check the audience parameter, and therefore lets any other service using the same SSO gain access to your users' accounts.

Recent HN post on this - https://news.ycombinator.com/item?id=38009291
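For what the audience check amounts to, a minimal sketch. It assumes your library has already verified the token's signature and expiry and handed you the decoded claims as a dict; the function name and the example client IDs are made up. The `aud` claim can be a single string or a list, so both are handled.

```python
from typing import Any, Mapping


def audience_matches(claims: Mapping[str, Any], expected_audience: str) -> bool:
    """True only if the token was issued for *this* application."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected_audience
    if isinstance(aud, (list, tuple)):
        return expected_audience in aud
    # Missing or malformed audience: refuse rather than guess.
    return False


# A token minted for some other app on the same SSO provider must be rejected.
token_for_other_app = {"sub": "user-123", "aud": "other-app-client-id"}
token_for_us = {"sub": "user-123", "aud": ["my-app-client-id", "some-api"]}
```

Many libraries will do this for you if you pass the expected audience in, but it is worth confirming that yours actually does rather than assuming it.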

I agree, and thanks for pointing that out, but between the two security failures, I'd rather have an incorrect OAuth2 implementation, which can be quickly fixed with no impact on existing customers, than credential stuffing, where I need to email customers apologizing for why I needed to reset their passwords.

... and then realize with a horrible feeling that some % of those hits are getting through the login page.

(Non sarcastic), why would you feel bad for users using 1234 as their passwords? Unless your website is aimed at vulnerable people, I consider this to be their responsibility.

As other comments have said these users will probably go the easiest route (1234websitename) to fix the error.

Any restriction you put on your password field reduces entropy, and safety for everyone (even if marginally so).

Because anyone that has ever been responsible for anything knows that there’s a difference between something being your fault and something being your problem.

Breach notification etc legislation in some jurisdictions will also require that you report successful widespread credential stuffing.

Even AWS with their “shared responsibility model” works with GitHub etc to ensure that programmatic access credentials aren’t accidentally exposed via public repositories. This isn’t credential stuffing, but it’s a blindingly accurate demonstration of the fact that drawing a line in the sand and saying “users, work it out from here!” and attempting to wash your hands of the situation is nothing more than the ill-informed pipe dream of someone that’s never had to deal with this stuff in reality.

Have you ever operated an online business? Poor password choice is practically harmful to business. Marginal reduction of entropy by blocking breached passwords, what's the practical harm from that?

1234websitename is objectively better than 1234.

I'll go with NIST on this one (yes, and have a minimum length too):

When processing requests to establish and change memorized secrets, verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised... If the chosen secret is found in the list, the CSP or verifier SHALL advise the subscriber that they need to select a different secret, SHALL provide the reason for rejection, and SHALL require the subscriber to choose a different value.

https://pages.nist.gov/800-63-3/sp800-63b.html#memsecret

I have memories of this site providing me with an excellent experience. Now it's just a cash-grab, asking for $169.50/year just to see 100 breached accounts!

I use unique email addresses (breach canaries) on every website to detect when sites leak my data. When I tried to search for my domain results with a previous domain ownership verification, I got hit with this error: "In order to search a domain with any more than 10 breached accounts on it, you need a sufficiently sized subscription"

To make matters worse, Troy includes public data compilations as 'breaches' which artificially inflates counts for the breached accounts quota. For example, when a compilation of public contact details scraped from GitHub leaked, Troy counted that as a breach. I explicitly listed my email address as public.

I'd be willing to pay $5-12/year. These rates are outrageous for such a low-overhead service.

You’re doing a weird thing (running your own email domain), doing an even weirder thing (using a different address per site), and then doing something weirder still (scanning your personal domain for breaches) and your supposition is that this very specific use case is a cash cow for Troy Hunt? Come on.

doing an even weirder thing (using a different address per site)

It should actually be considered best practice.

It is not.

Not yet. Best practices can change.

Best practices have two attributes, both of which represent consensus among practitioners in the field: that it produces an optimal outcome, and that it can reasonably be broadly adopted.

Email canaries clearly flunk the second test.

What is preventing you from adopting it?

It seems like Apple iCloud Private Email Relay may be Apple's own stealth way to introduce similar functionality, albeit with the useful spam/leakage data only available to Apple in this case.

I said cash-grab, not cash cow. A cash grab is something that has an unreasonably high profit margin. A cash cow is something that provides a significant portion of an entity's income.

I have no idea if HIBP is a cash cow for Troy. It may be, given these prices, but I don't know much about his other sources of income.

HIBP didn't start out as a cash-grab, but it is one now. Troy could have chosen to price it reasonably to cover the costs of the service. This pricing is clearly taking advantage of HIBP's popularity as the de-facto breach list site.

I can't tell whether this is a joke or not.

I would bet you that over 99.99 percent of HIBP's users do not pay for the service. Troy's time has value, so working on a service that provides no income is not really something you can expect a person to do. Troy decided to create an enterprise subscription service to get a bit of revenue from something he's created. It's not cheap, but it's not something you're meant to buy unless you're a company looking to monitor your employee email addresses. This service is pretty cheap in that regard, actually.

I really do not understand why you feel that you're being ripped off here. This is just a lack of product-user fit, his pricing structure simply doesn't work for you because you use email canaries. But for a company with 100 people, this pricing is entirely reasonable, if not something incredibly cheap.

What you're paying for is everyone who doesn't pay for the service, the time he takes to add new breaches to the service and the time he takes to develop the service.

Just because their pricing model doesn't fit you doesn't mean it's a cash grab. Is this too hard to understand?

EDIT: Also, I assumed this was a common understanding, but product pricing is based on the value it gives to the person buying. For a company of 100 people, do you think paying $160/yr is worth breach monitoring? I think for any IT department, this would be a no-brainer.

The cost of this service doesn't scale with number of 'breached accounts per domain'. Ideally, Troy should charge per domain and only choose a modest profit margin.

I do the exact same thing. Every site, service, and contact gets a personal something@mydomain email address to reach me.

Somewhere in the distance, the Count from Sesame Street says "TWO... ah ah ah..."

I do the exact same thing minus the breach scanning. It’s not uncommon.

Encountered the same. My hope is that there's a pricing scheme for people like us; may be worth reaching out.

I’m in the same boat — not a company, just an individual doing the separate-email-per-site thing.

(UPDATE): I’ve posted a suggestion to the UserVoice community, which it appears Troy actively monitors.

If the several (dozens?) of us with this use case upvote it, it may catch his attention.

https://haveibeenpwned.uservoice.com/forums/275398-general/s...

Troy responded: https://haveibeenpwned.uservoice.com/forums/275398-general/s...

Basically he suggested doing a monthly subscription for just one month periodically as a way to reduce the cost.

Another option is to read the notification email, and if it’s for Acme Corp, to remember that the associated email must be acme.com@mydomain.com, and then manually check that.

You can download the DB from the DarkNet and run it locally so you don’t have to pay. The only downside is that you have to manage this DB yourself and update it frequently. But it is similar. I have seen a lot of these (cash-grab) services pop up offering API DB access for a cost.

this is the way. you never want to alert someone to the fact your address is possibly vulnerable in the first place

another example of "everything as a service". Bullshit freemium model where a service starts out good, and then once they start getting usage: put the functionality behind paywall and degrade the free service to the point of being useless. Facebook, as bad as it is, has not succumbed to this: 'free Facebook' is still as functional as it was in 2010, but more tracking obviously.

but more tracking obviously

That's a price of its own.

I do the same thing. I wondered why I hadn't received any breach notifications for a long time; I don't remember seeing a notification of this change.

For those privacy focused, here's how you can remove your information from their public search ability.

https://haveibeenpwned.com/OptOut

Seems slightly useless to opt-out of HIBP when your email is still in Collection #1 or Anti Public Combo List or whatever.

Anyone wanting to do something nefarious with the emails will surely download the full source lists rather than try and scrape the aggregation site.

he's become a very tempting target in himself that gets more tempting with each additional database added

I wonder if he actually deletes the data...

This is a call for service providers in these dumps to move to Passkeys faster, not for the data to be redacted or censored. You want to decay the value of the credentials as rapidly as possible once exposure has been determined. This aligns with NIST guidance around secrets rotation.

Once a breach is determined, all of these passwords should be invalidated immediately and require a password reset if you're so behind you're not offering Passkeys or SSO. Rate limiting will slow credential spraying attacks, but the only way to eliminate them is to use SSO ("Login with") or Passkeys. You are negligent as a provider if you are not invalidating leaked credentials in a timely manner.

https://pages.nist.gov/800-63-FAQ/#q-b05

“Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily (e.g., periodically). However, verifiers SHALL force a change if there is evidence of compromise of the authenticator.”

Users tend to choose weaker memorized secrets when they know that they will have to change them in the near future. When those changes do occur, they often select a secret that is similar to their old memorized secret by applying a set of common transformations such as increasing a number in the password. This practice provides a false sense of security if any of the previous secrets has been compromised since attackers can apply these same common transformations. But if there is evidence that the memorized secret has been compromised, such as by a breach of the verifier’s hashed password database or observed fraudulent activity, subscribers should be required to change their memorized secrets. However, this event-based change should occur rarely, so that they are less motivated to choose a weak secret with the knowledge that it will only be used for a limited period of time.

This is a call for service providers in these dumps to move to Passkeys faster

Does it really matter? I think all of my accounts use 20char autogenerated passwords from google that are unique for each account. So if one is breached, it’s just breached. Seems to have the same protection as a passkey.
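As a rough illustration of what a 20-character autogenerated password amounts to, here is a sketch using Python's standard `secrets` module. The alphabet (letters, digits, ASCII punctuation) is an assumption; the generator in Google's or any other password manager may use a different character set.

```python
import math
import secrets
import string

# Assumed alphabet: 94 printable ASCII characters (no space).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Pick each character independently with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int = 20, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


print(generate_password())
print(round(entropy_bits()))  # roughly 131 bits under the assumed alphabet
```

Under these assumptions each password is independent of every other, so a leak at one site reveals nothing about the password used at another.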

You are the outlier. This is not the norm. Passkeys do this for the broad public, with the keys backed up to ecosystem cloud storage and defended by strong security systems at Apple and Google. Let's not argue passkey sovereignty in this thread; there are efforts ongoing to make them exportable so you can manage them in password managers. I agree it is a valid concern to prevent ecosystems holding users hostage.

Long strings in password managers was a shim until Passkeys got here, because passwords suck. This is a well worn path in enterprise with PKI. Passkeys are PKI for the Average Joe. Folks here will always have esoteric auth use cases, but you design for the average on this topic (consumer auth).

https://passkeys.directory/

https://passkeys.2fa.directory/us/

https://bitwarden.com/blog/a-closer-look-at-password-statist...

19% of respondents said they used “password” as their password (!!!)

52% use easily identifiable information in their passwords, such as company/brand names, well-known song lyrics, pet names, and names of loved ones

Best practices are still diluted by bad habits, with 85% reusing passwords across multiple sites and 58% relying on memory for their passwords

A majority (68%) of respondents manage passwords for 10+ sites or apps and yet 84% of respondents reuse passwords

More than half of respondents forget and reset their passwords on a regular basis

Around a quarter (20%) were affected by breaches and a majority (80%) were prompted to reset their passwords

Over half (56%) are excited about passwordless options, and 50% are using or would use ‘something you are’ forms of passwordless authentication

People using passkeys are the outlier too.

Google (top site for internet traffic) is defaulting to Passkeys: https://blog.google/technology/safety-security/passkeys-defa... (Gmail has over 1.8 billion active users as of 2023; 22.22% of the world's population uses Google's mail service, so this is material for Passkey uptake)

Amazon: https://www.aboutamazon.com/news/retail/amazon-passwordless-...

Uber: https://help.uber.com/riders/article/using-passkeys-to-sign-...

Ebay: https://www.ebay.com/help/account/signing-account/signing-ac...

Github: https://github.blog/2023-09-21-passkeys-are-generally-availa...

Link by Stripe: https://app.link.com/

Docusign: https://www.docusign.com/blog/docusign-customers-can-upgrade...

Tiktok: https://newsroom.tiktok.com/en-us/passkeys-fido-alliance (TikTok has over 1.677 billion users globally, out of which 1.1 billion are its monthly active users)

Google's Titan key now supports Passkeys if you need a secure hardware authenticator: https://www.wired.com/story/google-titan-security-key-passke... | https://store.google.com/us/product/titan_security_key?hl=en...

These stats show which sites support passkeys, not how many users use them.

I like passkeys, they’re nice.

But I think you want to compare passkey users to complex password users.

I know lots of “normies” and they all just accept whatever their iPhone does, which is to create a high-entropy unique password for each site.

As someone who is not at all excited about passkeys, I think they are just moving the average user into an existing enterprise. The enterprise being whatever Big Tech Company you trust the most. Then you gotta pass through one of the "trustworthy" tech companies to access anything, which is simultaneously great and also a huge ask as most of them are data vacuums.

As someone who has to defend against credential spraying in a consumer IAM system at a fintech (which leads to financial and identity fraud), I am very excited about Passkeys. Perspectives will be driven by incentives and desired outcomes. I have the Cloudflare dashboard for our properties live and keep an eye on threat actors in realtime, as well as our identity provider dashboard around realtime Passkey uptake (at which point passwords are invalidated and unable to be downgraded back to). Providing a government credential can be used to bootstrap account recovery if all passkeys are lost.

If you have concerns about Big Tech treating Passkeys in an anti-competitive fashion, I would strongly encourage you to file a complaint with the FTC when that evidence is observed (as I mention in another comment here [1]). We need these primitives to deliver a better digital experience but also need to defend against fuckery using legal and regulatory mechanisms.

[1] https://news.ycombinator.com/item?id=38502886

absolutely, password managers have deprecated have i been pwned. It probably does more harm than good now.

Wouldn’t it be hashed?

You have to solve a Google ReCaptcha though, which privacy-focused folks won’t like.

Also, just FYI, “The controller of the domain your email address is on will still see you in domain searches.”

To merely use the service, you have to solve a seemingly infinite cycle of hcaptchas from cloudflare, this recaptcha is much more decent to users from 3rd world countries!

Do they have a domain-level opt-out?

Not to mention all the other weird variations including haveibeenburned.com, haveigotpwned.com, haveibeenrekt.com and after someone made the suggestion following the revelation that PornHub follows me, haveibeenfucked.com

That is honestly a pretty hilarious side effect of media fame!

haveibeenfucked.com

Years ago we had friends, a couple in which the wife was pregnant. They were actually a bit embarrassed that “everyone will know that we ‘did it’”. A level of squeamishness I could not have imagined!

Haha, yes, for the same reason it used to be rude to ask "when are you expecting?", especially if newly married

"Still practicing" is as good an answer as any

(too late to edit...) P.S. Not in the sense of "when will you have a baby", in the sense of "was the baby conceived before the marriage"

That reminds me of my early twenties being embarrassed when my wife told her parents we are trying to get pregnant.

When people would ask us, “when will you have a second baby?” we started to answer, “oh, we figured out what causes it!” which usually shut down that line of questioning.

They were actually a bit embarrassed that “everyone will know that we ‘did it’”.

Most people would be more embarrassed if the wife is clearly pregnant and nobody thinks they “did it”.

I remember reading about something like that in a book, think they were called Mary & Joseph.

The alternative sounds worse, no?

I've always loved pornhub's blog: https://www.pornhub.com/insights/

It just proves that rules of the internet work. If it exists, there's a porn version of it.

That last one would be an interesting repository of revenge porn, although it's illegal to distribute it in a lot of jurisdictions.

Although, maybe it could be a database of facial recognition hashes from revenge porn, and you can upload a similar hash of your own face to see if you're online somewhere?

he shouldn't have mentioned goatse, or told me not to google it. my curious brain took me to a rabbit hole where several times i wished i didnt have eyes.

what's the learning curve of a GenZer? When boomer says don't look this up on the internet, it's not because they want the information for themselves. It's because they already have something burned into the memories, and are hoping to save you from the same.

It's human nature to be curious and nothing to be ashamed of. And if someone really doesn't want you looking into something, they won't even mention it.

Which is why the story of the Apple is so bizarre.

Doesn't look good does it?

Let me be the one to tell you not to look up tubgirl or meatspin. The turn-of-the millennium internet was a dirty place.

make lemon parties great again

too true and for some reason there was a rule like the worse the content the faster it would download/torrent/kazaa/limewire/etc lol

When you know how old someone is by the fact they don't know any of those things.

You get kind of immune to it eventually. At this point I don't even get fazed by goatse, if anything I'm more impressed by just how far that guy managed to stretch.

I've never seen it, but I've now inferred enough to know I don't want to

...and THAT'S how easy it is to scam someone.

Oh you sweet summer child.

In the past decade, I wonder how many stalker victims have discovered HaveIBeenPwned as the easy directing tool that their abuser used to discover and invade their accounts and privacy.

Yes, yes, I know, the site maintains that it's the victim's responsibility, prior to any bad actor taking advantage of the service, to sign up and then disable their information from showing up in the search.

Because the shock and awe of other users seeing 'Your info is out there!' immediately after entering their address, instead of after some kind of email verification, is more important than user safety.

E-mail verification would make the service much more costly to run, maybe not sustainable.

If you’ve got someone’s email address, do you really need HIBP to figure out where it’s used? Wouldn’t Google give you a lot of results already?

Yes, you enter an email and you get a list of the hacks it's been found in, along with the type of data that's in the data breach.

At that point you just go grab the data breach from some other location and pull the data. Including passwords, which will often be reused across other services and give you access to other accounts. The breach data (or subsequent accounts accessed from the passwords) may have a lot of additional data, like location, personal preferences, message contents, etc.

It's _much_ more impactful than simply Googling someone's email address.

“Just go grab the data breach from some other location”

Do you have examples you can point to where this has happened? I would be interested in cases where this kind of approach has been taken, as my intuition is that Troy’s service wouldn’t make this significantly easier or cheaper.

It would definitely make this significantly easier and cheaper.

Imagine you want data on a specific individual. You can locate and obtain ALL available data breaches, then search through each one individually looking for references to your target. OR, you can enter their email into a search and get back a list of which SPECIFIC data breaches have information on your target AND the types of information included in the breach. This narrows the scope of effort by orders of magnitude.

How does that not make it significantly easier and cheaper?

Can you elaborate on how HIBP could be used for abuse? It just checks whether your email is in a leak.

It points anyone who has an email address to specific leaks that have that email address. These leaks contain additional data beyond the email address, like passwords, location data, message contents, etc.

This enables bad actors to quickly and easily figure out which leaks they need to go look at to find further data related to the email address they searched on.

It'd be a great service if it hadn't spent the last decade deliberately ignoring a fundamental principle of security: Assume there will be bad actors.

It takes a bit of cognitive dissonance to leave the problem unaddressed for so long. E.g., if you assume there are no bad actors, there's no need for the service to exist. And if you assume bad actors DO exist, the service needs to protect its core functionality better against being an engine for abuse.

I agree. It's kind of worse because it gives attackers 90% of the information they need, but it doesn't tell the victims what data was leaked. I want to see the hash so I can figure out which password was compromised.

It is worth noting the user awareness impact HaveIBeenPwned has had.

On the other hand, I feel like SpyCloud does not get enough credit for having a dataset 30x bigger and working directly with companies to actually mitigate credential reuse. If you've ever been prompted at login to a major website or received an email asking you to reset a password because it was used in multiple places, there is a good chance SpyCloud is behind it.

On the other hand, I feel like SpyCloud does not get enough credit for having a dataset 30x bigger and working directly with companies to actually mitigate credential reuse

Not sure if it is your intent, but this implies that HIBP does not work directly with companies to mitigate credential re-use. It does. Some examples would be their partnership with 1Password and other password managers, Firefox, their partnership with the FBI, UK & Australian governments, etc.

It would be like comparing HN to Facebook: they are both technically operating social media sites, but I don't think anyone would argue they are doing so at the same scale. HIBP ultimately is a hobby project that was really good at bringing the public's attention to the problem and boosting Troy's social profile.

How many governments, password managers, web browsers, and law enforcement agencies does one need to be partnered with to transcend "hobby project" territory?

Hrm, a free service that helps millions of people vs a company that requires a subscription for any use. I don't think you're comparing apples to apples, and the latter is not useful at all for most people.

I'd bet that SpyCloud has fewer customers than the number of people who have integrated Troy's free pwned password data to actually mitigate password reuse.

Blackmail scammers have been using pwned password databases to craft some pretty convincing phishing emails ("I have installed RAT on your system and have been watching you through your webcam, proof I hacked you: <old password from pwned passwords> -- send $1800 of BTC to this address and don't go to the police. Maybe use a password manager next time."). Do people get caught in these scams?

I assume most get blocked by spam filters. I've only noticed them when they get past SPF/DKIM filtering and I have to train more. They seem pretty clever.

I appreciate the service and enjoy Troy Hunt's posts. HIBP is great.

A family member called me in distress asking if the email was legit. It had his full postal address and the last 4 digits of a credit card (not an old password). People do believe it.

Caught, I can't say. I have heard of users becoming alarmed and running to IT for reassurance.

I got one or two of these emails, and couldn't for the life of me guess on what sites I had used the (pretty weak) passwords.

I mean, I once got an email with my password in plain text and it was pretty disturbing. I did a quick search and realized that that password wasn’t used in anything I cared about so, I just went about my day, still disturbed but knowing that it’s an old password.

I can’t imagine how it would feel if I didn’t use a password manager and couldn’t quickly see where that password was used, instead of racking my brain trying to remember!

Anybody been in more than 20 data breaches?

All my stuff's been locked down with a password manager/2fa so I'm not worried, but having been on the internet for ages it's pretty funny at this point.

My childhood-turned-throwaway email has been involved in over 30 breaches according to the site. A lot of them are forum related.

One of my accounts is apparently in 32 breaches

I use a separate email address with every site, so I can be confident that each of the below is a wholly independent breach. I've had email addresses (and sometimes other data including name, password or password hash, etc.) breached from:

    Adobe.com
    Bit.ly
    Bytargentina.com
    Cafepress.com
    Chegg.com
    Contentful.com (via Apollo)
    Dailymotion.com
    Disqus.com
    Dropbox.com
    Edmodo.com
    Facebook.com (via Zynga)
    Gawker.com
    Invisionapp.com (via Apollo)
    Kickstarter.com 
    Last.fm
    Linkedin.com
    Linux-mag.com (via QuinStreet)
    LiveAuctioneers.com
    Monster.com (via Apollo)
    Myfitnesspal.com
    Parkmobile.com
    Streeteasy.com
    Teespring.com
    Ticketfly.com
    Ticketmaster.com (via Ticketfly)
    Tumblr.com
    Xbmc.com (via Kodi)

I really like his informative posts. I remember reading about how he used k-anonymity to check passwords against the pwned file without having to transmit the passwords and it led to me studying that and later using it for some professional projects.

I sometimes think what I would have done had I never read his posts about checking without transmitting real PII.

The k-anonymity is such a clever trick, I remember being impressed by the simplicity and efficacy of it back when I read about it also.

I'd also like to call out the person who Troy says suggested it to him [1], Junade Ali, who goes into more detail in his own post about it [2].

Not because Junade would have invented it (apparently that was Pierangela Samarati, Latanya Sweeney and Tore Dalenius. [3]) but because his blog post on it is a really great explainer of it using concepts software developers are familiar with.

[1] https://www.troyhunt.com/ive-just-launched-pwned-passwords-v... [2] https://blog.cloudflare.com/validating-leaked-passwords-with... [3] https://en.wikipedia.org/wiki/K-anonymity
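For readers who haven't seen the trick, here is a minimal sketch of the client side of a k-anonymity range lookup, as I understand it from the posts above: hash the password with SHA-1, send only the first five hex characters, and compare the returned suffixes locally. The function names are hypothetical, and `fetch_range` is a caller-supplied function (for example an HTTP GET against the Pwned Passwords range endpoint), so nothing here touches the network.

```python
import hashlib
from typing import Callable


def split_sha1(password: str, prefix_len: int = 5) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase hex SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def count_in_range_response(suffix: str, response_text: str) -> int:
    """Find `suffix` in a body of 'SUFFIX:COUNT' lines; 0 if absent."""
    for line in response_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str, fetch_range: Callable[[str], str]) -> int:
    """Only the 5-character prefix is passed to `fetch_range`;
    the password and its full hash never leave this function."""
    prefix, suffix = split_sha1(password)
    return count_in_range_response(suffix, fetch_range(prefix))


# Usage with a stand-in for the remote service (assumed response format).
prefix, suffix = split_sha1("password")
fake_body = f"0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n{suffix}:42\r\n"
print(prefix)                                        # 5BAA6
print(pwned_count("password", lambda p: fake_body))  # 42
```

The server only learns that the client is interested in one of the many hashes sharing that prefix, which is where the anonymity comes from.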

Reminds me I really need to do something about those *checks notes* 99 compromised accounts in my password manager.

Has he written much about the specifics of the divorce? I assume it was about 50% split of assets and figuring out who owns what that made it lengthy and expensive?

I get on my jet ski and I do whatever the fuck I want

Is Troy Hunt a Mobius variant?

And this isn't complete and utter bullshit?

My e-mail address (after requiring JS and cookies) was allegedly found in a collection of avatars ...

I'm sure it's now in a collection I'd rather not have it in.

No matter what anyone says, I am not going to type my password into a site called “Pwned Passwords” lol