The fundamental issue is that links without any form of access control are presumed private, simply because there is no public index of the available identifiers.
Just last month, a story about discovering AWS account IDs via buckets[0] did quite well on HN. The consensus in the comments was that if you are relying on your account identifier being private as a form of security by obscurity, you are doing it wrong. The same concept applies here. This isn't a novel security issue; it's just another method of dorking.
The problem is links leak.
In theory a 256-hex-character link (so 1024 bits) is near-infinitely more secure than a 32-character username and 32-character password. To guess
https://site.com/[256chars]
there are 2^1024 combinations. You'd never brute-force it.
vs
https://site.com/[32chars] with a password of [32chars]
where there are 2^256 combinations. Again you can't brute-force it, but it's more likely than the 2^1024 combinations.
Imagine it's
https://site.com/[32chars][32chars] instead.
But while guessing the former is harder than the latter, URLs leak a lot, far more than passwords.
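For concreteness, a minimal sketch of generating such a high-entropy capability URL with Python's CSPRNG (the domain and path layout are placeholders):

```python
# Generate a 1024-bit capability token: 128 random bytes -> 256 hex chars.
import secrets

token = secrets.token_hex(128)
url = f"https://site.com/{token}"
print(len(token))  # 256
```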
Dorking is the technique of using public search engine indexes to uncover information that is presumed to be private. It has been used to uncover webcams, credit card numbers, confidential documents, and even spies.
The problem is the website administrators who are encoding authentication tokens into URL state, not the naive crawlers that find them.
I wonder if there would be a way to tag such URLs in a machine-recognizable, but not text-searchable way. (E.g. take every fifth byte in the URL from after the authority part, and have those bytes be a particular form of hash of the remaining bytes.) Meaning that crawlers and tools in TFA would have a standardized way to recognize when a URL is meant to be private, and thus could filter them out from public searches. Of course, being recognizable in that way may add new risks.
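A purely hypothetical sketch of that every-fifth-byte idea (nothing like this is standardized; the positions and hash choice are arbitrary illustrative decisions):

```python
# Hypothetical "private URL" tagging: every fifth character of the path
# token is a marker drawn from a hash of the remaining characters, so a
# cooperating crawler could detect the tag and skip indexing the URL.
import hashlib

def tag_private_token(token: str) -> str:
    digest = hashlib.sha256(token.encode()).hexdigest()
    out, d = [], 0
    for i, ch in enumerate(token):
        out.append(ch)
        if (i + 1) % 4 == 0:                     # after every 4 real chars...
            out.append(digest[d % len(digest)])  # ...emit one marker char
            d += 1
    return "".join(out)

def looks_private(tagged: str) -> bool:
    # Strip the marker characters (every fifth), recompute, compare.
    token = "".join(ch for i, ch in enumerate(tagged) if (i + 1) % 5 != 0)
    markers = [ch for i, ch in enumerate(tagged) if (i + 1) % 5 == 0]
    digest = hashlib.sha256(token.encode()).hexdigest()
    return bool(markers) and all(
        m == digest[i % len(digest)] for i, m in enumerate(markers)
    )
```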
We already have a solution to this. It's called not including authentication information within URLs.
Even if search engines knew to honor it, would every insecure place a user puts a link know it? Bad actors with their own indexes certainly wouldn't care.
How do you implement password-reset links otherwise? I mean, those should be short-lived, but still.
You could send the user a code that they must copy and paste into the page, rather than sending them a link.
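A minimal sketch of that flow, assuming an 8-digit code and a 15-minute expiry (both arbitrary; a real system would also rate-limit redemption attempts):

```python
# Issue a short one-time code to email instead of a link; store only
# its hash plus an expiry, never the code itself.
import hashlib
import secrets
import time

def issue_reset_code() -> tuple[str, tuple[str, float]]:
    code = f"{secrets.randbelow(10**8):08d}"              # emailed to the user
    record = (hashlib.sha256(code.encode()).hexdigest(),  # stored server-side
              time.time() + 15 * 60)                      # expiry timestamp
    return code, record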
Hopefully using POST, not GET. GET links get logged by the HTTP server most of the time. Just another great way to store your 'security credential' in plain text. Logs get zipped and archived. Good luck with any security measure.
I mean of course the idea was to put it in a form that is sent using POST, but even then, it's a single-use reset code so once it shows in the log it's worthless.
This makes a large assumption about application logic that is often incorrect.
t. security auditor/researcher.
It certainly does. Security usually comes at the cost of convenience and can incur confusion.
In this example, where best practice may be to use one-time tokens, you will end up with users who click on the secure link again (from their email) in the future to access the secure site, and they'll be frustrated when they have to go through the secure-link generation dance again.
Of course you can mitigate this with sessions/cookies, but that is also a security compromise, and not device-portable.
It's easy to say that these are minor UX concerns, but enforcing a high level of security may have a significant user cost depending on your demographic. I have a demographic that skews older and non-technical, and they are pretty loud when they complain about this stuff… meanwhile they are also more likely to reuse passwords and forward emails with secure links in them!
Some people will always find something to complain about. I feel like it’s completely reasonable to give a “sorry this link was only valid for 5 minutes and is now expired, request a new code here” message. State it in the email that originally contained the link and state it again on the page when they click it afterwards. This is incredibly common practice and very unlikely to be the first time someone has seen this workflow. If they want to complain further, direct them to a password manager and remind them there’s probably one built into their browser already
No one reads this stuff. I’m not saying this to be argumentative. I have a large user base and I know from experience.
As you said, short lived codes. And the codes don’t contain any PII. So even if the link does get indexed, it’s meaningless and useless.
A short-lived link that's locked down to their user agent/IP would work as well.
Actually, there are cases where this is more or less unavoidable.
For example, if you want a WebSocket server that is accessible from a browser, you need authentication, and you can't rely on cookies, the only option is to encode the auth information in the URL (since browsers don't allow custom headers on the initial HTTP request that negotiates a WebSocket).
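A minimal server-side sketch of that pattern, assuming the `websockets` library's classic handler signature where the request path is passed in (newer versions expose it on the connection object instead); the token store is a stand-in:

```python
# Authenticate a browser WebSocket via a token in the URL query string,
# e.g. ws://localhost:8765/?token=s3cr3t-token
import asyncio
import urllib.parse

import websockets

VALID_TOKENS = {"s3cr3t-token"}  # hypothetical token store

async def handler(websocket, path):
    query = urllib.parse.urlparse(path).query
    token = urllib.parse.parse_qs(query).get("token", [None])[0]
    if token not in VALID_TOKENS:
        await websocket.close(code=4401, reason="unauthorized")
        return
    async for message in websocket:
        await websocket.send(f"echo: {message}")

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```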
Authentication: identify yourself.
Authorization: are you allowed to use this service?
Access control/tokenization: how long can this service be used?
I swipe my badge on the card reader. The lock unlocks.
Should we leave a handy door stopper or 2x4 there, so you can just leave it propped open? Or should we have tokens that expire in a reasonable time frame... say a block of ice (in our door metaphor), so it disappears at some point in the future? Nonce tokens have been a well-understood pattern for a long time...
It's not that these things are unavoidable; it's that security isn't a first principle, or is hard to embed, due to issues of design.
And that are single-use.
(Your password reset "magic link" should expire quickly, but needs a long enough window to allow for slow mail transport. But once it's used the first time, it should be revoked so it cannot be used again even inside that timeout window.)
Put a timestamp in the token and sign it with a private key, so that the token expires after a defined time period.
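A minimal sketch of that, using an HMAC (symmetric) instead of an asymmetric private key for brevity; the key and the five-minute TTL are illustrative:

```python
# Signed, expiring token: payload carries the expiry timestamp, and the
# HMAC prevents tampering with it.
import base64
import binascii
import hashlib
import hmac
import time

SECRET_KEY = b"server-side-secret"  # hypothetical; never placed in a URL
TTL_SECONDS = 300

def make_token(user_id: str) -> str:
    expires = int(time.time()) + TTL_SECONDS
    payload = f"{user_id}.{expires}".encode()
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig

def check_token(token: str) -> bool:
    try:
        encoded, sig = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except (ValueError, binascii.Error):
        return False
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    _, expires = payload.decode().rsplit(".", 1)
    return int(expires) > time.time()
```

Verification is stateless; the single-use revocation mentioned above would still need a server-side record of redeemed tokens.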
If the URL is only valid for the next five minutes, the odds that the URL will leak and be exploited in that five-minute window are very low.
Also, it would allow bad actors to just opt out of malware scans - the main vector whereby these insecure URLs were leaked.
So there was an interesting vector a while back, abused by spammers: some email firewalls would reliably click on any link sent to them.
Spammers would sign up for services that required a click on a confirmation link, using blabla@domainusingsuchservice.
The firewall's phishing-check bots would reliably click on the link, rendering the account creation valid.
One particularly exploitable vendor for getting such links clicked was one that shares the name with a predatory fish that also has a song about it :)
SharkGate?
Why coy about naming them?
Barracuda. And for plausible deniability so they don’t have as much of a chance of catching a libel suit. Not sure how necessary or effective that is, but I do understand the motivation.
We already have robots.txt in theory.
I didn’t think robots.txt would be applicable to URLs being copied around, but actually it might be, good point. Though again, collecting that robots.txt information could make it easier to search for such URLs.
Yeah - that's just red-flagging "interesting" urls to people running greyhat and blackhat crawlers.
It can be OK to put authentication tokens in urls, but those tokens need to (at a bare minimum) have short expirations.
When would this ever be necessary? URL session tokens have been a bad idea ever since they first appeared.
The only things even near to auth tokens I can reasonably see stuffed into a URL are password reset and email confirmation tokens sent to email for one time short expiration use.
Outside of that, I don't see any reason for it.
"presigned" URLs[1] are a pretty standard and recommended way of providing users access to upload/download content to Amazon S3 buckets without needing other forms of authentication like IAM credential pair, or STS token, etc
Web Applications do utilize this pattern very frequently
But as noted i previous comment these do have short expiry times (configurable) so that there is no permanent or long-term risk on the lines of the OP article
[1]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-...
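For reference, generating one of these with boto3 looks roughly like this (bucket, key, and expiry are placeholders):

```python
# Generate a short-lived presigned GET URL for an S3 object.
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "reports/example.pdf"},
    ExpiresIn=300,  # seconds; a short window limits damage if the URL leaks
)
print(url)
```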
You are right about short expiry times, but another catch is that if pre-signed URLs are leaked in an automated fashion, these scanning services also keep the downloaded content around. I found various examples where the links no longer work, but PDFs downloaded from pre-signed URLs were still stored by scanning services.
From https://urlscan.io/blog/2022/07/11/urlscan-pro-product-updat...
Indeed, the only valid operation with the magic URL is exchanging the URL-based token with something else (your PDF, a session token, etc.) and then expiring the URL, so by the time the scanner gets around to it the original URL is invalid.
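A minimal sketch of that exchange-and-expire pattern (in-memory store for brevity; a real deployment needs an atomic delete-and-return in a shared store):

```python
# Single-use URL tokens: redeeming swaps the token for the real payload
# (a session token, a download, etc.) and invalidates it in one step.
import secrets

_pending: dict[str, str] = {}  # url_token -> payload

def issue_url_token(payload: str) -> str:
    token = secrets.token_urlsafe(32)
    _pending[token] = payload
    return token

def redeem(token: str) -> str | None:
    # pop() removes and returns in one step (atomic under CPython's GIL),
    # so a scanner visiting later finds nothing left to exchange.
    return _pending.pop(token, None)
```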
That seems ripe for race condition class problems.
Interesting. I haven't built on s3, and if I did my first instinct would probably have been to gate things through a website.
Thanks for sharing your knowledge in that area.
They're useful for images when you can't use cookies and want the client to easily be able to embed them.
"public search engine indexes"
Then it should be the search engine at fault.
Leaving your house unlocked is one thing.
A company going around trying everyone's doors, then posting a sign in the yard saying "this house is unlocked", has to count for something.
A plain URL is an open door not a closed one. Most websites are public and expected to be public.
Isn't that the point of the post?
There are URLs that are out there as-if public, but really should be private.
And some people argue they should be treated as private, even if it is just a plain URL and public.
You can't blame the search engine for indexing plain URLs. Listing a closed-but-unlocked door is a bad analogy.
Well, you also can't charge Joe Blow with a crime for browsing URLs that happen to be private but were accidentally made public.
Just by looking, you are guilty. That is wrong.
You've been appropriately downvoted for a terrible take.
Imagine if you left your house unlocked it would be broken into seconds later. Even worse, the people that broke into it live in a different country with no extradition law and you'd never figure out who they are anyway.
In this case your insurance company would tell you to lock your damned doors, and the police may even charge you under public nuisance laws.
Yeah, it is a terrible take. It's a bad situation.
Just like charging people with a crime for accessing private material, simply by browsing a public URL.
Maybe a better take:
It is like someone being charged with breaking and entering simply for looking at a house from the street when the door was left open. You're guilty simply by looking and seeing inside. But you were just walking by; you saw inside before realizing it was a crime, and now you're guilty.
If you are going to charge people with a crime for accessing private sites, potentially by accident, simply because a search engine handed them a public URL, then shouldn't the search engine have some culpability?
Or. Better. Change the law so the onus is on the site to protect itself.
That isn't an inherent problem with having a secret in the url. The problem is the url was leaked somewhere where it could get indexed.
And sometimes it isn't practical to require a POST request or a cookie.
And the risk of a url leaking can be greatly mitigated if the url is only valid for a short period of time.
Technically you're right -- after all, sending the authentication as a separate header doesn't make any difference; it sends the same data over the wire. However, software treats URLs differently from headers. They sit in browser histories and server logs, get parsed by MITM firewalls, mined by browser extensions, etc.
using https://user:pass@site.com/endpoint or https://auth:token@site.com/endpoint
Would be better than
https://site.com/endpoint/user/pass or https://site.com/endpoint/?auth=token
As the former is less likely to be stored, either on the client or on the server. I don't do front-end (or backend) authentication -- I just rely on X.509 client certs or OIDC, and the web server passes along the validated username.
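To illustrate the "less likely to be stored" point: tooling can recognize and strip userinfo-style credentials generically, whereas path- or query-embedded secrets look like ordinary URL data. A sketch of that redaction (the function name is mine):

```python
# Redact userinfo from a URL before it hits a log line. This kind of
# special-casing only works because the credentials sit in a well-known
# structural position, not in the path or query.
from urllib.parse import urlsplit, urlunsplit

def redact(url: str) -> str:
    parts = urlsplit(url)
    if parts.username or parts.password:
        host = parts.hostname + (f":{parts.port}" if parts.port else "")
        parts = parts._replace(netloc=f"***:***@{host}")
    return urlunsplit(parts)

print(redact("https://auth:token@site.com/endpoint"))
# -> https://***:***@site.com/endpoint
```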
For better or worse, basic auth in the URL isn't really an option any more (e.g. see https://stackoverflow.com/a/57193064). I think the issue was that it reveals the secret to anyone who can see the URL bar, but the alternative we got still has that problem, and also the problem that the secret is no longer separable from the resource identifier.
You won't find a specific link, but at some point, if you generate millions of URLs, the 1024 bits will start to return values pretty quickly through brute force.
Any one link won't be found quickly, but a bunch of links will. You just need to fetch all possibilities and you'll get data.
1024 bits seems a bit too much for the birthday problem to be a thing.
I looked at [1] to do the calculation but (2^1024)! is a number too large for any of my tools. If someone has a math shortcut to test this idea properly...
[1] https://en.wikipedia.org/wiki/Birthday_problem#Calculating_t...
This isn't the birthday problem. That would be the chance of two random links overlapping. The birthday problem scales with n^2, while trying to guess links scales with m * n, number of guesses multiplied by number of links.
(Well, before you apply the logistic taper to it. So you wanted an approximation? There you go: until the chance of a hit gets quite high, it's basically equal to guesses × valid links / 2^1024.)
The chance is less than guessing a random 128 bit username and random 128 bit password. And then guessing a completely different username and password on the very next go.
You'd get far more return on investment breaking bitcoin wallets.
2^1024 is about 10^308.
Let's say there are 12 billion links per person, and 8 billion people. That's 100 billion billion, or about 10^20 links.
10^20 / 10^308 is effectively zero.
Let's say you can test 10 trillion links a second and started when the Big Bang happened; you'd have tested about 10^30 links so far.
The number of links you'd have found so far is zero.
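A back-of-the-envelope check of that arithmetic, working in log space to avoid overflow (all figures from the comment above; ~4.35e17 seconds since the Big Bang):

```python
from math import log10

log_space = 1024 * log10(2)       # ~308.25, i.e. 2^1024 ~ 10^308
links = 8e9 * 12e9                # ~1e20 real links
guesses = 1e13 * 4.35e17          # 10 trillion/sec for the age of the universe
log_expected_hits = log10(guesses) + log10(links) - log_space
print(log_expected_hits)          # ~ -257.6 -> expected hits ~ 10^-258, i.e. zero
```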
Stirling’s approximation?
2^1024 ≈ 10^308. There are only ≈10^80 atoms in the whole known universe. And we haven't even done the factorial.
I'm not sure your math checks out. With 1024 bits of entropy and, say, 1 trillion valid links, your chances of any one link being valid are 1/2^984
So test a million links: your probability of finding a real one is 1 − (1 − 1/2^984)^1,000,000, around a 1 in 10^290 chance of hitting a valid URL with a million tries. Even if you avoid ever checking the same URL twice, it will still take you an impractical amount of time.
All this is fine and dandy until your link shows up in a log at /logs.
The same can almost as easily happen with user-submitted passwords.
Passwords usually don't show up in server logs if submitted correctly.
We call 128-bit random data "universally" unique IDs. 1024 bits won't ever get close to returning any random hits.
Not even close. 1024 bits is a really, really big address space.
For the sake of argument and round numbers, let's say that there are 4.2 billion (2^32) valid URLs. That means that one out of every 2^992 randomly generated URLs is valid. Even if you guessed billions of URLs every second, the expected time to come up with a valid one (~2^960 seconds) is still many orders of magnitude greater than the age of the universe (~2^59 seconds).
Passwords are always private. Links are only sometimes private.
Well-chosen passwords stored properly are always private. Passwords also tend to have much longer lifetimes than links.
Yup. There’s a reason putting credentials into url parameters is considered dangerous.
You can easily rate-limit an authentication attempt, to make brute-forcing account access practically impossible, even for relatively insecure passwords.
How would you do that for the URLs? Five requests to site.com/[256chars] which all 404 block your IP because you don't have a real link? I guess the security relies on the fact that only a very small percentage of the total possible links would ever be used, though the likelihood of randomly guessing a link is the same as the percentage of addressable links in use.
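For the password side, a minimal sketch of fixed-window rate limiting (in-memory and keyed per account; the window size and attempt limit are arbitrary, and a real deployment would use a shared store like Redis):

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_ATTEMPTS = 5

_attempts: dict[str, list[float]] = defaultdict(list)

def allow_attempt(account: str) -> bool:
    now = time.time()
    # Drop attempts that have aged out of the window.
    recent = [t for t in _attempts[account] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_ATTEMPTS:
        _attempts[account] = recent
        return False
    recent.append(now)
    _attempts[account] = recent
    return True
```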
I don't think you realize how astronomically large the space of possible 256-character strings is. In fact it doesn't need to be anywhere near 256 characters; 64 hexadecimal characters would suffice.
Which alphabet did you take as a basis to reach 2^256 combinations?
Binary?
No. In theory they are both totally insecure.
Did you do that just to upset me?
There's probably details I'm missing, but I think the fundamental issue is that "private" messages between people are presumed private, but actually the platforms we use to send messages do read those messages and access links in them. (I mean messages in a very broad sense, including emails, DMs, pasted links in docs, etc.)
URL scanners are not scanning links contained within platforms that require access control. They haven't guessed your password, and to my knowledge no communications platform is feeding all links behind authentication into one of these public URL scanning databases. As the article acknowledged in the beginning, these links are either exposed as the result of deliberate user action, or misconfigured extensions (that, I might add, are suffering from this exact same misconception).
If the actual websites were configured not to use the URL as authentication state, all of this would be avoided.
The suggestion (in both the article and the parent) is that the platforms themselves are submitting URLs. For example, if I send a link in Discord[0] DM, it might show the recipient a message like “warning: this link is malicious”. How does it know that? It submitted the url to one of these services without your explicit consent.
[0] Discord is a hypothetical example. I don’t know if they have this feature. But an increasing number of platforms do.
Where in the article does it suggest this? The two bullet points at the very top of TFA are what I cited to discredit this notion; I read it again and still haven't found anything suggesting the communication platforms are submitting these themselves.
Falcon Sandbox is explicitly mentioned - which is a middleware that can be installed on various communication platforms (usually enterprise): https://www.crowdstrike.com/products/threat-intelligence/fal...
Microsoft has "safe links": https://learn.microsoft.com/en-us/microsoft-365/security/off... - Chrome has its own thing, but there are also tons of additional hand-rolled similar features.
My main annoyance is when they kill a one-time use URL.
Do you know if safe links is guilty of the issue in the OP?
I suspect not because Microsoft is using their own internal system.
However, it likely exposes the content internally to Microsoft.
They do 100% break Salesforce password reset links, which is a major PITA.
I thought I read it in the article but I may have unconsciously extrapolated from and/or misread this part:
“I came across this wonderful analysis by Positive Security[0] who focused on urlscan.io and used canary tokens to detect potential automated sources (security tools scanning emails for potentially malicious [links])”
I don’t see any mention of messaging platforms generally. It only mentions email and does not suggest who might be operating the tooling (vendor or end users). So I seem to have miscredited that idea.
[0] https://positive.security/blog/urlscan-data-leaks
The article says "Misconfigured scanners". Many, many enterprise communication tools have such a scanner, and if your IT team is using the free plan of whatever url scan tool they signed up for, it's a good bet that these links may end up being public.
Bit of a tangent, but I was recently advised by a consultant that pushing private Nix closures to a publicly-accessible S3 bucket was fine since each NAR file has a giant hash in the name. I didn't feel comfortable with it so we ended up going a different route, but I've continued to think about that since how different is it really to have the "secret" be in the URL vs in a token you submit as part of the request for the URL?
And I think for me it comes down to the fact that the tokens can be issued on a per-customer basis, and access logs can be monitored to watch for suspicious behaviour and revoke accordingly.
Also, as others have mentioned, there's just a different mindset around how much it matters that the list of names of files be kept a secret. On the scale of things Amazon might randomly screw up, accidentally listing the filenames sitting in your public bucket sounds pretty low on the priority list since 99% of their users wouldn't care.
I wrote about putting secrets in URLs a few years ago: https://neilmadden.blog/2019/01/16/can-you-ever-safely-inclu...
Question: in the Waterken-Key flow with the token in the URL fragment, the URL looks like https://www.example.com/APP/#mhbqcmmva5ja3 – but in the diagram it's hitting example.com/API/#mhbqcmmva5ja3. Is this a typo, or are we mapping APP to API with the proxy, so the user thinks they are going to the APP with their key? Or does the browser do this for us automatically when it sees the app in the URL and then stores the key in window.location.hash? I am confused and might just find the answer on Google, but since you appear to be the author, maybe you can answer the question here.
Oops, that’s a typo.
I'm not sure I grok this. Do you mean, for example, sending a token in the POST body, or as a cookie / other header?
One disadvantage to having a secret in the URL, versus in a header or body, is that it can appear in web service logs, unless you use a URI fragment. Even then, the URL is visible to the user, and will live in their history and URL bar - from which they may copy and paste it elsewhere.
In this case it's package archives, so they're never accessed from a browser, only from the Nix daemon for binary substitution [1]: https://nixos.wiki/wiki/Binary_Cache
Extremely different. The former depends on the existence of a contract about URL privacy (not to mention third parties actually adhering to it) when no such contract exists. Any design for an auth/auth mechanism that depends on private links is inherently broken. The very phrase "private link" is an oxymoron.
<https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypert...>
Is there a difference between a private link containing a password and a link taking you to a site where you input the password? Bitwarden Send gives a link that you can hand out to others. It has # followed by a long random string. I'd like to know if there are security issues, because I use it regularly. At least with the link, I can kill it, and I can automatically have it die after a few days. Passwords generally don't work that way.
There is a difference in that people intuitively know that entering passwords gives access. Also, it may be different legally as the user could reasonably be expected to know that they are not supposed to access something.
This is a valid argument. However, I'd say that there are two standard practices with links that are a big advantage: giving them a short life, and generating extremely hard to guess URLs. I was a Lastpass customer before their security problems came out. I had many passwords that I made years ago but don't use the service any longer. I moved more into the URL camp at that time. Who knows how many passwords I made 15 or 20 years ago that today are no longer secure.
If there’s a live redirect at least there’s the option to revoke the access if the otherwise public link is leaked. I think that’s what sites like DocuSign do with their public links. You can always regenerate it and have it resent to the intended recipients email, but it expires after some fixed period of time to prevent it from being public forever.
Yes, the difference is in what all our tools and infrastructure presume to be more or less sensitive.
Sending a GET request for the password-input screen and POSTing the password will get very different treatment than sending the same number of "authorization bits" in the URL: in the first case, your browser won't store the secret in the history, the webserver and reverse proxy won't include it in their logs, various tools won't consider it appropriate to cache, etc.
Our software infrastructure is built on an assumption that URLs aren't really sensitive, not like form content, and so they get far more sloppy treatment in many places.
If the secret URL is short-lived, or preferably single-use (as with many password reset links), then that's not an issue; but if you want to keep something secret long-term, then putting it in a URL means it's very likely to end up in various places which don't really try to keep things secret.
Worked for a company which ran into an S3 bucket naming collision when working with a client - turns out that both sides decided hyphenated-company-name was a good S3 bucket name (my company lost that race obviously).
One of those little informative pieces: every time I do AWS now, all the bucket names are named <project>-<deterministic hash from a seed value>.
If it's really meant to be private then you encrypt the project-name too and provide a script to list buckets with "friendly" names.
There's always a weird tradeoff with hosted services where the technically perfect thing (totally random identifiers) is mostly an operational burden compared to the imperfect thing (descriptive names).
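A sketch of that hash-based naming scheme (the seed and truncation length are arbitrary; the idea is that outsiders can't probe for bucket names while internal tooling can still derive them deterministically):

```python
# Deterministic but non-guessable bucket names: project name plus a
# truncated keyed hash. SEED is a hypothetical org-private value.
import hashlib

SEED = b"org-private-seed"

def bucket_name(project: str) -> str:
    digest = hashlib.sha256(SEED + project.encode()).hexdigest()[:16]
    return f"{project}-{digest}"

print(bucket_name("hyphenated-company-name"))
```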
What would encrypting the project name accomplish? Typically if you’re trying to secure a S3 bucket you’ll do that via bucket settings. Many years ago you had to jump through hoops to get things private, but these days there’s a big easy button to make a bucket inaccessible publicly.
The point is that in some cases the name of the project might itself be considered sensitive, so preventing people from discovering bucket names by trying to create them helps. It doesn't completely lock you out of associating the bucket back to its internal name, and it allows the names to be deterministic internally – i.e. someone spinning up a test environment still gets everything named appropriately, deterministically, and uniquely.
legend.