"Just use the normal session mechanism that comes with your web framework and that you were using before someone told you that Google uses jwt. It has stood the test of time and is probably fine."
You don't need to be Facebook or Google to have more than one service in your infrastructure that needs to authenticate a user's existing session without forcing the user to log in again. Sharing the session across multiple services is its own distributed systems problem, with numerous security implications to be aware of, and bearer tokens might be a good alternative.
If all you have is a single monolithic web app that is the identity provider, makes all authentication decisions, etc., then yes, you probably don't need JWTs. There is a huge gap between that and being Google/Facebook.
Apart from that, Google and Facebook don't even use JWTs between the browser and backends after the initial login; last time I checked, they actually have some sort of distributed session concept.
Thank you. This middle ground between hyperscaler infrastructure and super simple web apps is where most of my career has been spent, yet the recent trend is to pretend like there are only two possible extremes: You’re either Facebook or you’re not doing anything complicated.
It has an unfortunate second order effect of convincing people that as soon as they encounter something more complicated than a simple web app, they need to adopt everything the hyperscalers do to solve it.
I wish we could spend more time discussing the middle ground rather than pretending it’s some sort of war between super simple or ultra complex.
I still don't think people in the middle need JWTs.
If we're talking about a web session, time-limited randomly-generated session tokens that are stored in a DB still work fine. If you really need it, put a caching layer (memcached or redis or valkey or whatever) in front of it. Yes, then you've created cache invalidation problems for yourself, but it's still less annoying than JWT.
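A minimal sketch of that setup in Python (the SQL is Postgres-flavored, `db` stands in for whatever DB handle your framework gives you, and redis-py fronts the table; memcached or valkey would look the same):

```python
import secrets

import redis  # redis-py; memcached or valkey would work the same way

cache = redis.Redis()

def create_session(db, user_id, ttl_seconds=3600):
    # Random, time-limited token; the DB row is the source of truth.
    token = secrets.token_urlsafe(32)
    db.execute(
        "INSERT INTO sessions (token, user_id, expires_at) "
        "VALUES (%s, %s, now() + %s * interval '1 second')",
        (token, user_id, ttl_seconds),
    )
    return token

def lookup_session(db, token):
    # Cache first; fall back to the DB on a miss.
    user_id = cache.get(f"session:{token}")
    if user_id is not None:
        return int(user_id)
    row = db.execute(
        "SELECT user_id FROM sessions WHERE token = %s AND expires_at > now()",
        (token,),
    ).fetchone()
    if row is None:
        return None
    cache.set(f"session:{token}", row[0], ex=60)  # short TTL bounds staleness
    return row[0]

def destroy_session(db, token):
    # Logout: delete from the DB *and* the cache -- the invalidation
    # problem mentioned above, in one line.
    db.execute("DELETE FROM sessions WHERE token = %s", (token,))
    cache.delete(f"session:{token}")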
If we're talking about authenticating API requests, long-lived randomly-generated auth tokens stored in a database work fine, generally. (But allow your users to create more than one, and make rotation and revocation easy. Depending on your application, allowing your users to scope the tokens can also be a good thing to do.) Again, put a caching layer in front of your database once you get to the scale where you need it. You probably won't need it for a while if you're sending your reads to read-only replicas.
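Roughly the same idea for API tokens, sketched in Python (illustrative schema and names; hashing the token at rest is an extra precaution, not something the comment above prescribes):

```python
import hashlib
import secrets

def issue_api_token(db, user_id, scopes):
    # Long-lived token; store only a hash so a DB leak doesn't leak tokens.
    token = secrets.token_urlsafe(32)
    db.execute(
        "INSERT INTO api_tokens (user_id, token_hash, scopes, revoked) "
        "VALUES (%s, %s, %s, false)",
        (user_id, hashlib.sha256(token.encode()).hexdigest(), scopes),
    )
    return token  # shown once; users can hold several, each individually revocable

def check_api_token(db, token, required_scope):
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    row = db.execute(
        "SELECT user_id, scopes FROM api_tokens "
        "WHERE token_hash = %s AND NOT revoked",
        (token_hash,),
    ).fetchone()
    if row is None or required_scope not in row[1]:
        return None  # unknown, revoked, or under-scoped
    return row[0]
```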
(Source: worked at Twilio for 10 years; we definitely eventually ran into scaling problems around our user/auth DB, and our initial one-auth-token-is-all-you-need setup was terrible for users, but these problems were fixed over time. Twilio does use JWTs for some things, but IMO that was unnecessary, and they created more headaches than they solved.)
I'm not saying no one ever needs JWTs, but I think they're needed in far fewer circumstances than most people think, even people who agree that JWTs should be looked upon with some skepticism. If you need to be able to log people out or invalidate sessions or disable accounts, then JWTs are going to create problems that are annoying to solve.
(One possibly-interesting solution for JWT-using systems that I haven't tried anywhere is to do the reverse: don't cache your user/auth database, but have a distributed cache of JWTs that have been revoked. The nice thing about JWTs is that they expire, so you can sweep your cache and drop tokens that have expired every night or whenever. Not sure how well this would work in practice, but maybe it's effective. One big problem is that now your caching layer needs to be fail-closed, whereas in a system where you're caching your user/auth DB, a caching layer failure can fall back to the user/auth DB... though that may melt it, of course. I also feel like it's easier to write logic bugs around "if this record is not found, allow" rather than "if this record is not found, deny".)
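The fail-closed part is the crux, and it's small enough to show; a sketch in Python with redis-py, assuming the revoked tokens are keyed by jti:

```python
import redis

revocations = redis.Redis()  # the distributed revocation cache

def token_acceptable(jwt_id):
    # "Not found" is the success path here, so an unreachable cache
    # must deny (fail closed). Contrast with caching the auth DB,
    # where a cache failure can fall back to the DB instead.
    try:
        return revocations.exists(f"revoked:{jwt_id}") == 0
    except redis.RedisError:
        return False
```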
Honestly, do you even need support for revoke? If you have a token whose lifetime can be measured in 2-3 minutes, I don't think the abuse potential is huge, especially when some other security measures are in place.
Thing is, the token refresh service can be stateless, but adding a revoke service basically kills the JWT's main advantage, since every time we check a token's validity, we need a query to see if it's been revoked.
Revocation is needed because you want to disable an intruder's access the very second you detect unauthorized use of a stolen token. Same for certain kinds of banned users who must lose access immediately.
But since such a revocation list is going to be short (usually 0 entries, dozens at worst), it's trivial to replicate across all the auth service nodes (which can as well be worker nodes) or keep it in Redis replicated per DC, with sub-millisecond lookup times.
Things get harder if you want a feature like logging out other sessions, or just an explicit logout on a shared computer (think about business settings: stores, pharmacies, post offices), you may have to have larger revocation lists. This may still not be a problem: a million tokens is a few dozen megabytes, again, a per-DC replicated Redis cluster would handle it trivially and very cheaply.
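If the list really is that short, you don't even need Redis on the hot path; a sketch of per-node replication in Python (the auth-service URL and its JSON list-of-jti response are assumptions for illustration):

```python
import threading
import time

import requests  # hypothetical: pull the list from your auth service

REVOKED = frozenset()

def refresh_revocations(url="https://auth.internal/revocations", every=5.0):
    # The list is tiny (usually empty, dozens at worst), so every node
    # can hold the whole thing in memory and refresh it on an interval.
    global REVOKED
    while True:
        try:
            REVOKED = frozenset(requests.get(url, timeout=2).json())
        except requests.RequestException:
            pass  # keep serving from the last good copy
        time.sleep(every)

threading.Thread(target=refresh_revocations, daemon=True).start()

def is_revoked(jti):
    return jti in REVOKED  # in-process lookup, no network hop
```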
I still feel like the need for revocation kills the simplicity of JWT and thus the reason for its existence.
I'm of a more nuanced opinion on this - say you operate a movie streaming service and control access to movies via JWT. It's not a problem if an attacker has access for two more minutes than intended.
If you are talking to a single client, I think checking the remote IP address and encoding it in the token might work to see if the token is not stolen, but don't quote me on that.
It's a complicated problem. I don't see why it should have a simple solution.
I get that this has conceptual appeal, but I doubt this makes any difference in real life. Unless you have some very sophisticated infrastructure, it takes many minutes to discover the issue and then many more minutes even to decide what to do about it. A few extra minutes to cut off access is probably not going to make a big difference one way or another.
All you really need for a revocation service are two fields - user id and inb ("issued not before") - plus a bloom filter.
To revoke a token:
1. Issue a new token to the revoker at the current time (if business rules require the revoker to stay logged in).
2. Set the user's inb to the current time minus 1 second, with a TTL of 1.5× the longest token lifetime.
3. Add the user id to the bloom filter.
4. Upload the bloom filter to S3; every service downloads it every 5 minutes.
5. Then, on each request, check the bloom filter. If the user id is in the bloom filter, check with the revocation service and reject the token if the user's inb is later than the token's issued-at time.
This is probably less than five hundred lines of code and pretty easy to maintain.
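For the flavor of it, a Python sketch of steps 3-5; `Bloom` is a toy stand-in for a real library, and `revocation_service.get_inb` is a hypothetical lookup against the service described above:

```python
import hashlib

class Bloom:
    # Minimal bloom filter; a real deployment would use a library and
    # ship the serialized bit array to S3 for workers to poll (step 4).
    def __init__(self, size_bits=1 << 20, hashes=5):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

def token_valid(bloom, revocation_service, user_id, iat):
    # Step 5: bloom filters have no false negatives, so a miss means
    # "definitely never revoked" and we skip the network call entirely.
    if user_id not in bloom:
        return True
    inb = revocation_service.get_inb(user_id)  # hypothetical rare slow path
    return inb is None or iat >= inb
```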
Often-overlooked middle ground that vastly simplifies your revocation logic: just have a single "not-issued-before" timestamp assigned to each user account. Instead of revoking a single token, you have "log out from all devices" logic - i.e. you revoke all tokens at once based on their "iat" claim (issued at). No need for revocation lists altogether; you just make sure any token's "iat" is never before the "not-issued-before" associated with the user. Sure, this is not as perfect UX as being able to revoke individual tokens, but token revocation in general is something only a fraction of your users is ever going to need.
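In code this is about five lines; a sketch with PyJWT, where `get_not_issued_before` is a hypothetical per-user lookup:

```python
import jwt  # PyJWT

def verify(token, public_key, get_not_issued_before):
    claims = jwt.decode(token, public_key, algorithms=["RS256"])
    # "Log out from all devices" just bumps the user's cutoff to now();
    # anything minted before it dies, with no revocation list at all.
    cutoff = get_not_issued_before(claims["sub"])  # hypothetical per-user lookup
    if cutoff is not None and claims["iat"] < cutoff:
        raise jwt.InvalidTokenError("token issued before the user's cutoff")
    return claims
```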
Yeah, this works very well. A nice "log me out of everywhere, including this device" link is often all you need on the settings page.
It also makes e2e testing very easy since you should be logged out after pushing that button.
How about invalidating the user's refresh token and the signing key, which forces everyone to refresh and then logs out the hacked account? If it's really serious, lock the account before doing this so the user can't log in again.
But yeah, if you have a revoke service, might as well just use session keys.
No you don't; that's why even big players like Amazon with their AWS Cognito service (OAuth/OpenID Connect) don't support revoking access tokens (only refresh tokens).
This works fine for a single service, but you're replying to a thread about the middle ground of multiple services. It's an anti-pattern to have every service talk to the same database just to authenticate every request.
By the time you add a caching layer, you're truly better off using an off-the-shelf OIDC identity provider and validating the ID token claims.
Bullshit.
Do you want to expand on that? Because having a single point of failure certainly seems like a horrible practice when that single point goes down.
You're already talking to stateful systems to do anything meaningful. An in-memory cache on top of session retrieval is so trivial and adds so few microseconds that it's imperceptible even at large volumes of traffic.
If you’re having trouble with that, you’ve got bigger issues. Any regular work queries will take longer, and so it’s not even a meaningful area of concern if you broke down a request from end to end on a flame graph.
Yeah, so? They don't have to be talking to the same system, and in fact that is literally what you called bullshit on originally.
That does absolutely nothing to changing the fact that a SPoF is still an anti-pattern that should be avoided.
For that matter…
Also does absolutely nothing to change that fact. You have done nothing to actually elaborate on why it's totally not a horrendous idea to have everything communicate with the same database. Just because there's a caching layer does not mean that fresh data would still be available if the SPoF goes down, which, once again, is the whole point here.
In my experience for medium sized services it’s still better to have everything talk to the same authentication database.
Postgres has insanely good read performance. Most companies and services are never going to reach the scale where any of this matters, and developer time is usually the more precious resource.
My advice is always, don’t get your dev team bogged down supporting all this complicated JWT stuff (token revocation, blacklisting, refresh, etc) when you are not Facebook scale / don’t have concrete data showing your service really truly needs it.
+1
For a mostly-read flow like authentication, a centralized database can scale really well. You don't even need Postgres for that.
If you have mutable state, JWT can't help you anyway.
JWTs start to make sense only when you are doing other hyperscaler stuff and can reuse parts of that architecture.
Funny, people used systems like JWT in the late 1990s. Back then you couldn’t really trust the session mechanism in your language because inevitably these had bugs and would toss your cookies for “no reason at all”.
I was inspired by https://philip.greenspun.com/panda/ circa 2001 to develop a complete user management framework based on that kind of cookie which had the advantage over other systems that the “authentication module” it took to get authentication working in a new language was maybe 40-100 lines of code. Software like PHPNuke that combined second or third rate implementations of apps all in the same codebase was the dominant paradigm then, the idea that you could pick “best of breed” applications no matter what language you were using was radical and different.
I used the framework for 10+ projects, some of which got 350,000+ active users. As an open source project it was a complete wash. Nobody got interested in user management frameworks (as opposed to writing your own buggy, insecure and hard-to-use auth system in a hurry) until around 2011 or so, when frameworks based on external services all of a sudden popped up like mushrooms. Seemed like the feature I was missing was "needs to depend on an external service that will get shut down when the vendor gets acquired".
Alternatively, just don't worry about token revocation and all that complicated stuff? So you have a window of 5 minutes (or whatever your access token expiry is) that you can't revoke - is that a big deal?
A simple JWT implementation isn't that complicated, but you have to accept some limitations.
"have a distributed cache of JWTs that have been revoked. The nice thing about JWTs is that they expire, so you can sweep your cache and drop tokens that have expired every night or whenever."
Every cache has TTL, so you just set the TTL of the entry to the expiration date of the token you are caching. No need for nightly cleanups.
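With redis-py that's one line per revocation (sketch; the key naming is illustrative):

```python
import time

import redis

r = redis.Redis()

def revoke(jti, token_exp):
    # The cache entry lives exactly as long as the token it blocks,
    # then evaporates on its own -- no nightly sweep needed.
    r.set(f"revoked:{jti}", 1, ex=max(1, token_exp - int(time.time())))
```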
I'm not sure "cache" was the right word in the parent post: you don't want to use a cache (at least one with LRU/bounded size) to store revocations without a backing store, or else a revocation could get pushed out of the cache and become ineffective. The backing store (likely a DB) would require such cleanups once the revocation record is no longer relevant.
I would challenge your assumption. Unless you absolutely need 100% durable, consistent revocations for some reason, something like memcached is perfect here, as the worst case in a failure is a slight, temporary degradation in security without any visible user impact or operations nightmare (i.e. restoring backups). This assumes that your token lifetime is reasonably short (at least for access tokens); refresh tokens are a different story, but they only need to be tracked at the authn service, not globally.
If the revocation use case is soft, then totally fair. But if the application is potentially dangerous and the user says "Sign out all devices", I think that should be a deterministically successful operation. Similarly, if there is a compromised account in an organization, I'd like to be confident that revoking all credentials was successful.
Revocation of tokens can be done for a simple logout operation, in which case the stakes are low, but more often it is the "pull the fire alarm and get that user out", and in that case it should be reliable.
Potentially you take both at once: use something like DynamoDB as the storage layer that also supports TTL natively.
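A boto3 sketch of that, assuming a `revoked_tokens` table whose TTL attribute is configured as `expires_at` (DynamoDB's TTL deletion is best-effort, so validation should still check the token's own exp):

```python
import boto3

# Assumes a table whose DynamoDB TTL attribute is configured as "expires_at".
table = boto3.resource("dynamodb").Table("revoked_tokens")

def revoke(jti, token_exp):
    # Durable store *and* automatic cleanup: DynamoDB deletes the item
    # itself once expires_at passes.
    table.put_item(Item={"jti": jti, "expires_at": token_exp})

def is_revoked(jti):
    return "Item" in table.get_item(Key={"jti": jti})
```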
My understanding was that this is the ENTIRE benefit of JWTs (over plain session tokens): they allow you to go from an allowlist to a blocklist, which is more efficient at really large scales because you only have to store revoked sessions (until their time limit expires) rather than every session.
And if you're not doing this then there's no point in using JWTs (which will be the case for most people).
Are there any other benefits I'm missing?
I don't know of any companies that even do this. As far as I know, most use cases store nothing, except, of course, for the client storing the response.
Why not just stick your auth token in the cache? It's supposed to expire anyways.
Back in the day we used memcached for our primary store for all sorts of ephemeral things. Including user sessions.
Items are evicted from caches all the time for reasons other than expiry. Memcached, in particular, has "slabs" (spaces for objects of a certain size), and once those slabs are full, items are evicted to make room for new items.
As you point out, in most use cases a random token will be fine and it all comes down to how and where it is stored.
But that also means you can have JWTs that are used as "random tokens" for most of your app (the cost to produce them isn't high) and only make use of the additional capabilities when you need them, for instance:
- when you want to check signatures (e.g. reject requests before they hit your application layer)
- when you want to store non-sensitive base64 data that you need before restoring the session
Creating and handling JWTs is only as costly and complicated as you want it to be, so there's IMHO enough flexibility to have light use with very few penalties for it.
"If we're talking about a web session, time-limited randomly-generated session tokens that are stored in a DB still work fine. If you really need it, put a caching layer (memcached or redis or valkey or whatever) in front of it. Yes, then you've created cache invalidation problems for yourself, but it's still less annoying than JWT."
You just (somewhat handwavingly) described what Google and Facebook are doing. You might not need to build this globally highly available distributed session store, JWTs might be an ok solution for your use case too (because you are not Google or Facebook) - or not. It depends on what your requirements are. AuthN across services is somewhat complex in any case, I don't think there is an easy way around it without making tradeoffs somewhere. JWTs are a great tool to consider here.
You don't need JWTs to pass internal permissions. We don't, but we still extract claims from a JWT at the beginning of a user flow. Then later we only use the claims to determine which resources a user has access to.
It's not necessarily easier than just passing the JWT, but with our internal setup - where traffic on your behalf is secure once you first pass through the authorisation system - there isn't really a reason to decode your token multiple times rather than simply passing your access permission claims.
We do still pass your JWT between isolated "products", where your access request doesn't pass through Dapr but instead goes back through our central access gateway and then into the other "product". A product is basically a collection of related services restricted to one business component - like a range of services which handles our solar plants, another business component which handles our investment portfolios, and so on.
It's shocking how often this advice isn't followed. We often see it with non-tech companies who nonetheless deliver services over the internet.
How is this better than JWT if we have 30 microservices called from front-end?
100% this. I am tired of "you don't need microservices, you don't need JWT, you don't need Kubernetes, you don't need ElasticSearch, you don't need IAM, you don't need Redis, you don't need Mongo, and everything should stay in one SQL database."
Things are not being used merely because they exist, because people want to be fancy, or because they don't have something better to do. Things are being used because they solve problems, and do so with the least effort possible.
"Things are being used because they solve problems and do so with least effort possible" in an ideal world sure in the real world there are many factors that influence technical decisions often having nothing to do with actual problem being solved
Having worked at many places over the last 30 years, yes, there is definitely "resume-driven development" where people pick something they want to put on their resume to solve a problem regardless of its suitability to the task in hand.
There's also "blinker-driven development" where people pick the solution based on their own personal set of hammers rather than, again, something more suitable.
(There's loads of these though - e.g. "optimisation-driven development" where the solution MUST GO BRRRRR even if the problem could be fixed by Frank typing in "Yes" once a week. "GOF-driven development" where everything has to rigidly fit into a GOF pattern regardless of whether it actually does. "Go-driven development" where everything has to be an interface and you end up reading a method called Validate which calls an interface method Validate which calls an interface method Validate which calls an interface method Validate and you wake up screaming every morning because why just wtf why please help me pleasehelp)
If I found myself in a place where they do "GOF-driven development" or "Go-driven development", I'd search for another job ASAP.
I don't say what you describe doesn't happen, but it's my impression that most people try to adopt solutions that minimize costs and development time (which also translates to money). 99% of the time it's not "do the best thing to solve this problem" but "solve this problem as fast as possible, without adding additional costs and using as few developers as possible".
Agree with that, but from my experience that's more like 20% of the time. The rest is the various kinds of bullshit development where people are padding resumes, having the boss's pet hobby horse forced on them, chasing the latest shiny flimflam, etc.
(A decent chunk of that 30 years has been contracting and that tends to be at places with problems which might be biasing my sample set.)
Well, in big orgs people get shuffled around teams, so even if you joined a team that is aligned with the way you feel things should be done, you might end up in a totally different environment after a period of time.
chuckle ... how far down the Validate rabbit hole did you go?
Thankfully it was only 3 interfaces down.
The whole codebase is riddled with the same kind of layering but we do now have guidance about doing stuff like that ("DON'T") and a plan to simplify some of the worst offenders (like the multi-layer `Validate` hell hole.)
Or people in this industry are geeks and curious and like to try new stuff and technologies just for fun?
That's the case for the majority of people I've worked with.
At least keep them in different schemas.
That and don't let one concern access data from another, or you'll have to coordinate schema changes between those different concerns.
You have to be pretty big before "store the session information in Redis" doesn't work anymore.
or you have crappy code that can only handle a dozen RPS. [facepalm]
Or just have infrastructure that needs to validate the session in different parts of the continent (world).
And most of these session mechanisms are easy to work with as middleware in common web app frameworks, making it pretty simple to stick with simpler sessions if everyone can get to the session store. Everyone way over-complicates authn and sometimes barely thinks about authorization. I have seen many a web app with a poor JWT implementation and abusable authz get broken. Sometimes the apps warranted the JWT implementation, but it is a lot harder than many devs think.
JWTs really can span this middle ground. They're helpful in answering the who-are-you question without resorting to elaborate DB work. Even middle-ground monoliths are often deployed across more than one independently operating web server (say, JVM processes), and JWTs ensure that each server answers the who-are-you question with the same answer - the code is the same, and although the process space is different on each web server, the answer is the same. So chained requests, with API.REQ.1 to Server 1 and API.REQ.2 to Server 2, will actually work. Maybe session mechanics will work, but what if you don't actually have a session, just a bunch of API requests?
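Concretely, something like this on every server (PyJWT, RS256 assumed; the key file path is illustrative):

```python
import jwt  # PyJWT; every server ships the same verification key

PUBLIC_KEY = open("issuer_public.pem").read()  # identical on Server 1 and 2

def who_are_you(headers):
    # API.REQ.1 and API.REQ.2 hit different processes, but both run this
    # and get the same answer, with no shared session store behind them.
    token = headers["Authorization"].removeprefix("Bearer ")
    claims = jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
    return claims["sub"]
```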
Querying a database for a session id isn't elaborate work. It's also trivial because, as TFA mentions, literally every major web framework ecosystem has a solution for this.
God, how hard is SELECT * WHERE …, seriously.
You need to share a session across websites? Wow! Connect to the database holding the sessions.
Boring.
Ever heard of the speed of light? If you really think that "just connect to the same DB" is an easy solution to the problem you describe in the general sense, then you haven't done so in a moderately complex system yet. It can be a good solution for a very limited set of circumstances, but that's about it.
It’s bullshit. A majority of high volume systems can do this just fine. This is just engineering wank.
The standard solution is to query a session table from a single location, and once you actually start to need to trim request time, it's not even the first place you look.
It's not a trend. Those on the extreme ends of the spectrum are always the most vocal.
Specially since the middle is way larger than people think.
This is a perfect example of "it depends" being the right answer.
Should a project use sessions or JWTs? One isn't right or wrong, it all depends on the context of the project.
I mean, to be fair, the article literally calls out a fairly reasonable checklist.
Do you maintain a database of JWT session tokens for refresh and revoke?
Do you have a real session that you load for every user on every request anyway?
If the answer is 'yes', then the answer to 'use JWT' isn't 'it depends'.
It's no.
The author here seems to be arguing that you should effectively never use JWTs. That, in my opinion, is a mistake.
JWTs have absolutely been over-hyped for the last 8-10 years, but they do have a use and you don't have to be at the scale of Google for it to be the right approach.
Software isn't as simple as creating a checklist of a few basic categories and saying there is always a right or wrong answer. The answer should be "it depends" because there are many more factors at play when deciding something as fundamental as authentication and authorization.
Ok, then show us an example scenario where using JWTs makes sense, while the author's checklist says it doesn't.
I may not have been clear there. I'm not saying the author's list is wrong, I'm saying it's incomplete.
For starters, the blog post leads with the premise that the answer is always "No". Even ignoring that as playful banter, the checklist is way too broad to be universally true.
These all assume that my authorization data and my user data live in the same database. I may store my own user object with a unique ID that references an authenticated user, while the auth flow is actually managed by an entirely different service (whether first party or third party). I also may have my user data stored in a separate database, say for legacy reasons or data security/integrity requirements. Not every database is created equal; it's totally possible that one database is a better fit for user data while a totally different database is needed for application data.
Coming up with a list of broad scenarios and claiming that if you check any one there is never a reason to pick JWTs over sessions is misguided and overly simplistic. Context always matters.
JWT makes it possible to distribute the same access token across multiple systems, but so do stateful tokens. The security implications when you're using JWT for this solution are much higher than with database tokens. Let's look at this for a moment:
JWT Security Issues:
- Inherent design issues (alg=none, algorithm confusion, weak ciphers like RSA)
- Cannot be revoked.
- Key rotation and distribution is necessary to keep token safe over a long period of time
- Claim parsing needs to be standardized and enforced correctly in all services (otherwise impersonation is possible)
Database Tokens Security Issues:
- Timing Attacks against B-Tree indexes [1]
- Giving direct access to the database to all microservices is risky
The security issues with databases are ridiculously easy to solve: to prevent timing attacks, you can just use a hash table index, split the token into a search-key part and a constant-time-compare part, or add an HMAC to your token. To prevent direct access to the database by all your microservices, you just wrap it with an API that verifies tokens.
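The split-token trick is only a few lines; a Python sketch (illustrative schema, `db` is a generic DB-API-style handle):

```python
import hmac
import secrets

def issue(db, user_id):
    lookup = secrets.token_hex(16)    # goes in the B-tree index
    verifier = secrets.token_hex(16)  # never indexed
    db.execute(
        "INSERT INTO tokens (lookup, verifier, user_id) VALUES (%s, %s, %s)",
        (lookup, verifier, user_id),
    )
    return f"{lookup}.{verifier}"

def check(db, token):
    lookup, _, verifier = token.partition(".")
    row = db.execute(
        "SELECT verifier, user_id FROM tokens WHERE lookup = %s", (lookup,)
    ).fetchone()
    # Index timing can only leak the lookup half; the secret half is
    # checked with a constant-time comparison.
    if row is not None and hmac.compare_digest(row[0], verifier):
        return row[1]
    return None
```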
The JWT security issues are much harder to solve. To prevent misconfiguration or misuse and standardize the way claims are used across your organization, you probably need to write your own library or deploy an API gateway. To counter the lack of revocation support, you either need to use very short lived access tokens (so your refresh token DB will still get a lot of hits and you would still need to deal with all the scaling issues) or set up a distributed revocation service (not easy at all). Setting up seamless key rotation also requires additional infrastructure that is not part of the hundreds of JWT libraries out there.
It's really easy to get a JWT solution that just works and scales easily, but if you really care about security — especially if you care about security! — JWTs are not necessarily easier than stateful tokens. They're probably harder.
Last time I checked (which was today for Google), neither Google nor Facebook is using JWT for their access or refresh tokens. The only place I saw JWT was the ID Token in their OpenID Connect flow, and they can't really avoid that even if they wanted to, since it's mandated by the spec.
Facebook and Google don't need JWT. Scaling and distributing a read-only token database to handle a large amount of traffic is easier — not harder! — for these companies. Stateless tokens can be useful for them in certain scenarios, but even then, if you're at Google or Facebook's scale, why would you opt for JWT over an in-house format that is smaller, faster and suffers from fewer vulnerabilities?
[1] https://www.usenix.org/legacy/event/woot07/tech/full_papers/...
I don't think keys need to be distributed per se; rather, they can be made available at a URL served by the same service that issues the tokens. You could call that distribution, but that's probably not what you meant. I agree that a lot can go wrong, but isn't that also true for home-growing a distributed database-token solution (I have surely seen some monsters in the wild)? So can't the problems with both solutions be mitigated using some good libraries?
"Made available at a URL" is one possible distribution mechanism, yes. But this only works for asymmetric keys. If you publish symmetric keys (e.g. for HS256) at a shared URL... Well, now everyone can get these keys and forge tokens to their heart's content.
Even with asymmetric tokens and a key distribution URL, you still have to make sure the clients periodically update their list of keys — this is not something you get built in with every JWT library. And you still have to set up a mechanism for generating the keys and distributing them between the various instances of your auth server. This is not so hard nowadays with cloud KMS services, but setting up this solution on our own infra was quite painful.
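For what it's worth, PyJWT ships a client that does the periodic JWKS fetching and key selection for you (the URL is an example):

```python
import jwt
from jwt import PyJWKClient  # PyJWT >= 2.x

# The issuer publishes its public keys at a well-known URL; the client
# fetches and caches them, picking the right key by the token's "kid".
jwks = PyJWKClient("https://auth.example.com/.well-known/jwks.json")

def verify(token):
    signing_key = jwks.get_signing_key_from_jwt(token)
    return jwt.decode(token, signing_key.key, algorithms=["RS256"])
```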
My point is not that a database solution is without its own issues. At my $DayJob we're also using a mix of stateful and stateless tokens (with distributed revocation lists) and we had to deal with issues with both of them. But stateful tokens on a database rarely suffer from security issues — the issues we had were always related to scalability and performance. The mitigations for these issues are also different: they almost always have to do with optimizing the infrastructure (scaling out the database, adding a cache) rather than using a library. In fact, when we use stateful tokens in a distributed scenario (and I'm sure this is true for almost everyone out there), all token handling is centralized in the auth service, so libraries are not strictly necessary. At most, client libraries would be very thin wrappers around HTTP API calls.
As per the JWT spec: “ Finally, note that it is an application decision which algorithms may be used in a given context. Even if a JWT can be successfully validated, unless the algorithms used in the JWT are acceptable to the application, it SHOULD reject the JWT.”
There is no reason you can’t keep a list of valid “session” identifiers and check the JWT is valid against that as part of verification. Then the only state you need to store server-side is the identifier. You get the exact same benefits of server-based session stores without needing to store the entire session on the server - just the identifier.
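A sketch of that with PyJWT (algorithm pinned per the spec quote above; the server-side identifier store is just shown as a set):

```python
import jwt  # PyJWT

VALID_SESSION_IDS = set()  # the only server-side state: bare identifiers

def verify(token, public_key):
    # Pin the algorithm list, per the spec's SHOULD: a token that
    # validates under an unacceptable alg is still rejected.
    claims = jwt.decode(token, public_key, algorithms=["RS256"])
    if claims.get("jti") not in VALID_SESSION_IDS:
        raise jwt.InvalidTokenError("session identifier not in the server-side list")
    return claims
```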
Just use ElastiCache for sessions, duh. It's a very, very old pattern.
Agree, and he even mentions in the article "If you process less than 10k requests per second, you’re not Google nor are you Facebook."
There is a huuuuuge gap between services handling 10k req/s and Google/Facebook.
I think one big upside of JWT that he doesn't mention is that if you have services geographically distributed, then having decentralized auth with JWTs is quite nice, without having to geographically distribute your auth backend system.
So, yes, if you have a monolith or services colocated, or have some kind of monolithic API layer, then no, perhaps JWT does not make sense. But for a lot of distributed services, JWTs make perfect sense.
And you don't have to introduce JWT revocation for logout: if you have short token expirations, you can accept the risk of token leakage. If the token is valid for like 30 seconds or 1 minute, you would probably never be able to notice that a token has been leaked anyway.
Why would there be any relationship between user sessions and microservices? Are you exposing them directly to the internet?
It's pretty rare to have more than one client-facing API even for large apps. Whether it's a monolith, an API gateway, Apollo federation or whatever.
What you do to authenticate between the BFF (for lack of a better name) and other services is a different matter.
Bearer tokens are simple until you have to invalidate them. Then you get all the complexity of both solutions.