If you had asked me to make a wild guess as to how Cloudflare stores internal headers and then removes them, I would have come up with some options:
- An entire separate dictionary or other data structure.
- One single header containing all internal metadata.
- All headers have a prefix, and the internal ones start with I and the external ones start with E.
- All internal headers start with “CFInt”.
I would not have come up with a scheme in which headers in a particular list are internal. (What if someone else uses this name? What if something forgets to sanitize? What if different simultaneously running programs disagree in the list? What if the Connection header names a Cloudflare-internal header? What if the set-difference algorithm is annoyingly slow?)
The web is already full of obnoxiously ambiguous in-band signaling and header naming, and I find it bizarre that a company with Cloudflare’s scale uses such a tedious and error-prone mechanism internally.
I've worked in IT security at several huge corporations, where we care about headers a lot, and they all use headers in a manner similar to Cloudflare.
Including using proxies at the edge to strip out internal headers bidirectionally. Yes, inbound too.
That doesn't make it better, it makes it worse.
Are you on a corporate network? Do you use a firewall at home?
You’re on enclaves all the time. This is just a different one. Separate networks per class of traffic used to be de rigueur before the cloud. Now it’s all munged together.
I understand that security needs to be pragmatic, but just because security does it doesn't make it right.
It isn't about being in the enclave; it is about having to keep track of which headers you set internally versus which came from outside. It is fragile and error-prone, and it will absolutely break someone when there is a name collision.
All security exploits are a subset of bugs.
And what would you have them do instead? SSL to every service? Reauthenticate on every single call? What’s your clever solution?
There’s probably a dozen things you can terminate at a boundary that would cost so much to run through the entire system that it would bankrupt a company.
And then there’s tracing, which also uses headers.
One of the listed options was to begin all the Cloudflare internal header names with "CFInt".
That’s still header filtering. And it should be X-CF- by the way.
X-CF- seems to be used by some other software. And Cloudflare has plenty of documented headers that don’t start with X.
The whole point is that a genuine user header should never conflict with a Cloudflare internal header, and a browser should never see an internal header.
Having re-read the thread, I totally agree. What you have listed is table stakes. I'd also say that internal headers should be encrypted to a narrow protection domain.
If all internal headers were prefixed with X-CF-, you could strip them all via SIMD that had no knowledge of any specific header. Hell, you could probably do it on the NIC.
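Not SIMD, and certainly not Cloudflare's actual code, but a minimal sketch of what prefix-based stripping looks like with plain standard-library types and a made-up X-CF- prefix:

```rust
/// Hypothetical prefix reserved for internal headers (not Cloudflare's real scheme).
const INTERNAL_PREFIX: &str = "x-cf-";

/// Drop every header whose (case-insensitive) name starts with the internal prefix.
/// No per-header lookup table is needed; the prefix alone identifies internal headers.
fn strip_internal(headers: &mut Vec<(String, String)>) {
    headers.retain(|(name, _)| !name.to_ascii_lowercase().starts_with(INTERNAL_PREFIX));
}

fn main() {
    let mut headers = vec![
        ("Accept".to_string(), "*/*".to_string()),
        ("X-CF-Edge-Region".to_string(), "ams".to_string()), // made-up internal header
    ];
    strip_internal(&mut headers);
    assert_eq!(headers.len(), 1); // only the client-visible header survives
}
```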
Please reread the entire thread. We are talking about someone who thinks the header solution is stupid. You are both splitting hairs. Stay on target.
It took me about 10 years to hear about it, but the IETF tried to deprecate the X- prefix back in 2012: https://datatracker.ietf.org/doc/html/rfc6648
Good to know.
(Also, sibling is right, I spaced on X-CF- being a header sent to CF customers’ servers. I don’t use Cloudflare, but CloudFront does the exact same thing.)
Maybe I'm dumb or missing something, but why not just do what Google does: convert all inbound HTTP requests into protobufs, separate the internal stuff from the external stuff, run it through your myriad of internal services, and then on the way out you still have your nicely delineated protobufs, which you can convert back into HTTP. Why are we mucking about with headers in the first place?
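Purely for illustration (this is not Google's or Cloudflare's real format, and the type below is made up), the envelope shape would look something like this: internal metadata rides in its own field, so egress never has to filter the client's headers at all:

```rust
use std::collections::HashMap;

/// Hypothetical internal envelope: the client's request travels as one field,
/// internal routing metadata as another. Dropping internal data on egress is
/// "don't serialize that field", not a header-by-header filter.
struct InternalRequest {
    /// Headers exactly as the client sent them.
    external_headers: HashMap<String, String>,
    /// Internal-only metadata; never converted back into HTTP headers.
    internal_meta: HashMap<String, String>,
    /// Request body, passed through untouched.
    body: Vec<u8>,
}

impl InternalRequest {
    /// The only headers that ever get serialized back into HTTP on the way out.
    fn client_headers(&self) -> &HashMap<String, String> {
        &self.external_headers
    }
}

fn main() {
    let req = InternalRequest {
        external_headers: HashMap::from([("accept".to_string(), "*/*".to_string())]),
        internal_meta: HashMap::from([("colo".to_string(), "ams01".to_string())]), // made-up key
        body: Vec::new(),
    };
    // Egress serializes only the external view; internal_meta simply never leaves.
    assert_eq!(req.client_headers().len(), 1);
    assert_eq!(req.internal_meta.len(), 1);
    assert!(req.body.is_empty());
}
```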
Configure all machines in a way that they can survive on some random untrusted public wifi. This should be the obvious default stance for laptops and phones.
But even for workstations wired to a trusted LAN it still makes sense because you never know which of the various tunnels might assign and expose some IPv6 address to the internet.
For servers you might be able to make an exception if you have a vigilant IT and the people with admin access aren't actively trying to circumvent security, but even then that better not be your only layer of security.
The first large-scale piece of software I worked on was for telcos, pre-smartphone. We used internal headers to offload authentication and terminate SSL. We also had to pressure F5 to fix about half a dozen bugs in BIG-IP to do so. Bugs that should in no universe have existed in version 9 of a product.
I used to joke that F5 owed me and my coworker 3 months of salary for all the free QA we did for them.
It helps if you realize that BIG-IP 9.0 was essentially a from-scratch rewrite of BIG-IP 4.5. Among other major redesigns, the data plane was moved from BSD kernel space to Linux user space. Internally, the joke was that it would be two times better when we were done (4.5 * 2 = 9.0). It probably was, but not on day zero.
On day sixty it couldn’t do SSL termination, and cookie-based traffic shaping didn’t work.
I was a little bummed it was “just” a Linux box but that’s pretty common today. I hadn’t discovered dd-wrt yet and wouldn’t for a few years.
Yeah, me too, but systems grew over time and grew and grew and we were using HTTP headers for all sorts of stuff. This optimization makes the situation better, but the solution (which is underway) is to use a different mechanism for IPC and get rid of the use of HTTP headers completely.
I’m certainly familiar with systems growing over time and outgrowing their original schema.
Is this the same phenomenon that resulted in Cloudflare Tunnels apparently being domain names? Or why domains that are proxied by Cloudflare show up as “CNAME” in the DNS panel? (The latter one seems extra strange — wasn’t the reverse proxy service the very first service that Cloudflare offered? There must be some history here.)
All new products are full of scaffolding that has to be removed when robustness becomes a higher priority than existence. Where we fail is in not calling it that. Instead we just call some code “good” while our thicker-skinned coworkers call the rest “fine”. It’s not fine. I don’t want to be working overtime the week of Thanksgiving because you can’t think past the horizon.
There is a very fine line between "good enough" and "not good enough" in any product beyond a certain complexity. Finding the pieces that cross that line can be hard and improving "good enough" parts is (sadly) mostly a waste of time in commercial settings.
You have to backdate starting a project so it happens before things stop being good enough. And take into account bad estimates and the five other deadlines between then and now.
Often it’s better to clean as you go. If you have the time and the inclination, shore up one of the least good enough things.
It’s not unheard of to open up new feature sets via refactoring. Something that previously would have taken an “impossible” amount of time now becomes profitable, not just possible.
I miss working at a place where that was encouraged. Now if I do that, the testers complain that it’s not in scope for the ticket and ask me to split the changes into another ticket with a separate test plan. The rot thus continues.
I think this is about when I discovered zone defense for performance optimization. QA doesn’t want to chase your code changes halfway across the codebase, and that’s fair. At some point it starts looking like you’re changing things just to change them, even if it ends up being a few percent better.
But if you limit yourself to one part of the interaction, they don’t get as mad. A dozen changes in the signup system piss them off less than five changes across the app.
This is why some bugs never get fixed (that is if you define a bug to include inefficient code and not just code that breaks something).
And when that happy day dawns, slightly sadly, this nice optimisation will evaporate!
Maybe the real treasure was the optimizations we made and discarded along the way.
Indeed. I am very happy with some of the tricks that I developed (speeding up, and indeed allowing, Solaris C++ linking by prefiltering on MD5 hashes of canonicalised .o files -> nm, anyone?) even though they were lost without trace (yep, Lehman Brothers, where are my Imakefiles now?)...
They're probably as critical as ever at Barclays :)
I really hope not, given that that very old Solaris compiler should have been laid to rest many many years ago. Yes, some of my stuff had kinda made it into BarCap (even onto my desktop dev machine!) when I turned up there later, and even some at Nomura when I rocked up there again IIRC!
My likely overly cynical concern here is that this suggests trie-hard will soon end up abandoned, as you're making it sound like it is a stop-gap solution for you.
For anyone not in the loop, the above poster (JGC) is Cloudflare's CTO himself :)
You chiming in on this post makes me slightly bitter at having gone with CloudFront (I have everything on AWS so it seemed the obvious choice) instead of Cloudflare.
I feel like I can point out lots of similar problems with the other solutions you suggest (heck, I think some of the problems you list even apply to those other solutions).
The list approach has some downsides, but it also has a bunch of upsides. I feel like when people point out potential flaws of these approaches, they're ignoring the history and difficulties that come with Cloudflare's scope. An enumerated list is the simplest and most flexible of the approaches, and it also doesn't require any a priori agreement on the structure of a header key; this is probably important when you think about the sheer number of teams at Cloudflare, potential technology acquisitions, etc.
That’s more of an indictment of how little effort is spent aligning things. It’s not that hard to tell every team that any internal headers have to start with ‘CFInt’ and then enforce that.
* Headers that are originally private, but are later changed to public
* Headers that are intended to be public, but then are changed to private
* Headers that were created in the very early days of the company, which are so deeply integrated into everything that renaming them is basically impossible (or at least very hard and risky)
* Headers that are set by some appliance/third party solution not controlled by CF, but which are technically internal and must be stripped.
* Newly acquired company has its own set of internal headers. Management is not amused when a huge, disrupting refactor is suggested by engineers.
And this is just the tip of the iceberg of possible problems when your scale is big enough.
So because everything is a Byzantine labyrinth due to scale, we're all going to agree to remember to add our internal headers to a strip list in yet another service?
Maintaining that strip list is just another piece of technical debt to keep paying down.
Having a policy to strip everything under a prefix seems much less error-prone.
How many years do you have to spend in software before you stop uttering those four cursed words?
It takes just as much experience to acknowledge that the cure is worse than the disease.
To be a little bit blunt, this just means you have never built software in a very large company that may have a lot of existing/completely independent software, or have made technology acquisitions after the fact. That is, if you are starting from "time 0" of course it's very easy to say "internal headers need to start with CFInt", but that's rarely the case.
I think if you work for such a very large company the only reason you are ok with it is because nobody at the top says ‘no more’.
If they do, it suddenly becomes very easy to get everything aligned, because there is likely a vast collection of bodies just collecting a paycheck that can be more productively assigned to manually renaming everything.
Lists, extensible protobufs, etc. are indeed great in their extensibility. The issue I would see (if I worked at Cloudflare) isn’t with using a list; it’s that the list is the same list that’s used for externally visible HTTP headers.
I don't see that as a problem at all:
"Hey any teams out there, before you can add internal headers, you need to register them with some service." Then the list is loaded from static storage at startup time and/or refreshed on some periodic basis.
Other ideas:
In fact, if you preseed, you are basically combining these ideas but fixing how many internal headers are on each request. At that point, you can use a linked hash table that preserves creation order and just remove the first N from the final list that you send back to clients.

While Python provides data structures like this out of the box, designing a big system to require a special, nonstandard data structure to identify private metadata seems like a mistake to me.
I'm not sure I follow? Does Rust not have the equivalent of a LinkedHashMap from Java? All I'm proposing is to start each map of headers off with a sentinel value for all of the headers you don't intend to send back to the client. Then, have a method that iterates over all values, for the places where you send to internal servers, and another (probably the default?) that starts iteration after all of the internal items, which should just be a different pointer. At this point, there is no need to inspect each header by name, as that was effectively done at compile time when you put the protected headers at the front of the list.
Is this somewhat special? I mean, sure. But it wasn't long ago that a hash table was already in the special territory of data structures.
Edit: I should add that I'm not certain this idea would be better. Just more ideas on what you could do.
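Here's roughly what I mean, as a minimal sketch with a plain Vec instead of a LinkedHashMap (names are illustrative, nothing Cloudflare actually uses): the internal headers sit at the front of the list, and the client-facing view just starts iterating after them, so nothing is matched by name at strip time:

```rust
/// Insertion-ordered header list with all internal headers kept at the front.
struct Headers {
    entries: Vec<(String, String)>,
    /// Number of leading entries that are internal-only.
    internal_len: usize,
}

impl Headers {
    fn new() -> Self {
        Headers { entries: Vec::new(), internal_len: 0 }
    }

    /// Internal headers must be added before any external ones for the split to hold.
    fn push_internal(&mut self, name: &str, value: &str) {
        assert_eq!(self.internal_len, self.entries.len(), "internal headers must come first");
        self.entries.push((name.to_string(), value.to_string()));
        self.internal_len += 1;
    }

    fn push_external(&mut self, name: &str, value: &str) {
        self.entries.push((name.to_string(), value.to_string()));
    }

    /// Everything, for hops inside the network.
    fn all(&self) -> impl Iterator<Item = &(String, String)> {
        self.entries.iter()
    }

    /// Client-facing view: simply skip the internal prefix of the list.
    fn external_only(&self) -> impl Iterator<Item = &(String, String)> {
        self.entries[self.internal_len..].iter()
    }
}

fn main() {
    let mut h = Headers::new();
    h.push_internal("cf-int-route", "edge-7"); // made-up internal header
    h.push_external("content-type", "text/html");
    assert_eq!(h.all().count(), 2);
    assert_eq!(h.external_only().count(), 1);
}
```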
There’s a third party crate for this. C++’s STL also doesn’t have it by default.
The creation-order-preserving hash map is basically two data structures in one, it’s more complex and slower than a normal hash map, and IMO it’s rather bizarre and I have never really understood why anyone wants one.
I would be surprised if it was dramatically slower, all told. Though, to be fair, I have not benchmarked this in a long time.
My main "idea" here is to stop from having to check all of the headers on a "per header" basis. Yes, you can make each check faster, as the TRIE does. You can also remove the entire reason to check individual headers by starting a traversal after where the internal headers are.
You could also go with something like making sure the first header is "InternalHeaderCount: N", where you then keep the internal headers segregated from the others by a simple count. (This assumes you have solid control over what is adding the headers internally, of course.)
(I also forgot to give kudos on the blog title. Any Die Hard reference is a good reference. :D )
It's especially unfortunate that the intuitive name, `std::map`, is the ordered and generally least useful option among the standard library's map containers.
My last job did rely on the ordering of `QMap`, because the elements are listed in a Qt GUI and it could confuse users if those widgets randomly rearranged themselves. That's the only use case I've personally encountered for an ordered map.
Former employee here; the interesting (horrifying?) fact is that you can set some of these internal headers (I remember a notable bug with cf-cache-status) in workers (the serverless offering) and cause all kinds of bad things to happen :)
It seems to me it should be trivial to strip anything internal that comes out of a worker, right?
Trivial unless someone wants to modify CF internal headers as part of their solution…
https://xkcd.com/1172/
Haha thank you
Or perhaps even insert yet another header containing just the list of internal headers added to the request, assuming this happens in a single place; otherwise it's a recipe for disaster.
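A quick sketch of that, with entirely made-up header names: the one place that injects internal headers also appends a manifest header naming them, and the edge strips whatever the manifest lists, plus the manifest itself:

```rust
/// Hypothetical manifest header naming every internal header added to the request.
const MANIFEST: &str = "x-internal-manifest";

/// At egress, remove the headers listed in the manifest, then the manifest itself.
/// This only works if exactly one component writes the manifest, as noted above.
fn strip_by_manifest(headers: &mut Vec<(String, String)>) {
    let listed: Vec<String> = headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case(MANIFEST))
        .map(|(_, value)| value.split(',').map(|s| s.trim().to_ascii_lowercase()).collect())
        .unwrap_or_default();
    headers.retain(|(name, _)| {
        let lower = name.to_ascii_lowercase();
        lower != MANIFEST && !listed.contains(&lower)
    });
}

fn main() {
    let mut headers = vec![
        ("accept".to_string(), "*/*".to_string()),
        ("cf-int-colo".to_string(), "ams01".to_string()), // made-up internal header
        (MANIFEST.to_string(), "cf-int-colo".to_string()),
    ];
    strip_by_manifest(&mut headers);
    assert_eq!(headers.len(), 1); // only the client's own header survives
}
```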
I have a slightly different example of this, where an RPC framework used at my company disallows the service owner from modifying certain headers (say, the request identifier), and will instead create a duplicate header with a suffix. In that scenario at least, I can see this as a fairly reasonable tradeoff, as the goal is to keep certain headers from being modified, not because they are platform internal but because there are certain assumptions associated with those headers that are followed company-wide.
I'll go check what mechanism is used for this matching.
Or the same but with a list of headers which AREN'T internal.
You'll probably have a custom header-adding function that people should always use instead of the regular one. And this way, if someone forgets to use it, their header will get stripped.
You can think of a header escaping the internal network as something that needs to be authorized. This is a deny by default approach.
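Something like this minimal sketch, with made-up names (in practice the "regular" path would be the standard header API rather than an explicit method): the only way a header survives the edge is to add it through the function that marks it forwardable; everything else is dropped by default:

```rust
/// Deny-by-default header builder: headers stay internal unless explicitly
/// authorized to leave the network. (Names and API are illustrative only.)
struct OutboundHeaders {
    entries: Vec<(String, String, bool)>, // (name, value, forwardable)
}

impl OutboundHeaders {
    fn new() -> Self {
        OutboundHeaders { entries: Vec::new() }
    }

    /// The default path: anything added here never reaches the client.
    fn add_internal(&mut self, name: &str, value: &str) {
        self.entries.push((name.to_string(), value.to_string(), false));
    }

    /// The explicit opt-in: only these survive the edge.
    fn add_forwardable(&mut self, name: &str, value: &str) {
        self.entries.push((name.to_string(), value.to_string(), true));
    }

    /// What actually goes on the wire to the client.
    fn to_client(&self) -> Vec<(&str, &str)> {
        self.entries
            .iter()
            .filter(|entry| entry.2)
            .map(|(name, value, _)| (name.as_str(), value.as_str()))
            .collect()
    }
}

fn main() {
    let mut h = OutboundHeaders::new();
    h.add_internal("cf-int-debug", "1"); // made-up internal header
    h.add_forwardable("cache-control", "no-store");
    assert_eq!(h.to_client(), vec![("cache-control", "no-store")]);
}
```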
Ah yes - perfect.
This doesn’t work if the front end is older than an internal node and a header gets added, at least not without extra logic to patch everything up and handle conflicts.
The HTTP Connection header works like this, and one should generally assume that almost nothing implements it correctly.
Couldn't you bypass this by prefixing the ones that aren't yours? Or prefixing your internal ones with something unique enough?
This might of course introduce quite a bit of overhead, but the clean solution would arguably be to not mix requests to begin with: the external requests you are processing become the payload of the requests you are sending around internally. This would also allow you to cook up an optimized encapsulation protocol that might be more efficient to process than parsing and modifying HTTP. This admittedly comes from someone with essentially zero knowledge of what exactly Cloudflare is doing to the requests they receive.
X years ago the manager of the cache team raised the lack of data plane and control plane separation as an issue, and we came up with various plans to fix things, but I guess nothing has happened since.