I worked at Firebase for many years and the concerns with security rules have always plagued the product. We tried a lot of approaches (self-expiring default rules, more education, etc.), but at the end of the day we still see a lot of insecure databases.
I think the reasons for this are complex.
First, security rules as implemented by Firebase are still a novel concept. A new dev joining a team and adding data to an existing location probably won't go back and update the rules to reflect that the privacy requirements of that data have changed.
Second, without the security-through-obscurity created by bespoke in-house backend implementations, scanning en masse becomes easier.
Finally, security rules are just hard. Especially for Realtime Database, they are hard to write and don't scale well. This comes up a lot less than you'd think, though: any time automated scanning is used, it's just looking for open data, and anything beyond "read write true" (as we called it) would have prevented this.
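For reference, the wide-open "read write true" rules on Realtime Database are just this JSON; replacing either `true` with any real condition would have stopped these scans:

```json
{
  "rules": {
    ".read": true,
    ".write": true
  }
}
```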
Technically there is nothing wrong with the Firebase approach, but because it is one of the only backends that uses this model (one based around stored data and security rules), it opens itself up to misunderstanding, improper use, and issues like this.
To be honest I've always found the model of a frontend being able to write data into a database highly suspect, even with security rules.
Unlike a backend, where the rules for validation and security are visible and part of the specification, Firebase's security rules are something one can easily forget: they live in a separate process and have to be reevaluated as part of every new feature developed.
Yeah, I've never understood how this concept can work for most applications. In everything I build, I always need to do something with the input before writing it to the database. Security rules alone are not enough.
What kind of apps are people building where you don't need backend logic?
I think I missed where writing to the database precludes backend logic. Databases have triggers and integrity rules, but beyond that, why can't logic execute after data is written to a database?
Because once it is written to the database, it can be output somewhere before you execute your logic: explicit language, child porn, etc. You generally want to check for that BEFORE you write the data.
You're saying it's impossible to have public write access to a table without also providing public read access?
"it can be output somewhere before you execute your logic" is a design choice that is orthogonal from whether you execute your logic before or after input into the database.
You generally don't want to write child porn to disk, if you can help it.
First of all, most database records couldn't fit child porn, unless it was somehow encoded across thousands of records, in which case you couldn't realize it was child porn until after you've stored 99% of it.
Sure though, by putting "child porn" in a sentence, you can make anything seem bad. Tell me this, would you rather your application middleware was in the "copying child porn" business? ;-)
Actually, the more I think about it, the crazier this seems. You're going to store all the "child porn" you receive in RAM until you've validated that it is child porn?
I don’t get your tone or why you seem shocked that binary data can be stored in a database. Postgres and MySQL both have column sizes for binary data that can hold gigabytes.
Second, you generally need to hold the entire image in RAM to create the perceptual hash needed to check that the image is/isn’t child porn.
My tone is shocked, because what you're describing seems totally removed from any system I've seen, and I've implemented a ton of systems. For performance reasons, you want to stream large uploads to storage (web servers, like nginx, are typically configured to do this even before the request is sent to any application logic). You invariably want to store UGC data that conforms to your schema, even if you're going to reject it for content. There's a whole process for contesting, reviewing and reversing decisions that requires the data be in persistent storage.
I think you misunderstood what I said. Yes, Postgres, MySQL and a variety of other databases have column sizes for binary data that can hold gigabytes. What I wouldn't agree with is that most database records can hold gigabytes, binary or otherwise. Heck, most database records aren't populated from UGC sources, let alone UGC sources where child porn is a risk.
But okay, let's assume, for argument's sake, that most database records are happily accepting 4TB large objects, and you're accepting up to 4TB uploads (where Postgres' large objects max out). Do all your web & application servers have 4TB of memory? What if you're processing more than one request at once; do you have N*4TB of memory?
At least all the systems I've implemented that receive data from users enforce limits on request sizes, and with the exception of file uploads, which are typically directly streamed to the filesystem before processing, those limits tend to be quite small, often less than a kilobyte. Maybe someone could write some really terse child porn prose and compress it down to fit in that space, but pretty much any image would have to be spread across many records. By design, almost any child porn received would be put in persistent storage before being identified as such.
This is one of many reasons that you generally want to stream file uploads to storage before performing analysis. Otherwise you're incredibly vulnerable to a DoS attack on your active memory resources. Even without a DoS attack, you're harming performance by unnecessarily evicting pages that could be used for caching/buffering for bytes that won't be served at least until you've finished receiving all the file's data.
[Note: Many media encodings tend to store neighbouring pixels together, so you can, conceptually, compute a perceptual hash progressively, without loading the entire file into active memory, which is often desirable, particularly with video content.]
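A minimal sketch of that streaming approach in Node/TypeScript (the path and port are invented; assume a proxy in front enforces an upper bound on request size):

```typescript
import { createServer } from "node:http";
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import { randomUUID } from "node:crypto";

// Stream each upload straight to disk: only a small chunk buffer ever sits in
// RAM, so N concurrent multi-gigabyte uploads don't need N x upload-size memory.
createServer(async (req, res) => {
  const path = `/tmp/upload-${randomUUID()}`;
  await pipeline(req, createWriteStream(path));
  // Analysis (perceptual hashing, moderation, etc.) happens later, off this path.
  res.end("stored, queued for scanning\n");
}).listen(8080);
```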
Thought about it some more... this whole scenario makes sense in only the narrowest of contexts. Very few applications directly serve UGC to the public, and a lot of applications are B2B. You're authenticated, and there's a link to your employer (or to you, if you're self-employed). Uploaded data isn't made visible to the public. Services are often limited to a legal jurisdiction. If you want to upload your unencrypted child porn to a record in Google's Firebase database, you go ahead. The feds could use some easy cases.
Many apps where every user has their own data, which just needs to be synced between devices.
Curious as to which apps, if any, you can point to?
A typical notes app.
It's a really good question.
Context: I have a near-100% naive perspective. Mobile dev who's built out something approximating Perplexity on Supabase. I have to use edge functions for e.g. CORS, but by and large, logic is all in the app.
Probably because the client is in Flutter, and thus multiplatform & web in one, I see manipulating the input on both the client and server as code duplication and error-prone.
I think if I was writing separate native apps, I'd push everything through edge functions, approximating your point: better to have that sensitive logic of what exactly is committed to the DB in one place.
Writing directly to Firebase is rarely done past the MVP stage. Normally it's the reading which is done directly from the client. Generally writes are bounced through Cloud Functions or a traditional server of some form. Some also "fan out" data, where a user has a private area to write to (say a list of tweets) then they get "fanned out" to follower's timelines via an async backend process which does any verification / cleansing as needed.
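A hedged sketch of that fan-out shape as a Cloud Function (collection names and the cleansing helper are invented for illustration, using the v1 firebase-functions API):

```typescript
import * as functions from "firebase-functions/v1";
import * as admin from "firebase-admin";

admin.initializeApp();

// Stand-in for whatever verification / cleansing the async process does.
function isClean(text: string): boolean {
  return text.length <= 280;
}

// When a user writes into their own private area, copy the post into each
// follower's timeline only after it passes the checks.
export const fanOutPost = functions.firestore
  .document("users/{uid}/posts/{postId}")
  .onCreate(async (snap, context) => {
    const post = snap.data();
    if (!isClean(post.text ?? "")) return;

    const { uid, postId } = context.params;
    const followers = await admin
      .firestore()
      .collection(`users/${uid}/followers`)
      .get();

    // A real implementation would chunk this: a batch tops out at 500 writes.
    const batch = admin.firestore().batch();
    followers.forEach((f) =>
      batch.set(admin.firestore().doc(`timelines/${f.id}/posts/${postId}`), post)
    );
    await batch.commit();
  });
```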
Are you suggesting that it's essentially too easy for a dev to just set and forget? That's a pretty interesting viewpoint. Not sure how any BaaS could solve that human factor.
You could either do away with the model of the frontend writing to the DB and ask customers to implement a small backend with a serverless component like AWS Lambda or Google Cloud Functions.
Barring that, perhaps Firestore could introduce the concept of a "lightweight database function hook" akin to Cloudflare workers that runs in the lifecycle of a DB request, thus formalizing the security requirements specific to the business requirement and causing the development organization to allocate resources to its upkeep.
So while a security rule usually gets tested very lightly, you'd see far more testing in a code component like the one I'm suggesting.
Firebase has triggers.
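For example (a minimal Realtime Database sketch; the path and the length check are invented), though note a trigger runs after the write commits, not inside the request lifecycle the parent comment is asking for:

```typescript
import * as functions from "firebase-functions/v1";

// Fires after a /messages write has already committed, so the bad value was
// briefly readable: this is cleanup rather than prevention.
export const validateMessage = functions.database
  .ref("/messages/{msgId}")
  .onCreate(async (snap) => {
    const text: string = snap.val()?.text ?? "";
    if (text.length > 500) {
      await snap.ref.remove();
    }
  });
```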
Say you add a super_secret_internal_notes field. If you're writing a traditional backend, some human would need to explicitly add that to a list of publicly available fields somewhere (well, hopefully). For systems like Firebase, it's far too easy to have this field be created by frontend code that's just treating this as another piece of data in a nested part of a payload. But this can also happen on any system, if you have any JSON blob whose implicit schema can be added to by frontend development alone.
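One rules-level mitigation is an explicit field allowlist, so a field invented by frontend code alone is rejected until the rules change too. A hypothetical Firestore snippet (collection and field names made up):

```
match /users/{uid} {
  // Writes may only contain fields a human has explicitly listed here.
  allow write: if request.auth != null
    && request.auth.uid == uid
    && request.resource.data.keys().hasOnly(['name', 'bio', 'avatarUrl']);
}
```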
IMO implicit schema updates on any system should be consolidated and lifted to an easily emailed report - a security manager/CSO/CTO should be able to see all the super_secret_internal_notes as they're added across the org, and be able to immediately rectify security policies (perhaps even in a staging environment).
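A sketch of the kind of hook that could feed such a report (the collection path is invented, and the "report" here is just a log sink standing in for the emailed rollup):

```typescript
import * as functions from "firebase-functions/v1";

// On every update, diff top-level keys and surface any that appear for the
// first time, so implicit schema growth shows up in one reviewable stream.
export const reportNewFields = functions.firestore
  .document("users/{uid}")
  .onUpdate((change) => {
    const before = new Set(Object.keys(change.before.data()));
    const added = Object.keys(change.after.data()).filter((k) => !before.has(k));
    if (added.length > 0) {
      functions.logger.warn("implicit schema change", { added });
    }
  });
```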
AFAIK Firebase doesn't do this - while there are pretty audit logs, there's not an automatic rollup of implicit schema changes: https://firebase.google.com/support/guides/cloud-audit-loggi...
(Also, while tongue in cheek, the way that the intro to a part of Firebase's training materials https://www.youtube.com/watch?v=eMa0hsHqfHU implicitly centers security as part of the launch process, not something ongoing, is indicative of how pervasive the issue is - and not at all something that's restricted to Firebase!)
Generally agreed on improved audit logs of some form helping.
Re training materials, this is one of the mitigations we launched to attempt to pull security to front of mind. I do not really think this is a Firebase problem, I think average developers (or average business leaders) just don't, in general, think much about security. As a result, Firebase materials have a triple burden - they need to get you to think about security, they need to get you to disrupt the most "productive" flow to write rules, and they need to get you to consistently revisit your rules throughout development. This is a lot to get into someone's head.
For all the awesomeness of Firebase's databases, they're both ripe footgun territory (Realtime Database specifically). Our original goal was to make the easiest database to get up and running with, which I think we did, but that initial ease comes with costs down the road which may or may not be worth it; that's a decision for the consumer.
I think it's more that there's more surface area to forget when you have humans handling so many concerns, and security is rarely the part that's changed the most recently, so it's a likely candidate for being "pushed out of the buffer" (of the human).
In a more typical model, backend devs focus more on security, while not needing to know the frontend, and vice versa.
Eventually, humans will forget, set or not.
We tried to contact Google via support, to try to help or to get them to disclose the issues to the affected websites. We got no response other than one telling us that they would create a feature request on our behalf if we wanted, instead of helping us; which is fair, as I think we'd have to escalate pretty far up in Firebase to get the attention of someone who could alert project owners.
One of the things we fought for, for years after the acquisition, was maintaining a qualified staff of full-time, highly paid support people who were capable of identifying and escalating issues like this with common sense.
This is a battle we slowly lost. It started with all of support being the original team, then went to 3-4 full-time staff plus some contractors, then to entirely contractors (as far as I'm aware).
This was a big sticking point for me. I told them I did not believe we should outsource support, but they did not believe we should have support for developer products at all, so I lost to that "compromise." After that I volunteered to do the training of the support teams myself, which involved traveling to Manila, Japan and Mexico regularly. This did help, but like support as a whole, it was a losing battle and quality has declined over time.
Your experience is definitely expected and perhaps even by design. Sadly this is true across Google, if you want help you’d best know a Googler.
I suspect it is going to end up being Google's downfall, or at least, be part of it.
They simply don't know humans. Their repeated failures at building social networks are evidence enough. They always try to keep the human out of the loop, which, to be fair, worked for them in the early days, as their search engine was better than those that relied on human-made directories. But now it is becoming ridiculous. It is a company of bots, for bots. And when they need humans for some reason, they take away most of the value those humans could add with rigid frameworks, basically treating them like bots. They pay hundreds of thousands, not to people who are competent and trustworthy enough to provide the best service, but to people who write bots that provide mediocre service.
I believe that at some point, a startup that understands humans will eat them up, bit by bit, by feeding on dissatisfied customers who don't want to deal with stupid bots.
I really doubt that this will be Google's downfall; they're too big to fall right now. I think it will be laws.
"The bigger they are, the harder they fall" is a saying for a reason. There is no such thing as "too big to fail"; otherwise the East India Company would still be in operation.
Sometimes. IBM was still considered big when Buffett invested in them early in the 2010s, and it took almost a decade of bad performance for him to finally exit. It might be slowly sliding into irrelevance, but its stock hasn't completely tanked, during or after that period.
Thank you, and good job for isolating the root cause and solution.
Deregulation is an op designed to prevent the people from toppling dragons.
That explains a lot.
This raises the question: isn't this a security vulnerability after all?
Very well-spoken arguments for a fundamental need for structural diversity, not monoculture, on the net.
I don't see the comment arguing for that at all, and I don't think the analogy to crop monocultures being more vulnerable to pests really holds.
There are good reasons we deride "security through obscurity" as a valid defense, and just because "structural diversity" makes automated scanning harder doesn't mean it can't be done. See Shodan.
The idea as I (who is not GP) see it is not that diversity makes scanning harder, it’s that it makes the blast radius smaller. Notably, though, that means we have to be talking about diversity of implementations, not just deployments—numerous deployments of just a few pieces of software can be problematic in their own ways, and of course there have been bugs with huge consequences in Apache, MSRPC, or—dare I say it—sendmail since the very earliest days.
"security through obscurity" is red team trash talk mostly.
It's not that difficult to build the scanner into the firebase dashboard. Ask the developer to provide their website address, do a basic scanning to find the common vulnerability cases, and warn them.
Firebase does that, the problem is "warning them" isn't as simple as it sounds. Developers ignore automated emails and they rarely if ever open the dashboard. Figuring out how to contact the developers using the platform (and get them to care) has been an issue with every developer tool I've worked on.
Looking at https://firebase.google.com/docs/rules/basics, would it be practical to have a "simple security mode" where you can only select from preset security rule templates? (like "Content-owner only" access or "Attribute-based and Role-based" access from the article) Do most apps need really custom rules or they tend to follow similar patterns that would be covered by templates?
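For concreteness, the "Content-owner only" pattern from that page boils down to a few lines, roughly:

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Only the authenticated owner may read or write their own documents.
    match /some_collection/{userId} {
      allow read, write: if request.auth != null && request.auth.uid == userId;
    }
  }
}
```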
A big problem with writing security rules is that almost any mistake is going to be a security problem so you really don't want to touch it if you don't have to. It's also really obvious when the security rules are locked down too much because your app won't function, but really non-obvious when the security rules are too open unless you probe for too much access.
Related idea: force the dev to write test case examples for each security rule where the security rule will deny access.
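Firebase's emulator tooling can already express that kind of deny-test; a minimal sketch with @firebase/rules-unit-testing (project ID and document paths invented; rules come from the running emulator, or can be passed in explicitly):

```typescript
import {
  assertFails,
  assertSucceeds,
  initializeTestEnvironment,
} from "@firebase/rules-unit-testing";

const env = await initializeTestEnvironment({ projectId: "demo-rules-test" });

// The "deny" case: an unauthenticated client must not read another user's doc.
await assertFails(
  env.unauthenticatedContext().firestore().doc("users/alice/private/profile").get()
);

// Sanity check that the rule still lets the owner through.
await assertSucceeds(
  env.authenticatedContext("alice").firestore().doc("users/alice/private/profile").get()
);

await env.cleanup();
```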
I've been an advocate for Firebase and Firestore for a while — but I agree with all of the points above.
It's a conceptual model that is not sufficiently explained. How we talk about it on our own projects is that each collection should have a conceptual security profile, i.e. is it public, user data, public-but-auth-only, admin-only, etc., and then use the security rule functions to enforce these categories — instead of writing a bespoke set of conditions for each collection.
Thinking about security per-collection instead of per-field mitigates mixing security intent on a single document. If the collection is public, it should not contain any fields that are not public, etc. Firestore triggers can help replicate data as needed from sensitive contexts to public contexts (but never back.)
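A rough sketch of the profile idea in rules, with one named function per profile (names invented):

```
// One function per security profile, applied per collection.
function isSignedIn() { return request.auth != null; }
function isOwner(uid) { return isSignedIn() && request.auth.uid == uid; }
function isAdmin() { return isSignedIn() && request.auth.token.admin == true; }

match /articles/{id} {
  allow read: if true;        // public
  allow write: if isAdmin();  // admin-only
}
match /userData/{uid}/{doc=**} {
  allow read, write: if isOwner(uid);  // user data
}
```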
The problem with this approach is that we need to document the intent of the rules outside of the rules themselves, which makes it easy to incorrectly apply the rules. In the past, writing tests was also a pain — but that has improved a lot.
At Steelhead we use RLS (row-level security) to secure a multi-tenant Postgres DB. The coolest check we do: create a new tenant and run a dbdump with RLS enabled, then ensure the dump is empty. That validates all security policies in one fell swoop.
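A per-table variant of the same idea, sketched in TypeScript with node-postgres (the GUC name, table, and policy are invented; the dump-wide check itself would use pg_dump --enable-row-security):

```typescript
import { Client } from "pg";

// Connect as a non-superuser role that is subject to RLS.
const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();

// Impersonate a brand-new tenant that owns no rows yet.
await client.query("SELECT set_config('app.tenant_id', $1, false)", [
  "00000000-0000-0000-0000-000000000001",
]);

// Given a policy like USING (tenant_id = current_setting('app.tenant_id')::uuid),
// every table should look empty to this tenant.
const { rows } = await client.query("SELECT count(*)::int AS n FROM orders");
if (rows[0].n !== 0) throw new Error("RLS leak: fresh tenant can see rows");

await client.end();
```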
The security rules are where I fell out of love with Firebase. Not that there is anything wrong with the security itself, but up until the point of having to write those security rules, the product experience felt magical: so easy to use, only one app to maintain, pretty much.
But with the Firebase security rules, I now pretty much have half of a server implemented to get the rules working properly, especially for more complex lookups. And for those rules, the tooling simply wasn't as great as using TypeScript or the like.
I haven't used Firebase in years though, so I don't know if it has gotten easier.
Firebase needs something like RLS (row-level security). It needs to be real easy to write authorization rules in the database, in SQL (or similar), if you're going to have apps that directly access the database instead of accessing it via a proxy that implements authorization rules.
Firebase attracts teams that don't have the experience to stand up a traditional database - which at this point is a much lower bar thanks to tools like RDS. That is a giant strobing red light of a warning for what security expectations should be for the average setup. No matter what genius features the Firebase team may create, this was always going to be a support and education battle that Google wasn't going to fully commit to.
It also makes portability a pain. Switching from an app with Firebase calls littered through the frontend and data consistency issues to something like Postgres is a lengthy process.
I view the issue as more of a poor UX choice than anything else. Firebase's interface consists entirely of user-friendly sliders and toggles EXCEPT for the security rules, which are just a flimsy config file. I can understand why newer devs might avoid editing the rules as much as possible and set the bare minimum required to make warnings go away, regardless of whether they're actually secure or not. There should be a more graphical and user-friendly way to set security rules, and devs should be REQUIRED to recheck and confirm them before any other changes can be applied.