How Figma's databases team lived to tell the scale

bjornsing
63 replies
23h49m

One thought that comes up: Wouldn’t it be easier to have each customer in their own (logical) database? I mean, you don’t need transactions across different customers, right? So you’re essentially solving a harder problem than the one you’ve got.

Not sure postgres (logical) databases would scale that well, but I don't see a reason in principle why they couldn't. Has anyone explored this further?

ummonk
8 replies
20h22m

This seems to be an architecture Cloudflare is aiming to support with their SQLite service. One database per customer, each database located in the customer’s primary region.

jasonwatkinspdx
5 replies
20h12m

I think there's quite a few people chasing similar ideas, like Azure's Durable Entities.

I've been calling it the Lots of Little Databases model vs the Globe Spanning Gorilla.

Like the Spanner paper points out, even if your distributed database semantically appears like a single giant instance, in practice performance concerns mean developers avoid using distributed joins, etc, because these can lead to shuffling very large amounts of intermediate results across the network. So the reality ends up leaking through the illusion of being on a single giant machine, and people end up writing workarounds for distributed joins like async materialization.

If we give up the single machine illusion we get a lot of simplification, at the cost of features devs were unlikely to use anyhow. I see having consistent distributed commit but without cross shard joins as a really interesting alternative.

And besides scalability I like the extra security rope of fine grained partitioning from the start.

I'll write a blog post along these lines if I get anything worthwhile done.

themoonisachees
2 replies
19h3m

An advantage worth noting is that having actually separated databases means you physically can't perform these expensive operations, so a junior dev can't write incredibly inefficient code that would bring down your entire infra.

ummonk
0 replies
14h51m

Also makes it a lot harder for devs to hit some footgun and leak data across domains.

aledalgrande
0 replies
14h32m

Bring down the infra, or leave you footing a six-figure bill at the end of the month

ummonk
0 replies
14h45m

Interesting. Durable Entities strikes me as closer to Cloudflare's Durable Objects (both in name and in design as actors backed by persistent storage).

shalabhc
0 replies
16h5m

"Lots of Little Databases" reminded me of https://www.actordb.com/ which does lots of server-side sqlite instances, but the project now looks defunct.

fauigerzigerk
1 replies
9h49m

That would be fantastic. Unfortunately it's not true. D1 doesn't support the one database per customer approach unless you have just a handful of customers that you can set up manually.

You have to create each database manually using wrangler or the website. Then you have to create a binding for each database in wrangler.toml so that the database becomes accessible as a variable in your Workers code. Then you have to change your Worker source code to do something with that variable. Then you redeploy.

The issue is that Workers cannot create or list databases. There's no API for it.

ummonk
0 replies
40m

They have said they're working on some sort of solution (which they intend to have available by the time D1 exits beta) that will allow some sort of dynamic bindings specifically to address this use case.

In the meantime, though it's not a great solution, one can create and query databases through the REST API instead https://developers.cloudflare.com/api/operations/cloudflare-...
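
For reference, a rough sketch of what that can look like from Python. The endpoint paths follow my reading of the docs linked above, so verify them there; the account id, token, and database names are placeholders:

    import os
    import requests

    ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]     # placeholder env vars
    API_TOKEN = os.environ["CF_API_TOKEN"]
    BASE = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/d1/database"
    HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

    def create_tenant_db(tenant: str) -> str:
        # Create one D1 database per tenant and return its UUID.
        resp = requests.post(BASE, headers=HEADERS, json={"name": f"tenant-{tenant}"})
        resp.raise_for_status()
        return resp.json()["result"]["uuid"]

    def query_tenant_db(db_uuid: str, sql: str, params: list | None = None) -> dict:
        # Run a statement against that tenant's database over the REST API.
        resp = requests.post(
            f"{BASE}/{db_uuid}/query",
            headers=HEADERS,
            json={"sql": sql, "params": params or []},
        )
        resp.raise_for_status()
        return resp.json()["result"]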

tharkun__
8 replies
19h59m

Yes, we've been doing that at my place basically since the start. Each tenant is a schema in postgres. Works perfectly fine on the one hand, i.e. your tables don't grow to 'infinity' just because you're adding more and more tenants. If there's a particular tenant that has lots of data, only that tenant's indexes and tables grow huge and become slower because of that particular reason etc. If a tenant leaves, you keep the schema around for some time, so they can come back and then at some point you just drop the schema!

It does mean having to upgrade each schema individually, which also makes it both easier and harder. Easier because the tables are smaller, so any schema changes that require things like say a table lock, are locking for a smaller amount of time and won't affect more than the one tenant at a given time. It also means that you can get into an inconsistent state of course, where some of your tenants have all the latest DB upgrades, while it failed on another subset.

At some point Postgres's internal tables become a bit of a "problem", as you want to run as many of these updates in parallel as you can for speed, which could lead to contention on Postgres' administrative side. You'll also still need to shard across multiple actual RDS instances, because you still have many tenants running against a single physical piece of hardware that will show its limitations if too many large or active tenants happen to be on the same shard.
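
A minimal sketch of what that per-schema upgrade loop can look like; the DSN, schema filter, and example DDL are made up, and the hard part in practice is tracking which schemas succeeded:

    from concurrent.futures import ThreadPoolExecutor
    import psycopg
    from psycopg import sql

    DSN = "postgresql://app@shard-1/main"                                   # placeholder
    MIGRATION = "ALTER TABLE documents ADD COLUMN archived_at timestamptz"  # example DDL

    def tenant_schemas() -> list[str]:
        with psycopg.connect(DSN) as conn, conn.cursor() as cur:
            cur.execute(
                "SELECT nspname FROM pg_namespace "
                "WHERE nspname NOT LIKE 'pg_%' "
                "AND nspname NOT IN ('public', 'information_schema')"
            )
            return [r[0] for r in cur.fetchall()]

    def migrate(schema: str) -> None:
        with psycopg.connect(DSN) as conn, conn.cursor() as cur:
            cur.execute(sql.SQL("SET search_path TO {}").format(sql.Identifier(schema)))
            cur.execute(MIGRATION)   # short lock, affecting only this tenant's table

    # Parallelism helps when there are thousands of schemas, but too much of it
    # contends on Postgres's own catalog tables, as noted above.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(migrate, tenant_schemas()))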

And then you have the problem of keeping a record of which physical RDS instance (i.e. shard) the tenant is on. Your application code will need to look that up (and cache that info for some time ;)) and you have some choice there as well. I.e. do you shard those as well and juggle load balancing as basically a 1:1 mapping to shards or do you have your application layer connect to all database shards and handle any tenants? One is more complicated I would say while the other could run out of connections depending on how you need to scale the application layer and what kind of RDS instance you have.
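
A sketch of the second option (application layer connects to all shards) with the lookup cached; the directory table and DSNs are illustrative only:

    import time
    import psycopg

    DIRECTORY_DSN = "postgresql://app@directory-db/routing"    # tiny "who lives where" DB
    SHARD_DSNS = {
        1: "postgresql://app@shard-1/main",
        2: "postgresql://app@shard-2/main",
    }
    CACHE_TTL = 300   # seconds; a stale entry only matters while a tenant is being moved
    _cache: dict[str, tuple[int, float]] = {}

    def shard_for(tenant_id: str) -> int:
        hit = _cache.get(tenant_id)
        if hit and hit[1] > time.monotonic():
            return hit[0]
        with psycopg.connect(DIRECTORY_DSN) as conn, conn.cursor() as cur:
            cur.execute("SELECT shard_id FROM tenant_shards WHERE tenant_id = %s", (tenant_id,))
            shard_id = cur.fetchone()[0]
        _cache[tenant_id] = (shard_id, time.monotonic() + CACHE_TTL)
        return shard_id

    def connect_for(tenant_id: str):
        # Every request for this tenant lands on the shard that holds its schema.
        return psycopg.connect(SHARD_DSNS[shard_for(tenant_id)])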

Syntaf
5 replies
19h7m

I recall a popular rails gem[1] once upon a time that provided multi-tenancy via postgres schemas.

As it turns out, even the company that initially developed the gem ended up ditching the approach due to some of the issues you outlined above.

Managing separate schemas feels like one of those insidious decisions that make things simple _initially_ but get you into a world of hurt when you need to scale. The company has since gone defunct, but they have an article where they discuss why they ditched the approach [2]. TL;DR: it's too difficult to maintain and scale.

[1] https://github.com/influitive/apartment#tenants-on-different... [2] https://web.archive.org/web/20201108191323/https://influitiv...

tharkun__
3 replies
18h9m

Let's address these one by one based on our experience (at least the part of the journey that I've been there for; the implementation of the solution predates me, but I live with it).

Migrations

    Things slow down past 100 [...] No one wants friction in their deployment process, especially as we’re attempting to deploy daily or more frequently.
We have more than a thousand schemas per database shard. That is why I said you want things to run in parallel for speed, yes. However, we deploy way more than daily and it's not really an issue in that sense. Schema updates are not that frequent but you do of course need to take into account that you will have to make the schema updates as a separate PR, wait and check that it worked and then deploy your actual change making use of the changes in application code. Which honestly isn't much different from ensuring that your BE changes and FE changes are compatible or made in the right order so that you don't get failed requests because an old FE happens to call a new BE node or vice versa :shrug:

Database resources

    "Need too large of an instance" r3.4xl
We seem to have m5.large. That has 2 virtual cores. Some r5.larges etc. Their r3.4xl has 16?! So not sure what kind of load pattern they have :shrug:

Client Memory Bloat

    Ruby ActiveRecord something
Yeah well, we don't have that, so not sure what to say :shrug: Definitely not a generalized reason to say "can't do, is too complicated for scaling".

Record Identification

    One major drawback of schemas as tenants is that your sequence generators will reside independently in each tenant. 
I respectfully disagree. This is a great advantage because it means just because you get more and more tenants (some of which churn and you throw away their data anyway) your identifiers don't grow past limits as easily. In fact, in most cases the identifiers never run out of runway at all.

They complain about "what if you need to join this data somewhere else, then you need to also add the tenantId". Yeah, so? We also have a data warehouse where we do just that. Not a problem at all. We also have other services than our main one, which do use different database technologies where we use tenant as part of the key (for loads that actually benefit from being in a NoSQL type DB) and there we do not have sharding other than what the NoSQL database does by itself so to speak by dividing the keyspace.

That's it. End of article. Basically, we have none of these issues. They don't mention their actual scale. The only number they mention is the "hundreds of schemas". We have more than ten times that number per physical shard and have tens of thousands of tenants total. Again :shrug:

nightpool
2 replies
16h48m

That's still 2 orders of magnitude smaller than the scale of Figma—they would need to somehow manage millions of Postgres schemas. I don't think it's a realistic possibility

tharkun__
1 replies
1h25m

https://news.ycombinator.com/item?id=39711815 Shopify seems to have millions of customers. Granted, they use MySQL and it might not be database schemas but something more MySQL-specific. In general though ...

nightpool
0 replies
3m

"We can attach a shop_id to all shop-owned tables and have it serve as a sharding key. Moving a shop from one shard to another involves selecting all records from all tables that have the desired shop_id and copying them to another MySQL shard. For this post, it’s helpful to think of each pod as a MySQL shard."

Sounds like this is just normal sharding. Completely different from having a totally separate schema like GP proposes (with that approach you don't use sharding keys at all).

cqqxo4zV46cp
0 replies
19h0m

This is the conclusion I came to when faced with the same quandary.

neeleshs
0 replies
18h16m

This is a very common approach and scales quite well. I worked for a company that had thousands of customers and each had their own schema. A single master database kept track of which customer was on what physical db cluster, and this was globally replicated (EU, ANZ, NA).

Certainly needs a bunch of tooling, but worked well. Some apps were stateless and could connect to any physical cluster. Some others were sticky and only connected to a subset.

Similar architecture in my current company as well and we serve nearly a thousand customer instances served across 4 physical clusters.

We do have some basic tools to provision new customers on the emptiest cluster, move customers from one cluster to another etc

paxys
7 replies
20h32m

This works great until (1) your largest customer outgrows the largest available DB (happens sooner than you'd think for large companies) or (2) you do need transactions across different customers, say to facilitate some kind of sharing. Going all-in on the isolated tenant strategy means when you hit one of these cases it's a nightmare to unwind and rearchitect your entire DB layer.

jasonwatkinspdx
2 replies
19h50m

I'd respond by saying (1) is more rare than you're asserting.

There is a huge long tail of companies with datasets that won't fit on a single machine but can be handled by a dozen or so, and where no single customer or service is near the limits of an individual node.

These customers are poorly served at the moment. Most of the easy to implement SaaS options for what they need would be hugely costly vs a small fleet of db servers they administer. Meanwhile, most of the open source options are cargo culting Google or Facebook style architecture, which is a huge burden for a small company. I mean do you really want to run K8S when you have 10 servers in total?

I think there's a lot of interesting stuff happening in this end of the market that's not trying to be a mini Google, like Fly.io.

As for (2), I think a middle ground is supporting cross shard transactions but not joins. This works well enough for VoltDB et al.

bastawhiz
1 replies
17h53m

that won't fit on a single machine

It's rarely an issue with the number of bytes and more an issue with hot shards. Whatever shard Google is on (that is, Google the Figma customer) is surely going to be one _hell_ of a hot shard. They have more designers/engineers/PMs actively using Figma than most startups have _users total_. You don't need more than one really really hot customer for this to become more than a hypothetical problem.

When you start to think about it that way, suddenly you need to seriously consider your iops (if you're on RDS) or how much redundancy that physical machine's SSDs have (if you're running it on your own boxes).

ndriscoll
0 replies
13h37m

Google still only has ~180k employees, and obviously not all of them use figma, and obviously not all of their figma users are performing actions simultaneously. I'd be surprised if it broke 10k QPS (would an org like Google even have 10k peak active user sessions? Seems doubtful). Human generated traffic tends to just not reach that large of scales unless you're trying to fit the entire planet on one instance.

RDS can be absurdly limited with IOPS, granted, but a modern laptop for example ought to be up to the task. Realistically you could probably be fine even on RDS but you might need to pay through the nose for extra IOPS.

zarzavat
1 replies
19h38m

Figma is more or less a desktop application that happens to run in a web browser.

If I use Photoshop to edit a .psd file I don’t think “man that psd file should really be stored in a single planet-sized database of all psd files in existence”. It’s just a file on my computer.

Figma requires a little bit more intermingling of data than Photoshop, it has multiuser support for one, so a pure local storage based approach wouldn’t work. But, at its heart it’s still based on the document model. When you open a Figma document it’s its own isolated little universe, the connections with resources outside the document are limited, and that matches user expectations.

joegibbs
0 replies
8h44m

Agreed - it's not like you'd ever have to do SELECT * FROM Shapes WHERE type = 'circle'. Could they have stored each document as a file, stored references to the files in the database, opened the file in the backend when someone opened it in the frontend, and written it back when they've stopped editing it?

winrid
0 replies
20h12m

A figma customer won't exceed the requirements of an i3.metal...

jddj
0 replies
19h51m

Can you give an example of when a single customer has outgrown the largest available DB?

jakewins
5 replies
23h42m

The problem - conceptually - is made much simpler this way; we make use of this at work.

However you will still have shared resource problems - a rogue query that destroys IOPS in one tenant ends up bringing down all tenants, etc. There are in theory databases that solve this as well, but my experience has been that at that point what you buy into is a bad version of resource sharing - ie what an operating system does - and you're better off using OS mechanisms

In other words: yes, but you still have noisy neighbours, and may be better off running lots of small fully separated DBMSes than a big logically separated one

hacker_newz
4 replies
20h44m

If tenants are on separate databases how would that be an issue?

sgarland
2 replies
20h19m

Because database, in Postgres terms, doesn’t mean physical node. It’s more akin to a VM than anything else. The term for an installation of Postgres is database cluster, which can contain N databases, each of which can contain M schemas.
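
In code, the nesting looks something like this; connection strings and names are placeholders:

    import psycopg

    # One Postgres installation ("cluster") holds many databases...
    admin = psycopg.connect("postgresql://postgres@localhost/postgres", autocommit=True)
    admin.execute("CREATE DATABASE tenant_acme")   # CREATE DATABASE can't run in a transaction
    admin.close()

    # ...each database holds many schemas, and each schema holds the tables.
    with psycopg.connect("postgresql://postgres@localhost/tenant_acme") as conn:
        conn.execute("CREATE SCHEMA billing")
        conn.execute("CREATE TABLE billing.invoices (id bigint PRIMARY KEY, total numeric)")
        conn.execute("SELECT count(*) FROM billing.invoices")   # schema-qualified name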

albert_e
1 replies
18h50m

Thanks! Is there a good primer to this terminology that clarifies these terms on various popular database and cloud platforms?

It seems there is good potential for confusion unless we use the same terms consistently when discussing architecture and design across teams.

Even the term RDS (Relational Database Service) is sometimes said to be inaccurate since it is a "Relational Database SERVER as a Service"

A few related terms that cause confusion:

"Schema" could refer to a Database Schema or in some contexts, a single table's logical data structure. (Or a single data set's data structure -- or a standard for one, like JSON Schema)

Data Catalog products like "AWS Glue Data Catalog", which only store the metadata or schemas of the tables they crawl, refer to the entities they store as "Databases" and "Tables" (but no "Schemas"), and the documentation includes guides that talk about "creating a database"[1] and "creating a table"[2] in AWS Glue. There has to be a better way to refer to all these entities without using the word schema with so many meanings -- or calling both physical tables and their metadata "tables". Otherwise this is needlessly confusing and hard for beginners.

---

EDIT: even more madness. This StackOverflow discussion [3] has more examples of confusing usage:

On Oracle .... A Schema is effectively a user. More specifically it's a set of tables/procs/indexes etc owned by a user. Another user has a different schema (tables he/she owns), however a user can also see any schemas they have select privileges on.

MySQL: database/schema :: table

SQL Server: database :: (schema/namespace ::) table

Oracle: database/schema/user :: (tablespace ::) table

[1]: https://docs.aws.amazon.com/glue/latest/dg/define-database.h...

[2]: https://docs.aws.amazon.com/glue/latest/dg/tables-described....

[3]: https://stackoverflow.com/questions/298739/what-is-the-diffe...

sgarland
0 replies
18h0m

Not AFAIK. In MySQL, the only time “cluster” is used is to refer to NDB Cluster, which is a distributed DB product. Schema means a logical grouping of tables, the same (more or less) as Postgres.

As to schema itself, yes, it's heavily overloaded, and you just have to grok it from context. I can talk about a table's schema or a DB's schema, or as you mentioned, JSON schema. Although the latter is helped by simply not using JSON in a relational DB.

You must remember that SQL is an ancient language, and the relational model is even older. There are going to be oddities.

jakewins
0 replies
20h24m

Separate logical databases, within the same RDBMs, so sharing CPU, disks, RAM etc

eatonphil
5 replies
23h44m

I imagine it only gets you so far. What do you do about customers like Walmart or Oracle? Hundreds, if not thousands, of users all leaving hundreds of comments on Figma files every day. If you want good latency without giving up strong consistency (which the article says they want) you'll need to keep sharding.

willsmith72
1 replies
23h21m

A single db can handle that load easily

paulryanrogers
0 replies
17h37m

Do you mean an RDBMS running on a single, big-iron, bare-metal server?

jeremyjh
1 replies
22h1m

They were running their whole business on one RDS instance 4 years ago. Do you think they now have one customer larger than all their customers combined 4 years ago?

eatonphil
0 replies
21h58m

Figma’s database stack has grown almost 100x since 2020

The first sentence from the article seems to suggest it's possible?

infecto
0 replies
21h6m

I bet it gets you further than you imagine. Entirely depends on the backend services and what they touch, but in this scenario you would be deploying/scaling that service based on the customer seat size. I suspect that even for large enterprise customers, the users actively touching Figma are not reaching the thousands, but I am happy to be wrong.

After all, Stackoverflow is running off of a handful of machines.

menthe
4 replies
18h44m

This, seriously. The long-term maintenance, tribal knowledge & risks associated with this giant hack will be greater than anything they'd ever have expected. Inb4 global outage post-mortem & key-man dependency salaries.

There's virtually no excuse not to spin up a pg pod (or two) for each tenant - heck, even a namespace with the whole stack.

Embed your 4-phase migrations directly in your releases / deployments, slap on a py script to manage progressive rollouts, and you're done.

Discovery is automated, blast / loss radius is reduced to the smallest denominator, you can now monitor / pin / adjust the stack for each customer individually as necessary, sort the release ordering / schedule based on client criticality / sensitivity, you can now easily geolocate the deployment to the tenant's location, charge by resource usage, and much more.

And you can still query & roll-up all of your databases at once for analytics with Trino/DBT with nothing more but a yaml inventory.
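
A stripped-down version of that roll-up, without Trino/dbt, just to show the shape of it; the file name, keys, and query are made up:

    import psycopg
    import yaml   # pyyaml

    with open("tenants.yaml") as f:
        inventory = yaml.safe_load(f)   # e.g. [{"tenant": "acme", "dsn": "postgresql://..."}]

    ROLLUP = "SELECT count(*) AS docs, max(updated_at) AS last_active FROM documents"

    rows = []
    for entry in inventory:
        with psycopg.connect(entry["dsn"]) as conn, conn.cursor() as cur:
            cur.execute(ROLLUP)
            docs, last_active = cur.fetchone()
            rows.append({"tenant": entry["tenant"], "docs": docs, "last_active": last_active})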

No magic, no proprietary garbage.

nightpool
3 replies
16h51m

Figma has millions of customers. The idea of having a Postgres pod for each one would be nearly impossible without completely overhauling their DB choice.

menthe
1 replies
14h10m

You are making a major conflation here. While they do have millions of users, they were last reported to only have ~60k tenants.

Decently sized EKS nodes can easily hold nearly 800 pods each (as documented), that'd make it 75 nodes. Each EKS cluster supports up to 13,500 nodes. Spread in a couple of regions to improve your customer experience, you're looking at 20 EKS nodes per cluster. This is a nothingburger.

Besides, it's far from being rocket science to co-locate tenant schemas on medium-sized pg instances, monitor tenant growth, and re-balance schemas as necessary. Tenants' contracts do not evolve overnight, and certainly do not grow by orders of magnitude on a week-over-week basis - a company using Figma either has 10 seats, 100 seats, 1000, or 10,000 seats. It's easy to plan ahead for. And I would MUCH rather have to think about re-balancing a heavy hitter customer's schema to another instance every now and then (it can be 100% automated too) than face a business-wide SPOF and have to hire L07+ DBAs to maintain a proprietary query parser / planner / router.

Hell, OVH does tenant-based deployments of Ceph clusters, with collocated/coscheduled SSD/HDD hardware and does hot-spot resolution. And running Ceph is significantly more demanding and admin+monitoring heavy.

nightpool
0 replies
4h29m

Reported where? Does that include a monolithic "free tenant" that would be larger than thousands of their other tenants put together? Every Figma user has their own personal workspace along with the workspace of any organizations they may be invited to

asguy
0 replies
14h20m

Thousands of directories over thousands of installations? It’s not that far fetched.

danpalmer
3 replies
21h23m

store billions of databases

This is sort of true and sort of false. When you think of a "database", if you're thinking of a Postgres database, you're way off the reality of what "database" means here.

FoundationDB has a concept called "layers", and essentially they have created a layer that looks like a separate database on top of a layer that is separately encrypted groups of keys. They don't have billions of FoundationDB clusters or machines, and at the infra level, i.e. the instances of the FoundationDB server software, it's unaware of individual "databases".

A closer analogy would be like having billions of tables, but even that isn't accurate because in relational databases a table is usually a notably more static concept than data in a table. The closest analogy would be that each of the billions of users has a bunch of rows with a user-id field on them, and there's a proxy that filters everything such that you can view the table as if it only had one user's data in it.

To be clear, FoundationDB is awesome and Apple have done some really cool stuff with it, but it's less crazy/impressive than it sounds.
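
A toy illustration of what a "database" means in that layered model: just a key prefix in one shared ordered key-value store, with a thin wrapper that scopes every operation to the prefix. FoundationDB's directory layer is the production-grade version of this idea; the code below is only an in-memory sketch with made-up names:

    class KV:
        # Stand-in for the one big shared key-value store.
        def __init__(self):
            self.data: dict[bytes, bytes] = {}

    class LogicalDatabase:
        # Everything this "database" does is confined to its key prefix.
        def __init__(self, kv: KV, name: str):
            self.kv = kv
            self.prefix = name.encode() + b"/"

        def set(self, key: str, value: bytes) -> None:
            self.kv.data[self.prefix + key.encode()] = value

        def get(self, key: str) -> bytes | None:
            return self.kv.data.get(self.prefix + key.encode())

        def scan(self) -> dict[bytes, bytes]:
            # In a real ordered store this is a cheap range read over the prefix.
            return {k: v for k, v in self.kv.data.items() if k.startswith(self.prefix)}

    kv = KV()
    user_a = LogicalDatabase(kv, "user:42")
    user_b = LogicalDatabase(kv, "user:43")
    user_a.set("doc:1", b"hello")
    assert user_b.get("doc:1") is None   # each "database" only ever sees its own keys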

dexwiz
2 replies
20h37m

This sounds like a pretty standard multitenant datastore. Everything has a user/group Id on it, and a logical layer that locks a connection to a specific group.

pests
0 replies
16h30m

FoundationDB is really just a set of structures and tools to build any type of database you want on top of a solid foundation.

"FoundationDB decouples its data storage technology from its data model. FoundationDB’s core ordered key-value storage technology can be efficiently adapted and remapped to a broad array of rich data models. Using indexing as an example, FoundationDB’s core provides no indexing and never will. Instead, a layer provides indexing by storing two kinds of key-values, one for the data and one for the index."

https://apple.github.io/foundationdb/layer-concept.html

Then there are existing standard layers like the Record layer, which is "(very) roughly equivalent to a simple relational database", providing structured types, indexes, complex types, queries, etc.

https://github.com/FoundationDB/fdb-record-layer

Or one for documents, which speaks the MongoDB wire protocol

https://github.com/FoundationDB/fdb-document-layer

danpalmer
0 replies
19h50m

Yeah, the advantage or difference here is that these "layers" are a common design pattern with FoundationDB, several ship in FDB by default, and you're encouraged to make more, so the database certainly has better support than just adding a column for TenantID, but still you're right that it's not too out there.

aeyes
3 replies
23h18m

I have pondered about this for quite some time and came to the conclusion that it would make schema migrations more difficult to handle. I think Shopify is using an approach which is similar to what you are describing. The advantage is that you don't end up with hot shards because you can move around large customers independently.

In practice there isn't a big difference, they just colocate several customers according to their sharding key in the same logical database.

jamesfinlayson
1 replies
20h11m

I remember Postgres table spaces being used to separate customers at a previous job - I can't remember how migrations were handled (pretty sure they were applied per table space) but I don't think it was a problem (at our scale anyway).

vmfunction
0 replies
19h7m

There is usually not much point in making more than one tablespace per logical file system, since you cannot control the location of individual files within a logical file system. However, PostgreSQL does not enforce any such limitation, and indeed it is not directly aware of the file system boundaries on your system. It just stores files in the directories you tell it to use.

seems like the limitation is the logical file system. Which probably will work for most users.

yard2010
0 replies
19h59m

I worked in a place that had a database for each tenant and the schema migrations were a real pain. Every time, everything goes smoothly except for a few databases with an edge case that screws up the whole migration.

icedchai
2 replies
19h54m

I worked at a place that did this with MySQL. Every tiny, trial account got their own database. Every huge, actual customer got their own database. Migrations were kinda painful. I would think carefully about doing it this way.

freefaler
1 replies
19h23m

we've been doing this for 20k databases with mysql for the last 10+ years. It solves more problems than it creates. Migrations are trickier, but you get sharding, data isolation and easier backups that way.

icedchai
0 replies
19h14m

I'm not saying it's always a bad idea, you just need to think about what you're doing. This was closer to 15 years ago now. We had to develop a bunch of our own tooling, and make our own modifications to frameworks that are now ancient history.

petervandijck
0 replies
19h46m

Ongoing product development with migrations, tweaking indexes etc. becomes really hard. Every small database tweak now has to be deployed over 1000s of databases.

jimbokun
0 replies
18h13m

If customers are not of similar size, and you have a lot of customers, managing all the different databases can be a big headache.

Having a more granular partition key makes it easier to shard the data more evenly.

infra_dev
0 replies
22h43m

Nile is a Serverless Postgres that virtualizes tenants/customers. It is specifically built for SaaS companies similar to Figma https://www.thenile.dev/. I am the CEO of Nile.

erikpukinskis
0 replies
17h33m

I'm curious for anyone who has done this: what's the point of going all the way from one-db to one-db-per-customer? Why not just split the customers across 2 databases, then 3, etc.? Seems like the same level of system complexity but you avoid the lots-of-databases scaling problem.

You probably don’t even need to be able to migrate people between shards… just put everyone on one db until you hit 50% utilization, then spin up a fresh db and put all new customers on that one. Repeat whenever you run out of databases under 50%.

cavisne
0 replies
20h18m

Spanner's interleaved tables seem like a similar solution, ie you interleave customer data so it all ends up on the same set of hosts for performance, while still having the ability to create transactions across customers.

N_A_T_E
0 replies
22h57m

It sounds like they actually did something like this. Their shard key selection could be customer, project, folder or something in their data model at a reasonably high logical level in their hierarchy.

GordonS
0 replies
22h23m

I've done something like this before, with each customer getting their own schema within a single Postgres instance.

nosefrog
27 replies
1d

Coming from Google, where Spanner is this magical technology that supports infinite horizontal sharding with transactions and has become the standard storage engine for everything at Google (almost every project not using Spanner was moving to Spanner), I'm curious how Figma evaluated Cloud Spanner. Cloud Spanner does have a postgres translation layer, though I don't know how well it works.

It seems like they've (hopefully only temporarily) given up real transactional support with their horizontal postgres scheme?

umvi
12 replies
23h57m

Never a good idea to rely on Google proprietary tech (unless you are Google)... it could be sunset at any time without warning. I use GCP but I try my best to stay Google agnostic (avoid GCP-only offerings, etc) so that I can move to AWS if Google pulls the rug out from under me.

rockostrich
7 replies
22h59m

GCP products have a much better track record than Google consumer products when it comes to support since there are usually enterprise customers with multi-year contracts worth tens, if not hundreds, of millions of dollars using them.

callalex
3 replies
22h36m

That’s what AppEngine customers thought.

weitendorf
1 replies
22h28m

I worked on AppEngine (well, Serverless Compute) at Google for over 4 years and left on Friday. Did something happen to AppEngine in the last week?

callalex
0 replies
22h16m

I’m talking about the pricing disaster that happened in 2017/2018 where user prices went up 10x-100x because Google wanted to kill the product without actually killing it.

tempnow987
0 replies
19h14m

Didn't they also hammer folks using google maps in a business / cloud context?

I was also an old app engine user, but bailed ages ago (original app engine).

jerrygenser
1 replies
20h53m

IoT is one example of a big backbone service that was sunset.

endisneigh
0 replies
20h38m

It had barely any usage though from what I can tell from searching about it.

Not that it’s any solace to those affected.

marcinzm
0 replies
20h3m

AI Platform is a recent example that’s been deprecated.

weitendorf
3 replies
22h12m

I'm biased having worked on GCP, but I think GCP actually has a very good track record of not sunsetting entire products or removing core functionality. When I worked on AppEngine, I would often see apps written 10+ years ago still chugging along.

It is true though that GCP sometimes sunsets specific product functionality, requiring changes on the customers' part. Some of these are unavoidable (eg committing to apply security patches to Python 2.7 given that the rest of the world is mostly not upstreaming these patches anymore), but not all of them.

I would certainly use Cloud Spanner externally now that I've left the company, and IMO spanner has such a compelling featureset that it's a strong reason to use GCP for greenfield development. The problem though is that it's only available on GCP, and could become expensive at large scale.

nojvek
1 replies
17h33m

My previous employer was a GCP customer. Google did pull occasional shenanigans totally breaking us with random upgrades without notifying us. Their support wouldn’t acknowledge it was their fault until we were mad at them.

My newer employer is AWS. Their offerings are a lot more stable and support is helpful.

If you want to build a serious business I would avoid GCP. Google doesn’t really give a shit about winning Cloud. Ads is their core business.

weitendorf
0 replies
16h15m

Ugh I'm so sorry, you should definitely have been notified in advance about breaking changes, unless they were accidental bugs and regressions. That was something we took pretty seriously for the products I worked on.

I have definitely enjoyed using AWS support. But I've also encountered some broken/janky functionality (eg using custom domain + api gateway + websockets + Lambda for the apparently niche-task of having a real website use the Lambda-websockets integration) on AWS that I don't think would have made it to production at Google. I also really dislike how they handle project-level logging compared to how it's done on GCP.

Ultimately I do think some GCP products like Bigquery, Cloud Run (I'm biased here), and Spanner as well as general DevEx/reliability factors are compelling enough for GCP to be worth serious consideration, even if AWS offers better support.

jerrygenser
0 replies
20h52m

It is true though that GCP sometimes sunsets specific product functionality, requiring changes on the customers' part. Some of these are unavoidable (eg committing to apply security patches to Python 2.7 given that the rest of the world is mostly not upstreaming these patches anymore), but not all of them.

A good example is probably IoT. I've heard first hand anecdotes of very difficult migrations off this service.

shalabhc
2 replies
23h49m

Global consistency is expensive, both latency-wise and cost-wise. In reality most apps don't need global serializability across all objects. For instance, you probably don't need serializability across different tenants, organizations, workspaces, etc. Spanner provides serializability across all objects IIUC - so you pay for it whether you need it or not.

The other side of something like Spanner is the quorum-based latency is often optimized by adding another cache on top, which instantly defeats the original consistency guarantees. The consistency of (spanner+my_cache) is not the same as the consistency of spanner. So if we're back to app level consistency guarantees anyway, turns out the "managed" solution is only partial.

Ideally the managed db systems would have flexible consistency, allowing me to configure not just which object sets need consistency but also letting me configure caches with lag tolerance. This would let me choose trade-offs without having to implement consistent caching and other optimization tricks on top of globally consistent/serializable databases.

foota
0 replies
21h58m

While it doesn't help much with the costs of replication, Spanner can be configured with read only replicas that don't participate in voting for commits, so they don't impact the quorum latency.

Reads can then be done with different consistency requirements, e.g., bounded staleness (which guarantees data less stale than the time bound requested).

See https://cloud.google.com/spanner/docs/reference/rest/v1/Tran... or https://cloud.google.com/spanner/docs/reads#read_types and https://cloud.google.com/spanner/docs/create-manage-configur...
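
Roughly what that looks like with the Python client; parameter names are from memory, so double-check against the docs above, and the instance/database/table names are placeholders:

    import datetime
    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance("my-instance").database("my-db")

    # Strong read (default): sees the quorum-consistent state.
    with database.snapshot() as snap:
        strong = list(snap.execute_sql("SELECT id, title FROM files WHERE org_id = 'acme'"))

    # Bounded staleness: may be served by a nearby read-only replica, as long as
    # it is no more than 15 seconds behind, trading freshness for latency.
    with database.snapshot(max_staleness=datetime.timedelta(seconds=15)) as snap:
        stale_ok = list(snap.execute_sql("SELECT id, title FROM files WHERE org_id = 'acme'"))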

opportune
2 replies
22h48m

My perspective from working both inside and outside of Google:

The external spanner documentation doesn’t seem as good as the internal documentation, in my opinion. Because it’s not generally well known outside of G, they ought to do a better job explaining it and its benefits. It truly is magical technology but you have to be a database nerd to see why.

It’s also pretty expensive and because you generally need to rewrite your applications to work with it, there is a degree of lockin. So taking on Spanner is a risky proposition - if your prices get hiked or it starts costing more than you want, you’ll have to spend even more time and money migrating off it. Spanner’s advantages over other DBs (trying to “solve” the CAP theorem) then become a curse, because it’s hard to find any other DB that gives you horizontal scaling, ACID, and high availability out of the box, and you might have to solve those problems yourself/redesign the rest of your system.

Personally I would consider using Cloud Spanner, but I wouldn’t bet my business on it.

nojvek
1 replies
17h31m

If you really have that much data and traffic, the $ costs start to add up to multiple engineer comp costs. At that point it’s cheaper to move to something you have good control over.

I.e. sharding at the application layer and connecting to the DB instance replica where the customer data is hosted.

weitendorf
0 replies
15h44m

Depends. The cost may pay for itself but the engineers you have already may have higher ROI things to do. It's also nice to have operational stuff managed for you. Personally I'd be happy to pay extra for the kinds of problems Spanner solves to free myself up to do other things (to a point, ofc).

sharding at application layer and connecting to the DB instance replica where the customer data is hosted.

Spanner does global consistency/replication. If having good performance per-tenant globally is a concern, this helps a lot, and is hard to implement on your own. It can also ultimately save you money by limiting cross-region traffic.

hipadev23
2 replies
1d

The problem is nobody outside Google trusts them to run or operate anything.

Edit: To the Googlers downvoting these comments. Your behavior only reinforces our views.

szundi
1 replies
1d

Google means: good chance discontinued after you have worked out the bugs and have a stable system at last

groestl
0 replies
23h18m

This is definitely not the case with their core cloud products.

almost every project not using Spanner was moving to Spanner

This even includes Datastore. Even Datastore moved to Spanner.

echelon
2 replies
1d

Why would anyone marry themselves to Google? That sounds like the most boneheaded move possible.

First, you should never be beholden to a single vendor for the most critical technological underpinnings. You're backing yourself into a corner.

But more importantly, Google can't even figure out how to prioritize their cloud efforts. That's not a good partnership to be in for anyone except Google.

I wouldn't care if my solution was 10x worse than Cloud Spanner from a technology perspective. It'd be 1000x better from a strategy perspective.

You can hire engineers to do consistency at scale. It's your core competency and you can't just handwave and outsource that, lest you wind up stuck in a crevasse. Hire smart engineers and do the work yourself. It'll pay off.

Thaxll
1 replies
21h12m

You can't, that's why Spanner only exists at Google. That tech is that good, not found anywhere else.

echelon
0 replies
19h52m

But you don't need it. There are a limitless number of ways to deal with the problems spanner solves. And by choosing your own, you can focus on the subset that matter to your case and not be tied down to Google.

zenbowman
0 replies
21h2m

I wouldn't say "infinite", its still susceptible to read hotspotting; and while fine-grained locking enables generally higher write throughputs, you can still get in a situation where interconnected updates end up being pretty slow.

That said, it's way better than anything else I've used in my career.

ksb
0 replies
1d

Ping time from AWS data centers to GCP ones

adastral
23 replies
23h47m

I see they don't mention Citus (https://github.com/citusdata/citus), which is already a fairly mature native Postgres extension. From the details given in the article, it sounds like they just reimplemented it.

I wonder if they were unaware of it or disregarded it for a reason —I currently am in a similar situation as the one described in the blog, trying to shard a massive Postgres DB.

_boffin_
7 replies
23h43m

how "massive" is massive in your case?

dijit
5 replies
23h36m

I've had CitusDB running across 68 bare metal machines (40 vCPU, 768GiB ram, 20TiB of storage each + 40Gb network links) and it ran decently well.

Not sure what your definition of massive is, I think Spanner would easily beat it.

Also, it's very use-case dependent, you can't "just use" Citus for everything, it's not quite as flexible as a bog-standard pgsql install due to the way its sharding works; you have to be a tad more careful with your data model.

VHRanger
2 replies
16h28m

Is there a reason there's comparatively little storage in your machines in relation to RAM or even CPUs?

Do your machines do compute heavy loads or something?

For a DB I'd expect a lot more storage per node

dijit
1 replies
14h15m

NVMe SSDs aren't so large unfortunately.

a 1U server has capacity for 8 drives; we used 2 slots for the OS (RAID1) and 2 slots for the WAL volume, leaving only 4 slots in RAID10.

So I'm already cheating a little and claiming WAL storage was part of total storage.

VHRanger
0 replies
5h35m

Shouldn't you try to get something with PCIe bifurcation in this case?

I doubt you're saturating the PCIe bus bandwidth on any of them?

I imagine your DB is extremely high performance, though!

skunkworker
1 replies
21h33m

What is your definition of "decently well", and is your primary cluster (without replicas) above 1PB?

simonw
0 replies
20h48m

They said 20TiB * 68, which I think is 1.5PB.

adastral
0 replies
23h36m

Around ten heavily-updated (50-400k updated rows/min) tables ranging between 500M and 5B rows, with a couple tables over 40B rows each (5TB each IIRC).

junto
5 replies
23h25m

Before clicking on the article I assumed it was Citus, and was surprised when it wasn’t.

Maybe because CitusData was bought by Microsoft around the same time, so Microsoft could create “Azure Cosmos DB for Postgres Cluster”, yet another one of Microsoft’s typical product naming crapshoots.

victor106
4 replies
22h22m

yet another one of Microsoft’s typical product naming crapshoots.

Well said. I haven't seen any company as terrible as Microsoft at naming things. Anyone know why?

salynchnew
1 replies
21h8m

Naming things is hard.

At a previous employer, I saw several cool-ish open source projects instantly doomed to obscurity by picking a name that either completely duplicated the name of an existing OSS project or were guaranteed to have terrible SEO for another reason.

However, Microsoft seems to have a unique crossover of fragmented business units and centralized marketing. That's why you end up with Azure -> Subproject -> Actual Product/Service word soup. Previously, they did this with the Windows Live brand from 2005-2012, and "Xbox" for a wide range of gaming projects (many of which were on PC).

studmuffin650
0 replies
21h22m

AWS is putting up a good fight

r-bar
0 replies
19h18m

The committee wanted Cosmos, Azure, and Postgres all in the name and wouldn't compromise.

jabart
2 replies
23h4m

Figma uses AWS RDS, and RDS doesn't list Citus as a supported extension.

gen220
1 replies
21h54m

This is my guess of why they didn't use Citus. They weren't interested in the options of (1) going multi-cloud [DB in Azure Cosmos / Backend(s) in AWS] (2) going all-in on Azure [DB in Azure Cosmos / Backend(s) in Azure] (3) self-managing Postgres+Citus in EC2.

It'd be interesting to compare the expected capex of developing this in-house solution + the opex of maintaining it vs the same categories of expected costs for option (3) – because I imagine that's probably the most palatable option.

They also may have pre-paid for dedicated RDS instances for the next X years (before this horizontal scaling initiative began, to boot), as AWS allows companies to do this at a pretty steep discount rate, which would probably tilt them away from (3).

sgarland
0 replies
20h27m

Especially because Option 3 lets you go waaaay farther on vertical scaling, since you can get native NVMe drives (they mentioned hitting IOPS limits for RDS), more exotic instance classes with far more RAM, and do stuff like ZFS for native compression and snapshots.

gshulegaard
2 replies
22h38m

I have worked on teams that have both sharded and partitioned PostgreSQL ourselves (somewhat like Figma) (Postgres 9.4-ish time frame) as well as those that have utilized Citus. I am a strong proponent of Citus and point colleagues in that direction frequently, but depending on how long ago Figma was considering this path I will say that there were some very interesting limitations to Citus not that long ago.

For example, it was only 2 years ago that Citus allowed the joining of data in "local" tables and data retrieved from distributed tables (https://www.citusdata.com/updates/v11-0). In this major update as well, Citus enabled _any_ node to handle queries; previously all queries (whether or not they were modifying data) had to go through the "coordinator" node in your cluster. This could turn into a pretty significant bottleneck which had ramifications for your cluster administration and choices made about how to shape your data (what goes into local tables, reference tables, or distributed tables).

Again, huge fan of Citus, but it's not a magic bullet that makes it so you no longer have to think about scale when using Postgres. It makes it _much_ easier and adds some killer features that push complexity down the stack such that it is _almost_ completely abstracted from application logic. But you still have be cognizant of it, sometimes even altering your data model to accommodate.

gen220
1 replies
21h50m

You also benefit from the tailwind of the CitusData team making continued improvement to the extension, whereas an in-house system depends on your company's ability to hire and retain people to maintain + improve the in-house system.

It's hard to account for the value of benefits that have yet to accrue, but this kind of analysis, even if you pretty heavily-discount that future value, tilts the ROI in favor of solutions like Citus, IMO. Especially if your time horizon is 5+ or 10+ years out.

Like you said, if they made this decision 3ish years ago, you would have had to be pretty trusting on that future value. A choice, made today, hinges less on that variable.

wayne-li2
0 replies
21h4m

Huh, I would have thought the opposite. Companies at Figma size are easily able to hire talent to maintain a core part of their engineering stack. On the other hand, they retain no control of Citus decision making. Those tailwinds could easily have been headwinds if they went in a direction that did not suit Figma.

sgarland
0 replies
23h23m

I thought of that as well. The only thing I could think of is that they mentioned that they don't want to move off of RDS, and there is 0% chance of Citus coming to AWS since Microsoft bought them.

iamdanieljohns
0 replies
23h37m

I would love to see a comparison of the major PostgreSQL services such as Citus, EDB, Crunchy, Neon, and some OSS distributions/packages

gregors
0 replies
21h57m

Where's the fun in that? I'm not being snarky either. Maybe it's not the best decision business-wise, but I guarantee it was more challenging and more fun. There's something to be said for that.

thekhatribharat
17 replies
19h23m

1. They mention that the largest tables ran into several TBs, and they would have soon topped the max IOPS supported by RDS. RDS for PostgreSQL peaks at 256,000 IOPS for a 64 TB volume. For a multi-AZ setup, this costs ~$70K/mo.

2. Let's assume the final outcome was a 5-way shard with each shard supporting ~50,000 IOPS and ~12 TB data. For a multi-AZ setup, this costs ~$100K/mo.

3. It took 9 months to shard their first table. Since it required application changes as well, let's assume this was 9mo * 20 work days/mo * (3 DB engineers + 2 app engineers) = 900 work days. Even at $100K avg. annual pay for an engineer, this is ~$400K.

4. A PostgreSQL-compatible NewSQL like YugabyteDB should cost ~$15K/mo to match top-of-the-line RDS performance. So Figma spent ~25x ($400K/$15K) to implement horizontal sharding in-house, and is still on RDS which costs ~6x ($100K/$15K)
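
For anyone checking the numbers, the arithmetic behind points 3 and 4 works out roughly as stated:

    work_days = 9 * 20 * (3 + 2)          # 9 months x 20 work days x 5 engineers = 900
    person_years = work_days / 250        # ~3.6 person-years
    build_cost = person_years * 100_000   # ~$360K, rounded to "~$400K" above

    print(400_000 / 15_000)               # ~27x, the "~25x" one-off figure
    print(100_000 / 15_000)               # ~6.7x, the "~6x" ongoing RDS-vs-NewSQL ratio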

avianlyric
10 replies
19h6m

Their rationale for this choice is covered in the article somewhat extensively near the top.

Additionally, over the past few years, we’ve developed a lot of expertise on how to reliably and performantly run RDS Postgres in-house. While migrating, we would have had to rebuild our domain expertise from scratch. Given our very aggressive growth rate, we had only months of runway remaining. De-risking an entirely new storage layer and completing an end-to-end-migration of our most business-critical use cases would have been extremely risky on the necessary timeline. We favored known low-risk solutions over potentially easier options with much higher uncertainty, where we had less control over the outcome.

TL;DR they were working in a short timeline, with a limited team size, and wanted to minimise any risks to the business.

Clearly cost is an issue for Figma, but downtime, or worse data loss, would have a ginormous impact on their business and potential future growth. If your product is already profitable, your user base is growing fast, and your ARR is following, why would you risk that growth and future ARR just to save a few $10Ks a month? A very low-risk DB migration that lets you keep scaling and raking in more money is way better than a high-risk migration that might save some cash in the long term, but also risks killing your primary business if it goes wrong.

brigadier132
9 replies
18h45m

Ok, what risk? CockroachDB is already proven technology and costs marginally more (if you use their serverless setup, it's free until you hit real scale). At the startups I've been at that hit scale, scaling sql was always a massive undertaking and affected product development every single time.

If you don't want downtime, don't use databases that require downtime to do a migration?

Netflix, roblox, every single online gambling website all use cockroachdb.

kamikaz1k
6 replies
18h34m

Never used cockroach so pardon my ignorance, but are there no operational challenges with running/using them? Or are they the same challenges? And how compatible is it from an application developer perspective?

jamra
3 replies
18h23m

CRDB is Postgres compliant so the wire protocol and SQL syntax is all Postgres. It should be a 1 to 1.

jimbokun
2 replies
18h15m

Are all the corresponding latencies for every query one to one too?

hinkley
0 replies
16h44m

In ye olden times I used to stop bosses from throwing away the slowest machine we had, and try to get at least one faster machine.

It’s still somewhat the case, but at the time the world was rotten with concurrent code that only worked because an implicit invariant (almost) always held. One that was enforced by the relative time or latency involved with two competing tasks. Get new motherboards or storage or memory and that invariant goes from failing only when the exact right packet loss happens, to failing every day, or hour, or minute.

Yes, it’s a bug, but it wasn’t on your radar and the system was trucking along yesterday and now everything is on fire.

The people who know this think the parent is a very interesting question. The people who don’t, tend to think it’s a non sequitur.

adbachman
0 replies
6h4m

This is the important question.

We evaluated several horizontally scalable DBs and Cockroach was by far the slowest for our access patterns.

brigadier132
0 replies
18h25m

The managed service is hassle free and it's auto sharded so you don't have traditional scaling issues. You do need to think about how your index choices spread writes and reads on the cluster to avoid hotspots. It's almost completely compatible with postgres wire protocol but it doesn't support things like extensions for the most part.
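
One concrete example of the index/hotspot point, since CockroachDB speaks the Postgres wire protocol; the table and connection string are made up:

    import psycopg

    with psycopg.connect("postgresql://root@crdb-host:26257/defaultdb") as conn:
        # A monotonically increasing key funnels every insert to one range/node;
        # a random UUID key spreads writes across the cluster instead.
        conn.execute("""
            CREATE TABLE IF NOT EXISTS events (
                id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
                tenant_id UUID NOT NULL,
                created_at TIMESTAMPTZ DEFAULT now(),
                payload JSONB
            )
        """)
        # Reads for one tenant stay reasonably local without making created_at
        # alone a write hotspot.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS events_by_tenant ON events (tenant_id, created_at)"
        )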

MarkMarine
0 replies
17h4m

There are TONS of operational issues running cockroach. At the last company I was at cockroach was probably over used as a magical way to run multiple DCs and keep things consistent without high developer overhead, but it was #1 source of large outages. So much so that we’d run a cockroach segmented out for a single microservice to limit the blast radius when it eventually failed.

That, and it's comically more expensive than Postgres; if you think IOPS are expensive, wait till you see the service contract.

martin-adams
1 replies
18h37m

Sounds like their discomfort was in the migration path to 'any other database' alongside not having the experience with another database to mitigate any unknown unknowns.

During our evaluation, we explored CockroachDB, TiDB, Spanner, and Vitess. However, switching to any of these alternative databases would have required a complex data migration to ensure consistency and reliability across two different database stores.

thesz
0 replies
10h23m

ensure consistency and reliability across two different database stores.

This is the main known known. And it is a hard thing to attain.

My favorite story on that is the testing of the Tendermint consensus implementation [1]. The testing process found a way to break the consensus, and the reason was that the protocol implementation and the KV store controlled by the protocol used different databases.

[1] https://jepsen.io/analyses/tendermint-0-10-2

bastawhiz
2 replies
17h59m

A PostgreSQL-compatible NewSQL like YugabyteDB should cost ~$15K/mo to match top-of-the-line RDS performance.

"to match" is doing a lot of work here. It's extremely unwise that a "compatible" database will have the same performance characteristics and no performance cliffs.

So Figma spent ~25x ($400K/$15K)

They nearly got acquired for $20B; I don't think they give a hoot about $400K if it means keeping their stack the same and getting to keep all of the existing organizational knowledge about how to keep the thing online.

thanksgiving
0 replies
17h18m

That’s the problem with case studies, isn’t it? What works for one company in one industry might be a death sentence for another organization with slimmer margins…

__float
0 replies
17h12m

They're also using sticker pricing, which as we know is likely far from what they actually paid.

(And the rest of the argument sounds a lot like our former employer with their...choice of document DB :) which I'll defend too!)

weitendorf
0 replies
16h6m

Figma, worth $10billion, was migrating what seems like their core production data. They probably didn't want to bet the company on a comparatively small software vendor like Yugabyte.

Most likely the engineering cost was much much higher than your quotes, but still insignificant compared to the potential risks. And migrating from RDS to not-RDS could easily not have been cheap in engineering time either, depending on how much software they've built around it.

jimbokun
0 replies
18h16m

Past a certain data size, migrations are always a nightmare. For a much longer time than what you initially estimated, you are managing two systems with all the related operational costs and complexity, as well as all of the IOPs and bandwidth migrating the data.

I imagine scaling out RDS instead mitigated a lot of those costs.

chikitabanana
0 replies
19h15m

Brother, use the Western numbering system, for clarity's sake. Thanks for the summary

esafak
12 replies
1d

This is such a familiar story. Company starts with a centralized database, runs into scaling problems, then spends man-years sharding it. Just use a distributed database.

jack_riminton
5 replies
1d

If every startup used every scale-proof method available from the beginning they'd never have the time or resources to build the products to get the customers that would require them to use the scale-proof methods

esafak
4 replies
1d

I agree with your general point but there are relational distributed databases, open source and "serverless", that are compatible with MySQL (planetscale, tidb, vitess...) and postgres (citus, cockroachdb...)

edited for examples.

noja
1 replies
1d

One reason they don't is unhelpful comments like these: alluding to products without naming them.

ofrzeta
0 replies
1d

From the article: "During our evaluation, we explored CockroachDB, TiDB, Spanner, and Vitess". Three of them are open source if I am not mistaken.

mplanchard
0 replies
1d

And many of these are substantially more expensive than RDS, or have terrible INSERT performance, or require making the correct decisions really early about what to use as partition keys. There's no free lunch, sadly.

Closi
0 replies
1d

I don't think these were mature products when Figma started to be developed in 2012.

Closi
2 replies
1d

This assumes that using a distributed database from the start doesn't carry its own penalties/drawbacks (particularly in 2016).

Particularly considering that, as per the Figma note, they consider their data highly relational (i.e. presumably lots of joins in queries).

What database would you have chosen?

esafak
1 replies
1d

That's a fair response for then, but not today where you're spoiled for choice. Spanner was released as a service in 2017, which isn't far off.

prisenco
0 replies
21h49m

Spanner is quite expensive though.

mamcx
0 replies
1d

There are many kinds of apps for which using a "distributed" database is an antipattern... and useless for making them scale.

Most apps don't need more than "tenant-per-company", and then, maybe, distribution among a small set of servers.

It's just that the kinds of apps that show up here (and have these problems) tend to be niche apps like social networks and complex chat apps.

battwell
0 replies
1d

Doesn't seem like a crazy way to start a company. RDS will have scaling problems but is very mature and fairly easy to use early on when you're working on your MVP.

I've used CRDB early on at a startup. There was some overhead, and you don't get all the nice PSQL features.

Although it did save us a giant migration later on.

bearjaws
4 replies
23h19m

I doubt even VC money can afford this service.

Serverless Aurora is incredibly expensive for most workloads. I have yet to find a use case for it in any SaaS product that is used more than ~4 hours a day, and since all my products span at least 3 time zones, there are at least 12 hours of activity a day.

Scubabear68
2 replies
22h58m

We found this out the hard way in a small startup. The per query and I/O expense was through the roof.

yard2010
1 replies
19h42m

Did it work though? Did you achieve unlimited scaling? Because if so, you should compare that price to the cost of a team of great minds like the one in this article working for 2 years to build a solution.

I bet it would still be cheaper to pay people than to pay Amazon, but I'm curious about the numbers.

Scubabear68
0 replies
19h7m

It worked fine, the problem was our workloads were minuscule and it still cost $3,000 a month to support 50ish users on a lightly used platform.

The same load in a non-Serverless instance was around $100/month.

brcmthrowaway
0 replies
23h4m

What products?

MapleWalnut
1 replies
22h52m

Yes, but that wasn't available when they did this migration

jamesfinlayson
0 replies
18h19m

Hm, looks like it's only available as a preview too. I was wondering why I hadn't seen it mentioned before.

vinner_roy
0 replies
22h39m

Haha alright I get the picture. Too expensive.

sgarland
0 replies
22h51m

At my company, we were given an early talk on Limitless. Never once did the reps mention that it ran on Serverless. Dear lord, that's going to be a hard no from me. Unless they've dramatically changed pricing for Limitless as opposed to normal Serverless, that'll be through the roof.

paulddraper
0 replies
23h13m

"Limitless" refers to the bill, not just the scale.

cynicalsecurity
6 replies
22h5m

They came up with a really over-engineered, over-complicated approach to splitting data into multiple databases. I'm not sure this was the best idea or even a good idea, but then I also don't understand how they ended up in a situation where they only had a few months before their current database overflowed and the whole system collapsed.

dwaltrip
5 replies
21h52m

Can you offer insight into what a better approach might have been?

TheP1000
4 replies
21h39m

As others have mentioned, moving to per tenant databases can really simplify things at scale and doesn't leave a massive amount of engineering complexity and debt in its wake.

I feel sorry for the team managing this 5 years from now.

simonw
2 replies
20h44m

Moving to a per-tenant database sounds like even more work to me than moving to shards. Moving to per-tenant means rewriting _everything_ - moving to shards has you rewriting a lot less.

kgeist
0 replies
12h30m

What do you mean by "rewriting everything"? Or maybe your definition of "per-tenant database" is different from mine. In our product, it's just a small layer which routes requests to the target organization's DB. When an organization is created, we create a new DB. Most of the application code has no idea there are different DBs under the hood.

There are logical DBs (folders/files on a single physical server), and there are a few physical servers. We're currently at the stage where the first physical server hosts most of the smaller organizations, and all the other physical servers are usually dedicated servers for larger clients with high loads.
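For illustration, a minimal sketch of that kind of routing layer, assuming a simple registry mapping an organization to a (server, logical DB) pair. All names, the DSN format, and the provisioning hook are invented, not the parent commenter's actual implementation:

    # Tenant-routing sketch: the app asks for a connection by org ID and has no
    # idea which physical server or logical database is behind it.
    from dataclasses import dataclass

    @dataclass
    class TenantLocation:
        host: str      # physical server
        dbname: str    # logical database on that server

    # In a real system this registry would live in a small control-plane table,
    # not in process memory.
    TENANT_REGISTRY: dict = {}

    def provision_tenant(org_id: str, host: str = "shared-db-1") -> TenantLocation:
        """Called when an organization is created: allocate a logical DB for it."""
        loc = TenantLocation(host=host, dbname=f"org_{org_id}")
        TENANT_REGISTRY[org_id] = loc
        # Here you would also run CREATE DATABASE / migrations on `host`.
        return loc

    def dsn_for(org_id: str) -> str:
        """The rest of the app only ever calls this, so moving a tenant to a
        dedicated server is just a registry update plus a data copy."""
        loc = TENANT_REGISTRY[org_id]
        return f"postgresql://{loc.host}/{loc.dbname}"

    if __name__ == "__main__":
        provision_tenant("acme")                             # small org, shared box
        provision_tenant("bigcorp", host="dedicated-db-7")   # large org, own server
        print(dsn_for("acme"))      # postgresql://shared-db-1/org_acme
        print(dsn_for("bigcorp"))   # postgresql://dedicated-db-7/org_bigcorp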

infra_dev
0 replies
20h25m

This is possible to do, but it takes a lot of engineering. You can provide the experience of a single DB while each tenant is placed in their own dedicated Postgres compute. This lets the application stay the same while tenants are moved to independent computes (you can even move only a few tenants and leave the rest on a shared Postgres compute).

cynicalsecurity
0 replies
21h21m

Exactly this. But at this point, I don't even want to give them advice, I don't really like their service. I like Lunacy more.

ijustlovemath
5 replies
22h57m

NoSQL databases are another common scalable-by-default solution that companies adopt as they grow. However, we have a very complex relational data model built on top of our current Postgres architecture and NoSQL APIs don’t offer this kind of versatility.

As I understand it, NoSQL is for people who need a backend that ingests just about any unstructured data, for teams that may not have a complex relational model worked out/stabilized. Postgres has this in its native jsonb datatype, but they wouldn't need to use that much since it sounds like they already have a good data model. What am I missing here?
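For anyone unfamiliar with the jsonb point, a hedged sketch of what it looks like in practice: ordinary relational columns next to a schemaless jsonb column, queried with Postgres's JSON operators. This assumes a reachable Postgres and the psycopg2 driver; the table and column names are invented for the example:

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=example")  # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS design_files (
                id         bigserial PRIMARY KEY,
                org_id     bigint NOT NULL,                       -- ordinary relational column
                properties jsonb  NOT NULL DEFAULT '{}'::jsonb    -- schemaless extras
            )
        """)
        cur.execute(
            "INSERT INTO design_files (org_id, properties) VALUES (%s, %s)",
            (42, json.dumps({"theme": "dark", "plugins": ["grid", "export"]})),
        )
        # Filter on a field inside the jsonb blob, no schema change required.
        cur.execute(
            "SELECT id FROM design_files WHERE properties ->> 'theme' = %s",
            ("dark",),
        )
        print(cur.fetchall())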

suhastech
0 replies
21h35m

Using NoSQL might not be the best idea in this case. I've seen it backfire for many companies. They start with NoSQL, but then end up needing relational features as their business grows. This leads to performance issues, redundancy, and data sync problems early on, which shouldn't be happening.

Especially in the early days, NoSQL companies used to market their databases as general-purpose databases that scale easily, but that obviously hasn't always been the case.

I usually recommend starting with a relational database like PostgreSQL. If scaling becomes necessary later on, you can invest in sharding the database. Figma's approach seems reasonable given the tools available at the time.

I've helped small companies switch from NoSQL to SQL because the benefits of NoSQL weren't worth the trade-offs at their stage of growth. In case, anyone is in a similar boat: https://mongotosqlmigration.com/

opportune
0 replies
22h38m

What is your exact question? To me it makes sense that you’d not want to use NoSQL if you’re dealing with data that’s already relational, and heavily leveraging features common in relational DBs that may not come out of the box with NoSQL DBs.

They’re saying basically that NoSQL DBs solve a lot of horizontal scaling problems but aren’t a good fit for their highly relational data, is my understanding. Not that they can’t get NoSQL functionality at eg the query level in relational DBs.

jimbokun
0 replies
17h50m

If scaling isn’t an issue, use a relational database.

“NoSQL” only really pays off when you need to scale horizontally. And even then, one of the horizontally scaling SQL solutions might be a better bet.

ffsm8
0 replies
22h34m

I think they're equating relational databases with databases with ACID guarantees, as that's basically a full overlap on a Venn diagram.

And we all know that ACID has to go at some scale, even if that scale keeps getting pushed further out as our hardware gets better.

(Same with the relational guarantees that eat performance... but only once you've reached a certain amount of throughput and data.)

hintymad
4 replies
21h45m

Given that sharding has become a pretty mature practice, is it still worth considering the NewSQL solutions like CRDB, Yugabyte, and TiDB for the sake of auto sharding, given that these NewSQL databases usually trade throughput and latency for auto-sharding and multi-region support? Another added cost is learning how to operate NewSQL databases, assuming one is already familiar with either MySQL or Postgres.

brigadier132
2 replies
18h40m

Sharding mysql and postgres has been a shitshow at every company I've worked at.

paulryanrogers
0 replies
16h7m

Why? How did they shard?

hintymad
0 replies
14h56m

Thanks. This is the kind of information I was looking for. Could you give more specifics? Why were they shitshows? If the tables are naturally shardable, say by user ID as described in Figma's case, would the situation be different?

emmanueloga_
0 replies
20h21m

In the OP they said they built their own sharding solution because it was risky to move to a different (newsql) solution and they already had sharding expertise with PG.

I think if starting from scratch it makes sense to look at these newsql DBs that are built to horizontally scale from the very beginning.

TkTech
4 replies
23h36m

Am I the only one finding the layout of this blog distracting? Kind of disappointing from a UX company. The images are also massive; the page was 42.21 MB!

Good article nonetheless! Always appreciate when companies like Figma document technical challenges.

renegade-otter
1 replies
23h17m

It seems like someone at Figma decided to use the latest CSS tricks they had just discovered. Changing background color? Come on.

Cwizard
0 replies
22h42m

I thought this was a bug at first. Does anyone actually want this?

zelphirkalt
0 replies
21h34m

Maybe someone designed this castle in Figma.

ghostly_s
0 replies
20h49m

Odd, it looks perfectly bog-standard for me on Firefox and iOS, aside from the lavender bg which I quite like. I even disabled my ad-blocker to see if it made a difference, no changing bg colors for me...

mannyv
3 replies
23h58m

Hmm, wonder why they didn't try FoundationDB.

Interesting that they had problems with vacuuming. I always thought that part of Postgres was the worst part. I vaguely remember that you needed twice the space to vacuum successfully, which hopefully has changed in later versions.

An article about why vacuum is needed in pg (as compared to mysql/innodb).

http://rhaas.blogspot.com/2011/02/mysql-vs-postgresql-part-2...

yashap
0 replies
20h12m

They want pretty full SQL support, so FoundationDB would be out. I’d think “buy” options for them would be more like Citus, Yugabyte, Cockroach or Spanner.

harikb
0 replies
23h50m

Do you know of any performant relational-db layer on top of FoundationDB? It seems their use case would need at least simple joins, which raw FoundationDB lacks.

eatonphil
0 replies
23h52m

The article mentions they're trying very hard to stick with Postgres. FoundationDB is great but doesn't even have a SQL access layer, let alone a Postgres SQL access layer. :)

dakiol
3 replies
23h50m

I cannot help but think this all sounds pretty much like a hack (a clever one, though).

We don't handle, say, low-level I/O buffering/caching ourselves anymore, right? (At least not the folks doing web development/SaaS.) We rely instead on the OS APIs, and that's good. I think we are missing something similar but for DB sharding. It seems to me that we are still missing some fundamental piece of technology/infrastructure to handle horizontal data sharding.

yard2010
0 replies
19h51m

This is Hacker News after all

simonw
0 replies
20h46m

Citus and Vitess are examples of horizontal data sharding technology for PostgreSQL and MySQL respectively.

At Figma's size there are sometimes reasons to roll your own, which I think they explain pretty clearly in the article. They wanted a solution they could incrementally engineer onto their existing stack without doing a full rewrite or lift-and-shift to something else.

efxhoy
0 replies
18h20m

I'm no computer scientist, but I think the fundamental problem is that the CAP theorem makes it really tricky to do in a "cost free" way. You fundamentally need to sacrifice one of consistency, availability, or partition tolerance, at least a tiny bit.

goshx
2 replies
1d

"Sorry, you’ve stumbled upon a temporary technical issue. Please try refreshing this page in a moment."

Have they? :) I am excited to read the article when it loads

nix0n
0 replies
23h40m

It worked for me on the first try, but here's an archive link in case it goes down again: https://archive.is/xusR7

Lucasoato
0 replies
23h6m

This might be another case of “death by HN”

Thaxll
2 replies
21h40m

How do transactions work when you end up querying different shards?

simonw
0 replies
20h43m

I imagine that's one of the reasons they have "colos" - you can aim to co-locate tables that are likely to be part of the same transaction in the same shard by putting them in the same colo group.
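To make the colo idea concrete, a rough sketch of my reading of it (not Figma's actual scheme): tables that need to share transactions are grouped into one colo, and every table in a colo hashes the same shard key the same way, so a single-key transaction never crosses shards. Table names, colo groupings, and shard counts below are invented:

    import hashlib

    COLOS = {
        # colo name -> tables that must live together
        "files": {"files", "file_comments", "file_versions"},
        "orgs":  {"orgs", "org_members"},
    }
    SHARDS_PER_COLO = {"files": 16, "orgs": 4}

    def colo_for(table: str) -> str:
        return next(c for c, tables in COLOS.items() if table in tables)

    def shard_for(table: str, shard_key: int) -> str:
        """All tables in a colo hash the same key the same way, so a transaction
        over e.g. files + file_comments for one file_id stays on one shard."""
        colo = colo_for(table)
        h = int(hashlib.sha256(str(shard_key).encode()).hexdigest(), 16)
        return f"{colo}-shard-{h % SHARDS_PER_COLO[colo]}"

    if __name__ == "__main__":
        file_id = 123456
        assert shard_for("files", file_id) == shard_for("file_comments", file_id)
        print(shard_for("files", file_id))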

blauditore
0 replies
21h25m

IIUC they never have db transactions across shards, because customers are basically data silos, which makes sharding quite straightforward.

jeffbee
1 replies
22h35m

Given the list of authors and acknowledgees, what I'd really like to read is the differences between this solution and Dropbox's.

sp1nozick
0 replies
20h23m

horizontal sharding is such a foundational skillset in working with databases - it's essential to database architecture

HermitX
1 replies
22h29m

This is an intriguing article, clearly showing the team's fondness for Postgres. However, Postgres is an OLTP product. I'm curious about what system Figma's data team uses for their data analysis tasks.

jerrygenser
0 replies
20h56m

They mentioned analyzing query logs in Snowflake in this very article. So... at least they use Snowflake for some things?

tbarbugli
0 replies
20h8m

Citus seems incredibly close to what they built; I wonder why they did not use it.

metadat
0 replies
13h26m

It looks like Figma mostly just implemented the Instagram sharding paper, and then did the obvious next level of semi-intelligent yet delicate and fragile query routing.

Definitely a technically challenging and overall fun / satisfying engineering exercise!

https://instagram-engineering.com/sharding-ids-at-instagram-...
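For reference, the linked Instagram scheme packs everything into one 64-bit ID: milliseconds since a custom epoch in the high 41 bits, a 13-bit logical shard ID, and a 10-bit per-shard sequence. Instagram generates these inside a PL/pgSQL default; the Python below is only a sketch of the bit layout, and the epoch constant is an arbitrary placeholder:

    import itertools
    import time

    # 41 bits of ms since a custom epoch | 13 bits shard ID | 10 bits sequence
    CUSTOM_EPOCH_MS = 1_300_000_000_000  # any fixed instant predating your data
    _sequence = itertools.count()

    def next_id(shard_id: int) -> int:
        ms = int(time.time() * 1000) - CUSTOM_EPOCH_MS
        seq = next(_sequence) % 1024          # 10-bit rolling per-shard sequence
        return (ms << 23) | (shard_id << 10) | seq

    def shard_of(generated_id: int) -> int:
        """The shard is recoverable from the ID itself, which keeps primary-key
        lookups routable without a directory service."""
        return (generated_id >> 10) & ((1 << 13) - 1)

    if __name__ == "__main__":
        i = next_id(shard_id=5)
        print(i, shard_of(i))  # prints the new ID and 5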

lenkite
0 replies
3h16m

"Schema changes must be coordinated across all shards to ensure the databases stay in sync. Foreign keys and globally unique indexes can no longer be enforced by Postgres."

One of my best practices for RDBMS data after executing a few sharding projects is to mandate no use of foreign keys even for v1 of a project.
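As a concrete (and hedged) illustration of what that mandate means day to day: the check a foreign key used to do moves into application code and becomes best-effort, since the parent row may sit on another shard with no cross-shard transaction to protect it. Table names and the psycopg2-style connection object here are invented:

    def add_comment(conn, file_id: int, body: str) -> None:
        with conn.cursor() as cur:
            # The check that FOREIGN KEY (file_id) REFERENCES files(id) used to do:
            cur.execute("SELECT 1 FROM files WHERE id = %s", (file_id,))
            if cur.fetchone() is None:
                raise ValueError(f"file {file_id} does not exist")
            # Small race window here: the file could be deleted between the check
            # and the insert. Shops that drop FKs usually accept this and clean
            # up orphans asynchronously.
            cur.execute(
                "INSERT INTO file_comments (file_id, body) VALUES (%s, %s)",
                (file_id, body),
            )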

ing33k
0 replies
11h30m

"Additionally, over the past few years, we’ve developed a lot of expertise on how to reliably and performantly run RDS Postgres in-house". Isn't the whole point of paying AWS the premium is that they do it for you ?

gfodor
0 replies
21h32m

Honestly it's depressing that I remember doing this same thing more than a decade ago and we still have to spend developer cycles on this nonsense in 2024.

emptysea
0 replies
1d

Seems they’ve built out a PG version of MySQL’s Vitess.

Query rewriting seems interesting; having a layer between your DB and your application would also allow various ACL stuff as well.
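A toy sketch of what such a layer can look like, purely for illustration: every query passes through a thin client that picks a shard from an explicit shard key and is also the obvious hook for ACL checks. Real query rewriting needs an actual SQL parser; all names and the routing/ACL rules below are made up:

    import hashlib

    class ShardedClient:
        def __init__(self, shard_conns: dict, num_shards: int):
            self.shard_conns = shard_conns    # shard number -> DB connection
            self.num_shards = num_shards

        def _shard(self, shard_key: int) -> int:
            h = int(hashlib.sha256(str(shard_key).encode()).hexdigest(), 16)
            return h % self.num_shards

        def execute(self, user: str, shard_key: int, sql: str, params=()):
            if not self._allowed(user, sql):
                raise PermissionError(f"{user} may not run this query")
            conn = self.shard_conns[self._shard(shard_key)]
            with conn.cursor() as cur:       # assumes psycopg2-style connections
                cur.execute(sql, params)
                return cur.fetchall()

        def _allowed(self, user: str, sql: str) -> bool:
            # Placeholder ACL: read-only users may only SELECT.
            return sql.lstrip().upper().startswith("SELECT") or user == "service"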

cynicalsecurity
0 replies
23h4m

It would have benefitted the article to use less of annoying corporate talk.

Xenya
0 replies
13h11m

amazing team, but the author doesn't seem to understand the details

DonnyV
0 replies
15h42m

Is it me, or did they just recreate MongoDB's sharding feature? But now they have to maintain it. They would have been better off going with MongoDB or another document database. Then you wouldn't have to worry about schema changes, since your schema lives in your data model; you could just map changed fields. They should have just used a tenant-based design: if a customer was hindering performance for neighboring tenants, they could just migrate them over to their own large server.

AtlasBarfed
0 replies
19h49m

1. Postgres

2. Postgres replicas + sharding

3. Cassandra / Dynamo

If you are at stage 2, you are using band-aids. You need to start architecting a transition of your most heavily used data views to truly scalable databases like Cassandra / Dynamo. I have not used FoundationDB, but aphyr liked it.

That transition takes time, evaluation, prototyping, and organizational learning at the application-programmer, support-engineer, and management levels.