One thought that comes up: Wouldn’t it be easier to have each customer in their own (logical) database? I mean, you don’t need transactions across different customers, right? So you’re essentially solving a harder problem than the one you’ve got.
Not sure Postgres (logical) databases would scale that well, but I don't see a reason in principle why they couldn't. Has anyone explored this further?
This seems to be an architecture Cloudflare is aiming to support with their SQLite service. One database per customer, each database located in the customer’s primary region.
I think there are quite a few people chasing similar ideas, like Azure's Durable Entities.
I've been calling it the Lots of Little Databases model vs the Globe Spanning Gorilla.
Like the Spanner paper points out, even if your distributed database semantically appears to be a single giant instance, in practice performance concerns mean developers avoid distributed joins, etc., because these can shuffle very large amounts of intermediate results across the network. So the reality ends up leaking through the illusion of a single giant machine, and people end up writing workarounds for distributed joins, like async materialization.
If we give up the single machine illusion we get a lot of simplification, at the cost of features devs were unlikely to use anyhow. I see having consistent distributed commit but without cross shard joins as a really interesting alternative.
And besides scalability I like the extra security rope of fine grained partitioning from the start.
I'll write a blog post along these lines if I get anything worthwhile done.
An advantage worth noting is that having actually separated databases means you physically can't perform these expensive operations, so a junior dev can't write incredibly inefficient code that would bring down your entire infra.
Also makes it a lot harder for devs to trip over some footgun and leak data across domains.
Bring down the infra, or stick you with a six-figure bill at the end of the month.
Interesting. Durable Entities strikes me as closer to Cloudflare's Durable Objects (both in name and in design as actors backed by persistent storage).
"Lots of Little Databases" reminded me of https://www.actordb.com/ which does lots of server-side sqlite instances, but the project now looks defunct.
That would be fantastic. Unfortunately it's not true. D1 doesn't support the one database per customer approach unless you have just a handful of customers that you can set up manually.
You have to create each database manually using wrangler or the website. Then you have to create a binding for each database in wrangler.toml so that the database becomes accessible as a variable in your Workers code. Then you have to change your Worker source code to do something with that variable. Then you redeploy.
The issue is that Workers cannot create or list databases. There's no API for it.
They have said they're working on some sort of solution (which they intend to have available by the time D1 exits beta) that will allow some sort of dynamic bindings specifically to address this use case.
In the meantime, though it's not a great solution, one can create and query databases through the REST API instead https://developers.cloudflare.com/api/operations/cloudflare-...
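For example, something along these lines from Python (endpoint paths and response fields follow the D1 REST docs linked above at the time of writing; treat them as illustrative, since the API is still evolving):

```python
# Hedged sketch: create a per-tenant D1 database and query it over the REST API.
# Paths/field names are per the docs linked above; verify against the current schema.
import requests

API = "https://api.cloudflare.com/client/v4"
ACCOUNT_ID = "<your-account-id>"
HEADERS = {"Authorization": "Bearer <api-token>"}

def create_tenant_db(tenant: str) -> str:
    """Create a D1 database for one tenant and return its id."""
    resp = requests.post(
        f"{API}/accounts/{ACCOUNT_ID}/d1/database",
        headers=HEADERS,
        json={"name": f"tenant-{tenant}"},
    )
    resp.raise_for_status()
    return resp.json()["result"]["uuid"]   # field name per the docs

def query_tenant_db(db_id: str, sql: str, params: list):
    """Run a parameterized query against one tenant's database."""
    resp = requests.post(
        f"{API}/accounts/{ACCOUNT_ID}/d1/database/{db_id}/query",
        headers=HEADERS,
        json={"sql": sql, "params": params},
    )
    resp.raise_for_status()
    return resp.json()["result"]

db_id = query_db = create_tenant_db("acme")
query_tenant_db(db_id, "CREATE TABLE IF NOT EXISTS projects (id TEXT PRIMARY KEY, name TEXT)", [])
```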
Yes, we've been doing that at my place basically since the start: each tenant is a schema in Postgres. On the one hand it works perfectly fine, i.e. your tables don't grow to 'infinity' just because you're adding more and more tenants. If there's a particular tenant with lots of data, only that tenant's indexes and tables grow huge and become slower, for that particular reason, etc. If a tenant leaves, you keep the schema around for some time so they can come back, and at some point you just drop the schema!
It does mean having to upgrade each schema individually, which makes things both easier and harder. Easier because the tables are smaller, so any schema change that requires, say, a table lock holds it for a shorter amount of time and won't affect more than the one tenant at a given time. It also means you can get into an inconsistent state, of course, where some of your tenants have all the latest DB upgrades while the migration failed on another subset.
At some point Postgres's internal tables become a bit of a "problem", as you want to run as many of these updates in parallel as you can for speed, which could lead to contention on Postgres' administrative side. You'll also still need to shard across multiple actual RDS instances, because you still have many tenants running against a single physical piece of hardware that will show its limitations if too many large or active tenants happen to be on the same shard.
And then you have the problem of keeping a record of which physical RDS instance (i.e. shard) the tenant is on. Your application code will need to look that up (and cache that info for some time ;)) and you have some choice there as well. I.e. do you shard those as well and juggle load balancing as basically a 1:1 mapping to shards or do you have your application layer connect to all database shards and handle any tenants? One is more complicated I would say while the other could run out of connections depending on how you need to scale the application layer and what kind of RDS instance you have.
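To make that concrete, the lookup-plus-routing part boils down to something like this (a stripped-down sketch with made-up table and column names, using psycopg2; real code would use a TTL cache and a connection pool):

```python
# Sketch: route each request to the tenant's shard, then scope it to the tenant's schema.
from functools import lru_cache
import psycopg2

CATALOG_DSN = "dbname=catalog host=catalog.internal"   # the "which shard is this tenant on" DB

@lru_cache(maxsize=100_000)   # "cache that info for some time" - use a TTL cache in practice
def shard_dsn_for(tenant_id: str) -> str:
    conn = psycopg2.connect(CATALOG_DSN)
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT shard_dsn FROM tenant_directory WHERE tenant_id = %s",
                (tenant_id,),
            )
            return cur.fetchone()[0]
    finally:
        conn.close()

def tenant_connection(tenant_id: str):
    """Connect to the tenant's shard and confine the session to its schema."""
    conn = psycopg2.connect(shard_dsn_for(tenant_id))
    with conn.cursor() as cur:
        # search_path means unqualified table names resolve inside this tenant's schema only
        cur.execute("SELECT set_config('search_path', %s, false)", (f"tenant_{tenant_id}",))
    return conn
```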
I recall a popular rails gem[1] once upon a time that provided multi-tenancy via postgres schemas.
As it turns out, even the company that initially developed the gem ended up ditching the approach due to some of the issues you outlined above.
Managing separate schemas feels like one of those insidious decisions that makes things simple _initially_ but gets you into a world of hurt when you need to scale. The company is now defunct, but they have an article where they discuss why they ditched the approach [2]; TL;DR, it's too difficult to maintain and scale.
[1] https://github.com/influitive/apartment#tenants-on-different... [2] https://web.archive.org/web/20201108191323/https://influitiv...
Let's address these one by one based on our experience (at least the part of the journey I've been here for; the implementation of the solution predates me, but I live with it).
Migrations
We have more than a thousand schemas per database shard. That is why I said you want things to run in parallel for speed, yes. However, we deploy way more than daily and it's not really an issue in that sense. Schema updates are not that frequent, but you do of course need to take into account that you have to make the schema update a separate PR, wait and check that it worked, and only then deploy the application code that makes use of it. Which honestly isn't much different from ensuring that your BE and FE changes are compatible or made in the right order, so that you don't get failed requests because an old FE happens to call a new BE node or vice versa :shrug:
Database resources
We seem to have m5.large. That has 2 virtual cores. Some r5.larges etc. Their r3.4xl has 16?! So not sure what kind of load pattern they have :shrug:
Client Memory Bloat
Yeah well, we don't have that, so not sure what to say :shrug: Definitely not a generalized reason to say "can't do, is too complicated for scaling".
Record Identification
I respectfully disagree. This is a great advantage, because it means that just because you get more and more tenants (some of which churn and you throw away their data anyway), your identifiers don't grow past limits as easily. In fact, in most cases the identifiers never run out of runway at all. They complain about "what if you need to join this data somewhere else, then you need to also add the tenantId". Yeah, so? We also have a data warehouse where we do just that. Not a problem at all. We also have services other than our main one, which use different database technologies where we use the tenant as part of the key (for loads that actually benefit from being in a NoSQL-type DB), and there we don't have sharding other than what the NoSQL database does by itself, so to speak, by dividing the keyspace.
That's it. End of article. Basically, we have none of these issues. They don't mention their actual scale. The only number they mention is the "hundreds of schemas". We have more than ten times that number per physical shard and have tens of thousands of tenants total. Again :shrug:
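For concreteness, the "run the migration in parallel across schemas and track which ones failed" part looks roughly like this (a simplified sketch, not our actual tooling; the table, column, and naming convention are made up):

```python
# Sketch: apply one DDL change to every tenant schema on a shard, in parallel,
# recording which schemas failed so they can be retried (the "inconsistent state" case).
from concurrent.futures import ThreadPoolExecutor
import psycopg2

SHARD_DSN = "dbname=shard1 host=shard1.internal"
DDL = "ALTER TABLE {schema}.projects ADD COLUMN IF NOT EXISTS archived_at timestamptz"

def tenant_schemas():
    conn = psycopg2.connect(SHARD_DSN)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT nspname FROM pg_namespace WHERE nspname LIKE 'tenant_%'")
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()

def migrate_schema(schema):
    conn = psycopg2.connect(SHARD_DSN)
    try:
        with conn, conn.cursor() as cur:      # commits on success, rolls back on error
            cur.execute(DDL.format(schema=schema))
        return schema, None
    except Exception as exc:                  # collect failures instead of aborting the run
        return schema, exc
    finally:
        conn.close()

# Keep parallelism modest: too much concurrent DDL contends on Postgres' catalog tables,
# which is exactly the administrative-side contention mentioned earlier.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(migrate_schema, tenant_schemas()))

failed = [(s, e) for s, e in results if e is not None]
print(f"{len(results) - len(failed)} schemas migrated, {len(failed)} to retry")
```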
That's still 2 orders of magnitude smaller than the scale of Figma—they would need to somehow manage millions of Postgres schemas. I don't think it's a realistic possibility
https://news.ycombinator.com/item?id=39711815 Shopify seems to have millions of customers. Granted, they use MySQL, and it might not be database schemas but something more MySQL-specific. In general, though ...
"We can attach a shop_id to all shop-owned tables and have it serve as a sharding key. Moving a shop from one shard to another involves selecting all records from all tables that have the desired shop_id and copying them to another MySQL shard. For this post, it’s helpful to think of each pod as a MySQL shard."
Sounds like this is just normal sharding. Completely different from having a totally separate schema like GP proposes (with that approach you don't use sharding keys at all).
This is the conclusion I came to when faced with the same quandary.
This is a very common approach and scales quite well. I worked for a company that had thousands of customers, each with their own schema. A single master database kept track of which customer was on which physical DB cluster, and this was globally replicated (EU, ANZ, NA).
Certainly needs a bunch of tooling, but worked well. Some apps were stateless and could connect to any physical cluster. Some others were sticky and only connected to a subset.
Similar architecture at my current company as well: we serve nearly a thousand customer instances across 4 physical clusters.
We do have some basic tools to provision new customers on the emptiest cluster, move customers from one cluster to another etc
Shopify has pretty much mastered this https://shopify.engineering/mysql-database-shard-balancing-t...
This works great until (1) your largest customer outgrows the largest available DB (happens sooner than you'd think for large companies) or (2) you do need transactions across different customers, say to facilitate some kind of sharing. Going all-in on the isolated tenant strategy means when you hit one of these cases it's a nightmare to unwind and rearchitect your entire DB layer.
I'd respond by saying (1) is more rare than you're asserting.
There is a huge long tail of companies with datasets that won't fit on a single machine but can be handled by a dozen or so, and where no single customer or service is near the limits of an individual node.
These customers are poorly served at the moment. Most of the easy to implement SaaS options for what they need would be hugely costly vs a small fleet of db servers they administer. Meanwhile, most of the open source options are cargo culting Google or Facebook style architecture, which is a huge burden for a small company. I mean do you really want to run K8S when you have 10 servers in total?
I think there's a lot of interesting stuff happening in this end of the market that's not trying to be a mini Google, like Fly.io.
As for (2), I think a middle ground is supporting cross-shard transactions but not joins. This works well enough for VoltDB et al.
It's rarely an issue with the number of bytes and more an issue with hot shards. Whatever shard Google is on (that is, Google the Figma customer) is surely going to be one _hell_ of a hot shard. They have more designers/engineers/PMs actively using Figma than most startups have _users total_. You don't need more than one really really hot customer for this to become more than a hypothetical problem.
When you start to think about it that way, suddenly you need to seriously consider your iops (if you're on RDS) or how much redundancy that physical machine's SSDs have (if you're running it on your own boxes).
Google still only has ~180k employees, and obviously not all of them use figma, and obviously not all of their figma users are performing actions simultaneously. I'd be surprised if it broke 10k QPS (would an org like Google even have 10k peak active user sessions? Seems doubtful). Human generated traffic tends to just not reach that large of scales unless you're trying to fit the entire planet on one instance.
RDS can be absurdly limited with IOPS, granted, but a modern laptop for example ought to be up to the task. Realistically you could probably be fine even on RDS but you might need to pay through the nose for extra IOPS.
Figma is more or less a desktop application that happens to run in a web browser.
If I use Photoshop to edit a .psd file I don’t think “man that psd file should really be stored in a single planet-sized database of all psd files in existence”. It’s just a file on my computer.
Figma requires a little bit more intermingling of data than Photoshop, it has multiuser support for one, so a pure local storage based approach wouldn’t work. But, at its heart it’s still based on the document model. When you open a Figma document it’s its own isolated little universe, the connections with resources outside the document are limited, and that matches user expectations.
Agreed - it's not like you'd ever have to do SELECT * FROM shapes WHERE type = 'circle'. Could they have stored each document as a file, stored references to the files in the database, opened the file in the backend when someone opened it in the frontend, and written it back when they stopped editing it?
A Figma customer's needs won't exceed an i3.metal...
Can you give an example of when a single customer has outgrown the largest available DB?
The problem - conceptually - is made much simpler this way; we make use of this at work.
However, you will still have shared-resource problems - a rogue query that destroys IOPS in one tenant ends up bringing down all tenants, etc. There are in theory databases that solve this as well, but my experience has been that at that point what you buy into is a bad version of resource sharing - i.e. what an operating system does - and you're better off using OS mechanisms.
In other words: yes, but you still have noisy neighbours, and may be better off running lots of small fully separated DBMSes than a big logically separated one
If tenants are on separate databases how would that be an issue?
Because database, in Postgres terms, doesn’t mean physical node. It’s more akin to a VM than anything else. The term for an installation of Postgres is database cluster, which can contain N databases, each of which can contain M schemas.
Thanks! Is there a good primer to this terminology that clarifies these terms on various popular database and cloud platforms?
It seems there is good potential for confusion unless we use the same terms consistently when discussing architecture and design across teams.
Even the term RDS (Relational Database Service) is sometimes said to be inaccurate since it is a "Relational Database SERVER as a Service"
A few related terms that cause confusion:
"Schema" could refer to a Database Schema or in some contexts, a single table's logical data structure. (Or a single data set's data structure -- or a standard for one, like JSON Schema)
Data Catalog products like "AWS Glue Data Catalog", which only store the metadata or schemas of the tables they crawl, refer to the entities they store as "Databases" and "Tables" (but no "Schemas"), and the documentation includes guides that talk about "creating a database"[1] and "creating a table"[2] in AWS Glue. There has to be a better way to refer to all these entities without using the word schema with so many meanings -- or calling both physical tables and their metadata "tables". Otherwise this is needlessly confusing and hard for beginners.
---
EDIT: even more madness. This StackOverflow discussion [3] has more examples of confusing usage:
[1]: https://docs.aws.amazon.com/glue/latest/dg/define-database.h...
[2]: https://docs.aws.amazon.com/glue/latest/dg/tables-described....
[3]: https://stackoverflow.com/questions/298739/what-is-the-diffe...
Not AFAIK. In MySQL, the only time “cluster” is used is to refer to NDB Cluster, which is a distributed DB product. Schema means a logical grouping of tables, the same (more or less) as Postgres.
As to schema itself, yes, it's heavily overloaded, and you just have to grok it from context. I can talk about a table's schema or a DB's schema, or, as you mentioned, JSON Schema. Although the latter is helped by simply not using JSON in a relational DB.
You must remember that SQL is an ancient language, and the relational model is even older. There are going to be oddities.
Separate logical databases within the same RDBMS, so sharing CPU, disks, RAM, etc.
I imagine it only gets you so far. What do you do about customers like Walmart or Oracle? Hundreds, if not thousands, of users all leaving hundreds of comments on Figma files every day. If you want good latency without giving up strong consistency (which the article says they want) you'll need to keep sharding.
A single db can handle that load easily
Do you mean an RDBMS running on a single, big-iron, bare-metal server?
They were running their whole business on one RDS instance 4 years ago. Do you think they now have one customer larger than all their customers combined 4 years ago?
The first sentence of the article seems to suggest it's possible?
I bet it gets you further than you imagine. It entirely depends on the backend services and what they touch, but in this scenario you would be deploying/scaling that service based on the customer's seat size. I suspect that even for large enterprise customers, the users actively touching Figma do not reach the thousands, but I am happy to be wrong.
After all, Stackoverflow is running off of a handful of machines.
This, seriously. The long-term maintenance, tribal knowledge & risks associated with this giant hack will be greater than anything they'd ever have expected. Inb4 global outage post-mortem & key-man dependency salaries.
There's virtually no excuse for not spinning up a pg pod (or two) for each tenant - heck, even a namespace with the whole stack.
Embed your 4-phase migrations directly in your releases / deployments, slap a py script on top to manage progressive rollouts, and you're done.
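Something like this, in spirit; the helm/kubectl commands and the inventory format are placeholders for whatever wrappers you already have:

```python
# Sketch of a progressive-rollout driver: read a yaml tenant inventory,
# order by criticality, deploy in waves, stop on the first unhealthy wave.
import subprocess, time, yaml

def deploy(tenant, version):
    # placeholder: one release per tenant, e.g. a helm upgrade
    subprocess.run(
        ["helm", "upgrade", f"app-{tenant}", "chart/", "--set", f"image.tag={version}"],
        check=True,
    )

def healthy(tenant):
    # placeholder health gate: wait for the tenant's rollout to converge
    return subprocess.run(
        ["kubectl", "rollout", "status", f"deploy/app-{tenant}", "--timeout=120s"]
    ).returncode == 0

with open("tenants.yaml") as f:
    tenants = yaml.safe_load(f)               # e.g. [{"name": "acme", "criticality": 3}, ...]

tenants.sort(key=lambda t: t["criticality"])  # least critical / least sensitive first

WAVE = 10
for i in range(0, len(tenants), WAVE):
    wave = tenants[i:i + WAVE]
    for t in wave:
        deploy(t["name"], "v1.2.3")
    if not all(healthy(t["name"]) for t in wave):
        raise SystemExit(f"wave starting at tenant {i} is unhealthy, halting rollout")
    time.sleep(300)                           # bake time before the next wave
```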
Discovery is automated, blast / loss radius is reduced to the smallest denominator, you can now monitor / pin / adjust the stack for each customer individually as necessary, sort the release ordering / schedule based on client criticality / sensitivity, you can now easily geolocate the deployment to the tenant's location, charge by resource usage, and much more.
And you can still query & roll up all of your databases at once for analytics with Trino/DBT, with nothing more than a yaml inventory.
No magic, no proprietary garbage.
Figma has millions of customers. The idea of having a Postgres pod for each one would be nearly impossible without completely overhauling their DB choice.
You are making a major conflation here. While they do have millions of users, they were last reported to have only ~60k tenants.
Decently sized EKS nodes can easily hold nearly 800 pods each (as documented); that'd make it 75 nodes. Each EKS cluster supports up to 13,500 nodes. Spread that across a few regions to improve your customer experience and you're looking at roughly 20 nodes per cluster. This is a nothingburger.
Besides, it's far from rocket science to co-locate tenant schemas on medium-sized pg instances, monitor tenant growth, and re-balance schemas as necessary. Tenants' contracts don't evolve overnight, and certainly don't grow by orders of magnitude week over week - a company using Figma has either 10 seats, 100, 1,000, or 10,000. It's easy to plan ahead for. And I would MUCH rather think about re-balancing a heavy-hitter customer's schema to another instance every now and then (which can be 100% automated too) than face a business-wide SPOF and have to hire L07+ DBAs to maintain a proprietary query parser / planner / router.
Hell, OVH does tenant-based deployments of Ceph clusters, with collocated/coscheduled SSD/HDD hardware and does hot-spot resolution. And running Ceph is significantly more demanding and admin+monitoring heavy.
Reported where? Does that include a monolithic "free tenant" that would be larger than thousands of their other tenants put together? Every Figma user has their own personal workspace along with the workspace of any organizations they may be invited to
Thousands of directories over thousands of installations? It’s not that far fetched.
Actually, Apple does this for iCloud! They use FoundationDB[1] to store billions of databases, one for each user (plus shared or global databases).
See: https://read.engineerscodex.com/p/how-apple-built-icloud-to-...
Discussed on HN at the time: https://news.ycombinator.com/item?id=39028672
[1]: https://github.com/apple/foundationdb https://en.wikipedia.org/wiki/FoundationDB
This is sort of true and sort of false. When you think of a "database", if you're thinking of a Postgres database, you're way off the reality of what "database" means here.
FoundationDB has a concept called "layers": essentially they have created a layer that looks like a separate database, on top of a layer that is really just separately encrypted groups of keys. They don't have billions of FoundationDB clusters or machines, and the infra level, i.e. the instances of the FoundationDB server software, is unaware of individual "databases".
A closer analogy would be like having billions of tables, but even that isn't accurate because in relational databases a table is usually a notably more static concept than data in a table. The closest analogy would be that each of the billions of users has a bunch of rows with a user-id field on them, and there's a proxy that filters everything such that you can view the table as if it only had one user's data in it.
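In FoundationDB terms (using the public Python bindings plus the directory layer; this shows the general pattern, not Apple's actual CloudKit implementation), it looks something like:

```python
# Sketch: each user gets what looks like a private keyspace (a directory prefix),
# but it is all one key-value cluster underneath.
import fdb
fdb.api_version(710)

db = fdb.open()

def user_space(user_id):
    # Allocates (or reopens) a short key prefix for this user; conceptually
    # this prefix is the user's "database".
    return fdb.directory.create_or_open(db, ("user_data", user_id))

@fdb.transactional
def put_record(tr, space, record_id, blob):
    tr[space.pack((record_id,))] = blob

@fdb.transactional
def get_record(tr, space, record_id):
    val = tr[space.pack((record_id,))]
    return bytes(val) if val.present() else None

alice = user_space("alice@example.com")
put_record(db, alice, "doc-1", b"...serialized record...")
print(get_record(db, alice, "doc-1"))
```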
To be clear, FoundationDB is awesome and Apple have done some really cool stuff with it, but it's less crazy/impressive than it sounds.
This sounds like a pretty standard multitenant datastore. Everything has a user/group Id on it, and a logical layer that locks a connection to a specific group.
FoundationDB is really just a set of structures and tools to build any type of database you want on top of a solid foundation.
"FoundationDB decouples its data storage technology from its data model. FoundationDB’s core ordered key-value storage technology can be efficiently adapted and remapped to a broad array of rich data models. Using indexing as an example, FoundationDB’s core provides no indexing and never will. Instead, a layer provides indexing by storing two kinds of key-values, one for the data and one for the index."
https://apple.github.io/foundationdb/layer-concept.html
Then there are existing standard layers, like the Record layer, which is "(very) roughly equivalent to a simple relational database", providing structured types, indexes, complex types, queries, etc.
https://github.com/FoundationDB/fdb-record-layer
Or one for documents, which speaks the MongoDB wire protocol
https://github.com/FoundationDB/fdb-document-layer
Yeah, the advantage or difference here is that these "layers" are a common design pattern with FoundationDB: several ship with FDB by default, and you're encouraged to build more. So the database certainly has better support for this than just adding a column for TenantID, but you're right that it's not too out there.
I have pondered about this for quite some time and came to the conclusion that it would make schema migrations more difficult to handle. I think Shopify is using an approach which is similar to what you are describing. The advantage is that you don't end up with hot shards because you can move around large customers independently.
In practice there isn't a big difference, they just colocate several customers according to their sharding key in the same logical database.
I remember Postgres table spaces being used to separate customers at a previous job - I can't remember how migrations were handled (pretty sure they were applied per table space) but I don't think it was a problem (at our scale anyway).
Seems like the limitation is the logical file system, which will probably work for most users.
I worked in a place that had a database for each tenant, and the schema migrations were a real pain. Every time, everything goes smoothly except for those few databases with an edge case that screws up the whole migration.
I worked at a place that did this with MySQL. Every tiny, trial account got their own database. Every huge, actual customer got their own database. Migrations were kinda painful. I would think carefully about doing it this way.
We've been doing this with 20k databases on MySQL for the last 10+ years. It solves more problems than it creates. Migrations are trickier, but you get sharding, data isolation, and easier backups that way.
I'm not saying it's always a bad idea, you just need to think about what you're doing. This was closer to 15 years ago now. We had to develop a bunch of our own tooling, and make our own modifications to frameworks that are now ancient history.
Ongoing product development with migrations, tweaking indexes etc. becomes really hard. Every small database tweak now has to be deployed over 1000s of databases.
If customers are not of similar size, and you have a lot of customers, managing all the different databases can be a big headache.
Having a more granular partition key makes it easier to shard the data more evenly.
Nile is a Serverless Postgres that virtualizes tenants/customers. It is specifically built for SaaS companies similar to Figma https://www.thenile.dev/. I am the CEO of Nile.
I'm curious, for anyone who has done this: what's the point of going all the way from one DB to one DB per customer? Why not just split the customers across 2 databases, then 3, etc.? It seems like the same level of system complexity, but you avoid the lots-of-databases scaling problem.
You probably don’t even need to be able to migrate people between shards… just put everyone on one db until you hit 50% utilization, then spin up a fresh db and put all new customers on that one. Repeat whenever you run out of databases under 50%.
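In toy form, that placement policy is just the following (utilization numbers and provisioning are stand-ins for your monitoring and infra calls):

```python
# Toy version of "fill the newest database to ~50%, then open a fresh one".
import itertools

_counter = itertools.count(2)
shards = {"shard-001": 0.37}        # shard id -> fraction of capacity used (from monitoring)

def utilization(shard):
    return shards[shard]

def provision_shard():
    new = f"shard-{next(_counter):03d}"   # in real life: spin up a new database instance
    shards[new] = 0.0
    return new

def place_new_customer():
    """Return the shard the next new customer should be created on."""
    newest = list(shards)[-1]
    if utilization(newest) >= 0.5:        # newest shard is "full": open a fresh one
        newest = provision_shard()
    return newest
```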
I believe Physalia [0] explores this concept at production scale quite well.
[0] https://blog.acolyer.org/2020/03/04/millions-of-tiny-databas...
Spanner's interleaved tables seem like a similar solution, i.e. you interleave customer data so it all ends up on the same set of hosts for performance, while still having the ability to create transactions across customers.
It sounds like they actually did something like this. Their shard key selection could be customer, project, folder or something in their data model at a reasonably high logical level in their hierarchy.
I've done something like this before, with each customer getting their own schema within a single Postgres instance.