Single biggest thing you need to nail down fast: the data model. It is extremely hard to shift as things grow, and without careful thought, it’ll turn into a horrifying miasma of JSONB columns, duplicated data, orphaned rows, and garbage performance.
Customers are going to store surprisingly large items in Docs, where you’d be tempted to inline them instead of offloading to S3 et al.
Chat practically needs to be its own DB. Discord runs on Scylla, Slack runs on Vitess over MySQL. The needs of chat access are wildly different from other types of storage.
If you’re doing any kind of active-active, have a plan for how to move off of that, because it does not scale (at least, not without breathtakingly expensive hardware).
Source: DBRE at one of your competitors.
EDIT: The fact that you’re doing offline saves (which is very cool!) makes me think that you may be using something like Ditto [0], which IIRC is MyRocksDB under the hood. I have no experience with either, but I do know some super sharp folks working at Ditto.
[0]: https://ditto.live
No one cares about that. Export to OpenDocument or Microsoft Office formats. You are living in a bubble of “hackers”. You are not your average user.
Case in point of “engineers are not product people”
He's actually dead on point.
Back in the day, here in Sokovia, there was a local competitor to Facebook. They had a great start and everything went perfectly for them, but it quickly turned out that the technical side was really bad. Sluggish interface, constant outages, etc.
They tried to rewrite the app from scratch two times, and eventually failed.
So yes, making sure you're moving in the right direction at the beginning of your journey is pretty important. You don't have to overengineer and stay in your shed until you have a feature-complete product, but at least make sure that you're building on the right foundation.
This is actually the same reason Friendster failed in the face of Facebook, pun intended. Friendster simply could not keep up technically and had to shut down. Later, Facebook actually had a more solid technical footing and could scale quickly.
https://gimletmedia.com/shows/startup/n8hogn
https://gimletmedia.com/shows/startup/8whow5
Facebook didn't kill Friendster. Friendster getting popular killed Friendster's weak stack, and then everyone moved to Facebook.
Yes, that's what I meant, my apologies if I conveyed that incorrectly. Friendster's own scale was untenable for them but Facebook was able to handle their own scale much better.
By all means, don’t spend forever agonizing over the perfect schema and never ship. It really does not matter at small scale anyway, DBs are absurdly fast.
Just understand and accept that you are taking on heavy technical debt that will need to be repaid, and that it’s much more difficult to do once you’ve already vertically scaled several times along the way.
Product sells lies to customers that engineering struggles to produce, because reality is a harsh mistress.
If you can dream it, you can type it into a ChatGPT text prompt!
Different apps have different technical problems that can be an enabler or a source of never-ending technical debt. Being able to add new features easily, rather than being stuck scaling, could make or break this product.
If your selling point is consolidating apps, you absolutely have to get the data model right, else you don't solve the problem. Just because you don't go in and sell it that way doesn't mean it's not important as hell. The very reason it's hard to get apps to interoperate is that each one has its own data model. If they used one giant data model... it wouldn't be a problem.
The problem with modern development is having to nail down the data model first.
I wish we would develop software where the data model could easily change.
To do this, every data dependency in the system needs to be traceable. Nothing does this so far. And everyone just picks a database off the shelf, but none are even remotely useful for this.
Admittedly, yes. This is the massive appeal of Mongo et al., or just JSON[B] columns in an RDBMS.
Unfortunately, at a very deep level, that’s simply not how RDBMS works. The tuples are a B+tree, and in some (MySQL [InnoDB], SQL Server) cases everything is clustered around the PK. If you don’t create a data model that’s easily exploitable for optimizations designed around that data structure, you’re gonna have a bad time. It’s no different than if you decided to use strings to store ints – you _can_, but it’s a bad idea for a variety of reasons.
What you can do is give yourself as much leeway as possible, by following some basic best practices. For example, it’s a hell of a lot easier to update a tiny reference table than to update billions of rows when you decide that column `region` should say `European Union` instead of `EU`.
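To make the reference-table point concrete, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for whatever RDBMS you actually run. The `region` and `customer` tables and their columns are made up for illustration; the only thing it tries to show is that the rename touches one row in the lookup table instead of every row that mentions the region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Tiny lookup table: one row per region (hypothetical schema).
    CREATE TABLE region (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- The big table stores only the integer key, never the label.
    CREATE TABLE customer (
        id        INTEGER PRIMARY KEY,
        email     TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES region(id)
    );
""")
conn.execute("INSERT INTO region (id, name) VALUES (1, 'EU')")
conn.executemany(
    "INSERT INTO customer (email, region_id) VALUES (?, 1)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

# Renaming the region updates exactly one row, however many customers exist.
cur = conn.execute("UPDATE region SET name = 'European Union' WHERE name = 'EU'")
rows_touched = cur.rowcount

# Every customer sees the new label through the join.
renamed = conn.execute(
    "SELECT COUNT(*) FROM customer "
    "JOIN region ON region.id = customer.region_id "
    "WHERE region.name = 'European Union'"
).fetchone()[0]
```

Had `customer` stored the string `EU` directly, the same rename would have been an `UPDATE` over all 10,000 rows here, and over billions in the scenario described above.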
Yeh RDBMS is probably the wrong choice for most apps. It was good for crunching sales data in batches back in the day. Everything today is pipelines and reactivity.
My dream is to have a tool to model my logical data model and then it will organize my data into the best storage and caches.
I don't think any existing database today is useful.
Ah, I misunderstood your point. I disagree that RDBMS is the wrong choice. Most apps are CRUD, and have the same basic patterns.
NoSQL doesn't solve the schema migration problem. It just means you don't formalize your schema. But your code will implicitly require a certain schema anyway. Changing the schema means changing the code and migrating data. You'll have to write migration scripts and think about backward compatibility. Same problems as in SQL.
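A small sketch of what that implicit schema looks like in practice, using plain Python dicts in place of a document store. The document shapes, the `schema_version` field, and the function names are all invented for illustration; the point is only that the reading code encodes the schema, so old documents still need a migration step.

```python
# Two generations of the "same" document living side by side in a schemaless store.
old_doc = {"_id": 1, "name": "Ada Lovelace"}
new_doc = {"_id": 2, "schema_version": 2, "first_name": "Grace", "last_name": "Hopper"}


def migrate(doc: dict) -> dict:
    """Upgrade a version-1 document to the (hypothetical) version-2 shape."""
    if doc.get("schema_version", 1) >= 2:
        return doc
    first, _, last = doc["name"].partition(" ")
    return {"_id": doc["_id"], "schema_version": 2,
            "first_name": first, "last_name": last}


def display_name(doc: dict) -> str:
    # This function *is* the schema: it assumes first_name and last_name exist.
    return f'{doc["last_name"]}, {doc["first_name"]}'


# Without migrate(), display_name(old_doc) would raise KeyError.
names = [display_name(migrate(d)) for d in (old_doc, new_doc)]
```

Whether you run `migrate` lazily on read, as here, or as a one-off backfill script, you are still writing and testing a migration; the database just doesn't enforce the result.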
The trick is maintaining a full graph of all data dependencies through the entire codebase. Then migrations can be done with ease. But no one does this. They shovel data from one database to the next, with tons of little ad hoc data stores along the way.
I'd rather the data model be designed properly upfront so that it doesn't need to change, but can be extended with new functionality.
Schemaless was one of the original drivers for NoSQL databases.
Now, when I need something schemaless, I start with a Postgres table with an ID and a jsonb or json field... which at least makes it easy to have a schema when the inevitable happens and schema-dependent code ends up getting added to the project.
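Roughly what that progression looks like, sketched with Python's built-in `sqlite3` so it runs anywhere. This is a stand-in, not Postgres: the JSON lives in a `TEXT` column here, where in Postgres it would be a `jsonb` column and the backfill could be a single `UPDATE item SET owner = body->>'owner'`. The `item` table and the `owner` field are made-up names.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Day one: an ID plus an opaque JSON blob (TEXT here, jsonb in Postgres).
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO item (body) VALUES (?)",
    [
        (json.dumps({"title": "Roadmap", "owner": "ana"}),),
        (json.dumps({"title": "Budget"}),),  # no owner recorded
    ],
)

# Later: code starts depending on "owner", so promote it to a real column.
conn.execute("ALTER TABLE item ADD COLUMN owner TEXT")
for item_id, body in conn.execute("SELECT id, body FROM item").fetchall():
    owner = json.loads(body).get("owner")  # None when the key is missing
    conn.execute("UPDATE item SET owner = ? WHERE id = ?", (owner, item_id))
conn.execute("CREATE INDEX item_owner_idx ON item (owner)")

owners = [row[0] for row in conn.execute("SELECT owner FROM item ORDER BY id")]
```

The blob column stays around for whatever is still genuinely unstructured, while the promoted column gets a type, an index, and eventually constraints.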
This is a hard problem.
Single biggest technical thing, anyway. IMO, the single biggest thing is focus and clarity in their communication. If people without a working mental model of software development can’t instantly understand the tangible problem it solves in their existing business process, they won’t even scroll past the break, let alone pay for it. Consolidation and modularity are solutions, but people don’t go shopping for solutions without a problem. Have you ever gone out looking for a better commercial version of work-related software you didn’t have any problem with? “App chaos” is way too abstract of a problem for most people to grasp. Do people have trouble sharing Google Docs over Slack? Do companies have trouble with SharePoint and Teams not being integrated enough? Does your tool do it better? Does your tool do it approximately as well, but cheaper? More reliably? If so, do people find the existing solutions too pricey or unreliable, or does that not impact them enough to care?
Unless they define, upfront, specific problems people really have, that their unified solution solves, then nobody is going to pay attention.
The second biggest problem is having an interface design team that makes all of those disparate apps consistent enough to be more usable than individual solutions. The fact that nearly no popular user-facing applications are developer-managed FOSS (as opposed to Firefox/Blender/Signal et al., which are managed by a company that hires professional designers), despite being free, tells you everything you need to know about dev-driven UI/UX. This is coming from someone who worked as a full-time developer for years and contributed many thousands of hours of coding to FOSS projects before switching to design.
Fair point. I know little to nothing about design, and don’t really care about it. It’s not that I think it’s unimportant, it’s just not something I want to expend any time learning. To be fair, I’m also not trying to create any user-facing products.
To me, rsync.net is peak design. It has just enough modernity to appeal to people who might expect that, but it quickly gets out of the way and tells you what it is, why it matters, and how much it costs.
At the other end of the spectrum, there’s tarsnap.com, which is probably a turnoff for anyone who doesn’t like text. I love it (as, apparently, do enough other people to keep its author comfortably employed), but I get that it’s an extremely narrow niche.
There’s less than no shame in not having expertise in something outside of your area of expertise, and realizing that’s the case puts you way ahead of the pack. There’s a reason most designers you work with have relevant degrees, and the ones without degrees who hold high-level positions in good organizations might as well have them: it’s just a lot more complex than most developers assume. When they realize that, great! When they’re swinging around giant Dunning-Kruger derived overconfident declarations about something you’re designing… not so great.
As a full-time developer for a decade, and in other technical roles for a decade before that, I had a few similar experiences with designers. One repeatedly insisted that WordPress along with their ramshackle loopdy-looped spaghetti PHP plugin (still including comments from the tutorials they copied tidbits of code from) was robust enough to replace our very tight Django-based code base that did a hell of a lot more than serve up our website… and they insisted it would take half as long to reimplement it all in PHP. There wasn’t even a good reason for it: they learned everything they knew about development by osmosis from working on web projects, and a mishmash of articles they read on the topic over the years, and after getting one piece of code to work in a low-volume application, thought they were a dual-field specialist. That’s actually pretty rare among designers, but developers that feel that way about design are the norm. We all know what Larry Wall thought the three most important traits were for developers…
Nailed the real important part, the product marketing
It can sting to realize your grand, genuinely useful technical idea won’t sell itself, but it just won’t.
He's most likely using SQLite per account, because that's the easiest way to have an offline DB and sync it, which will most likely scale perfectly fine with appropriate indexes as long as you are careful about the feature set.
That introduces a new problem when it syncs to others in the same workspace, if it’s large.
Nice thing about SQLite is you can “clone” the repository by copying a single file. And in cases where you need incremental sync, you can use an SQLite of diffs as a single packfile (similar to git).
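For the single-file clone, a minimal sketch using Python's `sqlite3.Connection.backup` (available since Python 3.7) rather than a raw file copy. I'm swapping in the backup API because it can copy a database that still has open connections, whereas a plain file copy is only safe when nothing is writing. The file names and the `note` table are made up, and this covers only the full clone, not the incremental diff sync.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "workspace.db")        # hypothetical per-account DB
dst_path = os.path.join(workdir, "workspace-clone.db")  # the clone to ship elsewhere

# Build a small source database.
src = sqlite3.connect(src_path)
src.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
src.executemany("INSERT INTO note (body) VALUES (?)", [("first",), ("second",)])
src.commit()

# Clone it: backup() copies the whole source database into the target connection.
dst = sqlite3.connect(dst_path)
src.backup(dst)
dst.close()
src.close()

# The clone is an ordinary standalone SQLite file.
clone = sqlite3.connect(dst_path)
cloned_rows = clone.execute("SELECT body FROM note ORDER BY id").fetchall()
clone.close()
```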
Things like cr-SQLite also have a lot of potential to make single SQLite per client a lot more viable. But I’m interested to see what you think the problems are? Have you found a solution or alternative?
Notion might have written something about their journey in this regard?
They have [0] [1], yes, but they also mention [2] learning that skipping building indices during a DB copy (doing so instead after the new instance is built) is much faster, which is pretty basic RDBMS knowledge. It’s great to be learning, and even better to be sharing that knowledge, but it gives me pause about accepting much of what they’ve written as expertise.
IME, many SaaS companies have eschewed the idea of having any DB experts, and this inevitably leads to pain down the road.
[0]: https://www.notion.so/blog/data-model-behind-notion
[1]: https://www.notion.so/blog/sharding-postgres-at-notion
[2]: https://www.notion.so/blog/the-great-re-shard
If you know people at Ditto, let them know their website looks completely obnoxious if fonts are not loaded. It looks something like this: