I wonder if 1.7 petabytes of data (1T indexed records) could fit on a single (very) beefy baremetal server for under a couple thousand dollars a month, served by SQLite.
Like this: https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a...
I’m working on a specialized data store[1] that would be perfect for this kind of use case (large “cold” storage with indexing). But I’m having trouble finding potential customers. I’ve tried Google search ads, but the responses were 99% spam, 1% potential investors, and 0% potential customers. If anybody has any ideas I’m all ears.
You need to be doing enterprise sales, not marketing. There is a lot of advice on that here and elsewhere, but you definitely need to be making calls for that type of business.
Yep, nobody with the problem you’re offering to solve is going to solve it by googling and picking some random company they’ve never heard of with no track record.
Not even click a search ad and fill in a contact form? When I’m on the other side of the table I do that. But perhaps I’m unique in that aspect?
(I understand there won’t be any significant business without enterprise sales. But that’s not what I’m looking for at this stage.)
When I’m on the other side of the table I do that
No, you don't. There are many established storage solutions out there. If you're in the market for one, you can easily fill days, weeks or months vetting those. So why would you bother dealing with a sales rep from a random one you've never heard of before, and that isn't used by anyone? You don't even provide any details on what makes it different from, or better than, anything else out there.
Well the reason I’m working on this in the first place is that when I was on the other side of the table I was looking for one. I filled in the contact forms of a couple of different startups that had products somewhat in line with what I was looking for, and talked to their sales reps. Admittedly they weren’t as early stage as my project, but on the other hand they weren’t 100% focused on my use-case either.
I guess what I’m trying to say is that I was hoping that someone with a write intensive workload would want to spend some time evaluating a product built specifically for that. But perhaps I’m wrong? Even if your workload was 99% writes, you’d rather go to some established player (e.g. MongoDB) with a product optimized for 50/50 read/write?
I guess what I’m trying to say is that I was hoping that someone with a write intensive workload would want to spend some time evaluating a product built specifically for that.
Again, it's not clear to me exactly what it is you're doing that's any different from the plethora of existing off-the-shelf solutions.
You're saying that you started this project/company because you were looking for a solution to a specific use case (write-intensive workloads) and existing options didn't work - can you expand on that? Can you create a chart, for example, that lists out the specific things that HaystackDB does and alternatives don't? Presumably, if you optimize for write-intensive workloads, there are some drawbacks when it comes to reads - no? Or maybe storage? That's good to highlight.
What you need are whitepapers/blog posts/youtube videos/talks at conferences/etc. that highlight the technical details of your solution, because you're trying to get technical people interested in your product to the point where they will invest time to learn more.
Well, it’s pretty simple: HaystackDB is designed from the ground up for write intensive workloads, so it’s much more economical than existing off-the-shelf solutions for that type of workload. Is that not clear from the landing page?
From pricing: “$0.2 per million writes, $20 per million reads”. The typical cost profile is $2 per million read/writes, or even more for writes.
Forgive me because what follows will sound harsh, but I think you need to hear it based on your response.
HaystackDB is designed from the ground up for write intensive workloads
Okay.
so it’s much more economical than existing off-the-shelf solutions for that type of workload.
That's a leap in logic. Just because you designed it with this workload in mind, well, doesn't automatically mean that it's any good for this workload (or any workload). If solving a problem was as easy as declaring "I will design my solution from the ground up for this problem", then we'd all live in peace and harmony. So that's what people are asking you here: how do you make your DB "much more economical" for that type of workload? What technology, what ideas have you had to make it possible? If you don't want to reveal that, then you need proof that it's better than the competition, not a declaration that it's better than the competition.
Is that not clear from the landing page?
It's clear that you want to market your solution as something good for write-heavy workloads. Why should we believe you've done a good job designing your solution?
From pricing: “$0.2 per million writes, $20 per million reads”. The typical cost profile is $2 per million read/writes, or even more for writes.
Who knows how you came up with pricing? Perhaps you're betting on your customers being stupid and not realizing that taking a 10x hit on the price of reads will lose them (and earn you) more money in the long run. After all, what good is writing to a DB if you never read from it...? Or perhaps it's some kind of promotional / loss leader pricing that will change soon in the future. In any case, it's, again, not proof that your solution is adapted to the customer's problem.
Forgive me because what follows will sound harsh, but I think you need to hear it based on your response.
No worries. I appreciate you taking the time.
you need proof that it's better than the competition, not a declaration that it's better than the competition
Fair point. I realize I’ll need that before making any sales. But I was hoping to get a few leads from the contact form without it.
Perhaps you're betting on your customers being stupid and not realizing that taking a 10x hit on the price of reads will lose them (and earn you) more money in the long run. After all, what good is writing to a DB if you never read from it...?
No it’s not a malicious trick. There are use-cases where most records will never be read back. For example, if you go into the Uber app you can find a history of all your trips and you can click one and bring up a receipt for it. Most users will rarely if ever do that. So you end up writing many more receipts to your database than what you’ll ever retrieve.
Is that not clear from the landing page?
The marketing byline you have on your landing page is clear enough, but nobody will take that seriously without a deeper technical description.
When I read it, I assumed you wrote some code to move data in and out of lower-cost S3 or Glacier storage tiers because you don't control storage pricing and you run on top of existing public cloud infrastructure. Maybe I'm right, maybe I'm wrong - but if I'm looking for a solution, I need to assess whether I should invest time and effort to do a deeper dive, and that's the box I would put you in, without any more detail.
Anyway, good luck. Hope it works out.
The companies that have these types of problems all have AWS reps (or whatever vendor) that get first crack at a solution, even if their senior engineers or CTOs do some googling. Frequently discounts can be negotiated on products that aren’t a perfect fit, or companies will get early access to new products that solve their problem (AWS calls this “limited preview”).
A good chunk of B2B infrastructure products like this are developed using a “golden partner” model. The first customer (or few) gets a free or reduced cost license, the developer gets a real-world scenario with real data to use to figure out what the minimal functionality actually is to be a marketable product and to work out bugs. This arrangement frequently requires a preexisting relationship and trust between both parties.
Yep. Always a bit dangerous to go up against AWS and similar. My hope here is that this product is too niche for the major cloud vendors to invest in. But since Uber is building stuff themselves that assumption may be wrong.
A “golden partner” model makes a lot of sense, thanks.
If you know a technology leader at a company that has this problem, reach out and ask if they’ve had any pain related to it. Ask to take them out to lunch and tell them about a solution you’ve been working out. Or even a short demo call. See if they’d be interested in an “innovation partnership.” You give them a discounted/free license (can be time limited to a year or however long it takes to validate that your solution works and saves them money - then returns to full-price), they agree to feature prominently in your marketing material or provide a reference for your next lead.
$20 per million reads
Quite frankly, this is not gonna work. I manage a system with a very write-heavy workload (lots of small writes) and even though our writes far outpace our reads, this pricing makes your system about ten times more expensive than an RDS cluster.
There's no data about performance. There's no information on how or whether data is persisted to durable storage before a write is acknowledged. There's not even any information on how big keys or values can be. There's no public information on support.
When choosing a system like yours, my priorities are:
1. Data safety
2. Performance
3. Cost
... In that order. You've done nothing to educate me on 1 and 2 and your pricing isn't better than what you're seeking to displace.
When your product is a tool for developers, show up with hard facts about your product. Zero people (as you've seen) are even remotely interested in building a product on top of a system without knowing whether the system will hold up to their use case. And other than a very anemic FAQ section, you have no documentation at all, whatsoever.
All valid points. I guess I’m hesitant to put time into documentation and similar, if I can’t somehow find a steady stream of sales prospects.
even though our writes far outpace our reads, this pricing makes your system about ten times more expensive than an RDS cluster
That indeed sounds off… Are you sure you’re comparing the total cost to that of an RDS cluster? I am aware that reads will be more expensive (due to the write optimization), but I was hoping most customers would make it back on cheap writes. Also the storage itself ($0.23 per GB-month) should be much cheaper than RDS.
My total database is maybe 400 gigs. Most of the writes overwrite existing data, so storage cost isn't a concern. With the cost of an upfront RI for the year on RDS (with basically as many iops as I can use), your solution gives me ~100 million reads. That's...like a month of usage at best.
At least in my case, the fundamental problem you're facing is that reads are just too expensive. Writes and reads tend to grow at the same pace in many products: there's a ratio that tends to stay the same as you scale. $20/million reads is just a _lot_. The ratio of writes to reads for your pricing needs to be 100:1 or more for it to make sense for me, but I'm more like 10-20:1.
I guess I’m hesitant to put time into documentation and similar, if I can’t somehow find a steady stream of sales prospects.
This is part of why a database company is hard to build. You will simply not find anyone willing to give you money, because the alternative is going to be a solution your customers already know and understand and which is likely extremely mature. You're competing with Postgres and Mongo. You can't ship a database product that doesn't work: you're asking people to build on you for their storage primitive. If you fuck up, that's a business-ending event for your customer. You've either got to come to the table with an extremely compelling product ("I couldn't build my business without this") or you've got to show why someone should trust you over an established but somewhat more expensive alternative.
The ratio of writes to reads for your pricing needs to be 100:1 or more for it to make sense for me
Correct. I bet Uber’s use case here is something like 1000:1. I’ve worked on systems that were over 1000000:1. That’s where HaystackDB makes sense.
but I'm more like 10-20:1.
Then RDS is hard to beat.
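For concreteness, here's a back-of-the-envelope sketch of where the break-even sits. It assumes the HaystackDB prices quoted earlier; the flat $2-per-million competitor is just a stand-in for the "typical cost profile" mentioned above, not RDS's actual instance-based pricing:

    # Break-even write:read ratio for the quoted HaystackDB pricing
    # versus a hypothetical flat-rate store. The $2/million figure is
    # the "typical cost profile" quoted upthread; real RDS pricing is
    # per-instance, not per-request, so this is only illustrative.
    HAYSTACK_WRITE = 0.2   # $ per million writes (quoted upthread)
    HAYSTACK_READ = 20.0   # $ per million reads (quoted upthread)
    FLAT = 2.0             # $ per million ops, assumed competitor rate

    for ratio in (1, 10, 20, 100, 1000):       # writes per read
        w, r = float(ratio), 1.0               # millions of ops
        haystack = w * HAYSTACK_WRITE + r * HAYSTACK_READ
        flat = (w + r) * FLAT
        winner = "HaystackDB" if haystack < flat else "flat-rate"
        print(f"{ratio:>4}:1  HaystackDB ${haystack:7.1f}  flat ${flat:7.1f}  -> {winner}")

Against that stand-in the crossover is right at 10:1; the 100:1 figure above presumably reflects that a well-utilized RDS reserved instance effectively prices reads far below $2 per million.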
You might be surprised how often a deep graph of microservices ends up rereading the same prior transactions over the course of stateful payment processing and on-demand payouts. DynamoDB can give you 2.6 million short reads of base load (1 RCU/s provisioned) for $0.12 per month, which would make a $65 alternative (2.6 * $5 + 2.6 * $20) a hard sell.
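For anyone unfamiliar with DynamoDB capacity units, here is a sketch of where those numbers come from; the per-RCU-hour price is assumed to be roughly the us-east-1 list price:

    # One provisioned read capacity unit (RCU) = one short (up to 4 KB)
    # strongly consistent read per second, sustained. The ~$0.00013 per
    # RCU-hour figure is an assumed us-east-1 list price; other regions
    # and consistency modes shift it a bit (hence the parent's $0.12).
    seconds_per_month = 30 * 24 * 3600        # 2,592,000
    reads_per_month = 1 * seconds_per_month   # ~2.6M reads of base load
    cost_per_month = 0.00013 * 24 * 30        # ~$0.09 at list
    print(f"{reads_per_month / 1e6:.1f}M reads for ~${cost_per_month:.2f}/month")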
Typically you have a “hot-cold” architecture where transactions stay in “hot” storage for a number of days (I think 12 days in the OP if I remember correctly), then are moved to “cold” storage. HaystackDB is primarily designed for the “cold” portion of that architecture. But I am considering adding an open source / source available “frontend” that would provide low-cost “hot” storage.
Besides all the other good feedback here, I will offer my extremely petty reason why I wouldn't spend much time evaluating this product. In the FAQ, under the "Are transactions fully ACID?" heading, there is a typo: "simultaineously". It gives me the impression that not enough care has gone into an important part of this product. I know it's not a fair judgement, but first impressions matter.
That’s easy to fix, thanks.
typo was not the point of the comment though :)
Also, is there a demo or some sort of technical whitepaper?
Not yet. Is that something you’d find compelling? Anything in particular you’d like to see?
Would love to see a Snowflake-like paper. I am not sure if you are targeting enterprise sales or trying to woo developers who can advocate for your product in their companies. If you are targeting the latter, it's unlikely that developers are going to fill in a contact-us form to try out your product.
One observation I have regarding your homepage is that the message isn't very clear. The headline doesn't mention any benefits I get from using your software.
I think you should invest some time into improving your landing page and maybe you may see some traction. A good resource for this which I've bookmarked is here(1). Hope that helps.
(1) https://www.indiehackers.com/post/my-step-by-step-guide-to-l...
Thanks. So you think it would work better if it would just say “save money”, rather than jump straight into the “what”?
To me, when I read the below, that just screams “save money”. But maybe I should do that conversion for the reader, so to speak?
From benefits box: “Sometimes you need to index a huge amount of data, to accelerate just a few search queries. But building indexes and keeping them in hot storage can be expensive. HaystackDB builds only the indexes needed for sub-second query latency, across billions of keys, while keeping all your data in low-cost object storage like S3.”
I asked chatgpt for a headline based on your prompt and it gave me this:
“HaystackDB: Swift Searches, Massive Savings - Index Billions, Store Smartly, Query in a Flash!”
Thanks! That headline is pretty good. :)
With a disclaimer that I have no formal nor practical background in marketing, here are some ideas:
1. It is a bit unclear to me when I would use Haystack. The main advantage seems to be cost cutting. It would be nice to see some realized examples of this.
2. When competing for price, you may look like the cheap, and thereby untrusted alternative. There is a risky business paradox here, for which I am sure a fellow HN poster will supply the name: you charge less, therefore you make less, and you will not be able to sustain the service, making me not want to spend money.
3. Have you tried looking for companies that may actually need this solution? Have you tried contacting them directly?
1. Good point, thanks.
2. True. One reason I haven’t priced it ridiculously cheap is to avoid this judgement, and fate. With this pricing I won’t necessarily have a smaller profit margin than competitors. The cost advantage comes from a smarter architecture. Any ideas on how I can communicate that would be greatly appreciated.
3. I used to work for one that needed it. I’ve also interviewed at one that had the same problem. A bit hesitant to reach out to potential customers though before I have a solid product I can deliver. But perhaps I shouldn’t be?
3. I used to work for one that needed it. I’ve also interviewed at one that had the same problem. A bit hesitant to reach out to potential customers though before I have a solid product I can deliver. But perhaps I shouldn’t be?
Companies generally have to be suffering pretty badly to take a risk on changing their tech stack to something unproven. And the risk for you at that point is that they choose to spend 10x on consultants to implement some existing system instead.
The CTO needs to trade off the opportunity cost of developing new features/existing maintenance against integrating an unproven product. How can you de-risk this for them? (Even just showing that you recognize that this is the case can help)
Maybe this is a time to "do things that don't scale". ie: offer to integrate it into their system for them (for at least some small part/pain point), and likely in parallel so that they can evaluate it without taking down the existing system.
Just my two cents.
I would seriously consider an open source business model with an appropriate licensing model. I see a lot of companies are open to experimenting with open source DBs.
Good point. I’m thinking about releasing an open source (or source available) “frontend” for it, and just charge for the “cold storage backend”. How would you feel about that?
Some B2B and B2B2C products just don't work with walk-in leads. You need someone to create and chase down a set of leads.
Good point. Any ideas how I could experiment with this “on the cheap”? Any tools I could use to identify and contact leads?
Looking at pricing, it's crazy expensive (and that's comparing to AWS, which is crazy expensive). How do you justify that?
The idea is that it should be about a tenth of the cost compared to S3 or DynamoDB. Is that not how you read the pricing? Or do you just think that’s still too expensive?
EDIT: Or maybe it’s because reads are expensive? That’s a consequence of the write optimization. The idea is that potential customers will be doing 90%+ writes.
Most places I've worked we couldn't even consider using a product that wasn't supported by a major cloud provider like this. What problem does your product solve that customers absolutely 100% need it?
Structurally, you are a small entity trying to compete on cost with hyperscaler cloud providers and open source software. Most ISVs like you charge a ton of money for big problems very few enterprise customers have.
I think you need to find a specific use case where your product is a clear winner. Like 'HaystackDB is the best option for healthcare exchanges to use when receiving claims'.
This is a good summary of why I’m hesitant to put more work into it.
The counter argument I guess is that developing your own data store in-house should be even more of a no-no, and companies do that. (One example is obviously Uber, but my previous employer is another example.)
Do you think the option to self-host the product would help tip the scale?
What problem does your product solve that customers absolutely 100% need it?
To be blunt there is no such problem: you can always throw more money e.g. at DynamoDB. But if you have a very write intensive workload (such as the use-case described in the OP), then you can save 90% of that money.
Maybe find more articles like the one above
and try to connect with the respective people on those teams via LinkedIn to ask for feedback
Perhaps. But I have a feeling it’s too late once they’ve started building something in-house. Any ideas on how I could find the ones that will publish an article like this one year from now? Those are the ones I’m after, I think.
Find just 1 customer.
The homepage could benefit from more tangible examples, because right now I can't discern where it fits into my current stack. For most companies, it would be replacing something in a specific context.
Like a side-by-side example. Doing "work" on BigTable (show code examples) versus doing the same "work" on Haystack. Then show the specific metrics on how Haystack is cheaper/faster/better.
The market is saturated with storage products, so it's tough to differentiate yourself. Your site does not help by the way. You're also not selling an end-user product to the public, rather you're selling a technical and infrastructure solution to very technical people - that's a different type of sale. To get those people interested, you must put together technical whitepapers/blog posts/webinars/youtube videos/etc. to explain your approach.
To consider your product an alternative I'd like to see benchmarks that seem trustworthy, something like a Jepsen analysis or case studies at existing customers, and be able to test it within the EU, i.e. not on US-based services.
Seems you're in the vicinity of Lund; there should be a 'science park' or similar close to the university where you can find companies that have problems you could solve. Talk to 'incubators', 'accelerators' and the like there.
Uber must have picked up some Google rejects. This type of homegrown project was seen at Google all the time.
Usually to aim for a significant promotion.
“Designed and built homegrown system to save $Xm! Give me promo, bro?”
Just so happened to ignore that it took X+Y additional to build. Also it will probably be going to the G graveyard in a few years.
$6m in annual cost savings is truly unremarkable, if we are to believe levels.fyi [0] [1]
If you're truly paying engineers, project managers, etc $500k a head, it dramatically undermines the financial cost savings.
It very well might be the case that "We spent $25m of engineering resources to save $6m annually".
[0] https://www.levels.fyi/companies/uber/salaries/software-engi...
[1] https://www.levels.fyi/companies/uber/salaries/software-engi...
Bitter story time:
I made a config change to our AWS instances and projected approximately $10MM/year in AWS costs savings (pre-savings).
My boss asked me "Who told you to do this? We need to focus on $project instead". I found another team and transferred out. 3 months later there was a big fire drill about AWS costs and they took my 1-pager and executed it. Didn't get any credit in the shipped email nor did the manager reach out to apologize.
Of course you didn't. You used your time to promote yourself instead of doing what you were asked to do. That could have cost a promotion for your manager, who could have promoted you.
I don’t know why you are downvoted. Aligning the personal interest vector with the company’s interest vector is a huge problem that is usually underrepresented in HN comments.
Usually we only complain about the short-sightedness of CEOs who prioritize short term stock gains over long term prosperity, but that is also just a specialized case of the success vector misalignment.
This is a poor way to view salaried work.
If OP is meeting deliverables and did this on the side, this is a great example of a story that would make me inclined to hire.
The problem is OP didn't meet deliverables and the team was behind schedule.
Helpful in the right light, but equally not helpful in another.
“Team behind schedule” usually means that some manager promised an arbitrary date based on an imaginary timeline.
Every team needs slack. Bad managers hate slack. Good managers hide slack.
I think maybe my point was misunderstood.
I was trying to say that the employee tried to do what’s good for the company. However, that backfired, as his and his manager’s success vectors are not aligned with the company’s. Hence, although he did something that benefited the company, it hurt his and his manager’s performance (exactly because their incentives are not aligned with the company’s).
Hope it clears things up.
We were migrating to Kubernetes from fixed hosts and had to buy the capacity. I just spent a day looking at our utilization and made a better decision about what instance type to use.
The organization still got the end result, though (to play devil's advocate). That sounds like a win for the company. They got the cost savings, plus they redirected attention back to a project that was higher priority than saving $10m 3 months earlier than they could have.
It may be a win for the company in some categories, but a loss in others.
Sounds like a fantastic way to maneuver and get credit for someone else's proactive ingenuity.
we spent $25m of engineering resources to save $6m annually
This is a huge ROI. Borrowing $25m costs about $1.25m/yr so you're winning even with no upfront costs
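Spelled out, with the ~5% cost of capital implied by that $1.25m figure (an assumption, not Uber's actual rate), and ignoring maintenance:

    # Finance the build, keep the savings. The 5% rate is implied by
    # $1.25M/yr on $25M and is an assumption; ongoing maintenance of
    # the new system is ignored here.
    principal = 25_000_000
    rate = 0.05
    annual_savings = 6_000_000
    interest = principal * rate                                 # $1.25M/yr
    print(f"net: ${(annual_savings - interest) / 1e6:.2f}M/yr")  # $4.75M/yr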
That math assumes the ongoing maintenance of LedgerStore is no higher than the cost of using DynamoDB was.
I suspect there's now a team of technologists maintaining, securing, and operating LedgerStore.
I suspect they already had a team maintaining/debugging/operating dynamodb
Hosted services don't reduce maintenance costs to zero
They have teams maintaining/debugging/operating their use/consumption of dynamodb; they don't have teams maintaining/debugging/operating/securing dynamodb itself.
With LedgerStore, they now have to do both.
Optimizing to address high percentile issues (latency/error/correctness) on someone else’s code follows a bimodal pattern: it’s easy until it is impossible. Doing it on your own code is progressively more difficult and the complexity may cause you to give up, but you normally don’t hit the same wall.
I mean, ideally you are still employing those people into the future. Plus, were there other opportunities to drive value that the time would've been better spent on?
Ideally for the people, but not a requirement. I doubt they won't be conducting more layoffs either.
Question is: Where else could they have spent those $25m? Could they have built something with more value?
Opportunity cost is a cost, too
If at Uber's scale they still have enough opportunities of that scale for every one of their thousands of engineers, I need to buy some of their stock.
Of course now we are assuming that the existing solution didn’t also require engineering salaries to maintain.
Yes but since revenue is growing at over 70% due to squeezing out the drivers, there's more money to spend on fighting Amazon over the DynamoDB contract https://www.forbes.com/sites/lensherman/2023/01/16/ubers-new...
$25m spent to save $6m annually means 4 (and a bit) years to break even, then it's positive afterwards. So as long as the project runs for more than 4 years, they'd be ahead.
The only problem is that the database code might need maintenance, specs change or situations change, and the cost is _not_ 25m, but way more per year than anticipated. So much so that after a few years, a new batch of engineers start eyeing writing a new database system to purportedly save money...
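To make that concrete, here's how the break-even point moves as ongoing maintenance eats into the $6m; the maintenance figures are illustrative assumptions:

    # Payback period as a function of ongoing maintenance cost. The
    # $25M build cost and $6M/yr gross savings come from the thread;
    # the maintenance figures are made up for illustration.
    build_cost = 25_000_000
    gross_savings = 6_000_000
    for maintenance in (0, 1_000_000, 3_000_000, 5_000_000):
        net = gross_savings - maintenance
        years = build_cost / net if net > 0 else float("inf")
        print(f"${maintenance / 1e6:.0f}M/yr maintenance -> break even in {years:.1f} years")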
That’s what I was thinking, and fully loaded cost is at least 35% more than their salary.
Imagine trading 5 headcount full-time to manage the 1T+ fully custom database on an ongoing basis when they could have just used DynamoDB and have been done with it.
Or better, having to engineer a new feature that already existed in DynamoDB and just losing money at that point.
Right, but that's $25m in R&D investment. Much better than $6m in cost of goods/services delivered! The former is great innovation and will be ignored by investors because it's just a fixed cost on the way to becoming profitable. The latter is going to appear in the marginal cost of services calculation.
The thing I find odd about this is that the headline figure is about old immutable records. Almost all of that 1.7PB is ancient by any practical standard. Uber is not likely to care about the credit card authorization flow for a ride two years ago, except maybe for analytics.
If I were doing this, I would be looking at data warehousing systems. 1.7PB of, say, Parquet files in S3 is not terribly expensive. 1.7PB of Parquet files in on-prem or collocated object storage, even replicated a zillion times, is quite cheap. And quite a few companies and open-source projects are currently competing aggressively to provide awesome tools for querying that data.
The hot data would fit on basically anything — the choice should be about robustness and barely even consider cost per TiB. Datomic got written up recently and seems credible for this type of application. FoundationDB is bulletproof. Postgres could probably handle it without breaking a sweat, although active/active replication isn’t free. Heck, writes straight to a warehouse with a cache in front to help with reads seems credible — Uber rides rarely go for longer than a couple hours, and back-of-the-envelope math suggests that the total data rate is maybe 50GB/hour. An entire day of data for an entire country would fit on a single very ordinary commodity server, and the live data for the entire world would fit on one mildly beefy server. The indexes involved sound straightforward.
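For what it's worth, the ~50GB/hour figure is easy to reconstruct if you assume the 1.7PB accumulated over roughly four years (the window length is a guess):

    # Reconstructing the ~50 GB/hour back-of-the-envelope figure:
    # 1.7 PB spread over an assumed ~4-year accumulation window.
    total_gb = 1.7e6              # 1.7 PB expressed in GB
    hours = 4 * 365 * 24          # ~4 years, a guessed window
    print(f"~{total_gb / hours:.0f} GB/hour")   # ~49 GB/hour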
The primary use case is not analytics. This data store is the system of record in their credit card authorization and billing pipeline, and so it has extreme consistency requirements. The lion’s share of its engineering is to provide consistency across a large spectrum of failure modes.
Old data could probably live at lower cost in a data warehouse, but then developers would have multiple systems and namespaces to deal with in order to query on transactions.
This data store is the system of record in their credit card authorization and billing pipeline, and so it has extreme consistency requirements.
This seems like an odd requirement to me. What are the use cases beyond a few minutes past the end of the ride?
- Reconciliation of Uber's finances. This definitely needs to be correct and consistent and maybe even auditable, but it's also done at the end of any given billing period. A daily roll-up would be sufficient.
- Showing each driver and each rider their own history. This ought to be fully correct (especially for drivers), but it's a massive double-edged sword. Uber's app, for example, appears to be entirely missing the "please forget my ride history" button. And this isn't surprising: Uber apparently stores it in a ledger, fully indexed, complete since at least 2017, that is cryptographically immutable. Is this actually a good thing? What happens when a privacy regulator tells Uber to expunge, anonymize, or pseudonymize old data?
- Handling chargebacks? Having a high quality record going back 180-365 days seems useful. But the cost to Uber of occasionally losing a record is very low and is barely more than proportional to the probability of loss of a given record.
So I don't get it. If I were running an operation like this, I would prefer not to have a fully immutable ledger full of personally identifiable location, usage, and transactional data.
This seems like an odd requirement to me. What are the use cases beyond a few minutes past the end of the ride?
Exactly what the OP said: authorization. Card networks cap the latency of authorization requests (the time from when the merchant submits a request to when it makes its way through the merchant processor, networks, issuer processor, and back).
Capture can happen days after successful authorization.
Which is …normal. Different types of data access have different SLA requirements. Any company that cares about cost will warehouse data after x time frame. It’s done even in banking. Uber has a much more lenient need to make this data instantly available.
It’s done even in banking
What are you basing this on? Banks have fewer real-time constraints when it comes to certain data, but that's rare. You will almost never see a bank, processor, etc. archive their system-of-record data.
"then developers would have multiple systems and namespaces to deal with in order to query on transactions."
Yes, which they already had as DynamoDB only held 8 weeks of data. Presumably all payments to drivers were calculated monthly and turned into invoices. Any corrections would end up as adjustments on future invoices. Seems pretty normal.
Immutable data is always consistent. This is almost certainly an append-only ledger, a well-established solution for a simple problem.
This. DynamoDB will export to Parquet (or CSV); S3 Select becomes possible for the data. More importantly, Athena can query it all.
In some countries (e.g. Italy, France) you are required by law to keep records of transactions and other data for 10y. But I agree with the data tiering part. Anything older than X months should go to storage (parquet files in some cloud storage).
I heard from some ex-Uber people that you could call Uber a database company as much as you could call it a transportation company. Something like 80+ databases invented there in one form or another.
Promotion-driven development. I suppose better than blog post driven development, but marginally so.
I heard from some ex-Uber people that you could call Uber a database company
Someone once called Airbus planes "Sun servers with wings". All businesses today have an IT department and those build around the idea of real-time scheduling done by computers are compute/storage businesses first. That's why Uber was able to launch Uber Eats or e-bikes. I am not surprised they have a bunch of busy devs building cool stuff. Not all of it is ego-driven. I liked their logger for Go last time I used it.
NIH syndrome is a real thing. I'm not saying they don't make cool stuff, but an engineering culture that prioritizes creating a new database over using something that already exists, then promotes the people who develop the (70th) new DB over the people who were focused on actually delivering value for the customer, is not my choice of working environment.
The architecture where all the magic lives in the database layer and the services treat the DB like this magical thing that synchronizes across multiple DCs for you, and takes care of every complicated part of engineering a distributed system is a "now you have two problems" situation. It's a massive complicated system that will fail in novel ways, and you don't have a large community of other users that you can fall back on for expertise.
you don't have a large community of other users that you can fall back on for expertise.
That is one of the most important points of consideration when I am being asked to assess new tech. If the project is not actively supported by the community, I don't recommend it. I just can't expose the client to risk just because there's a new shiny thing.
I suppose better than blog post driven development, but marginally so.
I find DB development quite interesting, and I find Uber's core product quite not interesting (from an engineering perspective).
So as an outsider without any financial stake in the company, please keep writing about databases!
https://news.ycombinator.com/item?id=38300425#38322311
Uber is famous for NIH syndrome
I'm not sure if this is true of modern Google. Certainly there are a ton of custom things around, but it also seems like in the last 5 years there has been a huge push towards not re-inventing more databases/etc, and normalising everything onto a few core technologies. A good example is Spanner, many products have been moved off other home-grown systems onto Spanner, basically to reduce complexity and cost. Sure Spanner is a Google technology, but at least it's one, instead of having all sorts of different esoteric systems, and Spanner is just one example, this is happening in a number of different domains.
I pretty much never see engineering salaries factored into these types of savings projects. I assume because engineers are already viewed as a sunk cost or maybe it’s just because it’s way less tangible. Have seen many designs describe how X saves Y dollars but ignores the engineering effort to maintain and build it. Half the time I suspect it’s just so people have something to work on, rather than it being some critical fix.
That was where my mind went when I saw the headline. Granted, while I’m not on the finance side of things and am in fact a developer, “six million” didn’t seem like much at all considering engineer salaries. It’s certainly an achievement, but at what short and long term salaried cost?
$6m across 2.5 years is like $15.5m. How many engineer man-months to break even? I'm pretty sure it was worth the work.
$6m in perpetuity is like infinite and engineers can be fired.
It really isn't. The first years dominate the value, and later years are worth nothing due to inflation. Google a perpetuity calculator and use a reasonable discount rate, and I suspect you will find that today's value for an infinite perpetuity is a lot less than your intuition might guess. It always surprises me.
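The calculation being gestured at is a discounted perpetuity, PV = C / r; the discount rates below are illustrative:

    # Present value of a perpetuity: PV = C / r. The $6M/yr savings
    # figure is from the thread; the discount rates are illustrative.
    annual_savings = 6_000_000
    for r in (0.05, 0.08, 0.10, 0.15):
        print(f"r = {r:.0%}: PV = ${annual_savings / r / 1e6:.0f}M")

Large, but decidedly finite, and most of the value sits in the first decade.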
But they will save more than $6M the second year because AWS will up their prices.
What?
It costs me $10M to run something every year; it now costs me $4M to run something every year. I have $6M in my pocket every year now, in perpetuity. Compound that annually, with the assumption that I maintain or increase top line revenues, and that's pure extra profit.
Note - I admit, all of this ignores two key things: (a) we don't know the salaries of the engineers who built this, and (b) we don't know the ongoing maintenance costs.
A primer on valuation: in many financial contexts, $1 of operating savings may be worth much more than $1 of investment.
That is because an investment is a one-off, so it's actually worth $1, but the savings are recurring, so they are worth the multiple at which a company's profits are valued. Depending on sector and investors' beliefs in the future of companies, this factor is typically in the 5-20x range. That means that $1 of savings is well worth at least $5 of investment.
Factor in anything you want!
If that $1 of investment doesn't yield any returns that's not an investment it's just an expense.
So yes $1 of savings is worth more than $1 of spending.
You could potentially take it as an investment loss for tax purposes. Whether that’s proper depends greatly on the circumstances surrounding how the money was accounted for and spent.
In an ideal org, perhaps. In many places, forming that team starts a process where it continually finds reasons to still exist, so your $1m is yearly until a reorg.
Sigh.
If anything, it reduces Uber's exposure to AWS' proprietary technology. I don't know how to measure how much that's worth but they probably do.
This usually comes from people who have never done a mass migration at scale.
You’re always dependent on your infrastructure. Even if you have nothing but everything hosted on a bunch of VMs, it can take years and millions of dollars to migrate.
No, and "just use Terraform and Kubernetes" is not the answer.
The typical enterprise is dependent on, depending on the source, between 80 and 120 SaaS products - i.e. outside vendors.
Even if you have nothing but everything hosted on a bunch of VMs, it can take years and millions of dollars to migrate.
I'd assume it takes fewer millions to migrate your own tech stack from AWS to somewhere else than it takes to migrate from AWS proprietary solutions. Is that reasonable?
No because you still have to deal with permissions, integrations with AWS services like networking, training, security audits, regression testing, often physical network connections (Direct Connect), DNS…
And you’re dealing with your PMO department, project managers, finance, security, contract negotiations, retraining your ops department…
And you know that Aurora MySQL instance that was supposed to prevent “lock in”? I bet you someone somewhere in your org thought about creating an ETL job and then said forget it and used “select into S3” to move data from MySQL into S3.
As a project manager trying to ship code so you can show “impact” to put on your promo doc, are you going to choose for your team to spend weeks to write an ETL job to prevent “lock in” or are you going to tell the developer to write that one line of SQL?
There are all sorts of choices you can make that will save time and money and ship features that actually deliver value instead of worrying about the boogie man of “lock in”.
And I really hope that there was some better technical reason than just saving $6 million a year for a multibillion dollar company to go through the migration.
Thanks for the insights. So in the case that it's actually more expensive to migrate your own tech stack somewhere else than, say, migrate from AWS proprietary to GCP proprietary, it seems there might be other reasons.
The difficulty would be worse of course if you depend on anything proprietary from the cloud vendor.
But the main question is, once you do all of this work and spend time to be “cloud agnostic”, does it add business value?
In the case of Dropbox, it made sense to move from the cloud. In the case of Netflix, they decided to move to the cloud.
But you can’t stay completely “cloud agnostic”.
Let’s take a simple case of using Kubernetes and building the underlying infrastructure using Terraform.
The entire idea behind Kubernetes is to abstract your infrastructure - storage, load balancers, etc.
But eventually, you still have to deal with what’s underneath. I used AWS’s own Docker orchestration service for years - ECS. But I just learned Kubernetes last month.
I still had to know how to troubleshoot problems with IAM permissions and load balancers, view CloudTrail logs for permission issues, know how the underlying storage providers worked, make sure I had the right plugin installed for K8s to work with AWS’s infrastructure, etc.
Once I got all of that figured out, then I could go through the tutorials and mind map the difference between ECS and AWS’s Kubernetes implementation - EKS.
But I had years of experience with AWS. I could never have easily troubleshot the same types of issues with Azure’s or GCP’s version of K8s. Now multiply that by an entire department.
Once everything is configured correctly, a developer's experience would be the same across environments.
Migrations at scale are always a pain from one system to another.
Source: I worked at AWS in the Professional Services department for three years. I’m mostly a developer and I dealt with the “modernization” side of “lift and shift and then modernize”.
Indeed. Everyone enjoying the Broadcom / VMWare situation is running around saying how glad they are that it’s “just a bunch of VMs” and this migration will be a walk in the park.
It could be worse. It could be better.
Companies this size almost certainly have different terms of use. I worked for a smaller, but still ASX200 company that had a custom contract, and assigned staff that would drop by 2-3 days a month. Of specific note was that if AWS wanted to stop doing business with us they had to give at least X notice (from memory that was 12 months for us).
For our risk profile this was more than enough time to migrate off any AWS' proprietary technology.
That makes it worth less to avoid exposure.
$6M/y is something like 20 heads (depending on where they are, could be more). So probably it's a win. Hard to see that this could take more than about 5. Add cost of hay and water of course.
Is it? If you consider the value 20 engineers could drive instead in that time.
If you assume they wouldn't have had anything else meaningful to work on during that time to save money, then you have a different problem in the company. $6M seems like the value 1 engineer can drive in a company at the scale of Uber.
You don’t need to consider the cost they could drive during that time. You have a direct and tangible savings for engineering time invested. That possible value they could otherwise derive is moot and hypothetical, this is the real deal!
But if we’re being honest, there isn’t actually any meaningful quantification of engineering time to understand return on investment at this level (not to say there’s none, but it sure does get wishy-washy). Corporate and engineering strategy isn’t so carefully weighed, and to believe otherwise is to fall victim to the pseudoscience that is software estimation. You just have to estimate directionally whether a given proposal has you heading in a better direction in the long term, pursue that, and course correct along the way.
Put another way, the end state justifies the means and resourcing. It’s rarely possible to fully understand either the costs or benefits with much accuracy up front. You slowly put more resources into projects that show promise, and revoke them if the projects do not appear to be heading in a value add direction.
You don’t need to consider the cost they could drive during that time.
You don't need to, but you 100% should. "Opportunity cost" (cost of not doing something) is real.
This is the problem with all refactoring/migration projects. It's very easy to get a lot of people to agree a company should migrate from Node to Go or Monolith to Microservices (or to clean up a mountain of tech debt), but it's much harder to justify the time it takes away from building things your users care about.
True, but often the project that was supposed to build something users care about turns to dust. On one side, you have rosy projections. On the other, a cap on gains, so sure everyone picks the first, but nobody measures if it worked.
One can build a great career working only on key, promising initiatives that never amount to any value in the end. By the time it's clear the project lost money outright, you are on to something else.
It's less than 20 heads. The gross spend for each engineer is probably closer to $0.5 million annually. You can lay off 5% without any impact on the company and save so much more. A company like Uber ($130B market cap) isn't going to bother with building something internally to save $6M/year. The only reason to do it is to improve efficiency in a way that actually improves the user experience, in which case we're talking about a big deal. Sometimes those things happen only because engineers don't have anything else to do and someone needs a promotion...
A better strategy as a company this size would be to write the PRD for moving and then call AWS and negotiate.
I think it's likely that they tried this. But DynamoDB is expensive to consume probably because it's expensive to run and maintain. If you develop for a particular use case, a lot of optimizations can reduce these costs. For a large enough business, the fixed costs of in-house are easily amortized. It'd be hard for AWS to compete.
Spot on. $6M/annually is not much of a saving for a company like Uber ($130B market cap). It only makes sense if it's also more efficient and actually improves the app.
Yeah, from a quick Google it looks like Uber profits $1.6 billion in one quarter. Basically why I never thought about cost-savings projects after I learned to put things into perspective.
People really struggle with large numbers in business I've noticed.
A quick look at their careers page shows they're hiring engineers seemingly anywhere but the US, so maybe they've already saved the dollars there.
At Google we had a conversion chart between things like memory / cpu and SWE hours, so if you had an idea that would save 8TB of RAM and take 4 hours to do it, you could decide if it would be worth your time or not.
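A toy version of that kind of conversion chart, applied to the 8TB example; every rate here is invented for illustration, not Google's actual internal numbers:

    # Toy resource <-> engineer-hour conversion chart. All rates are
    # invented for illustration; the real chart was internal to Google.
    usd_per_gb_ram_year = 5.0     # assumed amortized cost of 1 GB of RAM for a year
    usd_per_swe_hour = 150.0      # assumed fully loaded engineer-hour cost

    ram_saved_gb = 8_000          # the 8 TB example above
    swe_hours = 4
    value = ram_saved_gb * usd_per_gb_ram_year    # $40,000/yr
    cost = swe_hours * usd_per_swe_hour           # $600
    print(f"${value:,.0f}/yr of savings for ${cost:,.0f} of time -> clearly worth it")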
Because they often are a sunk cost.
E.g., you can’t lay off an entire SRE team and have nobody on the on-call rotation. If some of their project work is cost control that is basically free cost savings.
Congrats to anyone who worked on it! However, I'm guessing the cost of just running this team would be quite large and not significantly different from the savings (6M), and add on top of it the overhead of maintenance. Payments would not likely be a long-term bet as well, so it's kind of interesting why teams take up such projects? Is it some kind of sunk-cost with the engineering teams you already have?
Developing and maintaining a totally bespoke DB system with that kind of volume, even at $5m/yr: spitballing, you could get yourself 25 top-notch engineers without AI PhDs and have another mil left over for metal. Sounds plenty feasible to have a nice tailored suit for a core part of your business.
you could get yourself 25 top-notch engineers without AI PhD
Not in the US though. According to levels.fyi, an SDE2 makes ~275k/year at Uber. Hire 25 of those and you're already at $6.875MM. In reality you're going to have a mix of SDE1, SDE2, SDE3, and staff so total salaries will be higher.
Then you gotta add taxes, office space, dental, medical, etc. You may as well double that number.
And that's just the cost of labor, you haven't spun up a single machine and or sent a single byte across a wire.
Work from home doesn't mean that home has to be in the US.
Then you gotta add taxes, office space, dental, medical, etc. You may as well double that number.
Economies of scale help a bit with this for larger companies, so it's probably not quite double for Uber, but yeah, not too far off as a general rule of thumb. Probably a 75% increase on the employee facing total comp to get fairly close to the company's actual cost for the employee.
"and have another mil left over for metal" was the part accounting for hardware, infrastructure, etc.
And you can fudge the employee salary a mil or two either way, but the point is that spending that much on a team to build something isn't infeasible or even unreasonable.
Is accounting really a core part of Uber's business? They're a transportation company, not a bank. I kind of question the premise, really.
Uber is a technology company that tracks 'rides' between drivers that are contractors and customers, and accounts for taking money from one and giving it to another. I wouldn't just call it a core part; I'd go so far as to say it is the intrinsic essence of their business. They're not a bank, but they're not running a branch with tellers taking cash and running ATMs, either.
They are in the transportation market serving transportation needs for a transportation-seeking customer base. How they accomplish that is obviously interesting, but their attempts to move laterally haven’t been amazing from what I can tell (though I don’t follow them closely).
They are structured and run like a tech company but imo they don’t produce a tech product.
It doesn't sound like they needed to implement a new DB system for this.
This is using existing features of Docstore, which is Uber's own DynamoDB equivalent (sharded MySQL), and which they seem to be using for almost everything.
I'd be curious as well to see a more complete cost-benefit analysis, and I'd be especially interested in labor cost.
We don't know how much time and head count Uber committed to this project, but I would be impressed if they were able to pull this off with fewer than 6-8 people. We can use that to get a very rough lower-bound cost estimate.
For example, AWS internally uses a rule of thumb where each developer should generate about $1MM ARR (annual recurring revenue). So, if you have 20 head count, your service should bring in about $20MM annually. If Uber pulled this off with a team of ~6 engineers, by AWS logic, they should just about break even.
Another rule of thumb I sometimes see applied is 2x developer salary. So for example, let's assume a 7-person team of 2xSDE1, 3xSDE2, 1xSDE3, and 1xSTAFF, then according to levels.fyi that would be a total annual salary of $2.3MM. Double that, and you get $4.6MM/year to justify that team annual cost footprint, which is still less than $6MM.
Of course, this is assuming a small increase in headcount to operate this new, custom data store, and does not factor in a potentially significant development and migration cost.
So unless my math is completely off, it sounds to me like the cost of development, migration, and ownership is not that far off from the cost of the status quo (i.e. DynamoDb).
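For anyone who wants to check the arithmetic above, here it is spelled out; the per-level salaries are rough levels.fyi-style assumptions, not Uber's actual figures:

    # The 7-person team cost under the 2x-salary rule of thumb.
    # Per-level salaries are rough assumptions in the levels.fyi
    # ballpark, not Uber's actual compensation figures.
    team = [("SDE1", 2, 200_000), ("SDE2", 3, 275_000),
            ("SDE3", 1, 450_000), ("STAFF", 1, 625_000)]
    salary = sum(count * pay for _, count, pay in team)   # $2.3M
    fully_loaded = 2 * salary                             # 2x rule of thumb
    print(f"salary ${salary / 1e6:.1f}M -> fully loaded ~${fully_loaded / 1e6:.1f}M/yr")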
If the savings are 6 million per year, then in later years it should pay off, since the development is a one-time cost.
The cost doesn't suddenly drop to zero once development is done. Typically a system of this complexity and scale requires constant maintenance. You'll need someone to be on-call (pager duty) to respond to alarms, you'll need to fix bugs, improve efficiency, apply security patches, tune alarms and metrics, etc.
In my experience you probably need a small team (6-8 people) to maintain something like this. Maybe you can consolidate some things (e.g. if your system has low on-call pressure, you may be able to merge rotations with other teams, etc.) but it doesn't go down to zero.
If you follow the various links on the Uber site, you see that they have multiple applications sitting on the same database. see https://www.uber.com/blog/schemaless-sql-database/ . It's not just 1 design of a database, with 1 application on top...
Not an engineer, but something like this takes 6-8 people working on only this for a full year?
That has been my experience, yes. You need one full-time manager, one full-time on-call/pager duty (usually a rotation), and then 4-6 people doing feature development, bug fixes, and operational stuff (e.g. applying security patches, tuning alarms, upgrading instance types, tuning auto-scaling groups, etc. etc.).
Maybe you can do it a bit cheaper, e.g. with 4-6 people, but my point is that there's an on-going cost of ownership that any custom-built solution tends to incur.
Amortizing that cost over many customers is essentially the entire business model of AWS :)
I'm guessing the cost of just running this team would be quite large and not significantly different from the savings (6M), and add on top of it the overhead of maintenance
I'm guessing they know a lot about their costs, and you know very little. There's little value in insulting the team members like this.
That was not a nice reply for a non-insult. Do you have anything to add maybe?
That was not a nice reply for a non-insult.
It's an insult if you dismissively explain basic things to the folks working on the project.
I'm guessing they know a lot about their costs, and you know very little.
I'm curious what makes you believe the OP doesn't know about cost? They might be director-level at a large tech company with 20+ years experience for all you know...
There's little value in insulting the team members like this.
I'd argue it's not insulting to question a claim (i.e. 'we saved $6MM') that is offered with little explanation.
Regardless of position at some other company, it will tell you precisely zero about this specific situation.
It’s not insulting to speculate in a conversational way about the errors we very, very commonly see.
At one end of the spectrum, some people here claim to write this kind of software over a weekend. Some others claim they require a salary of $600,000, and still need nine additional colleagues to pull something like this off.
There is a lot of room in between, where cost estimates are more realistic.
This answer pretty much sums up a lot of my experience. Of course, when some guy somehow pulls this off in 2 weeks, it is seen as an easy side project, with proof that it is, haha.
This is why incentives favor the heavy bloated enterprise approach: if it looks expensive, people feel like they got something good for their money.
Plenty of things can be prototyped over a weekend, but many will require months and even years to get production-ready, feature-complete, and useful, especially at scale.
The estimate sounds suspiciously similar to just the data storage component of DynamoDB. 1.7PB of data and indexes is about $5.1m/year in DynamoDB storage at list.
Supporting that, Uber’s blog post linked from the article mentions cost savings as a benefit from going from three systems to one, and doesn’t really mention any dollar figure afaict.
https://www.uber.com/en-AU/blog/migrating-from-dynamodb-to-l...
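The arithmetic behind that suspicion, assuming DynamoDB's standard-table list price of about $0.25/GB-month for storage:

    # 1.7 PB of data + indexes at assumed DynamoDB list storage
    # pricing (~$0.25/GB-month for the standard table class).
    gb = 1.7e6                    # 1.7 PB expressed in GB
    monthly = gb * 0.25           # $425,000/month
    print(f"~${monthly * 12 / 1e6:.1f}M/year")   # ~$5.1M/year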
If you read the article, the system was a layer on top of DynamoDB; they updated it to use the internal product Docstore, which required adding a feature to Docstore. So it's not as involved as people make it out to be. Also, records are immutable, which makes a lot of things way easier.
You're assuming that the team only works on this product. It is possible they are owners of a lot more than just 1 db.
Off-the-shelf software doesn't make sense for a company that is planning on lasting a long time. These solutions are all designed for multiple use cases. That means there is complexity and there are inefficiencies that are not required for your particular problem. If you were to just focus on your problem, wouldn't you end up at an ASIC as the most optimal solution? The reason most software doesn't go that low is 1) people like to re-invent the wheel, and 2) the lower level you go, the fewer qualified people you can find.
Payments would likely not be a long-term bet either.
How so? It’s a pretty ubiquitous problem…
The original[1][2] articles are a better read IMO. The link is just a summary of the two with added spelling and grammatical errors that materially impact the meaning.
1. https://www.uber.com/blog/how-ledgerstore-supports-trillions...
2. https://www.uber.com/blog/migrating-from-dynamodb-to-ledgers...
Seems to happen with all our blog posts that appear on here (I work at Uber) - I don't get why the originals don't get upvoted but these rehashes do - are our titles just not as good?
Beyond the comments about titles: the blog post doesn't render for me at all with uBlock. So I'll open it, see a picture of some birds, scroll around for a bit, then give up.
That's probably because you are running software that is meant to hide content on a page.
Ad blocking is recommended by US government agencies for security reasons; not running an ad blocker is dangerous and suggests a lack of IT awareness.
Agreed, but if legit content gets blocked you only have yourself to blame.
Like turning off JS and saying webapps don't work anymore.
but if legit content gets blocked you only have yourself to blame.
If the bug is on the devs, then the devs are to blame, e.g. for assuming the ads or the third-party tracking code always load.
The project I am working on works with ad blockers on. We also had issues with users who had a spell-checking extension active; it would create a ton of hidden markup in a contenteditable element, and we wrote code to handle the issue instead of fielding a pile of support tickets and telling users it was their fault for using a popular extension.
And if someone with a js heavy blog asked why it wasn't getting traction on a lynx centered forum they'd probably be told that their content wasn't readable for a portion of the users.
What's the purpose of this comment?
My point is that a random dev running a pretty plain ad blocker (aren't we all?) simply cannot view their post. This is down to Uber, their practices, an external developer, and how Uber builds their blog (they don't just have the content in the page). If I'm not a special case with extremely weird luck, a bunch of devs seeing links to their posts will open them and not see any actual content. They will then, I assume, be less likely to upvote them.
Given that they are seeing problems with posts being upvoted this seems somewhat relevant.
I have no issues reading their blog with uBlock Origin.
You are running software that is blocking content you want to read. That is my point.
If I put on blinders and then complain I can't see your stuff, that's my fault, not yours, regardless of whether your stuff is good or the worst annoying spam ever. If I want to see it for some reason, maybe I should take off the blinders.
You are running software that is blocking content you want to read. That is my point.
Yes. It's my point too. I am running very standard software for a dev and it is stopping their dev blog posts being visible.
If I put on blinders and then complain
I'm not complaining. I'm explaining, given the evidence I have, why they may be seeing poor results on HN. If I'm not alone (and since I have no custom setup designed to keep out their blog posts, that would be a surprise), then there are other developers who cannot see their posts.
Loads fine for me with ublock. Perhaps you have a custom rule blocking something?
Nothing custom, so it must be on a list somewhere.
edit: it doesn't even have to be blocking the actual post; if their loading code breaks when some other tracking script doesn't run, that could explain it.
I have the exact same problem, except on Uber Eats.
Yes, that’s definitely the main reason. It’s called “burying the lede”.
Saving $6M is the key piece of information that makes this story interesting. It's buried all the way at the bottom of the first blog post and is completely missing from the second one, which focuses specifically on the migration.
TaaS : title as a service
People have done this, eg https://www.reddit.com/r/GrowthHacking/comments/k20g42/ai_to...
However that appears to be defunct now
I'm usually guilty of this. The hands-on person on a highly technical project gets so excited and bogged down in the details that they end up not being the most compelling storyteller about it.
Don't blame yourself. Not everyone is here for the money, many of us are here for the tech.
i mean it could use a few "blazing fast" sprinkled about
And you can't have blazing fast without rust, and a little kvetching about lifetimes
While your broader point is well taken, isn’t Uber a famous Go shop?
Lol I was not being on topic or constructive - just repeating the meme that rust is synonymous with "blazing fast", because of endless statements to the effect of "rust is blazing fast," or "if you want blazing fast code, use rust," or the endless blazing fast rust libraries:
https://duckduckgo.com/?q=blazing+fast+rust
Now I'm not an expert in either rust or go. But I know my deductive meme logic:
1. Uber's solution is not blazing fast
2. They are a Go house
Then the meme implies:
3. Their solution is slow because they did not use rust!
Q.E.M. (Quod Erat Memonstrandum)
I don't get why the originals don't get upvoted
Because they were never submitted? I looked for the first one, it doesn't seem to be on HN.
Just put all your articles into a custom GPT along with the rehash examples for each one, then ask the GPT to rewrite your titles into "rehash"-style titles for new posts ;)
Personally, yes, the rehash's title is stronger. It tells a story whose ending piques your curiosity to read more.
"Uber Migrates" (beginning: company that I'm interested in does something) "1T records" (middle: that's a lot of records; I wonder what happened) "from DynamoDB to LedgerStore" (hmm, how do they compare?) "to Save $6M Annually" (end: that's a good chunk of change for me, but was it worth it to Uber? Why did it save that amount? Let me read more)
It's a simple and engaging "there and back again" story that paves the way for a sequel.
Versus:
"How LedgerStore Supports Trillions of Indexes at Uber" (ah, okay, a technology supports trillions of indexes. Moving on to the next article in my feed)
"Migrating a Trillion Entries of Uber’s Ledger Data from DynamoDB to LedgerStore" (ah, a big migration. I'm not sure who did it or whether anything interesting came of it, or even whether it happened or is just theoretical because of the gerund, and moving one trillion of something is cool but not something I probably need to read about right now, so let's move on)
YMMV. Some probably prefer the more abstract/less narrative titles, but the first one is more of an attention grabber for me.
Ok, we've changed to the second link from https://www.infoq.com/news/2024/05/uber-dynamodb-ledgerstore....
Submitters: "Please submit the original source. If a post reports on something found on another site, submit the latter." - https://news.ycombinator.com/newsguidelines.html
So did the engineers who proposed this get some kind of bonus considering how much money they saved the company?
If a project fails, do you pay for the loss since you want a share of the profits as well?
Someone probably gets fired so I guess someone does pay the ultimate price.
Losing your job because of the outcome of your efforts (or even of external events) is not what I would call paying the ultimate price.
"The metaverse division has now lost more than $45 billion since the end of 2020"
Your compensation for your work is your salary. So I would say it's fair that the actual risk-taker benefits from the potential rewards.
Is it an offer to become a shareholder without actually buying any shares? That would be absolutely great, but unfortunately, it doesn't work this way.
That's the beauty of it: you can choose to spend your money how you want!
You wouldn't want all your earnings to be in stock; you want liquidity, for example to invest your earned money in a public company, or to buy food.
The "risk takers" are not taking at any risk at all. What's the chance they end up on the street, or even suffer personal financial stress about their life? That they will have to move, sell their car, home, etc. It's 0%.
What.. they are taking a lot of risk...
But I guess we first have to agree on who we are talking about - is it the company itself or the owners/shareholders?
Back to your question, yes that could happen in several different cases. But of course the risk/benefit is not split 50/50 (nor 0 risk, 100 upside, as you said), in reality the future outcome depends on both internal and external events.
Even the richest(?) man in the world was relatively close to losing it all:
Musk, who had $200 million in cash at one point, invested "his last cent in his businesses" and said in a 2010 divorce proceeding, "About four months ago, I ran out of cash," as he told the New York Times.
https://www.cnbc.com/2017/04/27/the-crucial-decision-teslas-... https://archive.nytimes.com/dealbook.nytimes.com/2010/06/22/...
Getting to say you led the effort that saved $6M and resulted in some blog posts is probably the reward. At my firm, associating your name to dollars is the fastest way up the corporate ladder.
Exactly. Work should be owned by the workers.
Employees are constantly saving cost or adding value, that's what they are paid for.
Uber migrated all its payment transaction data from DynamoDB and blob storage into a new long-term solution
No way they have 1 trillion transactions right?
1T "records". Any given transaction can have N records. I'm assuming this includes Uber Eats as well.
Still, they have 10B rides in 2023 including Eats, say 75-100B since inception. What would be a record such that each transaction needs 10-15 on average?
Consider it might be quite de-normalized as typical at scale.
Some records for the customer, some for the driver, some for the restaurant...
You might even have a few more.
Eg you might have a record for each stage of the meal. When it's ordered, when it's cooked, when it's delivered, etc.
They need to pay the driver and they need to handle taxes; that alone triples your estimated 100B.
What would be a record such that each transaction needs 10-15 on average?
Does it have to be one-dimensional? It depends on exactly what "payments" covers. There are refunds, discounts, payouts to e.g. drivers. There are also things like monthly subscriptions people can sign up for to get discounts / unlimited uses. Lots of things add up.
75-100B
This seems low, off the bat. 15 years of Uber, 9 years of Uber Eats.
But even just looking at my most recent trip with Uber, there are 7 different records visible on the receipt. Not including backend recordkeeping that isn't exposed to the user (driver payments, driver loan repayments, revenue recognition, internal fees/records, etc).
Total trip amount, Trip fare, Booking fee, Tip, State fee, Payment #1 (trip itself), and Payment #2 (driver tip)
Now consider Uber Eats where there is (at least) one record for each item in an order...plus tax, tip, etc as always.
Then consider things like wait time charges, subscriptions, split charges, pending charges, chargebacks, refunds, disputes, blah blah blah.
An average of 10 records per customer transaction seems entirely reasonable.
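For what it's worth, the thread's numbers are at least internally consistent; a napkin check in Python, where both inputs are guesses taken from the comments above:

    transactions = 100e9   # ~75-100B trips/orders since inception (guess)
    records_per_txn = 10   # fare, fees, tip, tax, payouts, ... (guess)
    print(f"{transactions * records_per_txn:.0e} records")  # -> 1e+12, i.e. ~1T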
I can see that, as credit-card transactions go through a lot of processing steps (withholding, approval, charging, settling, etc.).
The blog post says billions of transactions and trillions of indexes (or rather index entries I presume), if I remember correctly.
Does no one ever delete data? It's hard to believe there's much business value in keeping every individual payment record dating back to 2017.
At an individual level I appreciate when an app or service I use maintains all records from the start of our relationship, I’ve infrequently found myself going back and looking for something, and it’s always a breath of fresh air to see that nothing was deleted.
Sorry for the offtopicness, but please see https://news.ycombinator.com/item?id=40418627 regarding a flamewar that happened over a week ago. It's important that this not happen again.
I was gonna say you could just email me, but I see that I left that field blank, it’s filled now for future use.
In systems that deal with money, money-related data is virtually never deleted. The reason is the fear that deletion can be exploited somehow in the future, rather than the old data being actionable.
For example, a customer might register with the name of a deleted customer, resurfacing "unfinished" transactions or rules associated with the older version of the "same" customer that appeared to be deleted but hadn't been properly removed.
Also, in general, deletion is very difficult because money doesn't just disappear. You'd need some sort of compaction (think: Git squash) rather than deletion to be able to balance the system's books... but then you'd be filling the system with fake transactions...
In my experience working with these kinds of systems, the typical solution is to label entities active/inactive as a substitute for deletion. But the entities never go away.
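To make the pattern concrete, here is a minimal sketch of "label, don't delete" using SQLite; the table and column names are hypothetical, not anything from Uber's systems.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            is_active INTEGER NOT NULL DEFAULT 1  -- 0 means "deleted"
        )
    """)
    con.execute("INSERT INTO customers (name) VALUES ('alice')")

    # "Deleting" a customer just flips the flag; the row (and history) stays.
    con.execute("UPDATE customers SET is_active = 0 WHERE name = 'alice'")

    # Day-to-day queries filter on the flag; audits can still see everything.
    n = con.execute("SELECT count(*) FROM customers WHERE is_active = 1").fetchone()[0]
    print(n)  # -> 0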
I agree with you, but there is a plus for deleting old data.
If you are not required to keep the information for more than X years, and you still keep it, then you have to provide it when it's requested.
If you didn't keep it, then it can't be used against you.
If you delete it after it was requested, then you are in trouble.
I'm not sure if they have regulatory obligations to keep them, or what, but it still seems like you could back them up to cold storage after a reasonable period of time.
It might just be an internal policy to cover all the crazy combinations of regs the world over. They might just say 10/20/100 years is their policy, now figure out how to store it.
Payment information is often subject to pretty strict regulatory requirements, including archival durations. Having to keep all the original information for 10 years is not entirely uncommon.
I would gladly pay 6 million/year to not be on call, and have to worry about things like bios and ssd firmware ever again.
That's a great situation to be in, where one can spend $6 million even when there was a chance to save it.
I tried the same with ready-to-eat meals every day to save myself from potential kitchen disasters, but sadly the numbers didn't work out.
You're not saving money; controlling your own destiny, sure. That's worth something, maybe even more than $6 million. But I was an SRE at Uber who had to be on call for systems like this, and believe it or not, people like me aren't free either :)
They didn’t say they are running this on prem.
Additionally, it would give us faster network latency because it was running on-premises within Uber’s data centers.
My "favorite" non-cloud issue was dying/dead RAID card batteries in DB hosts (to preserve unflushed cache in RAM on the card in case of power failure).
More power to them. At this point even technically decent teams/companies have given up on developing large, complex systems in favor of SaaS. After "carefully evaluating our strategic course of action", the answer is always AWS.
It's only the team proposing an alternative that has to rigorously justify how they reached a different conclusion.
Not if you're in the EU, due to, among other things, Schrems II.
AFAIK Schrems II prevents transfers of data to the US.
AWS has datacentres around the world, including multiple locations in the EU.
Where did you learn that?
Schrems II prohibits transfer of personal information to companies reachable by the CLOUD act.
"At this point even technically decent teams/companies have given up on developing large, complex systems in favor of SaaS"
Yeah, until the bills come. Then they'll consider alternatives.
My Amazon stock thanks you.
Another victim of the "Great Normalization", i.e. that entire generation of garbage tech debt generated during the 2010s that was built on NoSQL stores that never should have been, is now coming due. You could probably make an entire consulting business out of migrating these things to MySQL.
You could probably make an entire consulting business out of migrating these things to MySQL.
I think it would be a very complex task to run MySQL with 1PB / 1T transactions.
This seemed so obvious at the time, you were throwing away so many useful features for the promise of "web scale". I argued with many developers at the time, but they insisted that NoSQL (Mongo) needed to be used.
This is exactly the comment I came here to make. The NoSql technical debt accretes like dead leaves blocking a sewer drain. Eventually someone has to wade in and normalize…the sewer grate, I guess. Okay, it’s not a perfect analogy.
It seems LedgerStore is not open source [1], and finding any info on it requires following a trail of backlinked Uber blog posts. Here's one with the most info on LedgerStore that I can find, from 2021:
https://www.uber.com/en-US/blog/dynamodb-to-docstore-migrati...
Yeah, this looks like an internal solution. In general, Uber seems to be high on the "not invented here" scale - they like to conclude that no existing open source solutions are good enough for them and that they need to build their own. This is different from Facebook's approach, for example, which chose to make MySQL better by adding MyRocks/RocksDB to it and keeping them open source.
It's a weird world where Facebook/Meta is becoming a small bastion of hope. Llama 2/3 being an example of bucking the trend of going closed source for LLM models.
Granted it's not quite in the same calibre as OpenAI/Claude, and the real test is when it is and they still release it.
Don't worry it's 99% likely LedgerStore is built on top of MySQL.
Reading the article it’s clear pretty quickly that Uber was using DynamoDB poorly.
It seems they need strong consistency for certain CUJs and then a lot of data warehousing for historical transactions.
It’s strange to me that they didn’t first convert their 2 table DynamoDB architecture into DynamoDB and Redshift architecture or similar. This is a pretty common pattern.
Can you post some references to this pattern?
I don't understand why they needed 2 weeks of immutable transactions in Dynamo. Could anyone give any hints?
I wonder if they considered https://tigerbeetle.com
Would be interesting, considering TigerBeetle is written in Zig, and Uber is probably the rare big company that has a support contract with the Zig foundation.
One interesting caveat is that Uber only uses the Zig toolchain, not the Zig language.
https://www.uber.com/en-US/blog/bootstrapping-ubers-infrastr...
I think there is some reckoning for cloud service providers coming (assuming logical actors...). I was doing some contract work for a small place that had a GCP Bigtable setup costing $11k+ per month for some reports; the underlying data came from a 375MB (!!!) MySQL db that was loaded into Bigtable for the reports to run.
They had hired some out-of-school data scientist to do reports, and they were doing crazy ineffective things with the tiny dataset. They wanted me to fix it for pennies, by tomorrow, and I declined.
Not that I disagree with your overall point, but I don't think this
I was doing some contract work for a small place that had a GCP Bigtable setup costing $11k+ per month for some reports; the underlying data came from a 375MB (!!!) MySQL db that was loaded into Bigtable for the reports to run.
Is a good example. It's just a badly architected system, and you'd have exactly the same problem if you were running the same thing on a massively over provisioned on premise db.
I think it's a perfect example of cloud providers profiting from an unsustainable scheme. They rely on their services papering over (literally with money) a lack of skill/knowledge on their customers' side. This tactic either kills the customer or creates someone who is desperate to no longer be your customer.
Is this another outlier where, once you reach a certain scale, it's more beneficial to roll your own? Pretty amazing what Uber has to deal with.
Also, it's not very clear from the original articles what the total "cost of ownership" of this new refactored service is; now they need to manage their own databases and the storage backing them. Or did I miss it?
I worked for a company which used Redis at the prototyping phase, but then wrote its own database to improve performance and resilience. The company wasn't selling an end-user-facing product; the product was a distributed filesystem.
My take on this is that most companies don't have the expertise to build systems like databases, and even if the costs would otherwise make such a development desirable, they would simply be afraid of doing it.
Say you wanted to build an app on a database like LedgerStore but at much smaller scale, what are the best open source options out there right now?
We have a pretty minimal setup at formancehq/ledger[1] that uses pg as a storage backend, and comes with a programmability layer to model complex transactions (e.g. sourcing $100 from three accounts in cascade).
Every time I've ever used DynamoDB it cost way more than I would have ever expected.
Yes
I read the article so I roughly know what LedgerStore is - but I have no idea where it is hosted.
From one of the original sources linked in this thread
LSG promised shorter indexing lag (i.e., time between when a record is written and its secondary index is created). Additionally, it would give us faster network latency because it was running on-premises within Uber’s data centers.
https://www.uber.com/en-AU/blog/migrating-from-dynamodb-to-l...
I don't know about the economics of this particular project, but damn, DynamoDB is expensive. At some point I was thinking that everyone else was just using it wrong, doing scans and queries instead of point-wise lookups into pre-computed tables.
It turns out, however, that even when you use it as a distributed hashtable, you still pay a huge premium.
Why? $120 per 100 WCU per year and $30 per 100 RCU per year does not sound expensive. 1 RCU reads up to 4KB per second, so to read 100 MB/s you would need ~25,000 RCU, which would cost ~$7,500/year, or about $625/month. Unless my math is off, I don't think anything comes close in terms of price.
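Re-running that arithmetic in Python, taking the quoted per-RCU rate at face value (it may not match current AWS pricing, and it ignores eventual-consistency discounts and items smaller than 4KB):

    usd_per_100_rcu_year = 30
    read_mb_per_sec = 100
    kb_per_rcu = 4  # 1 RCU = one 4KB strongly consistent read per second
    rcu_needed = read_mb_per_sec * 1000 / kb_per_rcu
    yearly = rcu_needed / 100 * usd_per_100_rcu_year
    print(f"{rcu_needed:,.0f} RCU, ${yearly:,.0f}/year")  # -> 25,000 RCU, $7,500/year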
Original story looks to be https://www.uber.com/en-AU/blog/migrating-from-dynamodb-to-l...
So they saved $0.000006 per record, it's really about the little things...
Assuming a minimum of two teams, 20 people in total, maintaining this in-house software, and $250k as the cost per engineer (salary plus health and other benefit costs to the company), that's $5 million right there, and I'm estimating at the low end. That's why Amazon calls these efforts undifferentiated heavy lifting. Is there a slight premium to pay compared with rolling your own and maintaining it? Yes, but it's worth it to avoid all the trouble, security, and management overhead of rolling your own.
You look at stuff like that and think about "how much talent is wasted on pointless things that help no one in the world, while getting paid heaps for nothing".
We could accomplish everything if people stopped wasting time on pointless tasks.
Does anyone know whether Uber considered Amazon QLDB for the implementation? Seems like it might have been a good fit, at first blush.
The article states that they already had an in house solution for cold data, so one of the benefits they claim is simplifying by moving to one system for both hot and cold data.
There was an era around 2015 when all the cool tech companies like Netflix, Spotify, SoundCloud, Uber and others were building a lot of infrastructure and database tools. Nowadays, engineers often talk in AWS/cloud terminology.
It is a breath of fresh air to see that orgs are still building tools like that.
How much did the migration effort cost?
$6M... isn't that much?
they do 2 billion rides per quarter
what are the 'trillions' here?
^ also that translates to ~1000 transactions per second with some assumptions; have never understood why they care so much about infra scaling
1000 tps is like 1 box
Is there some information on why they need to store this much data for immediate retrieval? And why is it so much?
I think this is a fantastic illustration of how expensive proprietary cloud-based data stores can be... and that it is feasible to migrate from them to something else.
Wow crazy amount of work went into this. Well done
Given that 30.7TB SSDs are about $5,500 each, you'd need 56 of them to get to 1.7PB (with no redundancy). Not to mention that SQLite's maximum DB size is 140TB.
I don't think you'd be able to fit this much storage into a single machine, especially not for a few thousand a month, and SQLite wouldn't be appropriate for this use-case.
There are 61.44 TB NVMe drives (best price I've seen right now is ~$6,200; they were ~$4,800 earlier this year). You can have a 1U server with 32 E1.L slots, so you should be able to fit ~1.9PB of raw storage into 1U for a little over $200k. I don't know how business financing works, but at 8% interest with a 5-year amortization, that's a bit over $4k/month.
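The financing figure checks out with the standard amortized-loan payment formula; a quick sketch, assuming the comment's $200k principal, 8% annual interest, and a 60-month term:

    principal = 200_000
    r = 0.08 / 12   # monthly interest rate
    n = 60          # number of monthly payments
    payment = principal * r / (1 - (1 + r) ** -n)
    print(f"${payment:,.0f}/month")  # -> ~$4,055/month, i.e. "a bit over $4k"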
Do you have any good recommendation for such 1U server with 32 slots? Thanks
Supermicro https://www.supermicro.com/en/products/nvme?pro=formfactor%3...
Shows 2U in order to fit 32 drives, but hopefully you can find a 2U slot for this absolute monster of a storage box if you have that need.
This one is 1U and supports dual socket CPU and 6 TB RAM:
https://www.supermicro.com/en/products/system/1U/1029/SSG-10...
I've never done procurement so can't really speak to recommendations. I just see that the products apparently exist.
There's also 32 drive 1U JBOF enclosures for more expansion:
https://www.supermicro.com/en/products/system/1u/136/ssg-136...
Our ops team actually wanted to do this, but we on the project have nightmares from putting 1PB of database on a single host ><
Nimbus ExaDrive is 100TB [1].
[1] https://nimbusdata.com/products/exadrive/
At the moment they're just paying someone else to buy $5,000 SSDs and run a database on them, at a many-X markup.
There is no upper bound to economies of scale. Maybe there is for the cents per GB of raw storage, but power usage, security, rent, and everything else scales too, and few of those have upper bounds on their economies of scale.
Economies of scale generally have upper limits. Often, when you approach the largest scale the existing market can supply, you essentially need to become your own supplier, which then runs into span-of-control issues. The organization needs to become competitive in that new market, or its costs increase.
Keep scaling, and eventually vertical integration ends up looking like a Soviet-style planned economy. Your remote mining town needs some way for people to get soap etc., so you open a store with its own supply chain, etc. etc.
Yes, there is an upper limit, and that is the entirety of global economy. But that assertion is as useless as it is true.
Hardly the global economy, even approaching a global monopoly for something tends to hit such limits. Few companies deal with this today because most markets are fragmented.
But even at ~30% market share, the iPhone has been dealing with these issues for a while. They just can't buy 200 million volume buttons or whatever off the shelf. Now imagine what happens if one of their suppliers were to fail days before the phone launches. They have such tight integration not just with the manufacturers' processes but also their finances, because Apple simply can't get replacements at scale on short notice. And remember, that's at ~30% market share; it just gets worse after that.
Outsourcing doesn't mean off the shelf. Apple still works with third parties to manufacture parts, as those parties can leverage the scale of their overall operations to do it cheaper than it would cost Apple to do in-house.
They aren't just working with suppliers from a design or quality-control standpoint, which is normal enough; they are also purchasing equipment for third-party companies to use, which isn't. At this point they need to be world-class experts in everything from batteries, sensors, software, and processors to glass.
Tesla does some stuff in house because they can, but they just didn’t have any options when it came to scaling battery production. They hit the limits of what the market could supply without getting directly involved.
If you install a RAID controller and a couple of disk boxes, it's possible with 1:1 replication, or with backups. 60-disk 3.5" units already exist, as do 2.5" SSD racks. It won't be cheap, but it will be resilient and fast. Bloody fast if you have the budget.
These systems support zero-downtime snapshots. You tell it to snapshot, it instantly snapshots, you can run a differential/incremental backup at great speeds. Your RAID controller is already caching the hot data, so the impact is minimal.
Except for network cost, there's no extra disk required; it's just broadcast writes consumed on the other end.
These boxes are not dumb JBODS. They support their own replication/backup subsystems, so everything is transparent.
Resilient and fast from a disk perspective, but in practice massively bottlenecked by the fact that SQLite can only have one writer at a time.
StorageReview plays with 2PB flash machines all the time https://www.youtube.com/watch?v=UQMKtlIjeuk
1PB in a rack with spinning rust + flash buffer has been easy for years now.
How would you replicate that SQLite DB onto other hosts to achieve redundancy?
One could use Litestream [1]
[1]: https://litestream.io
What if the continuous replication system has a bug one day, and you realize your data is a bit corrupted and you have to rerun? Or is that the same with cloud tools?
That's why you always test your backups. I back up the full sqlite.db every day and test the Litestream replication every week. So far Litestream has been solid.
Would you care to tell us what your backup and restore policy would be for 1.7 PB of data?
I'm replying to the question of how one would replicate SQLite 3 in production for redundancy. I myself consider 10GB to be the limit for using SQLite 3 for read/write workloads in production; beyond that I'd switch to PostgreSQL.
That's a huge discrepancy. One half of HN wants to put petabytes on SQLite, while your limit is only 10GB.
There's only one person in this thread trying to put petabytes in SQLite. Everyone else is telling them the myriad reasons why it's a terrible idea.
Why not use SQLite's own guidance on where SQLite probably isn't appropriate:
- Client/Server applications (Check)
- High-volumes (Check)
- Large datasets (Check)
- High concurrency, particularly for writes (Check)
https://www.sqlite.org/whentouse.html
By the time the TB is restored, it's time to start the next test.
How do you detect restored but bit-flipped data?
I do this in backup testing:
See the SQLite3 documentation: https://www.sqlite.org/pragma.html#pragma_integrity_check
Sounds like it will take a while for TB-scale databases, and it checks database integrity, not data integrity.
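For illustration, a minimal sketch of that kind of restore-and-verify step in Python; the path is a placeholder, and, per the caveat above, PRAGMA integrity_check validates the database's internal structure, not application-level data correctness.

    import sqlite3

    def verify_backup(db_path: str) -> bool:
        # Assumes db_path points at a copy restored to scratch space first
        # (e.g. via `litestream restore`), never at the live database.
        con = sqlite3.connect(db_path)
        try:
            # Returns ('ok',) if the file's internal structure is intact.
            # Bit flips inside valid-looking values would still pass.
            (result,) = con.execute("PRAGMA integrity_check").fetchone()
            return result == "ok"
        finally:
            con.close()

    print(verify_backup("/tmp/restored.db"))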
Then the same happens as when there is a bug in Aurora's replication. You lose data. I know this from personal experience.
Is there any open source tool doing something similar?
Litestream is open source.
https://github.com/benbjohnson/litestream
Ah, I was remembering another product and thought it was this one. Sorry. Moving on; keep up the good work.
No it won't. SQLite "only" works with up to 281TB [0] [1]
[0] https://www.sqlite.org/releaselog/3_33_0.html
[1] https://www.sqlite.org/limits.html (#12)
You could split it up into 10 SQLite DBs on this individual server.
You've now implemented sharding on top of SQLite.
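And that bespoke sharding is only trivial until it isn't. A hypothetical sketch of what "just split it into 10 SQLite DBs" means in practice (file names and key scheme invented for illustration):

    import sqlite3
    import zlib

    # A hand-rolled shard router over 10 SQLite files.
    SHARDS = [sqlite3.connect(f"ledger_{i}.db") for i in range(10)]

    def shard_for(key: str) -> sqlite3.Connection:
        # Stable hash -> shard index. Resharding, cross-shard queries, and
        # cross-shard transactions are now your problem to solve.
        return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

    con = shard_for("txn:12345")  # all reads/writes for this key hit one shard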
Eventually all programs will be able to read email.
Any codebase of non-trivial complexity eventually implements a mediocre SQL/Lisp/etc.
Exactly! Why take a system not designed for this sort of scale and force it to scale, rather than use systems which are designed and tested for this scale and volume? All you will do is hackily re-invent all the other things that the other databases had to do to scale to this extent.
Plus, size is only one limit; you would be limited to one write every few milliseconds. My napkin-math estimate is that there are at least 1-2M writes per hour going into this thing, so probably 300-600 writes/second on average and maybe over 1k writes/second at peak. We are going to fall over here!
Not sure why some people seem to have a view of "there is no scaling problem that can't be solved with a sufficiently large number of SQLite databases".
One is a scalable, managed, highly available service with economies of scale; the other is a fixed-size capital expenditure with fixed performance, limited DR, and a need for a couple of SRE/DevOps people plus colo space.
There is also the "will it always work" question.
Once you are splitting across 10 SQLite DBs you have a bespoke distributed system anyway, and you will find yourself dealing with all the headaches of LedgerStore anyway.
Most of the novel work in LedgerStore is probably around managing the headaches of distributed storage, not the persistence layer.
Just storing petabytes of data is not the issue. Managing and querying it reliably is.
Beware of WAL mode, as you sacrifice atomicity across attached databases in this configuration:
https://sqlite.org/lang_attach.html
'Transactions involving multiple attached databases are atomic, assuming that the main database is not ":memory:" and the journal_mode is not WAL. If the main database is ":memory:" or if the journal_mode is WAL, then transactions continue to be atomic within each individual database file. But if the host computer crashes in the middle of a COMMIT where two or more database files are updated, some of those files might get the changes where others might not.'
The value proposition of commercial cloud isn't cost savings, unless you manage to quantify all of the ancillary and extrinsic factors such as security risk, HVAC, datacenter personnel, and hardware lifecycle. Any well-capitalized and organized company could build its own cloud much more cheaply, but a significant portion of the calculation is really about outsourcing the risk components.
The problem with outsourcing the risk components is that you don't know for sure whether they are properly taken care of. Major cloud providers have been caught "oopsing" your data, and bam it is gone. Furthermore, they have no incentive to be more efficient about it, they could easily be using 10x the amount of resources necessary, and you wouldn't even have a clue, you're just paying for evermore expensive crap that becomes less reliable over time.
But the cloud providers compete with each other! Look at the efficient market on display in their bandwidth pricing!
For very large customers, the cloud providers do compete with each other on cost. They often pay different prices than are advertised.
Lots of orgs fail to turn money into talent and then talent into products.
It just takes one bad hire at the senior level, and suddenly your cloud is a VMware install where all machines boot off network disk, and contention makes the entire thing fall over.
1.7 petabytes on SQLite?
SQLite's own advice:
This is the worry IMO. It's fine to dump it on a server with SQLite, but once you start hitting scaling limits, you're in for a potentially rough migration.
It's also a bit scary to have a system without a built-in scaling mechanism in the path of customer traffic. At some point you may be racing to upgrade it.
Maybe it could, and now you've got 99 new problems. That's why more experienced decision makers won't allow this to happen.
Sometimes things just aren't nails, even when you have a really good hammer.
It will take forever to create that index. The link describes a 10B-row dataset.
You wouldn't want to do 1T records on one server even if you could. At that scale, you would prefer to be somewhat distributed for availability and scalability. Also, SQLite has issues at large scale.
A reasonable number for one server is about 32-128 TB, and 1.7 petabytes with some redundancy fits nicely in ~30 servers with a decent distributed database.
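A quick napkin check of that sizing, assuming a replication factor of 2 (the factor is an assumption; the rest comes from the comment above):

    data_pb = 1.7
    replication = 2   # assumed: one extra copy for redundancy
    servers = 30
    tb_per_server = data_pb * 1000 * replication / servers
    print(f"{tb_per_server:.0f} TB/server")  # -> ~113 TB, inside the 32-128 TB range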
Sure but then you get a whole new set of costs and folks you have to hire to maintain that hardware.
One of the main reasons you put up with the annoyances of tuple-based storage like DynamoDB is because you want extremely high availability that simply cannot be provided by one computer in one physical location.