Jepsen: Datomic Pro 1.0.7075

adrianco
26 replies
22h0m

I was a fly on the wall as this work was being done and it was super interesting to see the discussions. I was also surprised that Jepsen didn’t find critical bugs. Clarifying the docs and unusual (intentional) behaviors was a very useful outcome. It was a very worthwhile confidence building exercise given that we’re running a bank on Datomic…

belter
10 replies
20h24m

  I was also surprised that Jepsen didn’t find critical bugs.

From the report..."...we can prove the presence of bugs, but not their absence..."

cdchn
4 replies
18h17m

"Absence of evidence is not evidence of absence."

andreareina
1 replies
17h8m

If you've looked, it is. The more and the better you look, the better evidence it is.

kelseyfrog
0 replies
16h36m

If you run it through Bayes' theorem, it adjusts the posterior very little.

nine_k
0 replies
2m

s/evidence/proof/.

Evidence of absence ("we searched really carefully and nothing came up") does update the Bayesian priors significantly, so the probability of absence of bugs can now be estimated as much higher.
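
For a toy version of that update in Clojure (illustrative numbers only; the 0.1 miss rate is an assumption, not a claim about Jepsen's actual sensitivity):

  (let [p-bugs 0.5                   ; prior: critical bugs exist
        p-miss 0.1                   ; P(search finds nothing | bugs exist)
        p-none (+ (* p-miss p-bugs)  ; total P(search finds nothing)
                  (* 1.0 (- 1 p-bugs)))]
    (/ (* p-miss p-bugs) p-none))
  ;; => 0.0909... -- well below the 0.5 prior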

kelseyfrog
0 replies
16h38m

Thank you. I've updated my initial guess of p(critical bugs | did not find critical bugs) from 0.5 to 0.82 given my estimate of likelihood and base rates.

jupp0r
3 replies
17h21m

In practical terms, if you are a database and Jepsen doesn't find any bugs, that's as much assurance as you are going to get in 2024 short of formal verification.

stuarthalloway
1 replies
3h8m

Formal verification is very powerful but still not full assurance. Fun fact: Testing and monitoring of Datomic has sometimes uncovered design flaws in underlying storages that formal verification missed.

nine_k
0 replies
5m

What kind of flaws? I would expect performance problems.

pests
0 replies
14h29m

The work Antithesis has been doing here has me really excited as well.

vasco
0 replies
19h50m

That's consistent with the usual definition of "finding" anything.

killingtime74
6 replies
19h22m

Did you not do this work yourself before you started running the bank on it?

cdchn
5 replies
18h16m

I doubt any organization that isn't directly putting lives on the line is testing database technology as thoroughly and competently as Jepsen. Banks' job is to be banks, not to be Jepsen.

killingtime74
4 replies
15h11m

I would have thought they would be more rigorous, since mistakes could threaten the very viability of the business -- which is why I assume most are still on mainframes. (I've never worked at a bank.)

raverbashing
0 replies
5h22m

Banks are not usually run by people who go for the first fad.js they see; they can usually also think further ahead than five minutes.

Also, I'm sure they engineer their systems so that every operation and action is logged multiple times and have multiple redundancy factors.

A main transaction DB will not be a "single source of truth" for any event. It will be the main source of truth, but the ledger you see in your online bank is only a simplified view into it.

harperlee
0 replies
13h35m

Banks have existed since long before computers, and thus have ways to detect and correct errors that are not purely technological (such as double-entry bookkeeping, backups, supporting documentation, and separate processes). So a bank can survive a DB doing nasty things at a low enough frequency that it is not detected beforehand, and they don’t need to “prove it in Coq” to show that everything is correct.

cdchn
0 replies
11h40m

Mistakes don't threaten them that much. When Equifax (admittedly not a bank) can make massive, negligent fuckups and still be a going concern, there isn't much heat there. Most fuckups a bank makes can be unwound.

Foobar8568
0 replies
13h23m

Anyone who has worked in a bank and is happy with its solutions is either a fool, clueless, or a politician.

Banks have to answer to regulation and they do by doing the bare minimum they can get away with.

fiatjaf
5 replies
8h41m

What bank is that, if I may ask?

swah
3 replies
8h26m

The first fully digital Brazilian bank; it got pretty big in a decade.

I'd love to hear the story from the first engineers, how they got support for this, etc. They never did tech blog posts though...

jonahbenton
0 replies
6h12m

There are some videos, both of the start and of their progress. Some of the most impressive work I have ever seen, remarkable.

SOLAR_FIELDS
1 replies
2h51m

Given that Rich Hickey designed this database, the outcome is perhaps unsurprising. What a fabulous read - anytime I feel like I’m reasonably smart, it’s always good to be humbled by a Jepsen analysis.

nine_k
0 replies
6m

A good design does not guarantee the absence of implementation bugs. But a good design can make introducing bugs harder / less probable. That must be what happened here, and then it's a case to study and maybe emulate.

amgreg
24 replies
23h21m

It struck me that Jepsen identified clear situations leading to invariant violations, but Datomic’s approach seems to have been purely to clarify their documentation. Does this essentially mean the Datomic team accepts that the violations will happen, but doesn’t care?

From the article:

  From Datomic’s point of view, the grant workload’s invariant violation is a matter of user error. Transaction functions do not execute atomically in sequence. Checking that a precondition holds in a transaction function is unsafe when some other operation in the transaction could invalidate that precondition!

stuarthalloway
9 replies
22h4m

As Jepsen confirmed, Datomic’s mechanisms for enforcing invariants work as designed. What does this mean practically for users? Consider the following transactional pseudo-data:

  [[Stu favorite-number 41]
   ;; maybe more stuff
   [Stu favorite-number 42]]

An operational reading of this data would be that early in the transaction I liked 41, and that later in the transaction I liked 42. Observers after the end of the transaction would hopefully see only that I liked 42, and we would have to worry about the conditions under which observers might see that I liked 41.

This operational reading of intra-transaction semantics is typical of many databases, but it presumes the existence of multiple time points inside a transaction, which Datomic neither has nor wants — we quite like not worrying about what happened “in the middle of” a transaction. All facts in a transaction take place at the same point in time, so in Datomic this transaction states that I started liking both numbers simultaneously.

If you incorrectly read Datomic transactions as composed of multiple operations, you can of course find all kinds of “invariant anomalies”. Conversely, you can find “invariant anomalies” in SQL by incorrectly imposing Datomic’s model on SQL transactions. Such potential misreadings emphasize the need for good documentation. To that end, we have worked with Jepsen to enhance our documentation [1], tightening up casual language in the hopes of preventing misconceptions. We also added a tech note [2] addressing this particular misconception directly.

[1] https://docs.datomic.com/transactions/transactions.html#tran...

[2] https://docs.datomic.com/tech-notes/comparison-with-updating...

aphyr
4 replies
21h50m

To build on this, Datomic includes a pre-commit conflict check that would prevent this particular example from committing at all: it detects that there are two incompatible assertions for the same entity/attribute pair, and rejects the transaction. We think this conflict check likely prevents many users from actually hitting this issue in production.

The issue we discuss in the report only occurs when the transaction expands to non-conflicting datoms--for instance:

  [Stu favorite-number 41]
  [Stu hates-all-numbers-and-has-no-favorite true]

These entity/attribute pairs are disjoint, so the conflict checker allows the transaction to commit, producing a record which is in a logically inconsistent state!

On the documentation front--Datomic users could be forgiven for thinking of the elements of transactions as "operations", since Datomic's docs called them both "operations" and "statements". ;-)

stuarthalloway
2 replies
20h58m

Mea culpa on the docs, mea culpa. Better now [1].

In order for user code to impose invariants over the entire transaction, it must have access to the entire transaction. Entity predicates have such access (they are passed the after db, which includes the pending transaction and all other transactions to boot). Transaction functions are unsuitable, as they have access only to the before db. [2]

Use entity predicates for arbitrary functional validations of the entire transaction.
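
For illustration, a minimal entity-predicate sketch (the attribute, predicate, and guard-entity names here are hypothetical):

  (require '[datomic.api :as d])

  ;; An entity predicate is an ordinary function of [db eid] that is
  ;; passed the after db, i.e. it sees the entire pending transaction.
  (defn balance-non-negative?
    [db eid]
    (<= 0 (or (:account/balance (d/entity db eid)) 0)))

  ;; Registered on a guard entity via :db.entity/preds, and invoked by
  ;; asserting :db/ensure in the transaction:
  ;; [[:db/add account-id :db/ensure :account/validate]]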

[1] https://docs.datomic.com/transactions/transactions.html#tran...

[2] https://docs.datomic.com/transactions/transaction-functions....

lgrapenthin
1 replies
18h49m

Somewhat unrelated, re: the docs: it appears that "Query" opens a dead link.

JB024066
0 replies
4h45m

Thanks for the report! Just fixed the link.

Voultapher
0 replies
20h39m

The man, the myth, the legend himself. I haven't ceased to be awed by how often the relevant person shows up in the HN comment section.

Loved your talks.

puredanger
3 replies
21h15m

Datomic transactions are not “operations to perform”, they are a set of novel facts to incorporate at a point in time.

Just as a git commit describes a set of modifications: do you, or should you, care about the order in which the adds, updates, and deletes occur within a single git commit? OMG no, that sounds awful.

The really unusual thing is that developers expect intra-transaction ordering, and accept it, from every other database. OMG, that sounds awful; how do you live like that?

cdchn
1 replies
18h12m

Do developers not expect intra-transaction ordering from within a transaction?

kccqzy
0 replies
17h7m

It depends on the previous experience of said developers, and such expectation varies widely.

voganmother42
0 replies
20h12m

Nested transactions or savepoints also exist in other systems.

aphyr
9 replies
23h6m

Yeah, this basically boils down to "a potential pitfall, but consistent with documentation, and working as designed". Whether this actually matters depends on whether users are writing transaction functions which are intended to preserve some invariant, but would only do so if executed sequentially, rather than concurrently.

Datomic's position (and Datomic, please chime in here!) is that users simply do not write transaction functions like this very often. This is defensible: the docs did explicitly state that transaction functions observe the start-of-transaction state, not one another! On the other hand, there was also language in the docs that suggested transaction functions could be used to preserve invariants: "[txn fns] can atomically analyze and transform database values. You can use them to ensure atomic read-modify-update processing, and integrity constraints...". That language, combined with the fact that basically every other Serializable DB uses sequential intra-transaction semantics, is why I devoted so much attention to this issue in the report.

It's a complex question and I don't have a clear-cut answer! I'd love to hear what the general DB community and Datomic users in particular make of these semantics.
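
To make the pitfall concrete, here is a sketch of the unsafe pattern (the :my/withdraw function and the attribute names are hypothetical; assumes datomic.api required as d and an open connection conn):

  ;; A transaction function that checks a precondition against the
  ;; before db, then asserts the new value. Installed as an entity:
  ;; {:db/ident :my/withdraw :db/fn withdraw}
  (def withdraw
    #db/fn {:lang "clojure"
            :params [db account amount]
            :code (let [bal (:account/balance
                              (datomic.api/entity db account))]
                    (assert (<= amount bal) "insufficient funds")
                    [[:db/add account :account/balance (- bal amount)]])})

  ;; Both calls see the same before db (balance 100), so both checks pass:
  (d/transact conn [[:my/withdraw account-id 80]
                    [:my/withdraw account-id 80]])
  ;; Each expands to [:db/add account-id :account/balance 20]; identical
  ;; datoms don't conflict, the transaction commits, and 160 has been
  ;; "withdrawn" against a balance of 100.

Note that if the two amounts differed, the expanded datoms would conflict and the pre-commit check described above would reject the transaction.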

refset
5 replies
19h39m

I don't know whether it was intentional or not, but IIRC DataScript opted for sequential intra-transaction semantics instead.

stuarthalloway
2 replies
5h35m

It is worth noting here that Datomic's intra-transaction semantics are not a decision made in isolation; they emerge naturally from the information model.

Everything in a Datomic transaction happens atomically at a single point in time. Datomic transactions are totally ordered, and this ordering is visible via the time t shared by every datom in the transaction. These properties vastly simplify reasoning about time.

With this information model, intermediate database states are inexpressible. Intermediate states cannot all have the same t, because they did not happen at the same time. And they cannot have different ts, as they are part of the same transaction.
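
A small sketch of what that looks like from the peer API (entity names illustrative; the tx id shown is made up):

  (require '[datomic.api :as d])

  (let [{:keys [tx-data]} @(d/transact conn
                             [[:db/add stu :favorite-number 42]
                              [:db/add stu :greeting "hello"]])]
    (distinct (map :tx tx-data)))
  ;; => (13194139534340) -- a single tx entity, hence a single t, shared
  ;; by every datom in the transaction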

refset
1 replies
3h35m

Thank you for the explanations. Do you happen to know why transactions ("transaction requests") are represented as lists and not sets?

stuarthalloway
0 replies
2h54m

When we designed Datomic (circa 2010), we were concerned that many languages had better support for lists than for sets, in particular having list literals but no set literals.

Clojure of course had set literals from the beginning...

huahaiy
0 replies
17h5m

Correct. I don't know about DataScript's intention, but it is intentional for Datalevin, as we have tests for sequential intra-transaction semantics.

aaroniba
0 replies
33m

Yes. Perhaps this is a performance choice for DataScript, since DataScript does not keep a complete transaction history the way Datomic does? I would guess this helps DataScript process transactions faster. There is a GitHub issue about it here: https://github.com/tonsky/datascript/issues/366

nickpeterson
2 replies
22h57m

I feel like “enough rope to shoot yourself” is kind of baked into any high power, low ceremony tool.

stuarthalloway
1 replies
21h24m

As a proponent of just such tools I would say also that "enough rope to shoot(?) yourself" is inherent in tools powerful enough to get anything done, and is not a tradeoff encountered only when reaching for high power or low ceremony.

nickpeterson
0 replies
6h17m

I always loved the broken phrase because it implies something really went terribly wrong ;)

aaroniba
2 replies
13h17m

I think the article answers your question at the end of section 3.1:

"This behavior may be surprising, but it is generally consistent with Datomic’s documentation. Nubank does not intend to alter this behavior, and we do not consider it a bug."

When you say "situations leading to invariant violations" -- that sounds like some kind of bug in Datomic, which this is not. One just has to understand how Datomic processes transactions, and code accordingly.

I am unaffiliated with Nubank, but in my experience using Datomic as a general-purpose database, I have not encountered a situation where this was a problem.

aphyr
1 replies
12h30m

This is good to hear! Nubank has also argued that in their extensive use of Datomic, this kind of issue doesn't really show up. They suggest custom transaction functions are infrequently written, not often composed, and don't usually perform the kind of precondition validation that would lead to this sort of mistake.

aaroniba
0 replies
43m

Yeah, I've used transaction functions a few times but never had a case where two transaction functions within the same d/transact call interacted with each other. If I did encounter that case, I would probably just write one new transaction function to handle it.

SoftTalker
0 replies
22h36m

Sounds similar to the need to know that in some relational databases, you need to SELECT ... FOR UPDATE if you intend to perform an update that depends on the values you just selected.
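
The closest built-in Datomic analogue is probably the :db/cas (compare-and-swap) transaction function, which rejects the transaction if the value changed since you read it (entity and attribute names here are hypothetical; assumes datomic.api required as d and an open connection conn):

  ;; Commits only if :account/balance is still 100; otherwise the whole
  ;; transaction aborts and can be retried with a fresh read.
  (d/transact conn [[:db/cas account-id :account/balance 100 90]])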

amluto
5 replies
15h25m

I wonder if Datomic’s model has room for something like an “extra-strict” transaction. Such a transaction would operate exactly like an ordinary transaction except that it would also check that no transaction element reads a value or predicate that is modified by a different element. This would be a bit like saying that each element would work like an independent transaction, submitted concurrently, in a more conventional serializable database (with predicate locking!), except that the transaction only commits if all the elements would commit successfully.

This would have some runtime cost and would limit the set of things one could accomplish in a transaction. But it would remove a footgun, and maybe this would be a good tradeoff for some users, especially if it could be disabled on a per-transaction basis.
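
A rough sketch of the proposed check, assuming per-element read and write sets of [entity attribute] pairs could be computed (hypothetical; not a Datomic API):

  (require '[clojure.set :as set])

  (defn extra-strict-ok?
    "True when no element reads an [e a] pair that another element writes."
    [elements] ; each element: {:reads #{[e a] ...} :writes #{[e a] ...}}
    (every? (fn [i]
              (let [others (concat (take i elements)
                                   (drop (inc i) elements))]
                (empty? (set/intersection
                          (:reads (nth elements i))
                          (apply set/union #{} (map :writes others))))))
            (range (count elements))))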

lgrapenthin
4 replies
14h53m

I wouldn't use it. The footgun is imaginary. I've used Datomic for ten years and I can assure you that I've never stepped on it. As a Datomic user you see transactions as clean small diffs, not as complicated multi-step processes. This is actually much more pleasant to work with.

amluto
2 replies
14h26m

Now I’m curious: what’s a useful example of a Datomic transaction that reads a value in multiple of its elements and modifies it?

lgrapenthin
0 replies
13h31m

You could include two transaction functions that constrain a transaction to different properties of the same fact, and then alter that fact. I don't know of a practical use case, nor have I ever encountered one; it would be extremely rare IME.

hlship
0 replies
12h18m

In traditional databases, only the database engine has a scalable view of the data - that’s why you send SQL to it and stream back the response data set. With Datomic, the peer has the same level of read access as the transactor; it’s like the database comes to you.

In this read and update scenario, the peer will, at its leisure, read existing data and put together update data; some careful use of compare-and-set, or a custom transaction function, can ensure that the database has not changed between reads and writes in such a way that the update is improper, when that is even a possibility - a rarity.

At scale, you want to minimize the amount of work the transactor must perform, since it is so aggressively single-threaded. Offloading work to the peer is amazingly effective.

aphyr
0 replies
12h23m

This is also good to hear! I'm not sure whether I'd call it a "footgun" per se--that's really an empirical question about how Datomic's users understand its model. I can say that as someone with some database experience and a few weeks of reading the Datomic docs, this issue actually "broke" several of the tests I wrote for Datomic. It was especially tricky because the transactions mostly worked as expected, but would occasionally "lose updates" or cause updates intended for one entity to wind up assigned to another.

Things looked fine in my manual testing, but when I ran the full test suite Elle kept catching what looked like serious Serializability violations. Took me quite a while to figure out I was holding the database wrong!

luc4sdreyer
3 replies
11h57m

I'm a bit worried that most of the links on https://www.datomic.com/ are broken.

JB024066
0 replies
3h23m

I think we just fixed that one. Sorry for the hiccups!

koito17
3 replies
22h46m

This is the first time I've tried reading a Jepsen report in depth, but I really like the clear description of Datomic's intra-transaction behavior. I didn't realize how little I understood the difference between Datomic's transactions and those of SQL databases.

One thing that stands out to me is this paragraph

  Datomic used to refer to the data structure passed to d/transact as a “transaction”, and to its elements as “statements” or “operations”. Going forward, Datomic intends to refer to this structure as a “transaction request”, and to its elements as “data”.

What does this mean for d/transact-async and related functionality from the datomic.api namespace? I haven't used Datomic in nearly a year. A lot seems to have changed.

stuarthalloway
2 replies
20h50m

Datomic software needed no changes as a result of Jepsen testing. All functionality in datomic.api is unchanged.

klysm
1 replies
17h50m

Congrats, that is a rare outcome!

aphyr
0 replies
12h35m

Yeah, I think this is next to Zookeeper as one of the most positive Jepsen reports. :-)

jwr
3 replies
9h34m

This is a fantastic detailed report about a really good database. I'm also really happy to see the documentation being clarified and updated.

As a side note: I so wish Apple would pay for a Jepsen analysis of FoundationDB. I know Aphyr said that "their tests are likely better", but if indeed Jepsen caught no problems in FoundationDB, it would be a strong data point for another really good database.

mdaniel
2 replies
3h4m

I would never, ever want to take food out of aphyr's mouth, but is there something specific that makes creating the Jepsen tests either somehow out of reach of a sufficiently motivated contributor, or so prohibitively expensive that a "gofundme-ish" setup wouldn't get it done?

I am (perhaps obviously?) not well-versed enough in that space to know, but when I see "wish $foo would pay for" my ears perk up, because there is so much available capital sloshing around, and waiting on Apple to do something is (in my experience) a long wait.

SOLAR_FIELDS
1 replies
2h45m

I have heard from people who paid for a Jepsen test that he is eye-wateringly expensive (and absolutely, rightfully should be; there are very few people in the world who can conduct analyses on this level), but maybe achievable with a gofundme.

I am not sure, for the same reason, that designing a DIY Jepsen suite correctly is really achievable for the vast majority of people. Distributed systems are very hard to get right, which means that testing them is very hard to get right as well.

PeterCorless
0 replies
15m

He provides a good and unique service. He's worth every penny. Note that for some companies, the real "expense" is dedicating engineering hours to fix the shit he lit on fire in your code.

thom
2 replies
21h55m

I’ve not really spent much time with Datomic in anger because it’s super weird, but is any of this surprising? Datomic transactions are basically just batches, and I always thought it was single-threaded, so obviously it doesn’t have a lot of race conditions. It’s slow and safe by design.

rtpg
1 replies
10h40m

Well the example of "incrementing x twice in the same transaction leads to x+1, not x+2" seems pretty important! I imagine you gotta be quite careful!

stuarthalloway
0 replies
7h23m

What does the following expression return?

(let [x 1] [(inc x) (inc x)])

In Clojure the answer is [2 2]. A beginner might guess [2 2] or [2 3]. Both are reasonable guesses, so a beginner needs to be quite careful!

But that isn't particularly interesting, because beginners always have to be quite careful. When you are learning any technology, you are a beginner once and experienced ever after. Tool design should optimize for the experienced practitioner. Immutability removes an enormous source of complexity from programs, so where it is feasible it is often desirable.
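
The Datomic analogue of the example above, sketched with a hypothetical :my/inc transaction function, behaves the same way: both elements read the same before db (assumes datomic.api required as d and an open connection conn).

  ;; Suppose :my/inc reads the current value and asserts (inc value).
  (d/transact conn [[:my/inc stu :favorite-number]
                    [:my/inc stu :favorite-number]])
  ;; With x = 1, each expands to [:db/add stu :favorite-number 2]; the
  ;; identical datoms don't conflict, so the result is 2 -- x+1, not x+2.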

poidos
1 replies
15h53m

Really nice work as always. I love reading these to learn more about these systems, for little tidbits of writing Clojure programs, and for the writing style. Thanks for what you do!

aphyr
0 replies
12h36m

Thank you!

khalidx
1 replies
2h3m

Oh, boy, have I been waiting for this one! I've been building my own datomic-like datastore recently and this is going to be useful. Reading it now.

I enjoyed the MongoDB analyses. Make sure to check those out too, as well as the ones for Redis, RethinkDB, and others.

Would be great if there was an analysis done for rqlite/dqlite or turso/libsql at some point in the future.

bfors
1 replies
3h19m

For those who aren't aware, the name Jepsen is a play on Carly Rae Jepsen, the singer behind "Call Me Maybe". In my opinion, a perfect name for a distributed systems research effort.

fulafel
0 replies
3h27m

The data model in Datomic is pretty intuitive if you're familiar with triple stores / RDF. But these similarities aren't very often referenced by the docs or online discussions. Is it because people are rarely familiar with those concepts, or is the association with semantic web things considered potentially distracting (or am I missing something and there are major fundamental differences)?

baq
0 replies
23h0m

aphyr you bastard I've got work to do today.

CrazyPyroLinux
0 replies
22h29m

aphyr has given some conference talks on previous analyses (available on YouTube) that are informative and entertaining.