Serious question. I've had this question for, like, 20 years already.
Why would anyone start a new project with MySQL? Is it really superior at anything? I've been in the industry for 20+ years, and as far as I can remember MySQL has always been both the worst and the most popular RDBMS at any given moment.
We use it, we know it and can troubleshoot it if needed, it satisfies our needs and it works. What more do you need?
It also works for others, Github for example.
The only thing I am missing at the moment is a native UUID type, so I don't have to write functions that convert the 16-byte binary representation to text and back when examining the data manually on the server.
I strongly dislike how people point to GitHub as an example; it's the ultimate appeal to authority.
I know Facebook uses MySQL, but I also know that it is a bastardised custom version that has known constraints and is used in limited ways (no foreign keys, for example).
I spoke to the DBA who first deployed MySQL at GitHub, and the vibe I got from him immediately was that he had doubled down on his prejudice: which is fine, but it's not OK to ignore that working around the issues of any given technology can take a lot of effort.
For a great example of what I mean: most people wouldn’t choose PHP for a new project (despite it having improved majorly) - the appeal to authority there is to say “it works for Facebook” without mentioning “Hack” or the myriad of internal processes to avoid the warts of PHP.
That a large headcount company can use something does not make it immune from criticism.
Most people on HN, or most developers in the world?
PHP is still very popular, and plenty of people start new projects in it all the time.
Show me a technology without critics and I'll show you a technology zero people use.
I mean, outside of HN too: new projects are less and less commonly PHP-based.
https://madnight.github.io/githut/#/pull_requests/2023/3
I don't think that data is particularly meaningful, unless you're also going to claim that both JavaScript and Ruby are "less and less commonly" used, because they've both had much bigger drops, according to that data.
Pulls, Pushes, Issues and GitHub stars are terrible ways to gauge the popularity of a language.
¯\_(ツ)_/¯
There is no better measure I'm aware of, and I'll take any measure you supply.
I would definitely also argue that Ruby is in pretty significant decline. The majority of Ruby projects were sysadminy projects from the 2010 era, and most sysadminy types learned it as an alternative to Perl. Web developers who learned it were mostly using Rails, which has fallen somewhat out of favour. YMMV obviously, but I can understand its decline, as Python has concretely taken over that space and devops tools like Chef/Puppet are no longer en vogue now that Go and the Kubernetes/CNCF stuff have taken the lion's share.
Equally: JavaScript (Node, really) is less favoured by many JS devs than TypeScript. If you aggregate TS and JS you'll see that the ecosystem is growing, but many JS folks have switched to TS.
I'm taken aback by what you seem to suggest though: would you seriously claim that most new projects ARE using PHP?
I would happily argue that point with any data you supply, it's completely contrary to my experience and understanding of things and I have a pretty wide and disparate social circle in tech companies.
No. I didn't say that, and we need to clarify what you meant originally to make sense here.
When you say "most people wouldn't start a project in php", there are two ways to interpret "most" in that sentence: "the majority of" (ie 50%+) or "nearly all of" (ie a much higher percentage). Both are accepted definitions for "most".
I assumed you meant the latter: ie "nearly everyone would not start a project in php", which is what I disagree with, because the former makes little sense in context.
If you did in fact mean "a majority of people would not start a project in php", then of course I agree, because that sentence could mention any programming language in existence and still be true: no language is ever so dominant over all others that more than half of all new projects are written in it.
It's a little bit hair-splitty, but I see what you might be trying to get at.
What I tried to convey is that PHP is not enjoying the development heyday it once had, and the number of people choosing PHP for a new project today (even among people who learned development with PHP) is decreasing. It's not popular.
Let's try to leave it as: "I believe PHP to be in decline for new projects, as a share of total new projects divided by the total number of developers who are starting new projects".
Ruby has had a huge decline in the past ten years, IMO.
Also, note that TypeScript is tracked separately from Javascript, which is likely part of its decline. I wouldn't be surprised if JS backends are ultimately declining as well (perhaps Go and Python are taking its place?)
People keep saying that. Ruby has had a "huge decline" if you look at the percentage of commits on Github over the last decade [1], a decrease of more than a factor of 3. However, in that same decade Github has grown (much) more than a factor of 3. So the total number of Ruby commits on GitHub has grown substantially. That's not really what I would call a huge decline.
[1] https://madnight.github.io/githut/#/pull_requests/2023/3
Is this really true?
I used to be a full-time PHP developer, but I personally don't touch that language anymore. But it's still very popular around the world; I've seen multiple projects started this year use PHP, because that's the language the founders and most developers at the company are familiar with. It probably depends a lot on where in the world you're located.
Last Stack Overflow survey had ~20% of the people answering the survey saying that they still use PHP in some capacity.
The beauty of PHP is that it is stateless: at the end of the run, everything is freed. It is difficult to have memory leaks.
Personally, I like using Typescript/Javascript on both front end and backend, but I don’t look down at PHP backends at all. And it’s come a long way as a language.
I've been a fan of rolling your own stdlib wrappers, as the semantics there are old and weird, but VS Code tells you the signatures now, so who cares anymore.
MySQL 8 adds the `BIN_TO_UUID()` function (and the inverse, `UUID_TO_BIN()`), and supports the quasi-standard bit-swapping trick to handle time-based UUIDs in indexed columns.
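For example (illustrative table name; the second argument is the time-swap flag):

```sql
-- Illustrative table: store UUIDs as BINARY(16), converting at the edges.
CREATE TABLE events (
  id BINARY(16) PRIMARY KEY
);

-- The swap flag (1) rearranges the time-low/time-high parts of a v1 UUID
-- so that roughly time-ordered values cluster better in the index.
INSERT INTO events (id) VALUES (UUID_TO_BIN(UUID(), 1));

-- Convert back to the textual form when poking at the data by hand.
SELECT BIN_TO_UUID(id, 1) AS id FROM events;
```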
Having worked at GitHub, we used MySQL because we've always used MySQL. It works because the business depends on making it continue to work, and switching off at this point is a multi-year effort. Herculean efforts have gone into scaling MySQL, and that is not without its own issues (outages, maintenance cost, etc.).
Isolation level consistency is not a problem I heard anyone talk about, but that's probably because most devs interact with the database via Active Record which is not exactly known for its transactionality guarantees (and is, of course, a source of yet another set of problems).
As opposed to what? Postgres? Isn't InnoDB most performant for read-heavy apps?
MyISAM is actually considerably faster (than InnoDB) for read-heavy apps.
InnoDB is comparatively slow, but you get much better transactionality (i.e. something much closer to ACID compliance). Row-level locking is faster for inserts than table-level locking, but table-level locking is faster for reads than row-level locking.
Regardless: neither storage engine scales with core count as effectively as Postgres, due to some deadlocking on update that I have witnessed with MySQL (not that PostgreSQL is the only alternative, btw).
MyISAM is not a transactional storage engine to begin with, so saying that you get "much better transactionality with InnoDB" or "MyISAM is actually considerably faster" is either wrong or at best comparing apples to oranges.
Strange take, since a deadlock is an exceptional event that you never want to occur, so deadlocking, in algorithm design, wouldn't be considered a reason to say that an implementation does not "scale with the core count". Whether or not an algorithm scales with core count comes down to many other reasons, but not deadlocks.
Considering the "scale with the core count" design problem, Postgres's process-per-connection architecture makes it a much less viable option than, say, MySQL, so this is wrong as well.
Well, I've literally observed it (circa 2016, and the design has not significantly changed in this regard since then) and confirmed my finding with Percona.
Deadlock was the wrong terminology to use, apologies; I keep writing from my phone as I am travelling at the moment. I meant lock contention, specifically in memory. A deadlock would be a hard stop, but what I observed was a bottleneck on memory bandwidth past a certain number of cores (24) with update-heavy workloads.
So, I appreciate your points, but I don't think I am wrong in the core thesis of my statements. x :)
You would not be able to saturate the memory bus if you had lock contention; lock contention usually shows up as under-utilization of your CPU compute and memory bandwidth. So hitting the limit of the available memory bandwidth still sounds like a misrepresentation of the issue you stumbled upon.
There is no excuse to be using MyISAM instead of InnoDB in 2023. It was a scarcely forgivable mistake in 2013.
The read performance advantages of MyISAM are solved better by using SSDs instead of HDDs.
MyISAM will cost you dearly when performing actions like "trying to create a new replica of an existing database".
First, MySQL is the "devil you know". If you've spent a decade working exclusively with MySQL quirks, you're just gonna be more comfortable with it regardless of quality.
MySQL also tends to be faster for read-heavy workloads and simple queries.
Also, replication is easier to set up with MySQL in my (outdated) experience, even though it's gotten better with Postgres recently. I haven't really been able to compare them myself, since I'm just using Amazon RDS Postgres these days and haven't had the need to set up master-master replication (which is the pain point in postgres, and was pretty straightforward with mysql the last time I worked with it). Setting up read replicas with postgres is still ezpz.
Postgres-specific features tend to be much better than MySQL's; Postgresql's JSON(B) support blows MySQL out of the water. And as far as I can remember, MySQL still doesn't support partial/expression indexes, which is a deal breaker for me, especially in my JSON-heavy workloads where being able to index specific JSON paths is critical for performance. If you don't need that kind of stuff, you might be fine, but I would hate to hit a wall in my application where I want to reach for something and it's not there.
MySQL used to be the only game in town, so it was the "default" choice - but IMO postgres has surpassed it.
Do generated column indexes meet this need?
https://dev.mysql.com/doc/refman/8.0/en/create-table-seconda...

Looks like that would work as an expression index, though I can't tell at a glance whether it requires the column to also be stored, which would increase storage size (but probably isn't a huge problem if it is). But that likely won't work for the partial index case, where you only want to keep the non-null rows in the index to reduce its size (and speed up null/not null checks).
MySQL supports indexing expressions directly, which is effectively the same as indexing an invisible virtual column: https://dev.mysql.com/doc/refman/8.0/en/create-index.html#cr...
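For illustration (hypothetical table and column; this needs MySQL 8.0.13+, and the extracted value has to be CAST since a bare JSON value can't be a key part):

```sql
-- Hypothetical table: index one JSON path via a functional key part.
-- The extra parentheses mark the key part as an expression, and the
-- COLLATE matches what ->> returns so the optimizer can use the index.
CREATE INDEX events_kind_idx
    ON events ((CAST(payload ->> '$.kind' AS CHAR(32)) COLLATE utf8mb4_bin));
```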
MySQL supports "multi-valued indexes" over JSON data, which offer a non-obvious solution for partial indexes, since "index records are not added for empty arrays": https://dev.mysql.com/doc/refman/8.0/en/create-index.html#cr...
MariaDB doesn't support any of this directly yet though: https://www.skeema.io/blog/2023/05/10/mysql-vs-mariadb-schem...
I suppose this is a decent workaround for certain things (I've used it in SQLite before); the main kind of index I'm using with Postgres jsonb looks something like this:
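(Roughly; the table and column names here are made up for illustration.)

```sql
-- Made-up schema: an "events" table with a jsonb "payload" column.
-- Expression index over one json path, restricted to rows where that
-- path is present, so the index stays small and null checks stay cheap.
CREATE INDEX events_customer_idx
    ON events ((payload ->> 'customer_id'))
    WHERE payload ->> 'customer_id' IS NOT NULL;
```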
You can use generated columns to get around the first part of the index, but you can't have the WHERE part of the index in MySQL as far as I am aware (but it has been a very long time since I've worked with it, so I'm prepared to be wrong).

MySQL has some aggregation performance advantages over Postgres. Having done a recent migration of an application, two things that come to mind are:

- It's case insensitive by default, which can make filtering simpler, without having to deal with a duplicate column where all values are lower/upper cased.

- MySQL implements loose index scan and index skip scan, which improves performance of a number of join aggregation operations (rough sketch below): https://wiki.postgresql.org/wiki/Loose_indexscan
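The loose index scan case is the classic "distinct groups out of a composite index" shape, something like this (hypothetical table and index):

```sql
-- Hypothetical orders table with a composite index on (customer_id, created_at).
-- MySQL can answer this with a loose index scan: it hops from one distinct
-- customer_id to the next instead of reading every index entry.
SELECT customer_id, MIN(created_at)
FROM orders
GROUP BY customer_id;
```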
| it's case insensitive by default
This is obviously up for debate, but subjectively I find this to be an absolutely terrible design decision.
I agree it's debatable. And not intuitive at first.
With that said, in all my years and thousands of tables across multiple jobs, I have yet to see a single case where I had to change a table to be case sensitive. So I guess for me it is a sensible default.
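And on the rare occasion you do need it, you can opt in per column or per comparison rather than changing the default, something like this (hypothetical names):

```sql
-- Hypothetical column: make just this column case sensitive...
ALTER TABLE users
    MODIFY name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

-- ...or force case sensitivity for a single comparison.
SELECT * FROM users WHERE name COLLATE utf8mb4_bin = 'Alice';
```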
Your memory is failing you; maybe you don't remember that, not too long ago, PG did not have any replication built in.
Not too long ago MySQL didn't have transactions.
Edit: I would just love a comment from the person who thinks 'missing feature in the past' is wrong, unfair or irrelevant as a reply to a 'missing feature in the past' comment.
There's a non-trivial nine-year difference between the things you're describing: the InnoDB storage engine was released in 2001. Postgres gained built-in replication in 2010.
That said, personally I wouldn't describe either of these as "not too long ago". Technology rapidly changes and many things from either 2001 or 2010 are considered rather old.
Also from an operations point of view it's quite easy to manage. I'm not that experienced with Postgresql, but my understanding is that until recently you had to vacuum it every once in a while. Besides, it's also using some kind of threading model that most people handle by putting a proxy in front of Postgres to keep connections open.
Also, MySQL has had native replication for a very long time, including Galera, which does two-step commit in a multimaster cluster. Although Postgres is making some headway in this regard, it is my impression that this is only quite recent and not yet fully up to par with MySQL.
You still do. The autovacuum daemon was added in 2008-ish, so it isn't too bad. Just more complexity to manage.
It does a process per connection just like web servers did back in the day when C10k was a thing. A lot of the buffers are configured per connection so you can get bigger buffers if you keep the number of connections small.
I think what you reference is known as Transaction ID Wraparound, Postgres still needs to be vacuumed to avoid that problem: https://www.crunchydata.com/blog/managing-transaction-id-wra...
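For what it's worth, you can keep an eye on how close each database is to wraparound with a query along these lines:

```sql
-- Age (in transactions) of the oldest unfrozen XID per database;
-- autovacuum has to freeze old tuples long before this nears ~2 billion.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```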
This is a great question, and if the choice is between mysql and postgres, I would like to make the case that despite the current popular momentum behind postgres, mysql is a better default. Please note I'm not saying mysql is better, but in the absence of any other criteria, I would suggest starting with mysql.
I have a few reasons for this view, but they mostly revolve around operational complexity. From a developer's point of view postgres is fantastic. Far saner SQL dialect, tons of great features. When it comes to operations though, that's where mysql has the edge, and ops is half of using a database - it's an important facet for a business to consider.
As other commenters have mentioned, postgres requires careful tuning of the autovacuum process, otherwise it can't keep up as the workload grows.
Postgres has a far more advanced query planner, but it comes at the cost of potentially blowing up your app at 3am, and it gives you no tools to patch in a quick fix while you address the root cause. This frankly ignores the reality of operating a business. Sometimes you need a quick fix, even if that might lead to users developing bad habits. Yes, there is the pg_hint_plan extension, but that still only helps you after the problem has happened. You can't pin a query plan. To me the ideal situation would be for postgres to continue to use the old query plan, but emit some structured log to tell you it thinks it's now suboptimal. But I digress.
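For reference, pg_hint_plan hints are special block comments in front of the query, roughly like this (made-up table and index names):

```sql
-- With pg_hint_plan loaded, a hint comment can force a particular index
-- as a stopgap while the real planner issue is investigated.
/*+ IndexScan(orders orders_customer_id_idx) */
SELECT * FROM orders WHERE customer_id = 42;
```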
Thirdly, postgres has no way to keep a table clustered on an index. Clustering lets you trade a small cost on write for greater page locality when reading related rows. Postgres lets you do this only as a one-time operation that takes the table offline for the duration, which isn't sufficient if you need it continuously.
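(The one-time operation I mean is roughly this, with made-up names:)

```sql
-- Postgres: rewrite the table in index order, once, holding an exclusive
-- lock for the duration; it does not stay clustered as new rows arrive.
CLUSTER orders USING orders_customer_id_idx;
```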
Fourthly, mysql is still easier to upgrade. You will need to upgrade your database at some point. Mysql has great support for upgrade in place, as well as using replication to build a new db. Mysql replication has always been logical replication, which has tradeoffs of course, but what it buys you is the ability to replicate across different versions. Pg's logical replication still has a bunch of sharp edges.
Ok this rant is long enough already, but I do want to emphasise that this isn't hating on postgres. I know it's controversial to be recommending mysql over postgres, but I do think the ops concerns win out.
PS: the OrioleDB project is fantastic and I hope it one day becomes the default for Postgres.
Isn't working in the first place an ops concern?
Not up to date, but a decade ago MySQL supported pluggable storage engines and so had some good non-default choices. It was possible to fit really big databases onto small boxes using TokuDB, for example.
This doesn't explain why it was so popular for starting new small projects, but people were also choosing mongodb at that time too, so ymmv :)
Nowadays postgres has grown a lot of features but I believe it is still behind on built-in compression?
Added: this old blog post of mine is still getting traffic 10 years later; probably still valid https://williame.github.io/post/25080396258.html
Others mentioned a few reasons already, but compared to postgres (because typically that's the other option) I'll add index selection. Even with the available plugins and stats and everything, I don't want to spend time in an emergency situation trying to indirectly convince postgres that it should use a different index. "A query takes 20x the time and you can't force it back immediately" is a really bad failure mode.
Yes, its replication support out of the box is decades ahead of Postgres.
(MongoDB has an even better replication story than MySQL, but Mongo isn't a real database.)
I don't want to be subjective or start any war, but only to broaden my horizons and perspective. Having said that, I want to make one thing clear: I use multiple engines, trying to best match each one to the problem I'm solving.
What do you recommend these days?
It's the most developer-friendly thing out there. Particularly for a datastore CLI, which is inherently something you use rarely, MySQL's is just a lot nicer, more discoverable.
I think it has the least bad HA story among (free) traditional SQL-RDBMSes too (not that I understand why anyone would start a new project on a traditional SQL-RDBMS at all).
I am going to be controversial for a hot second, and say that in many ways MySQL is a more advanced and better implemented database at its core. Disclaimer: as a developer I love and prefer Postgres. But I've been on many projects where MySQL won for ops-related reasons.
Postgres has a MVCC implementation that is recognized as inferior[0] to what MySQL and Oracle do, and requires dealing with vacuuming and all of its related problems.
Postgres has a process-based connection model that is recognized as less optimal than the thread-based one that MySQL has. There are ongoing efforts[1] to move Postgres to a thread-based model but it's recognized as a large and uncertain undertaking.
Other commenters have also explained the still very noticeable difference in replication support, the lack of query planner hints, the less intuitive local tooling.
One thing to keep in mind is that both databases keep evolving, and old prejudices won't take us far. Postgres is improving its performance and replication support with each release. MySQL 8.0 added atomic DDL and a new query planner (MariaDB did their own query planner rework in 11.0, widening their differences). Both are improving their observability. So the race is far from over. But I definitely wouldn't count MySQL out.
[0] https://ottertune.com/blog/the-part-of-postgresql-we-hate-th... [1] https://www.postgresql.org/message-id/flat/31cc6df9-53fe-3cd...
Just want to add that comparing to PostgreSQL is a very modern view. There were other databases, not popular today, but quite popular back in the day. To name a few: DB2, InterBase, Firebird, Paradox, Access, SQL Server Compact. MySQL was a really shitty database in the early 2000s, yet still THE most popular.
- It works well enough.
- It scales up fine for 99+% of companies, and for the ones that need to scale beyond that there are battle tested solutions like Vitess.
- It is what people know already so they don't have to learn anything new.
Same reason why people still make new websites in PHP I guess. It's not fancy but it works fine and won't bring any unwelcome surprises.
Once upon a time it was easy-ish to scale out, and it was simple to use and fast. There was Percona as well. These days, who knows? Might still be true.