Serious question. I've had this question for, like, 20 years already.
Why would anyone start a new project with MySQL? Is it really superior at anything? I've been in the industry for 20+ years, and as far as I can remember MySQL has always been both the worst and the most popular RDBMS at any given moment.
We use it, we know it and can troubleshoot it if needed, it satisfies our needs and it works. What more do you need?
It also works for others, Github for example.
The only thing I am missing at the moment is a native UUID type, so I don't have to write functions that convert the 16-byte binary representation to text and back when examining the data manually on the server.
I strongly dislike how people point to GitHub as an example; it's the ultimate appeal to authority.
I know Facebook uses MySQL, but I also know that it is a bastardised custom version that has known constraints and is used in limited ways (no foreign keys, for example).
I spoke to the DBA who first deployed MySQL at GitHub, and the vibe I got from him immediately was that he had doubled down on his prejudice: which is fine, but it's not OK to ignore that working around the issues of any given technology can take a lot of effort.
For a great example of what I mean: most people wouldn’t choose PHP for a new project (despite it having improved majorly) - the appeal to authority there is to say “it works for Facebook” without mentioning “Hack” or the myriad of internal processes to avoid the warts of PHP.
That a large headcount company can use something does not make it immune from criticism.
Most people on HN, or most developers in the world?
PHP is still very popular, and plenty of people start new projects in it all the time.
Show me a technology without critics and I'll show you a technology zero people use.
I mean, outside of HN too: new projects are less and less commonly PHP-based.
https://madnight.github.io/githut/#/pull_requests/2023/3
I don't think that data is particularly meaningful, unless you're also going to claim that both JavaScript and Ruby are "less and less commonly" used, because they've both had much bigger drops, according to that data.
Pulls, Pushes, Issues and GitHub stars are terrible ways to gauge the popularity of a language.
¯\_(ツ)_/¯
There is no better measure I'm aware of, and I'll take any measure you supply.
I would definitely also argue that Ruby is in pretty significant decline. The majority of Ruby projects were sysadminy projects from the 2010 era, and most sysadminy types learned it as an alternative to Perl. Web developers who learned it were mostly using Rails, which has fallen somewhat out of favour. YMMV obviously, but I can understand its decline, as Python has concretely taken over that space and devops tools like Chef/Puppet are no longer en vogue now that Go and the Kubernetes/CNCF stuff have taken the lion's share.
Equally: JavaScript (Node, really) is less favoured by many JS devs than TypeScript. If you aggregate TS and JS you'll see that the ecosystem is growing, but many JS folks have switched to TS.
I'm taken aback by what you seem to suggest though: would you seriously claim that most new projects ARE using PHP?
I would happily argue that point with any data you supply, it's completely contrary to my experience and understanding of things and I have a pretty wide and disparate social circle in tech companies.
No. I didn't say that, and we need to clarify what you meant originally to make sense here.
When you say "most people wouldn't start a project in php", there are two ways to interpret "most" in that sentence: "the majority of" (ie 50%+) or "nearly all of" (ie a much higher percentage). Both are accepted definitions for "most".
I assumed you meant the latter: ie "nearly everyone would not start a project in php", which is what I disagree with, because the former makes little sense in context.
If you did in fact mean "a majority of people would not start a project in php", then of course I agree, because that sentence could mention any programming language in existence and still be true: no language is ever so dominant over all others that more than half of all new projects are written in it.
It's a little bit hair-splitty, but I see what you might be trying to get at.
What I tried to convey is that PHP is not enjoying the development heyday it once had, and the number of people choosing PHP for a new project today (even among people who learned development with PHP) is decreasing. It's not popular.
Let's try to leave it as: "I believe PHP to be in decline for new projects, as a share of total new projects divided by the total number of developers who are starting new projects".
Ruby has had a huge decline in the past ten years, IMO.
Also, note that TypeScript is tracked separately from Javascript, which is likely part of its decline. I wouldn't be surprised if JS backends are ultimately declining as well (perhaps Go and Python are taking its place?)
People keep saying that. Ruby has had a "huge decline" if you look at the percentage of commits on Github over the last decade [1], a decrease of more than a factor of 3. However, in that same decade Github has grown (much) more than a factor of 3. So the total number of Ruby commits on GitHub has grown substantially. That's not really what I would call a huge decline.
[1] https://madnight.github.io/githut/#/pull_requests/2023/3
Is this really true?
I used to be a full-time PHP developer, but I personally don't touch that language anymore. But it's still very popular around the world; I've seen multiple projects started this year use PHP, because that's the language the founders and most developers at the company are familiar with. It probably depends a lot on where in the world you're located.
Last Stack Overflow survey had ~20% of the people answering the survey saying that they still use PHP in some capacity.
The beauty of PHP is that it is stateless: at the end of the run, everything is freed. It is difficult to have memory leaks.
Personally, I like using Typescript/Javascript on both front end and backend, but I don’t look down at PHP backends at all. And it’s come a long way as a language.
I've been a fan of rolling your own stdlib wrappers, as the semantics there are old and weird, but VS Code tells you the signatures now, so who cares anymore.
MySQL 8 adds the `BIN_TO_UUID()` function (and the inverse, `UUID_TO_BIN()`), and supports the quasi-standard bit-swapping trick to handle time-based UUIDs in indexed columns.
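For example (illustrative table name; the second argument is the time-swap flag):

```sql
-- Illustrative table: store UUIDs as BINARY(16), converting at the edges.
CREATE TABLE events (
  id BINARY(16) PRIMARY KEY
);

-- The swap flag (1) rearranges the time-low/time-high parts of a v1 UUID
-- so that roughly time-ordered values cluster better in the index.
INSERT INTO events (id) VALUES (UUID_TO_BIN(UUID(), 1));

-- Convert back to the textual form when poking at the data by hand.
SELECT BIN_TO_UUID(id, 1) AS id FROM events;
```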
Having worked at GitHub, we used MySQL because we've always used MySQL. It works because the business depends on making it continue to work, and switching off at this point is a multi-year effort. Herculean efforts have gone into scaling MySQL, and that is not without its own issues (outages, maintenance cost, etc.).
Isolation level consistency is not a problem I heard anyone talk about, but that's probably because most devs interact with the database via Active Record which is not exactly known for its transactionality guarantees (and is, of course, a source of yet another set of problems).
As opposed to what? Postgres? Isn't InnoDB most performant for read-heavy apps?
MyISAM is actually considerably faster (than InnoDB) for read-heavy apps.
InnoDB is comparatively slow, but you get much better transactionality (i.e. something much closer to ACID compliance). Row-level locking is faster for inserts than table-level locking, but table-level locking is faster for reads than row-level locking.
Regardless: neither storage engine scales with core count as effectively as Postgres, due to some deadlocking on update that I have witnessed with MySQL (not that PostgreSQL is the only alternative, btw).
MyISAM is not a transactional storage engine to begin with, so saying that you get "much better transactionality with InnoDB" or "MyISAM is actually considerably faster" is either wrong or at best comparing apples to oranges.
Strange take, since a deadlock is an exceptional event that you never want to occur, so deadlocking, in algorithm design, wouldn't be considered a reason to say that an implementation does not "scale with the core count". Whether or not an algorithm scales with core count comes down to many other reasons, but not deadlocks.
Considering the "scale with the core count" design problem, Postgres's process-per-connection architecture makes it a much less viable option than, say, MySQL, so this is wrong as well.
Well, I've literally observed it (circa 2016, and the design has not significantly changed in this regard since then) and confirmed my finding with Percona.
Deadlock was the wrong terminology to use, apologies; I keep writing from my phone as I am travelling at the moment. I meant lock contention, specifically in memory. A deadlock would be a hard stop, but what I observed was a bottleneck on memory bandwidth past a certain number of cores (24) with update-heavy workloads.
So, I appreciate your points, but I don't think I am wrong in the core thesis of my statements. x :)
You would not be able to saturate the memory bus if you had lock contention; lock contention usually shows up as under-utilization of your CPU compute and memory bandwidth. So hitting the limit of the available memory bandwidth still sounds like a misrepresentation of the issue you stumbled upon.
There is no excuse to be using MyISAM instead of InnoDB in 2023. It was a scarcely forgivable mistake in 2013.
The read performance advantages of MyISAM are solved better by using SSDs instead of HDDs.
MyISAM will cost you dearly when performing actions like "trying to create a new replica of an existing database".
First, MySQL is the "devil you know". If you've spent a decade working exclusively with MySQL quirks, you're just gonna be more comfortable with it regardless of quality.
MySQL also tends to be faster for read-heavy workloads and simple queries.
Also, replication is easier to set up with MySQL in my (outdated) experience, even though it's gotten better with Postgres recently. I haven't really been able to compare them myself, since I'm just using Amazon RDS Postgres these days and haven't had the need to set up master-master replication (which is the pain point in postgres, and was pretty straightforward with mysql the last time I worked with it). Setting up read replicas with postgres is still ezpz.
Postgres-specific features tend to be much better than MySQL's; Postgresql's JSON(B) support blows MySQL out of the water. And as far as I can remember, MySQL still doesn't support partial/expression indexes, which is a deal breaker for me, especially in my JSON-heavy workloads where being able to index specific JSON paths is critical for performance. If you don't need that kind of stuff, you might be fine, but I would hate to hit a wall in my application where I want to reach for something and it's not there.
MySQL used to be the only game in town, so it was the "default" choice - but IMO postgres has surpassed it.
Do generated column indexes meet this need?
https://dev.mysql.com/doc/refman/8.0/en/create-table-seconda...

Looks like that would work as an expression index, though I can't tell at a glance whether it requires the column to also be stored, which would increase storage size (but probably isn't a huge problem if it is). But that likely won't work for the partial index case, where you only want to keep the non-null rows in the index to reduce its size (and speed up null/not null checks).
MySQL supports indexing expressions directly, which is effectively the same as indexing an invisible virtual column: https://dev.mysql.com/doc/refman/8.0/en/create-index.html#cr...
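For illustration (hypothetical table and column; this needs MySQL 8.0.13+, and the extracted value has to be CAST since a bare JSON value can't be a key part):

```sql
-- Hypothetical table: index one JSON path via a functional key part.
-- The extra parentheses mark the key part as an expression, and the
-- COLLATE matches what ->> returns so the optimizer can use the index.
CREATE INDEX events_kind_idx
    ON events ((CAST(payload ->> '$.kind' AS CHAR(32)) COLLATE utf8mb4_bin));
```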
MySQL supports "multi-valued indexes" over JSON data, which offer a non-obvious solution for partial indexes, since "index records are not added for empty arrays": https://dev.mysql.com/doc/refman/8.0/en/create-index.html#cr...
MariaDB doesn't support any of this directly yet though: https://www.skeema.io/blog/2023/05/10/mysql-vs-mariadb-schem...
I suppose this is a decent workaround for certain things (I've used it in SQLite before); the main kind of index I'm using with Postgres jsonb looks something like this:
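(Roughly; the table and column names here are made up for illustration.)

```sql
-- Made-up schema: an "events" table with a jsonb "payload" column.
-- Expression index over one json path, restricted to rows where that
-- path is present, so the index stays small and null checks stay cheap.
CREATE INDEX events_customer_idx
    ON events ((payload ->> 'customer_id'))
    WHERE payload ->> 'customer_id' IS NOT NULL;
```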
You can use generated columns to get around the first part of the index, but you can't have the WHERE part of the index in MySQL as far as I am aware (but it has been a very long time since I've worked with it, so I'm prepared to be wrong).

MySQL has some aggregation performance advantages over Postgres. Having done a recent migration of an application, two things that come to mind are:

- It's case insensitive by default, which can make filtering simpler, without having to deal with a duplicate column where all values are lower/upper cased.

- MySQL implements loose index scan and index skip scan, which improves performance of a number of join aggregation operations (rough sketch below): https://wiki.postgresql.org/wiki/Loose_indexscan
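The loose index scan case is the classic "distinct groups out of a composite index" shape, something like this (hypothetical table and index):

```sql
-- Hypothetical orders table with a composite index on (customer_id, created_at).
-- MySQL can answer this with a loose index scan: it hops from one distinct
-- customer_id to the next instead of reading every index entry.
SELECT customer_id, MIN(created_at)
FROM orders
GROUP BY customer_id;
```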
| it's case insensitive by default
This is obviously up for debate, but subjectively I find this to be an absolutely terrible design decision.
I agree it's debatable. And not intuitive at first.
With that said, in all my years and thousands of tables across multiple jobs, I have yet to see a single case where I had to change a table to be case sensitive. So I guess for me it is a sensible default.
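And on the rare occasion you do need it, you can opt in per column or per comparison rather than changing the default, something like this (hypothetical names):

```sql
-- Hypothetical column: make just this column case sensitive...
ALTER TABLE users
    MODIFY name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

-- ...or force case sensitivity for a single comparison.
SELECT * FROM users WHERE name COLLATE utf8mb4_bin = 'Alice';
```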
Your memory is failing you; maybe you don't remember that, not too long ago, PG did not have any replication built in.
Not too long ago MySQL didn't have transactions.
Edit: I would just love a comment from the person who thinks 'missing feature in the past' is wrong, unfair or irrelevant as a reply to a 'missing feature in the past' comment.
There's a non-trivial nine-year difference between the things you're describing: the InnoDB storage engine was released in 2001. Postgres gained built-in replication in 2010.
That said, personally I wouldn't describe either of these as "not too long ago". Technology rapidly changes and many things from either 2001 or 2010 are considered rather old.
Also from an operations point of view it's quite easy to manage. I'm not that experienced with Postgresql, but my understanding is that until recently you had to vacuum it every once in a while. Besides, it's also using some kind of threading model that most people handle by putting a proxy in front of Postgres to keep connections open.
Also, MySQL has had native replication for a very long time, including Galera, which does two-step commit in a multimaster cluster. Although Postgres is making some headway in this regard, it is my impression that this is only quite recent and not yet fully up to par with MySQL.
You still do. The autovacuum daemon was added in 2008-ish, so it isn't too bad. Just more complexity to manage.
It does a process per connection just like web servers did back in the day when C10k was a thing. A lot of the buffers are configured per connection so you can get bigger buffers if you keep the number of connections small.
I think what you reference is known as Transaction ID Wraparound, Postgres still needs to be vacuumed to avoid that problem: https://www.crunchydata.com/blog/managing-transaction-id-wra...
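For what it's worth, you can keep an eye on how close each database is to wraparound with a query along these lines:

```sql
-- Age (in transactions) of the oldest unfrozen XID per database;
-- autovacuum has to freeze old tuples long before this nears ~2 billion.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```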
This is a great question, and if the choice is between mysql and postgres, I would like to make the case that despite the current popular momentum behind postgres, mysql is a better default. Please note I'm not saying mysql is better, but in the absence of any other criteria, I would suggest starting with mysql.
I have a few reasons for this view, but they mostly revolve around operational complexity. From a developer's point of view postgres is fantastic. Far saner SQL dialect, tons of great features. When it comes to operations though, that's where mysql has the edge, and ops is half of using a database - it's an important facet for a business to consider.
As other commenters have mentioned, postgres requires careful tuning of the autovacuum process, otherwise it can't keep up as the workload grows.
Postgres has a far more advanced query planner, but it comes at the cost of potentially blowing up your app at 3am, and it gives you no tools to patch in a quick fix while you address the root cause. This frankly ignores the reality of operating a business. Sometimes you need a quick fix, even if that might lead to users developing bad habits. Yes, there is the pg_hint_plan extension, but that still only helps you after the problem has happened. You can't pin a query plan. To me the ideal situation would be for postgres to continue to use the old query plan, but emit some structured log to tell you it thinks it's now suboptimal. But I digress.
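For reference, pg_hint_plan hints are special block comments in front of the query, roughly like this (made-up table and index names):

```sql
-- With pg_hint_plan loaded, a hint comment can force a particular index
-- as a stopgap while the real planner issue is investigated.
/*+ IndexScan(orders orders_customer_id_idx) */
SELECT * FROM orders WHERE customer_id = 42;
```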
Thirdly, postgres has no way to keep a table clustered on an index. Clustering lets you trade a small cost on write for greater page locality when reading related rows. Postgres lets you do this only as a one-time operation that takes the table offline for the duration, which isn't sufficient if you need it continuously.
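(The one-time operation I mean is roughly this, with made-up names:)

```sql
-- Postgres: rewrite the table in index order, once, holding an exclusive
-- lock for the duration; it does not stay clustered as new rows arrive.
CLUSTER orders USING orders_customer_id_idx;
```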
Fourthly, mysql is still easier to upgrade. You will need to upgrade your database at some point. Mysql has great support for upgrade in place, as well as using replication to build a new db. Mysql replication has always been logical replication, which has tradeoffs of course, but what it buys you is the ability to replicate across different versions. Pg's logical replication still has a bunch of sharp edges.
Ok this rant is long enough already, but I do want to emphasise that this isn't hating on postgres. I know it's controversial to be recommending mysql over postgres, but I do think the ops concerns win out.
PS: the OrioleDB project is fantastic and I hope it one day becomes the default for Postgres.
Isn't working in the first place an ops concern?
Not up to date, but a decade ago MySQL supported pluggable storage engines and so had some good non-default choices. It was possible to fit really big databases onto small boxes using TokuDB, for example.
This doesn't explain why it was so popular for starting new small projects, but people were also choosing mongodb at that time too, so ymmv :)
Nowadays postgres has grown a lot of features but I believe it is still behind on built-in compression?
Added: this old blog post of mine is still getting traffic 10 years later; probably still valid https://williame.github.io/post/25080396258.html
Others mentioned a few reasons already, but compared to postgres (because typically that's the other option) I'll add index selection. Even with the available plugins and stats and everything, I don't want to spend time in an emergency situation trying to indirectly convince postgres that it should use a different index. "A query takes 20x the time and you can't force it back immediately" is a really bad failure mode.
Yes, its replication support out of the box is decades ahead of Postgres.
(MongoDB has an even better replication story than MySQL, but Mongo isn't a real database.)
I don't want to be subjective or start any war, but only to broaden my horizons and perspective. Having said that, I want to make one thing clear: I use multiple engines, trying to best match each one to the problem I'm solving.
What do you recommend these days?
It's the most developer-friendly thing out there. Particularly for a datastore CLI, which is inherently something you use rarely, MySQL's is just a lot nicer, more discoverable.
I think it has the least bad HA story among (free) traditional SQL-RDBMSes too (not that I understand why anyone would start a new project on a traditional SQL-RDBMS at all).
I am going to be controversial for a hot second, and say that in many ways MySQL is a more advanced and better implemented database at its core. Disclaimer: as a developer I love and prefer Postgres. But I've been on many projects where MySQL won for ops-related reasons.
Postgres has a MVCC implementation that is recognized as inferior[0] to what MySQL and Oracle do, and requires dealing with vacuuming and all of its related problems.
Postgres has a process-based connection model that is recognized as less optimal than the thread-based one that MySQL has. There are ongoing efforts[1] to move Postgres to a thread-based model but it's recognized as a large and uncertain undertaking.
Other commenters have also explained the still very noticeable difference in replication support, the lack of query planner hints, the less intuitive local tooling.
One thing to keep in mind is that both databases keep evolving, and old prejudices won't take us far. Postgres is improving its performance and replication support with each release. MySQL 8.0 added atomic DDL and a new query planner (MariaDB did their own query planner rework in 11.0, widening their differences). Both are improving their observability. So the race is far from over. But I definitely wouldn't count MySQL out.
[0] https://ottertune.com/blog/the-part-of-postgresql-we-hate-th... [1] https://www.postgresql.org/message-id/flat/31cc6df9-53fe-3cd...
Just want to add that comparing to PostgreSQL is a very modern view. There were other databases, not popular today, but quite popular back in the day. To name a few: DB2, InterBase, Firebird, Paradox, Access, SQL Server Compact. MySQL was a really shitty database in the early 2000s, yet still THE most popular.
- It works well enough.
- It scales up fine for 99+% of companies, and for the ones that need to scale beyond that there are battle tested solutions like Vitess.
- It is what people know already so they don't have to learn anything new.
Same reason why people still make new websites in PHP I guess. It's not fancy but it works fine and won't bring any unwelcome surprises.
Once upon a time it was easy-ish to scale out, and it was simple to use and fast. There was Percona as well. These days, who knows? Might still be true.