
Common DB schema change mistakes in Postgres

zer00eyz
26 replies
13h35m

I like Postgres, a lot.

Most of the things in this article are avoidable, and good to keep an eye out for.

But let's be clear: we're not talking about the worst part of Postgres, which is roles. There is a ton of power there; it would be amazing to use it. Making it work feels like black magic. Every bit of the interface around it seems like esoteric incantations that may or may not do what you expect. It's a terrible way to manage something so important.

The manual for this section is thin. It gives you an idea of how things should work, maybe, in a narrow use case. The problem is that when they don't, you're going to spend a lot of time on trial and error figuring out what you did wrong, and likely not have a clue how to do it right. And may God have mercy on your soul if you want to migrate a DB with complex user permissions.

I need to sit down with it for a month and write a "cookbook". If one person uses it and goes to bed that night without crying themselves to sleep, it will have been worth it.

dewey
12 replies
13h15m

I’d read that. Role management for me often involves a lot of guessing and too often roles end up with too many permissions because of that.

yen223
8 replies
13h1m

Hah, role management for us is "create a role for migrations, and a role to do db things, and enforce auth entirely in the web app"

I suspect we aren't alone

jzelinskie
6 replies
12h39m

This is a fairly sane place to be in terms of bang for your buck. It's easy to find yourself in a place where authorization data and logic span multiple services, and at that point having everything deeply siloed into Postgres might be a doozy. That being said, there are plenty of times when that'll never be the case, and you should lean on the abstractions that work best for you.

ludston
5 replies
10h28m

Roles are for services, not for users. If you have a read-only web API, then it makes sense to use a read-only role regardless of which user is using it.
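Something like this is usually enough (a rough sketch; the role, database, and schema names are made up):

    CREATE ROLE api_readonly LOGIN PASSWORD 'changeme';
    GRANT CONNECT ON DATABASE appdb TO api_readonly;
    GRANT USAGE ON SCHEMA public TO api_readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO api_readonly;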

WinDoctor
4 replies
10h22m

Everything in PostgreSQL is a role.

It's just named such that when a role allows `LOGIN`, it's considered a user.

skissane
1 replies
10h6m

This is part of what many people find so confusing. In most systems a “role” is a group (or something closely resembling a group), not a user. The weird terminology confuses beginners.

WinDoctor
0 replies
9h51m

It's a bit confusing and legacy.

All roles function like you would expect groups to function.

A role that is not allowed to login is a `group`.

While the CREATE USER and CREATE GROUP commands still exist, they are simply aliases for CREATE ROLE.
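A rough sketch of how that plays out (made-up names):

    CREATE ROLE app_users;                        -- no LOGIN: behaves like a "group"
    CREATE ROLE alice LOGIN PASSWORD 'secret';    -- LOGIN: behaves like a "user"
    GRANT app_users TO alice;                     -- "group membership" is just a role grant

    CREATE USER bob;    -- same as CREATE ROLE bob LOGIN;
    CREATE GROUP ops;   -- same as CREATE ROLE ops;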

orthoxerox
0 replies
9h59m

Someone was feeling very clever when they came up with this idea.

cqqxo4zV46cp
0 replies
9h57m

To me, the comment you are replying to is saying that you should DIFFERENTIATE roles by service, not by ‘end user’.

lelanthran
0 replies
2h2m

> Hah, role management for us is "create a role for migrations, and a role to do db things, and enforce auth entirely in the web app"
>
> I suspect we aren't alone

Honestly I'd be happy to spend the time learning the ins and outs of PostgreSQL IAM stuff, but there are two very good reasons why I won't use it:

1. Still need the "role"/user across other services, so I don't save anything by doing ACL inside the DB.

2. I've no idea how to temporarily drop privileges for a single transaction. Connecting as the correct user on each incoming HTTP request is way too slow.

badcppdev
1 replies
8h46m

Oh, it's really worrying that you post a link to an SO page as the recommended 'cookbook' for something that's referred to as a powerful feature. And both of the answers only have a single upvote each. Seems like there's a serious gap here.

WinDoctor
0 replies
8h6m

You can help out by posting an answer or upvoting a good one.

WinDoctor
6 replies
10h33m

I agree with the sentiment that IAM in PostgreSQL is complex.

What makes it complex is that there are three layers of objects (database, schema, table), plus implicit grants given to DB object owners.

To be able to select from a table you need:

* CONNECT on the Database

* USAGE on the Schema (Given implicitly to schema owner)

* SELECT on the Table (Given implicitly to table owner)
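Roughly, assuming a database appdb, a schema app, a table orders, and a role reporter, that means something like:

    GRANT CONNECT ON DATABASE appdb TO reporter;
    GRANT USAGE ON SCHEMA app TO reporter;
    GRANT SELECT ON app.orders TO reporter;
    -- or, for every table that currently exists in the schema:
    -- GRANT SELECT ON ALL TABLES IN SCHEMA app TO reporter;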

To see these privileges, we need to understand ACL entries of this format:

`grantee=privilege-abbreviation[*].../grantor`

* Use \l+ to see privileges of Database

* Use \dn+ to see privileges of Schemas

* Use \dp+ to see privileges of Tables

The privilege abbreviations are documented [here](https://www.postgresql.org/docs/current/ddl-priv.html).

e.g. in the following example, the role `user` has been granted all permissions by the `postgres` role:

`user=arwdDxt/postgres`

If the ACL column is empty for a given object, it means the object has default owner privileges (all privileges). If the grantee part of an entry is empty, as below, the grant is to PUBLIC (every role that exists):

`=r/postgres`

Also, it's confusing when the public schema is used: you have CREATE permission on the schema, so when the tables are created by the same user you select data with, you have owner permissions out of the box.

vasco
4 replies
8h40m

I've seen funny things happen when developers think their GRANT does what ALTER DEFAULT PRIVILEGES does, and a day later they break production because their user doesn't have access to the new tables.
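For example (made-up names): the first statement only covers tables that exist right now, while the second also covers tables that the migration role creates later:

    GRANT SELECT ON ALL TABLES IN SCHEMA app TO app_reader;

    ALTER DEFAULT PRIVILEGES FOR ROLE migrator IN SCHEMA app
        GRANT SELECT ON TABLES TO app_reader;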

WinDoctor
3 replies
8h29m

What do you mean exactly?

If roles have INHERIT, then doing the following works, no?

* Role A creates a table

* GRANT A TO B;

* Role B can read from the table just like A can.

Also if Role A creates new table, Role B can read that too no?

vasco
2 replies
8h22m

Remove any "if" from your comment and it seems like you understood what would fail! There's more ways to make misunderstanding grant vs default privileges will create issues as well.

WinDoctor
1 replies
8h18m

Then I would argue for trying to centrally ensure that all roles have INHERIT in your DBs. :thinking:

Doing ALTER DEFAULT PRIVILEGES could be another future footgun of its own.

csnweb
0 replies
6h45m

I think you probably don’t want the application user to inherit from the superuser you may use for migrations since some migrations require a superuser.

echion
0 replies
5h29m

> To be able to select from a table you need:
>
> * CONNECT
>
> * USAGE
>
> * SELECT

Isn't LOGIN (https://www.postgresql.org/docs/16/role-attributes.html) also needed?

  Only roles that have the LOGIN attribute can be used as the initial role name for a database connection

7bit
1 replies
11h25m

I recently had to rename an app I was developing. That included the Linux user account under which the service ran, and therefore the Postgres user.

I don't remember all the details, but I had to also rename the Postgres user and role, which seemed like a simple thing to do. But for some reason renaming the user didn't carry the permissions on the database along. I was left with a very confusing state of working table access and denied record access. I decided to back up the data, drop the database, and do an import that didn't include any permissions.

That simple thing turned into a complete shit show, and I blame Postgres for making something so simple so complex.

cqqxo4zV46cp
0 replies
9h55m

I never use system accounts for Postgres auth. It feels like coupling that won’t do anything good for most people most of the time, and will only bite you in the ass when you don’t expect it to.

hnben
0 replies
9h40m

> Making it work feels like black magic.

Can confirm. Last year I implemented a simple PostgREST server with row-level security. (The PostgREST docs are really good, with cookbook and all.)

The path there was somewhat difficult, but once it worked, it was truly magical. And the mechanisms involved were actually quite simple.
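The core of it looks roughly like this (a sketch with made-up names, not our exact setup):

    CREATE TABLE documents (
        id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        owner text NOT NULL,
        body  text
    );

    ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

    -- each role only sees its own rows
    CREATE POLICY documents_owner ON documents
        USING (owner = current_user);

    GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO web_user;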

golergka
0 replies
40m

> I need to sit down with it for a month and write a "cookbook". If one person uses it and goes to bed that night without crying themselves to sleep, it will have been worth it.

Please do. I'd be happy to pay ~$20 for it.

futureduck
0 replies
12h2m

I have seen what you have seen (I did several "serious" things using PostgREST).

Edit: I mean yes, we need that cookbook real bad.

egeozcan
0 replies
11h43m

Even the docs of PostgREST, which relies on roles for auth, seem not to be very detailed: https://postgrest.org/en/v12/explanations/db_authz.html

Interesting. If you are serious about writing a cookbook for postgres roles, and open something like a kickstarter, I'll be one of the first to pledge!

hot_gril
6 replies
1h36m

The most surprising thing about Postgres indexes as a beginner was how UNIQUE indexes can affect the outcomes of concurrent queries because of the additional locking they add. Something like

  INSERT INTO foo (bar) (SELECT max(bar) + 1 FROM foo);
can insert duplicate `bar` values when run concurrently at the default isolation level, since one xact might not see the new max value created by the other. You might think adding a UNIQUE index would cause the "losing" xact to get constraint errors, but instead both xacts succeed and no longer have a race condition.

dns_snek
3 replies
1h30m

Are you claiming that both inserts succeed even with a UNIQUE index in place, and they end up inserting duplicate values? That must be a bug if you're right.

saltcured
2 replies
1h22m

I think they're saying that the unique index changes the locking strategy for the queries so they are effectively serialized and will not both read the same max value for existing rows.

dns_snek
1 replies
1h5m

That would make a lot of sense, thanks. I hope that's what the author meant.

hot_gril
0 replies
1h2m

Yeah, that's what I meant.

remram
1 replies
45m

With what isolation level?

hot_gril
0 replies
12m

The default, read committed.

Ozzie_osman
6 replies
12h22m

If you are running schema migrations in production, use "lock_timeout". Even seemingly-benign modifications like dropping a table (with foreign keys) or dropping a foreign key, which are generally quick and may run nearly-instantaneously when you're testing them, may end up hitting a lock conflict on a heavily-used production database (with existing transactions, with an autovacuum, etc). That ALTER is then waiting on the first transaction's lock, but it has acquired an ACCESS EXCLUSIVE lock, meaning no queries can run against the locked table.

If you're doing any Postgres at scale, it's just a matter of time until you hit one of these conflicts. "lock_timeout" will cause the migration to fail after the timeout, rather than blocking all other queries indefinitely.
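A sketch (the table, constraint, and timeout values are made up - pick what fits your traffic):

    SET lock_timeout = '2s';   -- fail fast instead of queueing behind long transactions
    ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey;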

incorrecthorse
1 replies
10h6m

Excellent advice.

On the technical side, I believe the waiting is due to the lock queue rather than having acquired an ACCESS EXCLUSIVE lock. The ALTER is specifically _waiting_ for any lock lower than ACCESS EXCLUSIVE to be released.

kroolik
0 replies
8h20m

It also makes all new readers/writers wait for that lock, essentially leading to downtime until the lock is eventually acquired and released. This is the classic readers/writers library example, and you want to avoid starving the writers.

That's why the size of the data is the least of your issues - it's the access patterns/hotness that are the problem.

glenjamin
1 replies
11h11m

Related to this, there is quite a lot of variation across Postgres versions in whether a particular DDL statement will take an exclusive lock or not.

Is there a good way to analyse a query and be informed of what sort of lock it will take?

I’ve always resorted to re-reading docs when I’m unsure.

kroolik
0 replies
11h0m

For me the best way is testing it locally, side by side, with two transactions. You don't need any data for this.
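e.g. a rough way to see what lock a statement takes (the table name is made up):

    -- session 1: run the DDL inside an open transaction, don't commit
    BEGIN;
    ALTER TABLE orders ADD COLUMN note text;

    -- session 2: see which locks the uncommitted DDL holds
    SELECT locktype, relation::regclass, mode, granted
    FROM pg_locks
    WHERE pid <> pg_backend_pid();

    -- session 1: clean up
    ROLLBACK;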

After some experience, you start to see the reasons for locks and how they will impact you.

Or read the docs.

vasco
0 replies
8h37m

Sometimes if you do that, you'll never run your ALTER - it just needs enough traffic on the table. In those cases I think the best way is to kill the other ongoing queries that block the ALTER, if your app can recover from that.
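Something like this, assuming you've already found the waiting ALTER's pid (12345 here is hypothetical):

    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE pid = ANY (pg_blocking_pids(12345));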

kroolik
0 replies
11h2m

'statement_timeout' includes the lock wait and lets you better estimate the impact on hot tables. When you set the timeout to 5s, you know the downtime will be at most 5s total, and transactions will continue afterwards. With lock_timeout, you don't control how long the part after acquiring the lock will take - it may be fast or slow, due to concurrent traffic, for example.

saddist0
3 replies
10h59m

Another common mistake I have seen: duplicating tables without the indexes.

This is not how it works, period:

    CREATE TABLE <abcv2> AS SELECT * FROM <abc> WHERE <>
People do it all the time, either to create a backup table, or when deleting data in bulk, etc.

watt
1 replies
9h23m

Could you also mention what is the appropriate way to go about it?

keeperofdakeys
0 replies
7h5m

Either put machinery in your schema migration tool to create indexes as a separate step, so they are easy to re-apply - this makes keeping indexes in sync between production and staging a lot easier. Or you can use "CREATE TABLE table_b (LIKE table_a INCLUDING INDEXES);".
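e.g. (a sketch with made-up names; INCLUDING ALL also copies defaults and constraints):

    CREATE TABLE abc_v2 (LIKE abc INCLUDING ALL);
    INSERT INTO abc_v2 SELECT * FROM abc;   -- add a WHERE clause if you only want a subset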

da_chicken
0 replies
40m

If I'm creating a backup table -- as in, I'm going to do some complex nebulous operation that might screw up in ways I don't immediately foresee -- then I don't care at all about indexes or constraints. I want an immediately present copy of the data that I plan to never need but is there so I don't have to restore from DB backup + WAL. Creating the indexes is a waste of both server time and disk space.

If something goes pear-shaped or I really need them, I can create those indexes later.

acimim_ha
3 replies
8h59m

> Case 2. Misuse of IF [NOT] EXISTS

The article doesn't provide any good example of misuse. And that's exactly how you use it. It's clean and simple, no hidden pitfalls. Schema migration tools are overhead when you have a few tables.

kemitche
1 replies
1h43m

The pitfall is pretty simply stated: "It masks the problem with logic, adding some risks of anomalies." When one puts a bandaid over bad data, the problem isn't solved, it's masked. Depending on the issue, it could bite you later, in unexpected ways, at the worst possible times.

In this particular case, the "bad data" is a table/column/view that exists (or doesn't) when it should(n't). Why does the table exist when it shouldn't yet exist? Did it fail to get dropped? Is the existing one the right schema? Is the same migration erroneously running twice?

After each migration, your schema should be in a precise state. If the phrase "IF [NOT] EXISTS" is in your migration, then a previous migration didn't leave the schema in a precise state. Being uncertain about the state of your schema isn't great.

hot_gril
0 replies
1h32m

In many cases you can still be ok just using IF NOT EXISTS everywhere, or occasionally IF EXISTS when trying to delete a no-longer-used col or index. Whether you start with a blank DB or any intermediate state, you'd end up with the same thing. But it gets messy after a long enough time.

javajosh
0 replies
6h26m

>The article doesn't provide any good example of misuse.

It did describe the misuse pretty well, though. The idea is that out-of-band schema modifications are a process/workflow issue that needs to be directly addressed. As stated by the OP, this is an easy way for anomalies to creep in - what if the already-present table has different columns than the one in the migration? IF NOT EXISTS lets the migration succeed but leaves the schema in a bad state. This is an example of where you would prefer a migration to "fail fast".
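A tiny illustration of that anomaly (made-up table, assuming a leftover from an abandoned run):

    CREATE TABLE users (id int);      -- leftover with the wrong shape

    CREATE TABLE IF NOT EXISTS users (
        id    bigint PRIMARY KEY,
        email text NOT NULL
    );                                -- only emits a NOTICE; the schema is still wrong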

zigzag312
1 replies
7h6m

A little nitpick about

> To those who still tend to use int4 in surrogate PKs, I have a question. Consider a table with 1 billion rows, with two columns – an integer and a timestamp. Will you see the difference in size between the two versions of the table?

Wouldn't the important thing be the index size, not the table size? Each table row already carries a 23-byte header plus alignment padding, so a 4-byte difference doesn't do much for table size. But fitting more of an index into memory could have some benefits. An index entry has an 8-byte header.
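Back-of-envelope for the article's example, assuming the usual 24-byte aligned tuple header and 8-byte alignment for the timestamp column:

    -- int4 PK:  24 (header) + 4 (int4) + 4 (padding) + 8 (timestamptz) = 40 bytes/row
    -- int8 PK:  24 (header) + 8 (int8)               + 8 (timestamptz) = 40 bytes/row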

Secondly, the 1 billion rows used in the example are far too close to the int4 maximum for comfort.

Great article nonetheless.

srcreigh
0 replies
4h25m

Yeah, there’s index size, and also disk size. Postgres packs table rows tightly on disk, but not in RAM.

So I guess an 8kB page on disk could be more than 8kB in RAM?

It seems to only affect working memory for table row data. Still significant (especially in Postgres, where rows are randomly ordered, which is horrible for locality in range queries), but not a home-run insight IMO.

fabianlindfors
1 replies
11h44m

These pitfalls are one reason why I built Reshape [0], which aims to automate zero-downtime schema migrations.

I can’t say it avoids all of them but we are working on a new product that would. If you are interested in this space (and Postgres specifically), I’d love to hear from you: fabian@reshapedb.com

[0] https://github.com/fabianlindfors/reshape

smw
0 replies
7h9m

Any chance that works for crdb?

scraplab
0 replies
12h38m

I refer to Fly.io’s guide to Safe Migrations in Ecto (Elixir’s DB adapter) multiple times a week. It’s a very useful quick reference to check whether you can get away with a basic migration or if something more involved is required.

https://fly.io/phoenix-files/safe-ecto-migrations/

joeatwork
0 replies
7h23m

I have managed to make a lot of these mistakes, and they've led to scary outages and bummers. This is a pretty great article.

h1fra
0 replies
3h52m

I love Postgres, but I really hate that there is no built-in way to batch update or delete. It's one of the most annoying things, and I end up (re)writing a batcher almost every month whenever I hit this wall.
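e.g. the kind of thing I keep rewriting (made-up table, looped from the app until 0 rows are affected):

    DELETE FROM events
    WHERE id IN (
        SELECT id FROM events
        WHERE created_at < now() - interval '90 days'
        ORDER BY id
        LIMIT 10000
    );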

domo__knows
0 replies
1h11m

I'm a developer who has largely been shielded from database-related issues. Within Django, I know how to create migrations, model tables, and query using the ORM, but a lot of what happens underneath is black magic to me. Now that I'm starting a company, I have a lot of anxiety about running into these issues and having to resolve them on my own. How should I go about learning what to do while still in a dev environment?