Why not just run Postgres with its files on a ramdisk?
Update: this can apparently run in a browser/Node environment so can be created/updated/destroyed by the tests. I guess I'm too much of a backend dev to understand the advantage over a more typical dev setup. Can someone elaborate on where/when/how this is better?
That's more or less what happens inside the emulator (the emulated disk is an in-memory 9P file system). It's in WebAssembly because that makes it more portable (same behaviour across platforms, architectures, and even in the browser or edge environments), and there are no external dependencies (not even Docker).
Because the emulator lets us boot an "already launched" state directly, it's also faster to boot up the emulated database than spinning up a real one (or Docker container), but this was more of a happy accident than a design goal.
Can you give a specific / concrete example of why I would want to use this instead of running a postgres server a different way (docker, binary, whatever) and having the tests connect to that server? I really don't understand when this would be useful.
These kinds of in-process/in-memory versions of dependencies take the startup time of those tests from minutes/seconds to milliseconds, allowing you to run your tests vastly faster. That's a game changer for developer productivity.
What's great is that your code still just depends on "postgres", so you can test against this in-memory version most of the time, then occasionally (such as in CI) run that same suite against a "real" postgres as a way to make SURE you're not missing anything.
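One common way to wire up that "same suite, two backends" switch is an environment variable that defaults to the fast local instance. A minimal sketch; the variable name `POSTGRES_URL` and the in-memory default URL are illustrative, not from any particular library:

```python
import os

# Hypothetical helper: the test suite asks for a connection string and gets
# either the fast in-memory instance (local dev default) or a real Postgres
# (CI sets POSTGRES_URL). Names here are assumptions for illustration.
def database_url(default="postgresql://localhost:5432/test_inmemory"):
    # In CI, POSTGRES_URL points at a real server; locally it is unset
    # and the suite falls back to the in-memory default.
    return os.environ.get("POSTGRES_URL", default)
```

The tests themselves never know which backend they got, which is what lets the exact same suite double as a verification run against real Postgres.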
You can bring "real postgres" test startup times back to milliseconds with CREATE DATABASE ... TEMPLATE .... Every test gets a fresh database (without having to run migration scripts) and the step takes millis.
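The template-database pattern: run your migrations once into a template database, then stamp out a cheap copy per test. A sketch of building that statement, with hypothetical names (`app_template`) and deliberately simplified identifier quoting (real code should use the driver's identifier-quoting helpers):

```python
# Sketch: migrations run once into "app_template"; each test then clones it.
# Postgres copies the template at the storage level, so no per-test
# migration scripts are needed.
def fresh_db_sql(test_db: str, template: str = "app_template") -> str:
    # Simplified quoting for illustration only -- do not use with
    # untrusted input; prefer your driver's quote_ident equivalent.
    return f'CREATE DATABASE "{test_db}" TEMPLATE "{template}"'
```

Each test connects as usual, just to its own freshly cloned database, and the clone can simply be dropped afterwards.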
Yeah, I’d be surprised if this method weren’t just as fast. And if it’s not, would the difference just be to run the server on a ramdisk and maybe turn off some of the durability settings to speed things up (https://www.postgresql.org/docs/current/non-durability.html)?
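For reference, the non-durability knobs from that doc page are plain postgresql.conf settings. A fragment for throwaway test instances only, since these trade crash safety for speed:

```
# postgresql.conf -- test instances only; data is not crash-safe
fsync = off                  # skip forcing WAL writes to disk
synchronous_commit = off     # don't wait for WAL flush before COMMIT returns
full_page_writes = off       # skip whole-page images in WAL
```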
OK, that goal makes sense, thanks for explaining. For what it's worth I'm pretty sure you can do this with postgres, tmpfs, and template databases — see my project, pgtestdb [0]. I haven't done a formal perf measurement, but each test would get a fresh and migrated database in about 20ms on my local machine. The setup described runs postgres in a container for convenience, but you could probably also just run a postgres binary and store the data on a memory-backed filesystem.
[0] https://github.com/peterldowns/pgtestdb
I have a customer that is not allowed to run Postgres natively or Docker at all (because of security). They could use this, I guess.
Wow, ok thanks that makes sense — I never thought of an environment like that.
Me either. It's a couple hundred lines of code to make a very comprehensive fixture using the real postgres, and it supports all extensions, including exotic ones you make yourself.
You could also use a memory state dump from a microVM manager like Firecracker and have the state replicated.
I don't get it either. I feel like this is so much unnecessary code, an emulator, a network stack...
Why not use something like https://testcontainers.com/? Is a container engine as an external dependency that bad?
It is annoying if you want to run your tests inside a container for CI: now you are running a container in a container, with all the issues that come with it.
which issues?
Depending on the setup it can be a pain to get nested containers working. There is, e.g., Docker in Docker, but this often requires a privileged host container, which is often not provided in CI/CD pipelines.
Which issues/pains with getting nested containers?
I am aware of only a few settings that determine whether a container can be nested or not, whether it is a VM, an lxc/lxd-type container, etc.
Why would the postgres container need to be nested inside another container? Why not just have the CI environment also run a Postgres container, along side your tests, and give your tests a `POSTGRES_URL` environment variable? Or why even bother running Postgres in a container, why not just run the Postgres binary on the host that's running your tests in the container?
The fact that this can run in-process is a big deal, as it means you don't have to worry about cleanup.
As soon as you have external processes that your tests depend on, your tests need some sort of wrapper or orchestrator to set everything up before starting tests, and ideally tear it down after.
In 90% of cases I see, that orchestration is done in an extremely non-portable way (like leveraging tools built in to your CI system) which can make reproducing test failures a huge pain in the ass.
Your test framework probably provides hooks for one-time setup and cleanup. That is where you start and tear down your external dependencies.
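The setup/cleanup hook pattern can be sketched generically as a context manager that launches the external dependency and always tears it down, even when tests fail. The command is whatever starts your dependency (a postgres binary, a container runtime, ...); everything here is illustrative:

```python
import subprocess
from contextlib import contextmanager

# Generic sketch of one-time setup/teardown: start an external process
# before the tests, guarantee it is stopped afterwards.
@contextmanager
def external_dependency(cmd):
    proc = subprocess.Popen(cmd)
    try:
        yield proc          # tests run while the process is alive
    finally:
        proc.terminate()    # cleanup runs even if the tests raise
        proc.wait()
```

In pytest this would live in a session-scoped fixture; the shape (start, yield, always stop) is the same.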
It's the same amount of code and on Mac you still run a full VM to load containers (with a network stack), so I'm not really sure what your point is. If anything it's less code because the notion of the container is entirely abstracted away, and the whole thing is entirely a wasm dependency that you load as a normal import.
The whole purpose of end-to-end testing is that you're testing the system in a real state. It's an emulation of your live environment. Because of that you can do interesting things like find out what happens if you pull the plug, or run out of disk, or ...
The moment you shove a mock in there, you're unit testing. Effective, but not the same. One of the critical points of E2E is that without mocks you know your tests are accurate. Because this isn't Postgres, I'm testing it every time and not that system.
If you're building against PG for an embedded, lightweight, or underpowered system, then this would make sense for verification testing before real E2E testing that would be much slower (a use case I have).
Other than that, it's just a cool project, and if you ever need a PG shim it's there.
If this is actually just Postgres running in an x86 emulator (*edit: originally this said "compiled to wasm"), then how could this be faster than Postgres in any given environment? I don't understand — if it were faster, wouldn't you just want to deploy this in prod in your weird environment rather than Postgres? Why limit this to mocking?
Presumably, it's faster to boot and for tests because it doesn't need to access an actual file system; everything is in memory. That doesn't mean it would be any faster in production, and in fact, it wouldn't be useful in that environment even if it was.
Understood, thank you.
I think you're being a little absolutist about this. Swapping out a possibly equivalent database engine does not turn anything into a unit test, which is defined by testing individual units of code in relative isolation. You can argue that it's not true end to end testing. But almost every E2E test I've seen involves some compromises compared with the true production environment to save money, time, or effort.
Until you trust that every part of the mock behaves the same as every part of the real database, sure… but most often the db is your boundary, with nothing further downstream. At that point it really is just a faster disposable database, and it totally is valid for acceptance tests of the e2e system.
Also nothing stops you from using a mock for some tests and a real database for others. It just comes down to trust.
Nah: by having in-memory versions of your dependencies that fulfill the same interfaces as those used in your E2E tests (or the majority of them), you unlock running your entire E2E test suite in milliseconds-to-seconds instead of minutes-to-seconds. And because they're E2E tests that work with any implementation, you can still run the exact same suite against your "real" E2E dependencies in a CI step to be sure both implementations behave the same.
I've done this across multiple jobs, and it's amazing to be able to run your "mostly-E2E" tests in 1-2 seconds while developing and the same suite in the full E2E env in CI. It makes developing with confidence so fast and mostly stress free (diverging behavior is admittedly annoying, but usually rare).
I highly recommend using these if feasible.
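The "same interface, two backends" idea above can be sketched in a few lines: the suite is written against a tiny storage interface, so it runs unchanged against an in-memory fake (fast local dev) or a real database adapter (CI). All names here are illustrative assumptions:

```python
from typing import Optional, Protocol

# Hypothetical interface the tests are written against.
class UserStore(Protocol):
    def save(self, user_id: str, name: str) -> None: ...
    def load(self, user_id: str) -> Optional[str]: ...

# Fast in-memory backend for local runs; a real Postgres-backed class
# implementing the same two methods would be swapped in for CI.
class InMemoryUserStore:
    def __init__(self) -> None:
        self._rows = {}
    def save(self, user_id: str, name: str) -> None:
        self._rows[user_id] = name
    def load(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)

def check_roundtrip(store: UserStore) -> None:
    # The exact same assertions run against either backend.
    store.save("u1", "Ada")
    assert store.load("u1") == "Ada"
    assert store.load("missing") is None
```

Running `check_roundtrip` against both implementations in CI is what catches the (usually rare) diverging behavior.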
It could be useful for test isolation: moving the Redis backend to FakeRedis in tests fixed quite a bit of noise in our test suite. With Postgres we use savepoints, which is not very fast, even on a ramdisk.
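The appeal of a fake backend for isolation is that state is pure in-process and disposable per test. A toy illustration of the idea (the real fakeredis library covers far more of the Redis API; this class is just a sketch and not its actual implementation):

```python
# Toy stand-in showing the shape of a fake Redis: same get/set surface,
# in-process state, trivially rebuilt for every test. Not the real
# fakeredis package -- an illustrative sketch only.
class TinyFakeRedis:
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        # Redis stores byte strings, so encode str values like a client would.
        self._data[key] = value if isinstance(value, bytes) else str(value).encode()
        return True
    def get(self, key):
        return self._data.get(key)
```

Each test constructs a fresh instance, so no cross-test state survives, which is exactly the noise this kind of swap removes.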