Why not just run Postgres with its files on a ramdisk?
Update: this can apparently run in a browser/Node environment so can be created/updated/destroyed by the tests. I guess I'm too much of a backend dev to understand the advantage over a more typical dev setup. Can someone elaborate on where/when/how this is better?
That's more or less what happens inside the emulator (the emulated disk is an in-memory 9P file system). It's in WebAssembly because that makes it more portable (same behaviour across platforms, architectures, and even in the browser or edge environments), and there are no external dependencies (not even Docker).
Because the emulator lets us boot an "already launched" state directly, it's also faster to boot up the emulated database than spinning up a real one (or Docker container), but this was more of a happy accident than a design goal.
Can you give a specific / concrete example of why I would want to use this instead of running a postgres server a different way (docker, binary, whatever) and having the tests connect to that server? I really don't understand when this would be useful.
These kinds of in-process/in-memory versions of dependencies take the startup time of those tests from minutes/seconds to milliseconds, allowing you to run your tests vastly faster. That's a game changer for developer productivity.
What's great is that your code still just depends on "postgres", so you can test against this in-memory version most of the time, then occasionally (such as in CI) run that same suite against a "real" postgres as a way to make SURE you're not missing anything.
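One common way to wire up that "same suite, two backends" switch is an environment variable that defaults to the fast local instance. A minimal sketch; the variable name `POSTGRES_URL` and the in-memory default URL are illustrative, not from any particular library:

```python
import os

# Hypothetical helper: the test suite asks for a connection string and gets
# either the fast in-memory instance (local dev default) or a real Postgres
# (CI sets POSTGRES_URL). Names here are assumptions for illustration.
def database_url(default="postgresql://localhost:5432/test_inmemory"):
    # In CI, POSTGRES_URL points at a real server; locally it is unset
    # and the suite falls back to the in-memory default.
    return os.environ.get("POSTGRES_URL", default)
```

The tests themselves never know which backend they got, which is what lets the exact same suite double as a verification run against real Postgres.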
You can bring "real postgres" test startup times back to milliseconds with CREATE DATABASE ... TEMPLATE .... Every test gets a fresh database (without having to run migration scripts) and the step takes millis.
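The template-database pattern: run your migrations once into a template database, then stamp out a cheap copy per test. A sketch of building that statement, with hypothetical names (`app_template`) and deliberately simplified identifier quoting (real code should use the driver's identifier-quoting helpers):

```python
# Sketch: migrations run once into "app_template"; each test then clones it.
# Postgres copies the template at the storage level, so no per-test
# migration scripts are needed.
def fresh_db_sql(test_db: str, template: str = "app_template") -> str:
    # Simplified quoting for illustration only -- do not use with
    # untrusted input; prefer your driver's quote_ident equivalent.
    return f'CREATE DATABASE "{test_db}" TEMPLATE "{template}"'
```

Each test connects as usual, just to its own freshly cloned database, and the clone can simply be dropped afterwards.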
Yeah, I’d be surprised if this method weren’t just as fast. And if it’s not, would the difference just be to run the server on a ramdisk and maybe turn off some of the durability settings to speed things up (https://www.postgresql.org/docs/current/non-durability.html)?
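For reference, the non-durability knobs from that doc page are plain postgresql.conf settings. A fragment for throwaway test instances only, since these trade crash safety for speed:

```
# postgresql.conf -- test instances only; data is not crash-safe
fsync = off                  # skip forcing WAL writes to disk
synchronous_commit = off     # don't wait for WAL flush before COMMIT returns
full_page_writes = off       # skip whole-page images in WAL
```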
OK, that goal makes sense, thanks for explaining. For what it's worth I'm pretty sure you can do this with postgres, tmpfs, and template databases — see my project, pgtestdb [0]. I haven't done a formal perf measurement, but each test would get a fresh and migrated database in about 20ms on my local machine. The setup described runs postgres in a container for convenience, but you could probably also just run a postgres binary and store the data on a memory-backed filesystem.
[0] https://github.com/peterldowns/pgtestdb
I have a customer that is not allowed to run Postgres natively or Docker at all (because of security). They could use this, I guess.
Wow, ok thanks that makes sense — I never thought of an environment like that.
Me either. It's a couple hundred lines of code to make a very comprehensive fixture using the real postgres, and it supports all extensions, including exotic ones you make yourself.
You could also use a memory state dump from a microVM manager like Firecracker and have the state replicated.
I don't get it either. I feel like this is so much unnecessary code, an emulator, a network stack...
Why not use something like https://testcontainers.com/? Is a container engine as an external dependency that bad?
It is annoying if you want to run your tests inside a container for CI: now you are running a container in a container, with all the issues that come with it.
which issues?
Depending on the setup it can be a pain to get nested containers working. There is, e.g., Docker in Docker, but this often requires a privileged host container, which is often not provided in CI/CD pipelines.
Which issues/pains with getting nested containers?
I am aware of only a few settings that determine whether a container can be nested or not, whether it is a VM, an lxc/lxd-type container, etc.
Why would the postgres container need to be nested inside another container? Why not just have the CI environment also run a Postgres container, along side your tests, and give your tests a `POSTGRES_URL` environment variable? Or why even bother running Postgres in a container, why not just run the Postgres binary on the host that's running your tests in the container?
The fact that this can run in-process is a big deal, as it means you don't have to worry about cleanup.
As soon as you have external processes that your tests depend on, your tests need some sort of wrapper or orchestrator to set everything up before starting tests, and ideally tear it down after.
In 90% of cases I see, that orchestration is done in an extremely non-portable way (like leveraging tools built in to your CI system) which can make reproducing test failures a huge pain in the ass.
Your test framework probably provides hooks for one-time setup and cleanup. That is where you start and tear down your external dependencies.
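The setup/cleanup hook pattern can be sketched generically as a context manager that launches the external dependency and always tears it down, even when tests fail. The command is whatever starts your dependency (a postgres binary, a container runtime, ...); everything here is illustrative:

```python
import subprocess
from contextlib import contextmanager

# Generic sketch of one-time setup/teardown: start an external process
# before the tests, guarantee it is stopped afterwards.
@contextmanager
def external_dependency(cmd):
    proc = subprocess.Popen(cmd)
    try:
        yield proc          # tests run while the process is alive
    finally:
        proc.terminate()    # cleanup runs even if the tests raise
        proc.wait()
```

In pytest this would live in a session-scoped fixture; the shape (start, yield, always stop) is the same.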
It's the same amount of code and on Mac you still run a full VM to load containers (with a network stack), so I'm not really sure what your point is. If anything it's less code because the notion of the container is entirely abstracted away, and the whole thing is entirely a wasm dependency that you load as a normal import.
The whole purpose of end-to-end testing is that you're testing the system in a real state. It's an emulation of your live environment. Because of that you can do interesting things like find out what happens if you pull the plug, or run out of disk, or ...
The moment you shove a mock in there, you're unit testing. Effective, but not the same. One of the critical points of E2E is that without mocks you know your tests are accurate. Because this isn't Postgres, I'm testing it every time and not that system.
If you're building against PG for an embedded, lightweight, or underpowered system, then this would make sense for verification testing before real E2E testing that would be much slower (a use case I have).
Other than that, it's just a cool project, and if you ever need a PG shim it's there.
If this is actually just Postgres running in an x86 emulator (*edit: originally this said "compiled to wasm"), then how could this be faster than Postgres in any given environment? I don't understand — if it were faster, wouldn't you just want to deploy this in prod in your weird environment rather than Postgres? Why limit this to mocking?
Presumably, it's faster to boot and for tests because it doesn't need to access an actual file system; everything is in memory. That doesn't mean it would be any faster in production, and in fact, it wouldn't be useful in that environment even if it was.
Understood, thank you.
I think you're being a little absolutist about this. Swapping out a possibly equivalent database engine does not turn anything into a unit test, which is defined by testing individual units of code in relative isolation. You can argue that it's not true end to end testing. But almost every E2E test I've seen involves some compromises compared with the true production environment to save money, time, or effort.
Until you trust that every part of the mock behaves the same as every part of the real database, sure… but most often the db is your boundary, with nothing further downstream. At that point it really is just a faster disposable database, and it totally is valid for acceptance tests of the e2e system.
Also nothing stops you from using a mock for some tests and a real database for others. It just comes down to trust.
Nah: by having in-memory versions of your dependencies that fulfill the same interfaces as those used in your E2E tests (or the majority of them), you unlock running your entire E2E test suite in milliseconds-to-seconds instead of minutes-to-seconds. And because they're E2E tests that work with any implementation, you can still run the exact same suite against your "real" E2E dependencies in a CI step to be sure both implementations behave the same.
I've done this across multiple jobs, and it's amazing to be able to run your "mostly-E2E" tests in 1-2 seconds while developing and the same suite in the full E2E env in CI. It makes developing with confidence so fast and mostly stress free (diverging behavior is admittedly annoying, but usually rare).
I highly recommend using these if feasible.
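The "same interface, two backends" idea above can be sketched in a few lines: the suite is written against a tiny storage interface, so it runs unchanged against an in-memory fake (fast local dev) or a real database adapter (CI). All names here are illustrative assumptions:

```python
from typing import Optional, Protocol

# Hypothetical interface the tests are written against.
class UserStore(Protocol):
    def save(self, user_id: str, name: str) -> None: ...
    def load(self, user_id: str) -> Optional[str]: ...

# Fast in-memory backend for local runs; a real Postgres-backed class
# implementing the same two methods would be swapped in for CI.
class InMemoryUserStore:
    def __init__(self) -> None:
        self._rows = {}
    def save(self, user_id: str, name: str) -> None:
        self._rows[user_id] = name
    def load(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)

def check_roundtrip(store: UserStore) -> None:
    # The exact same assertions run against either backend.
    store.save("u1", "Ada")
    assert store.load("u1") == "Ada"
    assert store.load("missing") is None
```

Running `check_roundtrip` against both implementations in CI is what catches the (usually rare) diverging behavior.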
It could be useful for test isolation: moving the Redis backend to FakeRedis in tests fixed quite a bit of noise in our test suite. With Postgres we use savepoints, which is not very fast, even on a ramdisk.
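The appeal of a fake backend for isolation is that state is pure in-process and disposable per test. A toy illustration of the idea (the real fakeredis library covers far more of the Redis API; this class is just a sketch and not its actual implementation):

```python
# Toy stand-in showing the shape of a fake Redis: same get/set surface,
# in-process state, trivially rebuilt for every test. Not the real
# fakeredis package -- an illustrative sketch only.
class TinyFakeRedis:
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        # Redis stores byte strings, so encode str values like a client would.
        self._data[key] = value if isinstance(value, bytes) else str(value).encode()
        return True
    def get(self, key):
        return self._data.get(key)
```

Each test constructs a fresh instance, so no cross-test state survives, which is exactly the noise this kind of swap removes.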