Not sure how I hadn't encountered this before, I LOVE this pattern.
I find integration tests that exercise actual databases/Elasticsearch/Redis/Varnish etc to be massively more valuable than traditional unit tests. In the past I've gone to pretty deep lengths to do things like spin up a new Elasticsearch index for the duration of a test suite and spin it down again at the end.
It looks like Testcontainers does all of that work for me.
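Roughly what that looks like with testcontainers-python and pytest - a minimal sketch, assuming testcontainers' Elasticsearch module and a 7.x image (names and versions are just illustrative):

```python
# Minimal sketch: one real Elasticsearch container for the whole test session.
# Assumptions: testcontainers-python's elasticsearch module is installed,
# and a 7.x image (where security is off by default) is fine for the tests.
import pytest
from elasticsearch import Elasticsearch
from testcontainers.elasticsearch import ElasticSearchContainer

@pytest.fixture(scope="session")
def es_client():
    # Started before the first test that needs it, torn down when the session ends.
    with ElasticSearchContainer("elasticsearch:7.17.9") as es:
        yield Elasticsearch(es.get_url())

def test_index_and_search(es_client):
    es_client.index(index="docs", id="1", body={"title": "hello"}, refresh=True)
    result = es_client.search(index="docs", body={"query": {"match": {"title": "hello"}}})
    assert result["hits"]["total"]["value"] == 1
```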
My testing strategy is to have as much of my application's functionality covered by proper end-to-end integration-style tests as possible - think tests that simulate an incoming HTTP request and then run assertions against the response (and increasingly Playwright-powered browser automation tests for anything with heavy JavaScript).
I'll use unit tests sparingly, just for the bits of my code that have very clear input/output pairs that afford unit testing.
I only use mocks for things that I don't have any chance of controlling - calls to external APIs for example, where I can't control if the API provider will be flaky or not.
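To make the mocking bit concrete, here's roughly what that looks like - everything runs for real except the one external call I can't control (myapp.payments.charge and the client/db fixtures are made-up names):

```python
# Sketch: only the flaky third-party API is stubbed out; the web layer, our own
# code and the real database are all exercised end-to-end.
from unittest.mock import patch

def test_checkout_end_to_end(client, db):
    # `client` simulates an incoming HTTP request against the real app;
    # `db` is a real (containerized or in-memory) database fixture.
    with patch("myapp.payments.charge", return_value={"status": "ok"}):
        response = client.post("/checkout", json={"cart_id": 42})
    assert response.status_code == 200
```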
I love integration tests. You know why? Because I can safely refactor all I want!
Unit tests are great, but if you significantly refactor how several classes talk to each other, and each of those classes had its own isolated unit tests that mocked out all of the others, you're suddenly refactoring with no tests. But a black-box integration test? Refactor all your code, replace your databases, do whatever you want: the integration test still passes.
Unit test speed is a huge win, and they're incredibly useful for quickly testing weird little edge cases that are annoying to write integration tests for, but if I can write an integration test for it, I prefer the integration test.
Thanks for saying this out loud. I’m a solo dev and in my project I’m doing exactly this: 90% black box integration tests and 10% unit tests for edge cases I cannot trigger otherwise. It buys me precious time to not adjust tests after refactoring. Yet it made me feel like a heretic: everyone knows the testing pyramid and it comes from Google so I must be very wrong.
You might be interested in the ‘testing trophy’ as an alternative to the traditional pyramid.
https://kentcdodds.com/blog/write-tests
This advice is so misguided that I'm concerned for our industry it's getting so much traction.
Unit tests don't need to test implementation details. You could just as well make that mistake with integration or E2E tests. Black box testing is a good practice at all layers.
What unit tests do is confirm that the smallest pieces of the system work as expected in isolation. Yes, you should also test them in combination with each other, but it serves you no good if you get a green integration test, when it's likely only testing a small fraction of the functionality of the units themselves.
This whole "unit tests slow you down" mentality is incredibly toxic. You know what genuinely slows me down? A suite with hundreds of integration tests, each taking several seconds to run, and depend on external systems. But hey, testcontainers to the rescue, right?
Tests shouldn't be a chore, but an integral part of software development. These days I suppose we can offload some of that work to AI, but even that should be done very carefully to ensure that the code is high quality and actually tests what we need.
Test code is as important as application code. It's lazy to think otherwise.
If by "smallest pieces of the system" you mean something like individual classes then you are definitely testing implementation details.
Whenever you change a method's parameters in one of those internal classes you'll have unit tests breaking, even though you're just refactoring code.
Unit testing at the smallest piece level calcifies the codebase by making refactors much more costly.
If I change something at the lowest level in my well abstracted system, only the unit tests for that component will fail, as the tests that ‘use’ that component mock the dependency. As long as the interface between components doesn’t change, you can refactor as much as you want.
I prefer having the freedom to change the interface between my components without then having to update large numbers of mocked tests.
No, there's nothing definite about that.
The "unit" itself is a matter of perspective. Tests should be written from the perspective of the API user in case of the smallest units like classes and some integration tests, and from the perspective of the end user in case of E2E tests. "Implementation details" refers to any functionality that's not visible to the user, which exists at all levels of testing. Not writing tests that rely on those details means that the test is less brittle, since all it cares about is the external interface. _This_ gives you the freedom to refactor how the unit itself works however you want.
But, if you change the _external_ interface, then, yes, you will have to update your tests. If that involves a method signature change, then hopefully you have IDE tools to help you update all calling sites, which includes application code as well. Nowadays with AI assistants, this type of mechanical change is easy to automate.
If you avoid testing classes, that means that you're choosing to ignore your API users, which very likely is yourself. That seems like a poor decision to make.
If your classes properly specify access modifiers, then no, you're not testing implementation details. You're testing the public interface. If you think you're testing implementation details, you probably have your access modifiers wrong in the class.
In a perfect world each unit would do the obvious thing without many different paths through it. The only paths would be the ones that are actually relevant for the function. In such a perfect world, the integration test could trigger most (all?) paths through the unit and separate unit tests would not add value.
In this scenario unit tests would not add value over integration tests when looking for the existence of errors.
But: In a bigger project you don't only want to know "if" there is a problem, but also "where". And this is where the value of unit tests comes in. Also you can map requirements to unit tests, which also has some value (in some projects at least)
edit: now that I think about it, you can also map requirements to E2E tests. That would probably even work much better than mapping them to unit tests would.
I don't think that's realistic, even in an imaginary perfect world.
Even a single pure function can have complex logic inside it, which changes the output in subtle ways. You need to test all of its code paths to ensure that it works as expected.
This is also highly unlikely, if not impossible. There is often no way for a high-level integration test to trigger all code paths of _all_ underlying units. This behavior would only be exposed at the lower unit level. These are entirely different public interfaces.
Even if such integration tests were possible, there would have to be so many of them that maintaining and running the entire test suite would become practically unbearable. The reason we're able to (and should) test all code paths is precisely because unit tests are much quicker to write and run. They're short, don't require complex setup, and can run independently from every other unit.
Not just in a "bigger" project; you want to know that in _any_ project, preferably as soon as possible, without any troubleshooting steps. Elsewhere in the thread people were suggesting bisecting or using a debugger for this. This seems ludicrous to me when unit tests should answer that question immediately.
Of course. Requirements from the perspective of the API user.
Yes, you can, and should. But these are requirements of the _end_ user, not the API user.
No, this is where the disconnect lies for me. One type of testing is not inherently "better" than the others. They all complement each other, and they ensure that the code works for every type of user (programmer, end user, etc.). Choosing to write fewer unit tests because you find them tedious to maintain is just being lazy, and finding excuses like integration tests bringing more "bang for your buck" or unit tests "slowing you down" is harmful to you and your colleagues' experience as maintainers, and ultimately to your end user when they run into some obscure bug your high-level tests didn't manage to catch.
I think having a good architecture plays a big role here.
I've heard this aversion to unit tests a few times in my career, and I'm unable to make sense of it.
Sure, integration tests "save" you from writing pesky unit tests, and from changing them frequently after every refactor.
But how do you quickly locate the reason that integration test failed? There could be hundreds of moving parts involved, and any one of them malfunctioning, or any unexpected interaction between them, could cause it to fail. The error itself would likely not be clear enough, if it's covered by layers of indirection.
Unit tests give you that ability. If written correctly, they should be the first to fail (which is a good thing!), and if an integration test fails, it should ideally also be accompanied by at least one unit test failure. This way it immediately pinpoints the root cause.
The higher up the stack you test, the harder it is to debug. With E2E tests you're essentially debugging the entire system, which is why we don't exclusively write E2E tests, even though they're very useful.
To me the traditional test pyramid is still the best way to think about tests. Tests shouldn't be an afterthought or a chore. Maintaining a comprehensive and effective test suite takes as much hard work as, if not more than, maintaining the application itself, and it should test all layers of the system. But if you do have that, it gives you superpowers to safely and reliably work on any part of the system.
It's very simple: most of the time people are told by management that they MUST achieve 80/90/95% code coverage (with unit tests), which leads to a lot of absolutely worthless tests - tests for the sake of it. The irony is that the pieces that really count don't get tested properly, because you unit-test the happy path and maybe 1 or 2 negative scenarios, and that's it, missing a bunch of potential regressions.
EDIT: This is just to say that I don't believe the author of the comment said "don't write unit tests" (I hope not, at least!) but, if I can rephrase it, "well, the integration tests give you a better dopamine effect because they actually help you catch bugs". Which would be partially true also with properly written unit tests (and they would do so in a fraction of the time you need with integration tests).
So strict rules from management in a company that likely doesn't understand software development, and lazy developers who decide to ignore this by intentionally writing useless tests, lead to thinking that unit tests and coverage are useless? That doesn't track at all.
I'd say that the answer is somewhere in the middle. If the company doesn't understand software development, it's the engineer's job to educate them, or find a better place to work at. It's also the engineer's job to educate lazy developers to care about testing and metrics like code coverage.
And unit tests don't? I would argue that unit tests give you much more of that dopamine, since you see the failures and passes much more quickly, and there should be many more of them overall. Not that we should structure our work towards chasing dopamine hits...
I'd say that most of the people who advocate for this position haven't worked with a well tested codebase. Sadly, not all of us have the privilege of working with codebases like SQLite's, which goes well beyond 100% line/statement coverage[1]. Is all that work in vain? Are they some crazy dogmatic programmers who like wasting their time? I would say: no. They just put a lot of effort and care into their product, which speaks for itself, and I would think makes working on it much safer, more efficient and pleasant.
I would also argue that the current state of our industry, and in turn everything that depends on software, where buggy software is the norm would be much better overall if that kind of effort and care would be put in all software projects.
[1]: https://www.sqlite.org/testing.html
The linked SQLite page mentions branch test coverage. Branch coverage is different from line coverage, and in my opinion it should be the only coverage metric used in this context.
Mandating 90-95% line coverage is exactly why many unit tests are garbage and why many come up with the argument "I prefer integration tests, unit tests are not that useful".
I'm not sure if I understand your argument.
It's not different, just more thorough. Line and statement coverage are still useful metrics to track. They might not tell you whether you're testing all code paths, but they still tell you that you're at least testing some of them.
Very few projects take testing seriously to also track branch coverage, and even fewer go the extra mile of reaching 100% in that metric. SQLite is the only such project that does, AFAIK.
Hard disagree. Line coverage is still a useful metric, and the only "garbage" unit tests are those that don't test the right thing. After all, you can technically cover a block of code, with the test not making the correct assertions. Or the test could make the right assertions, but it doesn't actually reproduce a scenario correctly. Etc. Coverage only tracks whether the SUT was executed, not if the test is correct or useful. That's the job of reviewers to point out.
No. Programmers who say that either haven't worked on teams with a strong testing mindset, haven't worked on codebases with high quality unit tests, or are just being lazy. In any case, taking advice from such programmers about testing practices would not be wise.
The DB migration has "ADD COLUMN rateLimit INT"
The application class member is annotated with @Column(name = "ratelimit", nullable = true).
The failure is at the interface between the app and the DB. What testcontainers does is allow you to write a quasi-unit test (not a truly full-blown integration test, but testing a small piece of functionality) across the boundary of two components. I am not aware of a way to reasonably unit test for this error. That might just be me -- seriously if there is a tried strategy for unit testing things like this I'd love to know it.
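The closest I've got is a narrow boundary test like this - a sketch in Python with SQLAlchemy and a Postgres container, where raw SQL stands in for whatever your ORM mapping and migration tool actually generate:

```python
# Sketch: run the real migration against a real Postgres container, then
# round-trip a row through the column the application code expects.
# Table/column names mirror the (deliberately mismatched) example above.
import sqlalchemy as sa
from testcontainers.postgres import PostgresContainer

MIGRATION = "CREATE TABLE api_keys (id SERIAL PRIMARY KEY, rateLimit INT)"

def test_rate_limit_column_matches_migration():
    with PostgresContainer("postgres:16") as pg:
        engine = sa.create_engine(pg.get_connection_url())
        with engine.begin() as conn:
            conn.execute(sa.text(MIGRATION))
            # In a real project this insert/select would go through the ORM
            # mapping (the @Column above); if the mapping expects a column the
            # migration didn't create, it fails here instead of in production.
            conn.execute(sa.text(
                "INSERT INTO api_keys (ratelimit) VALUES (:rl)"), {"rl": 100})
            row = conn.execute(sa.text("SELECT ratelimit FROM api_keys")).one()
        assert row[0] == 100
```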
That's an _integration_ failure. We're talking about testing two entirely different things here. Of course you shouldn't expect to test integration scenarios in unit tests.
But you also shouldn't expect to test all unit scenarios in integration tests. This is what the "testing trophy" model advocates for: that somehow unit tests are not needed if you have thorough integration tests, which is entirely false.
They test the application at different layers, because they're meant to ensure behavior to different types of users. They're both useful for catching bugs and unexpected behavior, and they're meant to complement each other.
If the test fails consistently (as it should) it is usually just a question of using a debugger and stepping through some suspect sections of the code to find the issue.
Compared to the amount of time saved by not rewriting unit tests every time you refactor stuff, it's a great trade-off.
As long as the error is reproducible, never in my career have I had a hard time locating the source of the error. Bisection does wonders (as a general concept, not specifically referring to git bisect).
That said, I have encountered plenty of non-reproducible test failures. Moral of the story: make things reproducible, especially tests.
Easier said than done.
How do you handle resetting a SQL database after every integration test? Testcontainers may help here by spinning up a new instance for every test, but that seems very slow.
If I'm using Django I let Django's default test harness handle that for me - it runs each test in a transaction and rolls it back at the end of the test, which is pretty fast. https://docs.djangoproject.com/en/5.0/topics/testing/overvie...
For my other projects I'm generally using SQLite where starting a new in-memory database is so fast it's effectively free.
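Something like this, in the non-Django case (a sketch; the schema is a placeholder):

```python
# Sketch: a brand new in-memory SQLite database per test is cheap enough to
# treat as free. The schema and queries are placeholders.
import sqlite3
import pytest

@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
    yield conn
    conn.close()

def test_insert_and_fetch(db):
    db.execute("INSERT INTO entries (title) VALUES (?)", ("hello",))
    assert db.execute("SELECT title FROM entries").fetchone() == ("hello",)
```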
How does that work when the system under test uses transactions itself?
A lot of databases these days support nested transactions using savepoints, which Django's test framework can take advantage of.
There's a separate mechanism for writing tests where you need to explicitly test transaction mechanics: https://docs.djangoproject.com/en/5.0/topics/testing/tools/#...
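Roughly, the split between the two looks like this (a sketch; the Invoice model is hypothetical):

```python
# Sketch: Django's two base classes. TestCase wraps each test in a transaction
# that is rolled back afterwards; TransactionTestCase really commits, so code
# that depends on commit behaviour (e.g. on_commit hooks) can be exercised.
from django.db import transaction
from django.test import TestCase, TransactionTestCase
from myapp.models import Invoice  # hypothetical model

class FastTests(TestCase):
    def test_create_invoice(self):
        Invoice.objects.create(total=100)  # rolled back automatically
        self.assertEqual(Invoice.objects.count(), 1)

class CommitTests(TransactionTestCase):
    def test_on_commit_callback_runs(self):
        ran = []
        with transaction.atomic():
            Invoice.objects.create(total=100)
            transaction.on_commit(lambda: ran.append(True))
        # The atomic block really committed, so the callback has fired.
        self.assertEqual(ran, [True])
```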
I do this a lot for Postgres testing. In my setup, I create a single database for the entire test run. Each test creates its own schema in that database and applies the latest table definitions.
With this setup, I only eat the container creation once, while allowing every test to operate in isolation from one another, be parallelized, and test against a real database.
I do a similar trick for S3 containers by applying a unique guid prefix to the buckets in each test.
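In pytest/testcontainers terms the pattern looks roughly like this (a sketch; names and DDL are placeholders):

```python
# Sketch: one Postgres container per test run, one throwaway schema per test,
# so tests stay isolated and can run in parallel. DDL and names are placeholders.
import uuid
import pytest
import sqlalchemy as sa
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def pg_engine():
    with PostgresContainer("postgres:16") as pg:  # container cost paid once
        yield sa.create_engine(pg.get_connection_url())

@pytest.fixture
def db_schema(pg_engine):
    schema = f"test_{uuid.uuid4().hex}"
    with pg_engine.begin() as conn:
        conn.execute(sa.text(f"CREATE SCHEMA {schema}"))
        conn.execute(sa.text(
            f"CREATE TABLE {schema}.invoices (id SERIAL PRIMARY KEY, total INT)"))
    yield schema  # tests qualify table names with this schema
    with pg_engine.begin() as conn:
        conn.execute(sa.text(f"DROP SCHEMA {schema} CASCADE"))

def test_insert_and_read_back(pg_engine, db_schema):
    with pg_engine.begin() as conn:
        conn.execute(sa.text(f"INSERT INTO {db_schema}.invoices (total) VALUES (42)"))
        total = conn.execute(
            sa.text(f"SELECT total FROM {db_schema}.invoices")).scalar_one()
    assert total == 42
```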
Doesn't it take a lot of time to create the schema and populate it with enough data to get going?
It depends on what you're testing. Applying the schema is pretty fast (30-40ms), compared to the container creation (1-2 seconds). If you need a lot of test data it would take time, but most of the time I'm only applying enough rows to hit my test conditions. For CRUD apps I usually orchestrate the test setup using the public APIs of my application against the fresh instance.
Do the whole test in a transaction and roll it back at the end.
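With SQLAlchemy that can be a fixture along these lines (a sketch; assumes an engine fixture pointing at a test database and app code that accepts the session you hand it):

```python
# Sketch: wrap each test in a transaction and roll it back afterwards,
# so the database is untouched no matter what the test wrote.
import pytest
from sqlalchemy.orm import Session

@pytest.fixture
def db_session(engine):
    connection = engine.connect()
    trans = connection.begin()
    session = Session(bind=connection)
    yield session
    session.close()
    trans.rollback()   # everything the test wrote disappears here
    connection.close()
```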
Legit. Probably an unpopular opinion, but if I had to choose only one type of test (cue a long discussion with no resolution over defining exact taxonomic boundaries), I'd go with integration over unit. Especially if you're a new contributor to a project. I think it comes down to exercising the flow between... well, integrations across components.
Even better? Take your integration test, put it on a cronjob in your VPN/VPC, use real endpoints, make bespoke auth credentials + a namespace, and now you have canaries. Canaries are IMHO god-tier for whole-system observability.
Then take your canary, clean it up, and now you have examples for documentation.
Unit tests are for me mostly about testing the domain+codomain of functions and adherence to business logic, but a good type system, along with the discipline to actually make schemas/POJOs etc. instead of just tossing around maps, strings and ints everywhere, already accomplishes a lot of that (still absolutely needed though!).
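For example, something as small as this (illustrative names) removes a whole class of "wrong shape of data" tests before any test even runs:

```python
# Sketch: a typed schema instead of a loose dict. A type checker catches callers
# that pass a raw dict or swap arguments; a unit test is still worthwhile for
# the business rule itself. Field names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Quote:
    customer_id: int
    premium_cents: int

def apply_discount(quote: Quote, percent: int) -> Quote:
    return Quote(quote.customer_id, quote.premium_cents * (100 - percent) // 100)
```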
Right. Unit tests are typically a waste of time unless you have some complicated business logic (say, some insurance rates calculation, etc.).
This was advocated a long time ago in the (great) book "Next Generation Java Testing: TestNG and Advanced Concepts" by Cédric Beust and Hani Suleiman (old people will remember his (in)famous Bile Blog...).
Unit tests are fine for testing any kind of logic. Whether you consider the logic important is a different question.
Do you find this results in less overall test code to maintain since you likely have fewer but higher quality/signal tests?
Yeah - I find that sticking to tests like this means I don't have hundreds of tiny unit tests that rely on mocks, and it's still very supportive of refactoring - I can make some pretty big changes and be confident that I've not broken anything because a given request continues to return the expected response.
The choice isn't unit tests vs. end-to-end tests; it's between testing things you don't really care about and things you do.
You care about real use cases and verifying design constraints are met. You don't care about internal implementation details.
The nuance is that there are often things one cares about at multiple levels.
Yes, you just focus on a few high level behaviors that you want to validate, instead of the units. It’s more difficult to pull these tests off, as there are more chances for them to become flaky tests, but if they work they provide much more value.
I’d prefer a dozen well written integration tests over a hundred unit tests.
Having said that, both solve different problems, ideally you have both. But when time-constrained, I always focus on integration tests with actual services underneath.
Another technique I've found very useful is generative integration tests (kind of like fuzzing), especially for idempotent API endpoints (GETs).
For example, assuming you have a test database with realistic data (or scrubbed production data), write tests that are based on generalizable business rules, e.g. the total line of an 'invoice' GET response should be the sum of all the 'sections' endpoint responses tied to that invoice id. Then have a process that runs before the tests and creates a bunch of test cases (invoice IDs to try), randomly selected from all the IDs in the database. Limit the number of cases to something reasonable for total test duration.
As one would expect, overly tight assertions can lead to many false positives, but this approach finds really tough edge cases hidden in diverse/unexpected data (null refs) that usually escape the artificial or 'happy path' pre-selected cases.
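A sketch of what one of the generated-case tests looks like (endpoints, field names and the client/db fixtures are illustrative):

```python
# Sketch: sample real invoice IDs from the test database and check a
# cross-endpoint invariant on each one. Paths and fields are made up.
import random

MAX_CASES = 50  # keep total test duration reasonable

def sample_invoice_ids(db):
    ids = [row[0] for row in db.execute("SELECT id FROM invoices")]
    return random.sample(ids, min(MAX_CASES, len(ids)))

def test_invoice_total_matches_section_sum(client, db):
    for invoice_id in sample_invoice_ids(db):
        invoice = client.get(f"/invoices/{invoice_id}").json()
        sections = client.get(f"/invoices/{invoice_id}/sections").json()
        assert invoice["total"] == sum(s["amount"] for s in sections), invoice_id
```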
Running unit tests as integration tests will explode in your face. In any decently complex code base, testing time will go through the roof and you will have a hard time getting the genie back in the bottle.
Testing that you actually run "sum()" is a unit test.
I once failed a take home assignment because of this. It was writing a couple of api endpoints and for testing, I focused on integration over unit. I even explained my reasoning in the writeup. There was no indication that the company preferred unit tests, but the feedback was "didn't have enough unit tests". What a dumb company.
This is exactly the strategy I have discovered to bring the most value as well. And honestly, something that simplifies the setup of those containers is pretty great.