Testing strategies
In programming, we sometimes follow patterns without thinking about why we do things
in a certain way. Often this leads to patterns being implemented impractically, with the
original intent getting lost because we are simply doing what we think we are supposed to do.
For testing, the problem is even more severe because of the ambiguous terms used to describe
related practices.
To avoid dogmatic pitfalls when coming up with testing strategies, we need to first clarify what we really want to achieve through automated testing, what it enables us to do and how we can apply the principles in practice.
Why we test in the first place
Ultimately, automated tests are an enabler for
- Higher productivity after the initial stages of a project
- Continuous Integration + Continuous Deployment
- A good night’s sleep
We gain these benefits when tests fulfill their purpose: to give us high certainty that our program works as expected. This sounds trivial, but this fundamental reasoning is often neglected in day-to-day development. Does it give you a good feeling to know that your software works with very high likelihood, or are your tests more like a coarse-grained sieve that bugs slip through?
In the early stages of a project, a proper testing strategy along with all necessary high-level utilities should be established. If this is not done right at the beginning, software patterns may emerge that are harder to test or that miss out on optimizations like parallelization. In my experience, the later certain test types are introduced to a project, the lower the incentive to implement them properly, or to implement them at all.
Nomenclature
The testing space suffers from a lot of ambiguous naming. For everything except unit tests, people use the same name to describe different concepts. To avoid confusion, here is a short definition of the testing terms I use in the following chapters.
test type | scope |
---|---|
Unit test | Tests a single unit of the software. This includes a single function, class or small compound of classes |
Integration test | Tests the interaction of a single software unit with an external system |
Contract test | Tests that an API behaves as expected. Can be written by the API providers (provider-based) or by the consumers (consumer-based) |
E2E test | Tests the software by clicking buttons and filling input fields like a real user. The software must have been built as a production binary/container |
API E2E test | Tests the logical API flow of a production binary/container |
The testing pyramid
The testing pyramid is a concept originally introduced by Mike Cohn in his book Succeeding with Agile in 2009, but the concept has since been reinterpreted many times and does not have a strict definition in modern usage. Generally, the testing pyramid is a guideline for the test categories that should be present in a project, along with their share of the total number of tests.
The bigger the cross section of the pyramid, the more tests should be in the respective category. The foundation consists of unit tests, which must provide very fast feedback.
In the middle area, integration tests have their place. These are more complicated than unit tests, because they require spinning up a single external system like a REST API, an authentication provider, a message broker and so on. Ideally these systems are provided as production-ready containers, but they may also be mocked, compromising some confidence for the sake of simplicity.
As we move further up the pyramid, other tests with even broader scope like Contract, API E2E and E2E tests take their place. It is of course not necessary (and likely not beneficial) to implement tests for every category, but it always makes sense to follow the general principle of layering tests of increasing complexity in decreasing numbers.
Imagine a case where we fully neglected the higher levels of the pyramid. Such a testing strategy may only cover unit and integration tests. We would not even know whether the program actually starts successfully, because the entire logic within the (DI) composition root and the build of the container are left untested! In this scenario, executing all tests successfully would not give us much confidence at all.
After E2E tests we can take it even further and take testing to production 🤯. You might think that this is crazy (at least it is scary), but it surely makes sense for systems with high SLAs. Even when your software works in your test environment, there is no guarantee that this will be the case in production. Test environments behave differently and are tested differently. In big systems, for example, many more machines operate in production than in test environments, simply due to the high cost. Running many more machines inevitably leads to a higher number of machines failing. What happens when a machine that was responsible for a customer's current session shuts down?
One example of testing in production is Netflix's Chaos Monkey, which randomly shuts down virtual machines in production.
The system is observed at all times to ensure that it keeps operating despite the loss of individual nodes.
A generally more applicable approach is to test critical paths. For a web shop this could be an E2E test, in which
a hard coded test user adds items to the cart and then performs a checkout. By testing the most used and most critical parts of the
system, costly failures can be spotted early and missed revenue limited.
Choosing the right strategy for tests is a delicate balancing act, like most other complex decisions in software development. The proper strategy depends on multiple factors, the most important ones being
- whether your team owns all services related to a given feature
- whether there are other teams using your API
An example system
I specialize in fullstack and microservice development. This means that, to me, a common environment includes a frontend communicating with a microservice via a REST API, plus some external APIs like other services and authentication providers that the service uses.
The diagram shows a simplified architecture of the Order service of a web shop. In this scenario, the jobs of the Order service are
- to make sure the user is authenticated by the OAuth2/OIDC provider
- to manage the user's shopping cart and persist its state in a PostgreSQL database
- to perform the payment through a third-party payment provider API
This minimal example covers the most common scenarios we have to test: authentication, persistence and an external API (who owns the API does not matter testing-wise).
Firstly, we need unit tests covering the logic that is specific to the frontend and to our microservice. Unit tests are by far the easiest to write and the vast majority of developers have implemented them before, so I will not go into more detail. The next chapters focus on higher-level tests.
Integration tests
The important aspect of integration tests is to only test the integration of a single unit of code with one external system at a time.
Considering we are using an OAuth2 provider for authentication, we are very likely also using a library that handles authentication for us. The auth library should already be tested, but we still need to test whether our API enforces authentication - if we don't, one user may update the cart of another.
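As an illustration, here is a minimal sketch of such an enforcement test. It assumes a Spring Boot service with MockMvc; the framework, endpoint path and payload are assumptions for the sake of the example, not part of the architecture described above:

```java
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.web.servlet.MockMvc;

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.put;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

@SpringBootTest
@AutoConfigureMockMvc
class CartAuthorizationTest {

    @Autowired
    MockMvc mockMvc;

    // Without a bearer token the endpoint must reject the request,
    // otherwise any user could modify any cart.
    @Test
    void rejectsUnauthenticatedCartUpdate() throws Exception {
        mockMvc.perform(put("/cart/items") // hypothetical endpoint
                        .contentType("application/json")
                        .content("{\"itemId\": \"123\", \"quantity\": 1}"))
                .andExpect(status().isUnauthorized());
    }
}
```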
We also need to test our calls to the PostgreSQL instance and the payment provider API. For the DB-related tests, we can easily get high-confidence tests by testing against the real thing, without noteworthy downsides. This is done by executing specific units of the persistence layer and checking whether the returned data is what we expect. To start an isolated instance of Postgres, we can simply create a Docker container from the official image.
For the payment provider, the case is more complicated. We can’t use the actual provider, otherwise we would have to pay every time we
use the API. Generally, when testing against the real thing is too costly (in terms of money, time or complexity), we need to mock the external system.
To achieve this, we can use an HTTP mock server. This space has seen several projects come and go, for example Mountebank
and Mockserver. While not useful for testing, I also want to mention JSON-Server,
which is essentially an easily installable web server/mock API that is nice for simple PoCs.
For proper API mocking, the best maintained tool is currently WireMock, which also comes with client libraries for Java and Go.
Similar to the database integration tests, we can spin up a WireMock container, instantiate a client and define our expected requests and mock responses
to mimic the behavior of the Payment API.
The obvious disadvantage is that this will not give us as much confidence as testing against the real thing, but at least the tests
will let us know if someone makes breaking changes to the client, like changing the JSON decoding or handling errors improperly. The best-case
scenario is that our imaginary payment provider provides us with a library, so we don't have to resort to mocking.
You may wonder why we need to spin up a WireMock container instead of just injecting an HTTP client stub into the unit that takes care of
communication with the payment provider. Looking only at the integration tests, this is indeed a bit of a disadvantage, but if we take into
account that we will also create higher-level tests like E2E tests, our production-ready backend container will need some other host to communicate with - and
when it comes to that, we will already have the testing utilities to spin up this container.
Generally, for all tests that need external dependencies, we should make sure from the start that all
dependent services can be started in parallel. This takes more time and consideration, but the effort will be well worth it if our project grows bigger
and we can keep adding more compute to lower test execution times.
In our testing scenario, the main enabler for running dependencies in parallel is to put all of them in Docker containers and randomize their ports.
Testcontainers is currently the state-of-the-art software to do just that, and
it seems that it is going to be around for quite some time. It is available
in a vast array of languages, including Java and Go, and allows us to configure and manage the lifecycle of containers elegantly.
Before our tests, we start the container, instantiate a DB client and apply the schema to the database, e.g. by using Flyway.
Afterwards we can test all the functionality of our persistence layer.
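A minimal sketch of this setup with the Java Testcontainers and Flyway APIs could look like this (the image tag and the repository class are assumptions):

```java
import org.flywaydb.core.Flyway;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;

class CartRepositoryIntegrationTest {

    // Each test class gets its own database on a random port, enabling parallel execution
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16-alpine");

    @BeforeAll
    static void setUp() {
        postgres.start();

        // Apply the schema migrations before any test runs
        Flyway.configure()
                .dataSource(postgres.getJdbcUrl(), postgres.getUsername(), postgres.getPassword())
                .load()
                .migrate();
    }

    @AfterAll
    static void tearDown() {
        postgres.stop();
    }

    @Test
    void persistsAndLoadsCart() {
        // CartRepository is a hypothetical unit of our persistence layer,
        // constructed with the container's connection details.
        // We would save a cart here and assert that reading it back
        // returns exactly the data we stored.
    }
}
```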
E2E, Contract and API E2E tests
For implementing E2E, Contract and API E2E tests, we follow a similar approach as with the integration tests, but take it one step further. The difference is that instead of running just a single service dependency in a Docker container, we now run every dependency our system uses and also run a container built from the image of our Order service itself.
For implementing E2E tests, we need a webdriver: a library that lets us control a (headless) browser to navigate through and interact with our web application. Popular choices in this space include Selenium, Playwright and Cypress. On the surface these tools provide roughly the same basic API, which essentially looks like this:
- Go to URL
- Fill out form
- Click submit
which is of course greatly simplified, but gets the idea across.
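Translated into code with Playwright for Java, for example, the critical-path checkout test mentioned earlier could be sketched like this (the URL and selectors are made up for illustration):

```java
import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;

public class CheckoutE2ETest {

    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();

            // Go to URL: the web shop frontend served by the container under test
            page.navigate("http://localhost:3000");

            // Fill out form: log in with the hard-coded test user
            page.fill("#username", "test-user");
            page.fill("#password", "test-password");
            page.click("#login");

            // Click submit: add an item to the cart and perform the checkout
            page.click("[data-testid=add-to-cart]");
            page.click("[data-testid=checkout]");

            // A real test would now assert on the order confirmation,
            // e.g. that a confirmation element becomes visible.
        }
    }
}
```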
E2E tests give us the highest confidence, but they have two problems. The first is that the tests tend to be the most flaky ones.
The second one surfaces only in large scale systems. If a software landscape is so big that hundreds or even thousands of engineers
work on the system, E2E tests usually take so much time that they are only run directly on the testing environment, for example once a day.
In this scenario it becomes less clear as to which team should fix a broken E2E test. In the world
of Microservices, teams control vertical slices of functionality - from frontend to the database. Now what happens when the E2E tests of
your part of the frontend fail because another team made a breaking change to one of their services' APIs that you use? Unfortunately
there is no simple answer to this problem. For smaller organizations and teams however, even a small number of E2E tests is very valuable.
When E2E tests prove to be suboptimal, we can resort to API E2E tests, which essentially test the same functionality, but instead
of driving the application logic through a browser, we do it with an HTTP client in our tests. This also lets us verify entire business flows
without having to worry about whose responsibility it is to fix broken E2E tests. When settling for this approach, one has to test the
frontend without the actual backend, which unfortunately lowers our overall confidence in the tests.
To at least come up with a good mock for the backend, Swagger Codegen could be used,
although I have to admit that I have not yet faced a case where the size of the organization caused an issue for testing,
so I do not have experience in this area.
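To make the API E2E idea concrete, here is a minimal sketch using the JDK's HttpClient against the running Order service container (base URL, endpoints and payloads are assumptions):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OrderApiE2ETest {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // The Order service container is assumed to be reachable on a mapped port,
        // with a token obtained from the (mock) OAuth2 provider beforehand.
        String baseUrl = "http://localhost:8081";
        String token = "<token from the OAuth2 provider>";

        // Step 1 of the business flow: add an item to the cart
        HttpRequest addItem = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/cart/items"))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"itemId\": \"123\", \"quantity\": 1}"))
                .build();
        HttpResponse<String> addResponse = client.send(addItem, HttpResponse.BodyHandlers.ofString());
        // A real test would assert on addResponse.statusCode() here

        // Step 2: perform the checkout and verify the result in the response body
        HttpRequest checkout = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/checkout"))
                .header("Authorization", "Bearer " + token)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> checkoutResponse = client.send(checkout, HttpResponse.BodyHandlers.ofString());
        // ... assert on checkoutResponse.statusCode() and body
    }
}
```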
If other teams are also using our API, we should additionally ensure that we do not accidentally break their clients. We can do this by creating contract tests. Contract tests can be provider-based or consumer-based, meaning either we write them ourselves to specify the entire API behavior, or we let the teams that use our API write tests for the functionality they use, which then get executed in our own CI/CD.
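One established tool for consumer-based contract tests is Pact; a consumer test could be sketched roughly like this with the Pact JVM JUnit 5 support (provider name, endpoint and fields are illustrative):

```java
import au.com.dius.pact.consumer.MockServer;
import au.com.dius.pact.consumer.dsl.PactDslJsonBody;
import au.com.dius.pact.consumer.dsl.PactDslWithProvider;
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt;
import au.com.dius.pact.consumer.junit5.PactTestFor;
import au.com.dius.pact.core.model.RequestResponsePact;
import au.com.dius.pact.core.model.annotations.Pact;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "order-service")
class CartConsumerContractTest {

    // The consumer declares exactly the part of the API it relies on
    @Pact(provider = "order-service", consumer = "checkout-frontend")
    RequestResponsePact cartPact(PactDslWithProvider builder) {
        return builder
                .given("a cart with one item exists")
                .uponReceiving("a request for the current cart")
                .path("/cart")
                .method("GET")
                .willRespondWith()
                .status(200)
                .body(new PactDslJsonBody()
                        .stringType("cartId")
                        .numberType("totalPrice"))
                .toPact();
    }

    @Test
    void consumesCart(MockServer mockServer) {
        // The consumer's HTTP client would call mockServer.getUrl() + "/cart" here;
        // the resulting pact file is then verified against the real provider in its CI/CD.
    }
}
```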
The last thing we haven’t discussed is how to handle authentication in our E2E tests. In some applications I have seen flags that disable authentication altogether, but I do not like this solution for two reasons:
- Authentication could get disabled by accident in production
- We need to manually fiddle with the system in places where we need to decode the JWTs, e.g. to determine which roles a user has
I am a strong proponent of not adding such flags and instead providing an OAuth2 provider at all times. My go-to solution for this purpose
is to run a mock-oauth2-server container locally. mock-oauth2-server fully implements
the OIDC protocol. We can send any credentials to the server and it will respond with a valid, signed token that our backend can
verify. Responses can also be customized, giving it the flexibility to act as any OAuth2 provider we could ever need.
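As a sketch, the server can be started via Testcontainers and queried like any other OAuth2 token endpoint (the image tag, issuer path and grant details reflect my understanding of the project's defaults and may need adjusting):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.testcontainers.containers.GenericContainer;

public class MockOAuth2Example {

    public static void main(String[] args) throws Exception {
        // Start mock-oauth2-server on a random host port
        GenericContainer<?> oauth2 =
                new GenericContainer<>("ghcr.io/navikt/mock-oauth2-server:2.1.9") // tag is an assumption
                        .withExposedPorts(8080);
        oauth2.start();

        String issuer = "http://" + oauth2.getHost() + ":" + oauth2.getMappedPort(8080) + "/default";

        // Any client id/secret is accepted; the server responds with a signed JWT
        HttpRequest tokenRequest = HttpRequest.newBuilder()
                .uri(URI.create(issuer + "/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "grant_type=client_credentials&client_id=test&client_secret=test&scope=openid"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(tokenRequest, HttpResponse.BodyHandlers.ofString());

        // The access token from this response can be used as a Bearer token against our backend,
        // which is configured with the issuer URL above for token verification.
        System.out.println(response.body());

        oauth2.stop();
    }
}
```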
Rules for good testing
Having laid out the most important testing categories and how to write tests for them, I want to give some final guidelines for writing them well:
1. Treat tests with the same care as production code
- Do not copy-paste logic. Often tests are structured similarly, which is inherent to their nature, but logic that is included in multiple tests deserves its own function
- Practice writing Clean Code
2. Create high-level testing utilities
- Starting a dependency container like a database should be possible by executing a single function. Making testing easier and more enjoyable usually leads to people writing better tests. If done right off the bat, common testing utilities for container interaction can be reused for integration and E2E tests
3. Test to increase confidence in correctness, not for dogmatism
- You don’t need to implement every test category to have high confidence in correctness
- Functionality should only be tested once. For example, if an object is tightly coupled to another (calling `new Thing()`), then fully test the functionality on the `Thing` and only the rough calls on the object using it
4. Define every edge case
- Tests should fully define the behavior of a piece of software, including `null` and other falsy values, especially for highly shared code like utilities
5. Group tests into stages
- Having fast-executing tests is a necessity for Continuous Integration and Continuous Delivery
- Execute tests grouped by their category or execution duration, from the bottom of the pyramid to the top, and execute the layers separately in CI/CD pipelines
- This ensures fast initial feedback combined with delayed high certainty of correctness
Summary
Effective testing isn’t about checking every box or following dogma—it’s about building confidence in your system’s correctness. By understanding the purpose of each test type, investing in clear testing utilities, and applying a thoughtful, layered strategy from unit to E2E tests, you can create a reliable foundation for continuous integration and deployment. Ultimately, good testing will help you ship faster, sleep better, and adapt with confidence.