Tests organization and naming

Note, this post was originally published on INNOQ blog together with Torsten Mandry and Theo Pack.

As our system grows, so will our test suites. For our production code, we have learned techniques to keep it maintainable. For example, we try to structure our logic into sub-aspects, put them in specific locations and give the units meaningful names. We want to achieve the same for our tests. One of the main goals is that a developer - or generally speaking, the person who has to maintain the test - knows where to find which test. We also want to understand as quickly as possible what the test is for and what might be the reason for a failing test.


Consistency

Before we come to the actual topics of this post, we would like to stress one important attribute that is cross-cutting through almost everything: consistency.

A high level of consistency in a project ensures increased confidence and allows us to efficiently develop and maintain our system. This applies not only to the production code but also to our tests. With a consistent project structure, developers immediately know where to find existing parts or to add new parts. In terms of tests, that especially means that if a test file or method does not exist at the expected place, the corresponding feature or aspect is probably not yet covered by a test.

A lack of consistency can lead to a situation where test suites become more and more confusing and unmaintainable over time. The team no longer has an overview of which tests exist and which do not. They would have to search for usages of the suspicious method to find existing tests that call it. If those tests cannot be understood immediately, a developer will probably ignore them and add a new one. If there’s no clear idea which aspect of the system should be tested in which way and at which place, developers will decide individually, from case to case. Over time, this will lead to lots of tests that redundantly verify the same aspects/behaviour, most probably in different ways. This will make the situation even worse. In the worst case, the team ends up with a huge number of tests they have to maintain, but with only very little confidence in those tests because the functionality they cover is unclear. Failing tests might be ignored or disabled because no one knows what they test and why they fail. The tests no longer provide any value but only maintenance costs.

The same principles we naturally respect for our production code are also relevant for our test code: We need to have a common understanding of how we want to organize, structure, implement, and consistently name our tests, to keep them understandable and maintainable over time.

One thing we would like to emphasize here is: consistency beats everything else. Regardless of the quality of individual decisions, following them consistently will always help us to keep the overview. It will be much easier to find our way with a suboptimal organization that is consistent than with one where each test has its own individual place, however good that place may be. The same is true for other aspects like naming, test structure, etc. Consistency simply reduces the cognitive load on the people working with the tests.

Organization

In larger projects with many tests of different types, finding a specific test can easily become a challenge of its own. We all remember situations where we noticed some strange behaviour of our system and asked ourselves: Shouldn’t there be a test to verify this? Where is it? Along with deciding which test types we want to have, we have to decide how to organize them, so that it’s clear where each test is located.

Location

There are different approaches to where we want to place our tests. We have to consider whether we want the tests close to the actual production code (in Go, for example, tests are often stored in the same directory) or whether we need to keep them completely separate from the production code, for example in a separate project. How can we distribute and place the tests within a project or directory? We can try to group tests by technical characteristics (for example by the type of tests, so that isolated, performance and E2E tests are managed separately). But we can also group them based on the domain and functionality. Deciding which approach is right for the project depends on the test types and what we need to test, as well as on the technical context and the chosen tools, which often come with conventions for where and how certain types of tests are stored.

A good rule of thumb is that the more closely a test depends on the code, the closer to that code it should live. In other words, the tests need to be cohesive with other tests, but also with the production code. Isolated tests, focused on specific classes or modules, are usually dependent on the structure of the production code. Such tests, also called white-box tests, should be kept in the same module or package as the code they’re testing. For more grey- or black-box tests, which are far less reliant on, or completely ignorant of, the structure of the production code, locating them right beside the code might make less sense. Those tests also usually do not focus on the functionality/responsibility of certain classes or modules. For example, performance tests should not need to have any clue about the internals of the system and its modules; they merely need to know how to interact with it. A different example would be architecture tests, written for example in ArchUnit. Although they’re very much interested in the structure of the code, they’re also not “local” to certain components, but cross-cutting through the system as a whole. We could even put such tests into a separate module within a system’s code repository.

Sometimes the language or test framework provides a standard layout that defines where to place the tests. It is usually a good idea to follow the standard, as the developers will find a familiar structure across projects. But such layouts might collide with our preferred structure. So we should ask ourselves whether the given layout is suitable for our project or if we need a custom one. Here we have to decide whether the benefits of the individual structure outweigh the advantages of a standard layout (for example, certain tools would have to be reconfigured if we use a custom layout). Again, consistency is extremely important. If we decide to go for a non-standard layout, and therefore make our code inconsistent with the majority of other projects, the benefits should outweigh this inconsistency by a big margin.

Regardless of how and where tests are stored and maintained, defining and documenting this is an important part of the testing strategy.

Test class structure

In the case of tests, just like with the production code, a uniform and consistent structure helps us to navigate quickly within the file, but it can also provide us with valuable information at first glance. For example, if we always define all external dependencies on other classes as a block, we can see directly in the test class which other classes we need to prepare for our test. The same goes for setup and teardown functions. They’re usually kept above the tests themselves and give us an idea of what happens before and after each test is executed. On the other hand, all helper functions we mentioned in the Anatomy of a Good Test post should go after the tests (or in separate files to be re-used by other tests), because they only help the tests to clearly express what they’re about.

Even within a single test file, it might be necessary to group tests if they relate to different aspects of tested functionality. One option - maybe the most intuitive one - is to group the tests by the different methods or functions of the unit under test. So, for example, have a group of tests for the add method and another group for the remove method of our ShoppingCart. Other options would be to group them by their initial state (e.g. one group for testing an empty shopping cart, one for testing a shopping cart with items) or by the expected outcome (e.g. initializing a new shopping cart and removing the last item from it both lead to an empty shopping cart). Depending on the characteristics of the unit under test, one option makes more sense than the others. Some testing frameworks support grouping and nesting the tests. If ours doesn’t, we need to find a good way of doing that ourselves. One option is to have multiple test files/classes for one unit under test, e.g. one test file per method/aspect to be tested.

An example of a file structure for unit tests can look like this. In this example, the developer knows directly where the objects in the class are initialized.

public class ShoppingCartTest {
    // Class under test
    private ShoppingCart shoppingCart;

    // Dependencies for unit under test
    private PriceCalculator priceCalculator;

    // Methods for initialization
    @BeforeEach
    public void init() { ... }

    // Methods for clean up
    @AfterEach
    public void cleanUp() { ... }

    // Test methods
    @Nested
    class AddingItems {

        @Test
        public void addTwoItems() { ... }

        ...
    }

    @Nested
    class RemovingItems {

        @Test
        public void removeSingleItem() { ... }

        ...
    }

    // Assert methods
    private void assertItemInShoppingCart(...) { ... }

    // Other helper methods
    private Article createArticle(...) { ... }
}

Naming

“There are only two hard things in Computer Science: cache invalidation and naming things.”

– Phil Karlton

Naming plays an extremely important role in software development. Good naming helps us to understand what is happening in the code and what its responsibility is. It also helps us to understand the domain (ideally the names reflect the language of the domain).

Functional tests

When we talk about the use of tests, we have one question in mind. What functionality is being tested here? Answering this question helps us to identify which test needs to be adapted when we change existing functionality. But it also helps us to recognise at first glance which functionality no longer works correctly if one of the tests fails and we see the test name in red letters.

We mentioned putting tests in good locations. But how do we differentiate locations like folders or packages? By their names! And this is our starting point. It is, of course, not limited to tests, but names of all the higher-level structures like modules, packages, etc. influence our understanding. The name of a module or package already defines the context of our tests.

At this level, consistency with the production code is very important. Let’s again take our example from the post Anatomy of a Good Test. We want to test our cart functionality. The ShoppingCart itself is in the package com.innoq.shop.cart. The most natural place for our test is then the same package. Any other would be very confusing to anyone unfamiliar with the code.

At the next level, the class or file name should tell us which aspect of the package, module or overall system is being tested. Depending on the test, the file name does not always have to include the component names, like ShoppingCartTest. Especially for more complex tests that check the interaction of several components, naming the test class is no longer trivial. In the best case, terms already exist for certain processes in the domain (e.g. for features or use cases), which can then be adopted.

Naming specific test methods is even more difficult. Whereas for packages or classes we could get away with simply naming them after the aspect we’re testing, the methods need to tell us exactly what we’re testing. The goal is to be able to tell exactly what is wrong in our system once we get a report from our tests with some failures. Usually, our tools will tell us which tests failed using their method names or some short descriptions we provide. Those method names or descriptions should be expressive enough to say what is being tested, and distinct enough for us not to confuse two tests. At the same time, they need to be short enough to still be readable.

Let’s look again at our example from the post Anatomy of a Good Test. In the post, we want to test the behaviour of our ShoppingCart when we add two items. The simplest way to name this test would be to just name it after the method that is tested. In our ShoppingCartTest example we would name the test method add. This way, the test name indicates clearly which method is tested. But nothing more. We would have to look into the test code to identify how the method is tested, what aspects are verified, what are the starting conditions, etc. Moreover, for non-trivial methods, having one test usually is not sufficient to test every desired behaviour. So we would have to find other test names, anyway.

That’s why in our ShoppingCartTest we have named the test addTwoArticles. If the test addTwoArticles fails, but the test addOneArticle was successful, we probably have a problem when we add more than one article to our shopping cart. If we have chosen good names for our tests and can be confident that the tests are testing exactly what we say they are testing, then we would ideally not have to look at the test code at all and could focus directly on the actual ShoppingCart implementation.

The name addTwoArticles seems quite expressive. If it fails, we would know where to look for a bug, especially after seeing that addOneArticle succeeded. Still, the name could stop being expressive enough once we introduce some “special” articles that require special handling by the cart. We would then maybe need to rename addTwoArticles to addTwoRegularArticles and add a new test addOneRegularAndOneSpecialArticle. Unfortunately, good names do not stay good forever; they need to evolve along with the system.

Naming styles

For different types of tests, the names can also be structured differently. Most often, we see scenario-based naming patterns that briefly describe the scenario that is tested, for example addNull or addMultiple or, a little more precisely, addTwoArticles. Following this kind of naming scheme, we immediately understand what scenario is tested. However, we still have to look into the test code to see how it is tested and which aspects are verified.

The most precise (but also most unusual) naming pattern seems to be to take a full specification sentence as a test name. We briefly mentioned this pattern in the Anatomy of a Good Test post:

@Test
public void calculates_subtotal_and_total_amount_if_two_items_with_different_prices_and_quantities_are_added()
    throws Exception {
    ...
}

Formulating it this way, our test method name tells us exactly what it is about. In this concrete example, the test name implicitly refers to the name of the class under test (ShoppingCart) which is indicated by the name of the test class (ShoppingCartTest) the test is located in. Together, all of its test methods describe how this class behaves (ShoppingCart calculates subtotal and total amount …). Sometimes, this reference is included more explicitly like in it_calculates_… or even ShoppingCart_calculates_…. In the JavaScript world test tools like Jasmine or Cypress are specially designed to use this kind of naming pattern.

describe('ShoppingCart', () => {
  // Concatenation keeps the indentation of the second line out of the test name
  it('calculates subtotal and total amount ' +
     'if two items with different prices and quantities are added', () => {
    ...
  })
})

There’s also a more uncertain form of test names we regularly see, which uses the prefix should (e.g. should calculate …). For example, in Cypress assertions start with .should(...). Besides the fact that it sounds like we’re not sure if those assertions will succeed, we don’t see any value in using this form. As the prefix is used everywhere, it becomes just noise that nobody cares about.

With tools like Cucumber, we’re completely free to formulate our test scenarios as we see fit. Such tools completely separate the text of a scenario from the test code that executes it. They may help us express our expectations better, at the cost of more effort required to write working tests. They also enable us to write scenarios in a BDD way using given…when…then. We covered that in more detail in our Anatomy of a Good Test post.

Another aspect that should be considered when naming tests is the distinction between a technical and a functional formulation. We already mentioned in our previous posts that we always prefer to focus on the functionality rather than on technical details. Let’s illustrate what we mean by taking a very simple example: calling isValid() on an address. For better illustration, we will choose the expressive specification form in our example, although we see this rather rarely in projects. The same distinction should be observable in other naming styles as well, though.

We start with a technical formulation, because that’s the form we often see in projects:

isValid_returns_false_when_called_on_an_address_with_missing_street()

The reason why we often see this kind of formulation is probably that it directly reflects the implementation of the test, that is, the separate steps that are executed to verify this behaviour:

  • create an address instance that contains no street
  • call isValid() on that address
  • assert that result is false

In this test name, we refer to a lot of technical details:

  • the method name isValid
  • the fact that the method “returns” something
  • the fact that the method “is called”
  • the primitive value false (and implicitly its type boolean) that is returned
  • and maybe some more

We can hide all this and focus on the functional aspect that is covered by this method by formulating it in another way:

an_address_with_missing_street_is_invalid()

We still test the same behaviour and the test name still describes it very clearly. But there are no technical details in the name. It’s purely the functional aspect that is described. We might say that’s also a question of preference which form to choose. But, in this case, it’s a little more. All those technical aspects might change over time. Imagine that you decide that the isValid method should no longer return a primitive boolean value but a result object instead. The functional test name can stay exactly the same in that case because the functional aspect did not change. It’s the technical implementation that changes and so the test that describes those technical details has to be renamed to reflect the new implementation. Now, one could say “it’s only renaming, that’s easily done”. Yes, that’s true, but no compiler would force us to change a test name. Chances are high that in those cases, we only adapt the test implementation and leave the method name as it is, which leads to misleading and simply wrong test names. And last but not least, the functional form is usually much shorter.
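The refactoring described above can be sketched as follows. This is a minimal, dependency-free illustration, not the implementation from the project: Address, its validate() method and the ValidationResult class are hypothetical names chosen for this sketch, and the test is a plain static method where a real project would use a JUnit test method annotated with @Test. The point is that the functional test name would be exactly the same if validation still returned a primitive boolean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain class; only the street matters for this sketch.
final class Address {
    private final String street;
    private final String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    // State after the assumed refactoring: a result object instead of a boolean.
    ValidationResult validate() {
        List<String> problems = new ArrayList<>();
        if (street == null || street.isBlank()) {
            problems.add("street is missing");
        }
        return new ValidationResult(problems);
    }
}

// Hypothetical result object carrying the reasons for a failed validation.
final class ValidationResult {
    private final List<String> problems;

    ValidationResult(List<String> problems) {
        this.problems = List.copyOf(problems);
    }

    boolean isValid() {
        return problems.isEmpty();
    }

    List<String> problems() {
        return problems;
    }
}

final class AddressTest {
    // The name describes the functional rule only, so it did not have to
    // change when the return type of the validation changed.
    static void an_address_with_missing_street_is_invalid() {
        Address address = new Address(null, "Berlin");

        ValidationResult result = address.validate();

        if (result.isValid()) {
            throw new AssertionError("Address without street must be invalid");
        }
    }
}
```

With the technical name isValid_returns_false_…, the same refactoring would have left us with a name that mentions a method and a return value that no longer exist.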

You might have noticed that we changed another detail in the specification test name: we switched from the usual camelCase to the snake_case format. Although it’s rather unusual in the Java world, in our eyes it improves the readability of those long method names a lot. Depending on the language and tools we use, we might even be able to use spaces within the test names like with Jasmine or Cypress in the JavaScript world, or with languages like Kotlin or Groovy. Other languages are less flexible when it comes to naming tests. The Go language, for example, expects the prefix Test for unit tests. Again, we have to respect the constraints of the tools or languages we use and find valid options to formulate our tests (names) as expressively as possible.

The main reason why the test name in the earlier ShoppingCart example is getting very long is that the test verifies several details. Looking at the full test implementation in the Anatomy of a Good Test post, we can see multiple asserts:

assertEquals(2, shoppingCart.items().size());

assertEquals(1, shoppingCart.items().get(0).quantity());
assertEquals(BigDecimal.valueOf(9.95), shoppingCart.items().get(0).amount());

assertEquals(3, shoppingCart.items().get(1).quantity());
assertEquals(BigDecimal.valueOf(22.50), shoppingCart.items().get(1).amount());

assertEquals(BigDecimal.valueOf(32.45), shoppingCart.subtotalAmount());
assertEquals(BigDecimal.valueOf(3.5), shoppingCart.shippingAmount());
assertEquals(BigDecimal.valueOf(35.95), shoppingCart.totalAmount());

There are details verified by this test that are not even included in the proposed name, like the number of items and their quantity. The more details a test covers, the harder it becomes to include all those in an extensive specification test name. So for more integrated component or end-to-end tests, we will usually have to fall back to the scenario-based naming pattern.
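Where a test can be narrowed to a single aspect, the specification-style name stays short. The following is a minimal sketch under stated assumptions: the ShoppingCart here is a heavily simplified, hypothetical stand-in (its add and itemCount methods are invented for this illustration), and the tests are plain static methods with a hand-rolled check helper so that the example runs without a test framework. The amounts are the ones used in the assertions above.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified cart; not the ShoppingCart from the series.
final class ShoppingCart {
    private final List<BigDecimal> lineAmounts = new ArrayList<>();

    void add(String price, int quantity) {
        lineAmounts.add(new BigDecimal(price).multiply(BigDecimal.valueOf(quantity)));
    }

    int itemCount() {
        return lineAmounts.size();
    }

    BigDecimal subtotalAmount() {
        return lineAmounts.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }
}

final class ShoppingCartTest {
    // In a real project each of these methods would be a JUnit @Test method.
    static void keeps_one_line_item_per_added_article() {
        ShoppingCart cart = cartWithTwoArticles();

        check(cart.itemCount() == 2, "Number of items in ShoppingCart");
    }

    static void subtotal_is_the_sum_of_all_line_amounts() {
        ShoppingCart cart = cartWithTwoArticles();

        // compareTo ignores the scale, so 32.45 and 32.450 count as equal
        check(cart.subtotalAmount().compareTo(new BigDecimal("32.45")) == 0,
              "Subtotal amount");
    }

    // Helper methods go after the tests, as in the class structure shown earlier
    private static ShoppingCart cartWithTwoArticles() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("9.95", 1);
        cart.add("7.50", 3);
        return cart;
    }

    private static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError(description);
        }
    }
}
```

Each name now covers everything its test verifies. This only works for isolated tests, though; a scenario that needs the whole flow to run will still be easier to name after the scenario.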

For other types of tests, for example, performance tests, scenario-based names are often used as well. Sometimes, expressive naming does not seem to be so important because the number of tests is manageable (for example, some projects have only one performance test, which simulates a normal day for the entire platform). Nevertheless, at least initially, it is worthwhile for all test types to think about this carefully.

Failure Messages

We also want to have a brief look at the messages we get if a test fails. Although it is not really a naming topic, it supplements the name of the test and addresses the problem of “what to do when a specific test becomes red”. For very focused tests that verify only a single aspect and have expressive names, we can probably ignore those messages. The name of the failing test already tells us what’s going on. But for tests with a broader focus that verify (assert) several things, the failure messages have to tell us which of those did not match the expectation.

In most test frameworks, the default failure message of an assertion at least says what was expected and what was actually found. For example, the default failure message of the JUnit 5 assertEquals assertion looks as follows:

org.opentest4j.AssertionFailedError:
Expected :2
Actual   :0

<Stack trace>

We got this from our addTwoArticles test. Knowing that, could you tell what exactly went wrong? Obviously not. Neither the test name nor the failure message gives us a clear hint. We have to jump into the test implementation, to the line we find somewhere in the stack trace below the failure message, to see what exactly was asserted.

assertEquals(2, shoppingCart.items().size());

By adding a short description of the assertion we can improve the failure message a lot.

assertEquals(2, shoppingCart.items().size(), "Number of items in ShoppingCart");

org.opentest4j.AssertionFailedError: Number of items in ShoppingCart ==>
Expected :2
Actual   :0

Now, we immediately see what’s the problem and can directly jump to our ShoppingCart implementation to analyze and fix it.
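To make the mechanism visible without a test framework on the classpath, here is a small, hand-rolled sketch. The DescribedAssertions class and its assertEquals method are assumptions made for this illustration; they only imitate the role that the message parameter of the JUnit 5 assertion plays in the example above.

```java
import java.util.List;
import java.util.Objects;

final class DescribedAssertions {
    // Hand-rolled stand-in for an assertion that accepts a description.
    // The description is put in front of the expected and actual values.
    static void assertEquals(Object expected, Object actual, String description) {
        if (!Objects.equals(expected, actual)) {
            throw new AssertionError(
                description + " ==> expected: <" + expected + "> but was: <" + actual + ">");
        }
    }

    public static void main(String[] args) {
        List<String> items = List.of(); // simulates a cart that lost its items
        try {
            assertEquals(2, items.size(), "Number of items in ShoppingCart");
        } catch (AssertionError e) {
            // prints: Number of items in ShoppingCart ==> expected: <2> but was: <0>
            System.out.println(e.getMessage());
        }
    }
}
```

In a test with several assertions, giving each one its own description is what lets us tell from the report alone which expectation was violated.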

Depending on the test framework and any additional libraries we use, we might get better or worse default failure messages. It’s a good idea to see a test fail while we’re implementing it, and not only because of the failure message. When it fails, we can verify whether the resulting failure message, along with the name of the test, tells us what the problem is. If not, we should think about improving the test to get a more precise failure message.

Always keep in mind: the test that we currently write might fail in a couple of months, maybe right before an important release, and we might be the ones that have to fix it.

Summary

Organizing all the tests for our systems is a difficult problem. Such an organization needs to help us understand what tests we have, and where they are. Naming the tests is probably even more difficult. Names should be expressive enough to tell us what functionalities our system already provides. They also need to provide us with as much information as possible when they turn red. The same goes for failure messages. In addition, we need to keep all of the above consistent and cohesive, also with the production code. All that requires a lot of attention, but can greatly help us work with the code.

Many thanks to Joachim Praetorius for his feedback and suggestions to improve this post.