We never have enough time for testing, so let’s just write the test first.

—Kent Beck


Test-First is a Built-In Quality practice derived from Extreme Programming (XP) that recommends building tests before writing code to improve delivery by focusing on the intended results.

Agile testing differs from the big-bang, test-at-the-end approach of traditional development. Instead, code is developed and tested in small increments, often with the development of the test itself ahead of writing the code. This way, tests help elaborate and better define the intended system behavior even before the system is coded. Quality is built in from the beginning. This just-in-time approach to elaboration of the proposed system behavior also mitigates the need for the long, detailed requirement specifications and sign-offs that are often used in traditional software development to control quality. Even better, these tests, unlike traditional requirements, are automated wherever possible. And when they’re not, they still provide a definitive statement of what the system actually does, rather than a statement of early thoughts about what it was supposed to do.

This article describes a comprehensive approach to Agile testing, and testing first, based on Brian Marick’s four-quadrant Agile Testing Matrix. Quadrants one and two define the tests that support the development of the system and its intended behavior; they are described in this article. Quadrants three and four are more fully described in the “Release on Demand” and “Nonfunctional Requirements” (NFRs) articles, respectively.

Note: Of course, not all these tests can be written or executed first, but we believe that test-first captures the proper sentiment!


Not a one-time or end-game event, Agile testing is a continuous process. It’s integral to Lean and built-in quality. In other words, Agile Teams and Agile Release Trains (ARTs) can’t go fast without endemic quality, and they can’t achieve that quality without continuous testing and, wherever possible, testing first.

The Agile Testing Matrix

XP proponent and Agile Manifesto coauthor Brian Marick helped pioneer Agile testing by describing a matrix that guides the reasoning behind such tests. This matrix was further developed in Agile Testing and extended for the scaled Agile paradigm in Agile Software Requirements [1, 2].

Figure 1 describes and extends the original matrix with guidance on what to test and when.

Figure 1. Agile Testing Matrix

The horizontal axis of the matrix contains business- or technology-facing tests. Business-facing tests are understandable by the user and are described in business language. Technology-facing tests are written in the language of the developer and are used to evaluate whether the code delivers the behaviors the developer intended.

The vertical axis contains tests supporting development (evaluating internal code) or critiquing the solution (evaluating the system against the user’s requirements).

Classification into these four quadrants (Q1 – Q4) enables a comprehensive testing strategy that helps ensure quality:

  • Q1 – Contains unit and component tests. To confirm that the system works as intended, developers write these automated tests to run before and after code changes.
  • Q2 – Contains functional tests (user acceptance tests) for Stories, Features, and Capabilities, to validate that they work the way the Product Owner (or Customer/user) intended. Feature- and capability-level acceptance tests validate the aggregate behavior of many user stories. Teams automate these tests whenever possible and use manual tests only when there is no other choice.
  • Q3 – Contains system-level acceptance tests to validate that the behavior of the whole system meets usability and functionality requirements, including scenarios that may be encountered in actual use. These may include:
    • Exploratory testing
    • User acceptance testing
    • Scenario-based testing
    • Final usability testing

Because they involve users and testers engaged in actual or simulated deployment and scenarios, these tests are often done manually. They’re used for final system validation and are required before delivery to the end user.

  • Q4 – Contains system qualities testing to verify the system meets its nonfunctional requirements, as exhibited in part by Enabler tests. They are typically supported by a suite of automated testing tools, such as load and performance, designed specifically for this purpose. Since any system changes can violate conformance with NFRs, they must be run continuously, or at least whenever it’s practical.

Quadrants one and two test the functionality of the system. When the tests are developed before the code is committed, it’s described as test-first. These methods include both test-driven development (TDD) and acceptance test–driven development (ATDD). Both use test automation to support continuous integration, team velocity, and development effectiveness. Quadrants one and two are described below. Quadrants three and four are described in the companion articles, Release on Demand and Nonfunctional Requirements, respectively.

Test-Driven (Test-First) Development

Beck and others have defined a set of XP practices described under the umbrella label of test-driven development. As described below, the focus is on writing the unit test before the code [3]:

  • Write the test first. This ensures that the developer understands the required behavior of the new code.
  • Run the test and watch it fail. Because there is no code to be tested yet, this may seem silly initially, but it accomplishes two useful objectives: It tests the test itself and any harnesses that hold the test in place. It also illustrates how the system will fail if the code is incorrect.
  • Write the minimum amount of code needed to pass the test. If the test fails, rework the code or the test as necessary until a module is created that routinely passes the test.
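The three steps above can be sketched with Python's built-in unittest module. This is only an illustration under stated assumptions: apply_discount and its discount rules are hypothetical, invented for this example, and the small run_tests helper exists only so that the failing and passing runs can be shown side by side.

```python
import unittest


# Step 1: write the test first. It states the required behavior of
# apply_discount, a hypothetical function that does not exist yet.
class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_rejects_negative_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, -5)


def run_tests(test_case):
    """Run one TestCase class and return the collected unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    return result


# Step 2: run the tests and watch them fail. With no implementation yet,
# both tests end in NameError, which also exercises the harness itself.
red = run_tests(ApplyDiscountTest)


# Step 3: write the minimum amount of code needed to pass the tests.
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


green = run_tests(ApplyDiscountTest)
```

Here red.wasSuccessful() is False and green.wasSuccessful() is True, mirroring the fail-then-pass rhythm of the cycle.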

In XP, this practice was designed primarily to operate in the context of unit tests, which are developer-written tests (also code) that evaluate the classes and methods that are used. These are a form of ‘white-box testing,’ because they test the internals of the system and the various code paths that may be executed. Pair work is used extensively as well; when two sets of eyes have seen the code and the tests, it’s probable that the module is high quality. Even when not pairing, the test is the first ‘other set of eyes’ that sees the code. Developers note that they often refactor the code in order to pass the test as easily and elegantly as possible. This is quality at the source—one of the main reasons that SAFe relies on TDD.

Unit Tests

Most TDD is done in the context of unit testing, which prevents quality assurance (QA) and test personnel from spending most of their time finding and reporting on code-level bugs. This allows additional focus on system-level testing challenges, where more complex behaviors are found based on the interactions between unit code modules. To support this, the open source community has built unit testing frameworks to cover most languages and technologies, including Java, C, C#, C++, XML, HTTP, and Python. As a result, unit testing frameworks now exist for the languages and coding constructs a developer is most likely to encounter. These frameworks provide a harness for the development and maintenance of unit tests and for automatically executing them against the system under development.

Because unit tests are written before or concurrently with the code, and their frameworks include test execution automation, unit testing can be accomplished within the Iteration. Moreover, the unit test frameworks hold and manage the accumulated unit tests. As a result, regression testing automation for unit tests is largely free for the team. Unit testing is a cornerstone of software agility, and any investment made in comprehensive unit testing will be well rewarded with quality and productivity.
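As a rough sketch of how a framework holds the accumulated tests and reruns them as a regression suite, the example below uses Python's unittest. The slugify and word_count functions are hypothetical units invented for the illustration, as is the run_regression helper.

```python
import unittest


def slugify(title):
    """Hypothetical unit under test: turn a title into a URL slug."""
    return "-".join(title.lower().split())


def word_count(text):
    """Hypothetical unit under test: count whitespace-separated words."""
    return len(text.split())


class SlugifyTest(unittest.TestCase):
    # Written when slugify was first built.
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Test First"), "test-first")

    # Added in a later iteration; the older test keeps running unchanged.
    def test_collapses_repeated_spaces(self):
        self.assertEqual(slugify("Agile   Testing  Matrix"), "agile-testing-matrix")


class WordCountTest(unittest.TestCase):
    def test_counts_words(self):
        self.assertEqual(word_count("quality at the source"), 4)


def run_regression(*test_cases):
    """Load every accumulated TestCase into one suite and run all of it."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(tc) for tc in test_cases)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_regression(SlugifyTest, WordCountTest)
```

In a real project the suite would more likely be collected by the framework's own discovery, for example python -m unittest discover, so that each new test joins the regression run without extra wiring.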

Component Tests

Similarly, teams use component tests to evaluate larger-scale components of the system. Many of these are present in various architectural layers, where they provide services needed by features or other components. Testing tools and practices for implementing component tests vary. For example, unit testing frameworks can hold arbitrarily complex tests written in the framework language (Java, C, C#, and so on). As a result, many teams use their unit testing frameworks to build component tests. They may not even think of them as separate functions, as it’s simply part of their testing strategy. In other cases, developers may incorporate other testing tools or write fully customized tests in any language or environment that is most productive for them to test larger system behaviors. These tests are automated as well, where they serve as a primary defense against unanticipated consequences of refactoring and new code.
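One way a team might hold a component test in its unit testing framework is sketched below. OrderService and InMemoryStock are hypothetical classes made up for this example. The test drives the component through its public interface, with an in-memory stand-in for the storage layer, rather than checking individual methods in isolation.

```python
import unittest


class InMemoryStock:
    """Hypothetical storage layer standing in for a real database."""

    def __init__(self, levels):
        self._levels = dict(levels)

    def available(self, sku):
        return self._levels.get(sku, 0)

    def remove(self, sku, quantity):
        self._levels[sku] = self.available(sku) - quantity


class OrderService:
    """Hypothetical component that features or other components would call."""

    def __init__(self, stock):
        self._stock = stock

    def place_order(self, sku, quantity):
        # Refuse empty orders and orders larger than the available stock.
        if quantity <= 0 or self._stock.available(sku) < quantity:
            return False
        self._stock.remove(sku, quantity)
        return True


class OrderServiceComponentTest(unittest.TestCase):
    def setUp(self):
        self.stock = InMemoryStock({"widget": 3})
        self.service = OrderService(self.stock)

    def test_order_reduces_stock(self):
        self.assertTrue(self.service.place_order("widget", 2))
        self.assertEqual(self.stock.available("widget"), 1)

    def test_order_beyond_stock_is_refused_and_changes_nothing(self):
        self.assertFalse(self.service.place_order("widget", 5))
        self.assertEqual(self.stock.available("widget"), 3)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderServiceComponentTest)
result = unittest.TestResult()
suite.run(result)
```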

Acceptance Test–Driven Development

Quadrant two of the Agile Testing Matrix shows that test-first applies as well to testing stories, features, and capabilities as it does to unit testing. After all, the goal is to have the whole system, and not simply the code, work as intended. This is called acceptance test–driven development. And whether it’s adopted formally or informally, many teams simply find it more efficient to write the acceptance test first, before developing the code. Pugh notes that the emphasis is more on expressing requirements in unambiguous terms than on focusing on the test per se [4]. He further observes that there are three alternative labels to this detailing process: ATDD, specification by example, and behavior-driven design. There are some slight differences in these versions, but they all emphasize understanding requirements before implementation. In particular, specification by example suggests that Product Owners should provide examples, as they often do not write the acceptance tests themselves.

Whether it’s viewed as a form of requirements expression or as a test, the understanding is that the result is the same. Acceptance tests serve to record the decisions made in the conversation between the team and the Product Owner, so that the team understands the specifics of the intended behavior the story represents. (See the 3Cs in the “Writing Good Stories” section of Story, referring to card, conversation, and confirmation.)

Functional Tests

Story acceptance tests are functional tests that confirm each newly implemented user story delivers its intended behavior. This testing is performed during the iteration. If all the new stories work as intended, then it’s likely that each new increment of software will ultimately satisfy the needs of the users.

Feature and capability acceptance testing is performed during the course of a Program Increment. The tools used are generally the same, but these tests operate at the next level of abstraction, typically showing how some stories work together to deliver a larger value to the user. Of course, there can easily be multiple feature acceptance tests associated with a more complex feature. And the same goes for stories. This way, there is strong verification that the system works as intended at the feature, capability, and story levels.

The following are characteristics of functional tests. They are:

  • Written in the language of the business
  • Developed in a conversation between developers, testers, and the Product Owner
  • ‘Black-box tested’ to verify only the outputs of the system and to meet conditions of satisfaction, without concern for how the result is achieved
  • Run during the course of the iteration in which the story is implemented
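These characteristics can be illustrated with a small story acceptance test. The story, the $50 threshold, the 4.99 rate, and the shipping_cost function are all hypothetical, chosen only for this sketch. The test reads in business terms (given, when, then) and checks outputs only, with no knowledge of how the result is computed.

```python
import unittest


def shipping_cost(order_total):
    """Hypothetical system under test; the acceptance test never looks inside."""
    return 0.0 if order_total >= 50.0 else 4.99


class FreeShippingStoryAcceptanceTest(unittest.TestCase):
    """Hypothetical story: as a shopper, I get free shipping on orders of $50 or more."""

    def test_order_at_threshold_ships_free(self):
        # Given an order totaling exactly $50.00
        order_total = 50.00
        # When the shipping cost is calculated
        cost = shipping_cost(order_total)
        # Then shipping is free
        self.assertEqual(cost, 0.0)

    def test_order_below_threshold_pays_standard_rate(self):
        # Given an order totaling $49.99
        order_total = 49.99
        # When the shipping cost is calculated
        cost = shipping_cost(order_total)
        # Then the standard rate applies
        self.assertEqual(cost, 4.99)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FreeShippingStoryAcceptanceTest)
result = unittest.TestResult()
suite.run(result)
```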

Although everyone can write tests, the Product Owner, as the Business Owner/customer proxy, is generally responsible for the efficacy of the tests. If a story does not pass its test, the team gets no credit for that story, and it’s carried over into the next iteration, when the code and/or the test are reworked until the story passes.

Features, capabilities, and stories cannot be considered done until they pass one or more acceptance tests. Stories realize the intended features and capabilities. And there can be more than one test associated with a particular feature, capability, or story.

Automating Acceptance Testing

Because acceptance tests run at a level above the code, there are a variety of approaches to executing them, including handling them as manual tests. However, manual tests pile up very quickly: the faster you go, the faster they grow, and the slower you go. Eventually, the number of manual tests required to run a regression slows down the team and causes delays in value delivery.

Teams know that to avoid this they have to automate most of their acceptance tests, and they use a variety of tools to do so. The tests may be written in the target programming language (for example, Perl, Groovy, or Java) or in natural language as supported by specific testing frameworks, such as Robot Framework or Cucumber. Alternatively, teams may use table formats as supported by the Framework for Integrated Test (FIT). The preferred approach is to take a high level of abstraction that works against the business logic of the application. This prevents the presentation layer or other implementation details from hampering the test.
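The table-based idea can be approximated in plain Python. This sketch does not use FIT, Robot Framework, or Cucumber. It only imitates the table format, and the loyalty_points rule, the table rows, and the run_table helper are hypothetical. Each row is one example agreed with the Product Owner, and the runner calls the business logic directly rather than going through a presentation layer.

```python
def loyalty_points(order_total, is_member):
    """Hypothetical business rule: 1 point per whole dollar, doubled for members."""
    return int(order_total) * (2 if is_member else 1)


# Each row reads: order total, membership flag, expected points.
ACCEPTANCE_TABLE = [
    (10.00, False, 10),
    (10.00, True, 20),
    (0.99, True, 0),
    (25.50, False, 25),
]


def run_table(table):
    """Check every row against the business logic and collect the mismatches."""
    failures = []
    for row in table:
        *inputs, expected = row
        actual = loyalty_points(*inputs)
        if actual != expected:
            failures.append((inputs, expected, actual))
    return failures


failures = run_table(ACCEPTANCE_TABLE)
```

An empty failures list means every example in the table passes. Adding a new business example is then a one-line change to the table, not a new test function.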

Acceptance Test Template/Checklist

An ATDD checklist can help the team consider a simple list of things to do, review, and discuss each time a new story appears. Agile Software Requirements provides an example of a story acceptance-testing checklist [2].

Learn More

[1] Crispin, Lisa and Janet Gregory. Agile Testing: A Practical Guide for Testers and Agile Teams. Addison-Wesley, 2009.

[2] Leffingwell, Dean. Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley, 2011.

[3] Beck, Kent. Test-Driven Development. Addison-Wesley, 2003.

[4] Pugh, Ken. Lean-Agile Acceptance Test-Driven Development: Better Software Through Collaboration. Addison-Wesley, 2011.

Last update: 27 October, 2017