Learn how Stryker4s uses mutation switching to improve performance.
We are very happy with Stryker's new friends! One of those new friends is Stryker4s(cala): Scala developers can now use mutation testing to improve their tests! Creating a mutation testing framework for Scala comes with many challenges, and one of them is the Scala compiler itself. We all know it's not the fastest of its kind, while one of Stryker's main goals is to be fast. This means we need an intelligent way to introduce mutants into the source code.
One way of introducing mutants into a codebase is to mutate one statement, compile the code, run the tests, gather the results and repeat. This seems like a logical choice because it mimics the way a developer would go about it.
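That loop can be sketched in Scala as follows. This is a minimal sketch, not Stryker4s code: compile and testsPass are hypothetical hooks standing in for a real compiler invocation and a real test run. The point it shows is that the number of compilations grows with the number of mutants.

```scala
object CompileEachMutant {
  // Outcome for one mutant: killed means at least one test failed while it was applied.
  final case class MutantResult(id: Int, killed: Boolean)

  // compile and testsPass are hypothetical hooks, not Stryker4s APIs.
  def run(
      mutatedSources: List[String],
      compile: String => Unit,
      testsPass: () => Boolean
  ): List[MutantResult] =
    mutatedSources.zipWithIndex.map { case (source, id) =>
      compile(source) // one full compilation for every single mutant
      MutantResult(id, killed = !testsPass())
    }
}
```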
Let's look at an example.
As you can see, there are three possible mutants:
If we apply the mutations one by one, we need to compile the code base three times. Assuming a compile time of 10 seconds for this program, that already adds up to 30 seconds of compile time for one full mutation run. This quickly gets out of hand as the codebase grows and produces more mutants.
As you might know, Scala gets compiled to Java bytecode. This gives us an alternative way to introduce mutations into a codebase: we could mutate the bytecode directly, eliminating the need to recompile.
The main challenge with this approach is that Scala doesn't guarantee the same bytecode output across compiler versions (or even JDK versions). Even the jump from Scala 2.12 to 2.13 produces different bytecode. This would make manipulating bytecode complicated, unpredictable and hard to maintain.
Furthermore, if you mutate the bytecode, it can be difficult to trace a mutant back to the exact Scala code that you changed, because details like the exact location in the source are not represented in bytecode. Scala makes this extra challenging, as one .scala file can easily result in 100 .class files.
Performance-wise, mutating bytecode might sound like a fast solution, but you would still need to load (or hot-reload) the mutated class files for each mutant.
There should be a better solution out there, right?
Mutation switching to the rescue! So how is mutation switching both faster and more reliable than compiling each mutation or mutating bytecode? The steps are quite similar to "Compiling each mutant", but with some big differences:
- All mutants are identified for the whole codebase.
- All mutants are applied to the codebase at the same time using a Scala pattern match.
- All mutants are tested one by one, with only one mutant active at a time, using an environment variable.
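The three steps above can be sketched as a runner loop. The hooks are hypothetical: compileOnce stands for compiling the code base that already contains every mutant, and testsPass(active) stands for running the test suite with one mutant switched on, which the post does through an environment variable. Compilation now happens once, no matter how many mutants there are.

```scala
object MutationSwitchingRunner {
  // Outcome for one mutant: killed means at least one test failed while it was active.
  final case class MutantResult(id: Int, killed: Boolean)

  // compileOnce and testsPass are hypothetical hooks, not Stryker4s APIs.
  def run(
      mutantIds: List[Int],
      compileOnce: () => Unit,
      testsPass: Option[Int] => Boolean
  ): List[MutantResult] = {
    compileOnce() // a single compilation covers every mutant
    // Test mutants one by one, with exactly one mutant switched on per run.
    mutantIds.map(id => MutantResult(id, killed = !testsPass(Some(id))))
  }
}
```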
Step 2 is where the magic happens. Let's take a look at the same code example as used previously, right after the mutations are applied.
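The post's original listing isn't reproduced here. As a hypothetical stand-in, take a method isAdult(age: Int) with the condition age >= 18; an equality-operator mutator could turn >= into >, < or ==, giving three mutants. With all of them applied through one pattern match, the instrumented method could look like the sketch below. The environment variable name ACTIVE_MUTATION and the string ids are assumptions for illustration, and the optional parameter only exists so the sketch can be tested without touching the environment.

```scala
object AgeCheck {
  // Instrumented version of the hypothetical `def isAdult(age: Int) = age >= 18`.
  // By default the active mutant is read from the (assumed) ACTIVE_MUTATION
  // environment variable; passing it explicitly keeps the sketch testable.
  def isAdult(age: Int, activeMutation: Option[String] = sys.env.get("ACTIVE_MUTATION")): Boolean =
    activeMutation match {
      case Some("0") => age > 18  // mutant 0: >= replaced by >
      case Some("1") => age < 18  // mutant 1: >= replaced by <
      case Some("2") => age == 18 // mutant 2: >= replaced by ==
      case _         => age >= 18 // no mutant active: the original code
    }
}
```

With this method, a single test asserting isAdult(18) kills mutants 0 and 1 but lets mutant 2 (age == 18) survive; adding a check such as isAdult(30) kills it as well.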
All possible mutations are implemented in the pattern match. An identifier is used to turn specific mutations on or off, in other words to switch them. The default case is used when none of the mutants are active. Now the code base only needs to be compiled once. The extra compilation time caused by the larger code base is small compared to the overhead of compiling each mutant separately. For example, even if compiling this code base took 15 seconds instead of 10, we would still save 15 seconds compared to the 30 seconds needed to compile each mutation.
We gain performance without losing flexibility. It's a win-win scenario.
Mutation switching sure is great, but let's take a look at a more complex example.
With this code base, calls such as filterNot could be mutated to their counterparts (for example, filterNot to filter).
If we implemented the pattern match directly at the position of each mutated call, we would get the following code base.
Because we wrapped the functions right on the spot, we produced code that doesn't even compile!
To make the code compile, we need to take a closer look at the abstract syntax tree (AST).
In this abstract syntax tree, we search for the parent statement of the mutated calls, which is numbers in our case.
If we implement mutation switching using the parent statement, we can generate the following code.
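The generated code from the post isn't reproduced here, so the sketch below models the idea on a hypothetical miniature AST. The case classes Name, Raw, Select, Apply and Switch are inventions for illustration (Stryker4s itself works on Scalameta trees), and the filter/filterNot swap rule and the ACTIVE_MUTATION name are assumptions. Instead of wrapping a single call in a pattern match, which can leave a match expression where Scala expects a method name, it builds a mutated copy of the whole parent statement for each mutant and switches between those complete statements.

```scala
// Hypothetical miniature AST; not Scalameta and not Stryker4s code.
sealed trait Term
final case class Name(value: String) extends Term
final case class Raw(code: String) extends Term // lambda bodies etc. kept as plain text
final case class Select(qual: Term, method: String) extends Term
final case class Apply(fun: Term, args: List[Term]) extends Term
final case class Switch(cases: List[(Int, Term)], default: Term) extends Term

object ParentStatementSwitch {
  // Assumed mutation rule for this sketch: swap filter <-> filterNot.
  private val swaps = Map("filter" -> "filterNot", "filterNot" -> "filter")

  // Every copy of `t` that differs from it by exactly one swapped method name.
  def mutants(t: Term): List[Term] = t match {
    case Select(q, m) =>
      swaps.get(m).map(Select(q, _)).toList ++ mutants(q).map(Select(_, m))
    case Apply(f, args) =>
      mutants(f).map(Apply(_, args)) ++
        args.indices.toList.flatMap(i => mutants(args(i)).map(a => Apply(f, args.updated(i, a))))
    case _ => Nil
  }

  // Switch over whole statements; the original statement becomes the default case.
  def switchOn(statement: Term): Term = mutants(statement) match {
    case Nil => statement
    case ms  => Switch(ms.zipWithIndex.map(_.swap), statement)
  }

  // Pretty-prints a term as Scala source text.
  def render(t: Term): String = t match {
    case Name(v)        => v
    case Raw(code)      => code
    case Select(q, m)   => s"${render(q)}.$m"
    case Apply(f, args) => s"${render(f)}(${args.map(render).mkString(", ")})"
    case Switch(cases, default) =>
      val arms = cases.map { case (i, m) => s"""case Some("$i") => ${render(m)}""" }
      (arms :+ s"case _ => ${render(default)}").mkString(
        "sys.env.get(\"ACTIVE_MUTATION\") match { ", "; ", " }")
  }
}
```

For the statement xs.filter(_ > 2).filterNot(_ == 3), switchOn yields one case per mutated statement plus a default case holding the original, and every branch is an ordinary, compilable expression.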
This enables us to get clean, readable pattern matches and avoid compilation errors.
With mutation switching in place, the road is clear for even bigger performance improvements. Right now, we don't keep the test process alive; we simply run sbt test with the correct mutant switched on.
Keeping the test process alive and rerunning the tests after switching mutants is where we can really put the pedal to the metal!
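One detail matters here: the JVM offers no supported way for a running process to change its own environment variables, so a long-lived test process needs the active mutant stored somewhere in memory instead. The sketch below illustrates that idea under assumptions: ActiveMutant is a hypothetical in-memory holder that instrumented code reads instead of sys.env, and Instrumented is a hypothetical age >= 18 check with three mutants. It is an illustration of the concept, not Stryker4s code.

```scala
// Hypothetical in-memory switch; @volatile so other threads see updates.
object ActiveMutant {
  @volatile var current: Option[Int] = None
}

// Hypothetical instrumented code for `age >= 18`, switched through ActiveMutant.
object Instrumented {
  def isAdult(age: Int): Boolean = ActiveMutant.current match {
    case Some(0) => age > 18
    case Some(1) => age < 18
    case Some(2) => age == 18
    case _       => age >= 18 // no mutant active: the original code
  }
}

object InProcessRunner {
  // Reruns the same test suite in this process once per mutant.
  // Returns mutant id -> killed (true if the suite failed with that mutant active).
  def run(mutantIds: List[Int], testsPass: () => Boolean): Map[Int, Boolean] =
    try {
      mutantIds.map { id =>
        ActiveMutant.current = Some(id) // switch mutants without starting a new process
        id -> !testsPass()
      }.toMap
    } finally {
      ActiveMutant.current = None // always restore the original behavior
    }
}
```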
By combining mutation switching with traversal to the parent statements, Stryker4s is able to apply mutations to the codebase in a clean and understandable fashion and keep the chances of compilation errors to a minimum. We hope this blog gave some insight into how mutation switching works and how Stryker4s uses it to its advantage. Happy mutating!