In the previous
section, we developed the score function iteratively at the REPL and saw it
work correctly with a few example inputs. It doesn’t take much commitment to
quality to want to do more validation than that! Let’s begin by teasing apart
some of the things that people mean when they say “testing.”
Testing includes
the following:
·
Thinking through whether the code is correct
·
Stepping through the code in a development
environment where you can
see everything that is happening
·
Crafting inputs to cover the various code paths
·
Crafting outputs to match the crafted inputs
·
Running the code with various inputs
·
Validating the results for correctness
·
Automating the validation of results
·
Organizing tests so that they can be
automatically run for regression purposes in the future
This is hardly an
exhaustive list, but it suffices to make the point of this section. In short,
testing is often complex, but it can be simple. Traditional unit-testing
approaches complect many of the testing tasks listed earlier. For example,
input, output, execution, and validation tend to be woven together inside
individual test methods. On the other hand, the minimal REPL testing we did
before simply isn’t enough. Can we get the benefit of some of the previous
ideas of testing, without the complexity of unit testing? Let’s try.
Crafting Inputs
We have already
seen the score function work for a few handcrafted inputs. How many inputs do
we need to convince ourselves the function is correct? In a perfect world, we
would just test all the inputs, but that is almost always computationally
infeasible. But we are lucky in that the problem of scoring the game is
essentially the same for variants with a different number of colors or a
different number of pegs. Given that, we actually can generate all possible
inputs for a small version of the game.
The branch of
math that deals with the different ways of forming patterns is called
enumerative combinatorics. It turns out that the Clojure library math.combinatorics
has the functions we need to generate all possible inputs.
Add the following
form under the :dependencies key in your project.clj file, if it is not already
present:
[org.clojure/math.combinatorics "0.0.1"]
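In context, that coordinate sits inside the :dependencies vector. Here is a
minimal project.clj sketch; the project name and the other version numbers are
placeholders, so adjust them to match your project:
(defproject clojurebreaker "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [org.clojure/math.combinatorics "0.0.1"]])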
The selections
function takes two arguments (a collection and a size), and it returns every
ordered arrangement of that size made up of elements from that collection,
with repetition allowed.
Try it for a tiny
version of Clojurebreaker with only three colors and two columns:
(require '[clojure.math.combinatorics :as comb])
(comb/selections [:r :g :b] 2)
-> ((:r :r) (:r :g) (:r :b)
(:g :r) (:g :g) (:g :b)
(:b :r) (:b :g) (:b :b))
So, selections
can give us a possible secret or a possible guess. What about generating inputs
to the score function? Well, that is just selecting two selections from the
selections:
(-> (comb/selections [:r :g :b] 2)
(comb/selections 2))
-> (81 pairs of game positions omitted for brevity)
Let’s put that
into a named function:
clojurebreaker/src/clojurebreaker/game.clj
(defn generate-turn-inputs
"Generate all possible turn inputs for a clojurebreaker game
with colors and n
columns"
[colors n]
(-> (comb/selections colors n)
(comb/selections 2)))
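For this listing to compile, game.clj must require the combinatorics namespace
under the comb alias. The ns form is not shown in the original listing; a
minimal sketch, with the namespace name inferred from the file path:
(ns clojurebreaker.game
  (:require [clojure.math.combinatorics :as comb]))
As a quick sanity check at the REPL, a three-color, two-column game has 3^2 = 9
possible codes, so there should be 9^2 = 81 turn inputs:
(count (generate-turn-inputs [:r :g :b] 2))
-> 81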
All right, inputs
generated. We are going to skip thinking about outputs (for reasons that will
become obvious in a moment) and turn our attention to running the scorer with
our generated inputs.
Running a Test
We are going to
write a function that takes a sequence of inputs and reports a sequence of
inputs and the result of calling score. We don’t want to commit (yet) to how
the results of this test run will be validated. Maybe a human will read it.
Maybe a validator program will process the results. Either way, a good
representation of each result might be a map with the keys :secret, :guess,
and :score.
All this function
needs to do is call score and build the collection of responses:
clojurebreaker/src/clojurebreaker/game.clj
(defn score-inputs
"Given a sequence of turn inputs, return a lazy sequence of
maps with :secret, :guess, and :score."
[inputs]
(map
(fn [[secret guess]]
{:secret (seq secret)
:guess (seq guess)
:score (score secret guess)})
inputs))
Try it at the
REPL:
(->> (generate-turn-inputs [:r :g :b] 2)
(score-inputs))
-> ({:secret (:r :r), :guess (:r :r),
:score {:exact 2, :unordered 0}}
{:secret (:r :r), :guess (:r :g),
:score {:exact 1, :unordered 0}}
;; remainder omitted for brevity
If a human is
going to be reading the test report, you might decide to format a text table
instead, using print-table. While we are at it, let’s generate a bigger
game (four colors by four columns) and print the table to a file:
(use 'clojure.pprint)
(require '[clojure.java.io :as io])
(with-open [w (io/writer "scoring-table")]
(binding [*out* w]
(print-table (->> (generate-turn-inputs [:r :g :b :y] 4)
(score-inputs)))))
-> nil
If you look at
the scoring-table file, you should see 65,536 different secret/guess
combinations and their scores.
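If you would rather not eyeball the file to confirm that count, counting the
generated inputs at the REPL is a quick check: four colors and four columns
give 4^4 = 256 possible codes, and 256^2 = 65,536 secret/guess pairs.
(count (generate-turn-inputs [:r :g :b :y] 4))
-> 65536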
Validating Outputs
At this point, it
is obvious why we skipped crafting the outputs. The program has done that for us.
We just have to decide how much effort to spend validating them. Here are some
approaches we might take:
·
Have a human code-breaker expert read the entire
output table for a small variant of the game. This has the advantage of being
exhaustive but might miss logic errors that show up only in a larger game.
·
Pick a selection of results at random from a
larger game and have a human expert verify those (a sketch follows this list).
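Here is a minimal sketch of the second approach. The sample size of ten is
arbitrary, and rand-nth samples with replacement, which is fine for a spot
check:
(let [results (vec (score-inputs (generate-turn-inputs [:r :g :b :y] 4)))]
  (repeatedly 10 #(rand-nth results)))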
Because the
validation step is separated from generating inputs and running the program, we
can design and write the various steps independently, possibly at separate
times. Moreover, the validator knows nothing about how the inputs were
generated. With unit tests, the inputs and outputs come from the same
programmer’s brain at about the same time. If that programmer is systematically
mistaken about something, the tests simply encode mistakes as truth. This is
not possible when the outputs to validate are chosen exhaustively or randomly.
We will return to
programmatic validation later, but first let’s turn to regression testing.
Regression Testing
How would you
like to have a regression suite that is more thorough than the validation
effort you have made? No problem.
·
Write a program whose results should not change.
·
Run the program once, saving the results to a
(well-named!) file.
·
Run the program again every time the program
changes, comparing with the saved file.
If anything is different, the program is
broken.
The nice thing
about this regression approach is that it works even if you never did any
validation of results. Of course, you should still do validation, because it
will help you narrow down where a problem happened. (With no validation, the
regression error might just be telling you that the old code was broken and the
new code fixed it.)
How hard is it to write a program that should produce exactly the same output?
Call only pure
functions from the program, which is exactly what our score-inputs function
does.
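As a sketch, such a regression check can be as simple as regenerating the
table into a string and comparing it against the saved file. The
regression-ok? name is hypothetical, and the code assumes print-table is still
referred from clojure.pprint and that scoring-table was written by a
known-good run:
(defn regression-ok? []
  (= (slurp "scoring-table")
     (with-out-str
       (print-table (->> (generate-turn-inputs [:r :g :b :y] 4)
                         (score-inputs))))))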
Wiring this kind
of regression test into a continuous integration build is not difficult. If you
do it, think about contributing it to whatever testing framework you use.
Now we have
partially answered the question, “How do I make sure that my code is correct?”
In summary:
·
Build with small, composable pieces (most should
be pure functions).
·
Test forms from the inside out at the REPL.
·
When writing test code, keep input generation,
execution, and output validation as separate steps.
This last idea is
so important that it deserves some library support. So, before we move on, we
are going to introduce test.generative, a library that aspires to bring
simplicity to testing.