Thursday, April 26, 2012

Testing the Scorer of a Clojurebreaker Game


In the previous section, we developed the score function iteratively at the REPL and saw it work correctly with a few example inputs. It doesn’t take much commitment to quality to want to do more validation than that! Let’s begin by teasing apart some of the things that people mean when they say “testing.”
Testing includes the following:
·        Thinking through whether the code is correct
·        Stepping through the code in a development environment where you can see everything that is happening
·        Crafting inputs to cover the various code paths
·        Crafting outputs to match the crafted inputs
·        Running the code with various inputs
·        Validating the results for correctness
·        Automating the validation of results
·        Organizing tests so that they can be automatically run for regression purposes in the future
This is hardly an exhaustive list, but it suffices to make the point of this section. In short, testing is often complex, but it can be simple. Traditional unit-testing approaches complect many of the testing tasks listed earlier. For example, input, output, execution, and validation tend to be woven together inside individual test methods. On the other hand, the minimal REPL testing we did before simply isn’t enough. Can we get the benefit of some of the previous ideas of testing, without the complexity of unit testing? Let’s try.
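Throughout this post we exercise the score function built in the previous section. If you don’t have that code handy, here is a minimal sketch that is consistent with the REPL output shown below (treat it as a stand-in, not necessarily the exact listing from the previous section):

(defn exact-matches
  "Count the positions where secret and guess hold equal colors."
  [secret guess]
  (count (filter true? (map = secret guess))))

(defn unordered-matches
  "For each color present in both rows, count the smaller number
   of occurrences, ignoring position."
  [secret guess]
  (merge-with min
              (select-keys (frequencies secret) guess)
              (select-keys (frequencies guess) secret)))

(defn score
  "Return a map with :exact and :unordered match counts."
  [secret guess]
  (let [exact (exact-matches secret guess)
        unordered (apply + (vals (unordered-matches secret guess)))]
    {:exact exact :unordered (- unordered exact)}))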
 
Crafting Inputs
We have already seen the score function work for a few handcrafted inputs. How many inputs do we need to convince ourselves the function is correct? In a perfect world, we would test all possible inputs, but that is almost always computationally infeasible. We are lucky, though: scoring the game is essentially the same problem for variants with a different number of colors or a different number of pegs. Given that, we actually can generate all possible inputs for a small version of the game.
The branch of math that deals with the different ways of forming patterns is called enumerative combinatorics. It turns out that the Clojure library math.combinatorics has the functions we need to generate all possible inputs.
Add the following form under the :dependencies key in your project.clj file, if it is not already present:

[org.clojure/math.combinatorics "0.0.1"]
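For context, a complete project.clj might look something like this (the project name and Clojure version here are assumptions, not from the original project):

(defproject clojurebreaker "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [org.clojure/math.combinatorics "0.0.1"]])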
The selections function takes two arguments (a collection and a size), and it returns every sequence of that size whose elements are drawn from the collection, with repetition allowed.
Try it for a tiny version of Clojurebreaker with only three colors and two columns:

(require '[clojure.math.combinatorics :as comb])
(comb/selections [:r :g :b] 2)
-> ((:r :r) (:r :g) (:r :b)
    (:g :r) (:g :g) (:g :b)
    (:b :r) (:b :g) (:b :b))
So, selections can give us a possible secret or a possible guess. What about generating inputs to the score function? Well, that is just selecting two selections from the selections:
 
(-> (comb/selections [:r :g :b] 2)
    (comb/selections 2))
-> (81 pairs of game positions omitted for brevity)
Let’s put that into a named function:

clojurebreaker/src/clojurebreaker/game.clj
(defn generate-turn-inputs
  "Generate all possible turn inputs for a clojurebreaker game
   with colors and n columns"
  [colors n]
  (-> (comb/selections colors n)
      (comb/selections 2)))
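A quick sanity check at the REPL: three colors and two columns give 3² = 9 possible codes, and therefore 9 × 9 = 81 secret/guess pairs, matching the output above:

(count (generate-turn-inputs [:r :g :b] 2))
-> 81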
All right, inputs generated. We are going to skip thinking about outputs (for reasons that will become obvious in a moment) and turn our attention to running the scorer with our generated inputs.
 
Running a Test
We are going to write a function that takes a sequence of turn inputs and returns a sequence pairing each input with the result of calling score. We don’t want to commit (yet) to how the results of this test run will be validated. Maybe a human will read it. Maybe a validator program will process the results. Either way, a good representation of each result might be a map with the keys :secret, :guess, and :score.
All this function needs to do is call score and build the collection of responses:

clojurebreaker/src/clojurebreaker/game.clj
(defn score-inputs
  "Given a sequence of turn inputs, return a lazy sequence of
   maps with :secret, :guess, and :score."
  [inputs]
  (map (fn [[secret guess]]
         {:secret (seq secret)
          :guess (seq guess)
          :score (score secret guess)})
       inputs))
Try it at the REPL:

(->> (generate-turn-inputs [:r :g :b] 2)
     (score-inputs))
-> ({:secret (:r :r), :guess (:r :r),
     :score {:exact 2, :unordered 0}}
    {:secret (:r :r), :guess (:r :g),
     :score {:exact 1, :unordered 0}}
    ;; remainder omitted for brevity
If a human is going to be reading the test report, you might decide to format a text table instead, using print-table from clojure.pprint. While we are at it, let’s generate a bigger game (four colors by four columns) and print the table to a file:

(use 'clojure.pprint)
(require '[clojure.java.io :as io])
(with-open [w (io/writer "scoring-table")]
  (binding [*out* w]
    (print-table (->> (generate-turn-inputs [:r :g :b :y] 4)
                      (score-inputs)))))
-> nil
If you look at the scoring-table file, you should see 65,536 different secret/guess combinations and their scores.
 
Validating Outputs
At this point, it is obvious why we skipped crafting the outputs. The program has done that for us. We just have to decide how much effort to spend validating them. Here are some approaches we might take:
·        Have a human code-breaker expert read the entire output table for a small variant of the game. This has the advantage of being exhaustive but might miss logic errors that show up only in a larger game.
·        Pick a selection of results at random from a larger game and have a human expert verify those. (One way to sample is sketched below.)
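For the random approach, something as simple as this would do (a sketch, assuming the functions defined above; shuffle realizes the whole sequence, which is fine at this size):

(->> (generate-turn-inputs [:r :g :b :y] 4)
     (score-inputs)
     (shuffle)
     (take 10)
     (print-table))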
 
Because the validation step is separated from generating inputs and running the program, we can design and write the various steps independently, possibly at separate times. Moreover, the validator knows nothing about how the inputs were generated. With unit tests, the inputs and outputs come from the same programmer’s brain at about the same time. If that programmer is systematically mistaken about something, the tests simply encode mistakes as truth. This is not possible when the outputs to validate are chosen exhaustively or randomly.
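To preview what programmatic validation might look like, here is a sketch of a property-style check. plausible-score? is a hypothetical helper, and its invariants are necessary (not sufficient) conditions for correctness:

(defn plausible-score?
  "Check invariants that any correct score must satisfy for rows
   of n columns."
  [n {:keys [secret guess score]}]
  (let [{:keys [exact unordered]} score]
    (and (<= 0 exact n)
         (<= 0 unordered n)
         (<= (+ exact unordered) n)
         ;; identical rows must score all exact matches
         (or (not= secret guess)
             (and (= exact n) (zero? unordered))))))

(every? (partial plausible-score? 2)
        (score-inputs (generate-turn-inputs [:r :g :b] 2)))
-> true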
We will return to programmatic validation later, but first let’s turn to regression testing.
 
Regression Testing
How would you like to have a regression suite that is more thorough than the validation effort you have made? No problem.
·        Write a program whose results should not change.
·        Run the program once, saving the results to a (well-named!) file.
·        Run the program again every time the program changes, comparing with the saved file. If anything is different, the program is broken.
The nice thing about this regression approach is that it works even if you never did any validation of results. Of course, you should still do validation, because it will help you narrow down where a problem happened. (With no validation, the regression error might just be telling you that the old code was broken and the new code fixed it.)
How hard is it to write a program that should produce exactly the same output every time? Call only pure functions from the program, which is exactly what our score-inputs function does.
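As a sketch of that workflow (regression-ok? is a hypothetical helper; it assumes the scoring-table file saved earlier is your known-good output):

(defn regression-ok?
  "Re-run the scorer over all inputs and compare the freshly
   printed table against the saved known-good file."
  [saved-file]
  (= (slurp saved-file)
     (with-out-str
       (print-table (->> (generate-turn-inputs [:r :g :b :y] 4)
                         (score-inputs))))))

(regression-ok? "scoring-table")
-> true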
Wiring this kind of regression test into a continuous integration build is not difficult. If you do it, think about contributing it to whatever testing framework you use.
Now we have partially answered the question, “How do I make sure that my code is correct?” In summary:
·        Build with small, composable pieces (most should be pure functions).
·        Test forms from the inside out at the REPL.
·        When writing test code, keep input generation, execution, and output validation as separate steps.
This last idea is so important that it deserves some library support. So, before we move on, we are going to introduce test.generative, a library that aspires to bring simplicity to testing.
 
