30 June 2020

Advanced Software Engineering - Functional Test-Driven Development with Haskell


The following report was originally submitted as coursework for the Advanced Software Engineering module of my Bachelor's Degree in Computer Science.

The report is in three parts:

  • An overview and evaluation of Test-Driven Development
  • A practical case study, including examples of both Unit Testing and Property-based Testing
  • A reflection on using Test-Driven Development, Functional Programming, and testing tools



Test-Driven Development

What is Test-Driven Development?

Test Driven Development is an approach to software development in which automated Unit Tests are written before code is, and then code is written to pass the new Unit Tests (Janzen, D. and Saiedian, H., 2005). Once this has been done, the code is refactored to remove duplication, and all tests are re-run to check that no regressions have occurred (Beck, K., 2003).

Test Driven Development thus gives programming the following structure (Beck, K., 2003):

  1. Write a small test that does not pass
  2. Make the test pass
  3. Refactor the code, eliminating any duplication in the process

Critical Evaluation of Test-Driven Development

Test Driven Development is likely to be more useful the larger the program, and particularly when the program is maintained by many different developers (or even organisations). This is supported by the findings of the meta-analysis conducted by Rafique, Y. and Mišić, V.B. (2012), which found that Test-Driven Development produces more significant improvements in quality for larger tasks.

However, Rafique, Y. and Mišić, V.B. (2012) also finds the general positive effect on quality to be small, and the effect on productivity to be minimal at best. The findings of Siniaalto, M. and Abrahamsson, P. (2007) go further than this, and suggest that program code produced by Test Driven Development is not only not necessarily clearly better than iterative test-last development, but may actually degrade over time.

One issue from my own experience in industry, is that if not appropriately named, structured, and commented, Unit Tests may be removed by future developers if they fail. They are particularly likely to do this if they are not able to quickly discern the purpose of the test, and if they see no obvious symptoms. This defeats the objective of the original test writing. As such this is a process which requires not just discipline, but that it becomes a habit. The usage of this method in an organisation therefore requires that it becomes thoroughly inculcated in the organisation and its culture.

Another possible issue with with Test-Driven Development, based on my experience with Unit Tests in industry, is that for a code base which contains many impure elements, a large amount of time is spent mocking the behaviour of impure elements, such as Data Access Objects, so that test data can be generated for the pure functions to be tested. With certain types of application, such as a CRUD Web Service, this is going to be a larger issue than with others, such as implementations of sorting and searching algorithms.

Despite these issues, Test-Driven Development has been shown to reduce defects: Williams, L., Maximilien, E.M. and Vouk, M. (2003) found 40% fewer defects in a TDD-produced product than in a previous product used as a baseline. This is supported by the findings in, Maximilien, E.M. and Williams, L. (2003), which details how IBM reduced their defect rate by 50% by using Test Driven Development as opposed to an ad-hoc unit testing approach. However, whilst George, B. and Williams, L. (2004) also found TDD produced code that was of a higher quality (it passed 18% more functional black-box test cases), it is admitted that the code also took 16% longer to write. George, B. and Williams, L. (2004) also suggests that "waterfall-like approaches do not encourage adequate testing", testing not being core to the process, but rather being an afterthought. From this George, B. and Williams, L. (2004) suggests that increased usage of Test Driven Development would increase usage of Unit Testing generally.

Bhat, T. and Nagappan, N. (2006) also found "a significant increase in quality of code" when using Test-Driven Development, but the projects took 15% more time to write the tests. Bhat, T. and Nagappan, N., (2006) also remarks that the unit tests produced are useful as "auto documentation".

Overall, the main benefit of Test-Driven Development is that the code produced has fewer defects, but this is at the cost of time spent writing tests, and the overall effect on productivity may not be significant.




Using Test-Driven Development

To try out Test Driven Development, I worked on developing a program intended to find the shortest path between two nodes in a network.

The network must consist of Nodes, which must have directional weighted Edges to other Nodes. To find the shortest path, a Genetic Algorithm was to be used.

This has the following process:

  1. Generate pseudo-random Paths, made of selections of Edges from the Network
  2. Score the paths
  3. If not at the final generation, generate new Paths as follows, then return to step 2:
  4. Take the top ranking Paths, and to fill the whole of a new set with duplicates of these
  5. Add mutations to these by randomly adding and removing edges
  6. Mate these by taking pairs of Paths, slicing the Paths at random positions and fusing the paths back together
  7. Return the best scoring path

HUnit

The first function to implement was one which returns all the Edges of the Network. The Network is composed of a list of Nodes, each of which has a list of Edges. The function getAllEdges should return a list of all of the Edges of a provided Network. I started by writing a first Unit Test to test that an empty list will return an empty list, and then implementing a skeleton implementation to make this pass.

First Implementation:

getAllEdges :: Network -> [Edge]
getAllEdges [] = []

Once this passed, I created another test, getAllEdgesReturnsListOfEdges, to check whether getAllEdges did actually return all the edges of an example network. To pass this test I added the following line to the function:

getAllEdges (n:ns) = (edges n) ++ (getAllEdges ns)

At this point, it was considered whether any refactoring was appropriate, and came to the conclusion that none was.

As it is possible for there to be many direct links between two nodes, and no point in including the heavier links, a function had to be written to remove all but the lightest edge.

This function, removeHeavierDuplicates, should return an empty list when given an empty list, as the previous function, so a test was written, it failed, the code modified, and the test rerun to pass.

If no parallel links (i.e. pairs of edges with the same start and end nodes) exist, the function should return the same list it was passed. Another Unit Test was written to test this, and the function modified to always return the same list it was given. The tests were then all run and passed.

The function was not yet finished, as it also has to actually remove the heaviest of duplicate edges. So a Unit Test was written to check that this happens.

The function was implemented, and then the tests re-run. They passed, and so the code was examined for opportunities to refactor it. The test to check if two edges have the same starts and ends was then refactored out into a separate function. First, a test was written to check that this new function, sameEnds, returns false if the two edges do not have the same starts and ends. The function was made to return false, this passed the test. Another test was written to check that it would return true when the edges do have the same ends, and the function was modified to pass this test using the code extricated from the removeHeavierDuplicates function. sameEnds was then used to replace this code in the removeHeavierDuplicates function, and all the tests re-run and passed.


QuickCheck

In order to test the function removeHeavierDuplicates with a wider range of test data, the QuickCheck testing library was used.

As the function takes a list of Edges as a parameter, Edge needed an instance declaration of Arbitrary. This allows QuickCheck to generate arbitrary Edges.

instance Arbitrary Edge where
    arbitrary = do
        n <- arbitrary
        Positive s <- arbitrary
        Positive e <- arbitrary
        Positive w <- arbitrary
        return (Edge n s e w)

Above: instance Arbitrary Edge declaration, n s e w being name start end weight

One of the properties of removeHeavierDuplicates is that it never returns a longer list than it was passed as a parameter. A property test was written to check this, and this failed. However, upon closer inspection, it immediately became apparent that this was because the >= operator had been used instead of <= in the test itself. This was amended, and the test rerun and passed.

prop_removeHeavierDuplicatesDoesNotReturnLongerList :: [Edge] -> Bool
prop_removeHeavierDuplicatesDoesNotReturnLongerList edges = length (removeHeavierDuplicates edges) <= length edges

Above: Property test checking removeHeavierDuplicates never returns a longer list

Another of the properties of removeHeavierDuplicates is that no two Edges in the returned list should have the same start and end values, so a property test was written to ensure this.

The next function to implement was chooseRandomPath, which should take a random number generator of type StdGen and a list of Edges, and should return a list of Edges containing a subset of the initial list. For generators with different seed values, different selections of edges should be generated.

Writing property tests for this function proved to be more complicated. One initial idea was to simply call the function twice with generators which have seeds which are guaranteed to be different, and test if the outputs differ. The problem with this is that the outputs may well be the same for different seeds, especially with shorter lists of Edges (and must be the same if the list is empty).

A simpler property to test is that chooseRandomPath does not return a greater number of edges than it is given. A test was written to test this, and the function was made to pass this. As this requires a StdGen as a parameter, StdGen also needed an instance declaration of Arbitrary. This was done in the same way as it was for Edge. However, this test could be passed by just making the function return its input.

chooseRandomPath :: StdGen -> [Edge] -> Path
chooseRandomPath _ a = a

Above: chooseRandomPath initial implemention

One way to guarantee the outputs are different is to provide two StdGens which are known to return different values when they are run. To do this, a new type StdGenDiffPair, isomorphic with (StdGen, StdGen), was declared; the suchThat predicate was used in the Arbitrary instance declaration of this new type to ensure that the first Bool values generated by each of the StdGens are different. This was done by using the next function to look ahead. However, if the list is empty, the return values must still be equal, so a generator had to be defined to generate a list which is guaranteed not to be empty.

suchThat (arbitrary :: Gen (StdGen, StdGen)) (
        \(g0,g1) -> 
          let (v0,_) = random g0 :: (Bool, StdGen)
              (v1,_) = random g1 :: (Bool, StdGen)
          in v0 /= v1)

Above: The generator used by StdGenDiffPair

As such, EdgeListNonEmpty was declared and set up in much the same way as StdGenDiffPair. These types could then be used by a new property test to test that so long as the StdGens return different values, and the list of edges is not empty, the chooseRandomPaths function returns different paths. This new property test failed on running, so the function was modified so that this would pass.

prop_chooseRandomPathReturnsDifferentPathsForDifferentRngs :: StdGenDiffPair -> EdgeListNonEmpty -> Bool
prop_chooseRandomPathReturnsDifferentPathsForDifferentRngs (StdGenDiffPair (g0, g1)) (EdgeListNonEmpty edges) = (chooseRandomPath g0 edges /= chooseRandomPath g1 edges)

Above: The new property test

As the returned list of edges represents a path, a type synonym Path = [Edge] was added at this point, and used in place of [Edge] as the return value of chooseRandomPath. After this change, the tests were all re-run.

In order to create a list of candidate paths, a function had to be created to return a list of random paths, given a StdGen, a list of edges, and an Int representing the number of paths to be created. The returned list must have a length the same as the number of paths to be created, so this made a logical first property test. In the case that the number of paths to be created is zero or less, an empty list should be returned. This test was created, failed, and the function was implemented to pass this test by returning a list of paths of the correct length, but simply using the full list of edges as each path, instead of a random subset of these. This test initially used guards to check if num < 0, and to return True if this is the case. The problem with this is that the test itself became too large, so this was replaced with another newtype and instance declaration, this time for IntAtLeastOne, which allows for the generation of ints which are guaranteed to be at least one.

chooseRandomPaths :: StdGen -> [Edge] -> Int -> [Path]
chooseRandomPaths g edges num 
       | num <= 0 = []
       | otherwise = edges : (chooseRandomPaths g edges (num-1))

Above: Unfinished chooseRandomPaths

The sets of paths should differ for generators which are guaranteed to give different values, so a test prop_chooseRandomPathsDiffersWithG was created to test this. The previously created StdGenDiffPair was used again here, as was EdgeListNonEmpty. Test failed, code changed to use chooseRandomPath function, and the test passed.

       | otherwise = (chooseRandomPath g edges) : (chooseRandomPaths g0 edges (num-1))
       where (_,g0) = next g

Above: changes to chooseRandomPaths to make it work

The paths within the set should not all be identical, so another test was written to check that at least one of the paths is different from the first in the list. This failed, and the function was modified so that the paths differ. A problem encountered, however, was that the arbitrary StdGens used in by the tests may happen to return the same value for the first two random numbers. If this happens, and the number of paths to generate is only two, then the two paths will be the same if the number of edges is only one. The reason for this is that there is only one edge, and this either is or is not in a path, and if the first two values returned by the generator are the same then whether this is present will be the same in the first two paths. In addition, if only one path is generated, there cannot be variation between paths. To resolve this, the test was improved by adding one to the IntAtLeastOne, therefore making it at least two, and another newtype & instance were added to create a new generator which creates StdGens which are guaranteed to return different boolean values the first two times they are used.

The test generators at this point had become to take up quite a lot of the tests file, so at this point they were refactored out to a separate file.

The next step was to create a function to return the best paths from a set. This should take a list of paths and the number to return. The length of the returned list should be the lesser of either the length of the original list or the number to return, so a test was written to test this. The bestPaths function was implemented to pass this test.

This report follows the development of this software no further, but the next step to take will be to write another test, to test whether the bestPaths function actually returns the best paths. To implement the function to do this, a scorePath function will also need developing to rank the paths, taking into account four factors: whether they start at the start node, whether they end at the end node, whether they connect the two, and their weight.

After this, the Test-Driven Development cycle of "write test, pass test, and refactor" will be followed to add functions which implement the rest of the algorithm.




Reflection on using Test-Driven Development, Functional Programming, and testing tools

Functional Programming

I have found Functional Programming to produce very clean and concise code, particularly when combined with Test-Driven Development. A particular advantage of Haskell is that the type system makes it very clear what a function can affect. This is because the type signature indicates whether a function is pure, and if it is then it will not mutate the values passed to it. This is an advantage over languages like C++, where many pointers may refer to the same object, and this object may be changed (possibly inadvertantly) from many places in the program. This makes the behaviour of functions harder to predict, as it is harder to tell what side effects they have.

A strong disadvantage I found with Functional Programming was the initial difficulty in not just the slightly unusual syntax of Haskell (as compared to C-like languages), but also the challenge of learning to "Think Functionally". The difference in syntax has a silver lining however, in that I found it to removed the temptation to write C/Java -like code in Haskell. This is an advantage that Haskell has over Scala, as Scala allows for many things to be done in the "FP way" or the "OOP way". This can result in functions which should be pure being written impurely. However, Scala and other hybrid languages are less of a jump to move to from other paradigms, and Scala interoperates seamlessly with Java.

I enjoy working with functional languages, and find Functional Programming an interesting way to work. As such I do intend to use it in future personal projects. Furthermore, the impression left by the the median salaries listed at It Jobs Watch (n.d.), a selection of which are shown in the table below, strongly inclines me towards working with functional (or at least hybrid) languages professionally. Haskell has the highest median salary, and Scala, a functional-heavy hybrid, also does well. It is worth also noting that functional-style code is possible in other languages in the list - Python includes lambdas, as do the more recent versions of JavaScript, C++, and Java.

LanguageMedian Salary
Haskell£85,000
Scala£72,500
Java£65,000
Python£62,500
C++£55,000
C£52,500
JavaScript£52,500

Source: itjobswatch.co.uk

This is a one dimensional view, and does not take into account cost of living in areas where these jobs tend to be, ease with which the jobs can be found, or the other skills developers are likely to have (e.g. Python Software Engineers who happen to be proficient in Machine Learning techniques).

The higher price point of functional programmers would also leave me disinclined to choose such a language to build a product in if I were running a software company, unless there was an overwhelming advantage to choosing such a language. However, I would still encourage the usage of functional-style code within hybrid languages. Even if a language's type system does not strongly encourage writing functions without side effects, it is still possible to write functions without side effects in languages other than Haskell (Martin, R.C. (2010), a book mainly targeted at Java developers, states as a rule that functions should "Have No Side Effects"). Properly considering what is to be written before starting writing it ("measure twice cut once") is also encouraged by Functional Programming. These lessons learned can be transferred to work done in other languages.

Test-Driven Development

I found the biggest advantage of Test-Driven Development to be that it forces thorough consideration of the code to be written. In addition, (well-chosen) test names force clarity in what is to be achieved by changes to a codebase, and so naturally lend themselves to good git commit messages.

A disadvantage I found with Test-Driven Development is that it requires a large proportion of time spent to be spent on writing test cases. It was particularly challenging to write tests for the functions which used random number generators, and test data generators had to be written for the random number generators which made guarantees about the state of the random number generators used to test the function.

In the future, I would use Test-Driven Development in my own personal projects, especially in backend components. Even in frontend applications, such as webapps written in Angular, it is possible to write unit tests for certain components. REST APIs can also be automatically tested using tools such as Gatling (n.d.). Even if Unit and Property testing of code is not able to be done in the same way as can be done with purely functional code, it is still possible to express programmatically what should happen to the system given certain inputs, and to adopt a test-first approach. An issue with using Test-Driven Development in commercial projects is that it requires a strong commitment from all developers, and may be perceived to be an obstruction in the short-term. However, the case for it is particularly strong for applications which have high reliability and stability requirements. I would be strongly tempted to not follow Test-Driven Development practices when rapidly prototyping. The problem this could cause is that it can then be tempting to reuse code from a prototype, the result of this being that the project has already failed to follow Test-Driven Development practices.

Tools Used

HUnit

HUnit is very quick and simple to get started with, and is not much of a jump from simply using the REPL to check that the correct value is returned for a given input. This makes it easy to introduce to beginners. But it relies on the test-writer properly considering edge cases. Moving forwards, I will continue to use HUnit and Unit Testing when writing code.

QuickCheck

QuickCheck requires more careful consideration of the tests to be performed than HUnit. Simple tests are not difficult to write, but custom data types also require custom data generators. It took a while to figure out how to do this, but once I had a couple of working examples, adding more types became trivial. I expect that as a code base expands, the number of opportunities for reusing the generators will increase. I would use QuickCheck and Property Testing in future projects, particularly if a function has a high arity and therefore many possible inputs. Property testing is another thing that I will carry over to work I do using other languages, as I was not previously aware of it, but now realise both how useful it is, and also the fact that other languages also have similar libraries (Java has JUnit-Quickcheck).

Other Tools

In the course of this case study, I also used GHCI, Cabal, and GVim. I did not find myself missing the full IDEs such as IntelliJ, which I would normally use with other languages. The reason for this is that Haskell is very concise, and so I did not feel the need for the code completion and generation tools which I would normally use when working in languages like Java. I also did not at any point wish I had a debugger, and cannot see a debugger being likely to be very useful when using Test-Driven Development in the future. I instead used tests to check things worked, and the REPL to experiment to try to get them to work.




Libraries Used

Test.QuickCheck

Test.HUnit

System.Random

Data.Sort

References

Beck, K., 2003. Test-driven development: by example. Addison-Wesley Professional.

Bhat, T. and Nagappan, N., 2006, September. Evaluating the efficacy of test-driven development: industrial case studies. In Proceedings of the 2006 ACM/IEEE international symposium on Empirical software engineering (pp. 356-363).

Gatling. (n.d.). Gatling Open-Source Load Testing – For DevOps and CI/CD. [online] Available at: https://gatling.io/ .

George, B. and Williams, L., 2004. A structured experiment of test-driven development. Information and software Technology, 46(5), pp.337-342.

IT Jobs Watch. (n.d.). IT Jobs Watch | Tracking the IT Job Market. [online] Available at: https://www.itjobswatch.co.uk .

Janzen, D. and Saiedian, H., 2005. Test-driven development concepts, taxonomy, and future direction. Computer, 38(9), pp.43-50.

Martin, R.C. (2010). Clean code a handbook of agile software craftmanship. Upper Saddle River [Etc.] Prentice Hall.

Maximilien, E.M. and Williams, L., 2003, May. Assessing test-driven development at IBM. In 25th International Conference on Software Engineering, 2003. Proceedings. (pp. 564-569). IEEE.

Rafique, Y. and Mišić, V.B., 2012. The effects of test-driven development on external quality and productivity: A meta-analysis. IEEE Transactions on Software Engineering, 39(6), pp.835-856.

Siniaalto, M. and Abrahamsson, P., 2007, October. Does test-driven development improve the program code? Alarming results from a comparative case study. In IFIP Central and East European Conference on Software Engineering Techniques (pp. 143-156). Springer, Berlin, Heidelberg.

Williams, L., Maximilien, E.M. and Vouk, M., 2003, November. Test-driven development as a defect-reduction practice. In 14th International Symposium on Software Reliability Engineering, 2003. ISSRE 2003. (pp. 34-45). IEEE.

Tags: Uni