In praise of property-based testing (2019)

Vinnl · on April 3, 2021

I've been aware of property-based testing for a number of years now, but never had a good opportunity to give it a try. Then the past year I had a piece of serialisation/de-serialisation code, which was the perfect opportunity for a rather simple property-based test. That gave me the hang of it, and found two (minor, but still) bugs.

Then recently I had a fairly larger, more error prone piece of work that lend itself very well to property-based testing, and it's been a godsend. It helped me discover a number of bugs, this time with the risk of causing privilege escalation. And since the proptests started succeeding reliably, I've been very confident that a rather complex piece of code now actually does what it's supposed to.

If you're working in JavaScript, I can recommend fast-check [1].

Another interesting approach, that I haven't yet tried, is Quickstrom [2], basically Puppeteer for proptests. It opens a webpage in a browser, performs some random interactions (pressing buttons, entering data, etc.), and then verifies that properties you specified still hold.

[1] https://dubzzz.github.io/fast-check.github.com/

[2] https://quickstrom.io/

marcosdumay · on April 3, 2021

I was hoping from some insight, but as happens every time I decide to try property testing, the examples on your links are trivial to either enforce the property at type or construction, or to separate it from the rest of the code so that their implementation becomes at least as obvious and error-prone as the tests.

Property based testing looks like a really good idea, but I never gained anything from applying it on practice. There are probably some application domains that they are good for, but I still didn't find them.

felixhuttmann · on April 3, 2021

Property-based testing is good when there is a property to assert that emerges in a non-trivial manner from the code under test. If you do not have such a problem, property-based testing provides no value over a simple, example-based test.

In the case of the parsing code from the article, the emerging property is that a serialization followed by a deserialization should always yield the original result.

In the case of the 'binary or not' case from the article, the non-trivial, emergent property is that the function never fails with an exception.

Most modern software development is plumbing, stitching together platforms, libraries and frameworks, and there are rarely non-trivial, emergent properties where property-based testing is useful.

vbrandl · on April 3, 2021

here's an example I had a few month ago: Password Requirements, merging of requirements and generating passwords that fulfill the requirements:

- merging/combining of requirements is a monoid operation with an "empty requirement" (e.g. one that accepts every password) as the neutral element. Finding this requirement, I could write properties for the monoid properties (a + b = b+ a, a + 0 = a, and so on) - when generating a password from a requirement, the same requirement must accept the generated password (`requirement.accept(requirement.generate()`)

In general: if you find mathematical rules to your code (commutativity, associativity,. ..), these make a great starting point for property tests.

pfdietz · on April 3, 2021

It's great for compiler testing.

The first approach is to generate random correct programs and see if they do the same thing with different compilers and/or different optimization flags.

The second approach is to take a correct program, note that some parts of it are not executed on some particular input, then mutate that part and run on the same input. The output should be the same.

Also, any program (not just compilers) should satisfy the properties that no assertions fail and that no sanitizer failures occur.

Jtsummers · on April 3, 2021

If you're open to Erlang/Elixir I liked [0]. Obviously a bigger commitment than some blog posts and tutorials, but worth it in my opinion. It more clearly (to me) presents the case for property-based testing and goes through more complex examples than most blog posts which help to illustrate the utility more effectively.

[0] https://pragprog.com/titles/fhproper/property-based-testing-...

ghayes · on April 3, 2021

Obvious code should have obvious properties, in which case you should be able to gain value from property tests. If you make every little function high-quality and well tested, you should end up with better code overall. The only functions, IMO, that don’t deserve properties as much as functions which are exclusively the composition of other functions without any other logic. In that case, the function is often, by definition, its result.

Rendello · on April 3, 2021

I had a similar experience after watching a Computerphile video with John Hughes, one of the authors of the original propert-based testing tool, QuickCheck [1]. Running hundreds of thousands of unique tests and finding the simplified cases was mind blowing to watch.

I loved the video but my first experience with this method was with Python's Hypotheis [2], I haven't used it a ton but it's great for finding parsing errors. In the words of Python core developer Raymond Hettinger:

> It's not quite fuzzing, but it hits it with the kind of test cases that a good QA engineer would typically come up with, and it does it in automated fashion.

1. https://www.youtube.com/watch?v=AfaNEebCDos

2. https://hypothesis.readthedocs.io/en/latest/

3. https://youtu.be/ARKbfWk4Xyw?t=319

yakshaving_jgt · on April 3, 2021

I checked out Quickstrom, and I thought "Wow, this looks amazing!"

Then I noticed who wrote it and thought "Ah. Well that makes sense."

jrockway · on April 3, 2021

Neither the bad example nor good example of the "add contributors up to the limit" test check the state change where users stop being accepted. You really want: add user 1 -> ok, add user 2 -> ok, add user 3 -> ok, add user 4 -> fail.

Doesn't matter what methodology you use if your test doesn't test what you want it to.

I also fear that in general, people try to make their tests too clever. I don't want the test to generate random data that's different every time, unless it's a fuzz test. I want to see the inputs and the expected outputs as clearly as possible, so that when the test fails I'm not guessing whether or not there is some bug 30 helpers deep, or if I simply added a bug with my change. The real world is complicated and people are going to enter a lot of invalid data into your application, so it is crucial that you test that. But you also need the basics to work before you worry about the advanced cases.

diurnalist · on April 3, 2021

Thank you for sharing this! I've been intrigued by the idea of property tests for a while but in my mind it's relegated to the "mad science" corner of tools I would use, partly because most examples or cases made for it that I've seen have used examples and use cases that didn't translate easily to the day-to-day systems (html web servers mostly) I work on. I like that this post uses Django as the motivating example.

The "shrinking" capability of the test library highlighted is brilliant.

I'm inspired to think of how to start to leverage something like this on some upcoming work.

pfdietz · on April 3, 2021

Hypothesis does shrinking in an interesting way.

The first idea you think of for shrinking it to take the randomly generated values and try to make them smaller. But the generator may be imposing constraints on the values, and if you lose those constraints, the input becomes invalid.

An example of this problem is generating valid C programs to test C compilers (by compiling with different compilers or different optimization settings and seeing if the behavior differs). The constraint there is that the C program not show undefined or implementation-specific behavior. Naively shrinking a C program will not in general preserve this property.

Hypothesis takes a different approach to shrinking: it records the sequence of random values used by the generator, and replays the generator on mutations of that sequence that do not increase its length. The only way this can fail is if the generator runs out of values on the mutated sequence. Otherwise, the new output will always satisfy the constraints imposed by the generator. Hypothesis does various clever things to speed this up.

adkadskhj · on April 3, 2021

I'm quite new to property testing, first introduced recently via a Rust property testing framework proptest[0]. So far i've had the feeling that property testing frameworks need to include a way to rationalize complexity, as their assumptions can easily fall short as you illustrated.

Eg the simplest example might be an application where you input an integer, but a smaller int actually drives up the complexity. This idea gets more complex when we consider a list of ints, where a larger list and larger numbers are simpler. Etcetc.

It would be neat if a proptest framework supported (and maybe they do, again, i'm a novice here) a way to rank complexity. Eg a simple function which gives two input failures and you can choose which is the simpler.

Another really neat way to do that might be to actually compute the path complexity as the program runs based on the count of CPU instructions or something. This wouldn't always be a parallel, but would often be the right default i imagine.

Either way in my limited proptest experience, i've found it a bit difficult to design the tests to test what you actually want, but very helpful once you establish that.

[0]: https://crates.io/crates/proptest

dfee · on April 3, 2021

There are a lot of words without a strong definition, but it seems to summarize to:

“Property testing is where you vary the parameters, perhaps within certain well defined constraints, and through randomization ensure certain coverage.”

I think there are other ways of handling this, beyond the way described, but I’m not sure if that’s because I just read a field guide to Hypothesis (again, without a firm definition), or if other approaches - such as test parameterization at key boundaries is more effective.

My hunch is that property based testing provides less value on unit tests - where it could be used to reduce coverage - and more value in integrating those units together - where the number of permutations can grow very quickly.

BerislavLopac · on April 3, 2021

> Property testing is where you vary the parameters, perhaps within certain well defined constraints, and through randomization ensure certain coverage.

No - this would be akin to saying "unit testing is when you use assert statements".

Varying the parameters -- and using far more than simple randomisation -- is just a technique; the purpose of property testing is that the behaviour of the unit (or system) under test fulfills certain properties, in ma mathematical sense of the word. For a simple example: if you're writing a function that is adding two integer numbers, you want to test that it satisfies all the properties of addition [0]. So your tests need to validate against a number of combinations of input values, many of which are not random (e.g. to test for the "successor" property one of them needs to be 1).

[0] https://en.wikipedia.org/wiki/Addition#Properties

Jtsummers · on April 3, 2021

PBT is nice in place of (some) unit tests in that you can describe immediately the properties you expect without needing to produce a collection of examples (or write a custom generator that produces a limited set of examples, at which point you're halfway to PBT anyways).

It's also helpful to use it in a piece wise fashion if you're doing TDD. An illustrative example (though perhaps not stellar as it is a synthetic, non-real-world example) uses the diamond kata, TDD, and PBT together [0]. None of the tests on their own fully specify the system, but in total they do.

If you're doing TDD (or attempting to) I think this is an interesting case. Many TDD methods have you start off with an example case like (to stick with this kata, and using Pythonesque pseudocode because I'm still not awake this Saturday morning):

  diamond-kata-test-a:
    assert(diamond('A') == 'A')

Great, so now someone makes that absolute simplest solution:

  diamond(c):
    return 'A'

Now repeat with a second test case:

  diamond-kata-test-b:
    assert(diamond('B') == ' A \nB B\n A ')

And the function is duly complicated:

  diamond(c):
    switch c:
      case 'A': return 'A'
      case 'B': return ' A \nB B\n A '
      default: return 'blah' // or error, doesn't matter it's not tested

But not actually generalized to reflect the intent of the system. By focusing on properties, I've found, the progression of the UUT is a bit better/more natural.

Another interesting thing to do with PBT is model-based testing [1]. The useful thing here is that sometimes the errors are triggered by a peculiar, though plausible, sequence of commands to your system. We've all worked with that one guy who somehow manages to find exactly the right sequence that triggers weird edge cases and errors, but unless we're him having a system which will generate arbitrary execution traces for you. (I actually used FsCheck for this last year in trying to sell PBT to my colleagues and was able to identify where a known issue originated as well as several other problems that hadn't been found by users or testers yet.)

In the end, when these failures are found you can always turn them into distinct unit tests in order to preserve them and prevent regressions. The two modes of testing fit well together.

[0] http://christophethibaut.com/programming/2020/03/18/Diamond-...

[1] https://fscheck.github.io/FsCheck/StatefulTesting.html

garethrowlands · on April 3, 2021

The best thing, in my view, about property testing is that it allows you to state the properties you're testing for, as opposed to some examples of them.

For example if I'm making a sqrt function, then I want sqrt(x) * sqrt(x) == x, for any x>=0.

Human beings are good at inferring the general rules from examples but sometimes it's easier to understand if you just say what the general rules is.

Also, unit tests that include example data can sometimes be dominated by that data. Removing the particular examples can sometimes remove a lot of distraction and verbosity.

It's not just that property tests find more bugs.

artemonster · on April 3, 2021

I wonder when the software industry finally "reinvents" and starts using random-constraint generated stimulus from hardware verification methodologies :)

jenkings · on April 4, 2021

Is there an example of how many function executions property based testing causes?

Jtsummers · on April 5, 2021

That's typically configurable with the test system.