A TUTORIAL // PROPERTY-BASED TESTING

Hypothesis,
and how HypoFuzz turns up the volume.

A whirlwind tour of the decorators that let you describe what your code should do — and the fuzzer that takes those same decorators and spends hours hunting for cases where it doesn't.

DRAW → INVOKE → ASSERT → SHRINK

Every property test is this loop, repeated. The first three happen on every example. The fourth fires only when something fails — and it's where most of the magic lives.

§ 01 The decorator

An ordinary unit test asserts a fact about one specific input. A property test asserts a fact about all inputs of a certain shape, and lets Hypothesis manufacture the inputs for you.

from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()))
def test_sort_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)

Two pieces. The @given decorator promotes a regular function into a property test. Its argument is a strategy — a recipe for generating values. st.lists(st.integers()) reads as lists of integers, of varying lengths and varying contents. Hypothesis runs the function many times — about 100 by default — and each run walks through the loop above: DRAW a list, INVOKE the function, ASSERT the property, and (if anything misbehaves) SHRINK.
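The run count and a few must-always-run inputs can be set explicitly. The sketch below reuses the test above; `settings(max_examples=...)` and `@example(...)` are part of Hypothesis's public API, while the budget of 500 and the two pinned lists are arbitrary choices made here for illustration.

```python
from hypothesis import example, given, settings
from hypothesis import strategies as st


@settings(max_examples=500)      # raise the budget from the default of 100
@given(st.lists(st.integers()))
@example([])                     # always run the empty list
@example([2, 1, 2])              # always run a list containing a duplicate
def test_sort_idempotent(xs):
    # Sorting an already sorted list must change nothing.
    assert sorted(sorted(xs)) == sorted(xs)
```

A test runner such as pytest collects this like any other test; calling `test_sort_idempotent()` with no arguments also runs the whole property.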

§ 02 Strategies are a small language

Strategies compose. You build complex generators from simple ones the way you build complex types from simple ones. Pick a strategy, draw from it a few times, and watch how the values cluster around boundaries.

You'll notice the values are weird. Empty strings. Empty lists. Zero. Integers well outside the 64-bit range (Python's integers are unbounded, so there is no platform minimum to hit). NaN and infinity for floats. Unicode code points outside the Basic Multilingual Plane. That's no accident: Hypothesis biases hard toward edge cases on purpose. Real bugs cluster at the boundaries, and the strategies know it.
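To make "strategies compose" concrete, here is a small sketch. The `user` record and the `even` strategy are hypothetical names invented for this example; the building blocks (`fixed_dictionaries`, `text`, `integers`, `lists`, `sampled_from`, and `.map`) are standard Hypothesis strategies.

```python
from hypothesis import given
from hypothesis import strategies as st

# A hypothetical record type, assembled from simpler strategies.
user = st.fixed_dictionaries({
    "name": st.text(min_size=1, max_size=20),
    "age": st.integers(min_value=0, max_value=130),
    "tags": st.lists(st.sampled_from(["admin", "staff", "guest"]), unique=True),
})

# .map() transforms every drawn value; doubling any integer yields an even one.
even = st.integers().map(lambda n: n * 2)


@given(st.lists(user, max_size=5), even)
def test_generated_shapes(users, n):
    # Every draw respects the constraints declared in the strategies.
    assert n % 2 == 0
    for u in users:
        assert 1 <= len(u["name"]) <= 20
        assert 0 <= u["age"] <= 130
        assert len(set(u["tags"])) == len(u["tags"])
```

The constraints live in the strategy definitions, so the test body only states what should hold for every value the strategies can produce.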

§ 03 The same property, three rounds

This is where the tutorial earns its keep. Here's a function that looks right and almost is:

def dedupe_sort(xs):
    # sort and remove duplicates
    return sorted(set(xs))

And here's the property we'll check: the result has the same length as the input. The function violates this property — but only on inputs that contain duplicates. Watch what each tier of testing does with it.

ROUND 01: Hand-picked examples

ROUND 02: Hypothesis @given

ROUND 03: HypoFuzz, hours later

Round one passes — every hand-picked input happened to have unique values, so the bug was invisible. Round two finds it: random generation stumbles onto a duplicate within the first hundred draws, and shrinking carves the failure down to a two-element list. Round three is where the conversation shifts. With enough time and coverage feedback, HypoFuzz can stretch into corners the random generator visits only by accident — rare branches, type combinations, latent assumptions about input distribution.

§ 04 Shrinking — the part that earns the name

When Hypothesis catches a failure, it doesn't just hand you the offending input. It hands you the simplest input it can find that still fails. Picture an arbitrary 8-element list, whittled down step by step to the irreducible minimum.

Shrinking is what makes property tests debuggable. A 47-element list with the failure buried inside is operationally useless; a two-element list [0, 0] tells you instantly what the problem is. The shrinker is generic — Hypothesis knows how to reduce any value any strategy can produce, because the strategies and the shrinker are two faces of the same machinery.
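One way to watch the shrinker in isolation is `hypothesis.find`, which searches a strategy for a value satisfying a predicate and returns the simplest one it can reach. In this sketch the predicate `has_duplicate` is a hypothetical helper that stands in for "the property failed".

```python
from hypothesis import find
from hypothesis import strategies as st


def has_duplicate(xs):
    # True exactly when dedupe_sort would drop an element.
    return len(set(xs)) != len(xs)


# Search lists of integers for one with a duplicate, then shrink it.
minimal = find(st.lists(st.integers()), has_duplicate)
print(minimal)
```

The result should be a two-element list holding the same value twice, typically `[0, 0]`. Nothing shorter can contain a duplicate, and zero is the simplest integer.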

§ 05 HypoFuzz — same decorators, longer view

Here is the move. HypoFuzz reads your existing @given tests — the same files, the same decorators, no rewrites — and runs them continuously, with a coverage feedback loop bolted onto the side of the standard Hypothesis loop.

Hypothesis (in CI): DRAW → INVOKE → ASSERT → SHRINK. ~100 examples per test. Finishes in seconds. Stops.

HypoFuzz (in the background): DRAW → INVOKE → ASSERT → SHRINK, plus COVERAGE FEEDBACK. Millions of examples. Steers toward unexplored branches. Runs as long as you let it.

The coverage loop is the trick. After every example, HypoFuzz observes which lines and branches the function exercised. Inputs that hit something new get prioritized — saved as seeds, mutated, returned to. Inputs that retread known ground fall down the queue. Over hours and days, the coverage map fills in:

[Figure: Coverage exploration. A grid of cells with running elapsed-time and percent-covered counters.]

Each cell stands for a code path. Amber lights up quickly — these are the easy paths a normal CI run would already reach. Red is the deeper material: rare branches, error handlers, weird state combinations. A regular pytest invocation rarely lights up the red cells. That's where the unexplored bugs live.
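The feedback idea can be shown with a toy loop in plain Python. This is not HypoFuzz's implementation: the function `target`, the one-byte mutation, and the uniform choice of parent are all simplifying assumptions made here. It only illustrates the principle of keeping inputs that reach new lines and mutating from them.

```python
import random
import sys


def target(data):
    # Hypothetical code under test: the "deep" branch needs three exact bytes.
    if data[0] == ord("f"):
        if data[1] == ord("u"):
            if data[2] == ord("z"):
                return "deep"
            return "mid"
        return "shallow"
    return "none"


def lines_hit(func, arg):
    """Run func(arg) and return the set of line numbers executed inside func."""
    hit = set()

    def tracer(frame, event, _arg):
        if frame.f_code is not func.__code__:
            return None              # do not trace other functions
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(arg)
    finally:
        sys.settrace(previous)
    return hit


def fuzz(func, seed_input, budget, rng):
    """Mutate saved inputs; keep any child that executes a line not seen before."""
    corpus = [seed_input]
    seen = lines_hit(func, seed_input)
    for _ in range(budget):
        child = bytearray(rng.choice(corpus))                 # pick a saved seed
        child[rng.randrange(len(child))] = rng.randrange(256)  # change one byte
        child = bytes(child)
        lines = lines_hit(func, child)
        if not lines <= seen:        # reached something new: save it as a seed
            seen |= lines
            corpus.append(child)
    return corpus


corpus = fuzz(target, b"\x00\x00\x00", budget=50_000, rng=random.Random(0))
print(corpus)
```

Each saved input unlocks the next branch, so the search climbs one byte at a time. A uniformly random 3-byte input, by contrast, has a 1 in 16,777,216 chance of reaching the "deep" branch on any given draw.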

The cost of all this? Almost nothing on your end. The decorators, the strategies, the properties — all written. HypoFuzz inherits them. The only new artifact is a dashboard you check on Monday morning to see what shook loose over the weekend.

§ 06 What you actually got

If you walked the loop with me, you wrote three things and got four. The three: a strategy (what shape are my inputs?), a property (what stays true?), and the function under test. In return you have example tests, plus a generator, plus a shrinker, plus a fuzzing target — all from the same six lines.

That asymmetry — small surface, large yield — is the entire pitch. @given is the smallest interface that lets a tool do this much work for you. HypoFuzz is what happens when you let that work run for longer than your CI budget.
