Testing non-deterministic systems with confidence

Bringing statistical rigour to software testing — so engineering teams can satisfy the regulatory demands of AI performance measurement and regression testing.


The Problem

AI systems, probabilistic models, and non-deterministic processes are increasingly subject to regulatory scrutiny. Organisations must demonstrate measurable, reproducible evidence that these systems perform within acceptable bounds — not just once, but continuously.

Traditional unit testing assumes deterministic outcomes. That assumption never fully withstood scrutiny, and with AI it can no longer be ignored: uncertainty has to be managed professionally, and that means statistically.

Our Approach

Measure

Define statistical expectations for your system's behaviour. Assert against distributions, not exact values. punit gives you the vocabulary to express what "correct" means in a non-deterministic context.
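As a rough illustration of asserting against a distribution rather than an exact value, the sketch below runs a flaky check many times and compares a lower confidence bound on its pass rate with a required minimum. It is plain Java, not punit's API: the class name, method names, the simulated 95% pass rate, and the 0.90 requirement are all assumptions made for the example. The bound is the lower end of the Wilson score interval, where z = 1.96 corresponds to a two-sided 95% interval.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

public class PassRateSketch {

    /** Lower end of the Wilson score interval for a binomial proportion. */
    public static double wilsonLowerBound(int successes, int trials, double z) {
        if (trials <= 0) {
            throw new IllegalArgumentException("trials must be positive");
        }
        double pHat = (double) successes / trials;
        double z2 = z * z;
        double centre = pHat + z2 / (2.0 * trials);
        double margin = z * Math.sqrt(pHat * (1 - pHat) / trials
                + z2 / (4.0 * trials * trials));
        return (centre - margin) / (1 + z2 / trials);
    }

    /** Runs the trial repeatedly and counts how often it passes. */
    public static int countSuccesses(BooleanSupplier trial, int trials) {
        int successes = 0;
        for (int i = 0; i < trials; i++) {
            if (trial.getAsBoolean()) {
                successes++;
            }
        }
        return successes;
    }

    public static void main(String[] args) {
        // Hypothetical system under test: passes about 95% of the time.
        Random rng = new Random(42);
        BooleanSupplier flakyCheck = () -> rng.nextDouble() < 0.95;

        int trials = 500;
        int passed = countSuccesses(flakyCheck, trials);
        double lower = wilsonLowerBound(passed, trials, 1.96);

        // Assumed requirement for this example: true pass rate of at least 0.90.
        System.out.printf("passed=%d/%d, lower bound=%.3f, acceptable=%b%n",
                passed, trials, lower, lower >= 0.90);
    }
}
```

Comparing the lower bound, rather than the raw pass rate, with the requirement means a small lucky sample is not enough to pass.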

Regress

Detect when your system drifts beyond acceptable bounds. Run repeatable hypothesis tests in CI/CD and catch performance degradation before it reaches production.
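One way such a regression check could work, shown here as a minimal sketch rather than as punit's actual method, is a one-sided exact binomial test against a previously measured baseline. The class and method names, the 0.95 baseline, and the 0.05 significance level are assumptions for illustration. The test asks how likely it would be to see this few successes if the system were still performing at the baseline rate.

```java
public class RegressionCheckSketch {

    /** P(X <= k) for X ~ Binomial(n, p), with each term computed in log space. */
    public static double binomialCdf(int k, int n, double p) {
        if (p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("p must be strictly between 0 and 1");
        }
        if (k < 0) {
            return 0.0;
        }
        if (k >= n) {
            return 1.0;
        }
        double logP = Math.log(p);
        double logQ = Math.log1p(-p);
        double logChoose = 0.0; // log C(n, 0)
        double sum = 0.0;
        for (int i = 0; i <= k; i++) {
            sum += Math.exp(logChoose + i * logP + (n - i) * logQ);
            // Update log C(n, i) to log C(n, i + 1).
            logChoose += Math.log(n - i) - Math.log(i + 1);
        }
        return Math.min(1.0, sum);
    }

    /**
     * Flags a regression when the observed success count would be improbably
     * low (p-value below alpha) if the baseline rate still held.
     */
    public static boolean hasRegressed(int successes, int trials,
                                       double baselineRate, double alpha) {
        return binomialCdf(successes, trials, baselineRate) < alpha;
    }

    public static void main(String[] args) {
        // Assumed baseline of 0.95, measured in an earlier run.
        System.out.println("94/100 regressed? " + hasRegressed(94, 100, 0.95, 0.05));
        System.out.println("80/100 regressed? " + hasRegressed(80, 100, 0.95, 0.05));
    }
}
```

The significance level sets how often a healthy system would be flagged by chance, so it is worth choosing with the CI failure budget in mind.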

Trust

Produce auditable, structured evidence that your AI systems perform as expected. Give regulators, auditors, and risk committees the confidence they need.

punit

Probabilistic unit testing for Java

punit is a JUnit 5 extension that runs tests multiple times and applies statistical inference to determine whether a non-deterministic system is behaving acceptably. Explore configurations, measure empirical baselines, and run regression tests in CI/CD — with configurable confidence levels, latency percentile assertions, and auditable verdicts.
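To make the idea of a latency percentile assertion concrete, here is a small standalone sketch using the nearest-rank method. It does not use or reproduce punit's API: the class, the method, the sample values, and the 250 ms threshold are hypothetical.

```java
import java.util.Arrays;

public class LatencyPercentileSketch {

    /** Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it. */
    public static long percentile(long[] samplesMillis, double pct) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (pct <= 0.0 || pct > 100.0) {
            throw new IllegalArgumentException("pct must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies, in milliseconds, collected over repeated runs.
        long[] latencies = {120, 95, 110, 310, 105, 98, 130, 101, 99, 115};
        long p95 = percentile(latencies, 95.0);

        // Assumed threshold for this example: p95 must not exceed 250 ms.
        System.out.println("p95=" + p95 + " ms, within threshold=" + (p95 <= 250));
    }
}
```

With only a handful of samples a high percentile is dominated by the single slowest run, which is one reason to collect many samples before asserting on it.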

View on GitHub

Complementary Projects

punit examples

A complete example application demonstrating punit's capabilities — including an LLM-powered shopping basket tested with explore, measure, and optimize experiments, and a payment gateway verified against SLA thresholds.

View on GitHub

outcome

A Java framework that bridges deterministic application code with fallible, non-deterministic operations. Replaces try/catch with type-safe Outcome values, structured failure classification, policy-driven retries, and built-in observability.
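The general pattern of returning a typed result instead of throwing can be sketched as follows. This is not the outcome library's API: every type and method here (Outcome, Ok, Fail, FailureKind, retry) is a hypothetical stand-in, and the retry rule is a deliberately simple assumption. The sketch assumes Java 17 or later for sealed interfaces and records.

```java
import java.util.function.Supplier;

public class OutcomeSketch {

    // Hypothetical types for illustration only.
    public sealed interface Outcome<T> permits Ok, Fail {}

    public record Ok<T>(T value) implements Outcome<T> {}

    public enum FailureKind { TRANSIENT, PERMANENT }

    public record Fail<T>(FailureKind kind, String message) implements Outcome<T> {}

    /**
     * Simple retry policy: repeat the operation while it reports a TRANSIENT
     * failure, up to maxAttempts calls in total. Successes and PERMANENT
     * failures are returned immediately.
     */
    public static <T> Outcome<T> retry(Supplier<Outcome<T>> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Outcome<T> last = operation.get();
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            boolean transientFailure =
                    last instanceof Fail<T> f && f.kind() == FailureKind.TRANSIENT;
            if (!transientFailure) {
                return last;
            }
            last = operation.get();
        }
        return last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical operation: times out twice, then succeeds.
        Outcome<String> result = OutcomeSketch.<String>retry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                return new Fail<String>(FailureKind.TRANSIENT, "timeout");
            }
            return new Ok<String>("payment accepted");
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Because the failure is an ordinary value with a classification attached, the retry decision lives in one place rather than being spread across catch blocks.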

View on GitHub

Insights

6 May 2026

Shewhart, Toyota, and the Probabilistic Turn

LLMs break software’s binary view of correctness. The discipline that replaces it already exists: it was built at Bell Labs, refined in Nagoya, and has been waiting for software to need it.

19 April 2026

Deming, Challenger, and the Watershed Moment of AI

A watershed-moment essay on why AI amplifies, rather than creates, software quality - and why the responsibility rests with those who own the system.
