punit examples is a companion repository containing a fully worked example application that demonstrates the punit framework across all its major capabilities.
Two example domains
Shopping Basket (empirical approach)
An LLM translates natural language instructions (e.g. “Add 2 apples”) into structured JSON actions for a shopping basket API. LLM behaviour is inherently probabilistic: the model may hallucinate fields, produce malformed JSON, or invent invalid actions. Success rates are therefore not fixed in advance; they are established empirically through measurement experiments.
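To make "established empirically" concrete, the sketch below turns a count of successful samples into a success rate with a confidence interval. It is plain Java and does not use punit's API; the class name `WilsonIntervalSketch`, the choice of the Wilson score interval, and the figures 950 out of 1000 are assumptions made for illustration, not results from the repository.

```java
// Illustrative only: not part of punit. Computes a Wilson score interval
// for a success rate observed over a number of independent samples.
class WilsonIntervalSketch {

    /** Returns {lower, upper} for the given success count, sample count and z-score. */
    static double[] wilsonInterval(int successes, int samples, double z) {
        if (samples <= 0 || successes < 0 || successes > samples) {
            throw new IllegalArgumentException("need 0 <= successes <= samples and samples > 0");
        }
        double n = samples;
        double p = successes / n;          // observed success rate
        double z2 = z * z;
        double denominator = 1.0 + z2 / n;
        double centre = (p + z2 / (2.0 * n)) / denominator;
        double halfWidth =
                z * Math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denominator;
        return new double[] {centre - halfWidth, centre + halfWidth};
    }

    public static void main(String[] args) {
        // Made-up measurement: 950 valid JSON actions out of 1000 samples.
        double[] ci = wilsonInterval(950, 1000, 1.96); // z = 1.96 for ~95% confidence
        System.out.printf("lower=%.4f upper=%.4f%n", ci[0], ci[1]);
    }
}
```

The lower bound, rather than the raw observed rate, is the more cautious figure to carry forward as a baseline, because it accounts for sampling noise.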
Payment Gateway (SLA-driven approach)
This domain is a simulated external payment service with contractual SLA obligations (99.99% availability, sub-second latency). Tests verify compliance against these mandated thresholds with full provenance tracking.
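A threshold as tight as 99.99% has a practical consequence for sample sizes, which the following sketch works out. If the true success probability were below a threshold p0, the chance of seeing n successes in a row would be less than p0^n, so requiring p0^n <= alpha gives n >= ln(alpha) / ln(p0). The code is plain Java, not punit's API, and the class and method names are hypothetical.

```java
// Illustrative only: not part of punit. Computes how many consecutive
// successful samples are needed before "success rate >= threshold" is
// supported at the requested one-sided confidence level.
class SlaSampleSizeSketch {

    static long minZeroFailureSamples(double threshold, double confidence) {
        if (!(threshold > 0.0 && threshold < 1.0)
                || !(confidence > 0.0 && confidence < 1.0)) {
            throw new IllegalArgumentException("threshold and confidence must be in (0, 1)");
        }
        double alpha = 1.0 - confidence; // tolerated false-pass probability
        // Smallest n with threshold^n <= alpha.
        return (long) Math.ceil(Math.log(alpha) / Math.log(threshold));
    }

    public static void main(String[] args) {
        System.out.println(minZeroFailureSamples(0.9999, 0.95));
        System.out.println(minZeroFailureSamples(0.99, 0.95));
    }
}
```

Under these assumptions a 99.99% obligation needs tens of thousands of failure-free samples for 95% confidence, which is one reason to distinguish evidential verification runs from lightweight smoke checks.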
What’s demonstrated
- Explore experiments — compare LLM models (GPT-4o, Claude Sonnet, Claude Haiku) and temperature settings with small sample sizes to identify the best configuration
- Measure experiments — establish statistical baselines with 1000+ samples, producing YAML spec files with confidence intervals and latency distributions
- Optimize experiments — iteratively refine temperature and prompt parameters to maximise success rates
- Threshold approaches — sample-size-first, confidence-first, and threshold-first configuration patterns
- Budget management — time budgets, token budgets, and budget exhaustion behaviours (FAIL vs. EVALUATE_PARTIAL)
- Rate pacing — requests-per-second and requests-per-minute limits for API cost control
- Covariate tracking — temporal, configuration, and infrastructure covariates with baseline matching
- Verification vs. smoke intent — evidential compliance testing vs. lightweight sentinel checks
- Exception handling modes — FAIL_SAMPLE (continue) vs. ABORT_TEST (stop immediately)
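The budget and exception-handling items in the list above can be sketched as a simple sampling loop. This is a minimal illustration in plain Java, not punit's implementation: the policy names FAIL, EVALUATE_PARTIAL, FAIL_SAMPLE and ABORT_TEST come from the list, while the class, method, and parameter names (and the fixed token cost per sample) are assumptions.

```java
import java.util.function.BooleanSupplier;

// Illustrative only: not punit's implementation.
class SamplingLoopSketch {

    enum BudgetPolicy { FAIL, EVALUATE_PARTIAL }
    enum ExceptionPolicy { FAIL_SAMPLE, ABORT_TEST }

    static final class Outcome {
        final int attempted;
        final int succeeded;
        final boolean passed;

        Outcome(int attempted, int succeeded, boolean passed) {
            this.attempted = attempted;
            this.succeeded = succeeded;
            this.passed = passed;
        }
    }

    static Outcome run(BooleanSupplier sample, int plannedSamples,
                       long tokenBudget, long tokensPerSample, double minPassRate,
                       BudgetPolicy budgetPolicy, ExceptionPolicy exceptionPolicy) {
        int attempted = 0;
        int succeeded = 0;
        long tokensUsed = 0;
        while (attempted < plannedSamples) {
            if (tokensUsed + tokensPerSample > tokenBudget) {
                if (budgetPolicy == BudgetPolicy.FAIL) {
                    // Budget exhausted before all planned samples ran: fail outright.
                    return new Outcome(attempted, succeeded, false);
                }
                break; // EVALUATE_PARTIAL: judge the samples collected so far
            }
            tokensUsed += tokensPerSample;
            attempted++;
            try {
                if (sample.getAsBoolean()) {
                    succeeded++;
                }
            } catch (RuntimeException e) {
                if (exceptionPolicy == ExceptionPolicy.ABORT_TEST) {
                    return new Outcome(attempted, succeeded, false); // stop immediately
                }
                // FAIL_SAMPLE: the sample counts as a failure and the loop continues.
            }
        }
        boolean passed = attempted > 0 && (double) succeeded / attempted >= minPassRate;
        return new Outcome(attempted, succeeded, passed);
    }

    public static void main(String[] args) {
        Outcome o = run(() -> true, 100, 50, 10, 0.9,
                BudgetPolicy.EVALUATE_PARTIAL, ExceptionPolicy.FAIL_SAMPLE);
        System.out.println(o.attempted + " attempted, passed=" + o.passed);
    }
}
```

In this sketch a partial evaluation can pass on only a handful of samples, so the evidence behind such a verdict is weaker than for a run that completed its planned sample count.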
Running the examples
The project includes a mock LLM that requires no API keys and simulates realistic behaviour, including temperature-sensitive reliability. Real LLM providers (OpenAI, Anthropic) can be enabled via environment variables.
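As a rough idea of what "temperature-sensitive reliability" could mean, here is a minimal mock in plain Java. It is not the repository's mock: the class name, the JSON shape, and the failure model (2% base failure rate plus 20 percentage points per unit of temperature) are all assumptions chosen for illustration.

```java
import java.util.Random;

// Illustrative only: not the mock shipped with the examples.
class MockLlmSketch {

    private final Random random;
    private final double temperature;

    MockLlmSketch(double temperature, long seed) {
        this.temperature = Math.max(0.0, temperature);
        this.random = new Random(seed); // seeded so runs are repeatable
    }

    /** Assumed failure model: higher temperature means more malformed responses. */
    double failureProbability() {
        return Math.min(1.0, 0.02 + 0.20 * temperature);
    }

    /** Returns a canned JSON action, or truncated JSON when a failure is simulated. */
    String translate(String instruction) {
        if (random.nextDouble() < failureProbability()) {
            return "{\"action\":\"add\",\"item\":"; // malformed: cut off mid-object
        }
        return "{\"action\":\"add\",\"item\":\"apples\",\"quantity\":2}";
    }

    /** Crude well-formedness check, sufficient for this sketch only. */
    static boolean looksWellFormed(String json) {
        return json.startsWith("{") && json.endsWith("}");
    }

    static double measureSuccessRate(double temperature, int samples, long seed) {
        MockLlmSketch llm = new MockLlmSketch(temperature, seed);
        int ok = 0;
        for (int i = 0; i < samples; i++) {
            if (looksWellFormed(llm.translate("Add 2 apples"))) {
                ok++;
            }
        }
        return (double) ok / samples;
    }

    public static void main(String[] args) {
        System.out.println("temperature 0.0: " + measureSuccessRate(0.0, 2000, 42L));
        System.out.println("temperature 1.0: " + measureSuccessRate(1.0, 2000, 42L));
    }
}
```

A deterministic seed keeps the mock's behaviour repeatable between runs, which makes it suitable for trying the experiments without incurring API costs.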
Browse the source and instructions on GitHub.
