Sequential Probability Ratio Test: Wald and mixture variants
Module: shadow.statistical.sprt
Classes: SPRTDetector, MSPRTDetector, MSPRTtDetector, MultiSPRT
What they compute
Three sequential detectors for streaming behavioral scores:
| Detector | Tests | When to use |
|---|---|---|
| SPRTDetector | H0: μ=μ0 vs H1: μ=μ0+δ (point alternative) | Pre-specified effect size; want fastest stopping |
| MSPRTDetector | H0: μ=μ0 vs H1: μ≠μ0 (Gaussian prior over δ, known σ) | Continuous monitoring; can't commit to a δ |
| MSPRTtDetector | Same as mSPRT but with running sample variance | Unknown σ; can't run a long warmup |
All three accept a score per-update and return an SPRTState with
the current decision ("continue" / "h0" / "h1") and the
log-likelihood ratio.
Guarantees
Wald SPRT
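For known μ0 and σ, with lower boundary log A = log(β/(1−α)) and
upper boundary log B = log((1−β)/α), the test accepts H0 when the
log-likelihood ratio falls to log A or below and rejects H0 when it
rises to log B or above. Ignoring boundary overshoot:
P(reject H0 | H0) ≤ α and P(accept H0 | H1, true effect = δ) ≤ β
With overshoot, Wald's inequalities give the slightly weaker bounds
P(reject H0 | H0) ≤ α/(1−β) and P(accept H0 | H1) ≤ β/(1−α); the sum
of the two realized error rates still does not exceed α + β.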
Decisions are absorbing: once a boundary is crossed, the test stops. Continuing to accumulate after a decision invalidates the (α, β) bounds.
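The boundaries and the absorbing-decision rule above can be sketched in a few lines. This is a minimal illustration for Gaussian scores with known σ, not Shadow's SPRTDetector; the function name `wald_sprt` and its return shape are hypothetical.

```python
import math


def wald_sprt(scores, mu0, delta, sigma, alpha=0.05, beta=0.2):
    """Gaussian Wald SPRT of H0: mu = mu0 vs H1: mu = mu0 + delta.

    Hypothetical helper for illustration only.
    Returns (decision, llr, n_used) with decision in {"continue", "h0", "h1"}.
    """
    log_a = math.log(beta / (1.0 - alpha))   # lower boundary: accept H0
    log_b = math.log((1.0 - beta) / alpha)   # upper boundary: reject H0
    llr = 0.0
    n = 0
    for n, x in enumerate(scores, start=1):
        # Log-likelihood ratio increment of N(mu0 + delta, sigma^2)
        # against N(mu0, sigma^2) for one observation.
        llr += (delta / sigma**2) * (x - mu0 - delta / 2.0)
        if llr >= log_b:
            return "h1", llr, n   # absorbing: stop at the first crossing
        if llr <= log_a:
            return "h0", llr, n
    return "continue", llr, n
```

Returning at the first crossing is what keeps the (α, β) bounds meaningful: the function never looks at observations that arrive after a decision.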
Mixture SPRT (Robbins 1970)
With a Gaussian prior δ ~ N(0, τ²σ²) and known σ²:
Λ_n = sqrt(σ²/(σ² + nτ²σ²)) · exp(n²(x̄−μ0)²τ²/(2(σ² + nτ²σ²)))
Λ_n is a non-negative martingale under H0, so by Ville's inequality:
P(sup_{n ≥ 1} Λ_n ≥ 1/α) ≤ α
This holds simultaneously over all n, so there is no multiple-testing penalty for peeking. The mSPRT is a standard choice for production A/B testing (Johari, Pekelis & Walsh 2017).
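For reference, the closed form for Λ_n is usually evaluated on the log scale to avoid overflow in the exponential. The sketch below assumes known σ² and the prior δ ~ N(0, τ²σ²) stated above; `msprt_log_lambda` and `rejects_h0` are illustrative names, not part of Shadow's API.

```python
import math


def msprt_log_lambda(n, xbar, mu0, sigma2, tau2):
    """log of the mixture likelihood ratio after n observations with mean xbar.

    Illustrative helper: same formula as the closed form in the text,
    written on the log scale.
    """
    denom = sigma2 + n * tau2 * sigma2
    shrink = 0.5 * math.log(sigma2 / denom)                   # log of the sqrt factor
    evidence = (n**2 * (xbar - mu0) ** 2 * tau2) / (2.0 * denom)
    return shrink + evidence


def rejects_h0(log_lambda, alpha):
    """Ville threshold: reject once Lambda_n >= 1 / alpha."""
    return log_lambda >= math.log(1.0 / alpha)
```

With n = 0 the statistic is log 1 = 0, and when x̄ equals μ0 it only decreases with n, which is why a stream that stays at the null mean never triggers a rejection.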
t-mixture mSPRT (variance-adaptive)
Uses Welford-updated running sample variance s²_n in the mSPRT
formula. The bound is asymptotic, not exact: for finite warmup
the running variance estimator breaks the strict martingale
property. Use when σ is unknown and a long warmup isn't available;
otherwise prefer MSPRTDetector with warmup ≥ 100.
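A rough sketch of the variance-adaptive idea: keep Welford's running mean and sum of squared deviations, and substitute s²_n for σ² in the mSPRT statistic. The class `WelfordMSPRT`, its parameters, and the choice to take μ0 as a given constant are assumptions made for illustration; this is not the MSPRTtDetector implementation.

```python
import math


class WelfordMSPRT:
    """Hypothetical stand-in: mSPRT with a Welford running variance."""

    def __init__(self, mu0, tau2=1.0, alpha=0.05, min_n=2):
        self.mu0, self.tau2, self.alpha = mu0, tau2, alpha
        self.min_n = min_n          # need at least 2 points for a variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean
        self.log_lambda = 0.0
        self.decision = "continue"

    def update(self, x):
        if self.decision != "continue":
            return self.decision    # decisions are absorbing
        # Welford's one-pass update of the mean and M2.
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)
        if self.n < self.min_n:
            return self.decision
        s2 = self.m2 / (self.n - 1)  # unbiased sample variance
        if s2 <= 0.0:
            return self.decision     # constant stream so far: no usable variance
        n, t2 = self.n, self.tau2
        # mSPRT closed form with sigma^2 replaced by s2.
        self.log_lambda = (-0.5 * math.log1p(n * t2)
                           + n**2 * t2 * (self.mean - self.mu0) ** 2
                           / (2.0 * s2 * (1.0 + n * t2)))
        if self.log_lambda >= math.log(1.0 / self.alpha):
            self.decision = "h1"
        return self.decision
```

Because s²_n is noisy for small n, early rejections from a detector like this deserve the most skepticism, which is the practical meaning of "asymptotic, not exact".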
Algorithm (mSPRT)
- Estimate (μ̂0, σ̂²) from the first warmup observations.
- After each post-warmup observation x_t:
  - Update running sum and mean x̄_t.
  - Compute log Λ_t via the closed form above.
  - If log Λ_t ≥ log(1/α), reject H0 and freeze the decision.
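The steps above can be sketched as a small streaming class. The name `WarmupMSPRT`, the default parameters, and the error raised on a zero-variance warmup are assumptions for illustration and do not describe how MSPRTDetector is actually written.

```python
import math
import statistics


class WarmupMSPRT:
    """Hypothetical sketch of the warmup-then-monitor mSPRT loop."""

    def __init__(self, warmup=100, tau2=1.0, alpha=0.05):
        if warmup < 2:
            raise ValueError("warmup must be at least 2 to estimate a variance")
        self.warmup, self.tau2, self.alpha = warmup, tau2, alpha
        self.buffer = []
        self.mu0 = None
        self.sigma2 = None
        self.n = 0                  # post-warmup observation count
        self.total = 0.0            # post-warmup running sum
        self.log_lambda = 0.0
        self.decision = "continue"

    def update(self, x):
        if self.decision != "continue":
            return self.decision    # frozen after a rejection
        if self.mu0 is None:
            # Step 1: collect the warmup buffer, then plug in the estimates.
            self.buffer.append(x)
            if len(self.buffer) == self.warmup:
                self.mu0 = statistics.fmean(self.buffer)
                self.sigma2 = statistics.variance(self.buffer)
                if self.sigma2 <= 0.0:
                    raise ValueError("warmup variance is zero")
            return self.decision
        # Step 2: update the running sum and mean.
        self.n += 1
        self.total += x
        xbar = self.total / self.n
        # Step 3: closed-form log Lambda_t.
        denom = self.sigma2 + self.n * self.tau2 * self.sigma2
        self.log_lambda = (0.5 * math.log(self.sigma2 / denom)
                           + self.n**2 * (xbar - self.mu0) ** 2 * self.tau2
                           / (2.0 * denom))
        # Step 4: compare against log(1 / alpha) and freeze on rejection.
        if self.log_lambda >= math.log(1.0 / self.alpha):
            self.decision = "h1"
        return self.decision
```

Note that this one-sided stopping rule never returns "h0": the mSPRT only ever rejects or continues.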
References
- Wald, A. (1945). "Sequential Tests of Statistical Hypotheses." Annals of Mathematical Statistics 16.
- Robbins, H. (1970). "Statistical methods related to the law of the iterated logarithm." Ann. Math. Statist. 41(5).
- Johari, Pekelis & Walsh (2017). "Always Valid Inference: Bringing Sequential Analysis to A/B Testing." KDD 2017.
- Lai, T. L. & Xing, H. (2010). "Sequential Analysis: Some Classical Problems and New Challenges." Statistica Sinica.
Caveats: plug-in σ̂
Both Wald SPRT and mSPRT have exact error bounds only when σ is known. Shadow estimates σ̂ from the warmup buffer, so:
- With warmup ≥ 100 and post-warmup streams ≤ a few hundred observations, empirical Type-I rate is within ~2× nominal α.
- With warmup = 20, Type-I can be substantially inflated.
- For accurate Type-I control in production, use warmup ≥ 100.
The empirical validation suite (@pytest.mark.slow) verifies
Type-I error, power, and always-valid bounds in the asymptotic regime
where these guarantees hold tightly.
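One way to get a feel for the warmup effect is a small Monte Carlo run on pure-noise streams. The sketch below is not the validation suite; `false_alarm_rate`, its defaults, and the use of standard-normal noise are assumptions, and the number it returns depends on the seed and stream counts chosen.

```python
import math
import random
import statistics


def false_alarm_rate(warmup, horizon, n_streams=200, alpha=0.05, tau2=1.0, seed=0):
    """Fraction of N(0, 1) noise streams on which a plug-in mSPRT rejects H0.

    Illustrative only: mu0 and sigma^2 are both estimated from the warmup
    buffer, then the stream is monitored for `horizon` more observations.
    """
    rng = random.Random(seed)
    threshold = math.log(1.0 / alpha)
    rejections = 0
    for _ in range(n_streams):
        warm = [rng.gauss(0.0, 1.0) for _ in range(warmup)]
        mu0 = statistics.fmean(warm)
        sigma2 = statistics.variance(warm)
        total = 0.0
        for n in range(1, horizon + 1):
            total += rng.gauss(0.0, 1.0)
            xbar = total / n
            log_lam = (-0.5 * math.log1p(n * tau2)
                       + n**2 * tau2 * (xbar - mu0) ** 2
                       / (2.0 * sigma2 * (1.0 + n * tau2)))
            if log_lam >= threshold:
                rejections += 1
                break               # first crossing ends the stream
    return rejections / n_streams
```

Comparing, say, `false_alarm_rate(20, 300)` with `false_alarm_rate(100, 300)` gives a quick, if noisy, picture of how much the warmup length matters under these assumptions.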
For exact always-valid inference under unknown σ, the literature
(Lai & Xing 2010; Howard et al. 2021 nonparametric supermartingales)
gives more sophisticated tests. They are not currently implemented;
MSPRTtDetector is the closest practical approximation Shadow ships.