Sample PR comment
This is what Shadow's default shadow diagnose-pr flow posts on a real pull request: verdict first, metrics second, fix command at the bottom. Reviewers don't need to learn Shadow vocabulary to act on it.
The block between the horizontal rules below is the literal markdown the shadow-diagnose-pr GitHub Action posts.
Shadow verdict: STOP
This PR violates a critical policy and must not merge as-is.
This PR changes agent behavior on 3 / 3 production-like traces.
:warning: Low statistical power: fewer than 30 traces in the sample. Treat the verdict as advisory; widen --max-traces for more confidence.
:information_source: Synthetic mock backend. Cause magnitudes below come from a deterministic per-delta heuristic (not real LLM behavior). Re-run with --backend live for a grounded estimate.
Dominant cause
prompt.system appears to be the main cause.
- Axis: trajectory
- ATE: +0.60
- 95% CI: [0.60, 0.60]
- E-value: 6.7
Why it matters
3 traces violate the confirm-before-refund policy rule.
Suggested fix
Review the prompt change at prompts/candidate.md — restore the instruction or constraint it removed.
Verify the fix
Generated by Shadow.
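Two details in the sample are easy to misread: the 95% CI of [0.60, 0.60] has zero width, and the verdict carries a low-power warning. One plausible reading, stated here as an assumption rather than a description of Shadow's internals, is that the deterministic mock backend assigns the same effect to every trace, so any resampling-based interval collapses to a point. The sketch below illustrates this with a plain percentile bootstrap. The names `percentile_bootstrap_ci`, `is_low_power`, and `MIN_TRACES`, and the per-trace effect values, are hypothetical and are not part of Shadow's API.

```python
import random
from statistics import fmean

# Threshold quoted in the sample warning; the constant name is hypothetical.
MIN_TRACES = 30


def percentile_bootstrap_ci(effects, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) from a percentile bootstrap over per-trace effects."""
    rng = random.Random(seed)
    n = len(effects)
    # Resample the traces with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(effects, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return fmean(effects), means[lo_idx], means[hi_idx]


def is_low_power(n_traces, min_traces=MIN_TRACES):
    """Flag samples smaller than the advisory threshold."""
    return n_traces < min_traces


# Assumed input: a deterministic heuristic that yields 0.60 for each of 3 traces.
mock_effects = [0.60, 0.60, 0.60]
ate, lower, upper = percentile_bootstrap_ci(mock_effects)
print(
    f"ATE {ate:+.2f}, 95% CI [{lower:.2f}, {upper:.2f}], "
    f"low power: {is_low_power(len(mock_effects))}"
)
```

Under these assumptions every resample has the same mean, so the interval has no spread. A zero-width interval here reflects identical inputs, not high certainty, which is consistent with the sample's advice to re-run with --backend live and a larger --max-traces.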
Legacy nine-axis-only flow
The pre-3.0 shadow report --format github-pr shape still ships behind shadow init --github-action --legacy-diff for repos that prefer it. It posts the nine-axis severity table directly with no causal-attribution layer. See shadow diff reference and Hierarchical diff for the output shape; for a runnable example, the refund-causal-diagnosis demo writes a baseline.json and a candidate.json side-by-side under both flows.