Production readiness — security review brief¶

A 30-minute read for security reviewers, CISOs, and procurement. If you're sitting in a one-pass meeting deciding whether to approve Shadow for production use, this is the document.

The fast path: Shadow runs entirely on your machine. No telemetry is enabled by default. No agent traces leave your infrastructure unless you explicitly export them. Everything below is the evidence supporting that one paragraph.

Data flow at a glance¶

flowchart LR
    subgraph YOUR_INFRA[Your infrastructure]
        AGENT[Your agent code]
        SHADOW[Shadow SDK / CLI]
        LOGS[(.agentlog files<br/>on disk)]
        REDACTOR[Redactor]
        DIFFER[9-axis differ +<br/>causal attribution]
        CERT[Behavior certificate<br/>signed via sigstore]
        AGENT -->|every chat call| SHADOW
        SHADOW -->|payloads| REDACTOR
        REDACTOR -->|swept| LOGS
        LOGS --> DIFFER
        DIFFER --> CERT
    end

    subgraph THIRD_PARTY[Third-party endpoints — only when YOU configure them]
        OPENAI[OpenAI API<br/>only on shadow_diff replay]
        SIGSTORE[Sigstore<br/>only on shadow certify --sign]
        GH[GitHub Actions<br/>only when you commit a workflow]
    end

    SHADOW -.optional `--backend live`.-> OPENAI
    CERT -.optional `--sign`.-> SIGSTORE
    SHADOW -.optional `shadow init --github-action`.-> GH

    style THIRD_PARTY stroke-dasharray: 5 5

Solid lines stay on your machine. Dashed lines are external network calls — every one is opt-in via a flag or an explicit config knob.

What stays on disk¶

Artifact	Where	Format	Sensitive?
`.agentlog` files	`.shadow/traces/` (or wherever you point)	JSONL, content-addressed sha256	Pre-redacted by `Redactor`; see below
`shadow.yaml`	repo root	YAML	Project config — baseline pin, paths, `--policy` defaults
`report.json`	`.shadow/diagnose-pr/report.json`	JSON	Diff summary; same content as the PR-comment markdown
`*.cert.json`	wherever `--out` points	JSON	Behavior certificate; see "Signing chain"
`index.sqlite`	`.shadow/index.sqlite`	SQLite	Local trace index; never transmitted

Nothing is uploaded. No telemetry endpoint, no metrics collector, no auto-update channel. The only place network traffic can originate from Shadow is when you explicitly opt into one of the dashed lines above.

Redaction — what the SDK sweeps before write¶

Every record written through shadow.sdk.Session runs through a Redactor first. The redactor walks the payload (dict / list / str), finds matches, and substitutes [REDACTED:<pattern_name>] — the sha256 content-id is computed after redaction so the on-disk hash reflects the redacted bytes only.

Default pattern set (shadow.redact.DEFAULT_PATTERNS):

Pattern	What it catches
`private_key`	PEM-armoured RSA / EC / ED25519 / OpenSSH / encrypted / PGP private keys
`jwt`	Three-segment base64url JWTs (header.payload.signature, ≥10/10/20 chars)
`anthropic_api_key`	`sk-ant-…` (matched first so it doesn't fall to the broader OpenAI pattern)
`openai_api_key`	`sk-…`, `sk-proj-…`, `sk-svcacct-…`, `sk-admin-…` (any ≥20 chars)
`aws_access_key_id`	`AKIA` / `ASIA` / `AIDA` / `AROA` + 16 uppercase alnum
`github_token`	`ghp` / `gho` / `ghu` / `ghs` / `ghr` + 36-251 alnum
`email`	RFC-5322-ish addresses
`phone`	E.164 `+<digits>` (10-15 digits)
`credit_card`	13-19 digits in contiguous, dash, or space layout — gated by Luhn

Add company-specific patterns by passing a custom Redactor to your Session:

from shadow.sdk import Session
from shadow.redact import Redactor, DEFAULT_PATTERNS, Pattern
import re

redactor = Redactor(patterns=DEFAULT_PATTERNS + (
    Pattern(name="acme_internal_token",
            regex=re.compile(r"acme-[a-z0-9]{32}"),
            replacement="[REDACTED:acme_internal_token]"),
))
with Session(output_path="trace.agentlog", redactor=redactor):
    ...

Defense in depth — `shadow scan`¶

The redactor is the first line; shadow scan is the second:

shadow scan baseline_traces/ candidate_traces/

Walks every committed .agentlog, runs the same pattern set in detect-only mode, exits non-zero on any match. Belongs in your CI before shadow gate-pr — so a Session that was misconfigured (forgot a custom pattern, used a stale default set) gets caught before traces are merged into the repo.

Add --patterns ci/extra-secrets.txt for project-specific patterns:

# ci/extra-secrets.txt — one rule per line
acme_internal_token=acme-[a-z0-9]{32}
session_cookie=session=[A-Za-z0-9_\\-]{40,}

Signing chain — `shadow certify`¶

For supply-chain-of-behavior auditability, Shadow can produce a content-addressed JSON certificate covering an entire release:

shadow certify trace.agentlog \
  --agent-id refund-bot --output release.cert.json

Certificate fields include the trace's content-id, the model name, sha256 of the system prompt, sha256 of every tool schema, optional sha256 of the policy file, and an optional baseline-vs-candidate nine-axis regression-suite rollup. The certificate is itself content-addressed: any tampering changes cert_id and is caught by shadow verify-cert.

Pass --sign to add a sigstore keyless signature (requires pip install 'shadow-diff[sign]' and a federated identity, e.g. a GitHub OIDC token in CI). shadow verify-cert --verify-signature --cert-identity <id> checks both content-addressing and signature against a specific signer identity.

The full feature page: docs/features/certificate.md.

Offline operation¶

Shadow has zero hard dependency on any external service. To verify:

# Air-gap by setting these env vars then running the full pipeline.
export NO_PROXY='*' HTTP_PROXY='' HTTPS_PROXY=''
unset OPENAI_API_KEY ANTHROPIC_API_KEY

# All of these run with no network:
shadow record -- python my_agent.py    # if your agent uses stub backends
shadow diff baseline.agentlog candidate.agentlog
shadow gate-pr ...
shadow inspect trace.agentlog
shadow scan path/to/traces/
shadow certify trace.agentlog --output cert.json   # without --sign

The only commands that ever touch the network:

Command + flag	Endpoint
`shadow diagnose-pr --backend live`	OpenAI API (your `OPENAI_API_KEY`)
`shadow record -- <cmd>` (when agent calls OpenAI / Anthropic)	Whatever your agent calls
`shadow certify --sign`	Sigstore Fulcio + Rekor (public infrastructure)
`shadow init --github-action`	None at scaffold time; only at PR time

shadow diagnose-pr --backend recorded (the default) and --backend mock both stay fully offline.

Threat model in one paragraph¶

The threat we model: an agent regression that silently changes behavior between PRs in a way that violates a documented contract (safety, format, tool-call ordering, semantic intent). We do not model: a malicious build of Shadow itself (use pip install --require-hashes and pin our package); a compromised CI runner (treat trace files as code review-able artifacts, not authoritative state); or model-side prompt injection (Shadow is the pre-deploy gate, not the runtime sandbox). Sigstore signing makes release-time tampering detectable; shadow verify-cert is the audit point.

Compliance question mapping¶

Common audit questions and where to point reviewers:

Question	Where
"What customer data does this tool see?"	"Whatever your agent processes; redacted by default for the patterns above; full pattern set in `docs/security/production-readiness.md`."
"Where is data stored, and for how long?"	"On your filesystem under paths you configure. Retention is governed by your repo's git-history policy. No external storage."
"How do we audit a behavior change after merge?"	"Every `shadow gate-pr` run writes a `report.json` with a content-addressed hash; the same hash lands in the PR comment and (optionally) in a signed certificate. `shadow trail <trace-id>` walks the audit chain."
"What happens on a key leak?"	"`shadow scan` detects, the Redactor blocks at write time. The fix is to add the missing pattern to the Redactor and re-record; `shadow scan` is the safety net."
"Can we run without internet?"	"Yes. `--backend recorded` (default) is fully offline; only `--backend live` and `--sign` touch the network."

Updating this document¶

When the on-disk artifact set changes, the threat model changes, or the network-touching commands change, update this page. The Redaction and Network endpoints tables are the key parts — keep them aligned with shadow.redact.patterns and the CLI's actual flags.