Shadow: SOC 2 readiness notes

This doc is for customers deploying Shadow and for their auditors. Shadow is open-source software; it is not a SOC 2 certified service. The adopting organization remains responsible for its own audit. This doc describes the primitives Shadow provides to help you satisfy the relevant Trust Services Criteria.

What Shadow ships that maps to SOC 2

| Criterion | Shadow provides | Location |
|---|---|---|
| CC6.1 (logical access controls) | Redaction at record boundaries; per-key allowlist | `shadow.redact.Redactor` |
| CC6.7 (data transmission confidentiality) | Keys and PII redacted before canonical-JSON serialization | `shadow.sdk.Session._redact` |
| CC7.1 (system monitoring) | Tamper-evident append-only audit log with SHA-256 chain | `shadow.enterprise.AuditLog` |
| CC7.2 (authenticated + authorized access) | HTTP principal via `X-Shadow-Principal` header + IP fallback | `shadow.enterprise.AccessLogMiddleware` |
| CC7.3 (processing logged) | One audit event per dashboard request | `shadow.enterprise.AccessLogMiddleware` |
| CC7.4 (anomaly response) | Audit log `verify()` detects tampering, deletion, reordering | `AuditLog.verify()` |
| CC8.1 (change management) | Content-addressed traces (SHA-256 ids); any modification changes the id | spec §6 |

Audit-log data model

Every event is one line of canonical JSON appended to a .auditlog file; existing lines are never rewritten. Fields:

- ts: RFC 3339 UTC timestamp, millisecond precision
- actor: one of user:<name>, service:<name>, an IP address, or "unknown"
- action: canonical action verb (e.g. session.open, http.GET, trace.read)
- resource: path / URL / trace id being acted on
- outcome: ok | denied | error
- reason: optional free text (max 240 chars for the access-log middleware)
- prev_hash: SHA-256 of the canonical bytes of the previous event, or "" for the first

The chain is verifiable offline: AuditLog(path).verify() returns (bool, reason). Any tampering (modification, deletion, or reordering) breaks a subsequent prev_hash and is detected.
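The chaining rule described above can be illustrated with a minimal, standalone sketch. It is not Shadow's implementation: `canonical_bytes`, `append_event`, and `verify_chain` are hypothetical names, and sorted-key compact JSON is only an assumed stand-in for whatever canonical form Shadow actually uses.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def canonical_bytes(event: Dict) -> bytes:
    # Assumption: sorted keys + compact separators approximate canonical JSON.
    text = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def append_event(chain: List[Dict], event: Dict) -> None:
    """Link the new event to its predecessor via prev_hash, then append it."""
    prev_hash = hashlib.sha256(canonical_bytes(chain[-1])).hexdigest() if chain else ""
    chain.append({**event, "prev_hash": prev_hash})


def verify_chain(chain: List[Dict]) -> Tuple[bool, str]:
    """Return (ok, reason); reason is "" when every link matches."""
    expected = ""  # the first event carries an empty prev_hash
    for index, event in enumerate(chain):
        if event.get("prev_hash") != expected:
            return False, f"prev_hash mismatch at event {index}"
        expected = hashlib.sha256(canonical_bytes(event)).hexdigest()
    return True, ""
```

Because each hash covers the previous event's own prev_hash, editing, removing, or reordering an earlier line changes the value the next line is expected to carry, which is what the walk in `verify_chain` checks.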

Access-log middleware

Install on the shadow serve dashboard (or any FastAPI app):

    from shadow.enterprise import AuditLog, AccessLogMiddleware

    # `app` is your existing FastAPI application (e.g. the shadow serve dashboard).
    audit = AuditLog(".shadow/audit.auditlog", actor="service:shadow-serve")
    app.add_middleware(AccessLogMiddleware, audit_log=audit)

This records one audit event per request:

- action = "http.GET" / "http.POST" etc.
- resource = request.url.path
- actor = X-Shadow-Principal header || client IP
- outcome = "ok" if status<400 else "error"
- extra = {"status_code": int, "latency_ms": int}
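The per-request mapping above can be written out as a small pure function. This is a sketch for illustration only: `build_access_event` is a hypothetical helper, not part of `shadow.enterprise`, and it assumes header names arrive lowercased and that "unknown" is used when neither a principal nor a client IP is available.

```python
from typing import Dict, Mapping, Optional


def build_access_event(
    method: str,
    path: str,
    headers: Mapping[str, str],
    client_ip: Optional[str],
    status_code: int,
    latency_ms: int,
) -> Dict:
    """Map one HTTP request/response pair to the audit fields listed above."""
    # Assumption: header keys are lowercased by the caller.
    principal = headers.get("x-shadow-principal")
    # Prefer the declared principal, then the client IP, then "unknown".
    actor = principal or client_ip or "unknown"
    return {
        "action": f"http.{method.upper()}",
        "resource": path,
        "actor": actor,
        "outcome": "ok" if status_code < 400 else "error",
        "extra": {"status_code": status_code, "latency_ms": latency_ms},
    }
```

Under this rule a 401 or 403 response is recorded as "error" rather than "denied", and the actor is whatever the header claims, so the header should only be trusted when it is set by your own ingress.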

Redaction coverage matrix

The default Redactor() catches these classes before content is hashed into .agentlog:

| Class | Pattern | Replaced with |
|---|---|---|
| OpenAI API key | `sk-(proj-\|svcacct-\|admin-)?…` (≥20 chars) | `[REDACTED:openai_api_key]` |
| Anthropic API key | `sk-ant-…` (≥20 chars) | `[REDACTED:anthropic_api_key]` |
| AWS access key id | `(AKIA\|ASIA\|AIDA\|AROA)…` (16 alnum) | `[REDACTED:aws_access_key_id]` |
| GitHub token | `(ghp\|gho\|ghu\|ghs\|ghr)_…` (36-251 alnum) | `[REDACTED:github_token]` |
| Private key | PEM-armoured RSA / EC / ED25519 / OpenSSH / encrypted / PGP | `[REDACTED:private_key]` |
| JWT | Three-segment base64url (header.payload.signature) | `[REDACTED:jwt]` |
| Email | RFC-5321-ish | `[REDACTED:email]` |
| Phone (E.164) | `+<country><digits>` | `[REDACTED:phone]` |
| Credit card | Luhn-valid, contiguous OR hyphen/space-separated | `[REDACTED:credit_card]` |
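To make the credit-card row concrete, here is a minimal sketch of Luhn-gated redaction. It is not Shadow's pattern: `CARD_CANDIDATE`, `luhn_valid`, and `redact_cards` are hypothetical names, and the 13 to 19 digit range is an assumption made for the example.

```python
import re

# Assumption: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of decimal digits."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_cards(text: str) -> str:
    """Replace only those digit runs that pass the Luhn check."""
    def replace(match: "re.Match") -> str:
        digits = re.sub(r"[ -]", "", match.group())
        return "[REDACTED:credit_card]" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(replace, text)
```

Gating on the checksum keeps long numeric identifiers such as order numbers from being redacted merely because they have the right length.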

Gaps: the classes below are NOT redacted by default. Customers who need them covered must either install custom Redactor patterns or apply field-level, application-layer redaction before handing payloads to Session.record_chat:

- US SSN (9-digit)
- IBAN (bank account numbers)
- IPv4 / IPv6 addresses
- Free-text dates of birth
- Driver's license / passport / national IDs
- Internal / proprietary employee or customer IDs
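For the application-layer route, a pre-redaction pass might look like the sketch below. It deliberately avoids Shadow's API: `EXTRA_PATTERNS` and `pre_redact` are hypothetical, the `[REDACTED:us_ssn]` and `[REDACTED:ipv4]` labels are made up to mirror the table's style, and the SSN pattern only matches the hyphenated `NNN-NN-NNNN` form.

```python
import re

_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"

# Hypothetical patterns for two of the gap classes listed above.
EXTRA_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b"),
}


def pre_redact(value):
    """Recursively redact strings inside dicts and lists; leave other types alone."""
    if isinstance(value, str):
        for label, pattern in EXTRA_PATTERNS.items():
            value = pattern.sub(f"[REDACTED:{label}]", value)
        return value
    if isinstance(value, dict):
        return {key: pre_redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [pre_redact(item) for item in value]
    return value
```

The cleaned payload returned by `pre_redact` is what you would then pass on to Session.record_chat, so the default Redactor still runs over it afterwards.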

The conformance test suite at python/tests/test_redaction_conformance.py asserts both what IS covered and what is NOT covered. Changing either is a deliberate coverage change and must be accompanied by a CHANGELOG entry.

Per-key allowlist

Some keys (e.g. internal_email sent from service A to service B) are safe to leave un-redacted. Configure at Redactor construction:

    from shadow.redact import Redactor

    r = Redactor(allowlist_keys=frozenset({"internal_email", "trace_id"}))
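The idea behind a per-key allowlist can be sketched independently of Shadow. The code below is an illustration, not the Redactor's implementation: `redact_with_allowlist` and the simplified `EMAIL` pattern are hypothetical, and it assumes that an allowlisted key exempts its entire value, including anything nested beneath it.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_with_allowlist(value, allowlist_keys=frozenset()):
    """Redact emails everywhere except under keys named in allowlist_keys."""
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED:email]", value)
    if isinstance(value, dict):
        return {
            # Assumption: an allowlisted key passes its whole value through untouched.
            key: item if key in allowlist_keys
            else redact_with_allowlist(item, allowlist_keys)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_with_allowlist(item, allowlist_keys) for item in value]
    return value
```

Since the allowlist is keyed by field name rather than by content, it is worth restricting it to fields your own services populate.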

What Shadow does NOT do (customer responsibility)

- Encryption at rest. .agentlog and .auditlog files are cleartext on disk. Use full-disk encryption, encrypted volumes, or a KMS-backed filesystem, per your requirements.
- Authentication. The dashboard middleware logs the caller named in the X-Shadow-Principal header, but it does not verify that identity. Terminate authN at your ingress (OAuth2, mTLS, header injection from an SSO proxy).
- Authorization. Per-resource RBAC is out of scope for v0.1. Callers with access to the .shadow/ dir can read anything under it.
- Retention / data-lifecycle. Shadow writes files; applying your organization's retention policy to them is out of scope.
- Key rotation. API keys Shadow may record are redacted before write; rotation of YOUR application's keys is your operational concern.

Evidence bundle for auditors

When handing evidence to a SOC 2 auditor, assemble:

- The docs/SOC2-READINESS.md file (this doc), as the scope statement.
- The output of the last run of python/tests/test_redaction_conformance.py.
- The output of python/tests/test_enterprise_audit.py (tamper detection).
- A sample .auditlog from a production period, plus the result of AuditLog(path).verify() showing (True, "").
- The deployment topology showing where encryption-at-rest, authN, and retention policies are enforced (customer-specific).