Framework adapters

Shadow's core instrumentation patches the Anthropic and OpenAI SDKs directly, which covers the majority of agents since those are the underlying LLM clients. Agents built on a framework layer above that client also get captured, but the framework's own structure (graph nodes, crew kickoffs, agent conversations) is invisible to direct SDK patching. The adapters in shadow.adapters.* hook each framework's native extension surface so that structure lands in the .agentlog too.

Three framework adapters ship in v1.4:

Framework                Module                      Install
LangGraph / LangChain    shadow.adapters.langgraph   pip install 'shadow-diff[langgraph]'
CrewAI                   shadow.adapters.crewai      pip install 'shadow-diff[crewai]'
AG2 (formerly AutoGen)   shadow.adapters.ag2         pip install 'shadow-diff[ag2]'

LangGraph / LangChain

The [langgraph] extra pulls in langchain-core, langgraph, and langchain-openai (the chat provider most LangGraph users pick). To run against Anthropic, Bedrock, or another provider, install the matching LangChain integration alongside it: pip install 'shadow-diff[langgraph]' langchain-anthropic, and so on. The adapter itself is provider-neutral and doesn't care which chat model you instantiate.

The adapter is an AsyncCallbackHandler subclass. Drop it into RunnableConfig.callbacks:

from langchain_core.messages import HumanMessage

from shadow.sdk import Session
from shadow.adapters.langgraph import ShadowLangChainHandler

with Session(output_path="trace.agentlog") as s:
    handler = ShadowLangChainHandler(s)
    # Inside an async function; sync invoke takes the same config (see below).
    result = await graph.ainvoke(
        {"messages": [HumanMessage("...")]},
        config={
            "callbacks": [handler],
            "configurable": {"thread_id": "t-42"},
        },
    )
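
Swapping providers is only a model swap inside the graph; the handler and config stay the same. A minimal sketch, assuming langchain-anthropic is installed (the model id is illustrative):

from langchain_anthropic import ChatAnthropic

# Bind an Anthropic chat model into the graph in place of the OpenAI default;
# the callbacks config above is untouched because callbacks are provider-neutral.
llm = ChatAnthropic(model="claude-3-5-sonnet-latest")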

Hooks:

  • on_chat_model_start / on_llm_end / on_llm_error produce the chat_request and chat_response pair
  • on_tool_start / on_tool_end / on_tool_error produce tool_call / tool_result records

The handler pair-buffers by LangChain's run_id so concurrent graph branches (which LangGraph can spawn for fan-outs) never cross-contaminate. The thread_id from the config's configurable block carries through as the session boundary, so one graph invocation is one session even across tool loops.
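
A minimal sketch of that pairing strategy, illustrative only rather than Shadow's actual internals: each start callback parks the request under its run_id, and the matching end or error callback pops it.

from uuid import UUID

class PairBuffer:
    """Pairs start/end callbacks by run_id so concurrent branches never interleave."""

    def __init__(self) -> None:
        self._pending: dict[UUID, dict] = {}

    def on_start(self, run_id: UUID, request: dict) -> None:
        self._pending[run_id] = request

    def on_end(self, run_id: UUID, response: dict) -> tuple[dict, dict]:
        # pop() keeps the buffer bounded and makes each pairing exactly-once
        return self._pending.pop(run_id), response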

The handler works under both sync invoke and async ainvoke. Subclassing AsyncCallbackHandler avoids the known sync-on-async race in which LangChain dispatches sync callbacks through loop.run_in_executor.
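
The same config shape works for the sync path; handler is the instance from the example above:

result = graph.invoke(
    {"messages": [HumanMessage("...")]},
    config={
        "callbacks": [handler],
        "configurable": {"thread_id": "t-42"},
    },
)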

CrewAI

The adapter is a BaseEventListener subclass wired to CrewAI's crewai_event_bus. Instantiate it inside your Session:

from shadow.sdk import Session
from shadow.adapters.crewai import ShadowCrewAIListener

with Session(output_path="trace.agentlog") as s:
    ShadowCrewAIListener(s)  # instantiating registers the listener on the event bus
    result = crew.kickoff(inputs={"topic": "..."})

Wired events:

  • LLMCallStartedEvent / LLMCallCompletedEvent / LLMCallFailedEvent produce the chat pair
  • ToolUsageStartedEvent / ToolUsageFinishedEvent / ToolUsageErrorEvent produce the tool pair
  • CrewKickoffStartedEvent writes an authoritative metadata marker so Shadow's session detector treats one kickoff as one session, even though every LLMCallCompleted ends with end_turn

The call_id field on the LLM events is the pairing key for the chat pair, so concurrent crews never mix responses.
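
For reference, a rough sketch of that wiring in CrewAI's BaseEventListener pattern; the handler bodies are elided, and import paths can vary across CrewAI versions:

from crewai.utilities.events.base_event_listener import BaseEventListener
from crewai.utilities.events import LLMCallStartedEvent, LLMCallCompletedEvent

class SketchListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(LLMCallStartedEvent)
        def on_llm_start(source, event):
            ...  # buffer the chat_request under event.call_id

        @crewai_event_bus.on(LLMCallCompletedEvent)
        def on_llm_end(source, event):
            ...  # pop by event.call_id and emit the chat_response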

If you're driving the event bus with synthetic events in tests, pass quiet_internal_listeners=True to silence CrewAI's built-in telemetry handlers that expect real Crew objects:

ShadowCrewAIListener(s, quiet_internal_listeners=True)

AG2

AG2's ConversableAgent.register_hook is the canonical extension surface. The adapter wraps it and captures every LLM call routed through the registered agents:

from autogen.agentchat import ConversableAgent
from shadow.sdk import Session
from shadow.adapters.ag2 import ShadowAG2Adapter

planner = ConversableAgent(name="planner", llm_config={...}, human_input_mode="NEVER")
executor = ConversableAgent(name="executor", llm_config={...}, human_input_mode="NEVER")

with Session(output_path="trace.agentlog") as s:
    adapter = ShadowAG2Adapter(s)
    adapter.install_all([planner, executor])
    planner.initiate_chat(executor, message="plan the migration")

Hooks (see the sketch after this list):

  • safeguard_llm_inputs captures the full messages list going into each LLM call
  • safeguard_llm_outputs captures the response
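
A minimal sketch of what that registration amounts to per agent, assuming AG2's register_hook(hookable_method, hook) signature; the capture methods are illustrative names, not the adapter's real ones:

# Roughly what install() performs for a single agent.
def install(self, agent: ConversableAgent) -> None:
    agent.register_hook("safeguard_llm_inputs", self._capture_request)
    agent.register_hook("safeguard_llm_outputs", self._capture_response)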

AG2 also ships an OTel exporter as of Feb 2026 (autogen.opentelemetry), which emits GenAI-compliant spans but redacts message bodies by default. The adapter captures the bodies the exporter won't give you. If you want both (timing metrics plus content), run the OTel exporter alongside Shadow and use shadow import --format otel on the exported file; the v1.4 importer reads whatever the exporter emits.

Hook registration is per-instance: a GroupChat of five agents needs five install() calls, or one install_all([...]).
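
For a GroupChat the member list is already at hand (group_chat is assumed to be an autogen GroupChat instance):

# GroupChat keeps its members on .agents; one install_all covers them all.
adapter.install_all(group_chat.agents)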

Session grouping and policy evaluation

All three adapters feed through the same Session.record_chat / record_tool_call / record_tool_result contract as direct SDK instrumentation, so every downstream feature works identically:

  • shadow diff produces the nine-axis behavioral diff
  • shadow check-policy with scope: session evaluates each framework's natural session boundary correctly
  • shadow mine clusters turn-pairs by tool sequence and selects representative cases
  • shadow mcp-serve exposes the captured traces to any MCP-aware client

If your trace contains multiple sessions (many kickoffs, many threads, many conversations) and you want a per-session breakdown, session-scoped policy rules work out of the box for LangGraph (via thread_id) and CrewAI (via kickoff markers). For AG2, Shadow infers session boundaries from the prior response's stop_reason, so one initiate_chat cycle is one session.