LangGraph + Claude Agent SDK: The Ultimate Guide to Multi-Agent Systems in 2026
The agent era isn't coming. It's here. And if you're building serious agentic systems in 2026, two tools belong in your arsenal: LangGraph and Anthropic's Claude Agent SDK (which Anthropic just renamed from the Claude Code SDK — we'll get into that). This isn't a "hello world" primer. This is the guide I wish existed when I started going deep on these.
Let's go.
The Multi-Agent Landscape
Before we dive into the tools, let's zoom out. The market right now looks something like this:
| Framework | Philosophy | Best For |
|---|---|---|
| LangGraph | Graph-based, low-level, model-agnostic | Complex workflows, durable execution, fine-grained control |
| Claude Agent SDK | Agent loop in a box, Claude-native | Code tasks, file ops, rapid multi-agent builds |
| AutoGen (Microsoft) | Conversation-driven, role-based agents | Research automation, conversational teams |
| CrewAI | Role-playing agent crews | Business process automation with natural-language roles |
| Haystack | Pipeline-based, retrieval-focused | RAG, search, document intelligence |
| Semantic Kernel | Microsoft-backed, .NET/Python | Enterprise teams already in Azure |
The leaders are LangGraph and the Claude Agent SDK — one for control freaks (affectionate), one for velocity.
Part 1: LangGraph
What It Is
LangGraph is a low-level orchestration framework for stateful, long-running agents. It's built by the LangChain team but is fully independent — you don't need LangChain to use it. Companies like Klarna, Replit, and Elastic run it in production.
The mental model: your agent is a directed graph. Nodes are functions (think: LLM calls, tool invocations, conditional checks). Edges define flow. State is typed and persisted across the entire execution.
This is fundamentally different from a simple chain or loop. You get:
- Branching — conditional logic based on state
- Cycles — agents that loop back and retry
- Checkpointing — resume from exactly where you left off after a crash
- Human-in-the-loop — pause at any node, get a human decision, continue
Core Concepts
StateGraph
Everything starts with StateGraph. You define a typed state schema, add nodes, connect them with edges.
```python
from langgraph.graph import StateGraph, MessagesState, START, END

def agent_node(state: MessagesState):
    # call your LLM here
    return {"messages": [{"role": "ai", "content": "..."}]}

def tool_node(state: MessagesState):
    # execute tools based on the last message
    ...

def should_use_tools(state: MessagesState):
    # route to "tools" if the last AI message requested a tool call
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else "end"

graph = StateGraph(MessagesState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)

graph.add_edge(START, "agent")
# conditional routing: if the agent called a tool, go to tools; else end
graph.add_conditional_edges("agent", should_use_tools, {
    "tools": "tools",
    "end": END,
})
graph.add_edge("tools", "agent")  # tools feed back into the agent

app = graph.compile()
```
That loop — agent → tools → agent — is the ReAct pattern. LangGraph makes it explicit and controllable.
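To see what that cycle buys you, here's the same loop written as plain Python with a stubbed "LLM" — purely illustrative, no LangGraph involved (`fake_llm` and the `add` tool are made-up stand-ins):

```python
# The agent -> tools -> agent loop as plain Python, with a stubbed "LLM".
def fake_llm(messages):
    # pretend the model requests a tool once, then answers
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "ai", "tool_call": ("add", (2, 3))}
    return {"role": "ai", "content": "the answer is 5"}

tools = {"add": lambda a, b: a + b}

messages = [{"role": "user", "content": "what is 2 + 3?"}]
while True:
    reply = fake_llm(messages)                   # "agent" node
    messages.append(reply)
    call = reply.get("tool_call")
    if call is None:
        break                                    # conditional edge -> END
    name, args = call
    result = tools[name](*args)                  # "tools" node
    messages.append({"role": "tool", "content": str(result)})  # edge back to agent
```

The point of the graph version: this control flow becomes explicit, inspectable, and checkpointable instead of being buried in a `while` loop.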
Custom State
MessagesState is the default. But you can define any typed state:
```python
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    plan: str
    iterations: int
    approved: bool
```
State is updated by merge, not mutation — each node returns a partial update, and LangGraph merges it into the shared state using the reducer attached to each key (like `add_messages` above). This makes concurrent subgraph execution safe.
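A toy illustration of those merge semantics (this is not LangGraph's actual implementation — just a sketch of the idea): keys with a reducer accumulate; keys without one are overwritten.

```python
import operator

# Toy version of reducer-based state merging (illustration only).
def merge_update(state: dict, update: dict, reducers: dict) -> dict:
    merged = dict(state)  # never mutate the input state
    for key, value in update.items():
        if key in reducers and key in state:
            merged[key] = reducers[key](state[key], value)  # e.g. list concat
        else:
            merged[key] = value                             # plain overwrite
    return merged

reducers = {"messages": operator.add}  # like Annotated[list, add_messages]
state = {"messages": [{"role": "user", "content": "hi"}], "plan": "draft"}
update = {"messages": [{"role": "ai", "content": "hello"}], "plan": "final"}
new_state = merge_update(state, update, reducers)
# messages now holds both entries; plan is simply replaced
```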
Durable Execution
LangGraph's killer feature for production: checkpointing.
```python
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()  # or SqliteSaver, PostgresSaver
app = graph.compile(checkpointer=checkpointer)

# each run gets a thread_id — resume with the same id after a failure
config = {"configurable": {"thread_id": "run-42"}}
result = await app.ainvoke({"messages": [...]}, config=config)
```
If your agent crashes mid-execution? It resumes from the last checkpoint on retry. For long-running tasks (minutes, hours), this is non-negotiable.
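The mechanics are easy to picture with a toy checkpointer (illustration only — in practice you'd use `MemorySaver` and friends, as above):

```python
# Toy checkpoint-and-resume: crash on the first run, resume on retry.
calls = []                    # records which steps actually executed
should_fail = {"flag": True}  # simulates a transient failure

def step_a(state):
    calls.append("a")
    return {**state, "a_done": True}

def step_b(state):
    calls.append("b")
    if should_fail["flag"]:
        raise RuntimeError("crash mid-run")
    return {**state, "b_done": True}

class ToyCheckpointer:
    def __init__(self):
        self.store = {}  # thread_id -> (next step index, state snapshot)

    def save(self, thread_id, next_step, state):
        self.store[thread_id] = (next_step, dict(state))

    def load(self, thread_id, initial):
        return self.store.get(thread_id, (0, dict(initial)))

def run(steps, thread_id, cp, initial):
    start, state = cp.load(thread_id, initial)  # resume from the last checkpoint
    for i in range(start, len(steps)):
        state = steps[i](state)
        cp.save(thread_id, i + 1, state)        # persist after every node
    return state

cp = ToyCheckpointer()
try:
    run([step_a, step_b], "run-42", cp, {})
except RuntimeError:
    pass                                          # the "crash"
should_fail["flag"] = False
result = run([step_a, step_b], "run-42", cp, {})  # retry, same thread_id
# step_a is NOT re-executed: calls == ["a", "b", "b"]
```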
Human-in-the-Loop
Pause execution at any node. Wait for human input. Continue.
```python
from langgraph.types import interrupt  # note: interrupt lives in langgraph.types

def review_node(state: AgentState):
    # this pauses execution and surfaces the payload to a human
    decision = interrupt({"plan": state["plan"], "action": "approve?"})
    return {"approved": decision["approved"]}
```
This maps directly to real-world workflows: code review approval, compliance check, content moderation. Not a hack — built into the framework at the runtime level.
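A useful mental model for `interrupt()` is a generator: execution suspends at a yield point, the payload goes to a human, and the human's answer flows back in when execution resumes. This is only an analogy — LangGraph resumes via `Command(resume=...)` on the checkpointed thread, not via Python generators:

```python
# Toy model of interrupt/resume using a generator (analogy only).
def review_flow(plan):
    # `yield` plays the role of interrupt(): pause and surface a payload
    decision = yield {"plan": plan, "action": "approve?"}
    return {"approved": decision["approved"]}

flow = review_flow("ship it")
payload = next(flow)                  # execution pauses; payload goes to a human
try:
    flow.send({"approved": True})     # the human's answer resumes execution
except StopIteration as done:
    result = done.value               # the node's final return value
```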
Multi-Agent Patterns in LangGraph
LangGraph supports three main multi-agent topologies:
1. Supervisor Pattern — a central supervisor routes tasks to specialized subagents:

```python
def supervisor(state):
    # decide which agent handles the next task
    next_agent = llm.invoke(routing_prompt + str(state))
    return {"next": next_agent}

builder = StateGraph(State)
builder.add_node("supervisor", supervisor)
builder.add_node("coder", coding_agent)
builder.add_node("researcher", research_agent)
builder.add_conditional_edges("supervisor", lambda s: s["next"],
    {"coder": "coder", "researcher": "researcher", "FINISH": END})
```
2. Network Pattern — agents communicate peer-to-peer; each agent decides where to route next.
3. Hierarchical — trees of agents: supervisors managing sub-supervisors managing workers. LangGraph handles the recursion.
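The network pattern is worth a sketch. In LangGraph the real mechanism is a node returning `Command(goto=..., update=...)`; conceptually, each agent updates state and names the next hop. Here's a toy version with hypothetical `writer`/`critic` agents, no LangGraph involved:

```python
# Toy network pattern: each agent returns (updated_state, next_agent).
def writer(state):
    state["rev"] = state.get("rev", 0) + 1
    state["draft"] = f"v{state['rev']}"
    return state, "critic"                      # writer always hands off to critic

def critic(state):
    state["notes"] = f"review of {state['draft']}"
    # critic decides: send back for another revision, or finish
    return state, ("writer" if state["rev"] < 2 else "END")

agents = {"writer": writer, "critic": critic}

def run_network(start, state):
    current = start
    while current != "END":
        state, current = agents[current](state)  # each agent picks the next hop
    return state

final = run_network("writer", {})
# two writer/critic round-trips, ending after "v2" is reviewed
```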
LangSmith Integration
LangGraph's observability story is via LangSmith. Set two env vars:
```shell
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY=your-key
```
Every graph execution is traced — node inputs/outputs, latency, token usage, state diffs. For debugging multi-agent systems, this is invaluable. You can't debug what you can't see.
When to Use LangGraph
- You need precise control over agent behavior at each step
- Your workflow has complex conditional logic (not just loops)
- You need durable execution (long-running, resumable)
- You're model-agnostic (OpenAI, Anthropic, Gemini, local models)
- You need to pause for humans mid-execution
- You're building a production system that needs to handle failures gracefully
Part 2: Claude Agent SDK
Wait — It Got Renamed
Hot off the presses: Anthropic just renamed the Claude Code SDK → Claude Agent SDK. If you're on old docs, you'll see migration notes. The packages:
```shell
# TypeScript
npm install @anthropic-ai/claude-agent-sdk

# Python
pip install claude-agent-sdk
```
The rebrand signals Anthropic's intent: this isn't just a tool for coding tasks. It's a general-purpose agent runtime.
What It Is
The Claude Agent SDK gives you the same agent loop, tools, and context management that power the Claude Code CLI — but programmable. You get an autonomous agent that can:
- Read and write files
- Run terminal commands
- Search the web
- Edit code with surgical precision
- Spawn subagents for parallel work
The core API is dead simple:
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Audit src/ for security vulnerabilities and write a report",
  options: { allowedTools: ["Read", "Grep", "Glob", "Write"] },
})) {
  if ("result" in message) console.log(message.result);
}
```
query() returns an async iterator that streams messages as Claude works. You get Claude's reasoning, tool calls, tool results, and the final output — all in one stream.
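The consumption pattern looks like this with a stand-in for `query()` — `fake_query` is a stub, and the message shapes here are hypothetical, not the SDK's real types:

```python
import asyncio

# Stand-in for query(): an async iterator that streams messages as the
# agent works. Message shapes are made up for illustration.
async def fake_query(prompt):
    yield {"type": "reasoning", "text": f"thinking about: {prompt}"}
    yield {"type": "tool_call", "name": "Read", "input": {"file_path": "src/app.ts"}}
    yield {"type": "tool_result", "output": "...file contents..."}
    yield {"result": "no vulnerabilities found"}

async def main():
    seen, final = [], None
    async for message in fake_query("audit src/"):
        seen.append(message)             # reasoning, tool calls, tool results...
        if "result" in message:
            final = message["result"]    # ...and the final answer, all one stream
    return seen, final

seen, final = asyncio.run(main())
```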
Built-in Tools
No glue code. No tool execution to implement yourself. These are ready:
| Tool | What It Does |
|---|---|
| `Read` | Read any file in the working dir |
| `Write` | Create new files |
| `Edit` | Make precise, surgical edits |
| `Bash` | Run terminal commands, scripts, git |
| `Glob` | Find files by pattern (`**/*.ts`) |
| `Grep` | Regex search across file contents |
| `WebSearch` | Live web search |
| `WebFetch` | Fetch and parse web pages |
| `AskUserQuestion` | Ask for clarification with multiple choice |
| `Task` | Invoke a subagent |
This is the same toolset powering Claude Code. Battle-tested, production-hardened.
Hooks — The Control Plane
Hooks let you intercept and modify agent behavior at key lifecycle points:
```python
from datetime import datetime
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

async def audit_file_changes(input_data, tool_use_id, context):
    file_path = input_data.get("tool_input", {}).get("file_path", "unknown")
    with open("./audit.log", "a") as f:
        f.write(f"{datetime.now()}: modified {file_path}\n")
    return {}

async def block_dangerous_commands(input_data, tool_use_id, context):
    cmd = input_data.get("tool_input", {}).get("command", "")
    if "rm -rf" in cmd or "DROP TABLE" in cmd:
        return {"decision": "block", "reason": "dangerous command detected"}
    return {}

async for message in query(
    prompt="Refactor the auth module",
    options=ClaudeAgentOptions(
        permission_mode="acceptEdits",
        hooks={
            "PostToolUse": [HookMatcher(matcher="Edit|Write", hooks=[audit_file_changes])],
            "PreToolUse": [HookMatcher(matcher="Bash", hooks=[block_dangerous_commands])],
        },
    ),
):
    ...
```
Available hooks: PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, UserPromptSubmit. This is where you enforce safety policies, log for compliance, or inject dynamic context.
Subagents — The Big One
This is where the Claude Agent SDK gets serious. You can define named subagents with specialized prompts and tool restrictions, and Claude orchestrates delegation automatically:
```python
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

async for message in query(
    prompt="Review the auth module for security issues and test coverage",
    options=ClaudeAgentOptions(
        # the Task tool is required — it's how Claude invokes subagents
        allowed_tools=["Read", "Grep", "Glob", "Bash", "Task"],
        agents={
            "security-reviewer": AgentDefinition(
                description="Expert in security vulnerabilities. Use for auth, crypto, injection risks.",
                prompt="""You are a security specialist. Review code for:
- SQL injection, XSS, CSRF
- Insecure crypto (MD5, SHA1 for passwords)
- Hardcoded secrets
- Auth bypass patterns
Be specific: file name, line number, severity.""",
                tools=["Read", "Grep", "Glob"],  # read-only — can't accidentally modify
                model="sonnet",
            ),
            "test-runner": AgentDefinition(
                description="Runs test suites and analyzes coverage. Use for test execution.",
                prompt="You are a test specialist. Run tests, analyze output, report failures with context.",
                tools=["Bash", "Read", "Grep"],
            ),
        },
    ),
):
    ...
```
Claude reads the description fields and decides which subagent to delegate to. You can also request one explicitly: "Use the security-reviewer agent to audit auth.py."
Three ways to define subagents:
- Programmatic (above) — recommended for SDK apps
- Filesystem — markdown files in `.claude/agents/` (for Claude Code projects)
- Built-in — a general-purpose subagent always available via the `Task` tool
Permission Modes
```python
ClaudeAgentOptions(permission_mode="acceptEdits")        # auto-approve file writes
ClaudeAgentOptions(permission_mode="default")            # ask before each tool use
ClaudeAgentOptions(permission_mode="bypassPermissions")  # full auto (use with care)
```

For CI/CD pipelines, `acceptEdits` or `bypassPermissions` makes sense. For interactive sessions, `default` keeps you in the loop.
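That split can be encoded directly. A hypothetical helper (not part of the SDK) that picks a mode from the environment:

```python
import os

# Hypothetical helper: choose a permission mode based on where the agent runs.
def pick_permission_mode() -> str:
    if os.environ.get("CI") == "true":
        return "acceptEdits"   # unattended pipeline: auto-approve file edits
    return "default"           # interactive session: ask before each tool use
```

You'd pass the result as `ClaudeAgentOptions(permission_mode=pick_permission_mode())`.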
Cloud Provider Support
The SDK supports multiple backends:
```shell
# Amazon Bedrock
export CLAUDE_CODE_USE_BEDROCK=1

# Google Vertex AI
export CLAUDE_CODE_USE_VERTEX=1

# Microsoft Azure AI Foundry
export CLAUDE_CODE_USE_FOUNDRY=1
```
Enterprise teams already on AWS/GCP/Azure can use the SDK without hitting Anthropic's API directly. Data stays in your cloud.
When to Use the Claude Agent SDK
- Your tasks are code-centric (read, edit, run, test)
- You want a batteries-included agent with no tool implementation overhead
- You need fast iteration (one `query()` call and you're running)
- You want subagent delegation without building your own orchestration layer
- You're all-in on Claude (not model-agnostic)
- You're building CI/CD integrations, code review bots, dev tooling
Part 3: Using Them Together
Here's the move nobody talks about: LangGraph for the workflow skeleton, Claude Agent SDK for the heavy lifting inside nodes.
```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import interrupt
from claude_agent_sdk import query, ClaudeAgentOptions

class ReviewState(TypedDict):
    pr_url: str
    security_findings: list
    test_results: str
    approved: bool

async def security_audit_node(state: ReviewState) -> dict:
    findings = []
    async for msg in query(
        prompt=f"Security audit the changes in PR: {state['pr_url']}",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Grep", "Glob", "WebFetch"],
        ),
    ):
        if hasattr(msg, "result"):
            findings.append(msg.result)
    return {"security_findings": findings}

async def test_runner_node(state: ReviewState) -> dict:
    output = ""
    async for msg in query(
        prompt="Run the test suite and report coverage",
        options=ClaudeAgentOptions(
            allowed_tools=["Bash", "Read"],
            permission_mode="bypassPermissions",
        ),
    ):
        if hasattr(msg, "result"):
            output = msg.result
    return {"test_results": output}

async def human_review_node(state: ReviewState) -> dict:
    # interrupt here: a human sees the findings and answers
    decision = interrupt({
        "findings": state["security_findings"],
        "tests": state["test_results"],
    })
    return {"approved": decision["approved"]}

# LangGraph orchestrates, Claude Agent SDK executes
builder = StateGraph(ReviewState)
builder.add_node("security_audit", security_audit_node)
builder.add_node("test_runner", test_runner_node)
builder.add_node("human_review", human_review_node)

# run security + tests in parallel, then human review
builder.add_edge(START, "security_audit")
builder.add_edge(START, "test_runner")
builder.add_edge("security_audit", "human_review")
builder.add_edge("test_runner", "human_review")
builder.add_edge("human_review", END)

pipeline = builder.compile(checkpointer=MemorySaver())
```
LangGraph gives you: parallelism, checkpointing, human approval gates, conditional routing. Claude Agent SDK gives you: autonomous code execution inside each node with zero boilerplate.
Lethal combination.
Framework Decision Matrix
```
Do you need multi-LLM support?
├── Yes → LangGraph (model-agnostic)
└── No → Either works

Are tasks primarily code/file operations?
├── Yes → Claude Agent SDK (batteries included)
└── No → LangGraph (build your own tools)

Need durable execution / checkpointing?
├── Yes → LangGraph (first-class checkpoint support)
└── No → Either works

Need human-in-the-loop pauses?
├── Yes → LangGraph (interrupt() built in)
└── No → Either works

Need to ship in < 1 day?
├── Yes → Claude Agent SDK (one query() call and you're running)
└── No → LangGraph (invest in the graph)

Building a production workflow with complex branching?
└── LangGraph + Claude Agent SDK nodes

Building a dev tool / CI integration?
└── Claude Agent SDK, probably with subagents
```
Quick Start Recipes
Claude Agent SDK — Code Review Bot in 20 lines
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";
import { execSync } from "child_process";

const diff = execSync("git diff main HEAD").toString();

for await (const message of query({
  prompt: `Review this PR diff for bugs, security issues, and style problems:\n\n${diff}`,
  options: {
    allowedTools: ["Read", "Glob", "Grep"],
    systemPrompt: "You are a senior engineer. Be specific, be harsh, be helpful.",
  },
})) {
  if ("result" in message) console.log(message.result);
}
```
LangGraph — Research Agent with Human Approval
```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import interrupt

class State(TypedDict):
    findings: str
    approved: bool

def research_node(state: State):
    # run your LLM + web search tools here
    ...

def approval_node(state: State):
    # resume value (e.g. True/False) comes from Command(resume=...)
    decision = interrupt({"findings": state["findings"]})
    return {"approved": decision}

def publish_node(state: State):
    if state["approved"]:
        # push to CMS, Slack, wherever
        ...

graph = StateGraph(State)
graph.add_node("research", research_node)
graph.add_node("approve", approval_node)
graph.add_node("publish", publish_node)

graph.add_edge(START, "research")
graph.add_edge("research", "approve")
graph.add_conditional_edges("approve",
    lambda s: "publish" if s["approved"] else END)
graph.add_edge("publish", END)

app = graph.compile(checkpointer=MemorySaver())
```
TL;DR
- LangGraph = control plane for complex agentic workflows. Graph-based, durable, model-agnostic. Reach for it when you need to orchestrate across multiple agents, require human-in-the-loop, or need production-grade resilience.
- Claude Agent SDK (formerly the Claude Code SDK — just renamed) = execution engine for file/code/shell tasks. One function call. Rich built-in toolset. Subagents out of the box. Reach for it when you're building dev tooling or want a Claude-powered autonomous worker.
- Use both when you want LangGraph's orchestration guarantees plus Claude's execution power inside each node.
The multi-agent era demands frameworks, not just models. Learn the graph.