LangGraph: Build Stateful Multi-Agent Systems That Don't Crash
You've built an agent with while loops. It worked... until it didn't. The server restarted and your agent forgot everything. A long-running task timed out and you had to start over. Your "multi-agent system" is actually just three Promise.all() calls duct-taped together.
That's where LangGraph comes in.
LangGraph isn't a wrapper around an LLM. It's a stateful orchestration framework for production-grade agent systems. Built by the LangChain team but fully independent, it's what companies like Klarna, Replit, and Elastic use when they need agents that survive crashes, pause for humans, and orchestrate complex workflows across multiple specialized agents.
This guide goes deep. We'll cover the core concepts, then build a real-time multi-agent chat system with persistent state, human approval gates, and a React frontend.
Why LangGraph Exists
The agent landscape is crowded. Most "agent frameworks" are just LLM clients with tool calling. LangGraph is different:
| Feature | Simple Agents | LangGraph |
|---|---|---|
| State | In-memory only | Persistent, typed, checkpointed |
| Flow | Linear chains | Directed graphs with cycles |
| Recovery | Start over on crash | Resume from last checkpoint |
| Human Input | External polling | Built-in interrupt() |
| Multi-agent | Manual coordination | First-class subgraph support |
LangGraph treats your agent as a state machine. Nodes are functions. Edges are transitions. State is immutable and checkpointed after every step. This isn't academic; it's the difference between a prototype and production.
Core Concepts
StateGraph: The Foundation
Everything in LangGraph starts with StateGraph. You define a typed state schema, add nodes (functions), and connect them with edges.
import { StateGraph, START, END } from "@langchain/langgraph";
import { BaseMessage } from "@langchain/core/messages";
// Define your state shape
interface AgentState {
messages: BaseMessage[];
iterationCount: number;
approved: boolean;
}
// Nodes are just functions that receive state and return updates.
// The messages channel concatenates (see the reducer below), so nodes
// return only their NEW messages; returning the whole list would duplicate it.
async function agentNode(state: AgentState): Promise<Partial<AgentState>> {
  const response = await llm.invoke(state.messages);
  return {
    messages: [response],
    iterationCount: state.iterationCount + 1,
  };
}
async function toolNode(state: AgentState): Promise<Partial<AgentState>> {
  const toolResults = await executeTools(state.messages);
  return { messages: toolResults };
}
// Build the graph
const graph = new StateGraph<AgentState>({
channels: {
messages: { value: (x, y) => x.concat(y), default: () => [] },
iterationCount: { value: (x, y) => y ?? x, default: () => 0 },
approved: { value: (x, y) => y ?? x, default: () => false },
},
});
graph.addNode("agent", agentNode);
graph.addNode("tools", toolNode);
// Conditional routing: did the agent call a tool?
function shouldContinue(state: AgentState): "tools" | "end" {
const lastMessage = state.messages[state.messages.length - 1];
if (lastMessage.additional_kwargs?.tool_calls) {
return "tools";
}
return "end";
}
graph.addConditionalEdges("agent", shouldContinue, {
tools: "tools",
end: END,
});
graph.addEdge("tools", "agent"); // Loop back
graph.addEdge(START, "agent");
const app = graph.compile();
That loop (agent → tools → agent) is the ReAct pattern. But now it's explicit, typed, and checkpointed.
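The channel reducers in that graph decide how each node's return value merges into state: messages concatenate, scalars overwrite. A minimal plain-TypeScript sketch of that merge logic (illustrative names, not the library's internals):

```typescript
type State = { messages: string[]; iterationCount: number };

// Mirrors the channels config: messages concatenate, scalars overwrite
const reducers = {
  messages: (x: string[], y: string[]) => x.concat(y),
  iterationCount: (x: number, y: number) => y ?? x,
};

function applyUpdate(state: State, update: Partial<State>): State {
  return {
    messages: update.messages !== undefined
      ? reducers.messages(state.messages, update.messages)
      : state.messages,
    iterationCount: update.iterationCount !== undefined
      ? reducers.iterationCount(state.iterationCount, update.iterationCount)
      : state.iterationCount,
  };
}

const s0: State = { messages: ["hi"], iterationCount: 0 };
// A node returns only its NEW messages; the reducer appends them
const s1 = applyUpdate(s0, { messages: ["hello!"], iterationCount: 1 });
// s1.messages is ["hi", "hello!"] and s1.iterationCount is 1
```

This is also why nodes should return only new messages: with a concat reducer, returning the full history would duplicate it on every step.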
Checkpointing: The Killer Feature
LangGraph's MemorySaver, SqliteSaver, and PostgresSaver persist state after every node. Crash recovery is automatic.
import { MemorySaver } from "@langchain/langgraph";
const checkpointer = new MemorySaver();
const app = graph.compile({ checkpointer });
// Each run gets a thread_id; resume with the same id after a failure
const config = { configurable: { thread_id: "conversation-123" } };
const result = await app.invoke(
{ messages: [new HumanMessage("Hello")] },
config
);
// Server crashes here? No problem.
// On restart, invoke with same thread_id resumes from last checkpoint
const resumed = await app.invoke(
{ messages: [new HumanMessage("As I was saying...")] },
config
);
This is non-negotiable for long-running tasks. Your agent can run for hours, survive deploys, and resume exactly where it left off.
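Conceptually, a checkpointer is just an append-only store of state snapshots keyed by thread_id; resuming reads the latest one. A toy in-memory version (a sketch of the idea, not the library's implementation):

```typescript
// Toy checkpointer: save a snapshot after every step, keyed by thread_id,
// so a "crashed" run can resume from the last saved state.
class TinyCheckpointer<S> {
  private store = new Map<string, S[]>();

  save(threadId: string, state: S): void {
    const history = this.store.get(threadId) ?? [];
    history.push(structuredClone(state)); // snapshot, not a live reference
    this.store.set(threadId, history);
  }

  latest(threadId: string): S | undefined {
    const history = this.store.get(threadId);
    return history?.[history.length - 1];
  }
}

const cp = new TinyCheckpointer<{ messages: string[] }>();
cp.save("conversation-123", { messages: ["Hello"] });
cp.save("conversation-123", { messages: ["Hello", "Hi there!"] });
// After a crash, pick up from the most recent snapshot
const resumedState = cp.latest("conversation-123");
// resumedState is { messages: ["Hello", "Hi there!"] }
```

The real savers additionally version each state channel and track pending writes; this toy only captures the resume-from-last-snapshot idea.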
Human-in-the-Loop
Pause execution at any node. Wait for human input. Continue.
import { interrupt } from "@langchain/langgraph";
async function reviewNode(state: AgentState): Promise<Partial<AgentState>> {
// This pauses execution and surfaces state to your UI
const decision = await interrupt({
message: "Please review the agent's plan",
plan: state.messages[state.messages.length - 1].content,
actions: ["approve", "reject", "modify"],
});
return { approved: decision.action === "approve" };
}
// Wire the review node into the graph (before compiling):
graph.addNode("review", reviewNode);
graph.addEdge("agent", "review");
// In your API route or WebSocket handler: when the interrupt fires,
// store the thread_id and the interrupt payload. Resume later by
// passing the human's decision back in via Command:
import { Command } from "@langchain/langgraph";
await app.invoke(
  new Command({ resume: { action: "approve" } }),
  { configurable: { thread_id: "conversation-123" } }
);
This maps to real workflows: content moderation, code review, compliance checks. Not a polling hack; it's built into the runtime.
Multi-Agent Patterns
LangGraph supports three topologies:
1. Supervisor Pattern
A central supervisor routes tasks to specialized workers:
interface SupervisorState {
messages: BaseMessage[];
nextAgent: string | typeof END;
taskResults: Record<string, string>;
}
async function supervisor(state: SupervisorState): Promise<Partial<SupervisorState>> {
const routingPrompt = `Given this task, which agent should handle it?
- "coder" for code tasks
- "researcher" for information gathering
- "FINISH" if complete
Task: ${state.messages[state.messages.length - 1].content}`;
const response = await llm.invoke(routingPrompt);
return { nextAgent: response.content.toString().trim() };
}
const builder = new StateGraph<SupervisorState>({...});
builder.addNode("supervisor", supervisor);
builder.addNode("coder", codingAgent);
builder.addNode("researcher", researchAgent);
builder.addConditionalEdges("supervisor", (s) => s.nextAgent, {
coder: "coder",
researcher: "researcher",
FINISH: END,
});
// Workers report back to supervisor
builder.addEdge("coder", "supervisor");
builder.addEdge("researcher", "supervisor");
2. Network Pattern
Agents communicate peer-to-peer. Each decides where to route next.
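Stripped of the library, the network topology is just agents that return both their update and the peer to hand off to; a runner loops until someone says done. A self-contained sketch (agent names are illustrative):

```typescript
type NetState = { log: string[]; next: string };
type Agent = (s: NetState) => NetState;

// Each peer decides who runs next; there is no central supervisor
const agents: Record<string, Agent> = {
  planner: (s) => ({ log: [...s.log, "planner: drafted plan"], next: "executor" }),
  executor: (s) => ({ log: [...s.log, "executor: ran the plan"], next: "reviewer" }),
  reviewer: (s) => ({ log: [...s.log, "reviewer: looks good"], next: "END" }),
};

function runNetwork(start: string, state: NetState): NetState {
  let current = start;
  while (current !== "END") {
    state = agents[current](state);
    current = state.next;
  }
  return state;
}

const result = runNetwork("planner", { log: [], next: "planner" });
// result.log: three entries, one per peer, ending with the reviewer
```

In LangGraph you'd express the same thing with addConditionalEdges from every agent node to every other, keyed on the state's routing field.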
3. Hierarchical
Trees of agents: supervisors managing sub-supervisors managing workers. LangGraph handles recursion via subgraphs.
Building a Multi-Agent Chat System
Let's build something real: a chat system with two specialized agents (coder + researcher) that a supervisor orchestrates. The frontend is React with WebSocket streaming.
Architecture
┌───────────────┐     WebSocket      ┌──────────────────┐
│   React App   │ ◀────────────────▶ │  Express Server  │
└───────────────┘                    └────────┬─────────┘
                                              │
                                      ┌───────┴───────┐
                                      │  Supervisor   │
                                      └───────┬───────┘
                              ┌───────────────┼───────────────┐
                              ▼               ▼               ▼
                         ┌─────────┐   ┌────────────┐   ┌─────────┐
                         │  Coder  │   │ Researcher │   │  Human  │
                         └─────────┘   └────────────┘   └─────────┘
Backend: Express + LangGraph
// server.ts
import express from "express";
import { WebSocketServer } from "ws";
import { StateGraph, START, END, interrupt, Command } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, AIMessage, SystemMessage } from "@langchain/core/messages";
import { MemorySaver } from "@langchain/langgraph";
import { v4 as uuidv4 } from "uuid";
const llm = new ChatOpenAI({ model: "gpt-4", temperature: 0 });
// State definition
interface ChatState {
messages: (HumanMessage | AIMessage | SystemMessage)[];
nextAgent: "coder" | "researcher" | "human" | "supervisor" | typeof END;
streaming: boolean;
}
// Supervisor decides which agent handles the next message
async function supervisor(state: ChatState): Promise<Partial<ChatState>> {
const systemPrompt = `You are a supervisor. Route the user's request to the appropriate agent:
- "coder": for programming, debugging, code review
- "researcher": for facts, explanations, research
- "human": for approval on sensitive operations
- "END": when the task is complete
Respond with ONLY one word: coder, researcher, human, or END.`;
const routingMessages = [
new SystemMessage(systemPrompt),
...state.messages.slice(-3), // Last 3 messages for context
];
const response = await llm.invoke(routingMessages);
const decision = response.content.toString().trim().toLowerCase();
// Normalize so the value matches the routing map below (which keys on "END")
return { nextAgent: (decision === "end" ? "END" : decision) as ChatState["nextAgent"] };
}
// Coder agent with specialized system prompt
async function coderAgent(state: ChatState): Promise<Partial<ChatState>> {
const systemPrompt = `You are an expert programmer. Write clean, well-commented code.
Explain your reasoning. If you see bugs, point them out clearly.`;
const messages = [
new SystemMessage(systemPrompt),
...state.messages,
];
const response = await llm.invoke(messages);
// Return only the new message; the concat reducer appends it to state
return {
  messages: [new AIMessage({
    content: response.content,
    additional_kwargs: { agent: "coder" }
  })],
};
}
// Researcher agent
async function researcherAgent(state: ChatState): Promise<Partial<ChatState>> {
const systemPrompt = `You are a research assistant. Provide accurate, well-sourced information.
If you're uncertain, say so. Break complex topics into digestible explanations.`;
const messages = [
new SystemMessage(systemPrompt),
...state.messages,
];
const response = await llm.invoke(messages);
// Return only the new message; the concat reducer appends it to state
return {
  messages: [new AIMessage({
    content: response.content,
    additional_kwargs: { agent: "researcher" }
  })],
};
}
// Human approval node
async function humanApproval(state: ChatState): Promise<Partial<ChatState>> {
const lastMessage = state.messages[state.messages.length - 1];
const decision = await interrupt({
type: "approval_request",
message: "The agent wants to execute a potentially sensitive operation",
content: lastMessage.content,
options: ["approve", "reject", "modify"],
});
if (decision.action === "reject") {
// Return only the new message; the concat reducer appends it to state
return {
  messages: [new AIMessage({
    content: "Operation rejected by user."
  })],
  nextAgent: END,
};
}
return { nextAgent: "supervisor" };
}
// Build the graph
const graph = new StateGraph<ChatState>({
channels: {
messages: { value: (x, y) => x.concat(y), default: () => [] },
nextAgent: { value: (x, y) => y ?? x, default: () => "supervisor" },
streaming: { value: (x, y) => y ?? x, default: () => false },
},
});
graph.addNode("supervisor", supervisor);
graph.addNode("coder", coderAgent);
graph.addNode("researcher", researcherAgent);
graph.addNode("human", humanApproval);
// Routing from supervisor
graph.addConditionalEdges("supervisor", (s) => s.nextAgent, {
coder: "coder",
researcher: "researcher",
human: "human",
END: END,
});
// All workers loop back to supervisor
graph.addEdge("coder", "supervisor");
graph.addEdge("researcher", "supervisor");
graph.addEdge("human", "supervisor");
graph.addEdge(START, "supervisor");
const checkpointer = new MemorySaver();
const app = graph.compile({ checkpointer });
// WebSocket server
const wss = new WebSocketServer({ port: 3001 });
wss.on("connection", (ws) => {
const threadId = uuidv4();
ws.send(JSON.stringify({
type: "connected",
threadId,
}));
ws.on("message", async (data) => {
const { message, resume } = JSON.parse(data.toString());
try {
const config = { configurable: { thread_id: threadId } };
let input;
if (resume) {
  // Resuming from human approval: pass the decision back in via Command
  input = new Command({ resume });
} else {
// New message
input = { messages: [new HumanMessage(message)] };
}
// Stream the graph execution (updates mode: each chunk maps node → state delta)
const stream = await app.stream(input, { ...config, streamMode: "updates" });
for await (const chunk of stream) {
  // Interrupts (human approval) surface under the __interrupt__ key
  if (chunk.__interrupt__) {
    ws.send(JSON.stringify({
      type: "awaiting_approval",
      interrupt: chunk.__interrupt__,
    }));
    break; // Wait for the human's response
  }
  // Forward each node's state update to the client
  ws.send(JSON.stringify({
    type: "state_update",
    data: chunk,
  }));
}
} catch (error) {
ws.send(JSON.stringify({
type: "error",
error: error instanceof Error ? error.message : String(error),
}));
}
});
});
console.log("WebSocket server running on ws://localhost:3001");
Frontend: React + WebSocket
// App.tsx
import React, { useState, useEffect, useRef, useCallback } from 'react';
import './App.css';
interface Message {
id: string;
role: 'user' | 'assistant';
content: string;
agent?: 'coder' | 'researcher' | 'supervisor';
timestamp: Date;
}
interface ApprovalRequest {
type: string;
message: string;
content: string;
options: string[];
}
function App() {
const [messages, setMessages] = useState<Message[]>([]);
const [input, setInput] = useState('');
const [isConnected, setIsConnected] = useState(false);
const [isThinking, setIsThinking] = useState(false);
const [awaitingApproval, setAwaitingApproval] = useState<ApprovalRequest | null>(null);
const wsRef = useRef<WebSocket | null>(null);
const messagesEndRef = useRef<HTMLDivElement>(null);
useEffect(() => {
const ws = new WebSocket('ws://localhost:3001');
wsRef.current = ws;
ws.onopen = () => setIsConnected(true);
ws.onclose = () => setIsConnected(false);
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
switch (data.type) {
case 'connected':
console.log('Connected with thread:', data.threadId);
break;
case 'state_update':
handleStateUpdate(data);
break;
case 'awaiting_approval':
setAwaitingApproval(data.interrupt);
setIsThinking(false);
break;
case 'error':
console.error('Server error:', data.error);
setIsThinking(false);
break;
}
};
return () => ws.close();
}, []);
const handleStateUpdate = useCallback((data: any) => {
  // Each streamed chunk maps a node name to the state delta it returned
  const update = data.data ?? {};
  for (const [node, delta] of Object.entries<any>(update)) {
    if (!['coder', 'researcher'].includes(node)) continue;
    const newMessages = delta?.messages ?? [];
    const lastMessage = newMessages[newMessages.length - 1];
    if (!lastMessage) continue;
    // Messages arrive JSON-serialized; LangChain's toJSON nests fields under kwargs
    const content = lastMessage.kwargs?.content ?? lastMessage.content ?? '';
    const agent = lastMessage.kwargs?.additional_kwargs?.agent ?? node;
    setMessages(prev => [...prev, {
      id: `${node}-${Date.now()}`,
      role: 'assistant',
      content,
      agent,
      timestamp: new Date(),
    }]);
    setIsThinking(false);
  }
}, []);
const sendMessage = () => {
if (!input.trim() || !wsRef.current) return;
const userMessage: Message = {
id: Date.now().toString(),
role: 'user',
content: input,
timestamp: new Date(),
};
setMessages(prev => [...prev, userMessage]);
setInput('');
setIsThinking(true);
wsRef.current.send(JSON.stringify({ message: input }));
};
const handleApproval = (action: string, modification?: string) => {
if (!wsRef.current) return;
wsRef.current.send(JSON.stringify({
resume: { action, modification },
}));
setAwaitingApproval(null);
setIsThinking(true);
};
useEffect(() => {
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
}, [messages]);
const getAgentBadge = (agent?: string) => {
if (!agent) return null;
const colors: Record<string, string> = {
coder: '#3b82f6',
researcher: '#10b981',
supervisor: '#8b5cf6',
};
return (
<span
className="agent-badge"
style={{ backgroundColor: colors[agent] || '#6b7280' }}
>
{agent}
</span>
);
};
return (
<div className="chat-container">
<header className="chat-header">
<h1>Multi-Agent Chat</h1>
<div className={`connection-status ${isConnected ? 'connected' : 'disconnected'}`}>
{isConnected ? '● Connected' : '● Disconnected'}
</div>
</header>
<div className="messages-container">
{messages.map((msg) => (
<div key={msg.id} className={`message ${msg.role}`}>
<div className="message-header">
{msg.role === 'user' ? 'You' : 'Agent'}
{getAgentBadge(msg.agent)}
</div>
<div className="message-content">{msg.content}</div>
</div>
))}
{isThinking && (
<div className="thinking-indicator">
<span className="dot"></span>
<span className="dot"></span>
<span className="dot"></span>
</div>
)}
<div ref={messagesEndRef} />
</div>
{awaitingApproval && (
<div className="approval-modal">
<div className="approval-content">
<h3>⚠️ Approval Required</h3>
<p>{awaitingApproval.message}</p>
<div className="approval-preview">
{awaitingApproval.content.substring(0, 200)}...
</div>
<div className="approval-actions">
<button onClick={() => handleApproval('approve')} className="btn-approve">
Approve
</button>
<button onClick={() => handleApproval('reject')} className="btn-reject">
Reject
</button>
<button onClick={() => handleApproval('modify')} className="btn-modify">
Request Changes
</button>
</div>
</div>
</div>
)}
<div className="input-container">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
placeholder="Ask the agents something..."
disabled={isThinking || !!awaitingApproval}
/>
<button
onClick={sendMessage}
disabled={isThinking || !!awaitingApproval || !input.trim()}
>
Send
</button>
</div>
</div>
);
}
export default App;
/* App.css */
* {
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
margin: 0;
background: #0f0f0f;
color: #e0e0e0;
}
.chat-container {
max-width: 800px;
margin: 0 auto;
height: 100vh;
display: flex;
flex-direction: column;
}
.chat-header {
padding: 1rem 1.5rem;
border-bottom: 1px solid #333;
display: flex;
justify-content: space-between;
align-items: center;
}
.chat-header h1 {
margin: 0;
font-size: 1.25rem;
font-weight: 600;
}
.connection-status {
font-size: 0.875rem;
}
.connection-status.connected {
color: #10b981;
}
.connection-status.disconnected {
color: #ef4444;
}
.messages-container {
flex: 1;
overflow-y: auto;
padding: 1.5rem;
display: flex;
flex-direction: column;
gap: 1rem;
}
.message {
max-width: 80%;
padding: 1rem;
border-radius: 12px;
}
.message.user {
align-self: flex-end;
background: #3b82f6;
color: white;
}
.message.assistant {
align-self: flex-start;
background: #1f1f1f;
border: 1px solid #333;
}
.message-header {
font-size: 0.75rem;
font-weight: 600;
margin-bottom: 0.5rem;
display: flex;
align-items: center;
gap: 0.5rem;
opacity: 0.7;
}
.agent-badge {
padding: 2px 8px;
border-radius: 4px;
font-size: 0.625rem;
text-transform: uppercase;
letter-spacing: 0.05em;
}
.message-content {
line-height: 1.6;
white-space: pre-wrap;
}
.thinking-indicator {
align-self: flex-start;
display: flex;
gap: 4px;
padding: 1rem;
}
.thinking-indicator .dot {
width: 8px;
height: 8px;
background: #666;
border-radius: 50%;
animation: pulse 1.4s infinite;
}
.thinking-indicator .dot:nth-child(2) {
animation-delay: 0.2s;
}
.thinking-indicator .dot:nth-child(3) {
animation-delay: 0.4s;
}
@keyframes pulse {
0%, 100% { opacity: 0.3; }
50% { opacity: 1; }
}
.approval-modal {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.8);
display: flex;
align-items: center;
justify-content: center;
z-index: 100;
}
.approval-content {
background: #1f1f1f;
border: 1px solid #444;
border-radius: 12px;
padding: 1.5rem;
max-width: 500px;
width: 90%;
}
.approval-content h3 {
margin: 0 0 1rem;
}
.approval-preview {
background: #0f0f0f;
padding: 1rem;
border-radius: 8px;
font-family: monospace;
font-size: 0.875rem;
margin: 1rem 0;
max-height: 150px;
overflow-y: auto;
}
.approval-actions {
display: flex;
gap: 0.75rem;
}
.approval-actions button {
flex: 1;
padding: 0.75rem;
border: none;
border-radius: 8px;
cursor: pointer;
font-weight: 500;
transition: opacity 0.2s;
}
.approval-actions button:hover {
opacity: 0.9;
}
.btn-approve {
background: #10b981;
color: white;
}
.btn-reject {
background: #ef4444;
color: white;
}
.btn-modify {
background: #f59e0b;
color: white;
}
.input-container {
padding: 1rem 1.5rem;
border-top: 1px solid #333;
display: flex;
gap: 0.75rem;
}
.input-container input {
flex: 1;
padding: 0.75rem 1rem;
border: 1px solid #444;
border-radius: 8px;
background: #1f1f1f;
color: inherit;
font-size: 1rem;
}
.input-container input:focus {
outline: none;
border-color: #3b82f6;
}
.input-container button {
padding: 0.75rem 1.5rem;
border: none;
border-radius: 8px;
background: #3b82f6;
color: white;
font-weight: 500;
cursor: pointer;
}
.input-container button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
Running It
# Install dependencies
npm install @langchain/langgraph @langchain/openai @langchain/core express ws uuid
npm install -D @types/ws @types/uuid @types/express
# Set your API key
export OPENAI_API_KEY=your-key
# Start the server
npx ts-node server.ts
# In another terminal, start the React app
cd frontend && npm run dev
Advanced Patterns
Streaming with LangGraph
For real-time UIs, use streamEvents instead of invoke:
const eventStream = app.streamEvents(
{ messages: [new HumanMessage("Hello")] },
{ version: "v2", configurable: { thread_id: "123" } }
);
for await (const event of eventStream) {
// event.event: "on_llm_stream", "on_chain_start", etc.
// event.data.chunk: streaming token
ws.send(JSON.stringify(event));
}
This streams LLM tokens as they're generated, not just final responses.
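In practice you'll want to forward only the token chunks to the client, not every lifecycle event. A small filter over the v2 event shape (this assumes tokens arrive as on_chat_model_stream events; verify against your langgraph version):

```typescript
interface StreamEvent {
  event: string;
  data: { chunk?: { content?: string } };
}

// Keep only LLM token chunks; drop chain start/end noise
function extractTokens(events: StreamEvent[]): string[] {
  return events
    .filter((e) => e.event === "on_chat_model_stream" && !!e.data.chunk?.content)
    .map((e) => e.data.chunk!.content!);
}

const sample: StreamEvent[] = [
  { event: "on_chain_start", data: {} },
  { event: "on_chat_model_stream", data: { chunk: { content: "Hel" } } },
  { event: "on_chat_model_stream", data: { chunk: { content: "lo" } } },
  { event: "on_chain_end", data: {} },
];
const text = extractTokens(sample).join("");
// text is "Hello"
```

The same filter drops into the WebSocket loop: send each extracted token as its own frame and the UI can render responses as they're generated.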
Subgraphs for Complex Workflows
Break complex agents into subgraphs:
// coderSubgraph.ts
const coderGraph = new StateGraph<CoderState>({...})
.addNode("plan", planningNode)
.addNode("code", codingNode)
.addNode("test", testingNode)
.addEdge(START, "plan")
.addEdge("plan", "code")
.addEdge("code", "test")
.addConditionalEdges("test", shouldFixBugs, { fix: "code", done: END });
// mainGraph.ts
const mainGraph = new StateGraph<MainState>({...})
.addNode("supervisor", supervisor)
.addNode("coder_team", coderGraph.compile()) // subgraph!
.addNode("researcher", researcher)
.addEdge(START, "supervisor");
Each subgraph has its own state schema and checkpointing.
Persistence with Postgres
For production, swap MemorySaver for PostgresSaver:
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";
const checkpointer = PostgresSaver.fromConnString(
"postgresql://user:pass@localhost/dbname"
);
await checkpointer.setup(); // creates the checkpoint tables on first run
const app = graph.compile({ checkpointer });
Now state survives server restarts and scales across multiple instances.
When to Use LangGraph
| You Need | Use LangGraph |
|---|---|
| Complex conditional flows | ✅ Graph-based routing |
| Crash recovery | ✅ Checkpointing |
| Human approval gates | ✅ Built-in interrupts |
| Multi-agent orchestration | ✅ Subgraphs + supervisor |
| Long-running tasks | ✅ Persistence |
| Model flexibility | ✅ OpenAI, Anthropic, Gemini, local |
| Simple Q&A chatbot | ❌ Overkill; use a direct LLM call |
| One-shot code generation | ❌ Claude Agent SDK is faster |
The Bottom Line
LangGraph isn't the fastest way to build an agent. It's the most robust way.
When you're prototyping, use whatever gets you there fastest. When you're building production systems that handle real user data, survive crashes, and orchestrate across multiple specialized agents, that's when LangGraph shines.
The graph mental model forces you to think about your agent's state and flow explicitly. That's a feature, not a bug. It catches edge cases before they become outages.
Start with StateGraph. Add checkpointing. Build your supervisor. Then watch your agents handle failures, pause for humans, and resume exactly where they left off.
That's production-grade agent infrastructure.
Resources:
Install: npm install @langchain/langgraph