Every agent in the corpus runs the same five-step skeleton: gather context, call the model, parse the response, run any tools, fold the result back in. The interesting question isn't what the loop does; it's the container you put it in. Four containers are alive in the wild: a generator, a while-loop, an event log, a graph. Each makes one thing easy and one thing hard.
- Generator — best when a human watches.
- While-loop — best when you want the simplest thing that works.
- Event log — best when you need replay or audit.
- Graph — best when steps run in parallel.
## Agent loop
Picture an engineer with a debugger open and a long task list. They look at the screen, decide what to do next, do it, look again, repeat. An LLM agent is the same control flow with a model in the middle. That’s it. Everything else — streaming, multi-agent, memory compression, recovery — is built around this skeleton.
```mermaid
flowchart LR
  S[Context / memory] -->|build prompt| L[Call the model]
  L -->|stream| P[Parse intent]
  P -->|tool call| T[Run tool]
  P -->|final answer| O[Done]
  T -->|observation| S
```
## The four containers
### Generator

The agent function is an iterator. Each tick it yields an event: a thinking cue, a token, a tool call, a final answer. Whoever loops over the iterator decides what to render and when to stop.
```js
async function* runAgent(state) {
  let done = false;
  while (!done) {
    yield { type: 'thinking' };
    const stream = await llm.stream(state.messages);
    done = true; // assume a final answer unless a tool call re-enters the loop
    for await (const chunk of stream) {
      yield { type: 'token', text: chunk.text };
      if (chunk.kind === 'tool_use') {
        const result = await dispatch(chunk);
        state.append(result);
        yield { type: 'tool_result', result };
        done = false;
        break; // re-enter outer loop with new context
      }
    }
  }
}
```

The shape pays for itself when a human is watching. The UI redraws on each yield with no extra code; it's just a `for await` loop. Interruption is "stop iterating." Backpressure is automatic; if the renderer falls behind, the model isn't asked for the next token.
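Consuming it is one loop. A minimal sketch, assuming the event shapes above; `state` and `checkForInterrupt` are stand-ins, not a fixed API:

```ts
// The UI layer: iterate, render each event as it arrives, stop whenever you like.
for await (const event of runAgent(state)) {
  if (event.type === 'thinking') process.stdout.write('…');
  if (event.type === 'token') process.stdout.write(event.text);
  if (event.type === 'tool_result') console.log('\n[tool]', event.result);
  if (checkForInterrupt()) break; // interruption is literally "stop iterating"
}
```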
The cost is process boundaries. Generators don’t cross processes well, can’t be checkpointed mid-iteration, and don’t fan out to multiple consumers without re-design.
You’ll recognize this shape in: Claude Code, NanoClaw, Mistral Vibe, OpenClaw — all interactive CLIs.
### While-loop

The simplest container that works. Read it left-to-right and it’s exactly the diagram.
```python
turn = 0
while turn < MAX_TURNS:
    turn += 1
    response = llm.chat(messages, tools=tools)
    messages.append(response.message)  # keep the assistant turn (and its tool calls) in history
    if response.tool_calls:
        for tc in response.tool_calls:
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": dispatch(tc)})
    else:
        return response.content
```

Easy to teach, easy to port across languages, easy to onboard a teammate to. The cost: streaming to the UI now needs a callback or a queue, and recovering from a crash mid-loop means you must explicitly checkpoint the message list.
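Checkpointing can be as small as persisting the transcript after every append. A sketch (in TypeScript, to match the other examples); `saveJSON` / `loadJSON` are hypothetical helpers:

```ts
// Resume from the last checkpoint if one exists; otherwise start fresh.
let messages = (await loadJSON('checkpoint.json')) ?? [systemMessage];

for (let turn = 0; turn < MAX_TURNS; turn++) {
  const response = await llm.chat(messages, { tools });
  messages.push(response.message);
  await saveJSON('checkpoint.json', messages); // a crash after this line loses nothing
  if (!response.toolCalls?.length) break; // final answer: stop looping
  for (const tc of response.toolCalls) {
    messages.push({ role: 'tool', toolCallId: tc.id, content: await dispatch(tc) });
    await saveJSON('checkpoint.json', messages);
  }
}
```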
You’ll recognize this shape in: Strix, Hermes, Kimi Code, ML Intern.
### Event log

There is no “loop” — only an append-only log of events. Every action emits an event. Every observation emits an event. A controller subscribes and decides what action to emit next based on the log so far. Replaying the log reconstitutes any state.
```mermaid
flowchart LR
  C[Controller] -->|emit action| EL[(Event Log)]
  EL -->|subscribe| C
  EL -->|subscribe| UI
  EL -->|subscribe| Recorder
  C -->|run| Tools[Tool layer]
  Tools -->|emit observation| EL
```
The wins are heavy: time-travel debugging is free, you can replay a session deterministically, the audit trail is the source of truth, and other components (recorder, microagent triggers) simply become subscribers. The cost is the learning curve, and schema evolution must be planned because old events live forever.
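A minimal sketch of the mechanism, with illustrative event shapes; real logs (OpenHands’ included) carry much richer, versioned schemas:

```ts
type AgentEvent =
  | { kind: 'action'; name: string; args: unknown }
  | { kind: 'observation'; result: unknown };

class EventLog {
  private events: AgentEvent[] = [];
  private subscribers: Array<(e: AgentEvent) => void> = [];

  emit(e: AgentEvent) {
    this.events.push(e); // append-only: the log is the source of truth
    for (const sub of this.subscribers) sub(e);
  }

  subscribe(fn: (e: AgentEvent) => void) {
    this.subscribers.push(fn); // controller, UI, recorder all attach here
  }

  replay(fn: (e: AgentEvent) => void) {
    for (const e of this.events) fn(e); // deterministic replay reconstitutes state
  }
}
```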
You’ll recognize this shape in: OpenHands v0 and v1.
### Graph

Stages are nodes, transitions are edges. The orchestrator steps the graph; agents run inside nodes. Useful when several independent steps run at once, or when specialists alternate (planner → executor → critic → planner).
```js
const graph = new Graph()
  .node('plan', plan)
  .node('execute', execute)
  .node('critique', critique)
  .edge('plan', 'execute')
  .edge('execute', 'critique')
  .edge('critique', 'plan', when((r) => r.needsReplan)); // conditional edge: loop back only if needed
```

You get parallelism for free, nodes are individually testable, and complex flows stay readable as the topology grows. The tax: it’s overkill for simple agents and you’ve now bought into a graph framework.
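Stepping the graph is the orchestrator’s job. A hypothetical entry point for the DSL above, not any specific framework’s API:

```ts
// Enter at 'plan' and follow edges until no outgoing edge applies;
// conditional edges are taken only when their predicate passes.
const result = await graph.run('plan', { task: 'ship the feature' });
```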
You’ll recognize this shape in: Multica, Open Design (loosely).
## Anatomy of a single iteration
This sequence is the same regardless of container. The container just decides who calls whom.
1. **Compose the request.** Pull current messages, the (mostly cached) system prompt, the tool schemas, and any per-turn additions: a memory recall, a scope reminder, a clock. The cheap-but-correct step that determines the next ten thousand input tokens.
2. **Stream from the model.** Open a streaming connection. Streaming earns three things at once: token-level UI, the option to stop reading early, and the ability to dispatch tools while arguments are still arriving.
3. **Parse for intent.** The model returns text, a tool call, a thought block, or an answer. The parser is format-specific: Anthropic `tool_use` blocks, XML `<function>` tags, OpenAI `tool_calls`. See tool-calling-formats.
4. **(Optional) Short-circuit.** A single tool call per turn is the common case. Strix forbids more than one and aborts the stream as soon as it sees the closing tag; anything after is hallucinated rambling that you’d otherwise pay for. See streaming-early-stop.
5. **Dispatch the tool.** Validate the args (Zod / Pydantic / a JSON schema), execute, capture the structured result. This is the natural place for guardrails: scope checks, allowlists, rate limits, sandbox boundaries. A sketch of this step follows the list.
6. **Fold the result back in.** Push the observation into the message list as a `tool_result` (or whatever your format uses). Go to step 1, with one more turn used.
7. **Exit conditions.** Three of them: budget exhausted (turns / tokens / dollars), an explicit `final_answer` tool, or the model returns no tool calls and a final text block.
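To make steps 5 and 6 concrete, a sketch of validated dispatch using Zod; the registry shape and result format are assumptions, not a specific SDK:

```ts
import { z } from 'zod';
import { promises as fs } from 'node:fs';

// Hypothetical registry: one schema + handler per tool.
const tools = {
  read_file: {
    schema: z.object({ path: z.string() }),
    run: (args: { path: string }) => fs.readFile(args.path, 'utf8'),
  },
};

async function dispatch(toolUse: { id: string; name: string; input: unknown }) {
  const tool = tools[toolUse.name as keyof typeof tools];
  if (!tool) return { tool_use_id: toolUse.id, is_error: true, content: 'unknown tool' };

  const parsed = tool.schema.safeParse(toolUse.input); // guardrail: reject malformed args
  if (!parsed.success) {
    return { tool_use_id: toolUse.id, is_error: true, content: parsed.error.message };
  }
  const result = await tool.run(parsed.data);
  return { tool_use_id: toolUse.id, content: result }; // folded back in as a tool_result
}
```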
## Iteration budgets — and the cheap trick that improves output
Every loop in the corpus has a hard cap. The good ones also tell the agent it’s running out, so it can wrap up gracefully rather than be guillotined mid-thought.
| Project | Default budget | Warning behavior |
|---|---|---|
| Claude Code | per-task max turns | implicit, via token tracking |
| OpenHands | configurable | budget escalation per model |
| Strix | 300 | warnings at 85% + last 3 |
| Hermes | 90 | token-based budget |
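The trick itself is a few lines of prompt plumbing. A sketch; the 85%-and-last-3 thresholds mirror Strix’s behavior, and the wording is illustrative:

```ts
// Before composing each request, tell the agent how much runway remains.
function budgetReminder(turn: number, maxTurns: number): string | null {
  const remaining = maxTurns - turn;
  if (turn / maxTurns >= 0.85 || remaining <= 3) {
    return `Note: ${remaining} turn(s) left. Start wrapping up and produce a final answer.`;
  }
  return null; // plenty of budget: add nothing, spend no tokens on it
}
```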
## Pick a container
- A human watches it work → Generator
- Need replay / audit trail → Event log
- Steps must run in parallel → Graph
- Smallest thing that works → While-loop
Recommended default: If you're not sure, start with a while-loop. Migrate to a generator the first time you build a UI; migrate to events the first time someone asks for an audit log.
## Anti-patterns from the corpus
- No iteration limit. Easy to write, easy to bankrupt yourself when a tool fails in a way that the model “fixes” by retrying forever.
- No de-duplication of failed tool calls. The agent retries the same broken call. Surface a “you just tried this” hint in the next prompt; the model will pivot (see the sketch after this list).
- Streaming + blocking tool dispatch in the same call site. UI freezes. Hand off the dispatch to a queue and continue the stream.
- Multiple tool calls per turn with no policy. Either embrace parallel tool use end-to-end (with a side-effect classifier — pure tools parallelize, others serialize) or forbid it entirely. Ambiguous middle ground is bug-prone.
- Token-budget creep. “Just one more memory recall, scope reminder, time stamp.” These compound; an agent that started at 6K input/turn ends up at 30K six months in. Audit prompt size per turn, not per session.
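A cheap version of the “you just tried this” hint; the serialization and wording are illustrative:

```ts
// Remember failed calls; if the model repeats one verbatim, say so in the next prompt.
const failedCalls = new Set<string>();

function noteFailure(toolName: string, args: unknown) {
  failedCalls.add(JSON.stringify([toolName, args]));
}

function dedupeHint(toolName: string, args: unknown): string | null {
  return failedCalls.has(JSON.stringify([toolName, args]))
    ? `You already tried ${toolName} with these exact arguments and it failed. Try a different approach.`
    : null;
}
```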
## Projects that implement this
- Claude Code — Anthropic's official agentic CLI. Streaming tool calls, prompt caching, thinking signatures, multi-agent subagents, slash commands.
- OpenHands (v0) — All-hands AI v0 — autonomous software engineer agent. Event-sourced state, microagents, controller-level guardrails.
- Strix — Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.
- OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All-Hands agent.
- Mistral Vibe — Mistral-flavored coding agent reference. Middleware-based dispatch, minimal tool set, instructive for understanding agent loop fundamentals.
- Hermes Agent — 40+ tool, multi-platform agent. Provider adapters per LLM, trajectory compression preserves first/last turns, side-channel auxiliary client.
- NanoClaw — Tiny Claude-Code-shaped clone. Excellent for studying the irreducible structure of an agent loop without production overhead.
- OpenClaw — Open-source Claude-Code-style agent reproduction. Bigger than NanoClaw, reveals which patterns scale and which stay minimal.
- Kimi Code — Moonshot's Kimi-flavored coding agent. Compact reference for an agent loop with OpenAI-compatible tool calling.
- ML Intern — ML-engineering-flavored agent. Tooling for data exploration, model training, and notebook-style work.
- Open Design — Open-source design / UI-generation agent. LLM-driven design intent → code, with a design-system-aware tool surface.
- Multica — Multi-cloud / multi-agent orchestration. Architecture patterns for spanning providers and clouds in one agent.
## Related insights
- Decouples the agent's loop from its expertise. Domain experts contribute via PR; the loop almost never changes; the library evolves weekly.
- A free 10-20% cost reduction per agent step. Compounds across hundreds of steps in a session.
- A specific failure mode (empty response with temp=0) has a specific cheap fix. Worth knowing because it's not in any tutorial.
- Lose this and the model re-thinks every turn (cost spike) or you crash on model switch with an opaque signature error.