Every agent in the corpus runs the same five-step skeleton: gather context, call the model, parse the response, run any tools, fold the result back in. The interesting question isn't what the loop does; it's the container you put it in. Four containers are alive in the wild: a generator, a while-loop, an event log, a graph. Each makes one thing easy and one thing hard.
- Generator — best when a human watches.
- While-loop — best when you want the simplest thing that works.
- Event log — best when you need replay or audit.
- Graph — best when steps run in parallel.
## Agent loop
Picture an engineer with a debugger open and a long task list. They look at the screen, decide what to do next, do it, look again, repeat. An LLM agent is the same control flow with a model in the middle. That’s it. Everything else — streaming, multi-agent, memory compression, recovery — is built around this skeleton.
```mermaid
flowchart LR
  S[Context / memory] -->|build prompt| L[Call the model]
  L -->|stream| P[Parse intent]
  P -->|tool call| T[Run tool]
  P -->|final answer| O[Done]
  T -->|observation| S
```
## The four containers
### Generator

The agent function is an iterator. Each tick it yields an event: a thinking cue, a token, a tool call, a final answer. Whoever loops over the iterator decides what to render and when to stop.
```js
async function* runAgent(state) {
  let done = false;
  while (!done) {
    yield { type: 'thinking' };
    const stream = await llm.stream(state.messages);
    done = true; // assume a final answer unless a tool call re-enters the loop
    for await (const chunk of stream) {
      yield { type: 'token', text: chunk.text };
      if (chunk.kind === 'tool_use') {
        const result = await dispatch(chunk);
        state.append(result);
        yield { type: 'tool_result', result };
        done = false;
        break; // re-enter outer loop with new context
      }
    }
  }
}
```

The shape pays for itself when a human is watching. The UI redraws on each yield with no extra code; it's just a `for await` loop. Interruption is "stop iterating." Backpressure is automatic; if the renderer falls behind, the model isn't asked for the next token.
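Consuming it is one loop. A minimal sketch, assuming the event shapes above; `state` and `checkForInterrupt` are stand-ins, not a fixed API:

```ts
// The UI layer: iterate, render each event as it arrives, stop whenever you like.
for await (const event of runAgent(state)) {
  if (event.type === 'thinking') process.stdout.write('…');
  if (event.type === 'token') process.stdout.write(event.text);
  if (event.type === 'tool_result') console.log('\n[tool]', event.result);
  if (checkForInterrupt()) break; // interruption is literally "stop iterating"
}
```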
The cost is process boundaries. Generators don’t cross processes well, can’t be checkpointed mid-iteration, and don’t fan out to multiple consumers without re-design.
You’ll recognize this shape in: Claude Code, NanoClaw, Mistral Vibe, OpenClaw — all interactive CLIs.
### While-loop

The simplest container that works. Read it left-to-right and it’s exactly the diagram.
```python
turn = 0
while turn < MAX_TURNS:
    turn += 1
    response = llm.chat(messages, tools=tools)
    messages.append(response.message)  # keep the assistant turn (and its tool calls) in history
    if response.tool_calls:
        for tc in response.tool_calls:
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": dispatch(tc)})
    else:
        return response.content
```

Easy to teach, easy to port across languages, easy to onboard a teammate to. The cost: streaming to the UI now needs a callback or a queue, and recovering from a crash mid-loop means you must explicitly checkpoint the message list.
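Checkpointing can be as small as persisting the transcript after every append. A sketch (in TypeScript, to match the other examples); `saveJSON` / `loadJSON` are hypothetical helpers:

```ts
// Resume from the last checkpoint if one exists; otherwise start fresh.
let messages = (await loadJSON('checkpoint.json')) ?? [systemMessage];

for (let turn = 0; turn < MAX_TURNS; turn++) {
  const response = await llm.chat(messages, { tools });
  messages.push(response.message);
  await saveJSON('checkpoint.json', messages); // a crash after this line loses nothing
  if (!response.toolCalls?.length) break; // final answer: stop looping
  for (const tc of response.toolCalls) {
    messages.push({ role: 'tool', toolCallId: tc.id, content: await dispatch(tc) });
    await saveJSON('checkpoint.json', messages);
  }
}
```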
You’ll recognize this shape in: Strix, Hermes, Kimi Code, ML Intern.
### Event log

There is no “loop” — only an append-only log of events. Every action emits an event. Every observation emits an event. A controller subscribes and decides what action to emit next based on the log so far. Replaying the log reconstitutes any state.
```mermaid
flowchart LR
  C[Controller] -->|emit action| EL[(Event Log)]
  EL -->|subscribe| C
  EL -->|subscribe| UI
  EL -->|subscribe| Recorder
  C -->|run| Tools[Tool layer]
  Tools -->|emit observation| EL
```
The wins are heavy: time-travel debugging is free, you can replay a session deterministically, the audit trail is the source of truth, and other components (recorder, microagent triggers) simply become subscribers. The cost is the learning curve, and schema evolution must be planned because old events live forever.
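A minimal sketch of the mechanism, with illustrative event shapes; real logs (OpenHands’ included) carry much richer, versioned schemas:

```ts
type AgentEvent =
  | { kind: 'action'; name: string; args: unknown }
  | { kind: 'observation'; result: unknown };

class EventLog {
  private events: AgentEvent[] = [];
  private subscribers: Array<(e: AgentEvent) => void> = [];

  emit(e: AgentEvent) {
    this.events.push(e); // append-only: the log is the source of truth
    for (const sub of this.subscribers) sub(e);
  }

  subscribe(fn: (e: AgentEvent) => void) {
    this.subscribers.push(fn); // controller, UI, recorder all attach here
  }

  replay(fn: (e: AgentEvent) => void) {
    for (const e of this.events) fn(e); // deterministic replay reconstitutes state
  }
}
```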
You’ll recognize this shape in: OpenHands v0 and v1.
### Graph

Stages are nodes, transitions are edges. The orchestrator steps the graph; agents run inside nodes. Useful when several independent steps run at once, or when specialists alternate (planner → executor → critic → planner).
```js
const graph = new Graph()
  .node('plan', plan)
  .node('execute', execute)
  .node('critique', critique)
  .edge('plan', 'execute')
  .edge('execute', 'critique')
  .edge('critique', 'plan', when((r) => r.needsReplan)); // conditional edge: loop back only if needed
```

You get parallelism for free, nodes are individually testable, and complex flows stay readable as the topology grows. The tax: it’s overkill for simple agents and you’ve now bought into a graph framework.
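Stepping the graph is the orchestrator’s job. A hypothetical entry point for the DSL above, not any specific framework’s API:

```ts
// Enter at 'plan' and follow edges until no outgoing edge applies;
// conditional edges are taken only when their predicate passes.
const result = await graph.run('plan', { task: 'ship the feature' });
```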
You’ll recognize this shape in: Multica, Open Design (loosely).
## Anatomy of a single iteration
This sequence is the same regardless of container. The container just decides who calls whom.
1. **Compose the request.** Pull current messages, the (mostly cached) system prompt, the tool schemas, and any per-turn additions: a memory recall, a scope reminder, a clock. The cheap-but-correct step that determines the next ten thousand input tokens.
2. **Stream from the model.** Open a streaming connection. Streaming earns three things at once: token-level UI, the option to stop reading early, and the ability to dispatch tools while arguments are still arriving.
3. **Parse for intent.** The model returns text, a tool call, a thought block, or an answer. The parser is format-specific: Anthropic `tool_use` blocks, XML `<function>` tags, OpenAI `tool_calls`. See tool-calling-formats.
4. **(Optional) Short-circuit.** A single tool call per turn is the common case. Strix forbids more than one and aborts the stream as soon as it sees the closing tag; anything after is hallucinated rambling that you’d otherwise pay for. See streaming-early-stop.
5. **Dispatch the tool.** Validate the args (Zod / Pydantic / a JSON schema), execute, capture the structured result. This is the natural place for guardrails: scope checks, allowlists, rate limits, sandbox boundaries. A sketch of this step follows the list.
6. **Fold the result back in.** Push the observation into the message list as a `tool_result` (or whatever your format uses). Go to step 1, with one more turn used.
7. **Exit conditions.** Three of them: budget exhausted (turns / tokens / dollars), an explicit `final_answer` tool, or the model returns no tool calls and a final text block.
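To make steps 5 and 6 concrete, a sketch of validated dispatch using Zod; the registry shape and result format are assumptions, not a specific SDK:

```ts
import { z } from 'zod';
import { promises as fs } from 'node:fs';

// Hypothetical registry: one schema + handler per tool.
const tools = {
  read_file: {
    schema: z.object({ path: z.string() }),
    run: (args: { path: string }) => fs.readFile(args.path, 'utf8'),
  },
};

async function dispatch(toolUse: { id: string; name: string; input: unknown }) {
  const tool = tools[toolUse.name as keyof typeof tools];
  if (!tool) return { tool_use_id: toolUse.id, is_error: true, content: 'unknown tool' };

  const parsed = tool.schema.safeParse(toolUse.input); // guardrail: reject malformed args
  if (!parsed.success) {
    return { tool_use_id: toolUse.id, is_error: true, content: parsed.error.message };
  }
  const result = await tool.run(parsed.data);
  return { tool_use_id: toolUse.id, content: result }; // folded back in as a tool_result
}
```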
## Iteration budgets — and the cheap trick that improves output
Every loop in the corpus has a hard cap. The good ones also tell the agent it’s running out, so it can wrap up gracefully rather than be guillotined mid-thought.
| Project | Default budget | Warning behavior |
|---|---|---|
| Claude Code | per-task max turns | implicit, via token tracking |
| OpenHands | configurable | budget escalation per model |
| Strix | 300 | warnings at 85% + last 3 |
| Hermes | 90 | token-based budget |
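The trick itself is a few lines of prompt plumbing. A sketch; the 85%-and-last-3 thresholds mirror Strix’s behavior, and the wording is illustrative:

```ts
// Before composing each request, tell the agent how much runway remains.
function budgetReminder(turn: number, maxTurns: number): string | null {
  const remaining = maxTurns - turn;
  if (turn / maxTurns >= 0.85 || remaining <= 3) {
    return `Note: ${remaining} turn(s) left. Start wrapping up and produce a final answer.`;
  }
  return null; // plenty of budget: add nothing, spend no tokens on it
}
```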
## Pick a container
- A human watches it work → Generator
- Need replay / audit trail → Event log
- Steps must run in parallel → Graph
- Smallest thing that works → While-loop
Recommended default: If you're not sure, start with a while-loop. Migrate to a generator the first time you build a UI; migrate to events the first time someone asks for an audit log.
## Anti-patterns from the corpus
- No iteration limit. Easy to write, easy to bankrupt yourself when a tool fails in a way that the model “fixes” by retrying forever.
- No de-duplication of failed tool calls. The agent retries the same broken call. Surface a “you just tried this” hint in the next prompt; the model will pivot (see the sketch after this list).
- Streaming + blocking tool dispatch in the same call site. UI freezes. Hand off the dispatch to a queue and continue the stream.
- Multiple tool calls per turn with no policy. Either embrace parallel tool use end-to-end (with a side-effect classifier — pure tools parallelize, others serialize) or forbid it entirely. Ambiguous middle ground is bug-prone.
- Token-budget creep. “Just one more memory recall, scope reminder, time stamp.” These compound; an agent that started at 6K input/turn ends up at 30K six months in. Audit prompt size per turn, not per session.
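A cheap version of the “you just tried this” hint; the serialization and wording are illustrative:

```ts
// Remember failed calls; if the model repeats one verbatim, say so in the next prompt.
const failedCalls = new Set<string>();

function noteFailure(toolName: string, args: unknown) {
  failedCalls.add(JSON.stringify([toolName, args]));
}

function dedupeHint(toolName: string, args: unknown): string | null {
  return failedCalls.has(JSON.stringify([toolName, args]))
    ? `You already tried ${toolName} with these exact arguments and it failed. Try a different approach.`
    : null;
}
```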
## Projects that implement this
- Claude Code — Anthropic's official agentic CLI. Streaming tool calls, prompt caching, thinking signatures, multi-agent subagents, slash commands.
- OpenHands (v0) — All-hands AI v0 — autonomous software engineer agent. Event-sourced state, microagents, controller-level guardrails.
- Strix — Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.
- OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All-Hands agent.
- Mistral Vibe — Mistral-flavored coding agent reference. Middleware-based dispatch, minimal tool set, instructive for understanding agent loop fundamentals.
- Hermes Agent — 40+ tool, multi-platform agent. Provider adapters per LLM, trajectory compression preserves first/last turns, side-channel auxiliary client.
- NanoClaw — Tiny Claude-Code-shaped clone. Excellent for studying the irreducible structure of an agent loop without production overhead.
- OpenClaw — Open-source Claude-Code-style agent reproduction. Bigger than NanoClaw, reveals which patterns scale and which stay minimal.
- Kimi Code — Moonshot's Kimi-flavored coding agent. Compact reference for an agent loop with OpenAI-compatible tool calling.
- ML Intern — ML-engineering-flavored agent. Tooling for data exploration, model training, and notebook-style work.
- Open Design — Open-source design / UI-generation agent. LLM-driven design intent → code, with a design-system-aware tool surface.
- Multica — Multi-cloud / multi-agent orchestration. Architecture patterns for spanning providers and clouds in one agent.
## Related insights
- Decouples the agent's loop from its expertise. Domain experts contribute via PR; the loop almost never changes; the library evolves weekly.
- A free 10-20% cost reduction per agent step. Compounds across hundreds of steps in a session.
- A specific failure mode (empty response with temp=0) has a specific cheap fix. Worth knowing because it's not in any tutorial.
- Lose this and the model re-thinks every turn (cost spike) or you crash on model switch with an opaque signature error.