A single agent loses focus around 60–80 turns. Splitting work across agents — planner, executor, reviewer — restores clean context windows and unlocks parallelism. Three design questions decide everything else: who blocks on whom, what context the child sees, and how the result flows back. The default-and-best answers (parent waits, fresh context, summary back) cover 90% of cases. The remaining 10% are where the architecture earns its name.
Multi-agent coordination
A single LLM agent forgets things. Around the 60-turn mark, even with a 200K context, attention frays — earlier reasoning gets diluted, the agent revisits already-failed paths, output quality drifts. The fix isn’t a smarter model. It’s fewer turns per agent, achieved by delegating subtasks to fresh agents that finish quickly and report back.
That’s all “multi-agent” really is: a way to spend many short, sharp context windows instead of one long, blunt one.
The dominant pattern: a tool that delegates
Every project in the corpus implements multi-agent the same way at heart: the parent has a delegate (or agent, or subagent) tool. Calling it pauses the parent. The child runs to completion in its own loop. The child’s final answer comes back as a tool_result and the parent resumes one turn later.
```mermaid
sequenceDiagram
    participant P as Parent
    participant T as Delegate tool
    participant C as Child agent
    P->>T: delegate(task='research X', context=summary)
    T->>C: spawn(messages=[system, user_task])
    Note over C: child runs its own loop
    C-->>T: final answer / artifact
    T-->>P: tool_result(child_summary)
    Note over P: parent resumes
```
Why this is the default: it slots into the existing loop without inventing new infrastructure. The parent already knows how to dispatch tools. A child agent is just an unusually long-running tool.
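A minimal sketch of that shape, with hypothetical names (`run_agent`, `delegate`; no particular project's API) and the child loop stubbed out:

```python
def run_agent(system_prompt: str, task: str) -> str:
    """Child loop: a brand-new conversation that runs to completion.
    The real loop (LLM turns, tool calls) is stubbed here."""
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": task}]
    return f"summary of: {task}"  # stand-in for the child's final answer

def delegate(task: str) -> dict:
    """Parent-side tool handler: blocks until the child finishes, then
    hands the child's final answer back as an ordinary tool_result."""
    answer = run_agent("You are a focused sub-agent.", task)
    return {"type": "tool_result", "content": answer}
```

The parent never sees the child's intermediate turns; only the returned `tool_result` enters its context.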
Question 1 — what context does the child see?
The child gets a brand-new conversation: just the system prompt plus the task description the parent wrote. No history. No prior turns.
This is what you want by default. The whole point of delegating is that the child doesn’t carry the parent’s accumulated cruft. The parent compresses what the child returns, not what the child produces along the way.
If you find yourself thinking “but the child needs to know X” — write X into the task description. That’s the discipline. The discipline is the win.
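One way to enforce that discipline in code: a single function that assembles everything the child will ever see. A hypothetical sketch (names and facts are illustrative):

```python
def make_child_messages(system_prompt: str, task: str, facts: list[str]) -> list[dict]:
    """The child sees ONLY what the parent writes down here: if the
    child 'needs to know X', X must appear as an explicit fact."""
    briefing = task
    if facts:
        briefing += "\n\nRelevant context:\n" + "\n".join(f"- {f}" for f in facts)
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": briefing}]

msgs = make_child_messages(
    "You are a focused sub-agent.",
    "Audit auth.py for injection bugs.",
    facts=["The app uses SQLAlchemy 2.x", "Auth routes live under /api/v2"],
)
```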
The alternative is forking: the child inherits a copy of the parent’s context at the moment of delegation.
You almost never want this. The two cases where you do:
- The parent has built up a detailed plan that’s awkward to summarize without losing fidelity, and the child needs to execute it verbatim.
- The parent has loaded a large reference document into context and the child needs to query against the same document.
Both cases are warning signs. The first is usually solved by putting the plan in a file the child reads as a tool call. The second is usually solved by giving the child a search tool against the document.
Reach for forking only when the alternatives are clearly worse.
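The plan-in-a-file alternative takes only a few lines. A sketch with a hypothetical `hand_off_plan` helper: the parent persists the plan, and the child receives a path instead of a forked context:

```python
import pathlib
import tempfile

def hand_off_plan(plan: str) -> str:
    """Parent persists the plan verbatim; the child receives only the
    path and reads it back with its own file tool."""
    path = pathlib.Path(tempfile.mkdtemp()) / "plan.md"
    path.write_text(plan)
    return str(path)

plan_path = hand_off_plan("1. refactor auth\n2. add tests")
task = f"Execute the plan in {plan_path} verbatim."
```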
A third option is the persistent worker: a long-running child that takes successive tasks. It saves spin-up cost when the child has expensive setup (e.g. building a code-search index).
It’s rarely worth it. Lifecycle becomes a question (when does the pool drain? what happens if the child crashes?), and the children are no longer cleanly independent. Most teams just pay the spin-up cost and keep agents stateless.
Question 2 — does the parent block?
| Situation | Call |
|---|---|
| Children read independently (no side effects) | Run in parallel |
| Each child writes to a shared file or DB | Serialize, or scope each child to a subdir |
| Children depend on each other's output | Serialize |
| You don't know yet | Block by default; loosen later |
Recommended default: parallelism is the headline win of multi-agent. Run children in parallel where you can, but verify first that their tools don’t share write paths.
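When children really are independent, the blocking delegate call generalizes to a fan-out. A sketch using the standard library, with the child loop stubbed by `run_child`:

```python
from concurrent.futures import ThreadPoolExecutor

def run_child(task: str) -> str:
    # stand-in for a full child agent loop
    return f"done: {task}"

def delegate_parallel(tasks: list[str]) -> list[str]:
    """Fan out independent children, block until all finish,
    and return results in task order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(run_child, tasks))

results = delegate_parallel(["scan /api", "scan /admin", "scan /auth"])
```

`pool.map` preserves task order, so the parent can zip results back onto the tasks it dispatched.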
Question 3 — what flows back?
The child’s outcome comes back as a tool_result. There are three flavors of “outcome”:
1. Just the final answer — a string or markdown summary. Cheapest; the parent loses everything else. Right when the child’s job is “answer this question” and the work was disposable.
2. Final answer + key artifacts — the summary plus paths/IDs the child produced (file paths, ticket IDs, commit hashes). Right when the parent will keep working with what the child made.
3. Full sub-trajectory — the entire child transcript. Almost never useful: it defeats the point of delegation and blows up the parent’s context. The exception: when an auditor or human needs to inspect the child’s reasoning later, store the trajectory but don’t return it. Return a pointer.
The Strix mailbox — when delegation isn’t enough
Strix takes the unusual step of letting agents post messages to peer agents, not just return up to a parent. The infrastructure is module-level dictionaries: a graph dict tracks parent-children, an instances dict holds live agents, a messages dict is per-recipient mailboxes. Any agent can drop a note into another’s mailbox.
```python
Msg = dict[str, str]  # {"from": sender_id, "body": text}

class BaseAgent:
    # class-level registries, shared by every agent in the process
    _agent_graph: dict[str, list[str]] = {}        # parent id -> child ids
    _agent_instances: dict[str, "BaseAgent"] = {}  # agent id -> live agent
    _agent_messages: dict[str, list[Msg]] = {}     # recipient id -> mailbox
```
The cleverness isn’t the dicts — it’s the recognition that for a single-process pentest, you don’t need Redis or a broker. Python’s GIL serializes individual dict operations. You just need to declare “one scan per process” and you’re done.
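A sketch of how posting and draining such a mailbox might look (function names are illustrative, not Strix’s actual API):

```python
_agent_messages: dict[str, list[dict]] = {}  # recipient id -> mailbox

def post_message(sender: str, recipient: str, body: str) -> None:
    # one setdefault/append per call; atomic enough under the GIL
    _agent_messages.setdefault(recipient, []).append(
        {"from": sender, "body": body})

def drain_mailbox(agent_id: str) -> list[dict]:
    """Take everything addressed to agent_id and empty the box."""
    inbox = _agent_messages.get(agent_id, [])
    _agent_messages[agent_id] = []
    return inbox

post_message("scanner", "reporter", "found SQLi at /login")
```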
Coordination pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Children that delegate to children that… | runaway cost, rate limits | depth cap (most teams: 2 or 3) |
| Children that share writeable files | flaky races | filesystem lock, or scope each child to a subdir |
| Parent waits sequentially on slow children | wall-clock dominated by tail | parallelize where independent |
| Child crashes, parent hangs | half-completed task, stuck loop | timeouts + structured error returns |
| Parent and child instructions conflict | child does the wrong thing | child re-states scope in its own system prompt |
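The “child crashes, parent hangs” row is the one worth hard-coding against. A sketch of a timeout wrapper that never raises into the parent loop, with `run_child` standing in for a real child:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_child(task: str) -> str:
    # stand-in for a real child agent loop
    return f"done: {task}"

def delegate_safely(task: str, timeout_s: float = 300.0) -> dict:
    """Never hang and never raise: the child's outcome always comes
    back as a structured ok/error result the parent can reason about."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_child, task)
        try:
            return {"ok": True, "result": future.result(timeout=timeout_s)}
        except FuturesTimeout:
            future.cancel()  # best effort; a truly stuck thread needs a process kill
            return {"ok": False, "error": f"child timed out after {timeout_s}s"}
        except Exception as exc:
            return {"ok": False, "error": f"child crashed: {exc!r}"}
```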
When not to delegate
- The task is small. A 5-turn delegation is more overhead than just doing it inline.
- The state is too tangled to summarize. If you can’t crisply describe what the child should do, the child won’t do it crisply either.
- One critical edit. Don’t delegate “now write the fix” if the parent has all the context — fragmenting understanding loses the fix.
Cross-project comparison
| Project | Pattern | Fresh by default? | Parallel children? | Notable |
|---|---|---|---|---|
| Claude Code | Parent-child delegate tool | yes | sequential | Fork flag exists, rarely used |
| OpenHands | Delegate as an action / event | yes | sequential | Registry-based agent lookup |
| Strix | Delegate + module-level mailbox | yes | yes (threading) | Single-process limitation |
| Hermes | Delegate tool | yes | yes (threading) | Registry of named specialist agents |
| Multica | Graph nodes | n/a | yes | Edges are coordination |
Projects that implement this
- Claude Code — Anthropic's official agentic CLI. Streaming tool calls, prompt caching, thinking signatures, multi-agent subagents, slash commands.
- OpenHands (v0) — All-hands AI v0 — autonomous software engineer agent. Event-sourced state, microagents, controller-level guardrails.
- Strix — Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.
- OpenHands (v1) — OpenHands re-architected: cleaner controller, refined memory condenser, improved tool dispatch. v1 of the All-Hands agent.
- Hermes Agent — 40+ tool, multi-platform agent. Provider adapters per LLM, trajectory compression preserves first/last turns, side-channel auxiliary client.
- Open Design — Open-source design / UI-generation agent. LLM-driven design intent → code, with a design-system-aware tool surface.
- Multica — Multi-cloud / multi-agent orchestration. Architecture patterns for spanning providers and clouds in one agent.
Related insights
Two pentest reports describing the same SQL injection with different payloads aren't textually similar — but they should dedupe. Hashing fails; LLM reasoning works.
For one-process multi-agent coordination, plain Python dicts are the right answer. No Redis, no broker, no race conditions you need locks for.
A pentest agent that can be talked out of scope is dangerous. Putting scope in the locked system prompt — not the message log — defeats prompt injection.