Strix — CodeDocs Vault

Strix

Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.

Category: security-agent
Language: Python
Runtime: Python 3.11+
Providers: LiteLLM (any)
Agent loop: while-loop
Tool format: xml-function-blocks
Memory: llm-summarize
Caching: cache_control-explicit
Sandbox: docker

Pedagogical: ★★★★★
Flags: multi-agent · thinking · md-skills · streaming
Concepts: agent-loop ·tool-calling-formats ·memory-compression ·multi-agent-coordination ·guardrails ·skills-as-md ·streaming-tools ·sandboxing
Verified: cloned
Upstream: github.com/usestrix/strix

Mark "Strix" as studied

What it is

Open-source autonomous pentest agent. The most architecturally bold project in the corpus — many of its choices look unconventional at first and turn out to be the right tradeoff for a long-running, multi-agent, security-sensitive workload.

What’s worth studying

Strix gets four things right that few other agents do:

XML tool format with single-call-per-turn. The system prompt forbids more than one <function> block, which lets the parser abort the stream the moment it sees </function> and stop paying for trailing hallucination. A free 10–20% per-step output saving. See streaming-early-stop.
Markdown-first skills directory. ~30 attack methodologies live as .md files in strix/skills/. New methodology = markdown PR. The agent’s loop almost never changes; the brain evolves weekly. See markdown-as-skills.
Authorized scope injected at render time. The agent’s allowed targets come from the platform DB, rendered into the system prompt by jinja, never from user chat. Defeats prompt injection by construction. See scope-injected-at-render.
LLM-based dedupe of findings. Two sub-agents reporting “found SQL injection at /search” and “input validation missing in search.py” are the same bug. Hashing fails; LLM reasoning works. See llm-dedupe-root-cause.

A fifth choice that’s quieter but elegant: inter-agent messaging via plain Python module-level dicts. No Redis, no broker. One assumption (“one scan per process”), checked at boot, and you’re done. See module-level-mailbox.

Drill-down

The full per-doc analysis lives below — these are the original numbered analyses, rendered as styled HTML. Pick a section to study deeper.

Insights from this project

Non-obvious tricks pulled out as standalone study cards.

Strix ●●●

Markdown-as-prompt-library architecture

Decouples the agent's loop from its expertise. Domain experts contribute via PR; the loop almost never changes; the library evolves weekly.

skills-as-md agent-loop

Strix ●●●

LLM-based deduplication that reasons about root cause

Two pentest reports describing the same SQL injection with different payloads aren't textually similar — but they should dedupe. Hashing fails; LLM reasoning works.

memory-compression multi-agent-coordination

Strix ●●●

Streaming early stop on </function>

A free 10-20% cost reduction per agent step. Compounds across hundreds of steps in a session.

streaming-tools tool-calling-formats agent-loop

Strix ●●●

Inter-agent messaging via module-level dicts

For one-process multi-agent coordination, plain Python dicts are the right answer. No Redis, no broker, no race conditions you need locks for.

multi-agent-coordination mailbox-patterns

Strix ●●●

Authorized scope injected into system prompt at render time

A pentest agent that can be talked out of scope is dangerous. Putting scope in the locked system prompt — not the message log — defeats prompt injection.

guardrails multi-agent-coordination

Concepts touched

Agent loop The skeleton every agent shares — read state, ask the model, parse, act, repeat — and how the wiring choices shape every other system around it.
Memory compression Long sessions overflow the context window. The good implementations don't summarize — they enumerate what to keep.
Multi-agent coordination When one agent isn't enough — three questions to answer for any review pipeline, planner-executor flow, or critic loop.
Skills as markdown Move agent expertise out of code and into versionable markdown. Domain experts contribute via PR; the loop almost never changes.
Sandboxing Limit what the tool layer can do regardless of what the agent intends. Docker, firewall, process limits, or all three.
Guardrails Layered defenses — prompt, schema, controller, sandbox — each catching a different class of failure. The story you tell auditors.
Streaming tool calls Don't wait for the full response. Parse tool calls as they stream and dispatch the moment you have enough — sometimes earlier.
Tool calling formats How the agent and the model agree on 'I want to run this function.' Get this wrong and you're locked into one provider, can't stream cleanly, or pay for trailing hallucination.

Other projects on these concepts

Omnigent shares 8 concepts
Claude Code shares 7 concepts
OpenHands (v0) shares 6 concepts
OpenHands (v1) shares 6 concepts
Hermes Agent shares 4 concepts
OpenClaw shares 4 concepts