
Strix

Open-source 'AI hacker' for autonomous pentesting. XML tool format, markdown-as-skills, LLM-based dedupe, module-level agent graph.

Category
security-agent
Language
Python
Runtime
Python 3.11+
Providers
LiteLLM (any)
Agent loop
while-loop
Tool format
xml-function-blocks
Memory
llm-summarize
Caching
cache_control-explicit
Sandbox
docker
Pedagogical
★★★★★
Flags
multi-agent · thinking · md-skills · streaming
Concepts
agent-loop · tool-calling-formats · memory-compression · multi-agent-coordination · guardrails · skills-as-md · streaming-tools · sandboxing
Verified
cloned
Upstream
github.com/usestrix/strix

What it is

Open-source autonomous pentest agent. The most architecturally bold project in the corpus: many of its choices look unconventional at first and turn out to be the right tradeoff for a long-running, multi-agent, security-sensitive workload.

What’s worth studying

Strix gets four things right that few other agents do:

  1. XML tool format with single-call-per-turn. The system prompt forbids more than one <function> block per turn, which lets the parser abort the stream the moment it sees </function> and stop paying for trailing hallucination. A free 10–20% saving on output per step. See streaming-early-stop.
  2. Markdown-first skills directory. ~30 attack methodologies live as .md files in strix/skills/. New methodology = markdown PR. The agent’s loop almost never changes; the brain evolves weekly. See markdown-as-skills.
  3. Authorized scope injected at render time. The agent’s allowed targets come from the platform DB and are rendered into the system prompt by Jinja, never taken from user chat. Nothing said in the conversation can widen the target list, so scope-changing prompt injection is defeated by construction. See scope-injected-at-render.
  4. LLM-based dedupe of findings. Two sub-agents reporting “found SQL injection at /search” and “input validation missing in search.py” are the same bug. Hashing fails; LLM reasoning works. See llm-dedupe-root-cause.
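The early-stop idea in item 1 can be sketched as follows. This is a minimal illustration, not Strix's actual parser: the function name `read_single_call`, the `fake_stream` generator, and the exact tag shape inside the block are all hypothetical. Only the closing `</function>` tag comes from the description above.

```python
from typing import Iterable, Iterator

END_TAG = "</function>"


def read_single_call(chunks: Iterable[str]) -> str:
    """Accumulate streamed text and stop at the first closing tag.

    Searching the whole buffer (not just the newest chunk) handles a
    closing tag that is split across two chunks.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        end = buffer.find(END_TAG)
        if end != -1:
            # Drop anything after the tag and stop pulling from the stream.
            return buffer[: end + len(END_TAG)]
    # Stream ended without a complete block; return what arrived.
    return buffer


def fake_stream(consumed: list[int]) -> Iterator[str]:
    """Hypothetical stand-in for a provider stream; records what was pulled."""
    pieces = [
        "<function=scan>",          # assumed tag shape, for illustration only
        "<parameter=target>10.0.",
        "0.5</parameter>",
        "</func",                   # closing tag split across chunks
        "tion>",
        " Next I will also",        # trailing text we never want to pay for
        " try another tool...",
    ]
    for index, piece in enumerate(pieces):
        consumed.append(index)
        yield piece


consumed: list[int] = []
call = read_single_call(fake_stream(consumed))
print(call)
print(f"chunks pulled: {len(consumed)} of 7")
```

In this sketch the last two chunks are never requested from the generator. With a real provider the saving depends on the client closing the underlying streaming response once the loop exits, so that generation is actually cancelled rather than merely ignored.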

A fifth choice that’s quieter but elegant: inter-agent messaging via plain Python module-level dicts. No Redis, no broker. One assumption (“one scan per process”), checked at boot, and you’re done. See module-level-mailbox.

Drill-down

The full per-doc analysis lives below: the original numbered analyses, rendered as styled HTML. Pick a section to study in more depth.

Insights from this project

Non-obvious tricks pulled out as standalone study cards.