
LLM-based deduplication that reasons about root cause

Two pentest reports describing the same SQL injection with different payloads aren't textually similar — but they should dedupe. Hashing fails; LLM reasoning works.

Strix · difficulty 2/3 · security · dedupe · novel · memory-compression · multi-agent-coordination

Strix runs many sub-agents that each test for vulnerabilities. They report findings independently. Two findings might describe the same underlying bug with different payloads, line numbers, and prose — a textual hash would never match them.

Instead, Strix asks an LLM: “Are these two findings the same root cause? Here’s both. Reason about it.” Same-root-cause findings collapse into one report.

def is_same_root_cause(a: Finding, b: Finding) -> bool:
    # Ask the LLM to compare the two findings' root causes;
    # the prompt instructs it to lead with a YES/NO verdict.
    response = llm.complete(
        DEDUPE_PROMPT.format(a=a.full_text(), b=b.full_text())
    )
    return response.strip().upper().startswith("YES")

The prompt is essentially: “two findings, both might describe the same vulnerability, decide if the root cause is identical.”
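The source paraphrases the prompt rather than quoting it, so the exact wording below is an assumption; a minimal sketch of what such a `DEDUPE_PROMPT` could look like:

```python
# Hypothetical reconstruction -- the real Strix prompt is only paraphrased
# in the source, not quoted.
DEDUPE_PROMPT = (
    "Below are two security findings. They may describe the same underlying\n"
    "vulnerability using different payloads, line numbers, and prose.\n\n"
    "Finding A:\n{a}\n\n"
    "Finding B:\n{b}\n\n"
    "Is the root cause identical? Answer YES or NO on the first line,\n"
    "then briefly justify your answer."
)

prompt = DEDUPE_PROMPT.format(
    a="SQLi in /login via username field",
    b="SQLi in /login via password field",
)
```

Leading with a fixed-format YES/NO verdict is what lets the calling code branch on `response.startswith("YES")` without parsing free-form prose.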

Why this is non-obvious

Most deduplication is fast and cheap (hash, embedding cosine). LLM-based dedupe is slow and expensive — orders of magnitude more cost per pair. You’d never use this for tweets or log lines.

But for high-stakes, low-volume domains (security findings, customer support tickets, legal contracts), the false-merge cost dwarfs the LLM-call cost. Spending tokens on dedupe is correct.
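The volume argument is just combinatorics: naive pairwise dedupe costs one LLM call per pair, which is quadratic in the number of items. A quick back-of-envelope check (illustrative volumes, not measured figures):

```python
def pairwise_calls(n: int) -> int:
    # Number of LLM calls if every item is compared against every other.
    return n * (n - 1) // 2

# ~50 findings from one pentest run: trivially affordable.
print(pairwise_calls(50))         # 1225 pairs

# A million log lines: hundreds of billions of calls -- never do this.
print(pairwise_calls(1_000_000))  # 499999500000 pairs
```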

Pattern beyond Strix

This generalizes anywhere two reports might be the same despite surface differences:

  • Customer complaints (different language, same root cause).
  • Bug reports (different stack traces, same broken function).
  • Search results in citation-heavy domains (different sources, same underlying claim).

When NOT to use it

  • Volume is high (millions of items): too expensive.
  • Surface similarity is a strong signal (textual matches): hash first, LLM only on near-misses.
  • Latency-sensitive flow: do it offline as a batch job.

Hybrid pattern

A practical optimization: cheap-first. Embedding cosine to find candidates above some threshold; LLM only for the candidate pairs. Most pairs prune cheaply; expensive reasoning only where it matters.
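A sketch of that cheap-first pipeline, using a toy bag-of-words cosine in place of a real embedding model and an injected `judge` callback standing in for the LLM call (all names here are illustrative, not Strix's API):

```python
from itertools import combinations

def bow(text: str) -> dict:
    # Toy bag-of-words vector; stands in for a real embedding model.
    counts: dict = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(u: dict, v: dict) -> float:
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = sum(c * c for c in u.values()) ** 0.5
    nv = sum(c * c for c in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def dedupe_pairs(findings, embed, judge, threshold=0.5):
    # Cheap-first: cosine prunes most pairs; `judge` (the expensive
    # LLM root-cause check) runs only on pairs above the threshold.
    vecs = [embed(f) for f in findings]
    confirmed = []
    for i, j in combinations(range(len(findings)), 2):
        if cosine(vecs[i], vecs[j]) < threshold:
            continue  # cheap prune: too dissimilar to bother the LLM
        if judge(findings[i], findings[j]):  # expensive reasoning step
            confirmed.append((i, j))
    return confirmed
```

In a real system `bow` would be replaced by an embedding model and `judge` by something like the `is_same_root_cause` call above; the structure (prune quadratically many pairs cheaply, reason over the survivors) is the point.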

Sources

  • strix/05_skills_and_prompts.md:20 (unverified)