Engineers building long-running agents who need state to outlive a context window.

Tour — Memory compression deep dive

Long sessions break agents. The model loses focus around 60–80 turns even with a 200K context. This tour walks through how production systems push that ceiling without bankrupting their token budget.

Pacing

Block	Time
Concept · memory-compression	15 min
Insight · preservation rules are the strategy	5 min
OpenHands v1 condenser drill-down	15 min
Strix memory_compressor drill-down	10 min
Concept · prompt-caching	10 min
Insight · latched sticky flags	5 min

Output

Three concrete artifacts you should produce while reading:

A strategy choice for your agent (sliding window / LLM-summarize / event-source / hybrid) defended with one sentence on the use case.
A preservation list for your domain — five bullets, what the summarizer must keep verbatim. If you can’t write it confidently, you don’t yet have enough domain clarity to ship.
A compaction-trigger plan that doesn’t trash your cache hit rate. Concrete: “compact at turn 50, 100, 150” or “compact when input exceeds 80K tokens.”

Common mistakes you’ll avoid

Generic “summarize this conversation” prompts.
Compaction every turn (cache death).
Forgetting to preserve task IDs / file paths / errors.
Treating the summary as opaque instead of structured.

For your specific projects

For Swisscheese, the preservation list is reviewer verdicts, diff hashes, file paths, and error text. Reviewer prose is the noise; structured verdicts are the signal.

For AI Act compliance, treat the audit log and the operating memory as separate concerns. Compress the operating memory; never compress the audit log.

Itinerary

Memory compression concept

The four strategies. Read first; the algorithm is the easy part.
Memory compression preserves credentials, payloads, task IDs explicitly insight

The non-obvious lesson: preservation rules matter more than algorithm choice. Generic 'summarize this' produces useless mush.
OpenHands (v1) project

The most rigorous condenser — strategies are pluggable, the summarizer prompt enumerates what to preserve.
Strix project

Domain-specific preservation in the security-agent context. Read the compressor file.
Prompt caching concept

Compression invalidates cache. The interaction is essential to understand together — get one wrong and your bill explodes.
Latched sticky flags for cache coherence insight

How to keep cache hot when the prompt would otherwise change.
Hermes Agent project

Hybrid strategy: first-and-last verbatim, middle summarized. A pragmatic middle.

Memory compression strategies — a deep dive

Tour — Memory compression deep dive

Pacing

Output

Common mistakes you’ll avoid

For your specific projects

Itinerary

Memory compression concept

Memory compression preserves credentials, payloads, task IDs explicitly insight

OpenHands (v1) project

Strix project

Prompt caching concept

Latched sticky flags for cache coherence insight

Hermes Agent project