Tour — Memory compression deep dive
Long sessions break agents: a model tends to lose focus somewhere around 60–80 turns, even with a 200K-token context window. This tour walks through how production systems push that ceiling without blowing their token budget.
Pacing
| Block | Time |
|---|---|
| Concept · memory-compression | 15 min |
| Insight · preservation rules are the strategy | 5 min |
| OpenHands v1 condenser drill-down | 15 min |
| Strix memory_compressor drill-down | 10 min |
| Concept · prompt-caching | 10 min |
| Insight · latched sticky flags | 5 min |
Output
Three concrete artifacts you should produce while reading:
- A strategy choice for your agent (sliding window / LLM-summarize / event-source / hybrid) defended with one sentence on the use case.
- A preservation list for your domain — five bullets, what the summarizer must keep verbatim. If you can’t write it confidently, you don’t yet have enough domain clarity to ship.
- A compaction-trigger plan that doesn’t trash your cache hit rate. Make it concrete: “compact at turns 50, 100, 150” or “compact when input exceeds 80K tokens.”
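The two trigger styles quoted in the last bullet can be written down as a small policy object. This is a minimal sketch, not taken from any of the projects in this tour: `CompactionTrigger`, `turn_interval`, and `max_input_tokens` are hypothetical names, and the thresholds are just the example numbers from the bullet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompactionTrigger:
    """Hypothetical trigger policy; set one field or both."""

    turn_interval: Optional[int] = None     # e.g. 50 -> compact at turns 50, 100, 150
    max_input_tokens: Optional[int] = None  # e.g. 80_000 -> compact above 80K input tokens

    def should_compact(self, turn: int, input_tokens: int) -> bool:
        # Scheduled trigger: fires only on exact multiples of the interval.
        if self.turn_interval is not None and turn > 0 and turn % self.turn_interval == 0:
            return True
        # Size trigger: fires once the prompt exceeds the token threshold.
        if self.max_input_tokens is not None and input_tokens > self.max_input_tokens:
            return True
        return False


by_turn = CompactionTrigger(turn_interval=50)
by_size = CompactionTrigger(max_input_tokens=80_000)
```

Either way, compaction happens at a few predictable points rather than on every turn, so the prompt prefix stays stable between them.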
Common mistakes you’ll avoid
- Generic “summarize this conversation” prompts.
- Compacting every turn (each compaction rewrites the prompt prefix, so the cache never hits).
- Forgetting to preserve task IDs / file paths / errors.
- Treating the summary as opaque instead of structured.
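One way to avoid the first, third, and fourth mistakes together is to have the summarizer fill a structured record and then check that the must-keep items survived. The sketch below is illustrative only: `StructuredSummary`, `build_summarizer_prompt`, and `missing_verbatim` are hypothetical names, and the field set is an assumption to be replaced by your own preservation list.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class StructuredSummary:
    """Hypothetical summary shape: free-text narrative plus verbatim fields."""

    narrative: str
    task_ids: List[str] = field(default_factory=list)
    file_paths: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


def build_summarizer_prompt(preserve: Iterable[str]) -> str:
    """Enumerate what must be kept instead of asking for a generic summary."""
    rules = "\n".join(f"- {item}" for item in preserve)
    return (
        "Summarize the conversation for the agent that will continue it.\n"
        "Copy the following verbatim into the matching fields; do not paraphrase:\n"
        f"{rules}\n"
        'Return JSON with keys "narrative", "task_ids", "file_paths", "errors".'
    )


def missing_verbatim(summary: StructuredSummary, must_keep: List[str]) -> List[str]:
    """Return the must-keep strings that did not survive summarization."""
    kept = set(summary.task_ids) | set(summary.file_paths) | set(summary.errors)
    return [item for item in must_keep if item not in kept]
```

If `missing_verbatim` returns anything, the compaction can be rejected or retried before the original turns are discarded.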
For your specific projects
For Swisscheese, the preservation list is reviewer verdicts, diff hashes, file paths, and error text. Reviewer prose is the noise; structured verdicts are the signal.
For AI Act compliance, treat the audit log and the operating memory as separate concerns. Compress the operating memory; never compress the audit log.
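The separation between audit log and operating memory can be sketched as two stores fed by the same events, where only one is ever compacted. This is a minimal illustration, not a compliance recipe: `AgentMemory`, `record`, and `compact` are hypothetical names, and the `summarize` callable stands in for whatever summarizer you use.

```python
from typing import Callable, List, Tuple


class AgentMemory:
    """Hypothetical two-store memory: append-only audit log, compressible working set."""

    def __init__(self) -> None:
        self._audit_log: List[str] = []  # never rewritten or compressed
        self.operating: List[str] = []   # what the model sees; may be compacted

    def record(self, event: str) -> None:
        # Every event goes to both stores.
        self._audit_log.append(event)
        self.operating.append(event)

    def compact(self, summarize: Callable[[List[str]], str], keep_last: int = 2) -> None:
        # Replace everything except the last `keep_last` events with one summary.
        cut = len(self.operating) - keep_last
        if cut <= 0:
            return
        head, tail = self.operating[:cut], self.operating[cut:]
        self.operating = [summarize(head)] + tail

    @property
    def audit_log(self) -> Tuple[str, ...]:
        # Read-only view; compaction never touches it.
        return tuple(self._audit_log)
```

Compaction only ever reads and rewrites `operating`, so the full event history remains available in `audit_log` regardless of how aggressively the working set is compressed.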
Itinerary
- Concept · Memory compression
  The four strategies. Read first; the algorithm is the easy part.
- Insight · Memory compression preserves credentials, payloads, task IDs explicitly
  The non-obvious lesson: preservation rules matter more than algorithm choice. Generic 'summarize this' produces useless mush.
- Project · OpenHands (v1)
  The most rigorous condenser: strategies are pluggable, and the summarizer prompt enumerates what to preserve.
- Project · Strix
  Domain-specific preservation in the security-agent context. Read the compressor file.
- Concept · Prompt caching
  Compression invalidates cache. The two need to be understood together: get the interaction wrong and your bill explodes.
- Insight · Latched sticky flags for cache coherence
  How to keep cache hot when the prompt would otherwise change.
- Project · Hermes Agent
  Hybrid strategy: first-and-last verbatim, middle summarized. A pragmatic middle.
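The hybrid shape described in the last item (first and last turns verbatim, middle summarized) fits in a few lines. This is a generic sketch of that idea under stated assumptions, not Hermes Agent's actual code: `hybrid_compact`, `keep_first`, and `keep_last` are hypothetical names, and `summarize` is a placeholder for an LLM call.

```python
from typing import Callable, List


def hybrid_compact(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    keep_first: int = 2,
    keep_last: int = 4,
) -> List[str]:
    """Keep the first and last messages verbatim; replace the middle with one summary."""
    # Nothing to summarize if the head and tail already cover the history.
    if len(messages) <= keep_first + keep_last:
        return list(messages)
    split = len(messages) - keep_last
    head = messages[:keep_first]        # e.g. system prompt and task statement
    middle = messages[keep_first:split]  # older turns, safe to condense
    tail = messages[split:]              # most recent turns, kept as-is
    return head + [summarize(middle)] + tail
```

The head usually carries the task definition and the tail carries the current working state, which is why both stay verbatim while the middle is condensed.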