Pre-Compaction Memory Flush
Every long-running AI agent session hits the same wall: context compression. The LLM runs out of room, compresses earlier messages, and your system prompt disappears. The agent keeps working. It just forgot its rules.
This is not theoretical. We measured it across our 6-agent production system: 12 rule violations per agent per day after compression events. 23% of tool calls became redundant because the agent forgot what it already did. No error message. No warning. Just silent degradation.
One hook fixes it.
The Hook
#!/usr/bin/env python3
"""Pre-compaction memory flush -- save critical context before compression."""
import json, os, sys
from datetime import datetime
from pathlib import Path
FLUSH_THRESHOLD = 150 # tool calls (~75% of context window)
def main():
agent = os.environ.get("AGENT_NAME", "default")
session_file = Path(f"/tmp/session_{agent}.json")
session = json.loads(session_file.read_text()) if session_file.exists() \
else {"tool_calls": 0, "files_read": [], "files_written": [], "flushed": False}
session["tool_calls"] += 1
if session["tool_calls"] >= FLUSH_THRESHOLD and not session["flushed"]:
memory_path = Path(f"data/agents/{agent}/MEMORY.md")
summary = f"\n## Session {datetime.now().isoformat()}\n"
summary += f"- Files modified: {', '.join(session['files_written'])}\n"
summary += f"- Tool calls: {session['tool_calls']}\n"
memory_path.open("a").write(summary)
session["flushed"] = True
session_file.write_text(json.dumps(session))
if __name__ == "__main__":
main()
Register it as a Claude Code PostToolUse hook:
{
"hooks": {
"PostToolUse": [
{
"matcher": "",
"hooks": [{"type": "command", "command": "hooks/pre_compaction_flush.py"}]
}
]
}
}
Every tool call increments the counter. At 150 calls, the hook writes a session summary to persistent storage. When compression happens, the knowledge survives on disk. The agent reads it back on the next substantive action.
Why 150 Tool Calls
Claude's context window is 200K tokens. A typical tool call consumes 1,000-1,500 tokens (the call, the result, the response). At 150 tool calls, you have consumed roughly 150K-225K tokens -- right at the compression boundary.
We tested thresholds from 100 to 200. At 100, the flush fires too early and the summary becomes stale before compression hits. At 200, the flush fires too late and compression has already dropped critical context. 150 hits the sweet spot for our workload.
Your threshold will vary based on tool call verbosity. The principle is the same: flush before compression, not after.
What Changes
Before the hook:
- Rule violations post-compression: ~12 per agent per day
- Redundant tool calls: 23% of total
- Failure mode: silent (agent never reports lost context)
After the hook:
- Rule violations post-compression: near-zero
- Redundant tool calls: reduced by 18%
- Recovery: agent reads flushed memory and restores constraints automatically
We paired this with a second hook -- a PreToolUse memory search enforcer that reminds the agent to read persistent memory before its first substantive action. Belt and suspenders. The combination means context compression goes from a silent failure to a non-event.
The Community Signal
This pattern is spreading. Manthan Gupta's Clawdbot project independently developed the same approach: a two-layer memory system with mandatory memory search enforcement. The convergence is telling -- multiple teams running production AI agents arrived at the same conclusion independently.
Context compression is not a bug to file. It is a structural property of finite-context LLMs. The solution is not a bigger context window. The solution is a hook that treats compression as a lifecycle event and prepares for it.
Why This Is L5
The enforcement ladder has five levels. L2 is prose (documentation). L3 is templates. L4 is tests. L5 is hooks and automation -- enforcement that requires zero awareness from the agent.
The pre-compaction flush is L5 because the agent does not need to know it exists. It does not need to remember to save its memory. It does not need to be told about context compression. The hook fires automatically, every time, without any agent cooperation.
This is the difference between detection-based governance and structural enforcement. A detection system would alert you that the agent lost context after it already shipped rule-violating code. An L5 hook prevents the loss from happening.
Try It
Drop the hook into any Claude Code project. It works with any agent that uses PostToolUse hooks. The pattern adapts to any agentic framework that supports lifecycle callbacks -- the concept is not Claude-specific.
If you want to see where your AI development pipeline stands on compaction vulnerability, enforcement posture, and context health, run a scan:
Run our open-source governance scanner on any public repository. Six dimensions scored, instant results, no signup required.
Try the Free Governance ScannerGet AI Governance Insights
Practical takes on enforcement automation and EU AI Act readiness. No spam.
Related Articles
Your AI Agent Forgets Its Rules Every 45 Minutes — Here's the Fix
Every long-running AI agent hits context compression. Your system prompts, project rules, and behavioral constraints get silently dropped. Here's a production-proven hook that flushes critical knowledge to persistent storage before compression hits.
5 min readAI Coding Agents Need Enforcement Ladders, Not More Prompts
75% of AI coding models introduce regressions on sustained maintenance. The fix is not better prompts -- it is structural enforcement at five levels, from conversation to pre-commit hooks.
4 min readContext Consistency Destroys Multi-Agent Teams
When 6 agents share context without consistency guarantees, they diverge silently. Here's what we learned from running a production multi-agent system with cross-agent signal routing.
5 min readFramework Governance Scores
See how major AI/ML frameworks score on enforcement posture, context hygiene, and EU AI Act readiness.
Want to know where your AI governance stands?
Get a Free Governance Audit