
Pre-Compaction Memory Flush


Every long-running AI agent session hits the same wall: context compression. The LLM runs out of room, compresses earlier messages, and your system prompt disappears. The agent keeps working. It just forgot its rules.

This is not theoretical. We measured it across our 6-agent production system: 12 rule violations per agent per day after compression events. 23% of tool calls became redundant because the agent forgot what it already did. No error message. No warning. Just silent degradation.

One hook fixes it.

The Hook

#!/usr/bin/env python3
"""Pre-compaction memory flush -- save critical context before compression."""
import json, os, sys
from datetime import datetime
from pathlib import Path

FLUSH_THRESHOLD = 150  # tool calls (~75% of context window)

def main():
    agent = os.environ.get("AGENT_NAME", "default")
    session_file = Path(f"/tmp/session_{agent}.json")

    session = json.loads(session_file.read_text()) if session_file.exists() \
        else {"tool_calls": 0, "files_read": [], "files_written": [], "flushed": False}

    session["tool_calls"] += 1

    # PostToolUse hooks receive the tool call as JSON on stdin; use it to
    # track which files the agent has touched this session.
    try:
        event = json.load(sys.stdin)
        tool = event.get("tool_name", "")
        path = event.get("tool_input", {}).get("file_path")
        if path:
            bucket = "files_written" if tool in ("Write", "Edit") else "files_read"
            if path not in session[bucket]:
                session[bucket].append(path)
    except ValueError:
        pass  # no JSON payload on stdin; keep counting anyway

    if session["tool_calls"] >= FLUSH_THRESHOLD and not session["flushed"]:
        memory_path = Path(f"data/agents/{agent}/MEMORY.md")
        memory_path.parent.mkdir(parents=True, exist_ok=True)  # dir may not exist yet
        summary = f"\n## Session {datetime.now().isoformat()}\n"
        summary += f"- Files modified: {', '.join(session['files_written'])}\n"
        summary += f"- Tool calls: {session['tool_calls']}\n"
        with memory_path.open("a") as f:  # context manager closes the handle
            f.write(summary)
        session["flushed"] = True

    session_file.write_text(json.dumps(session))

if __name__ == "__main__":
    main()

Register it as a Claude Code PostToolUse hook in your project's .claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "",
        "hooks": [{"type": "command", "command": "hooks/pre_compaction_flush.py"}]
      }
    ]
  }
}

Every tool call increments the counter. At 150 calls, the hook writes a session summary to persistent storage. When compression happens, the knowledge survives on disk. The agent reads it back on the next substantive action.

Why 150 Tool Calls

Claude's context window is 200K tokens. A typical tool call consumes 1,000-1,500 tokens (the call, the result, the response). At 150 tool calls, you have consumed roughly 150K-225K tokens -- right at the compression boundary.

We tested thresholds from 100 to 200. At 100, the flush fires too early and the summary becomes stale before compression hits. At 200, the flush fires too late and compression has already dropped critical context. 150 hits the sweet spot for our workload.

Your threshold will vary based on tool call verbosity. The principle is the same: flush before compression, not after.
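The arithmetic above can be turned into a starting-point calculator. The defaults below are the article's numbers (200K window, ~1,000 tokens per call, flush at 75%); substitute your own measurements:

```python
def flush_threshold(context_window: int = 200_000,
                    tokens_per_call: int = 1_000,
                    fire_at: float = 0.75) -> int:
    """Pick a flush point just below the compression boundary."""
    return int(context_window * fire_at / tokens_per_call)

print(flush_threshold())                       # 150 -- the default used in the hook
print(flush_threshold(tokens_per_call=1_500))  # 100 -- verbose tool calls flush earlier
```

Verbose tool output (large file reads, long test logs) pushes `tokens_per_call` up and the threshold down, which matches the guidance above: flush early rather than late.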

What Changes

Before the hook:

  • Rule violations post-compression: ~12 per agent per day
  • Redundant tool calls: 23% of total
  • Failure mode: silent (agent never reports lost context)

After the hook:

  • Rule violations post-compression: near-zero
  • Redundant tool calls: reduced by 18%
  • Recovery: agent reads flushed memory and restores constraints automatically

We paired this with a second hook -- a PreToolUse memory search enforcer that reminds the agent to read persistent memory before its first substantive action. Belt and suspenders. The combination means context compression goes from a silent failure to a non-event.
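The second hook is not shown in this post, but a minimal sketch looks like the following. It reuses the same session file as the flush hook and assumes your hook runner treats exit status 2 as "block the tool call and surface stderr to the agent" (check your framework's hook contract before relying on that):

```python
#!/usr/bin/env python3
"""PreToolUse memory-search enforcer (sketch) -- nudge the agent to re-read memory."""
import json
import os
import sys
from pathlib import Path

def needs_memory_read(session: dict, memory_path: str) -> bool:
    """True if persistent memory exists but hasn't been read this session."""
    return Path(memory_path).exists() and memory_path not in session.get("files_read", [])

def main():
    agent = os.environ.get("AGENT_NAME", "default")
    session_file = Path(f"/tmp/session_{agent}.json")
    session = json.loads(session_file.read_text()) if session_file.exists() else {}

    memory = f"data/agents/{agent}/MEMORY.md"
    if needs_memory_read(session, memory):
        # Assumed contract: exit 2 blocks the call and shows stderr to the agent.
        print(f"Read {memory} first -- it holds constraints from earlier sessions.",
              file=sys.stderr)
        sys.exit(2)

if __name__ == "__main__":
    main()
```

The enforcer only fires while the memory file is unread, so it costs nothing once the agent complies.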

The Community Signal

This pattern is spreading. Manthan Gupta's Clawdbot project independently developed the same approach: a two-layer memory system with mandatory memory search enforcement. The convergence is telling -- multiple teams running production AI agents arrived at the same conclusion independently.

Context compression is not a bug to file. It is a structural property of finite-context LLMs. The solution is not a bigger context window. The solution is a hook that treats compression as a lifecycle event and prepares for it.

Why This Is L5

The enforcement ladder has five levels. L2 is prose (documentation). L3 is templates. L4 is tests. L5 is hooks and automation -- enforcement that requires zero awareness from the agent.

The pre-compaction flush is L5 because the agent does not need to know it exists. It does not need to remember to save its memory. It does not need to be told about context compression. The hook fires automatically, every time, without any agent cooperation.

This is the difference between detection-based governance and structural enforcement. A detection system would alert you that the agent lost context after it already shipped rule-violating code. An L5 hook prevents the loss from happening.

Try It

Drop the hook into any Claude Code project. It works with any agent that uses PostToolUse hooks. The pattern adapts to any agentic framework that supports lifecycle callbacks -- the concept is not Claude-specific.
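In a framework without named hooks, the same lifecycle idea reduces to a wrapper around whatever function dispatches tool calls. The decorator below is a hypothetical sketch of that adaptation, not any framework's API:

```python
from functools import wraps

def with_preflush(flush, threshold=150):
    """Fire `flush` once, just before the call count crosses the compression boundary."""
    state = {"calls": 0, "flushed": False}

    def decorator(dispatch):
        @wraps(dispatch)
        def wrapper(*args, **kwargs):
            state["calls"] += 1
            if state["calls"] >= threshold and not state["flushed"]:
                flush(state["calls"])  # e.g. append a session summary to MEMORY.md
                state["flushed"] = True
            return dispatch(*args, **kwargs)
        return wrapper
    return decorator

# Usage: wrap your framework's tool dispatcher.
flushed_at = []

@with_preflush(flushed_at.append, threshold=3)
def run_tool(name):
    return f"ran {name}"

for _ in range(5):
    run_tool("grep")

print(flushed_at)  # [3] -- the flush fired exactly once, at the third call
```

Because the counter lives in the closure rather than the agent's context, it survives compression by construction, which is the whole point.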

If you want to see where your AI development pipeline stands on compaction vulnerability, enforcement posture, and context health, run the free repo scan on a public repository.
