
This talk was given at ViennaJS in May 2026 by Edouard Maleix.
Most teams manage agent context with static rule files — CLAUDE.md, AGENTS.md, custom YAML checklists, skills, etc. I do too. Then I tracked what actually happened: agents repeated the same mistakes across sessions, rules accumulated without evidence they helped, and review load stayed constant no matter how many rules I added.
This talk covers what I learned building and testing a different approach: treating context as a living artifact with a lifecycle — generated from real incidents, curated for gaps, compiled to token budgets, and evaluated before injection. The key insight: context you can't measure is context you can't improve.
I'll walk through three concrete problems and the patterns I found to address them:
1. Attribution: when an agent opens a PR, you can't tell what it wrote, why, or whether a human reviewed it. Giving agents their own signing keys and git identity changes code review from guessing to auditing (first sketch after this list).
2. Memory that persists: Monday you correct an agent; Tuesday it makes the same mistake. I built a typed diary (episodic, procedural, semantic, and reflection entries) that survives across sessions and compiles into token-budget context packs (second sketch below). The difference from static rules: rules authored from memory can't compound; entries harvested from incidents can.
3. Evaluating context, not just code: SWE-bench tests whether agents can fix bugs. I needed something different: does this context pack actually help the agent avoid known mistakes? I'll show why scenarios harvested from real incidents (20–67% baseline → 95–100% with context) wildly outperform auto-generated ones (third sketch below).
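For the attribution pattern, here is a minimal sketch of what a dedicated agent identity could look like from a Node.js script, assuming git 2.34+ for SSH commit signing; the name, email, and key path are hypothetical placeholders, not the talk's actual setup.

```typescript
import { execSync } from "node:child_process";

// Hypothetical identity values; substitute your agent's name, email, and key path.
const AGENT_NAME = "claude-agent";
const AGENT_EMAIL = "claude-agent@example.com";
const AGENT_KEY = "/home/agent/.ssh/agent_signing_key.pub";

// Run a git config command in the current repository.
const git = (args: string) => execSync(`git ${args}`, { stdio: "inherit" });

// Give the agent its own author identity, separate from any human's.
git(`config user.name "${AGENT_NAME}"`);
git(`config user.email "${AGENT_EMAIL}"`);

// Sign every commit with an SSH key only the agent holds (git >= 2.34).
git(`config gpg.format ssh`);
git(`config user.signingkey "${AGENT_KEY}"`);
git(`config commit.gpgsign true`);
```

With this in place, `git log --show-signature` distinguishes agent-authored commits from human ones, which is what makes the audit possible.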
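For the diary, a sketch of what the typed entries and the budgeted compilation step might look like; the field names, weighting scheme, and the rough 4-characters-per-token estimate are my assumptions, not the talk's actual schema.

```typescript
// Hypothetical shapes for diary entries harvested from real incidents.
type EntryKind = "episodic" | "procedural" | "semantic" | "reflection";

interface DiaryEntry {
  kind: EntryKind;
  text: string;           // the lesson itself
  sourceIncident: string; // link back to the session/PR where it happened
  weight: number;         // how often this lesson has proved useful
}

// Crude token estimate: roughly 4 characters per token for English text.
const estimateTokens = (s: string) => Math.ceil(s.length / 4);

// Compile a context pack: highest-value entries first, stop at the budget.
function compilePack(diary: DiaryEntry[], tokenBudget: number): string {
  const picked: string[] = [];
  let used = 0;
  for (const entry of [...diary].sort((a, b) => b.weight - a.weight)) {
    const cost = estimateTokens(entry.text);
    if (used + cost > tokenBudget) continue; // skip entries that don't fit
    picked.push(`[${entry.kind}] ${entry.text}`);
    used += cost;
  }
  return picked.join("\n");
}
```

Greedy selection by usefulness is the simplest way to respect a hard token budget; a real compiler could also deduplicate, merge, or summarize entries before packing.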
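And for the evaluation step, a sketch of the harness shape, assuming the percentages above are scenario pass rates; `Scenario`, `runAgent`, and the failure check are stand-ins for whatever agent runner and assertions you actually use.

```typescript
// Hypothetical harness shapes; adapt to your agent runner.
interface Scenario {
  prompt: string;                      // reconstructed from a real incident
  failed: (output: string) => boolean; // did the known mistake recur?
}

type RunAgent = (prompt: string, contextPack?: string) => Promise<string>;

// Run every scenario twice: bare, and with the compiled context pack injected.
async function evaluatePack(
  scenarios: Scenario[],
  pack: string,
  runAgent: RunAgent,
) {
  let baseline = 0;
  let withPack = 0;
  for (const s of scenarios) {
    if (!s.failed(await runAgent(s.prompt))) baseline++;
    if (!s.failed(await runAgent(s.prompt, pack))) withPack++;
  }
  return {
    baseline: baseline / scenarios.length, // pass rate without context
    withPack: withPack / scenarios.length, // pass rate with context
  };
}
```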
The talk is grounded in real workflows, real failures, and real data. The patterns — identity, lifecycle-based memory, context-as-testable-artifact — are applicable regardless of which agent or framework you use.