Agent Memory Intelligence Benchmark. Existing benchmarks reward flashy retrieval metrics such as Recall@k that don't correlate with downstream task quality. Engram measures what actually matters: does the agent do its job better with this memory system than without it?
Engram is a three-tier evaluation framework weighted 20/40/40 across retrieval quality, knowledge management (temporal accuracy, contradiction resolution, long-horizon retention, staleness detection, context efficiency), and the actual agent task performance delta. Its adapter-based architecture lets any memory system participate by implementing four methods. It is positioned to fill gaps in existing benchmarks, including LongMemEval, LoCoMo, MemoryAgentBench, and MEMTRACK.
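As a rough illustration of the 20/40/40 weighting and the four-method adapter idea, here is a minimal Python sketch. Everything named in it is an assumption made for illustration: the tier keys, the mapping of weights to tiers in the order they are listed, the normalization of each tier score to [0, 1], and the four method names (`reset`, `ingest`, `query`, `stats`). None of these are taken from Engram's actual API.

```python
from typing import Protocol, runtime_checkable

# Assumed mapping of the 20/40/40 split onto the three tiers, in listed order.
TIER_WEIGHTS = {
    "retrieval": 0.20,
    "knowledge_management": 0.40,
    "task_delta": 0.40,
}


def composite_score(tier_scores: dict[str, float]) -> float:
    """Weighted sum of per-tier scores, each assumed normalized to [0, 1]."""
    missing = TIER_WEIGHTS.keys() - tier_scores.keys()
    if missing:
        raise ValueError(f"missing tier scores: {sorted(missing)}")
    return sum(w * tier_scores[tier] for tier, w in TIER_WEIGHTS.items())


@runtime_checkable
class MemoryAdapter(Protocol):
    """Hypothetical adapter contract; the four method names are illustrative."""

    def reset(self) -> None: ...
    def ingest(self, text: str, timestamp: float) -> None: ...
    def query(self, question: str, k: int) -> list[str]: ...
    def stats(self) -> dict[str, int]: ...


class ListMemory:
    """Toy in-memory system that satisfies the hypothetical contract."""

    def __init__(self) -> None:
        self._items: list[tuple[float, str]] = []

    def reset(self) -> None:
        self._items.clear()

    def ingest(self, text: str, timestamp: float) -> None:
        self._items.append((timestamp, text))

    def query(self, question: str, k: int) -> list[str]:
        # Naive recency ranking: return the k most recently ingested items.
        newest_first = sorted(self._items, key=lambda item: item[0], reverse=True)
        return [text for _, text in newest_first[:k]]

    def stats(self) -> dict[str, int]:
        return {"items": len(self._items)}
```

Under these assumptions, a system scoring 1.0 on retrieval and 0.5 on each of the other two tiers would land at a composite of 0.6, which shows how little a perfect retrieval score can compensate for weak knowledge management or task performance.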
Maintained deliberately alongside MemForge so that my own system and its competitors can be evaluated honestly and publicly.