Engram is an agent memory intelligence benchmark. Existing benchmarks reward flashy retrieval metrics such as Recall@k that don't correlate with downstream task quality. Engram measures what actually matters: does the agent perform its job better with this memory system than without it?

Engram is a three-tier evaluation framework weighted 20/40/40 across retrieval quality, knowledge management (temporal accuracy, contradiction resolution, long-horizon retention, staleness detection, context efficiency), and the delta in actual agent task performance. Its adapter-based architecture lets any memory system participate by implementing four methods. It is positioned to fill gaps in existing benchmarks, including LongMemEval, LoCoMo, MemoryAgentBench, and MEMTRACK.
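
As a rough illustration of the two ideas above, the sketch below pairs a four-method adapter with a 20/40/40 weighted score. The description does not name the four methods or say how tier scores are normalized, so everything here is an assumption for illustration: the method names (`store`, `retrieve`, `update`, `reset`), the tier keys, the `[0, 1]` score range, and the toy `DictMemory` class are hypothetical and are not Engram's actual API.

```python
from typing import Protocol


class MemoryAdapter(Protocol):
    """Hypothetical adapter with four methods; names are illustrative only."""

    def store(self, item_id: str, text: str) -> None: ...
    def retrieve(self, query: str, k: int) -> list[str]: ...
    def update(self, item_id: str, text: str) -> None: ...
    def reset(self) -> None: ...


# Tier weights taken from the 20/40/40 split in the description.
# The tier key names are assumptions.
WEIGHTS = {"retrieval": 0.20, "knowledge": 0.40, "task_delta": 0.40}


def composite_score(tier_scores: dict[str, float]) -> float:
    """Weighted sum of per-tier scores, each assumed normalized to [0, 1]."""
    missing = WEIGHTS.keys() - tier_scores.keys()
    if missing:
        raise ValueError(f"missing tier scores: {sorted(missing)}")
    return sum(WEIGHTS[tier] * tier_scores[tier] for tier in WEIGHTS)


class DictMemory:
    """Toy in-memory system that satisfies the hypothetical adapter."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def store(self, item_id: str, text: str) -> None:
        self._items[item_id] = text

    def retrieve(self, query: str, k: int) -> list[str]:
        # Rank stored texts by naive word overlap with the query.
        terms = set(query.lower().split())
        ranked = sorted(
            self._items.values(),
            key=lambda text: len(terms & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def update(self, item_id: str, text: str) -> None:
        self._items[item_id] = text

    def reset(self) -> None:
        self._items.clear()
```

Under these assumptions, a system that scores well on retrieval alone can reach at most 0.20 overall, which reflects the stated emphasis on knowledge management and task performance over raw retrieval.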

Maintained deliberately alongside MemForge so that my own system, and its competitors, can be evaluated honestly and publicly.
