Memory
Research papers, repositories, and articles about memory
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
ML-Master 2.0 introduces a "hierarchical cognitive cache" that separates short-term logs from long-term strategy for AI agents working for days on ML engineering tasks. It hits state-of-the-art on MLE-Bench, hinting at how to run week-long research agents.
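To make the short-term versus long-term split concrete, here is a minimal sketch of a two-tier store. It is not ML-Master 2.0's implementation: the class `TwoTierCache`, its methods, and the example events are all hypothetical, and the real system's promotion logic is not described in this blurb.

```python
from collections import deque


class TwoTierCache:
    """Hypothetical two-tier memory: bounded recent logs plus durable lessons."""

    def __init__(self, max_logs=3):
        self.logs = deque(maxlen=max_logs)  # short-term: recent raw events
        self.strategy = []                  # long-term: distilled lessons

    def log(self, event):
        # When the deque is full, the oldest log is dropped automatically.
        self.logs.append(event)

    def promote(self, lesson):
        # Assumption: the caller decides what counts as a lesson worth keeping.
        if lesson not in self.strategy:
            self.strategy.append(lesson)

    def context(self):
        # What would be handed to the agent on its next step.
        return {"strategy": list(self.strategy), "recent": list(self.logs)}


cache = TwoTierCache(max_logs=2)
for event in ["run 1: lr=1e-2 diverged", "run 2: lr=1e-3 ok", "run 3: lr=3e-4 best"]:
    cache.log(event)
cache.promote("keep lr at or below 1e-3")
ctx = cache.context()
```

The point of the split is that raw logs can be discarded cheaply while the distilled lessons persist across the whole run.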
Memory in the Age of AI Agents
A substantial survey that systematizes the fast-growing literature on "agent memory": how agentic LLM systems store, retrieve, and evolve information over time. It proposes a taxonomy across forms (token, parametric, latent), functions (factual, experiential, working), and dynamics, and catalogs existing benchmarks and frameworks. If you’re building agent systems with nontrivial memory, this is quickly becoming the reference map of the territory.
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
EvoArena builds a three-domain benchmark where agents must keep working as terminals, codebases, and user preferences change over time. The companion EvoMem memory system logs non-additive updates as patches, giving measurable gains on both step-level and chain-level success in evolving tasks.
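One way to picture "non-additive updates as patches" is a key-value memory that records an explicit before/after entry whenever an existing fact is overwritten. This is only an illustrative sketch, not EvoMem's actual design: `Patch`, `PatchedMemory`, and the example keys are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    key: str
    old: object
    new: object


@dataclass
class PatchedMemory:
    """Hypothetical store: new facts are added, changed facts are logged as patches."""

    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def update(self, key, value):
        # Only an overwrite of an existing, different value counts as non-additive.
        if key in self.state and self.state[key] != value:
            self.log.append(Patch(key, self.state[key], value))
        self.state[key] = value

    def history(self, key):
        # Ordered (old, new) pairs showing how one fact changed over time.
        return [(p.old, p.new) for p in self.log if p.key == key]


mem = PatchedMemory()
mem.update("test_cmd", "pytest")        # additive: first time we see this key
mem.update("test_cmd", "pytest")        # unchanged: nothing logged
mem.update("test_cmd", "tox -e py311")  # non-additive: logged as a patch
```

Keeping the patch log separate from the current state lets an agent answer both "what is true now" and "what used to be true", which matters when the environment changes underneath it.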
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
MemSkill turns memory operations into skills that an agent can learn, select, and even redesign over time. It beats hand-written memory pipelines on long conversations, documents, and embodied tasks like ALFWorld.
Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
This work studies how agents should turn past trajectories into lasting skills without collapsing or forgetting. It compares different ways to chunk experience, inject it back into the model, and train on it off-policy. If you’re building agents that learn over weeks or months, the design choices here matter more than raw model size.
EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management
EvoDS is a data science agent that learns new tools and manages its own memory over time. It treats both "what skills to learn" and "what to remember" as separate learning problems. If you’re turning analytics workflows into long-lived agents, this is a concrete blueprint.
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
HGMem turns the “scratchpad” of a multi-step retrieval system into a hypergraph that connects many related facts at once. This richer memory structure helps language models keep global context straight over long tasks, boosting performance on challenging reasoning and long-document benchmarks.
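The difference between a hypergraph and an ordinary graph is that one edge can join any number of facts at once. The toy structure below illustrates that idea only; it is not HGMem's implementation, and `HypergraphMemory`, its methods, and the sample facts are hypothetical.

```python
from collections import defaultdict


class HypergraphMemory:
    """Toy hypergraph: each hyperedge links an arbitrary set of fact ids."""

    def __init__(self):
        self.edges = {}                   # edge name -> set of fact ids
        self.incident = defaultdict(set)  # fact id -> names of edges containing it

    def add_edge(self, name, facts):
        self.edges[name] = set(facts)
        for fact in facts:
            self.incident[fact].add(name)

    def related(self, fact):
        """Return every fact that shares at least one hyperedge with `fact`."""
        out = set()
        for name in self.incident.get(fact, ()):
            out |= self.edges[name]
        out.discard(fact)
        return out


mem = HypergraphMemory()
mem.add_edge("merger", ["acme", "globex", "2021"])  # one edge, three facts
mem.add_edge("lawsuit", ["globex", "initech"])
related_to_globex = sorted(mem.related("globex"))
```

A pairwise graph would need three separate edges to express the "merger" relation and would lose the fact that the three items belong to a single event.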
letta-ai/letta
Letta is a framework for long-lived agents with memory and tools. Use it to build assistants that remember a project across weeks of work, not just within a single prompt.
thedotmack/claude-mem
A Claude Code plugin that logs your coding sessions, compresses them with Claude via the agent SDK, and feeds relevant context back into future sessions. In practice it acts like a persistent, AI-managed memory of your projects, making the assistant far more aware of the codebase and past conversations. It’s a concrete, production-friendly take on the "long-term memory for coding agents" idea.
Fast-weight Product Key Memory
Fast-weight Product Key Memory adds a dynamic, scratchpad-like store alongside the usual attention in language models. It aims to keep the efficiency of linear attention while recovering much of softmax attention’s ability to remember rare, long-range details.
cpacker/MemGPT
MemGPT explores memory systems for language agents, mixing long-term and short-term storage. Steal ideas from here before reinventing your own memory manager.
Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration
The authors release LMEE-Bench to test how agents explore and remember in long-horizon 3D tasks. Their MemoryExplorer method trains a vision-language model with reinforcement learning to actively query and use episodic memory.
vectorize-io/hindsight
Hindsight provides a human-like memory layer for agents, inspired by cognitive science. Use it to move past naive "stuff everything in context" strategies.