Memory

Research papers, repositories, and articles about memory

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

ML-Master 2.0 introduces a "hierarchical cognitive cache" that separates short-term logs from long-term strategy for AI agents that work for days on ML engineering tasks. It reports state-of-the-art results on MLE-Bench, hinting at how week-long research agents could be run.

Xinyu Zhu, Yuzhu Cai

Memory in the Age of AI Agents

A substantial survey that systematizes the fast-growing literature on ‘agent memory’: how agentic LLM systems store, retrieve, and evolve information over time. It proposes a taxonomy across forms (token, parametric, latent), functions (factual, experiential, working), and dynamics, and catalogs existing benchmarks and frameworks. If you’re building agent systems with nontrivial memory, this is quickly becoming the reference map of the territory.

Yuyang Hu, Shichun Liu

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

EvoArena builds a three-domain benchmark where agents must keep working as terminals, codebases, and user preferences change over time. The companion EvoMem memory system logs non-additive updates as patches, giving measurable gains on both step-level and chain-level success in evolving tasks.

Jundong Xu, Qingchuan Li

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

MemSkill turns memory operations into skills that an agent can learn, select, and even redesign over time. It beats hand-written memory pipelines on long conversations, documents, and embodied tasks like ALFWorld.

Haozhen Zhang, Quanyu Long

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

This work studies how agents should turn past trajectories into lasting skills without collapsing or forgetting. It compares different ways to chunk experience, inject it back into the model, and train on it off-policy. If you’re building agents that learn over weeks or months, the design choices here matter more than raw model size.

Jingwen Chen, Wenkai Yang

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

EvoDS is a data science agent that learns new tools and manages its own memory over time. It treats "what skills to learn" and "what to remember" as two separate learning problems. If you’re turning analytics workflows into long-lived agents, this is a concrete blueprint.

Zherui Yang, Fan Liu

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

HGMem turns the “scratchpad” of a multi-step retrieval system into a hypergraph that connects many related facts at once. This richer memory structure helps language models keep global context straight over long tasks, boosting performance on challenging reasoning and long-document benchmarks.

Chulun Zhou, Chunkang Zhang

letta-ai/letta

Letta is a framework for long-lived agents with memory and tools. Use it to build assistants that actually remember projects over weeks, not just within a single prompt.

19,930

thedotmack/claude-mem

A Claude Code plugin that logs your coding sessions, compresses them with Claude via the agent SDK, and feeds back relevant context into future sessions. In practice it acts like a persistent, AI-managed memory of your projects, making the assistant far more ‘aware’ of the codebase and past conversations. It’s a concrete, production-friendly take on the “long-term memory for coding agents” idea.

7,300

Fast-weight Product Key Memory

Fast-weight Product Key Memory adds a dynamic, scratchpad-like store alongside the usual attention in language models. It aims to keep the efficiency of linear attention while recovering much of softmax attention’s ability to remember rare, long-range details.

Tianyu Zhao, Llion Jones

cpacker/MemGPT

MemGPT explores memory systems for language agents, mixing long-term and short-term storage. Steal ideas from here before reinventing your own memory manager.

19,600

Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration

The authors release LMEE-Bench to test how agents explore and remember in long-horizon 3D tasks. Their MemoryExplorer method trains a vision-language model with reinforcement learning to actively query and use episodic memory.

Sen Wang, Bangwei Liu

vectorize-io/hindsight

Hindsight provides a human-like memory layer for agents, inspired by cognitive science. Use it to move past naive "stuff everything in context" strategies.

619