OmniMem: Scalable and Adaptive Memory Retrieval for Long Video Generation
Summary
OmniMem adds an external memory system to long-video generators so they can reuse past details instead of re-encoding the full history. It adapts which frames to keep, letting models generate longer, more coherent clips under a fixed compute budget. If you are working toward hour-scale video worlds, this is a template for managing context.
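To make the idea of a fixed-budget frame memory concrete, here is a minimal sketch. It is not OmniMem's actual method: the `FrameMemory` class, the cosine-similarity scoring, and the "evict the most redundant frame" rule are all assumptions chosen for illustration, and the embeddings are toy 2-D vectors.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class FrameMemory:
    """Hypothetical fixed-capacity memory of (frame_id, embedding) pairs."""
    capacity: int
    entries: list = field(default_factory=list)

    def __post_init__(self):
        if self.capacity < 1:
            raise ValueError("capacity must be at least 1")

    def add(self, frame_id, embedding):
        # Store the new frame, then evict one entry if over budget.
        self.entries.append((frame_id, embedding))
        if len(self.entries) > self.capacity:
            self._evict_most_redundant()

    def _evict_most_redundant(self):
        # Assumed policy: drop the frame whose nearest neighbour in
        # memory is most similar, since it adds the least new detail.
        def redundancy(i):
            return max(
                cosine(self.entries[i][1], self.entries[j][1])
                for j in range(len(self.entries))
                if j != i
            )

        worst = max(range(len(self.entries)), key=redundancy)
        self.entries.pop(worst)

    def retrieve(self, query, k=2):
        # Return the ids of the k stored frames most similar to the query.
        ranked = sorted(
            self.entries, key=lambda e: cosine(query, e[1]), reverse=True
        )
        return [frame_id for frame_id, _ in ranked[:k]]


memory = FrameMemory(capacity=2)
memory.add(0, [1.0, 0.0])
memory.add(1, [0.99, 0.1])   # nearly a duplicate of frame 0
memory.add(2, [0.0, 1.0])    # distinct content, so it survives eviction
top = memory.retrieve([0.0, 1.0], k=1)
```

The memory never grows past `capacity`, so retrieval cost stays bounded however long the generated video gets; a real system would score frames with learned features rather than this toy similarity.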
Related Content
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
The authors build SpatialClaw, a code-driven agent that uses a stateful Python kernel plus vision tools to solve 3D and 4D spatial puzzles. It beats prior spatial agents across 20 benchmarks and six vision-language backbones, showing that the action interface design can unlock much stronger spatial reasoning.
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
ENPIRE wraps real robots in a closed-loop system where coding agents iteratively reset scenes, run policies, check results, and improve the code. If you are serious about autonomous robot labs, this is a blueprint.
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
The authors build thousands of synthetic "computers" with realistic files and calendars to simulate month-long knowledge work for AI agents. Each run spans 8+ hours and ~2,000 steps, yielding dense signals for training long-horizon productivity agents. If you are designing office copilots or agent training curricula, this setup is a cheap way to generate rich experience data. ([arxiv.org](https://arxiv.org/abs/2604.28181))
Kling-Omni Technical Report
Kling-Omni is a unified system for generating and editing high-end video from text, images, and video context. Treat it as a reference design for next-gen multimodal world simulators.