Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Lays out a five-level roadmap for visual generation, from basic image mapping up to interactive world modeling for agents. Argues that the next race is about structure, memory, and causality, not prettier pictures. If you work on vision models, benchmark against these levels rather than relying on FID-style metrics alone. ([huggingface.co](https://huggingface.co/papers/2604.28185))
Keming Wu, Zuhao Yang