World Models

Research papers, repositories, and articles about world models


LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Presents LongVie 2, a world-model-style generator for ultra-long videos with explicit control signals. The model can condition on multimodal inputs and maintain temporal coherence over very long horizons; a public project page hosts demos. This sits right at the frontier of ‘video world models’ that might eventually underpin simulation-heavy planning and agent training.

Jianxiong Gao, Zhaoxi Chen

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

Presents DrivePI, a 4D (3D + time) multimodal large language model (MLLM) for autonomous driving that unifies understanding, perception, prediction, and planning. Instead of separate stacks, DrivePI treats driving as a holistic spatial-temporal understanding problem, ingesting sensor data and outputting both scene interpretations and future trajectories. It’s another sign that end-to-end or semi-end-to-end ‘driving MLLMs’ are becoming a serious research direction.

Zhe Liu, Runhui Huang