Generation

Research papers, repositories, and articles about generation

Showing 3 of 3 items

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Presents LongVie 2, a world-model-style generator for ultra-long videos with explicit control signals. The model can condition on multimodal inputs and maintain temporal coherence over very long horizons, with a public project page for demos. This sits right at the frontier of ‘video world models’ that might eventually underpin simulation-heavy planning and agent training.

Jianxiong Gao, Zhaoxi Chen

Towards Scalable Pre-training of Visual Tokenizers for Generation

Studies how to pre-train visual tokenizers at scale specifically for generative models, rather than piggybacking on CLIP-like encoders. The paper explores architectures and training setups that produce discrete visual tokens that are more generation-friendly, with released models on GitHub. Visual tokenization is increasingly the bottleneck for efficient, high-fidelity image and video generation, so a focused treatment here is quite timely.

Jingfeng Yao, Yuda Song

DragMesh: Interactive 3D Generation Made Easy

DragMesh offers a real-time framework for interactively generating articulated 3D motion by decoupling kinematics from motion generation, using a dual-quaternion VAE and FiLM conditioning. For 3D/graphics folks, it’s a signal that interactive, physically plausible articulation is becoming practical, not just offline. ([huggingface.co](https://huggingface.co/papers/2512.06424))

Tianshan Zhang, Zeyu Zhang