Image Generation
Research papers, repositories, and articles about image generation
StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space
StereoSpace is a diffusion-based monocular-to-stereo system that learns geometric consistency purely from viewpoint conditioning, without explicitly predicting depth or performing any warping. The authors also propose a strictly "geometry-free at test time" evaluation protocol and show that their method produces sharper parallax and more comfortable stereo viewing than existing depth- or warp-based pipelines.
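The key idea, synthesizing the target view from a viewpoint code alone, can be illustrated with a toy sampling loop. This is a minimal sketch, not the paper's actual architecture: `denoise` is a hypothetical network stub, and the linear update stands in for a real diffusion sampler.

```python
def sample_right_view(left_image, view_embed, denoise, steps=4):
    """Toy viewpoint-conditioned sampling: the target view is determined
    only by `view_embed` (e.g. an eye-offset code), never by an explicit
    depth map or pixel warp.  `denoise` is a hypothetical stub with the
    signature (latent, t, left_image, view_embed) -> predicted noise."""
    latent = [0.0] * len(left_image)  # stand-in for an initial noise latent
    for t in range(steps, 0, -1):
        noise = denoise(latent, t, left_image, view_embed)
        # simple linear update in place of a real sampler schedule
        latent = [x - n / steps for x, n in zip(latent, noise)]
    return latent

# Stub denoiser that just pulls the latent toward the left image,
# purely to exercise the loop; a real model would predict noise.
def stub_denoise(latent, t, left, view):
    return [-v for v in left]

right = sample_right_view([0.4, 0.8], view_embed=[1.0], denoise=stub_denoise)
```

The point of the sketch is structural: nothing in the loop ever touches a depth estimate or a warping operator, matching the "geometry-free at test time" protocol the summary describes.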
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
VQRAE introduces a unified visual tokenizer that can simultaneously support high-level multimodal understanding and discrete-token image generation. Building on a pretrained vision encoder and a high-dimensional semantic VQ codebook, it yields continuous semantic features for reasoning and discrete tokens for reconstruction, showing that quantizing semantic encoders with large codebooks can preserve both meaning and detail.
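The core operation behind any such tokenizer is nearest-neighbor lookup against the codebook: continuous features become discrete indices, and the matched entries serve as the reconstruction targets. A minimal sketch (the 2-D codebook and Euclidean metric are illustrative choices, not VQRAE's actual configuration):

```python
import math

def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest
    codebook entry (Euclidean distance) -- the basic lookup behind a
    VQ tokenizer.  Returns discrete token indices plus the matched
    entries, which approximate the original continuous features."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    indices = [min(range(len(codebook)), key=lambda i: dist(f, codebook[i]))
               for f in features]
    return indices, [codebook[i] for i in indices]

# Tiny illustrative codebook; VQRAE's is high-dimensional and semantic.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
tokens, recon = quantize([[0.1, 0.2], [0.9, 1.1]], codebook)
```

The summary's claim is that with a large enough semantic codebook, the discrete `tokens` remain usable for generation while the quantization error against the continuous features stays small enough to preserve both meaning and detail.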
DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance
DuetSVG proposes a unified multimodal model that generates both raster images and SVG code jointly, using the image stream to guide SVG token decoding. By letting the model "see" what it’s drawing during generation, it produces vector graphics that are more visually faithful, semantically correct, and syntactically clean than text-only SVG generators.
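The "see what it's drawing" loop can be sketched as render-in-the-loop decoding: after each SVG token, the partial document is rasterized and the raster is fed back as conditioning for the next token. Everything here is a hypothetical stand-in (`decode_step` for the model's decoder, `render` for a fast rasterizer); the sketch shows only the control flow.

```python
def generate_svg(prompt, decode_step, render, max_tokens=64):
    """Sketch of internal visual guidance: each decoding step sees the
    raster of the SVG emitted so far.  `decode_step(prompt, tokens,
    raster)` and `render(tokens)` are hypothetical stand-ins."""
    svg_tokens = []
    raster = render(svg_tokens)          # blank canvas to start
    for _ in range(max_tokens):
        token = decode_step(prompt, svg_tokens, raster)
        if token == "<eos>":
            break
        svg_tokens.append(token)
        raster = render(svg_tokens)      # model "sees" what it has drawn
    return " ".join(svg_tokens)

# Stubs purely to exercise the loop: a scripted token stream and a
# trivial "rasterizer" that just reports how many tokens were drawn.
script = iter(["<svg>", "<rect/>", "</svg>", "<eos>"])
def stub_decode(prompt, tokens, raster):
    return next(script)
def stub_render(tokens):
    return len(tokens)

svg_out = generate_svg("a box", stub_decode, stub_render)
```

The design choice the summary highlights lives in the feedback edge: because `raster` is recomputed every step, visual errors can influence the very next token rather than only being caught after the full SVG is emitted.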