Embeddings
Research papers, repositories, and articles about embeddings
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
VQRAE introduces a unified visual tokenizer that can simultaneously support high-level multimodal understanding and discrete-token image generation. Building on a pretrained vision encoder and a high-dimensional semantic VQ codebook, it yields continuous semantic features for reasoning and discrete tokens for reconstruction, showing that quantizing semantic encoders with large codebooks can preserve both meaning and detail.
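A minimal PyTorch sketch of the general idea of quantizing semantic encoder features against a learned codebook; the class name, codebook size, and feature dimension here are illustrative assumptions, not VQRAE's actual configuration:

```python
import torch
import torch.nn as nn

class SemanticVectorQuantizer(nn.Module):
    """Toy vector quantizer over semantic features (illustrative sketch;
    the codebook size and dimension are made-up, not VQRAE's settings)."""

    def __init__(self, codebook_size=16384, dim=1024):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)
        nn.init.normal_(self.codebook.weight, std=0.02)

    def forward(self, features):
        # features: (batch, num_patches, dim) continuous semantic features
        flat = features.reshape(-1, features.shape[-1])
        # Assign each feature to its nearest codebook entry by L2 distance.
        dists = torch.cdist(flat, self.codebook.weight)           # (N, K)
        indices = dists.argmin(dim=-1)                            # discrete tokens
        quantized = self.codebook(indices).reshape(features.shape)
        # Straight-through estimator: gradients flow to the encoder as if
        # quantization were the identity.
        quantized_st = features + (quantized - features).detach()
        return quantized_st, indices.reshape(features.shape[:-1])
```

In this framing, the continuous features can feed a multimodal model for understanding, while the discrete indices serve as tokens for generation and reconstruction.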
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
Omni-Attribute is an open-vocabulary attribute encoder that learns to isolate specific visual factors—like style, lighting, or expression—rather than entangling everything into a single holistic embedding. Using curated positive/negative pairs and a dual generative/contrastive objective, it produces attribute-specific embeddings that are better for retrieval, personalization, and compositional image generation.
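A generic InfoNCE-style contrastive term over attribute-specific embeddings, sketched in PyTorch under the assumption that anchor/positive pairs share the target attribute while negatives differ in it; the function name and temperature are illustrative, and the paper pairs its contrastive term with a generative objective not shown here:

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(anchor, positive, negatives, temperature=0.07):
    """Contrastive loss over attribute-specific embeddings (generic sketch,
    not Omni-Attribute's exact objective).

    anchor:    (batch, dim)          embedding of the reference image
    positive:  (batch, dim)          embedding sharing the target attribute
    negatives: (batch, num_neg, dim) embeddings that differ in that attribute
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)   # (batch, 1)
    neg_sim = torch.einsum("bd,bnd->bn", anchor, negatives)   # (batch, num_neg)

    # The positive sits in column 0, so the target label is 0 for every row.
    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature
    labels = torch.zeros(anchor.shape[0], dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```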
RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers
RO‑ViT proposes a region-aware pretraining scheme for vision transformers that uses cropped positional embeddings and focal loss to better align image–text pretraining with region-level object detection. Developers building open‑vocabulary detectors can reuse these ideas—plus the released code—to boost novel‑class detection without changing model capacity, especially when fine‑tuning ViT backbones on detection datasets. ([ai.googleblog.com](https://ai.googleblog.com/2023/08/ro-vit-region-aware-pre-training-for.html))
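A rough PyTorch sketch of the cropped-positional-embedding idea: randomly crop the full-image positional grid and resize it to the current patch grid, so whole-image pretraining better resembles the region crops seen by the detector later. The helper name, crop-scale range, and tensor layout are assumptions for illustration, not the released code:

```python
import torch
import torch.nn.functional as F

def cropped_positional_embedding(pos_embed, out_size, crop_scale=(0.1, 1.0)):
    """Crop-and-resize positional embeddings (illustrative sketch).

    pos_embed: (1, H, W, dim) full-image positional embedding grid
    out_size:  (h, w) patch grid of the current training image
    """
    _, H, W, dim = pos_embed.shape
    # Sample a random square crop of the full positional grid ...
    scale = torch.empty(1).uniform_(*crop_scale).item()
    ch, cw = max(1, int(H * scale)), max(1, int(W * scale))
    top = torch.randint(0, H - ch + 1, (1,)).item()
    left = torch.randint(0, W - cw + 1, (1,)).item()
    crop = pos_embed[:, top:top + ch, left:left + cw, :]
    # ... then resize it back to the patch grid, so the whole image is treated
    # as if it were a region crop of a larger scene.
    crop = crop.permute(0, 3, 1, 2)                       # (1, dim, ch, cw)
    crop = F.interpolate(crop, size=out_size, mode="bilinear",
                         align_corners=False)
    return crop.permute(0, 2, 3, 1)                       # (1, h, w, dim)
```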