Embeddings
Research papers, repositories, and articles about embeddings
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
VQRAE introduces a unified visual tokenizer that can simultaneously support high-level multimodal understanding and discrete-token image generation. Building on a pretrained vision encoder and a high-dimensional semantic VQ codebook, it yields continuous semantic features for reasoning and discrete tokens for reconstruction, showing that quantizing semantic encoders with large codebooks can preserve both meaning and detail.
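A minimal PyTorch sketch of the general idea of quantizing semantic encoder features against a learned codebook; the class name, codebook size, and feature dimension here are illustrative assumptions, not VQRAE's actual configuration:

```python
import torch
import torch.nn as nn

class SemanticVectorQuantizer(nn.Module):
    """Toy vector quantizer over semantic features (illustrative sketch;
    the codebook size and dimension are made-up, not VQRAE's settings)."""

    def __init__(self, codebook_size=16384, dim=1024):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)
        nn.init.normal_(self.codebook.weight, std=0.02)

    def forward(self, features):
        # features: (batch, num_patches, dim) continuous semantic features
        flat = features.reshape(-1, features.shape[-1])
        # Assign each feature to its nearest codebook entry by L2 distance.
        dists = torch.cdist(flat, self.codebook.weight)           # (N, K)
        indices = dists.argmin(dim=-1)                            # discrete tokens
        quantized = self.codebook(indices).reshape(features.shape)
        # Straight-through estimator: gradients flow to the encoder as if
        # quantization were the identity.
        quantized_st = features + (quantized - features).detach()
        return quantized_st, indices.reshape(features.shape[:-1])
```

In this framing, the continuous features can feed a multimodal model for understanding, while the discrete indices serve as tokens for generation and reconstruction.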
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
Omni-Attribute is an open-vocabulary attribute encoder that learns to isolate specific visual factors—like style, lighting, or expression—rather than entangling everything into a single holistic embedding. Using curated positive/negative pairs and a dual generative/contrastive objective, it produces attribute-specific embeddings that are better for retrieval, personalization, and compositional image generation.
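A generic InfoNCE-style contrastive term over attribute-specific embeddings, sketched in PyTorch under the assumption that anchor/positive pairs share the target attribute while negatives differ in it; the function name and temperature are illustrative, and the paper pairs its contrastive term with a generative objective not shown here:

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(anchor, positive, negatives, temperature=0.07):
    """Contrastive loss over attribute-specific embeddings (generic sketch,
    not Omni-Attribute's exact objective).

    anchor:    (batch, dim)          embedding of the reference image
    positive:  (batch, dim)          embedding sharing the target attribute
    negatives: (batch, num_neg, dim) embeddings that differ in that attribute
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)   # (batch, 1)
    neg_sim = torch.einsum("bd,bnd->bn", anchor, negatives)   # (batch, num_neg)

    # The positive sits in column 0, so the target label is 0 for every row.
    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature
    labels = torch.zeros(anchor.shape[0], dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```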
RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers
RO‑ViT proposes a region-aware pretraining scheme for vision transformers that uses cropped positional embeddings and focal loss to better align image–text pretraining with region-level object detection. Developers building open‑vocabulary detectors can reuse these ideas—plus the released code—to boost novel‑class detection without changing model capacity, especially when fine‑tuning ViT backbones on detection datasets. ([ai.googleblog.com](https://ai.googleblog.com/2023/08/ro-vit-region-aware-pre-training-for.html))
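A rough PyTorch sketch of the cropped-positional-embedding idea: randomly crop the full-image positional grid and resize it to the current patch grid, so whole-image pretraining better resembles the region crops seen by the detector later. The helper name, crop-scale range, and tensor layout are assumptions for illustration, not the released code:

```python
import torch
import torch.nn.functional as F

def cropped_positional_embedding(pos_embed, out_size, crop_scale=(0.1, 1.0)):
    """Crop-and-resize positional embeddings (illustrative sketch).

    pos_embed: (1, H, W, dim) full-image positional embedding grid
    out_size:  (h, w) patch grid of the current training image
    """
    _, H, W, dim = pos_embed.shape
    # Sample a random square crop of the full positional grid ...
    scale = torch.empty(1).uniform_(*crop_scale).item()
    ch, cw = max(1, int(H * scale)), max(1, int(W * scale))
    top = torch.randint(0, H - ch + 1, (1,)).item()
    left = torch.randint(0, W - cw + 1, (1,)).item()
    crop = pos_embed[:, top:top + ch, left:left + cw, :]
    # ... then resize it back to the patch grid, so the whole image is treated
    # as if it were a region crop of a larger scene.
    crop = crop.permute(0, 3, 1, 2)                       # (1, dim, ch, cw)
    crop = F.interpolate(crop, size=out_size, mode="bilinear",
                         align_corners=False)
    return crop.permute(0, 2, 3, 1)                       # (1, h, w, dim)
```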