GitHub Repository

OpenBMB/VoxCPM

June 1, 202623,4782,715

Summary

Tokenizer-free speech model for multilingual text-to-speech, voice design, and cloning. It skips text tokens entirely and works directly on audio representations. If you build voice products, this shows where foundation TTS models are heading.

Topics

audio multilingual generative-models

View Original

Related Content

stable-diffusion-webui

stable-diffusion-webui by AUTOMATIC1111 is the de facto standard local web interface for Stable Diffusion, providing a massive feature set—txt2img, img2img, inpainting/outpainting, upscaling, LoRA/embeddings support, training utilities, and a huge extension ecosystem—on top of consumer GPUs. If you’re doing any kind of image generation or fine-tuning with Stable Diffusion in a local or lab environment, this is usually the first tool people reach for and the one most community workflows target. ([github.com](https://github.com/AUTOMATIC1111/stable-diffusion-webui?utm_source=openai))

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

T-pro 2.0 is an open-weight Russian large language model focused on hybrid reasoning: it can answer directly or emit explicit reasoning traces, and it’s optimized for low-latency inference via speculative decoding. Alongside the model, the authors release a Russian instruction corpus, a math benchmark, and an EAGLE-based inference stack, making it a practical foundation for Russian-language reasoning applications.

huggingface/transformers

The standard library for state-of-the-art models in text, vision, audio, and combined formats. If you build with open models, you almost certainly depend on this already.

opendatalab/MinerU

Pipeline that converts messy PDFs and Office docs into clean markdown or JSON tuned for LLM and agent workflows. It's quickly becoming a standard pre-processing tool. Plug it in if you're serious about document-heavy RAG. ([github.com](https://github.com/trending?since=daily))