Audio
Research papers, repositories, and articles about audio
ggml-org/whisper.cpp
A fast C/C++ port of OpenAI’s Whisper that runs on laptops, phones, and edge devices. It’s the go-to option when you need offline speech transcription.
resemble-ai/chatterbox
Chatterbox is a state-of-the-art open-source text-to-speech stack. If you need production-quality voices without a SaaS bill, start here.
HeartMuLa: A Family of Open Sourced Music Foundation Models
HeartMuLa bundles an audio–text matcher, a robust lyric recognizer, a music codec, and a music-generating LLM. You get controllable, prompt-driven song generation plus tools for indexing and understanding songs at scale.
OpenBMB/VoxCPM
A tokenizer-free speech model for multilingual text-to-speech, voice design, and cloning. Rather than converting speech into discrete tokens, it models audio directly in a continuous representation space. If you build voice products, this shows where foundation TTS models are heading.
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
Seedance 1.5 pro jointly generates video and sound from one model rather than bolting audio on later. Content teams can use this to explore tightly synced audio-visual experiences.
chidiwilliams/buzz
A desktop app that wraps Whisper for local transcription and translation. It turns powerful speech models into a one-click tool for creators and analysts.