Audio

Research papers, repositories, and articles about audio


ggml-org/whisper.cpp

A fast C/C++ port of OpenAI’s Whisper that runs on laptops, phones, and edge devices. It’s the go-to option when you need offline speech transcription.

46,559

resemble-ai/chatterbox

Chatterbox is a state-of-the-art open-source text-to-speech stack. If you need production-quality voices without a SaaS bill, start here.

16,307

HeartMuLa: A Family of Open Sourced Music Foundation Models

HeartMuLa bundles an audio–text matcher, robust lyric recognizer, music codec, and a music-generating LLM. You get controllable, prompt-driven song generation plus tools for indexing and understanding songs at scale.

Dongchao Yang, Yuxin Xie

OpenBMB/VoxCPM

Tokenizer-free speech model for multilingual text-to-speech, voice design, and cloning. Rather than converting speech into discrete tokens, it models audio in a continuous space. If you build voice products, this shows where foundation TTS models are heading.

23,478

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Seedance 1.5 pro jointly generates video and sound from one model rather than bolting audio on later. Content teams can use this to explore tightly synced audio-visual experiences.

Heyi Chen, Siyan Chen

chidiwilliams/buzz

A desktop app that wraps Whisper for local transcription and translation. It turns powerful speech models into a one-click tool for creators and analysts.

17,750