TH
โ† Back
news 2026-04-07 ยท github-trending

๐ŸŽ™๏ธ VibeVoice โ€” Microsoft Open-Sources Voice AI That Handles 1-Hour Audio

๐ŸŽ™๏ธ VibeVoice โ€” Microsoft Open-Sources Voice AI That Handles 1-Hour Audio

What if AI could listen to your entire hour-long meeting, identify every speaker, and transcribe everything โ€” for free?

Microsoft just dropped VibeVoice, a family of open-source frontier voice AI models under the MIT license that handles both speech-to-text and text-to-speech at a level that rivals proprietary solutions.


๐ŸŽฏ What it does:


The secret sauce? Ultra-low frame rate speech tokenizers running at 7.5 Hz combined with a next-token diffusion framework. Translation: it understands context like an LLM but produces audio quality like a diffusion model.


Imagine recording a team meeting and getting a perfectly formatted transcript โ€” who said what, when, with key points highlighted. Or creating a multi-voice AI podcast that sounds genuinely human.

All models are available on Hugging Face. MIT licensed. No strings attached.

๐Ÿ“„ Source

github-trending
Share: Facebook ๐•
โ† Previous
๐Ÿ•ต๏ธ Microsoft Catches Poisoned AI Models โ€” They Wo
Next โ†’
๐Ÿ” MinerU2.5-Pro Hits 95.69% on OmniDocBench โ€” A N