Microsoft's MAI-Transcribe-1 runs 2.5x faster than its predecessor at $0.36 per audio hour

Microsoft has introduced MAI-Transcribe-1, a speech-to-text model supporting 25 languages that achieves the lowest word error rate of any model tested on the FLEURS benchmark, beating Scribe v2, Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite. The model is also built to handle tough recording conditions like background noise, poor audio quality, and overlapping speech, Microsoft says. Microsoft is rolling out MAI-Transcribe-1 across Copilot Voice and Microsoft Teams. Developers can try it as a public preview through Microsoft Foundry and the Microsoft AI Playground. The model runs 2.5 times faster than Microsoft's previous Azure Fast offering and costs $0.36 per audio hour.
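To put the quoted price in perspective, here is a minimal Python sketch that estimates the bill for a given amount of audio. It assumes simple linear pricing at $0.36 per audio hour with no minimum charge, free tier, or per-request rounding, none of which the announcement confirms; the function name and the rounding to whole cents are illustrative choices, not part of any Microsoft API.

```python
from decimal import Decimal

# Price quoted in the article; everything else here is an assumption.
PRICE_PER_AUDIO_HOUR_USD = Decimal("0.36")


def estimate_transcription_cost(audio_seconds: int) -> Decimal:
    """Estimate the USD cost of transcribing `audio_seconds` of audio.

    Assumes strictly linear per-hour pricing (hypothetical billing model)
    and rounds the result to whole cents.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * PRICE_PER_AUDIO_HOUR_USD).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Example: 100 hours of recorded meetings.
    print(estimate_transcription_cost(100 * 3600))
```

Under these assumptions, 100 hours of audio would come to $36.00; actual billing may differ.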

Combined with MAI-Voice-1 and a language model, it can also power voice agents, Microsoft says. Cohere and Mistral recently released open-source alternatives that perform at a similar level.
