Google rolls out Gemini 3.1 Flash TTS with support for 70+ languages and developer audio tags

April 15, 2026

What's new

Google has rolled out Gemini 3.1 Flash TTS, a new text-to-speech model it says is its most expressive yet. It supports more than 70 languages and adds "audio tags" that give developers fine-grained control over how speech is rendered. Short pitch, big promise. Want your app to whisper, pause for effect, or hammer home a punchline? These tags are designed to let you do that programmatically.

According to reports, the audio tags let developers adjust prosody, emphasis, pacing and other vocal attributes, so they can sculpt delivery without retraining a model. The upgrade also reportedly expands voice diversity and realism, making narration, voice assistants, and accessibility tools sound more natural and human-like. Google frames the release as an enabling move for creators and product teams that need flexible speech output at scale.

Why it matters

TTS is no longer just robotic announcements. It is now part of an arms race to produce convincing, controllable voices. Gemini 3.1 Flash TTS arrives as competitors push on expressiveness and multilingual reach, a trend that has unlocked new creative possibilities but also raised questions about misuse. Some observers reportedly urge caution over reputational and deepfake risks, while developers welcome the ability to iterate on voice UX more quickly.

In short: more languages, more control, more realism. For product teams building everything from screen readers to storytelling apps, that’s exciting. For everyone else? Listen closely. The voice revolution just got louder.

Source: the-decoder.com