news 2026-04-15 · simon-willison

🎙️ Google's Gemini Flash TTS Lets You Direct AI Voices Like a Film Director

What if you could direct an AI voice the same way a director guides an actor on set?

"Speak softer here." "Speed up the pacing." "Add a London accent." "Smile through the words."

That is what Google's new Gemini 3.1 Flash TTS does, and it launched today.

Traditional text-to-speech tends to read everything in one flat tone. This model instead accepts natural language prompts that work like director's notes: you write an "audio profile" describing the character's personality, vocal style, pacing, and even regional accent, and the model performs accordingly.

🎯 What makes it special:

— Natural language control — no complex coding required

— Accent flexibility — tested with London, Newcastle, and Exeter variants

— Multi-speaker conversations — each character gets a distinct voice

— Emotional direction — "vocal smile," pacing changes, dynamic shifts

— Standard Gemini API access (model ID: gemini-3.1-flash-tts-preview)
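
For a feel of what "director's notes" could look like in practice, here is a minimal Python sketch. It only assembles a prompt and a request body; it does not call the API. Only the model ID comes from the announcement. The section labels in the prompt, the helper names `build_prompt` and `build_request`, the voice name, and the JSON shape (modeled on the Gemini API's generateContent request format for earlier TTS previews) are assumptions for illustration and may not match what this model expects.

```python
import json

# Model ID as quoted in the post.
MODEL_ID = "gemini-3.1-flash-tts-preview"


def build_prompt(profile: dict[str, str], notes: list[str], transcript: str) -> str:
    """Combine an audio profile, director's notes, and the script into one prompt.

    The section labels are a hypothetical convention, not an official format.
    """
    lines = ["# AUDIO PROFILE"]
    lines += [f"{key}: {value}" for key, value in profile.items()]
    lines += ["", "# DIRECTOR'S NOTES"]
    lines += [f"- {note}" for note in notes]
    lines += ["", "# TRANSCRIPT", transcript]
    return "\n".join(lines)


def build_request(prompt: str, voice_name: str) -> dict:
    """Build a generateContent-style JSON body.

    Assumption: the body shape matches earlier Gemini TTS previews.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }


if __name__ == "__main__":
    prompt = build_prompt(
        profile={"Character": "Warm late-night radio host", "Accent": "London"},
        notes=["Speak softer here.", "Smile through the words."],
        transcript="Good evening, and welcome back to the show.",
    )
    # "Kore" is a placeholder voice name; check the current voice list before use.
    body = build_request(prompt, voice_name="Kore")
    print(MODEL_ID)
    print(json.dumps(body, indent=2))
```

Keeping the profile, the notes, and the transcript as separate inputs makes it easy to swap an accent or a single direction without touching the script itself.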

Think of the possibilities: podcasts with multiple distinct characters, audiobooks that shift tone with the story, or video narration that actually sounds human.

The era of robotic AI voices is ending. The era of AI voices you can actually direct has just begun.

📄 Source

simon-willison
