ElevenLabs Eleven v3 (Alpha) is a highly expressive, performance-driven text-to-speech (TTS) model designed for advanced voice acting, emotional nuance, and directorial control. It is an experimental, alpha-stage model that excels at extreme human-like expressiveness and responsiveness to direction, but comes with notable trade-offs in stability and consistency. It is best suited for creative, character-driven, short, or performance-heavy use cases where multiple generations and iteration are acceptable.
Key Features
- Advanced Expressiveness & Directing: Extremely responsive to direction, producing highly human-like performances with emotional depth and variation.
- Audio Tags via Brackets: Supports free-text audio tags written in brackets (e.g. [whisper], [angry], [laughs]) to guide delivery and performance.
- Multilingual Capabilities: Supports many languages, enabling expressive delivery beyond English.
- Main Weakness (Consistency & Stability): Because this is an alpha model, results can vary significantly between generations and often require multiple attempts to achieve the desired output, which makes it hard to use for long-form content.
Technical Capabilities
- Modalities: Text to Speech
- Custom Voice Cloning: Not supported
- Supported Settings: Speed control (0.5-1.5), Voice Effects
- Emotions available: Emotional delivery controlled via a “Stability” slider, with values 0-100. 0 = Very emotional and unpredictable. 100 = Very stable, book-reading delivery.
- Voice Tags Options:
  - Add a pause: for a one-second pause, insert “<break time="1s" />” as part of the prompt.
  - Voice tags: insert any free text in brackets [], such as “[Laughter]”, as part of the prompt to direct the voice.
- Accents Available: Only the voice actor’s native accent
- Languages Available: English, French, German, Portuguese, Spanish, Afrikaans, Arabic, Armenian, Assamese, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Catalan, Cebuano, Chichewa, Croatian, Czech, Danish, Dutch, Estonian, Filipino, Finnish, Galician, Georgian, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Mandarin Chinese, Marathi, Nepali, Norwegian, Pashto, Persian, Polish, Punjabi, Romanian, Russian, Serbian, Sindhi, Slovak, Slovenian, Somali, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh
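The settings and tag syntax listed above can be checked and assembled before a prompt is submitted. The following is a minimal Python sketch, not an official client: the `V3Settings` class, its field names and default values, and the `pause`, `voice_tag`, and `build_prompt` helpers are hypothetical names introduced for illustration. Only the ranges (speed 0.5-1.5, Stability 0-100), the `<break time="1s" />` pause markup, and the bracketed voice tags come from the list above.

```python
from dataclasses import dataclass


@dataclass
class V3Settings:
    """Hypothetical container for the settings listed above (not an official API)."""
    speed: float = 1.0    # documented range: 0.5-1.5 (default is a placeholder)
    stability: int = 50   # documented slider range: 0-100 (default is a placeholder)

    def validate(self) -> None:
        """Raise ValueError if a setting falls outside its documented range."""
        if not 0.5 <= self.speed <= 1.5:
            raise ValueError(f"speed must be within 0.5-1.5, got {self.speed}")
        if not 0 <= self.stability <= 100:
            raise ValueError(f"stability must be within 0-100, got {self.stability}")


def pause(seconds: int = 1) -> str:
    """Return the pause markup shown above, e.g. <break time="1s" />."""
    return f'<break time="{seconds}s" />'


def voice_tag(direction: str) -> str:
    """Wrap free-text direction in brackets, e.g. [Laughter]."""
    return f"[{direction}]"


def build_prompt(*parts: str) -> str:
    """Join tags, pauses, and script text into one prompt string."""
    return " ".join(p.strip() for p in parts if p.strip())


settings = V3Settings(speed=1.1, stability=30)
settings.validate()
prompt = build_prompt(
    voice_tag("Laughter"), "You really thought that would work?",
    pause(1), voice_tag("whispers"), "Try again.",
)
print(prompt)
```

Validating locally keeps out-of-range values from reaching the generation step, where a failed attempt costs more time than a raised `ValueError`.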
Best Use Cases
Character & Voice Acting Content
Ideal for scripted characters, branded commercials, dialogue, storytelling, and creative narration that demands emotional performance.
Performance-Heavy Short Form
Best suited for short scripts where expressive delivery is more important than repeatability.
Creative & Experimental Workflows
Strong choice for exploratory creative work where multiple generations and manual selection are expected.
Strengths and Limitations
Strengths
- High Emotional Range: Capable of nuanced, dramatic, and human-like performances.
- Directorial Control: Free-text audio tags enable fine-grained performance guidance.
- Expressive Multilingual Output: Emotional delivery extends across many supported languages.
Limitations
- Low Consistency: Results can vary widely between generations.
- Alpha Stability Issues: May introduce unexpected phrasing, delivery changes, artifacts, or shifts in audio profile.
- Iteration Required: Often requires several generations to achieve the desired result.
- Not Production-Safe by Default: Less suitable for high-volume or long-form workflows, and, as an alpha release, subject to change.
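The iterate-and-select workflow that these limitations imply can be sketched as a small loop. This is an illustrative sketch only: `collect_takes` and the `generate` callback are hypothetical names, and the callback stands in for whatever tool or client actually produces the audio.

```python
from typing import Callable, List


def collect_takes(
    generate: Callable[[str], bytes], prompt: str, attempts: int = 3
) -> List[bytes]:
    """Run the same prompt several times and keep every result for manual review.

    `generate` is a hypothetical callback that returns audio bytes for a prompt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return [generate(prompt) for _ in range(attempts)]


def fake_generate(prompt: str) -> bytes:
    """Stand-in generator so the sketch runs without any network access."""
    return prompt.encode("utf-8")


takes = collect_takes(fake_generate, "[excited] We did it!", attempts=4)
print(len(takes))
```

Keeping every take, rather than stopping at the first one, matches the expectation that the best result is chosen by ear afterwards.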
Tips for Better Prompts
- Use Audio Tags Intentionally: Insert bracketed tags directly before the line they should affect.
- Write Like a Script: Stage directions, tone hints, and emotional cues improve results.
- Expect Iteration: Plan for multiple generations and manual selection.
- Keep Takes Short: Shorter scripts reduce instability and increase performance quality.
- Adjust Stability Carefully: Lower stability unlocks expressiveness but increases variability.
- Use Punctuation for Rhythm: Commas, periods, exclamation marks, and parentheses can be used to guide more natural, expressive pacing.
  - For example, for a more dramatic effect, the phrase:
    - “Listen, if we walk away today, me, you, all of us, we may never get another chance.”
  - Can be written:
    - “Listen… If we walk away today? me… you… all of us: we may never! get another chance...”
- Tips for using audio tags:
- Situational Awareness – Tags such as [WHISPER], [SHOUTING], and [SIGH] let Eleven v3 react to the moment—raising stakes, softening warnings, or pausing for suspense.
- Character Performance – From [pirate voice] to [French accent], tags turn narration into role-play. Shift persona mid-line and direct full-on character performances without changing models.
- Emotional Context – Cues like [sigh], [excited], or [tired] steer feelings moment by moment, layering tension, relief, or humor—no re-recording needed.
- Narrative Intelligence – Storytelling is timing. Tags such as [pause], [awe], or [dramatic tone] control rhythm and emphasis so AI voices guide the listener through each beat.
- Multi-Character Dialogue – Write overlapping lines and quick banter with [interrupting], [overlapping], or tone switches. One model, many voices—natural conversation in a single take.
- Delivery Control – Fine-tune pacing and emphasis. Tags like [pause], [rushed], or [drawn out] give precision over tempo, turning plain text into performance.
- Accent Emulation – Switch regions on the fly—[American accent], [British accent], [Southern US accent] and more—for culturally rich speech without model swaps.
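Two of the tips above, placing a tag directly before the line it should affect and keeping takes short, can be sketched in a few lines of Python. The helper names `direct` and `split_into_takes`, and the `max_chars` default, are assumptions made for illustration. The article does not specify a length limit for a take.

```python
import re
from typing import List, Tuple


def direct(lines: List[Tuple[str, str]]) -> str:
    """Place each bracketed tag directly before the line it should affect.

    Each item is (tag, text); an empty tag leaves the line undirected.
    """
    return "\n".join(f"[{tag}] {text}" if tag else text for tag, text in lines)


def split_into_takes(script: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into takes of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept intact as its own take.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    takes: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            takes.append(current)   # close the current take and start a new one
            current = sentence
        else:
            current = candidate
    if current:
        takes.append(current)
    return takes


scene = direct([
    ("whispers", "Stay close."),
    ("", "The door creaked open."),
    ("shouting", "Run!"),
])
print(scene)

takes = split_into_takes("Stay close. The door creaked open. Run!", max_chars=30)
print(takes)
```

Splitting on sentence boundaries, rather than at a fixed character count, avoids cutting a line mid-phrase, which would change how the delivery lands.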