Audio Speech (TTS)
Text-to-speech. Synchronous, returns audio bytes, billed per character. Several voice models, 3,000+ pre-made voices.
POST
Synchronous endpoint. Send text, get back audio bytes in one call. Pick a model based on quality needs vs latency budget.
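Because each call caps input at 5000 characters, longer text has to be split before sending. The following is a minimal client-side sketch with several assumptions. The JSON field names model, input and voice_id are inferred from the alias notes and the error table and are not confirmed. Bearer-token auth is also assumed. The endpoint URL is left as a parameter you supply. The sketch breaks text at spaces where it can and builds one urllib request per chunk without sending it.

```python
import json
import urllib.request
from typing import List

MAX_CHARS = 5000  # per-request input limit from the Request section


def chunk_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters, breaking at a space when possible."""
    chunks: List[str] = []
    while len(text) > limit:
        # Last space inside the window; fall back to a hard cut if there is none.
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip()  # drop the whitespace at the break
    if text:
        chunks.append(text)
    return chunks


def build_request(url: str, api_key: str, model: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) one synthesis request.

    Assumptions: the field names "model", "input", "voice_id" and Bearer auth.
    `url` is the endpoint URL, which you supply.
    """
    if len(text) > MAX_CHARS:
        raise ValueError("input exceeds 5000 characters; chunk it first")
    body = json.dumps({"model": model, "input": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

Each request can then be sent with urllib.request.urlopen(req), and resp.read() written to disk. Every chunk is a separate billed request, so the total charge is the sum over chunks.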
Request
application/json body.
Model, one of:
- eleven-v3: most expressive, with audio tags and emotional range; 70+ languages ($0.405 per 1K chars)
- eleven-multilingual-v2: hero quality, expressive; 29 languages ($0.405 per 1K chars)
- eleven-flash-v2-5: ~75 ms time-to-first-byte, for real-time agents ($0.20 per 1K chars)
- eleven-turbo-v2-5: balanced quality and speed ($0.20 per 1K chars)
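Since billing is per character, a request's cost is roughly characters × (price per 1K) / 1000. The following sketch uses the list prices above. It is only an estimate: the authoritative figure is X-Kyma-Cost-USD, and the page does not specify how characters are counted (for example, whether audio tags count), so it assumes every character is billed.

```python
from decimal import Decimal

# USD per 1,000 characters, copied from the model list above.
PRICE_PER_1K = {
    "eleven-v3": Decimal("0.405"),
    "eleven-multilingual-v2": Decimal("0.405"),
    "eleven-flash-v2-5": Decimal("0.20"),
    "eleven-turbo-v2-5": Decimal("0.20"),
}


def estimate_cost(model: str, text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text`, assuming every character is billed."""
    return PRICE_PER_1K[model] * len(text) / 1000
```

For example, a full 5,000-character request costs about $2.025 on eleven-v3 and $1.00 on eleven-flash-v2-5. Decimal is used so the per-character arithmetic does not pick up float rounding error.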
Text to synthesize. Max 5000 characters per request; chunk longer text client-side. Also accepts the alias text for OpenAI compatibility.
ElevenLabs voice ID: an opaque string from GET /v1/audio/voices. Also accepts the alias voice. There is no global default, so pick one explicitly.
Audio format: mp3_44100_128 (default), mp3_44100_192, mp3_22050_32, pcm_16000, pcm_22050, pcm_24000, pcm_44100, ulaw_8000.
Optional fine-tuning: stability (0–1), similarity_boost (0–1), style (0–1), use_speaker_boost (boolean).
Opt-in low-latency mode. When true, audio is delivered progressively as it is synthesized, cutting time-to-first-audio to ~0.4 s (vs ~1.8 s for the default full-buffer path). This is transparent to your code: the response is still a single, complete audio/mpeg stream you can pipe straight to disk or a player. Currently applies only to the MiniMax speech models; models that don't support progressive synthesis ignore it.
Response
200 OK with raw audio bytes. Billing details travel in the X-Kyma-* response headers, so the body stays a clean audio file you can pipe straight to disk or play.
| Header | Description |
|---|---|
| Content-Type | Matches the requested format (audio/mpeg for mp3, etc.) |
| X-Kyma-Model | Resolved model id |
| X-Kyma-Chars-Billed | Input character count used for pricing |
| X-Kyma-Cost-USD | Actual cost charged (USD) |
| X-Kyma-Balance-USD | Remaining balance (USD) |
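The following sketch reads the billing headers alongside the audio. It accepts any email.message.Message, which includes the http.client.HTTPMessage that urllib exposes as resp.headers, so lookups are case-insensitive. It assumes the USD headers carry plain decimal strings such as "0.0024"; the table does not specify the exact format.

```python
from dataclasses import dataclass
from decimal import Decimal
from email.message import Message


@dataclass
class Billing:
    model: str
    chars_billed: int
    cost_usd: Decimal
    balance_usd: Decimal


def parse_billing(headers: Message) -> Billing:
    """Pull the X-Kyma-* billing fields out of a response's headers (case-insensitive lookup)."""
    return Billing(
        model=headers["X-Kyma-Model"],
        chars_billed=int(headers["X-Kyma-Chars-Billed"]),
        cost_usd=Decimal(headers["X-Kyma-Cost-USD"]),  # assumed plain decimal string
        balance_usd=Decimal(headers["X-Kyma-Balance-USD"]),
    )
```

In use, a call such as `with urllib.request.urlopen(req) as resp: audio = resp.read(); billing = parse_billing(resp.headers)` keeps the audio bytes and the accounting separate.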
Errors
| Status | error.code | When |
|---|---|---|
| 400 | not_a_tts_model | model is not a TTS SKU |
| 400 | voice_required | voice / voice_id missing |
| 400 | input_too_long | input text exceeds 5000 characters |
| 400 | invalid_request | invalid JSON or missing input |
| 401 | auth_error | missing or invalid API key |
| 402 | billing_error | balance too low |
| 502 | provider_error | upstream TTS provider failure |
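The error.code column suggests a JSON error body shaped like {"error": {"code": ...}}. That shape, and the retry policy below, are assumptions. In this sketch, only provider_error (502) is retried with exponential backoff, because the 4xx errors will fail the same way until the request, API key, or balance is fixed.

```python
import json
import time
from typing import Callable, Optional, Tuple

# Assumption: only upstream provider failures (502) are worth retrying;
# 400/401/402 fail the same way until the request, key, or balance changes.
RETRYABLE = {"provider_error"}


def error_code(body: bytes) -> Optional[str]:
    """Return error.code from an error body, or None if it is not the assumed JSON shape."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return None
    err = payload.get("error") if isinstance(payload, dict) else None
    return err.get("code") if isinstance(err, dict) else None


def with_retries(call: Callable[[], Tuple[int, bytes]], attempts: int = 3, base_delay: float = 0.5) -> bytes:
    """Run `call` (which returns status and body), retrying retryable errors with exponential backoff."""
    for attempt in range(attempts):
        status, body = call()
        if status == 200:
            return body
        code = error_code(body)
        if code not in RETRYABLE or attempt == attempts - 1:
            raise RuntimeError(f"TTS request failed: HTTP {status}, error.code={code}")
        time.sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, ...
    raise ValueError("attempts must be at least 1")
```

With urllib, non-2xx responses raise urllib.error.HTTPError. Its .code attribute and .read() method supply the status and body that `call` should return.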
See also
- GET /v1/audio/voices: browse the voice library
- POST /v1/audio/music: generative music
- POST /v1/audio/sfx: sound effects