Audio Speech (TTS)

Synchronous endpoint: send text and receive the audio bytes in a single response. Choose a model by weighing output quality against your latency budget.

curl -X POST https://kymaapi.com/v1/audio/speech \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "eleven-multilingual-v2",
    "input": "The first move is what sets everything in motion.",
    "voice_id": "JBFqnCBsd6RMkjVDRZzb",
    "response_format": "mp3_44100_128"
  }' \
  --output speech.mp3
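The same call from Python, as a minimal sketch using only the standard library (urllib). It assumes the key lives in the KYMA_API_KEY environment variable, as in the curl example; build_speech_request and synthesize_to_file are hypothetical helper names, not part of any Kyma SDK.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://kymaapi.com/v1/audio/speech"


def build_speech_request(
    text: str,
    voice_id: str,
    model: str = "eleven-multilingual-v2",
    response_format: str = "mp3_44100_128",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (without sending) a POST equivalent to the curl example."""
    if api_key is None:
        api_key = os.environ["KYMA_API_KEY"]  # same variable as the curl example
    body = json.dumps(
        {
            "model": model,
            "input": text,
            "voice_id": voice_id,
            "response_format": response_format,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(req: urllib.request.Request, path: str) -> None:
    """Send the request and save the raw audio body (HTTPError on non-2xx)."""
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling `synthesize_to_file(build_speech_request("The first move is what sets everything in motion.", "JBFqnCBsd6RMkjVDRZzb"), "speech.mp3")` should reproduce the curl command above. Building the request separately from sending it keeps the payload easy to inspect and test.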

Request

The request body is a JSON object (Content-Type: application/json).

model

string

default:"eleven-multilingual-v2"

One of:

eleven-v3: most expressive; supports audio tags and a wide emotional range; 70+ languages ($0.405 per 1K characters)
eleven-multilingual-v2: flagship quality, expressive, 29 languages ($0.405 per 1K characters)
eleven-flash-v2-5: ~75 ms time-to-first-byte, suited to real-time agents ($0.20 per 1K characters)
eleven-turbo-v2-5: balances quality and speed ($0.20 per 1K characters)
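To compare models on cost before sending text, here is a rough estimator using the per-1K-character prices listed above. It assumes billing is linear in the character count, with no rounding or per-request minimum, which this page does not state; the X-Kyma-Chars-Billed and X-Kyma-Cost-USD response headers report what was actually charged. The names are hypothetical.

```python
from decimal import Decimal

# Per-1K-character prices copied from the model list above (USD).
PRICE_PER_1K_CHARS = {
    "eleven-v3": Decimal("0.405"),
    "eleven-multilingual-v2": Decimal("0.405"),
    "eleven-flash-v2-5": Decimal("0.20"),
    "eleven-turbo-v2-5": Decimal("0.20"),
}


def estimate_cost_usd(model: str, text: str) -> Decimal:
    """Estimate the charge, assuming strictly linear per-character pricing."""
    try:
        price = PRICE_PER_1K_CHARS[model]
    except KeyError:
        raise ValueError(f"unknown TTS model: {model!r}") from None
    return price * len(text) / 1000
```

For a full 5000-character request this gives $2.025 on eleven-multilingual-v2 versus $1.00 on eleven-flash-v2-5. Decimal avoids binary floating-point rounding in money arithmetic.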

input

string

required

Text to synthesize, at most 5000 characters per request; split longer text client-side. input is the field name OpenAI's speech API uses; text is also accepted as an alias.
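One way to chunk text client-side, as a sketch: split into pieces of at most 5000 characters, preferring sentence boundaries, then whitespace, and hard-cutting only unbroken runs. It assumes the limit counts Python str characters (Unicode code points); the page does not say how characters are counted. chunk_text is a hypothetical name.

```python
import re

MAX_CHARS = 5000  # per-request limit stated above


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Breaks after the last sentence end (., !, ? followed by whitespace) in
    each window, else at the last whitespace, else hard-cuts at `limit`.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        cut = 0
        for m in re.finditer(r"[.!?](?=\s)", window):
            cut = m.end()  # position just after the punctuation
        if cut == 0:
            for m in re.finditer(r"\s", window):
                cut = m.start()  # position of the last whitespace
        if cut == 0:
            cut = limit  # no break point at all: hard cut
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk is then sent as its own request, and the resulting audio segments are played or joined in order. Splitting at sentence ends keeps each request a self-contained utterance.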

voice_id

string

required

ElevenLabs voice ID: an opaque string returned by GET /v1/audio/voices. Also accepts the alias voice. There is no default voice, so you must set one explicitly.
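Because there is no default voice and input is capped, a client can catch the most common 400s before making a request. A minimal pre-flight sketch that accepts either field name or its alias (input/text, voice_id/voice); check_payload is hypothetical, treating an empty string as missing is an assumption, and the order in which the server runs its own checks is not documented.

```python
MAX_INPUT_CHARS = 5000  # documented per-request limit


def check_payload(payload: dict) -> None:
    """Client-side check mirroring the documented 400 errors.

    Raises ValueError whose message starts with the error code the API
    would presumably return for the same payload.
    """
    text = payload.get("input", payload.get("text"))
    voice = payload.get("voice_id", payload.get("voice"))
    if not isinstance(text, str) or not text:
        raise ValueError("invalid_request: input is missing")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input_too_long: {len(text)} > {MAX_INPUT_CHARS} characters")
    if not voice:
        raise ValueError("voice_required: set voice_id (or its alias voice)")
```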

response_format

string

default:"mp3_44100_128"

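Audio format. Values encode the codec, the sample rate in Hz and, for MP3, the bitrate in kbps. One of: mp3_44100_128 (default), mp3_44100_192, mp3_22050_32, pcm_16000, pcm_22050, pcm_24000, pcm_44100, ulaw_8000.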
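A sketch for working with these values: parse_format splits a format name into codec, sample rate, and optional bitrate, and pcm_to_wav wraps raw PCM bytes in a WAV header so ordinary players can open them. Both are hypothetical helpers. pcm_to_wav assumes the PCM formats return headerless 16-bit signed little-endian mono samples, which this page does not state; check against real output.

```python
import io
import wave
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str                   # "mp3", "pcm", or "ulaw"
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # present only for mp3 formats


def parse_format(name: str) -> AudioFormat:
    """Split a response_format name like 'mp3_44100_128' into its parts."""
    parts = name.split("_")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"unrecognized response_format: {name!r}")
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(parts[0], int(parts[1]), bitrate)


def pcm_to_wav(pcm: bytes, sample_rate_hz: int) -> bytes:
    """Wrap raw PCM in a WAV container (assumes 16-bit LE mono samples)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()
```

For example, audio requested as pcm_24000 would be wrapped with `pcm_to_wav(body, parse_format("pcm_24000").sample_rate_hz)`.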

voice_settings

object

Optional fine-tuning: stability (0–1), similarity_boost (0–1), style (0–1), use_speaker_boost (boolean).
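A small builder for voice_settings, as a sketch: it enforces the documented 0 to 1 ranges and omits fields you leave unset. The class name is hypothetical, and the assumption that omitted fields fall back to the voice's own defaults is not confirmed by this page.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class VoiceSettings:
    """Optional voice_settings object; unset fields are left out of the request."""
    stability: Optional[float] = None
    similarity_boost: Optional[float] = None
    style: Optional[float] = None
    use_speaker_boost: Optional[bool] = None

    def __post_init__(self) -> None:
        # The three numeric settings are documented as ranging from 0 to 1.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")

    def to_json(self) -> Dict[str, Any]:
        """Return only the fields that were explicitly set."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

`payload["voice_settings"] = VoiceSettings(stability=0.4, style=0.2).to_json()` adds only the two fields that were set.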

stream

boolean

default:false

Opt-in low-latency mode. When true, audio is delivered progressively as it is synthesized, cutting time-to-first-audio to ~0.4 s (vs ~1.8 s on the default full-buffer path). No client changes are needed: the response is still a single, complete audio/mpeg stream you can pipe straight to disk or to a player. Currently this applies only to the MiniMax speech models; models that don't support progressive synthesis ignore it.
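Whatever the stream setting, the body arrives over a normal HTTP response, so reading it in chunks lets you start writing or playing audio before the whole file has arrived. A sketch with a hypothetical stream_to_file helper that also records how long the first chunk took; it starts timing when reading begins (after urlopen has returned the headers), so it understates end-to-end time-to-first-audio.

```python
import time
from typing import BinaryIO, Optional, Tuple


def stream_to_file(
    src: BinaryIO, dst: BinaryIO, chunk_size: int = 8192
) -> Tuple[int, Optional[float]]:
    """Copy an audio body from src to dst chunk by chunk.

    Returns (bytes written, seconds from the start of reading until the
    first chunk arrived), or (0, None) for an empty body.
    """
    start = time.monotonic()
    first_chunk_after: Optional[float] = None
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if first_chunk_after is None:
            first_chunk_after = time.monotonic() - start
        dst.write(chunk)
        total += len(chunk)
    return total, first_chunk_after
```

With urllib, pass the object returned by `urllib.request.urlopen(req)` as src and a file opened in "wb" mode as dst.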

Response

200 OK with the raw audio bytes as the body. Billing details are returned in X-Kyma-* response headers, so the body is a plain audio file you can write straight to disk or play.

Header	Description
`Content-Type`	matches the requested format (`audio/mpeg` for MP3, etc.)
`X-Kyma-Model`	resolved model ID
`X-Kyma-Chars-Billed`	number of input characters used for pricing
`X-Kyma-Cost-USD`	amount charged for this request, in USD
`X-Kyma-Balance-USD`	remaining account balance, in USD
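Reading the billing headers, as a sketch: parse_billing (hypothetical) accepts a plain dict or the http.client.HTTPMessage that urllib exposes as resp.headers, matches header names case-insensitively, and parses the USD amounts as Decimal. It assumes the values are plain decimal numbers.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping


@dataclass(frozen=True)
class Billing:
    model: str
    chars_billed: int
    cost_usd: Decimal
    balance_usd: Decimal


def parse_billing(headers: Mapping[str, str]) -> Billing:
    """Extract the X-Kyma-* billing headers from a response.

    Works with anything that has .items(), including http.client.HTTPMessage.
    HTTP header names are case-insensitive, so keys are lowercased first.
    """
    h = {k.lower(): v for k, v in headers.items()}
    return Billing(
        model=h["x-kyma-model"],
        chars_billed=int(h["x-kyma-chars-billed"]),
        cost_usd=Decimal(h["x-kyma-cost-usd"]),
        balance_usd=Decimal(h["x-kyma-balance-usd"]),
    )
```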

Errors

HTTP status	`error.code`	Cause
`400`	`not_a_tts_model`	`model` does not name a TTS model
`400`	`voice_required`	`voice` / `voice_id` missing
`400`	`input_too_long`	input exceeds 5000 characters
`400`	`invalid_request`	malformed JSON or missing input
`401`	`auth_error`	missing or invalid API key
`402`	`billing_error`	insufficient balance
`502`	`provider_error`	upstream TTS provider failure
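A sketch of turning a failed call into a typed exception. It assumes the error body is JSON shaped like {"error": {"code": ..., "message": ...}}, which the error.code column implies but this page does not spell out (the message field in particular is an assumption). Only 502 provider_error is treated as retryable; every other code in the table needs a change on the caller's side (a TTS model, a voice, shorter input, a valid key, or more balance) before retrying makes sense. The names are hypothetical.

```python
import json
from typing import Optional


class KymaApiError(Exception):
    """A non-200 response from /v1/audio/speech."""

    def __init__(self, status: int, code: Optional[str], message: str = "") -> None:
        super().__init__(f"HTTP {status} {code or 'unknown_error'}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        # Only upstream provider failures (502) are plausibly transient.
        return self.status == 502


def parse_api_error(status: int, body: bytes) -> KymaApiError:
    """Build a KymaApiError; code is None if the body is not the assumed JSON shape."""
    try:
        err = json.loads(body.decode("utf-8")).get("error", {})
        code = err.get("code")
        message = err.get("message", "")
    except (ValueError, AttributeError):  # not JSON, or not the expected shape
        code, message = None, ""
    return KymaApiError(status, code, message)
```

With urllib, catch urllib.error.HTTPError as err and call `parse_api_error(err.code, err.read())`.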

