Audio Music - Kyma API

Synchronous endpoint (`POST /v1/audio/music`) serving three models from two providers:

ElevenLabs Music (elevenlabs-music) — caller supplies music_length_ms, billed per second.
MiniMax Music 2.0 (minimax-music) — provider derives length from lyrics, flat per-song price.
MiniMax Music Pro (minimax-music-pro) — same flat-per-song shape, higher fidelity. Currently backed by music-2.6 (upgraded from music-2.5 on 2026-05-17; API contract and pricing unchanged).

```shell
# ElevenLabs: pure description, length set by the caller
curl -X POST https://kymaapi.com/v1/audio/music \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-music",
    "prompt": "warm lo-fi hip-hop, jazz piano, soft snare, late evening city ambience",
    "music_length_ms": 30000
  }' \
  --output bed.mp3

# MiniMax: style prompt + structured lyrics, length derived from the lyrics
curl -X POST https://kymaapi.com/v1/audio/music \
  -H "Authorization: Bearer $KYMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-music-pro",
    "prompt": "uplifting indie pop, female vocals, anthemic chorus",
    "lyrics": "[verse]\nWalking through the morning light\n[chorus]\nWe will rise again tonight\n[Instrumental]"
  }' \
  --output song.mp3
```

Request

application/json body.

model

string

Default: elevenlabs-music

One of elevenlabs-music, minimax-music, minimax-music-pro.

prompt

string

Description of the music. Max 2000 characters for elevenlabs-music and 200 for the MiniMax models (see Char limits). Required for elevenlabs-music. For MiniMax models, the prompt drives style and arrangement while lyrics drive the vocals.

lyrics

string

MiniMax models only, max 600 characters. Vocal content with optional structural tags: [verse], [chorus], [bridge], [Instrumental]. Pass [Instrumental] alone for a vocal-less track. ElevenLabs ignores this field.

music_length_ms

integer

Default: 30000

Output duration in milliseconds. Range: 1000 (1 second) to 300000 (5 minutes). Honored by elevenlabs-music; MiniMax models derive length from the lyrics instead. duration_ms is accepted as an alias.
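A client can reject an out-of-range length locally instead of spending a request on a 400 `invalid_duration`. The sketch below is a hypothetical bash helper (`kyma_check_length_ms` is not part of any Kyma SDK) that applies the documented [1000, 300000] range. It is mainly useful for elevenlabs-music, since MiniMax derives length from the lyrics.

```shell
# Hypothetical helper (not part of the Kyma API): pre-check music_length_ms
# against the documented range before sending the request.
kyma_check_length_ms() {
  local ms="$1"
  case "$ms" in
    ''|*[!0-9]*)
      echo "music_length_ms must be a non-negative integer, got '$ms'" >&2
      return 1 ;;
  esac
  if [ "$ms" -lt 1000 ] || [ "$ms" -gt 300000 ]; then
    echo "music_length_ms $ms outside [1000, 300000]" >&2
    return 1
  fi
  return 0
}
```

For example, `kyma_check_length_ms 90000 && echo ok` prints `ok`, while `kyma_check_length_ms 500` fails with a message on stderr.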

response_format

string

Default: mp3_44100_128

Output audio format. Accepts the same values as the audio/speech endpoint.

save_to_blob

string

Optional query parameter, not a body field. Set `?save_to_blob=1` to have Kyma upload the resulting MP3 to Vercel Blob and write a multimodal_jobs row (so the call appears in your gallery and share pages). The response is then JSON `{ job_id, kind: "audio", url, duration_sec, cost_usd, balance_usd }` instead of streamed audio bytes. The Canvas and Muse audio kinds use this; most direct API callers don't need it.

Response

Default (streaming)

200 OK with audio bytes. Headers:

| Header | Description |
|---|---|
| `X-Kyma-Model` | resolved model id |
| `X-Kyma-Duration-Sec` | clip duration in seconds, as used for billing |
| `X-Kyma-Cost-USD` | actual cost charged |
| `X-Kyma-Balance-USD` | remaining balance |
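Because the audio goes to the body, the billing figures arrive only in these headers. One way to keep them from a script is curl's `-D` (dump headers to a file), then read the file back. The helpers below are a hypothetical bash sketch (`kyma_header` and `kyma_music_with_cost` are illustrative names, not a Kyma API). Header names are matched case-insensitively because HTTP/2 responses send them in lowercase.

```shell
# Hypothetical helper: print the value of one header from a file written by
# `curl -D FILE`. Names are compared case-insensitively.
kyma_header() {
  local file="$1" want
  want=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  awk -v want="$want" '
    {
      sub(/\r$/, "")                          # header lines end in CRLF
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == want) {
        v = substr($0, i + 1)
        sub(/^[ \t]+/, "", v)                 # drop whitespace after the colon
        print v
      }
    }' "$file"
}

# Hypothetical usage sketch: generate audio, keep the headers, report the cost.
# Defining this function does not call the API.
kyma_music_with_cost() {
  local body="$1" out="$2" hdrs
  hdrs=$(mktemp)
  curl -sS -X POST https://kymaapi.com/v1/audio/music \
    -H "Authorization: Bearer $KYMA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body" -D "$hdrs" --output "$out"
  echo "model: $(kyma_header "$hdrs" X-Kyma-Model)"
  echo "cost: $(kyma_header "$hdrs" X-Kyma-Cost-USD) USD"
  echo "balance: $(kyma_header "$hdrs" X-Kyma-Balance-USD) USD"
  rm -f "$hdrs"
}
```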

With `?save_to_blob=1`

200 OK with JSON:

```json
{
  "object": "audio.generation",
  "job_id": "mmj_a1b2c3d4e5f6...",
  "kind": "audio",
  "model": "minimax-music-pro",
  "url": "https://blob.vercel-storage.com/mmj/.../mmj_a1b2....mp3",
  "duration_sec": 86,
  "cost_usd": 0.21,
  "balance_usd": 47.79
}
```

The url is a permanent Vercel Blob URL — safe to embed in apps, store in a DB, share publicly.
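With `?save_to_blob=1` the response is a small JSON object, so a shell script can keep just the URL. The helpers below are a hypothetical sketch (`kyma_music_to_blob` and `json_field` are illustrative names). `json_field` is a deliberately naive sed extractor. It assumes each key appears once and that values contain no commas or braces, as in the example above. Prefer a real JSON parser such as `jq -r .url` when one is available.

```shell
# Hypothetical helper: naive extraction of one top-level field from flat JSON.
# Assumes the key appears once and its value has no commas or braces.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:\([^,}]*\).*/\1/p" |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/^"//; s/"$//'
}

# Hypothetical wrapper: POST a JSON body with save_to_blob=1 and print the
# JSON response. Defining this function does not call the API.
kyma_music_to_blob() {
  curl -sS -X POST "https://kymaapi.com/v1/audio/music?save_to_blob=1" \
    -H "Authorization: Bearer $KYMA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$1"
}
```

A typical call would be `kyma_music_to_blob '{"model": "minimax-music", "prompt": "uplifting indie pop", "lyrics": "[Instrumental]"}' | json_field url`, which prints the Blob URL to store or share.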

Pricing

| SKU | Price | Pricing mode |
|---|---|---|
| `elevenlabs-music` | $0.135 / sec | Per second (caller-controlled length) |
| `minimax-music` | $0.045 / song | Flat per generation |
| `minimax-music-pro` | $0.21 / song | Flat per generation |

ElevenLabs pricing is proportional to duration; MiniMax charges a flat price per song. Cost comparison at typical lengths:

| Length | ElevenLabs Music | MiniMax Music Pro | MiniMax Music |
|---|---|---|---|
| 30 seconds | $4.05 | $0.21 | $0.045 |
| 60 seconds | $8.10 | $0.21 | $0.045 |
| 3 minutes | $24.30 | $0.21 | $0.045 |
| 5 minutes (max) | $40.50 | $0.21 | $0.045 |

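Across the 30-second to 5-minute range, MiniMax Music Pro is roughly 19–193× cheaper than ElevenLabs, and MiniMax Music is 90–900× cheaper. On price alone, ElevenLabs wins only against minimax-music-pro, and only for clips shorter than about 1.56 seconds ($0.21 ÷ $0.135/sec). It never beats minimax-music, because its 1-second minimum already costs $0.135. Pick ElevenLabs when you need exact, caller-controlled length; otherwise the flat MiniMax pricing wins. The ElevenLabs rate is locked at the Pro-tier overage price, the worst-case rate, so the listed price stays safe across subscription levels.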
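The Pricing table reduces to a few lines of arithmetic, which is handy for comparing SKUs before generating. `kyma_music_cost` is a hypothetical bash helper, not a Kyma API. It assumes ElevenLabs is billed on the exact requested duration with no rounding up to whole seconds. Treat its output as an estimate and rely on `X-Kyma-Cost-USD` for the actual charge.

```shell
# Hypothetical helper: estimate the charge (USD) for one generation from the
# Pricing table. Assumes ElevenLabs bills the exact duration, no rounding.
kyma_music_cost() {
  local model="$1" ms="${2:-30000}"   # default mirrors music_length_ms
  case "$model" in
    elevenlabs-music)
      awk -v ms="$ms" 'BEGIN { printf "%.3f\n", ms / 1000 * 0.135 }' ;;
    minimax-music)     echo "0.045" ;;  # flat per song, length ignored
    minimax-music-pro) echo "0.210" ;;  # flat per song, length ignored
    *) echo "unknown music model: $model" >&2; return 1 ;;
  esac
}
```

For example, `kyma_music_cost elevenlabs-music 1000` prints `0.135`, which is already three times the flat `minimax-music` price.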

Char limits

Limits are set per upstream provider. The gateway rejects oversized inputs before forwarding them, so you don't spend bandwidth or a balance hold on a request the provider would refuse. Counts are taken after NFC normalization, so Vietnamese, accented Latin, and other diacritic-heavy scripts are measured by visible characters rather than UTF-16 code units.

| SKU | `prompt` max | `lyrics` max | Why |
|---|---|---|---|
| `elevenlabs-music` | 2000 chars | (ignored) | Free-form description, vocals included via prompt cues |
| `minimax-music` | 200 chars | 600 chars | Provider splits style cue from vocal content |
| `minimax-music-pro` | 200 chars | 600 chars | Same shape as `minimax-music` |

Errors include the SKU name and the field that overflowed:

```text
prompt too long for minimax-music-pro: 245 chars (max 200).
lyrics too long for minimax-music-pro: 612 chars (max 600). Use [Instrumental] for vocal-less.
```
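A client can run the same check before sending. The sketch below is a hypothetical bash pre-flight helper (`kyma_check_lengths` is an illustrative name) that mirrors the limits table and message shape above. It counts with `${#var}`, which counts characters (not bytes) only under a UTF-8 locale, and it does not NFC-normalize. Decomposed text (base letter plus combining marks) can therefore count longer here than at the gateway, so treat it as a conservative approximation.

```shell
# Hypothetical helper: approximate the gateway's per-SKU length checks.
# Not NFC-normalized; run under a UTF-8 locale so ${#var} counts characters.
kyma_check_lengths() {
  local model="$1" prompt="$2" lyrics="${3-}"
  local max_prompt max_lyrics
  case "$model" in
    elevenlabs-music)                 max_prompt=2000; max_lyrics=0 ;;  # lyrics ignored
    minimax-music|minimax-music-pro)  max_prompt=200;  max_lyrics=600 ;;
    *) echo "not a music model: $model" >&2; return 2 ;;
  esac
  if [ "${#prompt}" -gt "$max_prompt" ]; then
    echo "prompt too long for $model: ${#prompt} chars (max $max_prompt)." >&2
    return 1
  fi
  if [ "$max_lyrics" -gt 0 ] && [ "${#lyrics}" -gt "$max_lyrics" ]; then
    echo "lyrics too long for $model: ${#lyrics} chars (max $max_lyrics)." >&2
    return 1
  fi
  return 0
}
```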

Errors

| Status | `error.code` | When |
|---|---|---|
| `400` | `not_a_music_model` | `model` is not a music SKU |
| `400` | `prompt_too_long` | `prompt` exceeds the SKU's limit (2000 chars for `elevenlabs-music`, 200 for MiniMax models) |
| `400` | `invalid_duration` | `music_length_ms` outside `[1000, 300000]` |
| `401` | `auth_error` | missing or invalid API key |
| `402` | `billing_error` | balance too low |
| `502` | `provider_error` | upstream provider failure |
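In scripts, the HTTP status alone identifies the relevant row of this table. In this hypothetical bash sketch, `kyma_music_action` maps each documented status to a suggested next step. `kyma_music_request` uses curl's `-w '%{http_code}'` to capture the status while writing the body (audio on success, error JSON otherwise) to a file. Both names are illustrative. Treating only 502 as retryable is an assumption, not documented Kyma behavior.

```shell
# Hypothetical policy: map a music-endpoint HTTP status to a client action.
kyma_music_action() {
  case "$1" in
    200) echo "ok" ;;
    400) echo "fix_request" ;;  # not_a_music_model, prompt_too_long, invalid_duration
    401) echo "fix_auth" ;;     # auth_error
    402) echo "top_up" ;;       # billing_error
    502) echo "retry" ;;        # provider_error (assumed transient)
    *)   echo "unknown" ;;      # includes 000 when curl could not connect
  esac
}

# Hypothetical wrapper: send the request, keep the body in $2, print the action.
# Defining this function does not call the API.
kyma_music_request() {
  local body="$1" out="$2" status
  status=$(curl -sS -o "$out" -w '%{http_code}' \
    -X POST https://kymaapi.com/v1/audio/music \
    -H "Authorization: Bearer $KYMA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body")
  kyma_music_action "$status"
}
```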
