Audio Music
Endpoints
Audio Music
Prompt-driven music generation. Synchronous, returns audio bytes, billed per second (ElevenLabs) or per song (MiniMax).
POST
Audio Music
Synchronous endpoint across two providers:
With
The
ElevenLabs is duration-proportional, MiniMax is flat per song. Cost comparison for typical lengths:
MiniMax is 19–190× cheaper than ElevenLabs for full-length tracks. Pick ElevenLabs only when you need fine-grained length control on a short bed (under ~1.5 seconds is the only crossover); otherwise the flat MiniMax pricing wins. ElevenLabs’s worst-case rate is locked at the Pro tier overage to stay safe across subscription levels.
Errors include the SKU name and the field that overflowed:
- ElevenLabs Music (
elevenlabs-music) — caller suppliesmusic_length_ms, billed per second. - MiniMax Music 2.0 (
minimax-music) — provider derives length from lyrics, flat per-song price. - MiniMax Music Pro (
minimax-music-pro) — same flat-per-song shape, higher fidelity. Currently backed bymusic-2.6(upgraded frommusic-2.5on 2026-05-17; API contract and pricing unchanged).
Request
application/json body.
One of
elevenlabs-music, minimax-music, minimax-music-pro.Description of the music. Max 2000 characters. Required for
elevenlabs-music. For MiniMax models the prompt drives style/arrangement; lyrics drive vocals.MiniMax music only. Vocal content with optional structural tags:
[verse], [chorus], [bridge], [Instrumental]. Pass [Instrumental] alone for a vocal-less track. ElevenLabs ignores this field.Output duration in milliseconds. Range:
1000 (1 second) to 300000 (5 minutes). ElevenLabs honors this; MiniMax derives length from the lyrics. Also accepts duration_ms as an alias.Audio format. Same options as audio/speech.
Optional query param. Set to
1 to have Kyma upload the resulting MP3 to Vercel Blob, write a multimodal_jobs row (so the call appears in your gallery + share pages), and return JSON { job_id, kind: "audio", url, duration_sec, cost_usd, balance_usd } instead of streaming bytes. Used by the Canvas and Muse audio kinds — most direct API callers don’t need this.Response
Default (streaming)
200 OK with audio bytes. Headers:
| Header | What |
|---|---|
X-Kyma-Model | resolved model id |
X-Kyma-Duration-Sec | clip duration used for billing |
X-Kyma-Cost-USD | actual cost charged |
X-Kyma-Balance-USD | remaining balance |
With ?save_to_blob=1
200 OK with JSON:
url is a permanent Vercel Blob URL — safe to embed in apps, store in a DB, share publicly.
Pricing
| SKU | Price | Pricing mode |
|---|---|---|
elevenlabs-music | $0.135 / sec | Per-second (caller-controlled length) |
minimax-music | $0.045 / song | Flat per generation |
minimax-music-pro | $0.21 / song | Flat per generation |
| Length | ElevenLabs Music | MiniMax Music Pro | MiniMax Music |
|---|---|---|---|
| 30 seconds | $4.05 | $0.21 | $0.045 |
| 60 seconds | $8.10 | $0.21 | $0.045 |
| 3 minutes | $24.30 | $0.21 | $0.045 |
| 5 minutes (max) | $40.50 | $0.21 | $0.045 |
Char limits
Per upstream provider — the gateway rejects oversized inputs before forwarding so you don’t burn bandwidth + a hold. Counts are NFC-normalized so Vietnamese / accented Latin / other diacritic-heavy scripts measure by visible characters, not UTF-16 code units.| SKU | prompt | lyrics | Why |
|---|---|---|---|
elevenlabs-music | 2000 chars | (ignored) | Free-form description, vocals included via prompt cues |
minimax-music | 200 chars | 600 chars | Provider splits style cue from vocal content |
minimax-music-pro | 200 chars | 600 chars | Same shape as minimax-music |
Errors
| Status | error.code | When |
|---|---|---|
400 | not_a_music_model | model is not a music SKU |
400 | prompt_too_long | prompt > 2000 chars |
400 | invalid_duration | music_length_ms outside [1000, 300000] |
401 | auth_error | missing or invalid API key |
402 | billing_error | balance too low |
502 | provider_error | upstream provider failure |
See also
POST /v1/audio/sfx— pair music beds with sound effectsPOST /v1/audio/speech— narration to layer over the musicGET /v1/audio/voices— voice picker for narration