Voice Clone (MiniMax)
Clone a voice from a 10-second to 5-minute reference recording. Returns a voice_id usable in /v1/audio/speech with any MiniMax HD or Turbo SKU.
POST
Synchronous endpoint. Upload a reference audio clip via multipart form, get back a
voice_id you can pass to /v1/audio/speech on any MiniMax voice model.
Request
multipart/form-data body.
Reference audio. MP3, WAV, or M4A. Max 10 MB. Duration must be 10 seconds to 5 minutes. Kyma parses the audio header locally and rejects out-of-range files with 400 reference_too_short or 400 reference_too_long before charging. Longer clips don’t improve clone quality and waste upload bandwidth.
Optional human-readable label, max 64 chars. Surfaced in the response and stored alongside the ownership row for your reference.
Voice clone SKU. Currently only minimax-voice-clone is supported.
Response
200 OK JSON.
| Field | What |
|---|---|
| voice_id | Use this in the voice_id field of /v1/audio/speech. Namespaced as kyma_<rand>. |
| name | Echo of the label you sent (or null). |
| cost_usd | Flat charge applied ($2.10). |
| balance_usd | Remaining balance after settle. |
X-Kyma-Model, X-Kyma-Cost-USD, and X-Kyma-Balance-USD headers are also set.
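As a rough illustration of the request and response shapes described above, the sketch below encodes a multipart/form-data body with the Python standard library and reads the four documented fields from a 200 OK payload. The form field names `file`, `name`, and `model` are assumptions inferred from the error and response tables, not confirmed names, and the helper functions are hypothetical. The endpoint URL and authentication scheme are not stated on this page, so the sketch stops short of sending anything.

```python
import json
import uuid


def build_clone_form(audio_bytes, filename, model="minimax-voice-clone", name=None):
    """Encode a multipart/form-data body. Returns (body_bytes, content_type)."""
    # Assumption: the form fields are called "file", "name", and "model".
    if name is not None and len(name) > 64:
        raise ValueError("label is limited to 64 characters")
    boundary = uuid.uuid4().hex
    parts = []

    def add_text(field, value):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{field}"'
            f"\r\n\r\n{value}\r\n".encode()
        )

    add_text("model", model)
    if name is not None:
        add_text("name", name)
    # The audio part carries raw bytes, so it is assembled separately.
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(header + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def parse_clone_response(raw_json):
    """Pull the four documented fields out of a 200 OK body."""
    data = json.loads(raw_json)
    return {
        "voice_id": data["voice_id"],
        "name": data.get("name"),  # null when no label was sent
        "cost_usd": data["cost_usd"],
        "balance_usd": data["balance_usd"],
    }
```

The returned body and content type can be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")` once the endpoint URL and credentials are known.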
Pricing
Flat $2.10 per cloned voice. One-time charge: once cloned, the voice_id is reusable in unlimited TTS calls.
Ownership
Cloned voice IDs are gated per Kyma user. If user A passes user B’s voice_id to /v1/audio/speech, the request returns 403 voice_not_owned. Voice IDs that aren’t on file are assumed to be MiniMax system voices (browseable by everyone).
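The gating rule can be pictured as a single lookup. The sketch below is only an illustration of the behavior described here, not Kyma's implementation; the `ownership` dictionary is a hypothetical stand-in for the ownership table, and the function name is invented for the example.

```python
def ownership_error(voice_id, requesting_user, ownership):
    """Return the error code the ownership gate would produce, or None if it passes.

    `ownership` maps a cloned voice_id to the user who cloned it
    (a hypothetical stand-in for the ownership rows).
    """
    owner = ownership.get(voice_id)
    if owner is None:
        # Not on file: treated as a MiniMax system voice, open to everyone.
        return None
    if owner != requesting_user:
        # On file but owned by someone else: 403 voice_not_owned.
        return "voice_not_owned"
    return None
```

One consequence of this rule is that a mistyped voice ID is not rejected by the ownership gate itself, because an unknown ID is assumed to be a system voice.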
Errors
| Status | error.code | When |
|---|---|---|
| 400 | not_a_voice_clone_model | model is not a clone SKU |
| 400 | invalid_request | missing file, invalid form data |
| 400 | reference_too_short | audio duration < 10 seconds |
| 400 | reference_too_long | audio duration > 5 minutes |
| 400 | audio_unreadable | could not parse duration; file may be corrupt or in an unsupported codec |
| 402 | insufficient_credits | balance below $2.10 |
| 413 | invalid_request | audio file > 10 MB |
| 415 | invalid_request | audio format not MP3/WAV/M4A |
| 500 | ownership_write_failed | clone succeeded but ownership row insert failed (no charge applied; safe to retry) |
| 502 | provider_error | upstream MiniMax failure |
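Some of these rows can be caught before uploading, and the rest call for different reactions. The sketch below shows one possible client-side approach under several assumptions: "10 MB" is read as 10 MiB, the format is guessed from the file extension, and duration is checked only for WAV files, because the standard-library `wave` module cannot read MP3 or M4A. Retrying on 502 provider_error is also an assumption; the table only documents 500 ownership_write_failed as safe to retry. Both function names are hypothetical.

```python
import io
import os
import wave

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB
ALLOWED_EXT = {".mp3", ".wav", ".m4a"}


def preflight(filename, data):
    """Return the (status, error.code) an upload would likely hit, or None."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXT:
        return 415, "invalid_request"
    if len(data) > MAX_BYTES:
        return 413, "invalid_request"
    if ext == ".wav":
        try:
            with wave.open(io.BytesIO(data), "rb") as w:
                seconds = w.getnframes() / w.getframerate()
        except (wave.Error, EOFError, ZeroDivisionError):
            return 400, "audio_unreadable"
        if seconds < 10:
            return 400, "reference_too_short"
        if seconds > 300:
            return 400, "reference_too_long"
    # MP3 and M4A durations are not checked here; the server still validates them.
    return None


def next_step(status, code):
    """Map a documented error to a coarse client action."""
    if (status, code) == (402, "insufficient_credits"):
        return "top_up"
    if (status, code) == (500, "ownership_write_failed"):
        return "retry"  # documented: no charge applied, safe to retry
    if (status, code) == (502, "provider_error"):
        return "retry_with_backoff"  # assumption: upstream failures may be transient
    if 400 <= status < 500:
        return "fix_request"
    return "report"
```

A passing pre-flight result does not guarantee acceptance; it only avoids uploads that the documented limits would reject.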
See also
- POST /v1/audio/voice-design: generate a voice from a written description (no reference audio)
- POST /v1/audio/speech: use the cloned voice
- Voice Clone (model): service overview