Overview
gpt-4o-mini-transcribe-2025-12-15 is OpenAI’s premium speech-to-text model, surfaced on Kyma through the transcribe-quality alias. It’s the right pick when accuracy on real-world audio matters more than raw cost — conversational dialogue, noisy environments, and multilingual code-switching (Vietnamese ↔ English mixing, Mandarin ↔ English, etc.) where Whisper Turbo’s distilled decoder gives ground.
Ships alongside the default transcribe alias (which resolves to Whisper Large v3 Turbo). Two aliases, one decision per request: pick transcribe for high-volume bulk transcription, transcribe-quality when accuracy is the constraint.
Specs
| Field | Value |
|---|---|
| Model ID | gpt-4o-mini-transcribe-2025-12-15 |
| Alias | transcribe-quality |
| Creator / Provider | OpenAI |
| Best for | Noisy / conversational / code-switching audio, multilingual dictation |
| Max file size | 25 MB (multipart upload) |
| Input modalities | Audio (mp3, wav, m4a, ogg, webm, flac) |
| Output modalities | Text |
| Pricing mode | Per minute |
| Min billable | 1 minute (rounded up) |
Pricing
| Item | Cost |
|---|---|
| Per minute | $0.00405 |
| 1-hour file | $0.243 |
| 5-second clip | $0.00405 (rounds up to 1 min) |
Default to transcribe for routine work; reserve transcribe-quality for the cases that need it.
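The arithmetic in the pricing table can be sketched as a small estimator. This is an illustration, not Kyma's billing code: it assumes every partial minute is rounded up to a whole minute (the table only confirms the 1-minute minimum), and `estimate_cost` is a hypothetical helper name.

```python
import math
from decimal import Decimal

PRICE_PER_MINUTE = Decimal("0.00405")  # USD per minute, from the pricing table


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the charge for one file in USD."""
    # Assumption: partial minutes bill as full minutes, with a 1-minute floor.
    billable_minutes = max(1, math.ceil(duration_seconds / 60))
    return PRICE_PER_MINUTE * billable_minutes


print(estimate_cost(5))     # 5-second clip, billed as 1 minute
print(estimate_cost(3600))  # 1-hour file, billed as 60 minutes
```

Decimal keeps the per-minute price exact, which matters when summing many small charges across a batch.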
Use this when
- Audio contains code-switching (e.g. Vietnamese + English in the same utterance) and Whisper Turbo is producing garbled output on the non-primary language.
- Background noise, low-quality recording, or far-field microphones — accuracy matters more than throughput.
- Conversational dictation where capitalization, punctuation, and proper nouns must be right first time.
Pick something else when
- High-volume bulk transcription where ~99% accuracy is fine: use whisper-v3-turbo, which is ~4.5× cheaper.
- The audio is over 25 MB: the Quality tier currently supports multipart upload only. Use transcribe with the JSON audio_url mode for files up to 100 MB.
- You need full per-segment timestamps reliably under outage: the Quality tier does not fall back to Vertex Gemini (the default transcribe does), so a 5xx from OpenAI surfaces directly.
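The size and accuracy rules above could be encoded in a small routing helper. This is a sketch under assumptions: `choose_route` and the mode labels "multipart" and "audio_url" are hypothetical names, "MB" is read as MiB, and the default transcribe alias is assumed to accept multipart uploads for small files.

```python
MULTIPART_LIMIT_BYTES = 25 * 1024 * 1024   # assumption: "25 MB" means 25 MiB
AUDIO_URL_LIMIT_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" means 100 MiB


def choose_route(size_bytes: int, needs_accuracy: bool) -> tuple[str, str]:
    """Return (alias, upload_mode) for one audio file."""
    if size_bytes > AUDIO_URL_LIMIT_BYTES:
        raise ValueError("file exceeds the 100 MB audio_url limit; split it first")
    if size_bytes > MULTIPART_LIMIT_BYTES:
        # The Quality tier is multipart-only, so large files use the default alias.
        return ("transcribe", "audio_url")
    if needs_accuracy:
        return ("transcribe-quality", "multipart")
    return ("transcribe", "multipart")


print(choose_route(3 * 1024 * 1024, needs_accuracy=True))
print(choose_route(40 * 1024 * 1024, needs_accuracy=True))
```

Note that a large file falls back to transcribe even when accuracy is requested; splitting the audio into chunks under 25 MB is the alternative if the Quality tier is required.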
No fallback
The default transcribe alias has an automatic Whisper → Vertex Gemini fallback chain on upstream outage. The Quality tier opts out by design: you explicitly chose OpenAI for accuracy, and silently swapping to a different provider would defeat that contract. OpenAI outages return as 502 transcription_failed so your client knows to retry or downgrade.
See the Fallback chain section on the endpoint reference for the full rules.
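Because the Quality tier has no server-side fallback, the retry-or-downgrade decision sits with the client. The sketch below shows one possible policy: retry transcribe-quality on a 502 with exponential backoff, then downgrade explicitly to transcribe. The `send` callable and its `(status_code, body)` return shape are hypothetical; plug in whatever HTTP wrapper you use.

```python
import time
from typing import Callable, Tuple

# Hypothetical sender: takes an alias, returns (status_code, parsed_body).
Sender = Callable[[str], Tuple[int, dict]]


def transcribe_with_downgrade(
    send: Sender, retries: int = 2, backoff_s: float = 1.0
) -> Tuple[str, dict]:
    """Try transcribe-quality, retry on 502, then downgrade to transcribe."""
    for attempt in range(retries + 1):
        status, body = send("transcribe-quality")
        if status == 200:
            return ("transcribe-quality", body)
        if status != 502:
            raise RuntimeError(f"unexpected status {status}: {body}")
        if attempt < retries:
            time.sleep(backoff_s * (2 ** attempt))  # simple exponential backoff
    # Explicit downgrade: the caller can see which alias produced the result.
    status, body = send("transcribe")
    if status != 200:
        raise RuntimeError(f"downgrade failed with status {status}: {body}")
    return ("transcribe", body)
```

Returning the alias alongside the body keeps the downgrade visible, so downstream code can flag transcripts that did not come from the Quality tier.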
Concurrency limits
Routed through the openai audio sub-pool. Per-tier caps:
| Tier | Concurrent slots |
|---|---|
| Tier 0 | 1 |
| Tier 1 | 2 |
| Tier 2 | 4 |
| Tier 3 | 8 |
| Tier 4 | 18 |
These slots are not shared with the transcribe (whisper-v3-turbo) sub-pool or any other audio capability. See Rate Limits (Audio limits).
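One way to stay under the per-tier cap is to gate requests client-side with a semaphore sized from the table above. This is a sketch, not part of any Kyma SDK: `AudioSlotLimiter` is a hypothetical class, and it only limits requests made from a single process.

```python
import threading
from contextlib import contextmanager
from typing import Optional

TIER_SLOTS = {0: 1, 1: 2, 2: 4, 3: 8, 4: 18}  # concurrent slots per tier


class AudioSlotLimiter:
    """Cap in-flight transcribe-quality requests for one account tier."""

    def __init__(self, tier: int) -> None:
        self.limit = TIER_SLOTS[tier]
        self._sem = threading.BoundedSemaphore(self.limit)

    @contextmanager
    def slot(self, timeout: Optional[float] = None):
        # Block until a slot frees up, or give up after `timeout` seconds.
        if not self._sem.acquire(timeout=timeout):
            raise TimeoutError("no free transcription slot")
        try:
            yield
        finally:
            self._sem.release()


limiter = AudioSlotLimiter(tier=2)
with limiter.slot(timeout=30):
    pass  # send one transcription request here
```

Wrapping each request in `limiter.slot()` from worker threads keeps at most `limit` calls in flight; multi-process deployments would need a shared limiter instead.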
Example
Python (raw HTTP — OpenAI SDK doesn’t expose Kyma aliases)
The OpenAI SDK uses whisper-1 internally for transcription, so passing model="transcribe-quality" through the SDK won’t reach Kyma’s alias resolver. Raw requests works fine.
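A minimal raw-HTTP sketch of the call might look like the following. Several details are assumptions rather than documented facts: the `KYMA_BASE_URL` and `KYMA_API_KEY` environment variables and the placeholder host are hypothetical, and Bearer auth, the `file` and `model` form fields, and a JSON response body follow the common OpenAI-compatible convention. Check the endpoint reference for the real values.

```python
import os

# Hypothetical configuration: replace with your real Kyma base URL and key.
BASE_URL = os.environ.get("KYMA_BASE_URL", "https://kyma.example.invalid")
API_KEY = os.environ.get("KYMA_API_KEY", "")


def build_request(model: str = "transcribe-quality") -> dict:
    """Assemble URL, headers, and form fields for the transcription call."""
    return {
        "url": f"{BASE_URL.rstrip('/')}/v1/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {API_KEY}"},  # assumed auth scheme
        "data": {"model": model},  # assumed form field name
    }


def transcribe(path: str, model: str = "transcribe-quality") -> dict:
    """Upload one audio file as multipart form data and return the parsed JSON."""
    # Imported here so build_request stays usable without the dependency.
    import requests  # third-party: pip install requests

    req = build_request(model)
    with open(path, "rb") as audio:
        resp = requests.post(
            req["url"],
            headers=req["headers"],
            data=req["data"],
            files={"file": audio},  # multipart upload; keep the file under 25 MB
            timeout=120,
        )
    resp.raise_for_status()  # a 502 here means the upstream OpenAI call failed
    return resp.json()
```

Calling `transcribe("meeting.m4a")` would send the file to the Quality tier; passing `model="transcribe"` targets the default alias instead.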
Aliases that resolve here
- transcribe-quality: premium ASR tier on Kyma.
To pin this exact model version, pass gpt-4o-mini-transcribe-2025-12-15 directly. If you want to ride future Quality-tier upgrades (e.g. if Kyma promotes a newer OpenAI STT SKU), use the alias.
See also
- whisper-v3-turbo: default STT (cheaper, faster, fallback chain enabled)
- POST /v1/audio/transcriptions: endpoint reference
- Rate Limits: concurrency caps for the openai audio sub-pool
- Model Aliases: full alias index