- Use
qwen-3.6-plusfor the best default - Use
kimi-k2.6for tool-heavy agents - Use
deepseek-r1for hard reasoning - Use
gemini-2.5-flashfor 1M context
Quick decision guide
| If you need… | First pick | Alias | Why |
|---|---|---|---|
| Safest default | qwen-3.6-plus | best | Best overall quality across common tasks |
| Tool-heavy agents | kimi-k2.6 | agent | Best first pick for tools, long sessions, screenshots |
| Deep reasoning | deepseek-r1 | reasoning | Best for logic, math, and difficult analysis |
| Fast coding loops | qwen-3-32b | fast | Lower latency for code/debug loops |
| Code-specialized output | qwen-3-coder | code | Code-focused model with longer context |
| Long documents | gemini-2.5-flash | long-context | 1M context window |
| Vision | gemma-4-31b | vision | Cheapest strong multimodal option |
| Balanced open model | llama-3.3-70b | balanced | Good compromise between quality and cost |
| Live web search | sonar | search | Real-time web search with citations for current info |
| Cheap automation | glm-4.5-air | — | Cost-sensitive agentic workloads |
Use cases that matter
1. I just need one model
Start withqwen-3.6-plus.
That is the right answer most of the time if you are building:
- a chatbot
- a coding assistant
- an internal copilot
- a general-purpose product feature
2. I am building an agent
Start withkimi-k2.6.
If your agent:
- calls tools
- works across multiple steps
- reads screenshots or other visual context
- needs long sessions
kimi-k2.6 is the best first pick.
If you want a text-only engineering alternative, try glm-5.1.
3. I care about cost
Start withdeepseek-v3 for strong value.
If the workload is more repetitive and automation-heavy than quality-sensitive, consider:
glm-4.5-airgpt-oss-120b
4. I need deep reasoning
Usedeepseek-r1.
This is the right pick for:
- hard analysis
- logic-heavy tasks
- math
- planning where quality matters more than speed
5. I need long context
Usegemini-2.5-flash.
If you want cheaper long-context throughput and do not need multimodal input, look at glm-4.7-flash.
6. I need live web data
Usesonar (alias search).
It runs a real-time web search on every request and returns a current, cited answer — for news, prices, releases, and anything that changes after a model’s training cutoff. For deeper multi-step research and longer reports, use sonar-pro.
Both bill a small flat web-search fee on top of tokens (see pricing), and neither supports tool calling — the search happens internally, so you just ask a question.
Multimodal
Image and video models bill per call (or per second of video) instead of per token, and run through a separate async endpoint - see/v1/images/generations and /v1/videos/generations.
Image (per-image pricing)
| Model | Best for | Price |
|---|---|---|
flux-1.1-ultra | Cinematic photo, hero shots, editorial | $0.081 / image |
flux-kontext-pro | Image-to-image edit, inpaint, refinement | $0.054 / image |
ideogram-v3 | Typography, packaging, posters, logos | $0.108 / image |
recraft-v3 | Vector illustration, brand assets | $0.054 / image |
Video (per-second pricing)
| Model | Best for | Duration | Price |
|---|---|---|---|
kling-2.5-pro | Budget cinematic, brand b-roll | 5 or 10s | $0.0945 / sec |
kling-3-pro | Premium cinematic, hero brand video | 3-15s | $0.1512 / sec |
kling-3-pro-audio | Cinematic with native audio + dialogue | 3-15s | $0.2268 / sec |
seedance-2-pro | Multi-shot action, social with audio | 4-15s | $0.4096 / sec |
seedance-2-fast | Social shorts, rapid iteration | 4-15s | $0.3266 / sec |
image_url to switch into image-to-video mode without changing the model ID.
Canonical sources
For the current live catalog, use:Switching models
You do not need to change your integration. Just change themodel parameter: