Choosing a Model - Kyma API

Kyma exposes active models through one OpenAI-compatible endpoint. The hard part is not integration; it is choosing the right model for the job. If you just want the short version:

Use qwen-3.6-plus for the best default
Use kimi-k2.6 for tool-heavy agents
Use deepseek-r1 for hard reasoning
Use gemini-2.5-flash for 1M context

See "Which model should I use?" for the full decision page.

Quick decision guide

If you need…	First pick	Alias	Why
Safest default	`qwen-3.6-plus`	`best`	Best overall quality across common tasks
Tool-heavy agents	`kimi-k2.6`	`agent`	Best first pick for tools, long sessions, screenshots
Deep reasoning	`deepseek-r1`	`reasoning`	Best for logic, math, and difficult analysis
Fast coding loops	`qwen-3-32b`	`fast`	Lower latency for code/debug loops
Code-specialized output	`qwen-3-coder`	`code`	Code-focused model with longer context
Long documents	`gemini-2.5-flash`	`long-context`	1M context window
Vision	`gemma-4-31b`	`vision`	Cheapest strong multimodal option
Balanced open model	`llama-3.3-70b`	`balanced`	Good compromise between quality and cost
Live web search	`sonar`	`search`	Real-time web search with citations for current info
Cheap automation	`glm-4.5-air`	—	Cost-sensitive agentic workloads
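The table can also be encoded as a small routing map if you pick models in code. This is a minimal sketch: the need labels and the `pick_model` helper are hypothetical, while every alias and model ID comes from the table.

```python
# Hypothetical routing map built from the quick decision guide.
# Keys are illustrative need labels; values are (alias, model ID) from the table.
NEEDS = {
    "default": ("best", "qwen-3.6-plus"),
    "agent": ("agent", "kimi-k2.6"),
    "reasoning": ("reasoning", "deepseek-r1"),
    "fast-coding": ("fast", "qwen-3-32b"),
    "code": ("code", "qwen-3-coder"),
    "long-context": ("long-context", "gemini-2.5-flash"),
    "vision": ("vision", "gemma-4-31b"),
    "balanced": ("balanced", "llama-3.3-70b"),
    "search": ("search", "sonar"),
    "cheap-automation": (None, "glm-4.5-air"),  # no alias listed in the table
}


def pick_model(need: str, prefer_alias: bool = True) -> str:
    """Return the value to pass as `model`, falling back to the default pick."""
    alias, model_id = NEEDS.get(need, NEEDS["default"])
    # Aliases resolve server-side (e.g. agent -> kimi-k2.6); a model ID names the model explicitly.
    if prefer_alias and alias is not None:
        return alias
    return model_id


print(pick_model("agent"))                      # agent
print(pick_model("agent", prefer_alias=False))  # kimi-k2.6
print(pick_model("cheap-automation"))           # glm-4.5-air
print(pick_model("unknown"))                    # best
```

Unknown needs fall back to the safest default, mirroring the "just need one model" advice below.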

Use cases that matter

1. I just need one model

Start with qwen-3.6-plus. That is the right answer most of the time if you are building:

a chatbot
a coding assistant
an internal copilot
a general-purpose product feature

2. I am building an agent

Start with kimi-k2.6. If your agent:

calls tools
works across multiple steps
reads screenshots or other visual context
needs long sessions

then kimi-k2.6 is the best first pick. If you want a text-only engineering alternative, try glm-5.1.
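Because the endpoint is OpenAI-compatible, a tool-calling request presumably uses the standard OpenAI `tools` format. A minimal sketch, assuming Kyma passes that schema through to kimi-k2.6 unchanged; the `get_weather` tool and the `build_agent_request` helper are hypothetical.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_agent_request(prompt: str, model: str = "kimi-k2.6") -> dict:
    """Assemble keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [GET_WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


request = build_agent_request("Should I bring an umbrella in Oslo today?")
print(json.dumps(request, indent=2))
```

With a configured client you would send it as `client.chat.completions.create(**request)` and read any tool calls from `response.choices[0].message.tool_calls`. Passing `model="glm-5.1"` tries the text-only alternative with the same request shape.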

3. I care about cost

Start with deepseek-v3 for strong value. If the workload is more repetitive and automation-heavy than quality-sensitive, consider:

glm-4.5-air
gpt-oss-120b

4. I need deep reasoning

Use deepseek-r1. This is the right pick for:

hard analysis
logic-heavy tasks
math
planning where quality matters more than speed

5. I need long context

Use gemini-2.5-flash. If you want cheaper long-context throughput and do not need multimodal input, look at glm-4.7-flash.
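One way to apply this rule is to route only when an input outgrows your current model's window. A minimal sketch under two stated assumptions: token counts are estimated with a rough 4-characters-per-token heuristic, and the `current_window` value in the usage lines is a placeholder, not a documented limit. `estimate_tokens` and `pick_context_model` are hypothetical helpers.

```python
import math

# Rough heuristic (an assumption): ~4 characters per token for English prose.
# Real counts depend on each model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def pick_context_model(text: str, current_model: str, current_window: int,
                       text_only: bool = False) -> str:
    """Keep the current model if the input fits; otherwise route to a long-context pick."""
    if estimate_tokens(text) <= current_window:
        return current_model
    # gemini-2.5-flash has a 1M window; glm-4.7-flash is the cheaper text-only option.
    return "glm-4.7-flash" if text_only else "gemini-2.5-flash"


doc = "x" * 600_000  # ~150k estimated tokens
# 128_000 is a placeholder window; look up the real value via /v1/models.
print(pick_context_model(doc, "qwen-3.6-plus", current_window=128_000))
print(pick_context_model(doc, "qwen-3.6-plus", current_window=128_000, text_only=True))
```

Inputs larger than even a 1M window still need to be split before sending.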

6. I need live web data

Use sonar (alias search). It runs a real-time web search on every request and returns a current, cited answer, which makes it the right pick for news, prices, releases, and anything that changes after a model’s training cutoff. For deeper multi-step research and longer reports, use sonar-pro. Both bill a small flat web-search fee on top of tokens (see pricing). Neither supports tool calling: the search happens internally, so you just ask a question.
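A live-search request is an ordinary chat request with no `tools` field. A minimal sketch, assuming the same OpenAI-style client as elsewhere on this page; `build_search_request` is a hypothetical helper, and it does not parse citations because their response format is not described here.

```python
def build_search_request(question: str, deep: bool = False, tools=None) -> dict:
    """Assemble chat.completions kwargs for a live web-search question."""
    if tools:
        # Per this page, sonar and sonar-pro do not support tool calling.
        raise ValueError("sonar models search internally and do not accept tools")
    return {
        # sonar for quick current answers; sonar-pro for multi-step research.
        "model": "sonar-pro" if deep else "sonar",
        "messages": [{"role": "user", "content": question}],
    }


print(build_search_request("What changed in the latest Python release?"))
```

Failing early on `tools` keeps an agent loop from accidentally routing a tool-calling turn to a search model.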

Multimodal

Image and video models bill per image or per second of video instead of per token, and run through separate async endpoints: /v1/images/generations and /v1/videos/generations.

Image (per-image pricing)

Model	Best for	Price
`flux-1.1-ultra`	Cinematic photo, hero shots, editorial	$0.081 / image
`flux-kontext-pro`	Image-to-image edit, inpaint, refinement	$0.054 / image
`ideogram-v3`	Typography, packaging, posters, logos	$0.108 / image
`recraft-v3`	Vector illustration, brand assets	$0.054 / image
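Because image models bill per image, a batch budget is simple multiplication. A minimal sketch using the prices in the table; `image_batch_cost` is a hypothetical helper and does not account for any fees not listed here.

```python
from decimal import Decimal

# Per-image prices (USD) from the table above.
IMAGE_PRICES = {
    "flux-1.1-ultra": Decimal("0.081"),
    "flux-kontext-pro": Decimal("0.054"),
    "ideogram-v3": Decimal("0.108"),
    "recraft-v3": Decimal("0.054"),
}


def image_batch_cost(model: str, n_images: int) -> Decimal:
    """Estimated spend for n_images calls, billed per image rather than per token."""
    return IMAGE_PRICES[model] * n_images


print(image_batch_cost("flux-1.1-ultra", 10))  # 0.810
print(image_batch_cost("ideogram-v3", 25))     # 2.700
```

`Decimal` avoids binary floating-point rounding when summing many small per-image prices.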

Video (per-second pricing)

Model	Best for	Duration	Price
`kling-2.5-pro`	Budget cinematic, brand b-roll	5 or 10s	$0.0945 / sec
`kling-3-pro`	Premium cinematic, hero brand video	3-15s	$0.1512 / sec
`kling-3-pro-audio`	Cinematic with native audio + dialogue	3-15s	$0.2268 / sec
`seedance-2-pro`	Multi-shot action, social with audio	4-15s	$0.4096 / sec
`seedance-2-fast`	Social shorts, rapid iteration	4-15s	$0.3266 / sec

All five video models accept an image_url to switch into image-to-video mode without changing the model ID.
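Video cost scales with clip length, so it helps to validate the duration and estimate spend before submitting. A minimal sketch using the prices and duration ranges from the table, assuming durations are whole seconds. `video_cost` and `build_video_request` are hypothetical helpers, and the `prompt` and `duration` field names are assumptions; only the model ID and `image_url` are named on this page.

```python
from decimal import Decimal

# Per-second prices (USD) and allowed durations from the table above.
# Assumption: durations are whole seconds within the listed range.
VIDEO_MODELS = {
    "kling-2.5-pro": (Decimal("0.0945"), {5, 10}),
    "kling-3-pro": (Decimal("0.1512"), set(range(3, 16))),
    "kling-3-pro-audio": (Decimal("0.2268"), set(range(3, 16))),
    "seedance-2-pro": (Decimal("0.4096"), set(range(4, 16))),
    "seedance-2-fast": (Decimal("0.3266"), set(range(4, 16))),
}


def video_cost(model: str, seconds: int) -> Decimal:
    """Estimated cost of one clip, rejecting durations the model does not list."""
    price, allowed = VIDEO_MODELS[model]
    if seconds not in allowed:
        raise ValueError(f"{model} does not support {seconds}s clips")
    return price * seconds


def build_video_request(model: str, prompt: str, seconds: int, image_url=None) -> dict:
    """Hypothetical request body; `prompt` and `duration` field names are assumed."""
    video_cost(model, seconds)  # validate duration before sending anything
    body = {"model": model, "prompt": prompt, "duration": seconds}
    if image_url:
        # Same model ID; adding image_url switches to image-to-video mode.
        body["image_url"] = image_url
    return body


print(video_cost("kling-3-pro", 10))     # 1.5120
print(video_cost("seedance-2-pro", 15))  # 6.1440
```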

Canonical sources

For the current live catalog, use:

Switching models

You do not need to change your integration. Just change the model parameter:

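```python
from openai import OpenAI
# Assumed setup: OpenAI SDK pointed at Kyma's endpoint (base URL inferred from /v1/models)
client = OpenAI(base_url="https://kymaapi.com/v1", api_key="YOUR_KYMA_API_KEY")

response = client.chat.completions.create(
    model="kimi-k2.6",  # was: "qwen-3.6-plus"
    messages=[{"role": "user", "content": "Write a Python function"}]
)
```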

Or use an alias:

```python
response = client.chat.completions.create(
    model="agent",  # resolves to kimi-k2.6
    messages=[{"role": "user", "content": "Plan the refactor and call tools as needed"}]
)
```

Discover models programmatically

```bash
curl https://kymaapi.com/v1/models
```

Examples:

```bash
# Agent-friendly models
curl "https://kymaapi.com/v1/models?recommended_for=agent&tools=true"

# Long-context models
curl "https://kymaapi.com/v1/models?min_context_window=128000"

# Vision-capable models
curl "https://kymaapi.com/v1/models?vision=true"
```
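The same filters can be built from Python with only the standard library. A minimal sketch, assuming the response follows the OpenAI list shape (`{"data": [{"id": ...}]}`) and needs no auth header, as in the curl examples; `models_url`, `model_ids`, and `list_models` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://kymaapi.com/v1/models"


def models_url(**filters) -> str:
    """Build a /v1/models URL; booleans are sent as lowercase true/false."""
    params = {k: str(v).lower() if isinstance(v, bool) else v for k, v in filters.items()}
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


def model_ids(payload: dict) -> list:
    """Extract IDs from an OpenAI-style list response."""
    return [m["id"] for m in payload.get("data", [])]


def list_models(**filters) -> list:
    # Network call; assumes the endpoint is reachable without extra headers.
    with urlopen(models_url(**filters)) as resp:
        return model_ids(json.load(resp))


print(models_url(recommended_for="agent", tools=True))
# https://kymaapi.com/v1/models?recommended_for=agent&tools=true
```

Calling `list_models(vision=True)` would mirror the last curl example and return just the model IDs.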
