See real-time cache stats across all models on the Rankings page.
How It Works
- First request: Full prompt is processed and cached by the provider
- Subsequent requests: Cached prefix is reused — up to 90% cheaper and 80% faster
Real-World Impact
Based on production data from the Kyma community (7-day rolling window):- 22%+ overall cache hit rate across all models
- deepseek-v3 leads with 56% cache hit rate — heavy agentic usage
- gemini-2.5-flash and gemma-4-31b consistently hit 20-35%
- Coding agents (OpenClaw, Cline, Roo Code) see the highest cache rates due to repeated system prompts
Automatic Caching
For OpenAI-compatible requests, caching is automatic when your prompt exceeds 1,024 tokens. Place static content (system prompt, tool definitions) at the beginning:Best Practices
Structure prompts for caching
Place stable content first, dynamic content last:For coding agents
Coding agents (OpenClaw, Cline, Roo Code, Claude Code) automatically benefit from caching because they send the same system prompt + tool definitions with every request. Real production example — 50-request coding session with deepseek-v3:| Without caching | With caching (56% hit rate) | |
|---|---|---|
| Effective input tokens | 250,000 | 110,000 |
| Input cost | $0.203 | $0.049 |
| Savings | — | $0.154 (76%) |
What to avoid
- Don’t put timestamps or request IDs in system prompts — breaks cache
- Don’t reorder tool definitions between requests
- Keep system prompt identical across requests
Cache Stats in Response
Kyma normalizes cache statistics from all providers into a unified format:| Field | Description |
|---|---|
cached_tokens | Tokens read from cache (90% discounted) |
cache_write_tokens | Tokens written to cache on first request |
cost | Total cost charged for this request (USD) |
cache_discount | Amount saved from caching (USD) |
Tracking Your Savings
Per-request
Every API response includesusage.cost (what you paid) and usage.cache_discount (what you saved). Sum these over your session to track total savings.
Community-wide
Visit the Cache Stats rankings to see:- Overall cache hit rate across all Kyma users
- Per-model cache breakdown (cached vs uncached vs output tokens)
- Total community savings in USD
Supported Models
All models on Kyma support prompt caching. Cache effectiveness varies by model — Kyma normalizes the behavior so you always see the samecached_tokens shape and the same 90% discount.
Check which models are actively caching:
Pricing
Cached tokens are charged at 10% of the normal input price (90% discount).| Token type | Rate |
|---|---|
| Input (non-cached) | Full price |
| Input (cached) | 10% of input price |
| Output | Full price |
| Price per 1M tokens | |
|---|---|
| Input (full) | $0.810 |
| Input (cached) | $0.081 |
| Output | $2.295 |
usage.cost and usage.cache_discount fields in every response let you track savings in real-time.