💰 DeepSeek Cuts API Costs by 90% With Automatic Context Caching
If you're sending the same long document to an AI model over and over and paying full price each time, DeepSeek just solved your problem without you lifting a finger.
Context Caching is a new feature enabled by default on all DeepSeek API calls. When your request shares the same prefix as a previous one, the system pulls the overlapping content from disk cache instead of reprocessing it.
The result? A 90% cost reduction on cached tokens: from 1 yuan per million tokens down to just 0.1 yuan.
No code changes needed. No configuration. Just use the API as usual and watch your bills shrink.
**What it means in practice:**
- Analyzing a 50-page financial report with 10 follow-up questions? You pay full price once, then 90% less for each subsequent query.
- Running a chatbot with a long system prompt? The prompt gets cached automatically across conversations (sketched in the code after this list).
- Using few-shot examples? They're cached after the first call.
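To see why no code changes are needed, here's a minimal sketch of the long-system-prompt case using DeepSeek's OpenAI-compatible Python SDK. The `base_url` and model name follow DeepSeek's public docs; the prompt text and questions are made up for illustration.

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; the base_url and
# model name below follow its public documentation.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

# A long system prompt; it must exceed the 64-token minimum to be cached.
SYSTEM_PROMPT = (
    "You are a financial analyst assistant. Cite the report section you "
    "draw from, flag estimates versus audited figures, and answer in "
    "concise bullet points."  # imagine several hundred more tokens here
)

for question in ["Summarize Q3 revenue.", "What drove the margin change?"]:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            # Identical prefix on every call: this is what gets cached.
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    print(resp.choices[0].message.content)
# No caching-specific parameters appear anywhere above: the second call's
# system prompt is served from the disk cache automatically.
```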
The API response now includes `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` so you can see exactly how much you're saving.
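Continuing the sketch above, here's one way to read those counters and turn them into a savings estimate. The prices are the ones quoted in this post; the dict-dump access pattern is an assumption, since the exact shape of the usage object depends on your client library.

```python
# Estimate input-token cost from the cache counters, at the prices quoted
# above: 0.1 yuan per million cached tokens, 1 yuan per million uncached.
def input_cost_yuan(hit_tokens: int, miss_tokens: int) -> float:
    return hit_tokens / 1e6 * 0.1 + miss_tokens / 1e6 * 1.0

# Dump the SDK's usage object to a plain dict so the DeepSeek-specific
# fields are reachable regardless of the client's typed model.
usage = resp.usage.model_dump()
hit = usage.get("prompt_cache_hit_tokens", 0)
miss = usage.get("prompt_cache_miss_tokens", 0)

print(f"cache hits: {hit} tokens, misses: {miss} tokens")
print(f"input cost: {input_cost_yuan(hit, miss):.4f} yuan "
      f"(vs {input_cost_yuan(0, hit + miss):.4f} at full price)")
```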
The only catch: content under 64 tokens won't be cached, and unused caches expire within hours to days. But for most real-world use cases (document Q&A, multi-turn conversations, and repeated prompts) this is essentially free money.
🔗 Source
deepseek-blog