💰 DeepSeek Cuts API Costs by 90% With Automatic Context Caching
If you're sending the same long document to an AI model over and over and paying full price each time, DeepSeek just solved your problem without you lifting a finger.
Context Caching is a new feature enabled by default on all DeepSeek API calls. When your request shares the same prefix as a previous one, the system pulls the overlapping content from disk cache instead of reprocessing it.
The result? A 90% cost reduction on cached tokens: from 1 yuan per million tokens down to just 0.1 yuan.
No code changes needed. No configuration. Just use the API as usual and watch your bills shrink.
**What it means in practice:**
- Analyzing a 50-page financial report with 10 follow-up questions? You pay full price once, then 90% less for each subsequent query.
- Running a chatbot with a long system prompt? The prompt gets cached automatically across conversations (sketched in the code after this list).
- Using few-shot examples? They're cached after the first call.
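To see why no code changes are needed, here's a minimal sketch of the long-system-prompt case using DeepSeek's OpenAI-compatible Python SDK. The `base_url` and model name follow DeepSeek's public docs; the prompt text and questions are made up for illustration.

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; the base_url and
# model name below follow its public documentation.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

# A long system prompt; it must exceed the 64-token minimum to be cached.
SYSTEM_PROMPT = (
    "You are a financial analyst assistant. Cite the report section you "
    "draw from, flag estimates versus audited figures, and answer in "
    "concise bullet points."  # imagine several hundred more tokens here
)

for question in ["Summarize Q3 revenue.", "What drove the margin change?"]:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            # Identical prefix on every call: this is what gets cached.
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    print(resp.choices[0].message.content)
# No caching-specific parameters appear anywhere above: the second call's
# system prompt is served from the disk cache automatically.
```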
The API response now includes `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` so you can see exactly how much you're saving.
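Continuing the sketch above, here's one way to read those counters and turn them into a savings estimate. The prices are the ones quoted in this post; the dict-dump access pattern is an assumption, since the exact shape of the usage object depends on your client library.

```python
# Estimate input-token cost from the cache counters, at the prices quoted
# above: 0.1 yuan per million cached tokens, 1 yuan per million uncached.
def input_cost_yuan(hit_tokens: int, miss_tokens: int) -> float:
    return hit_tokens / 1e6 * 0.1 + miss_tokens / 1e6 * 1.0

# Dump the SDK's usage object to a plain dict so the DeepSeek-specific
# fields are reachable regardless of the client's typed model.
usage = resp.usage.model_dump()
hit = usage.get("prompt_cache_hit_tokens", 0)
miss = usage.get("prompt_cache_miss_tokens", 0)

print(f"cache hits: {hit} tokens, misses: {miss} tokens")
print(f"input cost: {input_cost_yuan(hit, miss):.4f} yuan "
      f"(vs {input_cost_yuan(0, hit + miss):.4f} at full price)")
```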
The only catch: content under 64 tokens won't be cached, and unused caches expire within hours to days. But for most real-world use cases (document Q&A, multi-turn conversations, and repeated prompts) this is essentially free money.
🔗 Source
deepseek-blog