news 2026-04-24 · deepseek-blog

💰 DeepSeek Cuts API Costs by 90% With Automatic Context Caching


If you're sending the same long document to an AI model over and over, paying full price each time, DeepSeek just solved your problem without you lifting a finger.

Context Caching is a new feature enabled by default on all DeepSeek API calls. When your request shares the same prefix as a previous one, the system pulls the overlapping content from disk cache instead of reprocessing it.
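Because matching is prefix-based, how you order your prompt matters: stable content (system prompt, document) should come first and the varying part last, so consecutive requests share the longest possible prefix. A minimal sketch of that pattern; `build_messages` and `DOCUMENT` are illustrative names, not part of any DeepSeek SDK:

```python
# Illustrative sketch of prefix-friendly prompt construction.
# The cache matches on the shared leading content of a request, so we
# keep the reused document in front and put the changing question last.
DOCUMENT = "...long contract text..."  # placeholder for a real document

def build_messages(question: str) -> list[dict]:
    # The first message is byte-identical across calls, so on a repeat
    # request its tokens can be served from cache; only the short
    # question at the end needs full-price processing.
    return [
        {"role": "system",
         "content": "You answer questions about the document below.\n" + DOCUMENT},
        {"role": "user", "content": question},
    ]

a = build_messages("What is the termination clause?")
b = build_messages("Who are the parties?")
assert a[0] == b[0]  # identical prefix across the two requests
```

The inverse ordering (question first, document last) would defeat the cache, since the prefixes would diverge at the very first token.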

The result? A 90% cost reduction on cached tokens: from 1 yuan per million tokens down to just 0.1 yuan.

No code changes needed. No configuration. Just use the API as usual and watch your bills shrink.

**What it means in practice:**

The API response now includes `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` so you can see exactly how much you're saving.
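A small sketch of how you might turn those two fields into a yuan figure, using the prices quoted above. The field names come from the announcement; `prompt_cost` and the usage dict shape are assumptions for illustration, not a documented SDK helper:

```python
# Prices from the announcement: 1 yuan per million uncached prompt
# tokens, 0.1 yuan per million cache-hit tokens.
PRICE_MISS = 1.0 / 1_000_000   # yuan per uncached token
PRICE_HIT = 0.1 / 1_000_000    # yuan per cached token

def prompt_cost(usage: dict) -> float:
    """Estimated input cost in yuan from a response's usage block."""
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    return hit * PRICE_HIT + miss * PRICE_MISS

# Example: a 100k-token document, fully reprocessed on the first call,
# then served almost entirely from cache on the second.
first = prompt_cost({"prompt_cache_hit_tokens": 0,
                     "prompt_cache_miss_tokens": 100_000})   # 0.1 yuan
second = prompt_cost({"prompt_cache_hit_tokens": 100_000,
                      "prompt_cache_miss_tokens": 50})
```

In this example the second call's input cost drops to roughly a tenth of the first, which is where the headline 90% figure comes from.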

The only catch: content under 64 tokens won't be cached, and unused caches expire within hours to days. But for most real-world use cases (document Q&A, multi-turn conversations, and repeated prompts) this is essentially free money.

📄 Source

deepseek-blog