As developers grow increasingly dependent on Artificial Intelligence, cost management is becoming a serious problem: repeatedly running long prompts through APIs gets expensive fast. To address this, Anthropic has introduced Prompt Caching, which it says can reduce costs by up to 90%.
Highlights:
- Anthropic has introduced Prompt Caching for its Claude API.
- It helps reduce costs by up to 90% and latency by up to 85% for long prompts.
- Anthropic's Claude and Google's Gemini offer prompt caching features; OpenAI does not yet.
What is Prompt Caching?
Let’s start with the kind of caching we are already familiar with. Caching is the process of storing copies of data in a cache, a temporary storage location, so that it can be accessed more quickly.
Prompt caching applies the same idea to prompts: frequently reused parts of a prompt are stored temporarily for quick access. When you send a prompt to an LLM such as ChatGPT or Claude, the model must process the request and perform many complex computations to generate a response.
In many applications, the same prompt (or the same long prompt prefix) is sent repeatedly, which drives up costs, especially for API-based services. With prompt caching, the repeated portion is processed once and then reused from the cache instead of being recomputed from scratch.
Prompt caching therefore saves both money and time: the cached portion of a prompt does not have to be reprocessed on every request, so repeated queries are answered faster. It also makes better use of computational resources, which in turn saves energy.
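To make the general idea concrete, here is a toy Python sketch of caching. It memoizes whole responses keyed by a hash of the prompt, which is a simplification: Anthropic's feature caches the processed prompt prefix on its servers rather than finished answers, but the save-once, reuse-many-times principle is the same.

```python
import hashlib

# Toy in-memory cache mapping a prompt's hash to a previously computed response.
# This only illustrates the caching idea; real prompt caching happens server-side
# and stores the processed prompt prefix, not finished answers.
_cache = {}

def cached_answer(prompt: str, compute):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]        # cache hit: skip the expensive work entirely
    result = compute(prompt)      # cache miss: do the expensive computation once
    _cache[key] = result          # store it so the next identical prompt is free
    return result
```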
Let’s see how Anthropic’s prompt caching works. When you make an API call with the feature enabled, the API checks whether the designated parts of your prompt have already been cached by a recent query; if they have, it reuses them. The initial API call is slightly more expensive because the prompt must be written to the cache (a cache write), but subsequent calls that hit the cache are much cheaper.
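As a rough sketch of what this looks like with Anthropic's Python SDK (the feature launched behind a beta header, and the header, model name, and minimum cacheable length below reflect that launch and may change), you mark the reusable portion of the prompt with a cache_control block:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # Beta header required while prompt caching is in beta.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            # The long, reusable context (it must exceed a minimum token count
            # to be cached, e.g. roughly 1024 tokens for Sonnet at launch).
            "text": "You are a support assistant for our cafe. Here is the full FAQ: ...",
            # First call writes this block to the cache (cache write);
            # later calls within the cache lifetime reuse it cheaply (cache read).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Is the cafe open at 8 pm?"}],
)
print(response.content[0].text)
```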
Anthropic has released prompt caching for its Claude 3.5 Sonnet and Claude 3 Haiku models; it is yet to be released for Claude 3 Opus. According to the pricing the company has published, writing a prompt to the cache costs about 25% more than base input tokens, while reading it back from the cache costs only about 10% of the base input token price.
According to Anthropic, Prompt Caching can reduce costs by up to 90% and latency by up to 85% for long prompts.
How is Prompt Caching Useful?
This release has several use cases. Let’s take a look at some of them:
- Company chatbots: Customers ask many of the same questions over and over. Visitors to a café’s website, for instance, tend to ask: What is on the menu? Which coffee varieties are available? Is the café open at 8 pm? Does it serve tiramisu? Is UPI accepted? FAQs like these can be kept in the cached part of the prompt and answered quickly and cheaply.
- Developers can load an entire codebase into the prompt and use the LLM as a coding assistant, without paying to reprocess the code on every request.
- Large documents can be cached and queried repeatedly for specific information. This is helpful in law, for example, where lawyers can upload the details of various cases and look up particulars across them.
- You can talk to books, papers, documentation, podcast transcripts, and other long-form content, asking questions about it repeatedly with prompt caching; a sketch of this workflow follows below.
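As a sketch of the long-document use case (same assumptions as the earlier example; the usage fields shown are what the API reported for cache writes and cache reads at launch, and the file name is hypothetical), repeated questions against a cached case file might look like this:

```python
import anthropic

client = anthropic.Anthropic()
BETA = {"anthropic-beta": "prompt-caching-2024-07-31"}  # launch-era beta header
long_document = open("case_file.txt").read()  # hypothetical large document

def ask(question: str):
    return client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        extra_headers=BETA,
        system=[
            {"type": "text", "text": "Answer questions using the document below."},
            {
                "type": "text",
                "text": long_document,
                "cache_control": {"type": "ephemeral"},  # cache the big document
            },
        ],
        messages=[{"role": "user", "content": question}],
    )

first = ask("Summarize the key facts of the case.")
second = ask("What remedies did the plaintiff request?")
# The first call pays the cache-write premium; the second is billed mostly
# at the cheaper cache-read rate (visible in the response usage fields).
print(first.usage.cache_creation_input_tokens, second.usage.cache_read_input_tokens)
```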
But there are some security concerns. Thomas Randall, director of AI market research at Info-Tech Research Group, has said:
“While prompt caching is the right direction for performance optimization and greater usage efficiency, it is important to highlight security best practices when utilizing caching within programming. If prompts are shared across (or between) organizations that are not reset or reviewed appropriately, sensitive information within a cache may inadvertently be passed on.”
Another limitation is the cache’s 5-minute lifetime. Many are comparing the feature with Gemini’s Context Caching: Gemini charges $4.50 per million tokens per hour to keep the context cache warm, whereas Anthropic charges for cache writes and its “cache has a 5-minute lifetime, refreshed each time the cached content is used”.
Conclusion
Prompt caching is a significant development in the field of AI, and if more companies adopt the approach, it could soon become standard practice. Meanwhile, if you have not tried this AI model yet, here are Claude 3 Prompts to test it with.