Why API caching isn’t always the best caching strategy
In the last article, we talked about solving problems in stressful situations and about remembering to load-test your solution/product before releasing it to the public, because you cannot always predict how your system will behave under heavy load.
The problem: scaling API requests
The problem with scaling our system was that every user hit our API every 15-20 seconds, which put a lot of load on both the API and the database. We tried mitigating this with a database proxy, but users were still requesting the API every 15-20 seconds. That meant that if every API request spawned a new Lambda container, we would end up with as many containers as we had database connections.
API caching
At that point, we knew we needed to reduce the number of calls made to the API. One option was to increase the interval at which every user polled the API, but that would leave users with outdated stats and remove the real-time aspect from the product.
Alternatively, we could implement API caching. That would reduce the number of requests actually reaching our API, which in turn would mean fewer invocations and fewer new instances being created when the API needs to scale.
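The idea can be sketched with a minimal in-memory TTL cache in front of a backend call. The names (`fetchStats`, the 30-second TTL) are illustrative, not our production setup; the point is simply that only cache misses reach the backend:

```typescript
type CacheEntry = { value: string; expiresAt: number };

const cache = new Map<string, CacheEntry>();
let backendCalls = 0;

// Stand-in for the real statistics endpoint hitting the database.
async function fetchStats(matchId: string): Promise<string> {
  backendCalls++;
  return `stats-for-${matchId}`;
}

// Cached wrapper: only misses fall through to the backend.
async function getStats(matchId: string, ttlMs = 30_000): Promise<string> {
  const entry = cache.get(matchId);
  if (entry && entry.expiresAt > Date.now()) {
    return entry.value; // cache hit: no backend call, no new instance spun up
  }
  const value = await fetchStats(matchId); // cache miss
  cache.set(matchId, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

With a short TTL, users still see near-real-time stats, but repeated requests within the TTL window never touch the API or the database.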
The solution
Because most of our BLAST.tv infrastructure runs on serverless architecture on AWS, we also use AWS API Gateway as our API proxy to control which requests are routed where.
API Gateway has caching built in as an underlying feature, which you can enable with the click of a button for a specific endpoint and method. We chose to try API caching on our statistics endpoint with API Gateway, so that users hitting the cache would receive the cached response.
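The console toggle corresponds to a handful of stage settings. As a sketch in infrastructure code, assuming `aws-cdk-lib` (we enabled it through the console, so the resource names and the TTL here are illustrative), it looks roughly like this:

```typescript
import { Stack, Duration } from "aws-cdk-lib";
import * as apigateway from "aws-cdk-lib/aws-apigateway";

// Inside a Stack's constructor (stack declared elsewhere):
declare const stack: Stack;

const api = new apigateway.RestApi(stack, "StatsApi", {
  deployOptions: {
    cacheClusterEnabled: true, // provision the stage's cache cluster
    methodOptions: {
      // Enable caching only for GET on the statistics endpoint.
      "/stats/GET": {
        cachingEnabled: true,
        cacheTtl: Duration.seconds(30), // illustrative TTL
      },
    },
  },
});
```

This is a configuration fragment rather than runnable application code; the relevant point is that caching is scoped per stage, endpoint, and method.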
Testing the solution
When we load-tested the API caching, we still saw the same spikes in created Lambda instances and database connections. So we dug into what was causing the problem and how we could fix it.
After reading up on how API Gateway's caching works, we came across a problem called the "thundering herd" problem. It occurs when many requests miss on the same cache key at the same time and all hit the underlying API/database simultaneously. The effect is the same as if every user had requested the API directly, without the cache being there.
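The effect is easy to reproduce with a naive cache in front of a slow backend. In this sketch (names and numbers are illustrative), 100 concurrent requests arrive while the cached entry is cold or expired; every one of them misses, and every one of them hits the backend:

```typescript
const herdCache = new Map<string, string>();
let herdBackendCalls = 0;

// Stand-in for a slow database query behind the endpoint.
async function slowBackend(key: string): Promise<string> {
  herdBackendCalls++;
  await new Promise((resolve) => setTimeout(resolve, 50));
  return `value-for-${key}`;
}

async function getCached(key: string): Promise<string> {
  const hit = herdCache.get(key);
  if (hit !== undefined) return hit;
  // Naive cache: nothing stops concurrent misses from piling up here.
  const value = await slowBackend(key);
  herdCache.set(key, value);
  return value;
}

// Simulate `users` concurrent requests for the same cold key.
async function simulateHerd(users: number): Promise<number> {
  await Promise.all(
    Array.from({ length: users }, () => getCached("stats"))
  );
  return herdBackendCalls; // every request missed and hit the backend
}
```

Because all the requests check the cache before the first backend call has finished, the cache protects nothing: the backend sees the full burst, exactly as if no cache existed.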
With this solution, our problem would still exist, because of our users' access pattern: during a broadcast/event, everybody requests the same data from the same endpoint at the same time. We would need to figure out a better caching strategy.
There are other caching methods we can use to resolve this issue. In a later article in this series, we will look at how these methods can solve our scaling problem and what we need to implement to make them work.