Blog

Thoughts on software development, tech trends, and personal growth.

In the last article, we discussed why API caching isn't always the best strategy for highly requested endpoints whose data is heavy to calculate. The root of the problem with API caching here is the "thundering herd" problem; you can read more about what it is in the previous article in the series.

Other caching methods to use

Read-through

One answer to the problem above is the caching strategy called "read-through" caching. In our case, the cache sits between our users and our API, preventing requests from hitting the API, querying the database, or invoking a new instance of the Lambda function.

Write-through

Another way to cache heavy calculations from a database is write-through caching. This strategy differs from read-through in that data is first written to the cache and afterwards to the database. If a user posted new data, it would be inserted into the cache first, then into the database. In our case, however, no users create new entries in our database. Instead, a processor processes the data, calculates the statistics, and inserts them into the database. How can we scale this so people can request this kind of data without querying the database directly?

The golden era of static files

You have probably seen programming memes about using static files or even Excel spreadsheets as your database. Static files and spreadsheets may be a bad idea as a primary database, but not for caching data. The reason static files are good for serving data is that they scale almost infinitely, with the caveat that the data needs to be the same for all users. If you set up a CloudFront distribution in front of an S3 bucket, you can use the scaling CDN to distribute your data globally, bringing it even closer to the end user.
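The two strategies above differ mainly in when the cache is populated. A minimal sketch, assuming a generic in-memory key/value store rather than any real cache product:

```typescript
// A minimal sketch of the two patterns. The Store type and function
// names are illustrative, not a real caching library's API.
type Store = Map<string, string>;

// Read-through: on a miss, the cache layer itself loads from the
// database and remembers the result.
function readThrough(
  key: string,
  cache: Store,
  loadFromDb: (k: string) => string,
): string {
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: the DB is never touched
  const value = loadFromDb(key);     // cache miss: load once...
  cache.set(key, value);             // ...and populate the cache
  return value;
}

// Write-through: writes go to the cache first, then to the database,
// so subsequent reads always find a warm cache.
function writeThrough(key: string, value: string, cache: Store, db: Store): void {
  cache.set(key, value);
  db.set(key, value);
}
```

In our case, the expensive `loadFromDb` step would be the heavy statistics query that the cache exists to shield.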
This caching strategy works perfectly for our use case: the real-time statistics data is the same for every user and comes from heavy calculations that should only run every couple of seconds. The method requires a processor running on a cron job that calculates the statistics and inserts them into the bucket, from which the CDN distributes them. For our use case, the processor only needs to run while a match is ongoing.

Implementation of static files as a caching strategy

In our system, we need to scale the data to as many users as possible in real time, so our users won't miss any in-game statistics. With traditional caching, you want to cache responses for as long as possible, to keep the many requests from reaching the service that produces the data. In our system, we still want to prevent users from requesting data directly from the S3 bucket, and the CloudFront distribution in front of it does exactly that. When we want to update the in-game statistics, we need to invalidate the cache on the CloudFront distribution. Invalidating tells CloudFront to delete the cached versions from all of its edge nodes around the globe. The next time a user requests the updated in-game statistics, CloudFront fetches the updated version from our S3 bucket and caches it on its edge nodes, so other users can access it.

Adding origin shield, but why?

Invalidating cached files on a CloudFront distribution deletes the stored file on every edge node globally. Once all files have been deleted from all edge nodes, CloudFront needs to fetch the file from its origin (the S3 bucket) again.
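The cron-driven processor described above could be sketched like this. The event shapes and names are my own, not BLAST.tv's actual code; in a real setup, the resulting JSON would be uploaded with PutObjectCommand from @aws-sdk/client-s3 and the path invalidated with CreateInvalidationCommand from @aws-sdk/client-cloudfront:

```typescript
// Sketch of the cron-driven processor: aggregate the latest match
// events into one JSON snapshot. The snapshot is one static file per
// match, identical for every user, so the CDN can cache it at the edge.
// (PlayerEvent is a hypothetical shape for this example.)
interface PlayerEvent {
  player: string;
  kills: number;
  deaths: number;
}

function buildStatsSnapshot(events: PlayerEvent[]): string {
  const totals = new Map<string, { kills: number; deaths: number }>();
  for (const e of events) {
    const t = totals.get(e.player) ?? { kills: 0, deaths: 0 };
    t.kills += e.kills;
    t.deaths += e.deaths;
    totals.set(e.player, t);
  }
  // The real processor would now upload this string to S3 and issue a
  // CloudFront invalidation for its path.
  return JSON.stringify({ players: Object.fromEntries(totals) });
}
```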
If many users request the same file at the same time, but CloudFront hasn't cached it yet, all of those requests go to the S3 bucket, and you can see a spike in reads on your bucket. This is where the origin shield comes to save the day. This is AWS CloudFront's own explanation of what an origin shield is:

Origin Shield is an additional layer in the caching infrastructure that helps to minimize your origin's load, improve its availability, and reduce its operating costs.

As stated above, an origin shield is an additional layer in how the CloudFront distribution caches data. The layer sits above all the edge nodes and prevents them from reading directly from the origin (the S3 bucket). Instead, the edge nodes read from the origin shield, which holds the data in its own cache.

Other technologies used for scaling real-time data

Another technology you can use to distribute real-time data to users is WebSockets. WebSockets run over an open TCP connection, which means there is no need for a new handshake per message, and data can be delivered with high performance. You can publish events/messages to subscribers, which then receive those events. In our case, we can use this to distribute our live statistics to the users who subscribed to them. A typical WebSocket infrastructure could look like this: the user first subscribes to the socket API and then receives events or messages from it over the socket connection.

Usually, you scale your APIs depending on the load. In our scenario, a live stream could have between 500 and 5,000 viewers, and having one server handle 5,000 socket connections wouldn't be reliable. Although scaling a stateless API is fairly easy, it's harder to scale an API that needs state. The state in this case is the list of all clients connected to all our WebSocket APIs.
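To make the shared-state problem concrete, here is a toy sketch of two API instances sharing a broadcast channel. The in-memory Bus stands in for a real pub/sub cluster such as Redis; everything here is illustrative, not socket.io's actual API:

```typescript
// Toy model of why a broadcast needs shared state: each ApiInstance
// only knows its own connected clients, so a message published on the
// Bus (our stand-in for a pub/sub cluster) must be forwarded to every
// instance for all clients to receive it.
type Handler = (msg: string) => void;

class Bus {
  private subs: Handler[] = [];
  subscribe(h: Handler): void { this.subs.push(h); }
  publish(msg: string): void { this.subs.forEach((h) => h(msg)); }
}

class ApiInstance {
  // Messages this instance has delivered to its own clients.
  received: string[] = [];
  constructor(bus: Bus) {
    bus.subscribe((msg) => this.received.push(msg));
  }
}
```

A POST to either instance would publish to the bus, so clients connected to both instances get the event.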
Since each API doesn't know which connections the other APIs have, it has no way to send messages to those clients. We can share this state between the APIs with a cluster, normally a database, that stores all the clients. socket.io, the popular JavaScript library for managing WebSockets, ships with several cluster adapters built in. In our situation, we want the cluster to react to changes quickly so it can deliver events/messages with high performance, so we are using Redis in this example. With our scaled APIs and our WebSocket cluster in place, the infrastructure looks like this: each of the WebSocket APIs is connected to the Redis cluster and both sends and receives information about the other APIs' connections.

But why do we need a cluster if we scale our APIs?

Without the cluster, each API only knows about its own connections. If we wanted to send a message, the send (POST) request would reach one of the APIs, and that API could only deliver the message to its own connections, not to the other APIs' clients.

What about server-sent events (SSE)?

You can also use server-sent events. One drawback of SSE is that it isn't bi-directional: the client can't send messages back to the server. For our live stats, that doesn't really matter, because the client should only consume messages, not send anything back.

Conclusion

We have gone through different methods of distributing live data to users, each with its own pros and cons. It's always good to discuss with a broader team or other people what you think is the best solution for serving real-time data to your users. Some questions worth asking yourself when deciding:

How well should it scale? Most of the time, static files are the answer.
Do we want clients to send information back? WebSockets are a good choice here and prevent a lot of API calls.
Is a simple setup good enough for our use case? If you know API caching is sufficient and you won't hit the thundering herd problem, API caching is the simplest solution.

Hopefully, this series has been useful to read. If you haven't read the other articles in the series, I highly recommend reading the other three; they explain the problem we faced, not only the solution. I hope this shared some insight into how to scale real-time data to users, the problems you can face, and how to prevent them. I can at least say that my team and I have learned a lot about how to tackle this and how to apply it in other situations. Thanks for reading the article (and maybe the series). Hope you have a wonderful day ☀️

microservices · serverless · caching · blast-tv
Feb 27, 2026 · 9 min read

In the last article, we talked about solving problems during stressful situations and about remembering to load-test your solutions/product before releasing them to the public, because sometimes you cannot predict how your system will behave under heavy load.

Scaling API requests, the problem

The problem with scaling our system was that our users hit our API every 15-20 seconds, which put a lot of load on both the API and the database. We tried resolving the issue with a database proxy, but users were still requesting the API every 15-20 seconds, which meant that if every API request created a new Lambda container, we would have as many containers as database connections.

API caching

At that point, we knew we needed to reduce the number of calls made to the API, which meant increasing the interval at which each user called it. But doing that would leave users with outdated stats and remove the real-time aspect from the product. Alternatively, we could implement API caching, which reduces the number of requests that reach our API, and with it the number of invocations and new instances created when the API scales.

The solution

Because most of our infrastructure on BLAST.tv runs on serverless architecture on AWS, we chose AWS API Gateway as our API proxy to control which endpoints route where. API Gateway has caching built in as an underlying feature, which you can enable with a click of a button for a specific endpoint and method. We tried enabling API caching on our statistics endpoint with API Gateway, so it would return the cached response to users who hit the cache.

Testing the solution

When we load-tested the API caching, we still saw the same spikes in created Lambda instances and database connections. So we dove into what caused the problem and how we could fix it.
After reading up on how API Gateway's caching works, we arrived at a problem called the "thundering herd". It occurs when many requests miss the same cache key at once and hit the underlying API/database simultaneously on the same endpoint, causing the same effect as if all users had bypassed the cache entirely. With this solution, our problem would still exist, because our users' access pattern is that during a broadcast/event, everybody requests the same data from the same endpoint. We would need a better caching strategy. There are other caching methods that can resolve this issue; in a later article in this series, we look at how these methods can solve our scaling issue, and what we need to implement for them to work.
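One application-level mitigation for the thundering herd is request coalescing (sometimes called single-flight): concurrent misses for the same key share one in-flight load, so the origin is hit once instead of N times. This is a sketch of the general technique, not something API Gateway's cache does for you:

```typescript
// Single-flight / request coalescing: concurrent cache misses for the
// same key await one shared promise, so the origin (database or
// downstream API) is hit once, not once per concurrent request.
const inflight = new Map<string, Promise<string>>();

async function coalesced(
  key: string,
  loadFromOrigin: (k: string) => Promise<string>,
): Promise<string> {
  const pending = inflight.get(key);
  if (pending) return pending; // someone is already loading this key
  const p = loadFromOrigin(key).finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}
```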

microservices · serverless · caching · blast-tv
Feb 20, 2026 · 3 min read

In the first article, I wrote about our live statistics system and why we implemented it on our website. In later articles, I want to dive deep into the problems we have had scaling the system to a growing number of users and what we did to tackle them.

Releasing the feature to the public

After putting the feature on our website, our first broadcast arrived: the Fall Final 2022 event in November. It was the first time we would show our users/fans the new real-time stats feature we had built over the previous five weeks, and we were stoked to see what the fans would think of it and how they would react. At the start, around 300-400 people were watching the live stream on our website, and everything ran smoothly, with no errors occurring yet. Then viewership grew by around a thousand people, and we started to see errors and users reporting that the stats weren't working for them. We investigated whether the API or the database was failing. It wasn't the API that was struggling, because we had built it on AWS's Lambda serverless infrastructure, which could scale to 1,500 running instances at once. The database, however, was struggling, with 1,000 open connections.

Pain with serverless computing scaling

After the event, we sat down to reflect on what caused the outage of real-time stats. We wrote a postmortem about what happened and investigated how to resolve the issue so that in the future, we could scale this feature to more people than we had at this event (around 1,200). The first place we looked was CloudWatch, where the metrics for the database and the API showed a clear correlation between Lambda concurrent executions and database connections.
We could see a lot of concurrent executions on the last day of the event, which made sense because many people tuned in to watch the grand final and the showmatch on BLAST.tv. From the metrics, we could conclude that every container Lambda spun up opened its own connection to the database, which meant we needed to implement some API caching or reduce the number of API calls each user made every x seconds to the API that returns the live stats.

Database proxy implementation

With the current API implementation, every single viewer sent a GET request to the statistics API, which queried the data and returned it to the user. Every x seconds we specified, each user requested the stats again, so with 1,500 concurrent viewers on our live stream, we had 1,500 requests every 15 seconds. The requests came in waves, because each user's interval timer starts when they load the live page. Our database wasn't happy with 1,500 heavy queries every 15-20 seconds, even though it can scale to 128 ACUs. The database was struggling because of the number of open connections, so we set up a database proxy in front of it, in the hope of reducing the load. The proxy sits in front of the database and handles connection pooling for us: when the Lambdas scale to hundreds of instances, they connect to the database proxy, which manages the connections and transactions to the actual database. The proxy lets us scale the Lambdas while keeping the number of database connections below a fixed threshold so the database won't crash, because it pools the connections and allows them to be re-used across queries.
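The pooling idea behind the proxy can be sketched in a few lines: a fixed set of connections is handed out and re-used, so hundreds of Lambda instances share a handful of real database connections. This is a toy model, not RDS Proxy itself:

```typescript
// Toy connection pool: at most `max` connections ever exist. Callers
// re-use released connections instead of opening new ones, capping the
// load on the database no matter how many callers there are.
class Pool {
  private free: number[] = [];
  private created = 0;
  constructor(private max: number) {}

  acquire(): number | null {
    if (this.free.length > 0) return this.free.pop()!; // re-use a released connection
    if (this.created < this.max) return this.created++; // open a brand-new one
    return null; // pool exhausted: the caller must wait and retry
  }

  release(conn: number): void {
    this.free.push(conn);
  }

  get openConnections(): number {
    return this.created;
  }
}
```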
Bug fixing under stressful situations

We tried implementing the database proxy during the actual event, while the problem was occurring. After deploying what we thought was the fix, nothing seemed to change: the database connections were still around the ~800 mark. After the event was over (we did our best to survive the remaining days of it), we sat down to figure out why the database proxy we deployed hadn't worked. After some research, we found that we had missed linking the Lambda function to the database proxy. There is a setting on the Lambda function where you select the database or database proxy it should connect to; it routes the Lambda's database connections through the proxy automatically, which reduces the CPU and memory usage of the database.

Conclusion

There are a few things the BLAST.tv team learned from this outage of live statistics during Fall Final 2022. First and foremost, we should have load-tested the system before launching it. We had only tested it in our development environment (with a maximum of 5 users), so we didn't know how the system would scale to hundreds of users during a live event. The second lesson is that even in a stressful situation, we shouldn't panic and start deploying all sorts of things. It's better to sit down, breathe a little, and work out which solutions are available and how to implement them. It's also better to deploy a solution to a development environment first, so you can test it before promoting it to production; doing so minimizes the downtime of a given service/system. Finally, it's a good idea to deploy one solution at a time. That way, you can test whether each solution fixes the problem and tell what worked and what didn't. Thank you for reading through this article.
In later articles, I want to explain why we didn't go with API caching, and the different types of caching methods there are.

microservices · serverless · blast-tv
May 31, 2024 · 7 min read

This is the first part of my first series, 'Real-Time Stats on BLAST.tv.' In this series, I want to show and explain how we work at BLAST and what technical/engineering challenges come with building new functionality for a large user base.

The purpose of Real-Time Statistics

I first want to discuss the 'Real-Time Statistics' on our platform, BLAST.tv. We wanted real-time statistics on BLAST.tv because it was one of the most requested features from our users, and we also aimed to enhance the way our viewers experience esports online. We believed that bringing live statistics closer to the end user would give them more insight into the game while watching the livestream. It would also let users discuss the statistics in the chat window we have on the website.

The history behind Real-Time Statistics on BLAST.tv

It all began with our team wanting to implement real-time statistics for Counter-Strike (the game). These statistics would display in-depth information for the match we are showcasing on BLAST.tv, similar to how the website HLTV does it. Counter-Strike has functionality that allows it to send game events occurring on the game server to a specific HTTP endpoint in the form of log lines. An example of a log line:

01/01/1970 - 00:00:00.000 - MatchStatus: Score: 0:0 on map "de_overpass" RoundsPlayed: 0

We ended up creating a relatively simple log processor that processed the log lines and inserted the data into a relational database (PostgreSQL). The data was split into entity events (match events) and player events (kills, deaths, assists, etc.). We also stored the 'loggedAt' timestamp, so we could list the events in the order they were sent. Splitting the events lets us query specific data depending on what the product needs.

The design

Our designers created a design for live statistics with two different views: a simple overview and a more in-depth one.
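A log line like the MatchStatus example above can be picked apart with a small regular expression. This is only a sketch; the field names are my own, not the processor's actual schema:

```typescript
// Parse a Counter-Strike MatchStatus log line of the form:
// 01/01/1970 - 00:00:00.000 - MatchStatus: Score: 0:0 on map "de_overpass" RoundsPlayed: 0
const MATCH_STATUS =
  /^(\d{2}\/\d{2}\/\d{4}) - (\d{2}:\d{2}:\d{2}\.\d{3}) - MatchStatus: Score: (\d+):(\d+) on map "([^"]+)" RoundsPlayed: (\d+)$/;

function parseMatchStatus(line: string) {
  const m = MATCH_STATUS.exec(line);
  if (!m) return null; // not a MatchStatus line
  return {
    loggedAt: `${m[1]} ${m[2]}`, // kept so events can be ordered by send time
    score: { team1: Number(m[3]), team2: Number(m[4]) },
    map: m[5],
    roundsPlayed: Number(m[6]),
  };
}
```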
The simple view gives users key statistics for each player, along with the current score on the map and information about the leading team. The two bars on either side indicate the number of maps each team has won; most Counter-Strike matches are played in a best-of-3 (BO3) format, so winning two maps wins the match. The detailed view shows more in-depth statistics for each player. When the user expands the left sidebar, they can choose between the two teams and then select a specific player from that team. Once a player is selected, they can view in-depth statistics such as Average Damage per Round (ADR), headshot percentage, and so on. When the detailed view is expanded, the bottom stats bar also expands to reveal the round progression over time, showing which team won each round. The user can click into a specific round and see detailed statistics about which player got first blood, how many kills each team achieved in that round, and who dealt the most damage.

The problems with scaling

As you can see from the designs, we needed to provide a lot of detailed data about the running match, which is why this became one of the biggest and most complicated systems we have to date. I have chosen to split the problems into separate articles, making it easier to understand the various challenges we encountered while scaling this product to thousands of users. I have focused the articles on the scaling issues rather than the actual logic behind the scenes. In the next article, you can read about the challenges we faced using serverless infrastructure to build this functionality, and why serverless isn't always the way forward for well-performing infrastructure.
I also discuss the different strategies we used to deliver real-time data to the end user and how we scaled them, diving deeper into how we scaled the data to thousands of users using caching in different ways. API caching isn't always the way to go, and I will argue why we chose to move away from it in favor of static files. Thank you for taking the time to read my article. Have a great day! ☀️

microservices · data · blast-tv
May 17, 2024 · 5 min read

If you are starting a startup or building the next big app for the App Store, I recommend trying the pair of Flutter and Firebase as a programming stack. Both are developed by Google: Flutter is their open-source cross-platform app framework, and Firebase is their BaaS (Backend-as-a-Service).

Arguments for using Flutter and Firebase

Here is the first point in favor of Flutter and Firebase as a programming stack. Google stands behind both, which means the integration between the two platforms is almost flawless. Pretty much every product Firebase offers can be integrated into a Flutter app with just a couple of lines of code. A bare-minimum app linked to a specific Firebase project needs only two lines in two separate files: a dependency in the pubspec.yaml file and an initialization call in the main.dart file. So with just two lines, you can link your new Flutter app to Firebase. The company Invertase maintains and updates the plugins; Invertase also manages the Firebase plugins for React Native, so they have teams supporting Firebase for both React Native and Flutter, and most Firebase products support Flutter in their stable versions.

My second argument for using Flutter with Firebase is that both are very beginner-friendly, with a lot of good documentation to help you build a fast app that can be distributed to both Android and iOS at the same time. Flutter as the frontend helps a lot with performance and with shipping to two platforms concurrently. Firebase, on the other hand, gives you a scalable backend service for your app. Managing your own backend, such as a VPS (Virtual Private Server), all by yourself can be a pain; with Firebase, you can focus on making your app and getting it out to the world quickly and securely.

The drawbacks of using Flutter and Firebase

But there are also drawbacks to using Flutter and Firebase together.
Flutter is the new kid on the block, so the community is still limited. Unfortunately, that means only a limited number of answered community and Stack Overflow questions, although the Flutter community is expanding rapidly every day. The problem with Firebase as a backend service is that it can become very costly if your app scales fast. Because Firebase is structured so that you pay for CRUD (Create, Read, Update, Delete) actions on the database, and not only for the bandwidth you use, it can be very expensive if your app reads a lot of data from Firestore. If you want to build an app with Flutter and Firebase and are thinking "this will be the next Facebook", then I recommend finding a good revenue strategy for how your app is going to make money.

Conclusion

Flutter and Firebase make a solid programming stack for building a cross-platform, scalable app. Flutter gives you the ability to create a fast, almost-native app in no time. Firebase gives you a free tier on most of its products, which can support an app with a solid 100-1,000 daily users without any backend costs. But if your app scales beyond 50,000 users, you may need to consider either building your own backend (API, database, etc.) or creating a solid revenue plan for your app that can pay its monthly bills.

Final note: let me know what you think about using Firebase and Flutter to build cross-platform apps. Are you a fan of these technologies, or are you using other technologies/programming stacks?

dart · flutter · firebase · app-development
Jun 28, 2021 · 4 min read

Introduction to !Bangs

Most of my friends who use DuckDuckGo daily have used the !Bang feature alongside normal browsing to increase their productivity. DuckDuckGo has been adding different shortcuts to its search engine since 2008, and it is still the only search engine that promotes this feature. But because I use Google Chrome, which doesn't offer custom shortcuts by default, I needed to figure out a way to get the same behavior. I began searching the web for how to make my own shortcuts to different websites and, after many hours, could not find a solution. What I did find was a built-in Chrome feature called "custom search engines": a list of Chrome's default search engines as well as other websites' search engines. Chrome has been adding websites' custom search engines automatically by default without anyone noticing, so if you look at the list in your Chrome browser, it most likely already contains a lot of website search engines.

How to create your own "!Bang"

First, make sure you can access Chrome's custom and other search engines: open the settings, go to Search Engine, and click "Manage search engines". Here you can manage your main search engine, which in my case is Google, and your custom search engines (!Bangs).

Find the search URL

Second, we need to find the URL of the specific website we want to add to our custom search shortcuts. You can only add websites that include the search term in the URL, so if a website's URL doesn't change when you search, it most likely can't be added to a custom search shortcut. But websites like YouTube, Twitch, Facebook, and most of the Google apps support it. As an example, take Twitch, a live streaming platform for all kinds of content.
When we search on Twitch, the URL updates to include our search term. We are looking for the bit of the URL after twitch.tv/search?term=, which in our case is the search "Hello World".

Adding the website to our custom search shortcuts

Now that we have found the URL we want to add to our custom search shortcuts, we can add it to the list. First, click the little add button, which opens a box where you can enter a name for the shortcut, a keyword, and the URL. Give the shortcut a name, which in our example is Twitch. Then pick a keyword phrase you like; this is the phrase you will type into Chrome's search box, so keep it as short as possible while still being memorable. Finally, paste in the URL we found and replace the search term (in our example "Hello World") with %s. The %s gets replaced with whatever you type when you use the shortcut. Click the add button, and you will see your first shortcut, which you can use to search websites faster. Type the keyword into your browser, press space, and see the magic happen. 🔮

Bonus tip: because Chrome by default adds websites' search engines to your custom search engines, you can add a little star (*) in front of the names of your own shortcuts. This moves all of your custom search shortcuts to the top of the list.

Bonus tip #2: if you don't want a star in front of all your shortcuts, there is a nice little extension that prevents the browser from adding websites' custom search engines automatically.

Thanks!

Thanks for reading my first article, on how to create custom search shortcuts. If you have any questions, feel free to comment below. Have a wonderful day 😄

google-chrome · search
Jun 21, 2021 · 4 min read