LLD-7 API Rate Limiter

Akhilesh Mahajan
5 min read · Mar 3, 2024


In this article, we will discuss the common rate-limiting algorithms and implement several of them.

What is an API Rate Limiter:

In a network system, a rate limiter is used to control the rate of traffic sent by a client or a service. In the HTTP world, a rate limiter limits the number of client requests allowed to be sent over a specified period. If the API request count exceeds the threshold defined by the rate limiter, all the excess calls are blocked, typically with an HTTP 429 (Too Many Requests) response.

Benefits of Using Rate Limiter:

  • Prevent resource starvation caused by Denial of Service (DoS) attacks.
  • Reduce cost. Limiting excess requests means fewer servers are needed and more resources can be allocated to high-priority APIs.
  • Prevent servers from being overloaded.

There are different ways to implement a Rate Limiter.

  • Client Side
  • Server Side

Here, we will see the server-side implementation of the Rate Limiter.

Design:

The high-level design for a server-side rate limiter looks like this:

Rate Limiter

While designing a rate limiter, an important question to ask ourselves is: where should the rate limiter be implemented, on the server side or in a gateway? There is no absolute answer.

It depends on your company’s current technology stack, engineering resources, priorities, goals, etc. Here are a few general guidelines:

  • Evaluate your current technology stack, such as the programming language, cache service, etc. Make sure your current programming language can implement rate limiting efficiently on the server side.
  • Identify the rate-limiting algorithm that fits your business needs. When you implement everything on the server side, you have full control of the algorithm. However, your choice might be limited if you use a third-party gateway.
  • If you have already used microservice architecture and included an API gateway in the design to perform authentication, IP whitelisting, etc., you may add a rate limiter to the API gateway.
  • Building your rate-limiting service takes time. If you do not have enough engineering resources to implement a rate limiter, a commercial API gateway is a better option.

Algorithms for Implementing Rate Limiter:

  • Token Bucket
  • Leaky Bucket
  • Fixed Window Counter
  • Sliding Window Log
  • Sliding Window Counter

In this article, we will explain these algorithms and implement the Token Bucket, Leaky Bucket, Fixed Window Counter, and Sliding Window Log algorithms. The implementations are deliberately minimal and only illustrate the basic workflow.

Token Bucket Algorithm

It is the simplest and most widely used rate-limiting algorithm.

Workflow:

  1. A bucket with a pre-defined capacity holds a number of tokens.
  2. Each API request consumes one token.
  3. The bucket is refilled at a fixed rate, say every minute. Once the bucket is full, any extra tokens overflow and are discarded.
  4. If the token count drops to 0, all further requests are discarded until the bucket is refilled.
Token Bucket

Implementation:
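Here is a minimal in-memory sketch of the algorithm in Go. The `TokenBucket` type, the `Allow` method, and the capacity/refill numbers are illustrative assumptions rather than the exact code from the linked repo. Instead of a background timer, tokens are refilled lazily from the time elapsed since the last request:

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// TokenBucket holds up to capacity tokens and refills at refillRate tokens per second.
type TokenBucket struct {
	mu         sync.Mutex
	capacity   float64
	tokens     float64
	refillRate float64
	lastRefill time.Time
}

func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
	return &TokenBucket{
		capacity:   capacity,
		tokens:     capacity, // start full
		refillRate: refillRate,
		lastRefill: time.Now(),
	}
}

// Allow reports whether a request may proceed, consuming one token if so.
func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	// Refill lazily based on the time elapsed since the last call;
	// tokens beyond the capacity simply overflow.
	now := time.Now()
	elapsed := now.Sub(tb.lastRefill).Seconds()
	tb.tokens = math.Min(tb.capacity, tb.tokens+elapsed*tb.refillRate)
	tb.lastRefill = now

	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}

func main() {
	tb := NewTokenBucket(3, 1) // capacity 3, refill 1 token per second (arbitrary numbers)
	for i := 1; i <= 5; i++ {
		fmt.Printf("request %d allowed: %v\n", i, tb.Allow())
	}
}
```

Lazy refilling keeps the per-client state down to two numbers and a timestamp, which also makes the algorithm easy to port to a shared store such as Redis.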

Leaky Bucket Algorithm

The leaky bucket algorithm is similar to the token bucket, except that requests are processed at a fixed rate. It is usually implemented with a first-in-first-out (FIFO) queue.

Workflow:

  1. When a request comes in, the system checks whether there is room in the queue. If the queue is full, the request is dropped; otherwise, it is pushed onto the queue.
  2. All the requests in the queue are processed at a constant rate.
Leaky Bucket

Implementation:
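Here is a minimal sketch in Go, again with assumed names and parameters: a buffered channel serves as the FIFO queue, and a ticker drains it at a constant rate:

```go
package main

import (
	"fmt"
	"time"
)

// LeakyBucket queues incoming requests (FIFO) and processes them at a fixed rate.
type LeakyBucket struct {
	queue    chan func() // buffered channel acting as the FIFO queue
	leakRate time.Duration
}

func NewLeakyBucket(capacity int, leakRate time.Duration) *LeakyBucket {
	lb := &LeakyBucket{
		queue:    make(chan func(), capacity),
		leakRate: leakRate,
	}
	go lb.leak()
	return lb
}

// Submit enqueues a request; it returns false (request dropped) when the queue is full.
func (lb *LeakyBucket) Submit(req func()) bool {
	select {
	case lb.queue <- req:
		return true
	default:
		return false
	}
}

// leak processes at most one queued request per tick: a constant outflow.
func (lb *LeakyBucket) leak() {
	ticker := time.NewTicker(lb.leakRate)
	defer ticker.Stop()
	for range ticker.C {
		select {
		case req := <-lb.queue:
			req()
		default:
			// nothing queued this tick
		}
	}
}

func main() {
	lb := NewLeakyBucket(2, 500*time.Millisecond) // queue of 2, one request per 500ms (arbitrary)
	for i := 1; i <= 4; i++ {
		i := i // capture loop variable
		ok := lb.Submit(func() { fmt.Printf("processed request %d\n", i) })
		fmt.Printf("request %d queued: %v\n", i, ok)
	}
	time.Sleep(2 * time.Second) // give the bucket time to drain
}
```

Note the trade-off: the leaky bucket smooths bursts into a steady outflow, but a queued request must wait its turn, which adds latency.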

Fixed Window Counter:

Workflow:

  1. As the name suggests, the timeline is divided into fixed-size windows, where each window has a counter that tracks the number of requests received in that time frame.
  2. Once the counter reaches the threshold, any further requests in that window are discarded.

Consider the example below, where each window allows a maximum of 3 requests.

Fixed Window Counter

Implementation:
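A minimal Go sketch under the same caveats. The counter and the window start time live in memory, and the counter resets whenever a request arrives after the current window has expired (for simplicity, windows start at the first request rather than on wall-clock boundaries):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// FixedWindowCounter allows at most limit requests per fixed time window.
type FixedWindowCounter struct {
	mu          sync.Mutex
	limit       int
	window      time.Duration
	count       int
	windowStart time.Time
}

func NewFixedWindowCounter(limit int, window time.Duration) *FixedWindowCounter {
	return &FixedWindowCounter{limit: limit, window: window, windowStart: time.Now()}
}

// Allow resets the counter when a new window begins, then checks the limit.
func (fw *FixedWindowCounter) Allow() bool {
	fw.mu.Lock()
	defer fw.mu.Unlock()

	now := time.Now()
	// A new window has started; reset the counter.
	if now.Sub(fw.windowStart) >= fw.window {
		fw.windowStart = now
		fw.count = 0
	}
	if fw.count < fw.limit {
		fw.count++
		return true
	}
	return false
}

func main() {
	fw := NewFixedWindowCounter(3, time.Minute) // max 3 requests per minute, as in the example
	for i := 1; i <= 4; i++ {
		fmt.Printf("request %d allowed: %v\n", i, fw.Allow())
	}
}
```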

Sliding Window Log

The fixed window counter algorithm has a major issue: it allows more requests to go through at the edges of a window, because a burst just before and just after a window boundary is counted in two different windows, letting up to twice the limit through in a short span. The sliding window log algorithm fixes this issue.

Workflow:

  • The algorithm keeps track of request timestamps. Timestamp data is usually kept in a cache, such as a Redis sorted set.
  • When a new request comes in, remove all the outdated timestamps. Outdated timestamps are defined as those older than the start of the current time window.
  • Add the timestamp of the new request to the log.
  • If the log size is less than or equal to the allowed count, the request is accepted. Otherwise, it is rejected.
Sliding Window Log

In this example, the rate limiter allows 2 requests per minute. Usually, Unix timestamps are stored in the log. However, a human-readable representation of time is used in our example for better readability.

  • The log is empty when a new request arrives at 1:00:01. Its timestamp is inserted, and the request is allowed.
  • A new request arrives at 1:00:30, and the timestamp 1:00:30 is inserted into the log. After the insertion, the log size is 2, not larger than the allowed count. Thus, the request is allowed.
  • A new request arrives at 1:00:50, and its timestamp is inserted into the log. After the insertion, the log size is 3, larger than the allowed count of 2. Therefore, this request is rejected, even though its timestamp remains in the log.
  • A new request arrives at 1:01:40. Requests in the range [1:00:40, 1:01:40) are within the latest time window, but requests sent before 1:00:40 are outdated.
  • Two outdated timestamps, 1:00:01 and 1:00:30, are removed from the log. After the removal, the log size becomes 2, so the request is accepted.

Implementation:
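A minimal Go sketch of the log-based algorithm described above; the type and parameter names are assumptions. Matching the workflow, the new timestamp is appended even when the request is rejected. A production version would typically keep the log in a Redis sorted set, as noted earlier:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// SlidingWindowLog keeps the timestamp of every request seen within the window.
type SlidingWindowLog struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	log    []time.Time // timestamps in arrival order
}

func NewSlidingWindowLog(limit int, window time.Duration) *SlidingWindowLog {
	return &SlidingWindowLog{limit: limit, window: window}
}

// Allow removes outdated timestamps, logs the new one, and checks the limit.
func (sw *SlidingWindowLog) Allow() bool {
	sw.mu.Lock()
	defer sw.mu.Unlock()

	now := time.Now()
	cutoff := now.Add(-sw.window)

	// Drop timestamps older than the start of the current window.
	i := 0
	for i < len(sw.log) && sw.log[i].Before(cutoff) {
		i++
	}
	sw.log = sw.log[i:]

	// As described above, the new timestamp is logged even if the
	// request ends up being rejected.
	sw.log = append(sw.log, now)
	return len(sw.log) <= sw.limit
}

func main() {
	sw := NewSlidingWindowLog(2, time.Minute) // 2 requests per minute, as in the example
	for i := 1; i <= 3; i++ {
		fmt.Printf("request %d allowed: %v\n", i, sw.Allow())
	}
}
```

The accuracy of this algorithm comes at a memory cost: one timestamp is stored per request, even for rejected ones.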

GitHub Code Repo ✍️

Follow for more on LinkedIn 💁
