LLD-7 API Rate Limiter

Akhilesh Mahajan
5 min read · Mar 3, 2024


In this article, we will discuss the common rate-limiting algorithms and implement several of them.

What is an API Rate Limiter:

In a network system, a rate limiter is used to control the rate of traffic sent by a client or a service. In the HTTP world, a rate limiter limits the number of client requests allowed to be sent over a specified period. If the API request count exceeds the threshold defined by the rate limiter, all the excess calls are blocked, typically with an HTTP 429 (Too Many Requests) response.

Benefits of Using Rate Limiter:

  • Prevent resource starvation caused by Denial of Service (DoS) attacks.
  • Reduce cost. Limiting excess requests means fewer servers are needed and more resources can be allocated to high-priority APIs.
  • Prevent servers from being overloaded.

There are different ways to implement a Rate Limiter.

  • Client Side
  • Server Side

Here, we will see the server-side implementation of the Rate Limiter.

Design:

The high-level design for a server-side rate limiter looks like this:

Rate Limiter

While designing a rate limiter, an important question to ask ourselves is: where should the rate limiter be implemented, on the server side or in a gateway? There is no absolute answer.

It depends on your company’s current technology stack, engineering resources, priorities, goals, etc. Here are a few general guidelines:

  • Evaluate your current technology stack, such as the programming language, cache service, etc. Make sure your current programming language can implement rate limiting efficiently on the server side.
  • Identify the rate-limiting algorithm that fits your business needs. When you implement everything on the server side, you have full control of the algorithm. However, your choice might be limited if you use a third-party gateway.
  • If you have already used microservice architecture and included an API gateway in the design to perform authentication, IP whitelisting, etc., you may add a rate limiter to the API gateway.
  • Building your rate-limiting service takes time. If you do not have enough engineering resources to implement a rate limiter, a commercial API gateway is a better option.

Algorithms for Implementing Rate Limiter:

  • Token Bucket
  • Leaky Bucket
  • Fixed Window Counter
  • Sliding Window Log
  • Sliding Window Counter

In this article, we will explain these algorithms and implement the Token Bucket, Leaky Bucket, Fixed Window Counter, and Sliding Window Log algorithms. The implementations are deliberately minimal and only illustrate the basic workflow.

Token Bucket Algorithm

It is the simplest and most widely used rate-limiting algorithm.

Workflow:

  1. A bucket with a pre-defined capacity holds a number of tokens.
  2. Each API request consumes one token.
  3. The bucket is refilled at a fixed rate, say every minute. Once the bucket is full, any extra tokens overflow and are discarded.
  4. If the token count drops to 0, all further requests are discarded until the bucket is refilled.
Token Bucket

Implementation:
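Here is a minimal in-memory sketch of the algorithm in Go. The `TokenBucket` type, the `Allow` method, and the capacity/refill numbers are illustrative assumptions rather than the exact code from the linked repo. Instead of a background timer, tokens are refilled lazily from the time elapsed since the last request:

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// TokenBucket holds up to capacity tokens and refills at refillRate tokens per second.
type TokenBucket struct {
	mu         sync.Mutex
	capacity   float64
	tokens     float64
	refillRate float64
	lastRefill time.Time
}

func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
	return &TokenBucket{
		capacity:   capacity,
		tokens:     capacity, // start full
		refillRate: refillRate,
		lastRefill: time.Now(),
	}
}

// Allow reports whether a request may proceed, consuming one token if so.
func (tb *TokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	// Refill lazily based on the time elapsed since the last call;
	// tokens beyond the capacity simply overflow.
	now := time.Now()
	elapsed := now.Sub(tb.lastRefill).Seconds()
	tb.tokens = math.Min(tb.capacity, tb.tokens+elapsed*tb.refillRate)
	tb.lastRefill = now

	if tb.tokens >= 1 {
		tb.tokens--
		return true
	}
	return false
}

func main() {
	tb := NewTokenBucket(3, 1) // capacity 3, refill 1 token per second (arbitrary numbers)
	for i := 1; i <= 5; i++ {
		fmt.Printf("request %d allowed: %v\n", i, tb.Allow())
	}
}
```

Lazy refilling keeps the per-client state down to two numbers and a timestamp, which also makes the algorithm easy to port to a shared store such as Redis.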

Leaky Bucket Algorithm

The leaky bucket algorithm is similar to the token bucket, except that requests are processed at a fixed rate. It is usually implemented with a first-in-first-out (FIFO) queue.

Workflow:

  1. When a request comes in, the system checks whether there is room in the queue. If the queue is full, the request is dropped; otherwise, it is pushed onto the queue.
  2. All the requests in the queue are processed at a constant rate.
Leaky Bucket

Implementation:
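Here is a minimal sketch in Go, again with assumed names and parameters: a buffered channel serves as the FIFO queue, and a ticker drains it at a constant rate:

```go
package main

import (
	"fmt"
	"time"
)

// LeakyBucket queues incoming requests (FIFO) and processes them at a fixed rate.
type LeakyBucket struct {
	queue    chan func() // buffered channel acting as the FIFO queue
	leakRate time.Duration
}

func NewLeakyBucket(capacity int, leakRate time.Duration) *LeakyBucket {
	lb := &LeakyBucket{
		queue:    make(chan func(), capacity),
		leakRate: leakRate,
	}
	go lb.leak()
	return lb
}

// Submit enqueues a request; it returns false (request dropped) when the queue is full.
func (lb *LeakyBucket) Submit(req func()) bool {
	select {
	case lb.queue <- req:
		return true
	default:
		return false
	}
}

// leak processes at most one queued request per tick: a constant outflow.
func (lb *LeakyBucket) leak() {
	ticker := time.NewTicker(lb.leakRate)
	defer ticker.Stop()
	for range ticker.C {
		select {
		case req := <-lb.queue:
			req()
		default:
			// nothing queued this tick
		}
	}
}

func main() {
	lb := NewLeakyBucket(2, 500*time.Millisecond) // queue of 2, one request per 500ms (arbitrary)
	for i := 1; i <= 4; i++ {
		i := i // capture loop variable
		ok := lb.Submit(func() { fmt.Printf("processed request %d\n", i) })
		fmt.Printf("request %d queued: %v\n", i, ok)
	}
	time.Sleep(2 * time.Second) // give the bucket time to drain
}
```

Note the trade-off: the leaky bucket smooths bursts into a steady outflow, but a queued request must wait its turn, which adds latency.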

Fixed Window Counter:

Workflow:

  1. As the name suggests, the timeline is divided into fixed-size windows, where each window has a counter that tracks the number of requests received in that time frame.
  2. Once the counter reaches the threshold, any further requests in that window are discarded.

Consider the example below, where each window allows a maximum of 3 requests.

Fixed Window Counter

Implementation:
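A minimal Go sketch under the same caveats. The counter and the window start time live in memory, and the counter resets whenever a request arrives after the current window has expired (for simplicity, windows start at the first request rather than on wall-clock boundaries):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// FixedWindowCounter allows at most limit requests per fixed time window.
type FixedWindowCounter struct {
	mu          sync.Mutex
	limit       int
	window      time.Duration
	count       int
	windowStart time.Time
}

func NewFixedWindowCounter(limit int, window time.Duration) *FixedWindowCounter {
	return &FixedWindowCounter{limit: limit, window: window, windowStart: time.Now()}
}

// Allow resets the counter when a new window begins, then checks the limit.
func (fw *FixedWindowCounter) Allow() bool {
	fw.mu.Lock()
	defer fw.mu.Unlock()

	now := time.Now()
	// A new window has started; reset the counter.
	if now.Sub(fw.windowStart) >= fw.window {
		fw.windowStart = now
		fw.count = 0
	}
	if fw.count < fw.limit {
		fw.count++
		return true
	}
	return false
}

func main() {
	fw := NewFixedWindowCounter(3, time.Minute) // max 3 requests per minute, as in the example
	for i := 1; i <= 4; i++ {
		fmt.Printf("request %d allowed: %v\n", i, fw.Allow())
	}
}
```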

Sliding Window Log

The fixed window counter algorithm has a major issue: it allows more requests to go through at the edges of a window, because a burst just before and just after a window boundary is counted in two different windows, letting up to twice the limit through in a short span. The sliding window log algorithm fixes this issue.

Workflow:

  • The algorithm keeps track of request timestamps. Timestamp data is usually kept in a cache, such as a Redis sorted set.
  • When a new request comes in, remove all the outdated timestamps. Outdated timestamps are defined as those older than the start of the current time window.
  • Add the timestamp of the new request to the log.
  • If the log size is less than or equal to the allowed count, the request is accepted. Otherwise, it is rejected.
Sliding Window Log

In this example, the rate limiter allows 2 requests per minute. Usually, Unix timestamps are stored in the log. However, a human-readable representation of time is used in our example for better readability.

  • The log is empty when a new request arrives at 1:00:01. Its timestamp is inserted, and the request is allowed.
  • A new request arrives at 1:00:30, and the timestamp 1:00:30 is inserted into the log. After the insertion, the log size is 2, not larger than the allowed count. Thus, the request is allowed.
  • A new request arrives at 1:00:50, and its timestamp is inserted into the log. After the insertion, the log size is 3, larger than the allowed count of 2. Therefore, this request is rejected, even though its timestamp remains in the log.
  • A new request arrives at 1:01:40. Requests in the range [1:00:40, 1:01:40) are within the latest time window, but requests sent before 1:00:40 are outdated.
  • Two outdated timestamps, 1:00:01 and 1:00:30, are removed from the log. After the removal, the log size becomes 2, so the request is accepted.

Implementation:
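A minimal Go sketch of the log-based algorithm described above; the type and parameter names are assumptions. Matching the workflow, the new timestamp is appended even when the request is rejected. A production version would typically keep the log in a Redis sorted set, as noted earlier:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// SlidingWindowLog keeps the timestamp of every request seen within the window.
type SlidingWindowLog struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	log    []time.Time // timestamps in arrival order
}

func NewSlidingWindowLog(limit int, window time.Duration) *SlidingWindowLog {
	return &SlidingWindowLog{limit: limit, window: window}
}

// Allow removes outdated timestamps, logs the new one, and checks the limit.
func (sw *SlidingWindowLog) Allow() bool {
	sw.mu.Lock()
	defer sw.mu.Unlock()

	now := time.Now()
	cutoff := now.Add(-sw.window)

	// Drop timestamps older than the start of the current window.
	i := 0
	for i < len(sw.log) && sw.log[i].Before(cutoff) {
		i++
	}
	sw.log = sw.log[i:]

	// As described above, the new timestamp is logged even if the
	// request ends up being rejected.
	sw.log = append(sw.log, now)
	return len(sw.log) <= sw.limit
}

func main() {
	sw := NewSlidingWindowLog(2, time.Minute) // 2 requests per minute, as in the example
	for i := 1; i <= 3; i++ {
		fmt.Printf("request %d allowed: %v\n", i, sw.Allow())
	}
}
```

The accuracy of this algorithm comes at a memory cost: one timestamp is stored per request, even for rejected ones.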

GitHub Code Repo ✍️

Follow for more on LinkedIn 💁
