Using Rate Limiting Algorithms for Data Processing Pipelines

Hello, I’m Valerio, founder and CTO at Inspector.

You may have heard of rate limiting associated with API consumption. In this article, I’ll show you a more complex use of this component, using it to coordinate data ingestion pipelines.

As a building inspector, I learn a lot about data-intensive applications, and pipelines are one of the most critical components of their internal architecture.

The architecture of a data-intensive application can be simplified with the diagram below:

Large and unpredictable volumes of incoming data require a well-designed data processing pipeline to ingest this data without disrupting the entire system with each spike in incoming data.

While the Ingest node and the Ingest pipeline can easily scale horizontally with an “auto scaling” setup in your cloud platform (Google Cloud, AWS, Azure, or Digital Ocean, provide this functionality), or using modern serverless infrastructure, for the datastore, it’s not so easy.

Databases are often the real bottleneck for data-intensive systems because they have to support a large number of write requests per second.

Write requests can hardly be scaled.

I talked about database scalability in a recent article:

Yes, there are many technologies out there that claim their ability for “infinite scale”. Think Elastic, Scilla DB, SingleStore, Rockset, MongoDB and many more. Maybe technically they can do it without a problem, but that the costs are compatible with your business constraints is far from obvious.

Here is the flow limiter.

What is rate limiting and how do I use it in data processing pipelines?

In Inspector, the rate limiter protects the data store from inadvertent or malicious overuse by limiting the rate at which an application can store monitoring data.

Without rate limiting, each application can make a request as often as it wants, resulting in “spikes” of requests that starve other consumers. Once enabled, rate limiting can only make a fixed number of write requests per second to the Datastore. A rate limiting algorithm automates the process.


But a surveillance system cannot lose data. This would amount to generating false metrics. But at the same time, it should be able to store all the data without breaking down the whole system at a reasonable cost.

For this reason, requests that exceed the limit are not lost, but are rescheduled in the message queue, waiting for a window of time with free capacity.

Fixed window

The fixed window algorithm divides the timeline into windows of fixed size and assigns a counter to each window.

Each request, based on its arrival time, is mapped to a window. If the counter in the window has reached the limit, requests falling within that window should be rejected.

The current timestamp floor usually sets the windows, so if we set the window size to 1 minute. Then the windows are (12:00:00 – 12:01:00), (12:01:00 – 12:02:00)etc

Suppose the limit is 2 requests per minute:


The request at 00:00:24 and 00:00:36 increases the window counter to 2. The next request that arrives at 00:00:49 is rescheduled because the counter has exceeded the limit. Then the request arriving at 00:01:12 can be served because it belongs to a new window.

There are two main drawbacks to this algorithm:

Many consumers are waiting for a reset window

If a window becomes too busy, the entire capacity can be consumed in a single second, overloading the system (eg during peak hours like the Black Friday sale).

A burst of traffic that occurs near the edge of a window can double the rate of requests being processed

Suppose the counter is empty and 10 request peaks arrive at 00:00:59, they will be accepted. If another spike of 10 requests arrives at 00:01:00, they would also be accepted since this is a new window and the counter will be set to 0 for this window. This would mean that the server is now processing 20 requests in seconds (not really 10 requests/minute).

Sliding window

The sliding window counter is similar to the fixed window, but it dampens traffic bursts near the boundary by adding a weighted number in the previous window to the number in the current window.

Let me show you a concrete example.


Suppose a new request arrives at “1:15”. To decide whether we should accept this request or refuse it, we will base ourselves on the approximation.

The current rate will be calculated taking into account the weighted sum below:

limit = 100 requests/hour

rate = 84 * ((60-15)/60) + 36
     = 84 * 0.75 + 36
     = 99

rate < 100
    hence, the request will be accepted.


As discussed in this article, we didn’t use rate limiting to control incoming traffic to a public API, but we used it internally to protect the data store against data bursts.

We started with the fixed window and now we have moved to the sliding window algorithm improving the speed at which developers see the data available in their dashboard.

New to Inspector?

Are you looking for a “code-driven” monitoring tool instead of having to install server-level stuff?

Get a monitoring environment purpose-built for software developers without any server or infrastructure setup.

With Inspector, you’ll never install server-level stuff or make complex configurations in your cloud infrastructure.

Inspector works with a lightweight software library that you can install into your application like any other dependency. Learn about the supported technology on our GitHub (

Visit our website for more details:

Also posted here

Comments are closed.