How to Implement an In-Memory Rate Limiter in Rust
This is an implementation diary for AbsoluteLocalRateLimiter. I’ll build it up from scratch like we’re pairing: project setup, the data model, then the concurrency choices (DashMap + atomics), then the code.
What I wanted (in plain English)
I needed a local (in-process) limiter that can answer one question quickly: “For this key, right now, are we still under the limit for the current window?”
And if the answer is “no”, I didn’t want just a boolean; I wanted something you can actually use to back off.
So the end goal became:
- inc(key, rate_limit, count) to record usage
- is_allowed(key) that returns either Allowed or Rejected { retry_after_ms, remaining_after_waiting }
- safe under concurrency (because production doesn’t run in a single thread)
Before code: the mental model
This limiter is a sliding window limiter. For each key, we track how much traffic happened in the last window_size_seconds. Instead of storing “every request timestamp”, we store buckets (grouped increments). That keeps memory bounded and makes eviction cheap.
Also, because we constantly push new buckets at the back and evict old buckets from the front, VecDeque is the right container. A Vec is great for pushing at the end, but removing from the front shifts everything and gets expensive. VecDeque gives O(1) push/pop on both ends, which is exactly what a sliding window needs.
For a given key:
- we keep a queue of buckets (oldest at the front)
- we also keep a running total of counts currently in the window
When asked “are we allowed?” we:
- evict old buckets (anything outside the window)
- compare total to window_limit
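Here’s that evict-then-compare loop as a minimal, single-threaded sketch. Bucket and is_allowed_sketch are illustrative names only; the real types come in Step 0.
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// One bucket: "this much traffic happened around this point in time".
struct Bucket {
    count: u64,
    timestamp: Instant,
}

// Evict what fell out of the window, then compare what's left to the cap.
fn is_allowed_sketch(
    buckets: &mut VecDeque<Bucket>,
    total: &mut u64,
    window: Duration,
    window_limit: u64,
) -> bool {
    while let Some(oldest_ts) = buckets.front().map(|b| b.timestamp) {
        if oldest_ts.elapsed() <= window {
            break;
        }
        let evicted = buckets.pop_front().expect("front was Some");
        *total -= evicted.count;
    }
    *total < window_limit
}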
Step -1: Create a tiny crate for it
If you’re reading this without any context about my larger project, don’t worry. You can follow along with a small, standalone crate.
Create a sub-crate:
cargo new --lib rate_limiter
For the rest of this post, assume all code lives under rate_limiter/src.
Here’s the small folder layout we’ll use (I’m intentionally not using long module paths in this writeup):
rate_limiter/
Cargo.toml
src/
lib.rs
common.rs
absolute_local_rate_limiter.rs
tests.rs
In src/lib.rs, we just wire modules together:
mod absolute_local_rate_limiter;
pub use absolute_local_rate_limiter::*;
mod common;
#[cfg(test)]
mod tests;
Step 0: Create the small “vocabulary” types
Before the limiter itself, I needed some shared types that make the rest of the code readable. I could have picked better names, but these are the ones I’m working with; feel free to rename them. This lives in rate_limiter/src/common.rs.
InstantRate is a bucket: “some count at some timestamp”.
RateLimit is the per-key state.
RateLimitDecision is the return type of is_allowed().
use std::{collections::VecDeque, sync::atomic::AtomicU64, time::Instant};
pub(crate) struct InstantRate {
pub count: AtomicU64,
pub timestamp: Instant,
}
pub(crate) struct RateLimit {
pub limit: u64,
pub series: VecDeque<InstantRate>,
pub total: AtomicU64,
}
impl RateLimit {
pub fn new(limit: u64) -> Self {
Self {
limit,
series: VecDeque::new(),
total: AtomicU64::new(0),
}
}
}
pub enum RateLimitDecision {
Allowed,
Rejected {
window_size_seconds: u64,
retry_after_ms: u64,
remaining_after_waiting: u64,
},
}
The important detail here is that counts are atomic. That decision exists because we want a fast, low-contention inc() path.
If InstantRate.count and RateLimit.total were plain u64, every increment would require a mutable reference to the per-key state, which means an exclusive lock for that key shard. Atomics let the “merge into last bucket” case update counts while holding a non-mutable map reference.
One more (I think very practical) detail is that I use Ordering::Relaxed for these counters. I’m not using these atomics as a signaling mechanism (no “publish data then notify” pattern). I only need numerically correct increments/decrements, and the decision is derived from the current numeric totals. Relaxed atomics keep the hot path cheaper.
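If you want to convince yourself that Relaxed is enough for plain counting, here’s a tiny standalone example (not part of the limiter): each fetch_add is still atomic, so no increments are lost; Relaxed only gives up cross-variable ordering guarantees we don’t need here.
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

fn main() {
    static TOTAL: AtomicU64 = AtomicU64::new(0);

    // Ten threads, each bumping the counter 1000 times with Relaxed ordering.
    let handles: Vec<_> = (0..10)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1000 {
                    TOTAL.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Atomicity is preserved: no increments are lost.
    assert_eq!(TOTAL.load(Ordering::Relaxed), 10_000);
}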
Step 1: Define the limiter struct + options
The options are simple and intentional:
- window_size_seconds: how long the rolling window is
- rate_group_size_ms: how aggressively to merge increments into a single bucket
#[derive(Clone, Debug)]
pub struct LocalRateLimiterOptions {
pub window_size_seconds: u64,
pub rate_group_size_ms: u16,
}
And then the limiter itself keeps per-key state in a DashMap.
use dashmap::DashMap;
use crate::{
LocalRateLimiterOptions,
common::{InstantRate, RateLimit, RateLimitDecision},
};
pub struct AbsoluteLocalRateLimiter {
window_size_seconds: u64,
rate_group_size_ms: u16,
series: DashMap<String, RateLimit>,
}
impl AbsoluteLocalRateLimiter {
pub(crate) fn new(options: LocalRateLimiterOptions) -> Self {
Self {
window_size_seconds: options.window_size_seconds,
rate_group_size_ms: options.rate_group_size_ms,
series: DashMap::new(),
}
} // end constructor
}
Why DashMap?
We need a concurrent map keyed by String. The usual Rust options are:
- Mutex<HashMap<...>>: simplest, but one big lock (high contention)
- RwLock<HashMap<...>>: nicer for reads, but writes still block everything
- DashMap<...>: sharded locking, so unrelated keys don’t fight over a single lock
DashMap is a good fit because rate limiting tends to have many keys (or at least you want the design to survive that), and you don’t want key A blocking key B just because they share a global mutex.
Under the hood, DashMap shards its storage and locks per shard. That gives you a nice property: operations on unrelated keys can proceed in parallel. It also gives you a clean API split:
- get() / contains_key() for read handles
- get_mut() / entry() for write handles
And that split matters because write handles are the expensive ones. Each key gets its own RateLimit, and the rest of the story is: mutate that per-key struct safely.
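Here’s a tiny standalone example of that read/write split. It assumes dashmap is listed in Cargo.toml (e.g. dashmap = "6"; use whatever version you’re actually on).
use dashmap::DashMap;
use std::sync::atomic::{AtomicU64, Ordering};

fn main() {
    let map: DashMap<String, AtomicU64> = DashMap::new();

    // Write handle: create-or-fetch the entry (exclusive shard lock while held).
    map.entry("key-a".to_string())
        .or_insert_with(|| AtomicU64::new(0));

    // Read handle: shared access is enough to bump an atomic counter.
    if let Some(counter) = map.get("key-a") {
        counter.fetch_add(1, Ordering::Relaxed);
    }

    // Unrelated keys (usually) live on different shards, so they don't contend.
    map.entry("key-b".to_string())
        .or_insert_with(|| AtomicU64::new(7));

    assert_eq!(map.get("key-a").unwrap().load(Ordering::Relaxed), 1);
}
The limiter leans on exactly this pattern: entry() only when the value has to be created or reshaped, get() plus atomics for everything else.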
Step 2: Implement inc() (and keep it cheap)
inc() does one job: record usage.
The naive version is to always push a new timestamp into a queue. The realistic version is to group increments that happen close together into the same bucket (so we don’t build a timestamp museum).
Here’s the real inc() logic from rate_limiter/src/absolute_local_rate_limiter.rs.
use std::{sync::atomic::Ordering, time::Instant};
pub fn inc(&self, key: &str, rate_limit: u64, count: u64) {
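// Create the per-key state on first sight; checking contains_key first avoids
// taking a write handle (entry) when the key already exists.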
if !self.series.contains_key(key) {
self.series
.entry(key.to_string())
.or_insert_with(|| RateLimit::new(rate_limit));
}
let Some(rate_limit_series) = self.series.get(key) else {
unreachable!("AbsoluteLocalRateLimiter::inc: key should be in map");
};
if let Some(last_entry) = rate_limit_series.series.back()
&& last_entry.timestamp.elapsed().as_millis() <= self.rate_group_size_ms as u128
{
// Cheap path: merge into latest bucket.
last_entry.count.fetch_add(count, Ordering::Relaxed);
rate_limit_series.total.fetch_add(count, Ordering::Relaxed);
} else {
// Structural path: create a new bucket (needs mutable access).
drop(rate_limit_series);
let mut rate_limit_series = self
.series
.entry(key.to_string())
.or_insert_with(|| RateLimit::new(rate_limit));
rate_limit_series.series.push_back(InstantRate {
count: count.into(),
timestamp: Instant::now(),
});
rate_limit_series.total.fetch_add(count, Ordering::Relaxed);
}
} // end method inc
The three decisions hiding inside inc()
This function looks simple, but it encodes three important choices:
1. First limit wins.
I only set RateLimit.limit when the key is first created (or_insert_with(|| RateLimit::new(rate_limit))). That means later calls to inc() don’t “upgrade” the limit. This avoids a subtle class of bugs where someone accidentally calls inc(key, 999, ...) and silently makes the limiter permissive.
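If you want to see this behavior pinned down, a test sketch like the following (reusing the same imports the rest of tests.rs already has) should pass; the later 999 never takes effect:
#[test]
fn first_limit_wins() {
    let limiter = AbsoluteLocalRateLimiter::new(LocalRateLimiterOptions {
        window_size_seconds: 1,
        rate_group_size_ms: 1000,
    });
    let key = "k";

    // Key is created with limit = 2.
    limiter.inc(key, 2, 1);
    // This call does NOT upgrade the limit to 999; it only adds to the count.
    limiter.inc(key, 999, 1);

    // window_limit is still 1 * 2 = 2 and total is 2, so we are rejected.
    assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Rejected { .. }));
}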
2. Rate grouping is a performance knob.
rate_group_size_ms decides whether we merge into the last bucket or create a new one. This one came from a very practical “oh no” moment: under very high traffic (imagine 100k messages/sec for a single hot key), I do not want to push 100k InstantRate items into a VecDeque every second. That would blow up memory, and it would make eviction expensive because now is_allowed() has to pop through a mountain of tiny buckets. Grouping fixes that by changing the question from “how many events?” to “how much volume happened in this short time slice?”. In other words, we collapse many increments into one bucket.
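A quick way to see the memory win: grouping caps how fast buckets can be created at roughly one per rate_group_size_ms, regardless of traffic, so (assuming is_allowed() runs and evicts) a window’s worth of buckets is roughly window_ms / rate_group_size_ms. A rough sketch of that bound:
// Rough upper bound on in-window buckets per key: one bucket per
// rate_group_size_ms slice of the window, plus one currently being filled.
fn approx_max_buckets(window_size_seconds: u64, rate_group_size_ms: u16) -> u64 {
    let window_ms = window_size_seconds * 1000;
    window_ms / rate_group_size_ms as u64 + 1
}

fn main() {
    // e.g. a 60s window with 100ms grouping tops out around 601 buckets,
    // whether the key sees 10 events/sec or 100_000 events/sec.
    assert_eq!(approx_max_buckets(60, 100), 601);
}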
This is also why I leaned on atomics (AtomicU64) for both the bucket count and the overall total:
- when a new increment falls into the current group, we can just fetch_add the counts
- we avoid taking a mutable reference (and its exclusive shard lock) for the common case
3. Don’t take a mutable reference unless you actually need one.
Notice the split:
- when we can merge into the last bucket: we use self.series.get(key) and only touch atomics
- when we need to push a new bucket: we drop(rate_limit_series) and reacquire mutable access with entry(...).or_insert_with(...)
This is important because a mutable reference in DashMap holds an exclusive lock on the shard. If we took get_mut() every time, even the cheap merge case would serialize more under contention. Another way to say it: most increments only need to bump numbers, not reshape the data structure. Atomics let the “bump numbers” path stay on a read handle, and we only pay for a write handle when we actually need to push a new bucket. Also, “upgrade a read lock to write lock” patterns tend to create dead-end designs. Dropping the ref and reacquiring is explicit and predictable.
Step 3: Implement is_allowed() (evict, decide, return something useful)
is_allowed() does three things:
- If we don’t know the key, we allow.
- If the window moved, we evict expired buckets and subtract their counts.
- If we’re over the cap, we return a rejection that tells you what to do next.
This is the exact shape of the final function, with the “why” baked into the names.
pub fn is_allowed(&self, key: &str) -> RateLimitDecision {
let Some(rate_limit) = self.series.get(key) else {
return RateLimitDecision::Allowed;
};
// If the oldest bucket is out of window, we evict.
let rate_limit = match rate_limit.series.front() {
None => rate_limit,
Some(instant_rate)
if instant_rate.timestamp.elapsed().as_secs() <= self.window_size_seconds =>
{
rate_limit
}
Some(_) => {
drop(rate_limit);
let Some(mut rate_limit) = self.series.get_mut(key) else {
return RateLimitDecision::Allowed;
};
while let Some(instant_rate) = rate_limit.series.front()
&& instant_rate.timestamp.elapsed().as_secs() > self.window_size_seconds
{
rate_limit.total.fetch_sub(
instant_rate.count.load(std::sync::atomic::Ordering::Relaxed),
std::sync::atomic::Ordering::Relaxed,
);
rate_limit.series.pop_front();
}
drop(rate_limit);
let Some(rate_limit) = self.series.get(key) else {
return RateLimitDecision::Allowed;
};
rate_limit
}
};
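// limit is the per-second rate, so the cap for the whole window is
// window_size_seconds * limit.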
let window_limit = self.window_size_seconds * rate_limit.limit;
if rate_limit.total.load(std::sync::atomic::Ordering::Relaxed) < window_limit {
return RateLimitDecision::Allowed;
}
let (retry_after_ms, remaining_after_waiting) = match rate_limit.series.front() {
None => (0, 0),
Some(instant_rate) => {
let window_ms = self.window_size_seconds.saturating_mul(1000);
let elapsed_ms = instant_rate.timestamp.elapsed().as_millis();
let elapsed_ms = u64::try_from(elapsed_ms).unwrap_or(u64::MAX);
let retry_after_ms = window_ms.saturating_sub(elapsed_ms);
let current_total = rate_limit.total.load(std::sync::atomic::Ordering::Relaxed);
let oldest_count = instant_rate.count.load(std::sync::atomic::Ordering::Relaxed);
let remaining_after_waiting = current_total.saturating_sub(oldest_count);
(retry_after_ms, remaining_after_waiting)
}
};
RateLimitDecision::Rejected {
window_size_seconds: self.window_size_seconds,
retry_after_ms,
remaining_after_waiting,
}
} // end method is_allowed
Two notes I learned the hard way:
- eviction uses elapsed().as_secs(), which truncates. So if you set window_size_seconds = 1, don’t be surprised if tests wait ~2s to reliably cross the boundary.
- the rejection payload is the whole point. retry_after_ms is the “when can I try again?” and remaining_after_waiting is the “how bad is it even after waiting?”
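The truncation is easy to forget, so here’s exactly what I mean:
use std::time::Duration;

fn main() {
    // as_secs() truncates toward zero: 1.999s still reads as 1 second.
    assert_eq!(Duration::from_millis(1999).as_secs(), 1);
    assert_eq!(Duration::from_millis(2001).as_secs(), 2);
    // With window_size_seconds = 1, eviction needs elapsed().as_secs() > 1,
    // which only becomes true once a bucket is at least 2 full seconds old.
}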
The two paths inside is_allowed() (and why get_mut() is not the default)
If you’re new to concurrency in Rust, here’s the intuition: a mutable handle in DashMap (get_mut) holds an exclusive lock on a shard. While you’re holding it, other threads that hit keys living on that same shard can get blocked. So in is_allowed(), I keep the common case on the read path and only pay for a write lock when the data structure must actually change.
1. Read path (no eviction needed)
If the oldest bucket is still within the window, we don’t need to mutate anything. We can:
- compute window_limit
- compare it against the atomic total
- return Allowed or Rejected with metadata
No queue pops, no reshaping, no write lock.
2. Write path (eviction needed)
If the oldest bucket is outside the window, the queue needs cleanup. That is when we pay for get_mut():
- pop from the front until the front bucket is in-window
- fetch_sub from total as we evict
Once eviction is done, we drop the mutable handle and go back to a normal read handle to finish the decision.
This is the same pattern as inc():
- cheap, read-oriented fast path for the common case
- write path only when the data structure must change
It’s tempting to always call get_mut() and simplify the control flow, but you’d be trading clarity for avoidable contention: the “no eviction needed” path is pure reads + atomics, and it should stay that way.
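To close the loop on how a caller consumes all of this, here’s a sketch of the usage pattern from inside the crate. send_batch and the blocking sleep are stand-ins for whatever your application does, and the 100 per-second limit is made up for the example.
use std::time::Duration;
use crate::AbsoluteLocalRateLimiter;
use crate::common::RateLimitDecision;

// Hypothetical caller: record the usage, then back off if the limiter says no.
fn record_and_maybe_backoff(limiter: &AbsoluteLocalRateLimiter, key: &str, batch_size: u64) {
    limiter.inc(key, 100, batch_size); // 100 = made-up per-second limit for this key

    match limiter.is_allowed(key) {
        RateLimitDecision::Allowed => {
            // send_batch(key); // stand-in for the real work
        }
        RateLimitDecision::Rejected { retry_after_ms, remaining_after_waiting, .. } => {
            // Wait until the oldest bucket leaves the window, then decide again.
            std::thread::sleep(Duration::from_millis(retry_after_ms));
            // remaining_after_waiting says how much of the budget is still
            // spoken for even after that wait; acknowledge it here.
            let _ = remaining_after_waiting;
        }
    }
}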
Step 4: Prove it with tests (aka: stop trusting yourself)
The tests live in rate_limiter/src/tests.rs. They’re what turned this from “it looks correct” to “it stays correct”.
Here are a few that capture the key behaviors.
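One thing the snippets below assume is a small constructor helper named limiter(...) at the top of tests.rs. I haven’t shown it anywhere else, so here’s a plausible shape for it:
use crate::{AbsoluteLocalRateLimiter, LocalRateLimiterOptions};
// RateLimitDecision is pulled in for the assertions in the tests below.
use crate::common::RateLimitDecision;

// Test-only shorthand: build a limiter from the two knobs the tests care about.
fn limiter(window_size_seconds: u64, rate_group_size_ms: u16) -> AbsoluteLocalRateLimiter {
    AbsoluteLocalRateLimiter::new(LocalRateLimiterOptions {
        window_size_seconds,
        rate_group_size_ms,
    })
}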
Reject at the exact boundary
#[test]
fn rejects_at_exact_window_limit() {
let limiter = limiter(1, 1000);
let key = "k";
let rate_limit = 2;
limiter.inc(key, rate_limit, 1);
assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Allowed));
limiter.inc(key, rate_limit, 1);
assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Rejected { .. }));
}
Group increments that happen close together
#[test]
fn rate_grouping_merges_within_group() {
let limiter = limiter(1, 50);
let key = "k";
limiter.inc(key, 3, 3);
std::thread::sleep(std::time::Duration::from_millis(10));
limiter.inc(key, 3, 3);
assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Rejected { .. }));
// If grouping worked, both increments share the oldest timestamp and expire together.
std::thread::sleep(std::time::Duration::from_millis(2005));
assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Allowed));
}
Rejections include useful backoff signals
#[test]
fn rejected_includes_retry_after_and_remaining_after_waiting() {
let limiter = limiter(1, 10);
let key = "k";
limiter.inc(key, 5, 3);
std::thread::sleep(std::time::Duration::from_millis(20));
limiter.inc(key, 5, 4);
let decision = limiter.is_allowed(key);
let RateLimitDecision::Rejected {
window_size_seconds,
retry_after_ms,
remaining_after_waiting,
} = decision else {
panic!("expected rejected decision");
};
assert_eq!(window_size_seconds, 1);
assert!(retry_after_ms <= 1000);
assert_eq!(remaining_after_waiting, 4);
}
Where it landed
The final code isn’t trying to be fancy. It’s trying to be:
- readable on re-read
- safe under concurrency
- cheap in the common case (grouping + atomics)
- honest about backoff (rejections tell you what to do)
And if future-me ever breaks it, the tests will roast me immediately.
Note: Memory and cleanup
One limitation of this in-memory implementation is that per-key state is kept around once a key has been seen. If you keep receiving new keys over time, the map will grow, and stale entries can stick around indefinitely (which is basically a slow memory leak). In a production setup, you’d typically add an intermittent cleanup job that runs in the background (e.g., on an interval) and removes keys whose buckets are empty or whose latest timestamp is older than some TTL. I’m not implementing that cleanup loop here, but it’s worth calling out.
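For reference, here’s roughly what such a sweep could look like using DashMap’s retain. This is a sketch, not part of the limiter above; it uses the window itself as the TTL and assumes something (a background thread, a timer task) calls it periodically.
impl AbsoluteLocalRateLimiter {
    // Sketch: drop keys whose newest bucket has already fallen out of the window.
    // retain walks the shards one at a time, briefly taking each shard's write lock.
    pub(crate) fn sweep_stale_keys(&self) {
        let window = std::time::Duration::from_secs(self.window_size_seconds);
        self.series.retain(|_key, rate_limit| {
            rate_limit
                .series
                .back()
                .is_some_and(|bucket| bucket.timestamp.elapsed() <= window)
        });
    }
}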