How to Implement an In-Memory Rate Limiter in Rust
This is an implementation diary for AbsoluteLocalRateLimiter. I’ll build it up from scratch like we’re pairing: project setup, the data model, then the concurrency choices (DashMap + atomics), then the code.
What I wanted (in plain English)
I needed a local (in-process) limiter that can answer one question quickly: “For this key, right now, are we still under the limit for the current window?”
And if the answer is “no”, I didn’t want just a boolean; I wanted something you can actually use to back off.
So the end goal became:
- inc(key, rate_limit, count) to record usage
- is_allowed(key) that returns either Allowed or Rejected { retry_after_ms, remaining_after_waiting }
- safe under concurrency (because production doesn’t run in a single thread)
Before code: the mental model
This limiter is a sliding window limiter. For each key, we track how much traffic happened in the last window_size_seconds. Instead of storing “every request timestamp”, we store buckets (grouped increments). That keeps memory bounded and makes eviction cheap.
Also, because we constantly push new buckets at the back and evict old buckets from the front, VecDeque is the right container. A Vec is great for pushing at the end, but removing from the front shifts everything and gets expensive. VecDeque gives O(1) push/pop on both ends, which is exactly what a sliding window needs.
For a given key:
- we keep a queue of buckets (oldest at the front)
- we also keep a running total of counts currently in the window
When asked “are we allowed?” we:
- evict old buckets (anything outside the window)
- compare total to window_limit
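Here’s that evict-then-compare loop as a minimal, single-threaded sketch. Bucket and is_allowed_sketch are illustrative names only; the real types come in Step 0.
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// One bucket: "this much traffic happened around this point in time".
struct Bucket {
    count: u64,
    timestamp: Instant,
}

// Evict what fell out of the window, then compare what's left to the cap.
fn is_allowed_sketch(
    buckets: &mut VecDeque<Bucket>,
    total: &mut u64,
    window: Duration,
    window_limit: u64,
) -> bool {
    while let Some(oldest_ts) = buckets.front().map(|b| b.timestamp) {
        if oldest_ts.elapsed() <= window {
            break;
        }
        let evicted = buckets.pop_front().expect("front was Some");
        *total -= evicted.count;
    }
    *total < window_limit
}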
Step -1: Create a tiny crate for it
If you’re reading this without any context about my larger project, don’t worry. You can follow along with a small, standalone crate.
Create a sub-crate:
cargo new --lib rate_limiter
For the rest of this post, assume all code lives under rate_limiter/src.
Here’s the small folder layout we’ll use (I’m intentionally not using long module paths in this writeup):
rate_limiter/
Cargo.toml
src/
lib.rs
common.rs
absolute_local_rate_limiter.rs
tests.rs
In src/lib.rs, we just wire modules together:
mod absolute_local_rate_limiter;
pub use absolute_local_rate_limiter::*;
mod common;
#[cfg(test)]
mod tests;
Step 0: Create the small “vocabulary” types
Before the limiter itself, I needed some shared types that make the rest of the code readable. I could have picked better names, but these are the ones I’m working with; feel free to rename them. This lives in rate_limiter/src/common.rs.
InstantRate is a bucket: “some count at some timestamp”.
RateLimit is the per-key state.
RateLimitDecision is the return type of is_allowed().
use std::{collections::VecDeque, sync::atomic::AtomicU64, time::Instant};
pub(crate) struct InstantRate {
pub count: AtomicU64,
pub timestamp: Instant,
}
pub(crate) struct RateLimit {
pub limit: u64,
pub series: VecDeque<InstantRate>,
pub total: AtomicU64,
}
impl RateLimit {
pub fn new(limit: u64) -> Self {
Self {
limit,
series: VecDeque::new(),
total: AtomicU64::new(0),
}
}
}
pub enum RateLimitDecision {
Allowed,
Rejected {
window_size_seconds: u64,
retry_after_ms: u64,
remaining_after_waiting: u64,
},
}
The important detail here is that counts are atomic. That decision exists because we want a fast, low-contention inc() path.
If InstantRate.count and RateLimit.total were plain u64, every increment would require a mutable reference to the per-key state, which means an exclusive lock for that key shard. Atomics let the “merge into last bucket” case update counts while holding a non-mutable map reference.
One more (I think very practical) detail is that I use Ordering::Relaxed for these counters. I’m not using these atomics as a signaling mechanism (no “publish data then notify” pattern). I only need numerically correct increments/decrements, and the decision is derived from the current numeric totals. Relaxed atomics keep the hot path cheaper.
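If you want to convince yourself that Relaxed is enough for plain counting, here’s a tiny standalone example (not part of the limiter): each fetch_add is still atomic, so no increments are lost; Relaxed only gives up cross-variable ordering guarantees we don’t need here.
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

fn main() {
    static TOTAL: AtomicU64 = AtomicU64::new(0);

    // Ten threads, each bumping the counter 1000 times with Relaxed ordering.
    let handles: Vec<_> = (0..10)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1000 {
                    TOTAL.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Atomicity is preserved: no increments are lost.
    assert_eq!(TOTAL.load(Ordering::Relaxed), 10_000);
}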
Step 1: Define the limiter struct + options
The options are simple and intentional:
- window_size_seconds: how long the rolling window is
- rate_group_size_ms: how aggressively to merge increments into a single bucket
#[derive(Clone, Debug)]
pub struct LocalRateLimiterOptions {
pub window_size_seconds: u64,
pub rate_group_size_ms: u16,
}
And then the limiter itself keeps per-key state in a DashMap.
use dashmap::DashMap;
use crate::{
LocalRateLimiterOptions,
common::{InstantRate, RateLimit, RateLimitDecision},
};
pub struct AbsoluteLocalRateLimiter {
window_size_seconds: u64,
rate_group_size_ms: u16,
series: DashMap<String, RateLimit>,
}
impl AbsoluteLocalRateLimiter {
pub(crate) fn new(options: LocalRateLimiterOptions) -> Self {
Self {
window_size_seconds: options.window_size_seconds,
rate_group_size_ms: options.rate_group_size_ms,
series: DashMap::new(),
}
} // end constructor
}
Why DashMap?
We need a concurrent map keyed by String. The usual Rust options are:
- Mutex<HashMap<...>>: simplest, but one big lock (high contention)
- RwLock<HashMap<...>>: nicer for reads, but writes still block everything
- DashMap<...>: sharded locking, so unrelated keys don’t fight over a single lock
DashMap is a good fit because rate limiting tends to have many keys (or at least you want the design to survive that), and you don’t want key A blocking key B just because they share a global mutex.
Under the hood, DashMap shards its storage and locks per shard. That gives you a nice property: operations on unrelated keys can proceed in parallel. It also gives you a clean API split:
- get() / contains_key() for read handles
- get_mut() / entry() for write handles
And that split matters because write handles are the expensive ones. Each key gets its own RateLimit, and the rest of the story is: mutate that per-key struct safely.
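Here’s a tiny standalone example of that read/write split. It assumes dashmap is listed in Cargo.toml (e.g. dashmap = "6"; use whatever version you’re actually on).
use dashmap::DashMap;
use std::sync::atomic::{AtomicU64, Ordering};

fn main() {
    let map: DashMap<String, AtomicU64> = DashMap::new();

    // Write handle: create-or-fetch the entry (exclusive shard lock while held).
    map.entry("key-a".to_string())
        .or_insert_with(|| AtomicU64::new(0));

    // Read handle: shared access is enough to bump an atomic counter.
    if let Some(counter) = map.get("key-a") {
        counter.fetch_add(1, Ordering::Relaxed);
    }

    // Unrelated keys (usually) live on different shards, so they don't contend.
    map.entry("key-b".to_string())
        .or_insert_with(|| AtomicU64::new(7));

    assert_eq!(map.get("key-a").unwrap().load(Ordering::Relaxed), 1);
}
The limiter leans on exactly this pattern: entry() only when the value has to be created or reshaped, get() plus atomics for everything else.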
Step 2: Implement inc() (and keep it cheap)
inc() does one job: record usage.
The naive version is to always push a new timestamp into a queue. The realistic version is to group increments that happen close together into the same bucket (so we don’t build a timestamp museum).
Here’s the real inc() logic from rate_limiter/src/absolute_local_rate_limiter.rs.
use std::{sync::atomic::Ordering, time::Instant};
pub fn inc(&self, key: &str, rate_limit: u64, count: u64) {
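// Create the per-key state on first sight; checking contains_key first avoids
// taking a write handle (entry) when the key already exists.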
if !self.series.contains_key(key) {
self.series
.entry(key.to_string())
.or_insert_with(|| RateLimit::new(rate_limit));
}
let Some(rate_limit_series) = self.series.get(key) else {
unreachable!("AbsoluteLocalRateLimiter::inc: key should be in map");
};
if let Some(last_entry) = rate_limit_series.series.back()
&& last_entry.timestamp.elapsed().as_millis() <= self.rate_group_size_ms as u128
{
// Cheap path: merge into latest bucket.
last_entry.count.fetch_add(count, Ordering::Relaxed);
rate_limit_series.total.fetch_add(count, Ordering::Relaxed);
} else {
// Structural path: create a new bucket (needs mutable access).
drop(rate_limit_series);
let mut rate_limit_series = self
.series
.entry(key.to_string())
.or_insert_with(|| RateLimit::new(rate_limit));
rate_limit_series.series.push_back(InstantRate {
count: count.into(),
timestamp: Instant::now(),
});
rate_limit_series.total.fetch_add(count, Ordering::Relaxed);
}
} // end method inc
The three decisions hiding inside inc()
This function looks simple, but it encodes three important choices:
1. First limit wins.
I only set RateLimit.limit when the key is first created (or_insert_with(|| RateLimit::new(rate_limit))). That means later calls to inc() don’t “upgrade” the limit. This avoids a subtle class of bugs where someone accidentally calls inc(key, 999, ...) and silently makes the limiter permissive.
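If you want to see this behavior pinned down, a test sketch like the following (reusing the same imports the rest of tests.rs already has) should pass; the later 999 never takes effect:
#[test]
fn first_limit_wins() {
    let limiter = AbsoluteLocalRateLimiter::new(LocalRateLimiterOptions {
        window_size_seconds: 1,
        rate_group_size_ms: 1000,
    });
    let key = "k";

    // Key is created with limit = 2.
    limiter.inc(key, 2, 1);
    // This call does NOT upgrade the limit to 999; it only adds to the count.
    limiter.inc(key, 999, 1);

    // window_limit is still 1 * 2 = 2 and total is 2, so we are rejected.
    assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Rejected { .. }));
}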
2. Rate grouping is a performance knob.
rate_group_size_ms decides whether we merge into the last bucket or create a new one. This one came from a very practical “oh no” moment: under very high traffic (imagine 100k messages/sec for a single hot key), I do not want to push 100k InstantRate items into a VecDeque every second. That would blow up memory, and it would make eviction expensive because now is_allowed() has to pop through a mountain of tiny buckets. Grouping fixes that by changing the question from “how many events?” to “how much volume happened in this short time slice?”. In other words, we collapse many increments into one bucket.
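A quick way to see the memory win: grouping caps how fast buckets can be created at roughly one per rate_group_size_ms, regardless of traffic, so (assuming is_allowed() runs and evicts) a window’s worth of buckets is roughly window_ms / rate_group_size_ms. A rough sketch of that bound:
// Rough upper bound on in-window buckets per key: one bucket per
// rate_group_size_ms slice of the window, plus one currently being filled.
fn approx_max_buckets(window_size_seconds: u64, rate_group_size_ms: u16) -> u64 {
    let window_ms = window_size_seconds * 1000;
    window_ms / rate_group_size_ms as u64 + 1
}

fn main() {
    // e.g. a 60s window with 100ms grouping tops out around 601 buckets,
    // whether the key sees 10 events/sec or 100_000 events/sec.
    assert_eq!(approx_max_buckets(60, 100), 601);
}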
This is also why I leaned on atomics (AtomicU64) for both the bucket count and the overall total:
- when a new increment falls into the current group, we can just fetch_add the counts
- we avoid taking a mutable reference (and its exclusive shard lock) for the common case
3. Don’t take a mutable reference unless you actually need one.
Notice the split:
- when we can merge into the last bucket: we use self.series.get(key) and only touch atomics
- when we need to push a new bucket: we drop(rate_limit_series) and reacquire mutable access with entry(...).or_insert_with(...)
This is important because a mutable reference in DashMap holds an exclusive lock on the shard. If we took get_mut() every time, even the cheap merge case would serialize more under contention. Another way to say it: most increments only need to bump numbers, not reshape the data structure. Atomics let the “bump numbers” path stay on a read handle, and we only pay for a write handle when we actually need to push a new bucket. Also, “upgrade a read lock to write lock” patterns tend to create dead-end designs. Dropping the ref and reacquiring is explicit and predictable.
Step 3: Implement is_allowed() (evict, decide, return something useful)
is_allowed() does three things:
- If we don’t know the key, we allow.
- If the window moved, we evict expired buckets and subtract their counts.
- If we’re over the cap, we return a rejection that tells you what to do next.
This is the exact shape of the final function, with the “why” baked into the names.
pub fn is_allowed(&self, key: &str) -> RateLimitDecision {
let Some(rate_limit) = self.series.get(key) else {
return RateLimitDecision::Allowed;
};
// If the oldest bucket is out of window, we evict.
let rate_limit = match rate_limit.series.front() {
None => rate_limit,
Some(instant_rate)
if instant_rate.timestamp.elapsed().as_secs() <= self.window_size_seconds =>
{
rate_limit
}
Some(_) => {
drop(rate_limit);
let Some(mut rate_limit) = self.series.get_mut(key) else {
return RateLimitDecision::Allowed;
};
while let Some(instant_rate) = rate_limit.series.front()
&& instant_rate.timestamp.elapsed().as_secs() > self.window_size_seconds
{
rate_limit.total.fetch_sub(
instant_rate.count.load(std::sync::atomic::Ordering::Relaxed),
std::sync::atomic::Ordering::Relaxed,
);
rate_limit.series.pop_front();
}
drop(rate_limit);
let Some(rate_limit) = self.series.get(key) else {
return RateLimitDecision::Allowed;
};
rate_limit
}
};
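// limit is the per-second rate, so the cap for the whole window is
// window_size_seconds * limit.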
let window_limit = self.window_size_seconds * rate_limit.limit;
if rate_limit.total.load(std::sync::atomic::Ordering::Relaxed) < window_limit {
return RateLimitDecision::Allowed;
}
let (retry_after_ms, remaining_after_waiting) = match rate_limit.series.front() {
None => (0, 0),
Some(instant_rate) => {
let window_ms = self.window_size_seconds.saturating_mul(1000);
let elapsed_ms = instant_rate.timestamp.elapsed().as_millis();
let elapsed_ms = u64::try_from(elapsed_ms).unwrap_or(u64::MAX);
let retry_after_ms = window_ms.saturating_sub(elapsed_ms);
let current_total = rate_limit.total.load(std::sync::atomic::Ordering::Relaxed);
let oldest_count = instant_rate.count.load(std::sync::atomic::Ordering::Relaxed);
let remaining_after_waiting = current_total.saturating_sub(oldest_count);
(retry_after_ms, remaining_after_waiting)
}
};
RateLimitDecision::Rejected {
window_size_seconds: self.window_size_seconds,
retry_after_ms,
remaining_after_waiting,
}
} // end method is_allowed
Two notes I learned the hard way:
- eviction uses elapsed().as_secs(), which truncates. So if you set window_size_seconds = 1, don’t be surprised if tests wait ~2s to reliably cross the boundary.
- the rejection payload is the whole point. retry_after_ms is the “when can I try again?” and remaining_after_waiting is the “how bad is it even after waiting?”
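The truncation is easy to forget, so here’s exactly what I mean:
use std::time::Duration;

fn main() {
    // as_secs() truncates toward zero: 1.999s still reads as 1 second.
    assert_eq!(Duration::from_millis(1999).as_secs(), 1);
    assert_eq!(Duration::from_millis(2001).as_secs(), 2);
    // With window_size_seconds = 1, eviction needs elapsed().as_secs() > 1,
    // which only becomes true once a bucket is at least 2 full seconds old.
}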
The two paths inside is_allowed() (and why get_mut() is not the default)
If you’re new to concurrency in Rust, here’s the intuition: a mutable handle in DashMap (get_mut) holds an exclusive lock on a shard. While you’re holding it, other threads that hit keys living on that same shard can get blocked. So in is_allowed(), I keep the common case on the read path and only pay for a write lock when the data structure must actually change.
1. Read path (no eviction needed)
If the oldest bucket is still within the window, we don’t need to mutate anything. We can:
- compute window_limit
- compare it against the atomic total
- return Allowed or Rejected with metadata
No queue pops, no reshaping, no write lock.
2. Write path (eviction needed)
If the oldest bucket is outside the window, the queue needs cleanup. That is when we pay for get_mut():
- pop from the front until the front bucket is in-window
- fetch_sub from total as we evict
Once eviction is done, we drop the mutable handle and go back to a normal read handle to finish the decision.
This is the same pattern as inc():
- cheap, read-oriented fast path for the common case
- write path only when the data structure must change
It’s tempting to always call get_mut() and simplify the control flow, but you’d be trading clarity for avoidable contention: the “no eviction needed” path is pure reads + atomics, and it should stay that way.
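To close the loop on how a caller consumes all of this, here’s a sketch of the usage pattern from inside the crate. send_batch and the blocking sleep are stand-ins for whatever your application does, and the 100 per-second limit is made up for the example.
use std::time::Duration;
use crate::AbsoluteLocalRateLimiter;
use crate::common::RateLimitDecision;

// Hypothetical caller: record the usage, then back off if the limiter says no.
fn record_and_maybe_backoff(limiter: &AbsoluteLocalRateLimiter, key: &str, batch_size: u64) {
    limiter.inc(key, 100, batch_size); // 100 = made-up per-second limit for this key

    match limiter.is_allowed(key) {
        RateLimitDecision::Allowed => {
            // send_batch(key); // stand-in for the real work
        }
        RateLimitDecision::Rejected { retry_after_ms, remaining_after_waiting, .. } => {
            // Wait until the oldest bucket leaves the window, then decide again.
            std::thread::sleep(Duration::from_millis(retry_after_ms));
            // remaining_after_waiting says how much of the budget is still
            // spoken for even after that wait; acknowledge it here.
            let _ = remaining_after_waiting;
        }
    }
}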
Step 4: Prove it with tests (aka: stop trusting yourself)
The tests live in rate_limiter/src/tests.rs. They’re what turned this from “it looks correct” to “it stays correct”.
Here are a few that capture the key behaviors.
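One thing the snippets below assume is a small constructor helper named limiter(...) at the top of tests.rs. I haven’t shown it anywhere else, so here’s a plausible shape for it:
use crate::{AbsoluteLocalRateLimiter, LocalRateLimiterOptions};
// RateLimitDecision is pulled in for the assertions in the tests below.
use crate::common::RateLimitDecision;

// Test-only shorthand: build a limiter from the two knobs the tests care about.
fn limiter(window_size_seconds: u64, rate_group_size_ms: u16) -> AbsoluteLocalRateLimiter {
    AbsoluteLocalRateLimiter::new(LocalRateLimiterOptions {
        window_size_seconds,
        rate_group_size_ms,
    })
}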
Reject at the exact boundary
#[test]
fn rejects_at_exact_window_limit() {
let limiter = limiter(1, 1000);
let key = "k";
let rate_limit = 2;
limiter.inc(key, rate_limit, 1);
assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Allowed));
limiter.inc(key, rate_limit, 1);
assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Rejected { .. }));
}
Group increments that happen close together
#[test]
fn rate_grouping_merges_within_group() {
let limiter = limiter(1, 50);
let key = "k";
limiter.inc(key, 3, 3);
std::thread::sleep(std::time::Duration::from_millis(10));
limiter.inc(key, 3, 3);
assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Rejected { .. }));
// If grouping worked, both increments share the oldest timestamp and expire together.
std::thread::sleep(std::time::Duration::from_millis(2005));
assert!(matches!(limiter.is_allowed(key), RateLimitDecision::Allowed));
}
Rejections include useful backoff signals
#[test]
fn rejected_includes_retry_after_and_remaining_after_waiting() {
let limiter = limiter(1, 10);
let key = "k";
limiter.inc(key, 5, 3);
std::thread::sleep(std::time::Duration::from_millis(20));
limiter.inc(key, 5, 4);
let decision = limiter.is_allowed(key);
let RateLimitDecision::Rejected {
window_size_seconds,
retry_after_ms,
remaining_after_waiting,
} = decision else {
panic!("expected rejected decision");
};
assert_eq!(window_size_seconds, 1);
assert!(retry_after_ms <= 1000);
assert_eq!(remaining_after_waiting, 4);
}
Where it landed
The final code isn’t trying to be fancy. It’s trying to be:
- readable on re-read
- safe under concurrency
- cheap in the common case (grouping + atomics)
- honest about backoff (rejections tell you what to do)
And if future-me ever breaks it, the tests will roast me immediately.
Note: Memory and cleanup
One limitation of this in-memory implementation is that per-key state is kept around once a key has been seen. If you keep receiving new keys over time, the map will grow, and stale entries can stick around indefinitely (which is basically a slow memory leak). In a production setup, you’d typically add an intermittent cleanup job that runs in the background (e.g., on an interval) and removes keys whose buckets are empty or whose latest timestamp is older than some TTL. I’m not implementing that cleanup loop here, but it’s worth calling out.
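For reference, here’s roughly what such a sweep could look like using DashMap’s retain. This is a sketch, not part of the limiter above; it uses the window itself as the TTL and assumes something (a background thread, a timer task) calls it periodically.
impl AbsoluteLocalRateLimiter {
    // Sketch: drop keys whose newest bucket has already fallen out of the window.
    // retain walks the shards one at a time, briefly taking each shard's write lock.
    pub(crate) fn sweep_stale_keys(&self) {
        let window = std::time::Duration::from_secs(self.window_size_seconds);
        self.series.retain(|_key, rate_limit| {
            rate_limit
                .series
                .back()
                .is_some_and(|bucket| bucket.timestamp.elapsed() <= window)
        });
    }
}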