Why I Built Trypema: Rate Limiting, Rust, and One Stubborn Problem
The story behind a Rust rate limiting library I built while solving a real problem, and how I ended up creating a hybrid provider nobody asked for.
A Bit of Context
I have been building a realtime-as-a-service platform for the past year. My team is planning to launch it mid-year. Without going into too much detail, the platform needs to handle a lot of concurrent connections and operations, so performance has always been a top concern.
After a while, we got to the point where we needed to add pricing tiers. And pricing tiers, almost inevitably, lead you straight to rate limiting.
I expected this part to be straightforward. It was not.
The Post That Started Everything
While researching approaches, I came across Ably's post on distributed rate limiting. I found myself rereading certain sections. The part that stuck with me was the idea of a suppressed mode, a strategy where instead of hard-rejecting traffic at the boundary, you give systems a chance to gracefully degrade near capacity. Let the right work through while the system stays stable.
That philosophy felt right for what we were building. Realtime systems do not behave well with hard cutoffs. You want the system to start backing off before things break, not after.
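To make the idea concrete, here is a minimal sketch of one way a suppressed strategy can work. This is my own illustration of the general technique, not Ably's or Trypema's actual implementation: below a soft threshold everything is admitted, between the soft threshold and the hard limit an increasing fraction of requests is shed, and at the hard limit everything is rejected.

```rust
// Illustrative sketch of a "suppressed" admission strategy (not the
// actual Trypema or Ably implementation). Instead of a hard cutoff,
// admission probability ramps down as load approaches the hard limit.
fn admit_fraction(current: u64, soft: u64, hard: u64) -> f64 {
    if current < soft {
        1.0 // under the soft threshold: admit everything
    } else if current >= hard {
        0.0 // at or over the hard limit: reject everything
    } else {
        // Linearly ramp admission probability from 1.0 down to 0.0
        // as load moves from the soft threshold to the hard limit.
        1.0 - (current - soft) as f64 / (hard - soft) as f64
    }
}

fn main() {
    assert_eq!(admit_fraction(50, 80, 100), 1.0); // well under capacity
    assert_eq!(admit_fraction(90, 80, 100), 0.5); // halfway into the ramp
    assert_eq!(admit_fraction(100, 80, 100), 0.0); // at the hard limit
    println!("ok");
}
```

A caller would compare `admit_fraction` against a random draw per request, so the system starts shedding a little load before it is saturated rather than failing all at once at the boundary.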
Looking for an Existing Crate
So I did what any reasonable engineer does: I looked for an existing solution. The Rust ecosystem has a few rate limiter crates. The most notable distributed one was redis-cell, which implements GCRA (the generic cell rate algorithm) as a Redis module and is honestly well-made.
The problem was throughput. For our use case, redis-cell did not meet the requirements. And beyond throughput, I could not find anything that implemented the suppressed strategy from the Ably post. The options were mostly hard-limit crates with no concept of graceful degradation.
At that point I had two choices: settle for something that did not fully fit, or build it myself.
Building It Inside the Project
I started building the rate limiter directly inside the platform's codebase. At some point I stepped back and looked at what I had written and thought: this is actually general enough to be its own library. If it meets the requirements for our platform, it should be useful to other people too.
That was the moment Trypema went from an internal utility to a project worth shipping.
The Performance Problem
The first round of benchmarks was humbling. My local rate limiter was running at roughly half the throughput of governor, which is the most commonly used Rust rate limiter. That was not acceptable.
So I spent time optimizing. The main challenge was figuring out how to keep the count atomic with as little lock contention as possible. Once I got that right, things started moving. After several rounds of profiling and adjustments, the local provider pulled up to parity with governor. In many cases, it now beats it.
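The lock-free flavor of that idea can be sketched with a toy fixed-window counter built on a single atomic. This is a simplification for illustration, not Trypema's actual data structure: admission is one `fetch_add`, so concurrent callers never contend on a lock, only on the cache line holding the counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Toy lock-free fixed-window counter (illustration only, not
// Trypema's internals). Admission is a single atomic increment.
struct AtomicWindow {
    count: AtomicU64,
    limit: u64,
}

impl AtomicWindow {
    fn new(limit: u64) -> Self {
        Self { count: AtomicU64::new(0), limit }
    }

    // Returns true if the request is admitted. fetch_add returns the
    // previous value, so exactly `limit` callers win per window even
    // under heavy concurrency.
    fn try_admit(&self) -> bool {
        self.count.fetch_add(1, Ordering::Relaxed) < self.limit
    }

    // Called on window rollover to start a fresh window.
    fn reset(&self) {
        self.count.store(0, Ordering::Relaxed);
    }
}

fn main() {
    let w = AtomicWindow::new(3);
    let admitted = (0..5).filter(|_| w.try_admit()).count();
    assert_eq!(admitted, 3); // only the first 3 of 5 get through
    w.reset();
    assert!(w.try_admit()); // fresh window admits again
    println!("ok");
}
```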
On a uniform-key workload (100k keys), Trypema hits 9.64M ops/s versus Governor's 6.28M ops/s, around 53% higher throughput. On a hot-key workload (single key, 1,000 ops/s limit), both libraries are close: Trypema at 3.49M ops/s, Governor at 3.88M ops/s, with Governor holding a slight edge on P99 latency (25µs vs 33µs). Not bad for something that started at half the speed.
You can look at the local benchmark comparison yourself.
The Redis Wall
For the Redis-backed provider, I hit a different problem. No matter what I tried, I could not get close to redis-cell's throughput. I tried everything I could think of: different script structures, different counting approaches, fewer round trips. Redis-cell was still faster. On a hot-key workload, redis-cell sits at 64k ops/s; Trypema's Redis provider lands at 47.6k ops/s. Both are network-bound, both with P99 latency in the 370-500µs range. The gap is not enormous, but it is consistent.
Then I asked myself a different question: what if I stopped trying to beat it on its own terms?
The insight was this: every pure Redis admission check requires a network round trip. That latency is a ceiling on throughput regardless of how efficient the server-side logic is. If I buffer counts locally and only sync to Redis at a configurable flush interval, I remove the per-check network cost entirely (there were other hurdles to clear, but this was the general philosophy). The system becomes slightly less strict at the boundary of a window, but for most real-world use cases that tradeoff is completely acceptable.
That is how the hybrid provider came to exist. Nobody asked for it. It came out of trying to solve a constraint.
One thing I learned the hard way: aggressive flushing was worse than not flushing at all. When I tried to flush as frequently as possible, the flush operations became heavy on Redis, reads in between started to bottleneck, and the network overhead compounded. Pacing the flush interval properly was the thing that made the hybrid work. Once it did, the numbers were hard to ignore: 11.36M ops/s with a P99 of 1µs, compared to redis-cell's 64k ops/s at 367µs. That is roughly 170x higher throughput. The full picture is in the Redis benchmark comparison.
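The buffered-flush shape can be sketched in a few lines. This is a toy illustration of the general idea, not Trypema's actual code: the per-request path touches only a local atomic, and the accumulated delta is handed to the shared store (Redis in the real system; a closure stands in for it here) once per paced flush.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Toy sketch of the buffered-flush idea (not Trypema's actual code).
struct BufferedCounter {
    local_delta: AtomicU64,
}

impl BufferedCounter {
    fn new() -> Self {
        Self { local_delta: AtomicU64::new(0) }
    }

    // Per-request path: no network, just a local atomic increment.
    fn record(&self) {
        self.local_delta.fetch_add(1, Ordering::Relaxed);
    }

    // Flush path: drain the buffered count and push it to the shared
    // store in one round trip (think Redis INCRBY). Run this on a
    // paced interval, not per request -- flushing too aggressively
    // reintroduces the network cost the buffer was meant to remove.
    fn flush<F: FnMut(u64)>(&self, mut sync: F) {
        let delta = self.local_delta.swap(0, Ordering::Relaxed);
        if delta > 0 {
            sync(delta);
        }
    }
}

fn main() {
    let c = BufferedCounter::new();
    let mut remote_total = 0u64;
    for _ in 0..1000 {
        c.record(); // 1000 admission checks, zero round trips
    }
    c.flush(|d| remote_total += d); // one round trip syncs them all
    assert_eq!(remote_total, 1000);
    println!("ok");
}
```

The tradeoff is exactly the one described above: between flushes, other nodes do not see this node's counts, so enforcement is slightly loose at window boundaries in exchange for removing the per-check round trip.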
I will be honest: I ran these benchmarks in my local environment, and a lot can affect results at that level, including ambient temperature. That is why I ran the tests multiple times. But I genuinely think the right move is for someone to run these on a controlled EC2 instance and publish the results. I am curious what a cleaner environment looks like. The benchmark tool is included in the repo.
The Name
The name "Trypema" comes from the Koine Greek word "trypematos," meaning hole or opening. The reference is to the phrase "through the eye of a needle" from Matthew 19:24.
Rate limiting is about narrowing the gate under load. Letting the right work through while keeping the system stable. The name felt like it fit.
Where It Is Now
Trypema is out. It has a local provider, a Redis provider, a hybrid provider, and two strategies: Absolute (deterministic sliding window) and Suppressed (graceful degradation near capacity). It ships with fractional rate support, retry hints, and a cleanup loop for high-cardinality key sets.
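For readers unfamiliar with the term, here is the sliding-window counting technique in miniature. This is my illustration of the general approach (the common two-window approximation), not necessarily how Trypema's Absolute strategy is implemented internally: the previous fixed window's count is weighted by how much of it still overlaps the sliding window.

```rust
// Illustrative sliding-window count (general technique, not
// necessarily Trypema's internals): weight the previous fixed
// window's count by the fraction of it still inside the sliding
// window, then add the current window's count.
fn sliding_count(prev_window: u64, curr_window: u64, elapsed_frac: f64) -> f64 {
    prev_window as f64 * (1.0 - elapsed_frac) + curr_window as f64
}

fn main() {
    // Halfway through the current window, half of the previous
    // window's 100 requests still count, plus 30 new ones.
    assert_eq!(sliding_count(100, 30, 0.5), 80.0);
    // At the very start of a window, the previous window counts fully.
    assert_eq!(sliding_count(100, 0, 0.0), 100.0);
    println!("ok");
}
```

Compared with a plain fixed window, this avoids the burst of 2x the limit that can slip through at a window boundary, at the cost of a small approximation.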
If you are building something in Rust and need rate limiting, particularly if you are in a distributed setup or want more than a hard cutoff, give it a try. The docs have quickstarts for both local and Redis-backed setups.
And if you run those benchmarks on a proper server, let me know what you find.