All projects
In progress

Distributed Rate Limiter

Token-bucket rate limiting as a horizontally scalable service — Redis-backed, atomic via Lua, exposed as Express middleware and a standalone sidecar.

Architecture diagram for Distributed Rate Limiter

Problem

Every public API needs rate limiting, but the naive version — an in-memory counter per process — breaks the moment you run two instances behind a load balancer. A client hammering the API gets 2 × limit because each instance keeps its own books. Sticky sessions paper over it until a deploy reshuffles connections; client-side throttling is a suggestion, not a control.

I wanted a limiter that behaves like one global bucket no matter how many API replicas exist, adds well under a millisecond of overhead inside the same VPC, and fails in a way I choose (fail-open for availability, fail-closed for abuse-sensitive endpoints).

Architecture

The service sits between the API gateway and application handlers. Each incoming request resolves to a rate-limit key (API key, user id, or IP as a fallback) and the limiter answers a single question: allow or deny, and if denied, retry when?

State lives in Redis. Each key maps to a token bucket stored as a small hash (tokens, last_refill). The refill-and-take logic runs as a Lua script via EVALSHA, so check-and-decrement is atomic — no read-modify-write race between replicas:

local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens') or capacity)
local elapsed = now - tonumber(redis.call('HGET', KEYS[1], 'last_refill') or now)
tokens = math.min(capacity, tokens + elapsed * refill_rate)
if tokens < 1 then return {0, tokens} end
redis.call('HSET', KEYS[1], 'tokens', tokens - 1, 'last_refill', now)
return {1, tokens - 1}

The Node service wraps this in two shapes: an Express middleware for apps that want it in-process, and a tiny HTTP/gRPC sidecar for everything else. Buckets carry a TTL slightly above the full-refill time, so idle keys expire and memory stays proportional to active clients, not all clients ever seen.

Deployment target is AWS: ElastiCache for Redis, the sidecar on ECS, limits configured per route via a small JSON policy file baked into the image.

Trade-offs

  • Token bucket over sliding-window log. The log gives perfect accuracy but costs O(requests) memory per key. The bucket is O(1) and allows short bursts up to capacity — which is actually the behavior most APIs want.
  • Centralized Redis over local counters with gossip. A network hop per request (~0.3 ms in-VPC) buys exact global limits and trivially correct semantics. Local-first designs are faster but only approximately correct.
  • Fail-open by default. If Redis is unreachable, requests pass and a counter increments. Rate limiting protects capacity; it shouldn't become the outage. Auth-sensitive routes can opt into fail-closed per policy.
  • EVALSHA over MULTI/EXEC. The whole decision is one round trip and the script is the single source of truth for bucket math — at the cost of Lua living in the codebase and needing its own tests.

What I'd do differently

Start with the policy model, not the bucket math. The algorithm took a day; deciding how limits compose (per-key and per-route and global) is the actual design problem, and I retrofitted it. I'd also add a local short-circuit cache for keys that are already hard-denied — denying a banned client shouldn't cost a Redis round trip. Still on the roadmap: Redis Cluster support (hash-tagging keys so a bucket never splits across slots) and a small dashboard streaming deny rates per route.