URL Shortener with Analytics

Problem

The shortener itself is a solved problem — the interesting part is the tension it creates. A redirect must be as close to instant as possible (every millisecond is user-visible), but the whole reason to self-host one is the analytics: who clicked, when, from where, on what device. Recording analytics synchronously puts a database write on the hottest path in the system. The design goal was simple to state: the redirect must never wait for analytics.

Architecture

Two paths, deliberately unequal.

Hot path (redirect): GET /:code hits the Node service, checks an in-process LRU cache (codes are immutable once created, so cache invalidation is a non-problem), falls back to Postgres on a miss, and issues a 301. The click event — timestamp, code, referrer, coarse user-agent — is pushed onto an in-memory buffer and the response goes out. The buffer is flushed to a click_events table in batches (every second or 500 events, whichever comes first) by a background loop in the same process.

Cold path (analytics): a rollup worker aggregates click_events into hourly buckets per link — clicks, unique referrers, device split — and writes them to a link_stats table. The React dashboard reads only from rollups; raw events are pruned after 90 days. Short codes are 7-character base62 strings derived from a sequence with a random offset, which keeps them non-guessable enough without a collision-check loop.

Everything ships as two containers (API + worker) plus Postgres, composed with Docker; the demo runs the API on a single small instance.

Trade-offs

Buffered writes over write-per-click. Batching cut p99 redirect latency from ~28 ms to ~6 ms in load tests, at the cost of an honesty window: a crash can lose up to a second of click data. For analytics — not billing — that's the right trade. I wrote it down in the README so future-me doesn't "fix" it.
Postgres over a dedicated OLAP store. ClickHouse would shrug at this volume, but it's another system to run. Hourly rollups keep dashboard queries on indexed, pre-aggregated rows; Postgres comfortably handles millions of raw events before this decision needs revisiting.
301 over 302. Permanent redirects let browsers and CDNs cache, which is free speed but means cached repeat clicks go uncounted. I chose honest speed over inflated numbers; a flag flips any link to 302 when counting matters more.
In-process cache over Redis. One less moving part. The cost is a cold cache per instance after deploys — acceptable at this scale, and the Postgres fallback is still a single indexed lookup.

What I'd do differently

The in-memory event buffer was the right call until the day I wanted a second API instance — then "flush loop per process" quietly became "N flush loops with N failure modes." Next iteration moves the buffer to a proper queue (which is exactly what pushed me to build the job queue project). I'd also capture coarse geo (country from a local MaxMind lookup) from day one; it's the first thing anyone asks of link analytics and it's painful to backfill from pruned events.