Building an API Gateway & Rate Limiter from Scratch in Node.js
I Built an npm Package for API Rate Limiting
I built @gagandeep023/api-gateway, an open-source npm package that adds rate limiting, API key authentication, IP filtering, real-time analytics, and a live monitoring dashboard to any Express app. One install, zero external dependencies beyond Express itself. No Redis, no database, no cache layer. Everything runs in-memory.
npm install @gagandeep023/api-gateway

The package started as a feature inside my portfolio site (this very site, gagandeep023.com). The gateway protects every API endpoint you are hitting right now. After realizing the code had zero dependencies on portfolio-specific logic, I extracted it into a standalone package that anyone can drop into their Express app.
This post covers two things: how you can use the package in your own project, and the technical deep dive into how I built each piece from scratch.
- GitHub: github.com/Gagandeep023/api-gateway
- npm: npmjs.com/package/@gagandeep023/api-gateway
- Live demo: gagandeep023.com (the gateway protects this site)
What You Get
- Three rate limiting algorithms: Token Bucket, Sliding Window Log, and Fixed Window Counter, configurable per tier
- Two-level rate limiting: a global limit protecting infrastructure plus per-tier limits enforcing SLA boundaries
- API key authentication with configurable tiers (free, pro, unlimited) and key management endpoints
- Device-level TOTP authentication for browser-based access without manual API key entry
- IP allowlist/blocklist filtering with dynamic mode switching
- Real-time analytics engine with a 10,000-entry circular buffer for bounded memory usage
- SSE-powered React dashboard with live charts, stats, request logs, and API key management UI
- Full TypeScript support with subpath exports for backend, frontend, and types
Quick Start: 5 Minutes to a Protected API
Install the package and add three lines of setup to your Express app. The gateway creates a middleware chain that handles rate limiting, auth, and analytics automatically.
import express from 'express';
import { createGatewayMiddleware, createGatewayRoutes } from '@gagandeep023/api-gateway/backend';
const app = express();
app.use(express.json());
// Create the gateway (defaults: 100 req/min free tier, 10k global limit)
const gateway = createGatewayMiddleware();
// Mount management routes FIRST (dashboard, analytics, key management)
app.use('/api/gateway', createGatewayRoutes({
rateLimiterService: gateway.rateLimiterService,
analyticsService: gateway.analyticsService,
config: gateway.config,
}));
// Apply rate limiting to all /api routes
app.use('/api', gateway.middleware);
// Your routes go after the middleware
app.get('/api/hello', (req, res) => {
res.json({ message: 'Hello, world!' });
});
app.listen(3001);

That is it. Every request to /api/* now passes through the rate limiter, gets logged to the analytics buffer, and returns standard X-RateLimit headers. The management routes at /api/gateway/* are mounted first so they bypass rate limiting. You never want the dashboard to become unusable during a rate limit storm.
Adding the Frontend Dashboard
Drop in the React component and you get a live monitoring panel with request charts, error rates, and API key management.
import { GatewayDashboard } from '@gagandeep023/api-gateway/frontend';
import '@gagandeep023/api-gateway/frontend/styles.css';
function App() {
return <GatewayDashboard apiBaseUrl="http://localhost:3001/api" />;
}

The dashboard connects via SSE for live updates every 5 seconds. It shows requests per minute, top endpoints, error rate, average response time, rate limit hits, active IPs, and API key sessions. You can create and revoke API keys directly from the UI.
Custom Configuration
The defaults work out of the box, but everything is configurable. You can define custom tiers with different algorithms, set IP rules, and pre-configure API keys.
const gateway = createGatewayMiddleware({
rateLimits: {
tiers: {
free: { algorithm: 'tokenBucket', maxRequests: 100, windowMs: 60000, refillRate: 10 },
pro: { algorithm: 'slidingWindow', maxRequests: 1000, windowMs: 60000 },
unlimited: { algorithm: 'none' },
},
defaultTier: 'free',
globalLimit: { maxRequests: 10000, windowMs: 60000 },
},
ipRules: {
allowlist: [],
blocklist: ['10.0.0.1'],
mode: 'blocklist',
},
apiKeys: {
keys: [{
id: 'key_001',
key: 'gw_live_your_secret_key',
name: 'Production App',
tier: 'pro',
createdAt: new Date().toISOString(),
active: true,
}],
},
});

The package uses subpath exports to keep backend and frontend concerns separate. Import from @gagandeep023/api-gateway/backend for Express middleware, @gagandeep023/api-gateway/frontend for the React dashboard, and @gagandeep023/api-gateway/types for TypeScript interfaces. A Node.js-only project never pulls in React dependencies, and a frontend-only consumer does not need Express.
How I Built It: The Architecture
Now for the technical deep dive. The gateway operates as a chain of Express middleware functions that execute before any route handler. Every inbound request passes through four stages sequentially. Each stage can short-circuit the request with an appropriate HTTP error, preventing downstream stages from executing.
Inbound HTTP Request
|
v
+------------------+
| Request Logger | <-- Records start timestamp
| (Stage 1) | Hooks into res.finish event
+------------------+
|
v
+------------------+ +---------+
| API Key Auth | --> | 401 | Invalid/revoked key
| (Stage 2) | +---------+
+------------------+
| Sets req.clientId (key ID or IP)
| Sets req.tier (free / pro / unlimited)
v
+------------------+ +---------+
| IP Filter | --> | 403 | Blocked IP or not in allowlist
| (Stage 3) | +---------+
+------------------+
|
v
+------------------+ +---------+
| Rate Limiter | --> | 429 | Limit exceeded
| (Stage 4) | +---------+
+------------------+
| Sets X-RateLimit-* headers
v
+------------------+
| Route Handler |
+------------------+
|
v
Response sent
|
v
+------------------+
| Logger Callback | <-- res.finish fires
| Logs to | Captures status, response time
| AnalyticsService| Writes to circular buffer
+------------------+

The middleware order is critical. Logging comes first so every request is captured, including rejected ones. Auth comes second because the rate limiter needs the client identity and tier. IP filtering comes third to reject blocked IPs before consuming rate limit tokens. The rate limiter is last because it needs all context from prior stages.
From Portfolio Feature to Standalone Package
The extraction from my portfolio codebase was not a simple copy-paste. The original implementation read configuration from JSON files on disk. The package version needed to accept configuration as constructor arguments so consumers could pass their own tiers, IP rules, and API keys programmatically. File-based state was replaced with in-memory defaults that consumers can override. The middleware composer went from a hardcoded Express Router to a factory function (createGatewayMiddleware) that returns the middleware chain plus service instances for use in management routes.
The factory pattern was the key design choice. createGatewayMiddleware returns not just the middleware Router, but also the raw service instances: rateLimiterService, analyticsService, and the resolved config. Consumers need these to wire up the management routes. Without exposing them, you would be locked into a specific route structure.
Two-Level Rate Limiting
The system enforces rate limits at two levels. A global limit protects infrastructure (10,000 req/min shared across all clients using a Fixed Window Counter). Per-tier limits enforce SLA boundaries using the algorithm configured for that tier. Every request must pass both checks.
Request arrives with clientId + tier
|
v
+------------------+
| GLOBAL LIMIT | Fixed Window: 10,000 req/min
| (all clients) | Shared counter across all tiers
+------------------+
| |
PASS FAIL --> 429 "Rate limit exceeded"
|
v
+------------------+
| TIER LOOKUP | tier = "free" | "pro" | "unlimited"
| from config |
+------------------+
|
v
+----+----+----+
| | | |
v v v v
TB SW FW NONE
| | | |
+----+----+ +--> PASS (unlimited)
|
PASS/FAIL
|
v
Set response headers:
X-RateLimit-Limit: <tier max>
X-RateLimit-Remaining: <tokens left>
X-RateLimit-Reset: <unix timestamp>
TB = Token Bucket (free tier default)
SW = Sliding Window (pro tier default)
FW = Fixed Window (global only)
NONE = No limit (unlimited tier)

Under the Hood: Three Rate Limiting Algorithms
Most rate limiting libraries give you one algorithm. I implemented three so I could pick the right one for each tier and internalize the tradeoffs. Here is how each one works and why you would choose it.
Token Bucket (Default for Free Tier)
Token Bucket is the most widely deployed algorithm. AWS API Gateway, Stripe, and GitHub all use variations of it. The mental model: a bucket holds N tokens. Each request removes one token. Tokens refill at a constant rate R per second. When the bucket is empty, requests are rejected until tokens refill.
Tokens
100 |**** ****
| * ****
75 | * ***
| * ***
50 | *** ***
| ** ***
25 | ** ***
| ** ***
0 |................***...........................
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--> Time
| Burst of | Rejected | Refilling at |
| requests | (empty) | R tokens/sec |
Capacity: 100 tokens (max burst size)
Refill Rate: 10 tokens/second
Steady-state throughput: 10 req/sec
Burst allowance: up to 100 requests instantly

The key implementation insight: tokens is a floating-point number, not an integer. Partial refills accumulate naturally from an elapsed-time calculation, with no background timer required. On each request, calculate seconds elapsed since the last refill, multiply by the refill rate, and add to the current tokens (capped at max). This lazy evaluation approach uses zero CPU between requests.
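To make the lazy-refill idea concrete, here is a minimal token-bucket sketch. It follows the mechanics described above but is illustrative code, not the package's actual source; the class and method names are my own.

```typescript
// Minimal token-bucket sketch: refill lazily on each request, no timers.
type BucketState = { tokens: number; lastRefill: number };

class TokenBucket {
  private buckets = new Map<string, BucketState>();
  constructor(private capacity: number, private refillPerSec: number) {}

  allow(clientId: string, now: number = Date.now()): boolean {
    const state = this.buckets.get(clientId) ?? { tokens: this.capacity, lastRefill: now };
    // Lazy refill: tokens accumulate as a float based on elapsed time.
    const elapsedSec = (now - state.lastRefill) / 1000;
    state.tokens = Math.min(this.capacity, state.tokens + elapsedSec * this.refillPerSec);
    state.lastRefill = now;
    const allowed = state.tokens >= 1;
    if (allowed) state.tokens -= 1;
    this.buckets.set(clientId, state);
    return allowed;
  }
}
```

Note that passing `now` explicitly makes the algorithm trivially testable without mocking the clock.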
Per-client state is just { tokens: number, lastRefill: timestamp }. That is 16 bytes per client. Even with 100,000 concurrent clients, only 1.6 MB of memory. Stored in a Map<clientId, BucketState> with O(1) lookup.
Sliding Window Log (Default for Pro Tier)
Sliding Window Log is the most accurate algorithm but the most memory-hungry. Instead of a counter, it stores the timestamp of every single request. On each new request, it discards timestamps older than the window, counts what remains, and compares against the limit.
Window: 60 seconds Max: 5 requests in window
Timeline (seconds):
0 10 20 30 40 50 60 70 80 90
| | | | | | | | | |
Request arrives at t=75
Stored timestamps: [12, 35, 48, 62, 71]
| | | |
v v v v
EXPIRED (>60s) VALID (within window)
After cleanup: [62, 71]
Count: 2
2 < 5 --> ALLOWED, add timestamp: [62, 71, 75]
---
Request arrives at t=76
Stored timestamps: [62, 71, 75]
After cleanup: [62, 71, 75] (all within 60s)
Count: 3
3 < 5 --> ALLOWED, add: [62, 71, 75, 76]

The accuracy advantage: there is no boundary problem. A client cannot exploit window edges to get double the allowed rate. The window literally slides with time. The tradeoff is memory: O(N) per client where N is the max request limit. For a pro tier with 1,000 requests/minute, that is 8 KB per client. With 10,000 pro clients, 80 MB. This is why I use Token Bucket for the high-volume free tier and reserve Sliding Window for the lower-volume pro tier.
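The prune-count-append loop above fits in a few lines. This is a sketch under the same assumptions as the walkthrough (per-client timestamp arrays, millisecond clocks), not the package's exact implementation:

```typescript
// Sliding-window-log sketch: store every request timestamp, prune, count.
class SlidingWindowLog {
  private logs = new Map<string, number[]>();
  constructor(private maxRequests: number, private windowMs: number) {}

  allow(clientId: string, now: number = Date.now()): boolean {
    const timestamps = this.logs.get(clientId) ?? [];
    // Discard entries older than the window; the window slides with time.
    const valid = timestamps.filter((t) => now - t < this.windowMs);
    if (valid.length >= this.maxRequests) {
      this.logs.set(clientId, valid);
      return false;
    }
    valid.push(now);
    this.logs.set(clientId, valid);
    return true;
  }
}
```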
Fixed Window Counter (Global Limit Only)
Fixed Window Counter divides time into discrete windows. Each window has a counter. When a request arrives, check if the window expired (reset if so), increment, compare against limit. Two numbers per client: count and windowStart.
Limit: 100 requests per 60-second window
Window 1 Window 2
|<--- 60 seconds --->|<--- 60 seconds --->|
Scenario: Attacker sends 100 requests at t=59,
then 100 requests at t=61
Window 1 Window 2
| ***|*** |
100 req 100 req
at t=59 at t=61
Result: 200 requests in 2 seconds, both ALLOWED
Intended limit: 100 per 60 seconds
---
Mitigation: Use as global-only limiter where
approximate enforcement is acceptable, or
combine with per-client Token Bucket.

Despite the boundary problem, Fixed Window is the right choice for the global limit. A 2x burst is acceptable when the limit is already generous (10,000 req/min). Minimal memory, zero computational overhead beyond a comparison and increment.
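The two-numbers-per-client description translates directly to code. Again a sketch, matching the reset-increment-compare steps above rather than the package's source:

```typescript
// Fixed-window-counter sketch: one counter plus a window start per client.
class FixedWindowCounter {
  private windows = new Map<string, { count: number; windowStart: number }>();
  constructor(private maxRequests: number, private windowMs: number) {}

  allow(clientId: string, now: number = Date.now()): boolean {
    let w = this.windows.get(clientId);
    // Reset the counter when the current window has expired.
    if (!w || now - w.windowStart >= this.windowMs) {
      w = { count: 0, windowStart: now };
    }
    const allowed = w.count < this.maxRequests;
    if (allowed) w.count += 1;
    this.windows.set(clientId, w);
    return allowed;
  }
}
```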
Algorithm Comparison
+------------------+---------------+---------------+---------------+
|                  | Token Bucket  | Sliding Window| Fixed Window  |
+------------------+---------------+---------------+---------------+
| Memory/client    | O(1) - 16B    | O(N) - 8KB*   | O(1) - 16B    |
| Accuracy         | Good          | Perfect       | Approximate   |
| Burst handling   | Controlled    | Strict        | Boundary issue|
| CPU/request      | O(1)          | O(N) cleanup  | O(1)          |
| Best for         | General use   | Billing/SLA   | Global limits |
+------------------+---------------+---------------+---------------+
* N = maxRequests per window (e.g., 1000 for pro tier)
API Key Authentication
The auth middleware extracts the API key from the X-API-Key header or the apiKey query parameter. If no key is present, the request goes through as a free tier client identified by IP address. The gateway never fully blocks unauthenticated traffic; it just rate-limits it more aggressively.
Inbound Request
|
v
Extract API key from:
1. X-API-Key header
2. ?apiKey= query param
|
v
Key present?
| |
NO YES
| |
v v
Assign: Lookup key in config
clientId = req.ip |
tier = "free" +----+----+
| | |
v FOUND NOT FOUND
next() active? |
| | v
YES NO 401 "Invalid or
| | revoked API key"
v v
Assign: 401
clientId = key.id
tier = key.tier
|
v
next()

Keys have fields: id (for rate limit tracking), key (the secret), name (human label), tier (rate limit tier), createdAt, and active (boolean for revocation). You can pre-configure keys in the config or create and revoke them at runtime through the management endpoints.
Device Authentication with TOTP
In v0.4.0, I added device-level authentication using Time-based One-Time Passwords. The problem it solves: browser-level access control without requiring users to manually enter API keys. A browser registers itself as a device and receives a rotating TOTP code that authenticates requests automatically.
The implementation uses HMAC-SHA256 with 1-hour time windows. When a browser registers via POST /auth/register with its browserId (a UUID generated client-side), the server stores the device and generates a shared secret. Subsequent requests include a TOTP key in the format totp_<browserId>_<code>, where the code is a 16-character hex string derived from HMAC-SHA256(secret, floor(timestamp / 3600000)).
Browser Server
| |
| POST /auth/register |
| { browserId: "uuid-1234" } |
|------------------------------------->|
| |
| 200 OK |
| { deviceId, secret, expiresAt } |
|<-------------------------------------|
| |
| GET /api/data |
| X-API-Key: totp_uuid-1234_<code> |
|------------------------------------->|
| |
| Server validates: |
| 1. Extract browserId from key |
| 2. Look up device in registry |
| 3. Compute expected TOTP code |
| 4. Timing-safe compare |
| 5. Check current + prev window |
| |
| 200 OK (authenticated) |
|<-------------------------------------|
  |                                      |

Validation checks both the current and the previous time window (one window back) to handle clock skew and requests near window boundaries. All comparisons use crypto.timingSafeEqual to prevent timing attacks. The device registry enforces rate limits: 10 registrations per minute per IP, max 30 devices per IP. Devices expire after 1 week.
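The code-derivation and two-window validation described above can be sketched as follows. The exact serialization of the window index fed into the HMAC is my assumption; the scheme (HMAC-SHA256, 1-hour windows, 16 hex chars, timing-safe compare against current and previous window) follows the description:

```typescript
// TOTP-code sketch: HMAC-SHA256 over the hour-window index, 16 hex chars.
import { createHmac, timingSafeEqual } from 'node:crypto';

const WINDOW_MS = 3_600_000; // 1-hour time windows

function totpCode(secret: string, timestamp: number): string {
  const windowIndex = Math.floor(timestamp / WINDOW_MS);
  // Assumption: the window index is serialized as a decimal string.
  return createHmac('sha256', secret).update(String(windowIndex)).digest('hex').slice(0, 16);
}

function validateTotp(secret: string, candidate: string, now: number): boolean {
  // Accept the current and the previous window to tolerate clock skew.
  for (const ts of [now, now - WINDOW_MS]) {
    const expected = totpCode(secret, ts);
    if (
      candidate.length === expected.length &&
      timingSafeEqual(Buffer.from(candidate), Buffer.from(expected))
    ) {
      return true;
    }
  }
  return false;
}
```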
The device registry uses file-based persistence with debounced writes. Instead of writing to disk on every registration, changes are batched and flushed every 2 seconds. This drops disk I/O from potentially hundreds of writes per second to one write every 2 seconds.
IP Filtering
The IP filter supports two mutually exclusive modes. In blocklist mode (default), all IPs are allowed except those explicitly blocked. In allowlist mode, only listed IPs are allowed; everything else gets a 403. Allowlist mode is useful for internal-only APIs. Blocklist mode is the default for public-facing endpoints where you want to block specific bad actors.
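The two modes reduce to one small decision function. A sketch; the rule shape mirrors the config example earlier in the post, but the function name is illustrative:

```typescript
// IP-filter sketch: two mutually exclusive modes over the same rule set.
type IpRules = {
  allowlist: string[];
  blocklist: string[];
  mode: 'allowlist' | 'blocklist';
};

function isIpAllowed(ip: string, rules: IpRules): boolean {
  if (rules.mode === 'allowlist') {
    // Allowlist mode: deny everything not explicitly listed.
    return rules.allowlist.includes(ip);
  }
  // Blocklist mode (default): allow everything except listed IPs.
  return !rules.blocklist.includes(ip);
}
```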
Analytics Engine: Circular Buffer
The analytics service stores request logs in a circular buffer, a fixed-size array where new entries overwrite the oldest when capacity is reached. This guarantees bounded memory regardless of traffic volume. The buffer holds 10,000 entries.
Capacity: 8 slots (simplified from 10,000)
After 5 inserts: After 10 inserts (wraps around):
Index: 0 1 2 3 4 Index: 0 1 2 3 4 5 6 7
[A][B][C][D][E] [I][J][C][D][E][F][G][H]
^ ^ ^
head head (oldest = C at index 2)
count = 5 count = 8 (capped)
head = 0 head = 2 (next overwrite position)
Reading in order: C, D, E, F, G, H, I, J
(start at head, wrap around)
Memory: ALWAYS 10,000 * sizeof(RequestLog)
Never grows, never shrinks

Each log entry contains: timestamp, HTTP method, path, status code, response time, clientId, IP, and API key. Analytics are computed on-the-fly by scanning the buffer. With only 10,000 entries, this completes in under 1ms. The computed metrics include requests per minute, error rate, average response time, active clients (unique IPs in last 5 min), top endpoints, and rate limit hits.
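The diagram's head-pointer mechanics, including the two-segment read described later in this post, look like this in a minimal generic sketch (illustrative, not the package's source):

```typescript
// Circular-buffer sketch: fixed-size array, head marks the next overwrite slot.
class CircularBuffer<T> {
  private buffer: T[] = [];
  private head = 0;
  constructor(private capacity: number) {}

  add(item: T): void {
    if (this.buffer.length < this.capacity) {
      this.buffer.push(item); // still filling: append normally
    } else {
      // Full: overwrite the oldest entry, then advance the head.
      this.buffer[this.head] = item;
      this.head = (this.head + 1) % this.capacity;
    }
  }

  // Oldest-to-newest order needs two slices once the buffer has wrapped:
  // head-to-end (oldest segment), then 0-to-head (newest segment).
  getOrderedLogs(): T[] {
    return [...this.buffer.slice(this.head), ...this.buffer.slice(0, this.head)];
  }
}
```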
Live Dashboard with Server-Sent Events
The dashboard receives live analytics via Server-Sent Events, not WebSockets. SSE is the right choice because the data flow is strictly one-way: server pushes analytics to the browser. SSE works over standard HTTP, requires no protocol upgrade, and reconnects automatically on connection drop.
Browser Server
| |
| GET /api/gateway/analytics/live |
|----------------------------------->|
| |
| 200 OK |
| Content-Type: text/event-stream |
| Connection: keep-alive |
|<-----------------------------------|
| |
| data: {"totalRequests":42,...} | <-- Every 5 seconds
|<-----------------------------------|
| |
| data: {"totalRequests":47,...} |
|<-----------------------------------|
| |
| [Connection drops] |
| |
| GET /api/gateway/analytics/live | <-- Auto-reconnect
|----------------------------------->| (browser-native)
  |                                    |

On the server, the SSE endpoint runs a 5-second interval that computes an analytics snapshot from the circular buffer and writes it as a JSON event. When the client disconnects, the close event clears the interval to prevent memory leaks. On the frontend, each message updates React state, which re-renders the stats cards and charts. A rolling 20-item history of RPM values powers the time-series line chart.
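The server side of that flow fits in one handler. A sketch using minimal structural types instead of Express's own (so it stands alone); the snapshot function and interval parameter are stand-ins:

```typescript
// Minimal structural request/response types; in the real app these are
// Express's Request and Response.
type SseReq = { on(event: 'close', cb: () => void): void };
type SseRes = {
  writeHead(status: number, headers: Record<string, string>): void;
  write(chunk: string): void;
};

// SSE endpoint sketch: push a JSON analytics snapshot on an interval and
// clear the timer on disconnect so nothing leaks.
function sseHandler(computeSnapshot: () => unknown, intervalMs = 5000) {
  return (req: SseReq, res: SseRes) => {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    });
    const timer = setInterval(() => {
      // SSE frame format: "data: <payload>" followed by a blank line.
      res.write(`data: ${JSON.stringify(computeSnapshot())}\n\n`);
    }, intervalMs);
    req.on('close', () => clearInterval(timer));
  };
}
```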
State Management: What Lives Where
IN-MEMORY (lost on restart) CONFIGURABLE (passed in)
+--------------------------+ +--------------------------+
| RateLimiterService | | Rate limit tiers |
| - Token Bucket states | | - Algorithm per tier |
| Map<clientId, { | | - Global limit config |
| tokens, lastRefill | +--------------------------+
| }> | | API keys |
| - Sliding Window states | | - Key definitions |
| Map<clientId, { | | - Tier assignments |
| timestamps[] | +--------------------------+
| }> | | IP rules |
| - Fixed Window states | | - Allowlist/blocklist |
| - rateLimitHits counter | | - Current mode |
+--------------------------+ +--------------------------+
| AnalyticsService |
| - Circular buffer | FILE-BASED (persistent)
| RequestLog[10000] | +--------------------------+
+--------------------------+ | devices.json |
| DeviceRegistryService | | - Debounced writes (2s) |
| - Active browser devices| +--------------------------+
+--------------------------+

The split is intentional. Rate limit state is ephemeral because a server restart gives every client a fresh allowance, which is acceptable. Configuration is passed in by the consumer. The services are structured behind simple interfaces (checkLimit, addLog, getAnalytics), making it straightforward to swap the backing store to Redis without changing middleware or route code.
Technical Challenges I Hit
Building the algorithms was the easy part. Packaging and deploying them is where the real problems showed up.
TypeScript Subpath Exports
The hardest part of packaging was not the code, it was TypeScript's module resolution. The package.json exports field maps ./backend to the correct .js, .mjs, and .d.ts files. But TypeScript only respects these mappings when the consumer's tsconfig uses module: "Node16" or "NodeNext". With the older "CommonJS" setting, TypeScript silently cannot find the types. This forced a choice: require modern module resolution, or hack around it with wildcard types. I went with requiring Node16 and documenting it.
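For reference, an exports map for this layout has roughly the following shape. This is an illustrative reconstruction, not the package's verbatim package.json; the dist paths are assumptions:

```json
{
  "exports": {
    "./backend": {
      "types": "./dist/backend/index.d.ts",
      "import": "./dist/backend/index.mjs",
      "require": "./dist/backend/index.js"
    },
    "./frontend": {
      "types": "./dist/frontend/index.d.ts",
      "import": "./dist/frontend/index.mjs",
      "require": "./dist/frontend/index.js"
    },
    "./types": {
      "types": "./dist/types/index.d.ts"
    }
  }
}
```

With moduleResolution set to node16, nodenext, or bundler, TypeScript reads the "types" condition from this map; with classic CommonJS resolution it ignores the map entirely, which is the silent failure described above.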
SSE Breaking Behind Nginx
Server-Sent Events worked perfectly in development but broke in production behind Nginx. The dashboard showed a blank analytics panel with no errors. After debugging Nginx access logs, I realized the SSE connection was established but events were accumulating in Nginx's response buffer instead of streaming to the client. The fix: X-Accel-Buffering: no, Cache-Control: no-cache, and proxy_buffering off in the Nginx location block.
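The Nginx side of the fix looks roughly like this. The location path and upstream port are assumptions matching the examples in this post; the essential line is proxy_buffering off:

```nginx
location /api/gateway/analytics/live {
    proxy_pass http://localhost:3001;
    proxy_buffering off;        # stream events instead of buffering the response
    proxy_http_version 1.1;
    proxy_set_header Connection '';
}
```

Setting the X-Accel-Buffering: no response header from the app achieves the same effect per-response, which is useful when you do not control the Nginx config.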
EventSource Cannot Send Headers
The browser's native EventSource API does not support custom headers. When I added API key auth to the dashboard, the SSE stream broke because EventSource sent no X-API-Key header. The fix was replacing EventSource with a fetch-based SSE reader using ReadableStream and TextDecoder to manually parse the text/event-stream format. More complex, but it enables full header control.
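The parsing half of that reader can be sketched as a small chunk-fed function; the fetch/ReadableStream wiring appears as comments since it needs a live server. Names are illustrative:

```typescript
// SSE-parser sketch: feed decoded text chunks in, get "data:" payloads out.
function createSseParser(onData: (payload: string) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    // Complete events are separated by a blank line.
    const events = buffer.split('\n\n');
    buffer = events.pop() ?? ''; // keep the trailing partial event
    for (const event of events) {
      for (const line of event.split('\n')) {
        if (line.startsWith('data: ')) onData(line.slice(6));
      }
    }
  };
}

// Wiring sketch (the auth header is what EventSource cannot send):
// const res = await fetch(url, { headers: { 'X-API-Key': key } });
// const reader = res.body!.getReader();
// const decoder = new TextDecoder();
// const parse = createSseParser(handleSnapshot);
// while (true) {
//   const { done, value } = await reader.read();
//   if (done) break;
//   parse(decoder.decode(value, { stream: true }));
// }
```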
Circular Buffer Ordering
Getting read order correct in a circular buffer is deceptively tricky. When the buffer wraps around, the oldest entry is at the head pointer, not index 0. Reading in chronological order requires slicing head-to-end, then 0-to-head, and concatenating. Off-by-one errors caused the recent requests table to show duplicates and miss entries. The fix was a dedicated getOrderedLogs method that handles the two-segment slice correctly.
Timing-Safe TOTP Comparison
Standard string comparison (===) leaks timing information. If the first character matches, the comparison takes slightly longer. An attacker can exploit this to brute-force TOTP codes one character at a time. Node's crypto.timingSafeEqual always compares all bytes regardless of mismatch position. But it requires both inputs as same-length Buffers. The TOTP validation normalizes both codes to fixed-length hex strings before comparison.
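A safe-compare wrapper along these lines handles the length requirement (a sketch; the real validation normalizes TOTP codes to fixed-length hex before this step):

```typescript
// timingSafeEqual requires equal-length Buffers; it throws on a length
// mismatch, so check length first.
import { timingSafeEqual } from 'node:crypto';

function safeCompare(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  if (bufA.length !== bufB.length) return false;
  // Compares every byte regardless of where the first mismatch occurs.
  return timingSafeEqual(bufA, bufB);
}
```

Note the early length return does leak the length, which is fine here because TOTP codes are fixed-length anyway.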
Peer Dependency Version Ranges
Express, React, and Recharts are peer dependencies to prevent version conflicts. But React 18 and 19 have different hook internals. Recharts 2 and 3 changed chart component props. The package supports both major versions of each (react ^18.0.0 || ^19.0.0, recharts ^2.0.0 || ^3.0.0) by only using APIs that exist in both versions. Getting these ranges right was trial and error.
Production Gaps
This package is built for single-process deployments. For multi-instance production, here is what you would change.
Distributed State with Redis
Replace in-memory Maps with Redis. For Token Bucket, use a Lua script to atomically refill and decrement. For Sliding Window, use a Redis sorted set with ZREMRANGEBYSCORE and ZCARD. For Fixed Window, use INCR with TTL. The service interfaces are designed so this swap does not require changing any middleware code.
Graceful Degradation
If the rate limiter throws an error, the middleware should fail open: allow the request through, log the error, and alert. A broken rate limiter should not become a denial-of-service against your own users.
Key Design Decisions
- Middleware chain over monolithic handler: Each stage is independently testable and replaceable. Single responsibility per middleware.
- Circular buffer over append-only log: Guarantees bounded memory (10,000 * ~200 bytes = 2 MB max). Tradeoff: you lose old data. For compliance, add a separate audit log.
- SSE over WebSockets: Simpler protocol, browser-native reconnection, works through HTTP proxies. Only limitation: unidirectional.
- Three algorithms instead of one: Each tier gets the right algorithm for its use case. Token Bucket for general use, Sliding Window for accuracy-sensitive tiers, Fixed Window for global limits.
- Factory function returning service instances: Consumers get both the middleware chain and raw services, so they can wire up custom management routes.
Version History
- v0.1.0: Initial release. Three rate limiting algorithms, IP filtering, API key auth, analytics circular buffer, SSE dashboard, full TypeScript support.
- v0.3.1: Replaced EventSource with fetch-based SSE reader, enabling API key auth on the live analytics stream. Added authenticated field to request logs.
- v0.4.0: Device authentication with TOTP. Browser-level device registration, 1-hour TOTP windows with HMAC-SHA256, file-based device registry with debounced writes, registration rate limiting, automatic device expiration.
What I Learned
The algorithms are surprisingly simple. Token Bucket is two numbers and a time delta. Sliding Window is an array filter. Fixed Window is a counter with a timestamp. The real complexity is in the surrounding system: middleware ordering, state isolation per client, header conventions, and making all the pieces compose cleanly.
Packaging is its own skill. Writing code that works in your own project is different from writing code that works in someone else's project. Subpath exports, peer dependencies, TypeScript module resolution, CSS bundling, dual CJS/ESM output, and version compatibility all require careful configuration that has nothing to do with the actual business logic.
Understanding these pieces at the implementation level makes you a better consumer of production tools like Redis rate limiting libraries, because you know exactly what they are doing underneath. If you want to try the package or dig into the source, everything is linked below.
Try It
npm install @gagandeep023/api-gateway

- GitHub: github.com/Gagandeep023/api-gateway
- npm: npmjs.com/package/@gagandeep023/api-gateway
- Live demo: gagandeep023.com (the gateway protects this site)
Get more posts like this
I write about system design, backend engineering, and building npm packages from scratch. Follow along on Substack for new posts.
Subscribe on Substack →