
3/20/2026

Building a Rate Limiter from Scratch in Node.js (2026)

[Image: Node.js rate limiter concept with token bucket visualization]

Every API you ship without rate limiting is a ticking time bomb. One aggressive client, one bot loop, one bad actor — and your server is toast. I've seen production APIs go down because someone left a retry loop running overnight.

The good news? You don't need Redis or a third-party service to get started. You can build a solid rate limiter in pure Node.js that handles most use cases. Let's build two different algorithms and plug them into Express.

Why Rate Limiting Matters

Rate limiting isn't just about stopping attacks. It's about protecting your server from accidental overload, controlling costs on metered APIs, ensuring fair usage across clients, and giving yourself breathing room when things go sideways. If you're running any public-facing API, you need this.

Algorithm 1: Token Bucket

The token bucket is the classic approach. You have a bucket that holds tokens. Each request costs one token. Tokens refill at a fixed rate. If the bucket is empty, the request gets rejected.

Here's the implementation. The key insight is that tokens refill lazily — we calculate how many tokens should have been added since the last check, rather than running a timer:

class TokenBucket {
  constructor(maxTokens, refillRate) {
    this.maxTokens = maxTokens;
    this.tokens = maxTokens;
    this.refillRate = refillRate; // tokens per second
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.maxTokens,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }

  consume() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // allowed
    }
    return false; // rate limited
  }
}
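A quick sanity check of the bucket's behavior (the class is re-declared here so the snippet runs on its own). Backdating `lastRefill` stands in for actually waiting, which is exactly the lazy-refill trick at work:

```javascript
// Same TokenBucket as above, repeated so this snippet is standalone
class TokenBucket {
  constructor(maxTokens, refillRate) {
    this.maxTokens = maxTokens;
    this.tokens = maxTokens;
    this.refillRate = refillRate; // tokens per second
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.maxTokens,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }

  consume() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 3 tokens, refilling 1 token/sec: allows a burst of 3, then rejects
const bucket = new TokenBucket(3, 1);
console.log(bucket.consume(), bucket.consume(), bucket.consume()); // true true true
console.log(bucket.consume()); // false, bucket drained

// Pretend 2 seconds passed by backdating the last refill timestamp
bucket.lastRefill = Date.now() - 2000;
console.log(bucket.consume()); // true, ~2 tokens were refilled lazily
```

Note that the burst of 3 goes through immediately even though the refill rate is only 1/sec. That burst tolerance is the token bucket's signature behavior.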

Algorithm 2: Sliding Window

The sliding window tracks actual timestamps of requests. Instead of tokens, you count: how many requests has this client made in the last N milliseconds?

class SlidingWindow {
  constructor(windowMs, maxRequests) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
    this.requests = [];
  }

  isAllowed() {
    const now = Date.now();
    this.requests = this.requests.filter(
      ts => now - ts < this.windowMs
    );

    if (this.requests.length < this.maxRequests) {
      this.requests.push(now);
      return true;
    }
    return false;
  }
}
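And the same kind of sanity check for the sliding window (class re-declared so it runs standalone; backdating the stored timestamps simulates the window sliding past them):

```javascript
// Same SlidingWindow as above, repeated so this snippet is standalone
class SlidingWindow {
  constructor(windowMs, maxRequests) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
    this.requests = [];
  }

  isAllowed() {
    const now = Date.now();
    this.requests = this.requests.filter(
      ts => now - ts < this.windowMs
    );

    if (this.requests.length < this.maxRequests) {
      this.requests.push(now);
      return true;
    }
    return false;
  }
}

// 2 requests per rolling second
const limiter = new SlidingWindow(1000, 2);
console.log(limiter.isAllowed(), limiter.isAllowed()); // true true
console.log(limiter.isAllowed()); // false, window is full

// Backdate the stored timestamps to simulate time passing
limiter.requests = limiter.requests.map(ts => ts - 2000);
console.log(limiter.isAllowed()); // true, stale entries were filtered out
```

Unlike the token bucket, there is no burst allowance beyond the window limit: the third request is rejected no matter how fast the first two arrived.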

One thing to watch: this stores a timestamp per request, so memory grows with traffic. For high-volume APIs, switch to a sliding window counter (approximate but fixed memory) or use the token bucket instead.
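For reference, here's one common way the sliding window counter is sketched: keep counts for the current and previous fixed windows, and weight the previous count by how much of it still overlaps the rolling window. The class name and shape here are my own illustration, not a standard API, but the technique itself is the usual fixed-memory approximation:

```javascript
// Sliding window counter: O(1) memory per client, approximate counting
class SlidingWindowCounter {
  constructor(windowMs, maxRequests) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
    this.currentWindow = Math.floor(Date.now() / windowMs);
    this.currentCount = 0;
    this.previousCount = 0;
  }

  isAllowed() {
    const now = Date.now();
    const window = Math.floor(now / this.windowMs);

    if (window !== this.currentWindow) {
      // Rolled into a new window: the current count becomes the previous
      // count (or zero if more than one full window has passed)
      this.previousCount =
        window === this.currentWindow + 1 ? this.currentCount : 0;
      this.currentCount = 0;
      this.currentWindow = window;
    }

    // Fraction of the current fixed window that has elapsed
    const elapsed = (now % this.windowMs) / this.windowMs;

    // Weighted estimate of requests in the last windowMs milliseconds
    const estimated = this.previousCount * (1 - elapsed) + this.currentCount;

    if (estimated < this.maxRequests) {
      this.currentCount += 1;
      return true;
    }
    return false;
  }
}

// 3 requests per rolling minute, using two integers instead of N timestamps
const counter = new SlidingWindowCounter(60_000, 3);
console.log(counter.isAllowed(), counter.isAllowed(), counter.isAllowed()); // true true true
console.log(counter.isAllowed()); // false
```

The trade-off: it assumes requests in the previous window were evenly distributed, so the count is approximate, but memory stays at two integers per client no matter the traffic.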

Plugging It into Express

Here's an Express middleware using the token bucket. Each IP gets its own bucket, stored in a Map:

const express = require('express');
const app = express();

const buckets = new Map();

function rateLimiter(maxTokens, refillRate) {
  return (req, res, next) => {
    const key = req.ip;

    if (!buckets.has(key)) {
      buckets.set(key, new TokenBucket(maxTokens, refillRate));
    }

    const bucket = buckets.get(key);

    if (bucket.consume()) {
      next();
    } else {
      res.status(429).json({
        error: 'Too many requests',
        retryAfter: Math.ceil(1 / refillRate) // seconds until the next token
      });
    }
  };
}

// 10 requests max, refills 2 tokens/sec
app.use('/api', rateLimiter(10, 2));

A few things to add in production: clean up stale buckets periodically, and make sure your key is meaningful behind a proxy. Out of the box, req.ip will be the proxy's address; set Express's trust proxy option so it reflects X-Forwarded-For, or key on an API key instead of the IP entirely. Finally, add rate limit headers so clients know where they stand before they hit a 429.
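Two of those production extras can be sketched as small helpers. The names here (evictStaleBuckets, rateLimitHeaders) are my own, and the headers follow the common X-RateLimit-* convention rather than any formal standard; buckets and TokenBucket refer to the earlier snippets:

```javascript
// Evict buckets whose owners have gone quiet. A bucket idle longer than
// maxIdleMs has refilled (or nearly refilled) anyway, so it carries no
// state worth keeping and can be lazily recreated on the next request.
function evictStaleBuckets(buckets, maxIdleMs) {
  const now = Date.now();
  for (const [key, bucket] of buckets) {
    if (now - bucket.lastRefill > maxIdleMs) {
      buckets.delete(key);
    }
  }
}

// Build conventional rate limit headers from a bucket's current state
function rateLimitHeaders(bucket) {
  return {
    'X-RateLimit-Limit': bucket.maxTokens,
    'X-RateLimit-Remaining': Math.floor(bucket.tokens),
    // Seconds until the bucket is completely full again
    'X-RateLimit-Reset': Math.ceil(
      (bucket.maxTokens - bucket.tokens) / bucket.refillRate
    )
  };
}
```

Inside the middleware you'd call res.set(rateLimitHeaders(bucket)) before responding, and run evictStaleBuckets from a setInterval so the Map doesn't grow forever.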

Token Bucket vs Sliding Window

Use the token bucket when you want burst tolerance, care more about the average rate, or need memory efficiency. Use the sliding window when you need strict per-window limits, exact request counting, or a simpler mental model. Both work; pick whichever matches your API's traffic patterns.

When to Graduate to Redis

This in-memory approach has limits: single process only (multiple Node.js instances each maintain separate bucket maps), no persistence across restarts, and memory grows with unique clients. When you need distributed rate limiting across multiple servers, persistent state, or millions of unique keys — that's when you bring in Redis with something like ioredis and a Lua script for atomic operations.

Wrapping Up

Rate limiting feels complex until you build it. Two classes, one middleware, and your API actually enforces boundaries. Start with the in-memory version, and graduate to Redis when your scale demands it.