Rate Limiting and Throttling for MCP Servers

Complete guide to implementing rate limiting and throttling in MCP servers, covering token bucket algorithms, sliding window counters, per-user limits, per-tool limits, connection throttling, backpressure handling, and building production rate-limited MCP server infrastructure.

Overview

An MCP server without rate limiting is one aggressive LLM session away from being unusable. When an AI agent decides to call your database query tool 500 times in rapid succession, every other connected client starves. Rate limiting protects your server from overload, prevents individual clients from monopolizing resources, and keeps downstream services like databases and APIs within their own capacity. I have seen a single rogue client bring down an entire MCP deployment because nobody thought to set limits on tool invocations.

Prerequisites

  • Node.js 16 or later
  • @modelcontextprotocol/sdk package installed
  • Basic understanding of MCP server architecture
  • Familiarity with Express.js middleware patterns
  • Understanding of HTTP status codes (429 Too Many Requests)
  • Familiarity with concurrency and resource management concepts

Rate Limiting Algorithms

Token Bucket

The token bucket algorithm is the workhorse of rate limiting. A bucket starts full of tokens. Each request consumes one token. Tokens refill at a steady rate. When the bucket is empty, requests are rejected. This allows short bursts while enforcing an average rate.

// rate-limiters/token-bucket.js
function TokenBucket(options) {
  this.capacity = options.capacity || 100;
  this.refillRate = options.refillRate || 10;  // tokens per second
  this.tokens = this.capacity;
  this.lastRefill = Date.now();
}

TokenBucket.prototype.refill = function() {
  var now = Date.now();
  var elapsed = (now - this.lastRefill) / 1000;
  var newTokens = elapsed * this.refillRate;

  this.tokens = Math.min(this.capacity, this.tokens + newTokens);
  this.lastRefill = now;
};

TokenBucket.prototype.consume = function(count) {
  this.refill();
  count = count || 1;

  if (this.tokens >= count) {
    this.tokens -= count;
    return {
      allowed: true,
      remaining: Math.floor(this.tokens),
      retryAfter: 0
    };
  }

  var deficit = count - this.tokens;
  var retryAfter = Math.ceil(deficit / this.refillRate);

  return {
    allowed: false,
    remaining: 0,
    retryAfter: retryAfter
  };
};

TokenBucket.prototype.getStatus = function() {
  this.refill();
  return {
    tokens: Math.floor(this.tokens),
    capacity: this.capacity,
    refillRate: this.refillRate,
    percentFull: Math.round((this.tokens / this.capacity) * 100)
  };
};

module.exports = TokenBucket;
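A quick way to see the burst-then-sustain behavior is to drive the same math with an explicit clock. This inline sketch duplicates the refill logic above but takes a `now` parameter instead of calling Date.now(), so the run is deterministic; the module behaves the same way in wall-clock time.

```javascript
// Minimal inline bucket: same refill/consume math as the module above,
// but with an injected clock for a deterministic demo.
function consume(bucket, now) {
  var elapsed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(bucket.capacity, bucket.tokens + elapsed * bucket.refillRate);
  bucket.lastRefill = now;
  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return true;
  }
  return false;
}

var bucket = { capacity: 5, refillRate: 1, tokens: 5, lastRefill: 0 };

// A burst of 6 requests at t=0: the full bucket absorbs the first 5.
var burst = [];
for (var i = 0; i < 6; i++) burst.push(consume(bucket, 0));
console.log(burst.join(","));  // true,true,true,true,true,false

// Two seconds later, 2 tokens have refilled: two more succeed, the third fails.
console.log(consume(bucket, 2000), consume(bucket, 2000), consume(bucket, 2000));
// true true false
```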

Sliding Window Counter

The sliding window avoids the boundary problem of fixed windows. Instead of resetting the counter at the top of each minute, it considers requests from a rolling time window.

// rate-limiters/sliding-window.js
function SlidingWindow(options) {
  this.windowMs = options.windowMs || 60000;
  this.maxRequests = options.maxRequests || 100;
  this.requests = [];
}

SlidingWindow.prototype.cleanup = function() {
  var cutoff = Date.now() - this.windowMs;
  while (this.requests.length > 0 && this.requests[0] < cutoff) {
    this.requests.shift();
  }
};

SlidingWindow.prototype.tryRequest = function() {
  this.cleanup();

  if (this.requests.length >= this.maxRequests) {
    var oldestInWindow = this.requests[0];
    var retryAfter = Math.ceil((oldestInWindow + this.windowMs - Date.now()) / 1000);

    return {
      allowed: false,
      remaining: 0,
      retryAfter: Math.max(1, retryAfter),
      resetAt: new Date(oldestInWindow + this.windowMs).toISOString()
    };
  }

  this.requests.push(Date.now());

  return {
    allowed: true,
    remaining: this.maxRequests - this.requests.length,
    retryAfter: 0
  };
};

SlidingWindow.prototype.getStatus = function() {
  this.cleanup();
  return {
    currentRequests: this.requests.length,
    maxRequests: this.maxRequests,
    windowMs: this.windowMs,
    remaining: this.maxRequests - this.requests.length
  };
};

module.exports = SlidingWindow;
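To see why the rolling window matters, replay a boundary-straddling burst against a minimal inline version with an explicit clock (the module above uses Date.now()): the second burst still sees the first inside the window and gets rejected.

```javascript
// Minimal inline sliding window with an injected clock, mirroring the
// cleanup + count check in the module above.
function tryRequest(w, now) {
  var cutoff = now - w.windowMs;
  while (w.requests.length > 0 && w.requests[0] < cutoff) w.requests.shift();
  if (w.requests.length >= w.maxRequests) return false;
  w.requests.push(now);
  return true;
}

var w = { windowMs: 60000, maxRequests: 100, requests: [] };
var allowed = 0;

// 100 requests just before the minute mark, 100 just after.
for (var i = 0; i < 100; i++) if (tryRequest(w, 59000)) allowed++;
for (var j = 0; j < 100; j++) if (tryRequest(w, 61000)) allowed++;

// The first burst is still inside the rolling window at t=61s,
// so the second burst is rejected entirely.
console.log(allowed);  // 100
```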

Fixed Window with Atomic Counter

For simplicity and low memory usage, fixed windows work well when you do not need the precision of sliding windows. The trade-off is the boundary problem: because the counter resets at each window boundary, a client can send up to twice the limit in a short span that straddles the boundary.

// rate-limiters/fixed-window.js
function FixedWindow(options) {
  this.windowMs = options.windowMs || 60000;
  this.maxRequests = options.maxRequests || 100;
  this.count = 0;
  this.windowStart = Date.now();
}

FixedWindow.prototype.tryRequest = function() {
  var now = Date.now();

  // Reset window if expired
  if (now - this.windowStart >= this.windowMs) {
    this.count = 0;
    this.windowStart = now;
  }

  if (this.count >= this.maxRequests) {
    var resetAt = this.windowStart + this.windowMs;
    return {
      allowed: false,
      remaining: 0,
      retryAfter: Math.ceil((resetAt - now) / 1000),
      resetAt: new Date(resetAt).toISOString()
    };
  }

  this.count++;
  return {
    allowed: true,
    remaining: this.maxRequests - this.count,
    retryAfter: 0
  };
};

module.exports = FixedWindow;
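The boundary problem in action: with the counter resetting at each window edge, a client that bursts just before and just after a boundary lands double the limit inside a single rolling minute. A minimal inline version with an explicit clock makes this reproducible.

```javascript
// Minimal inline fixed window with an injected clock, mirroring the
// reset + count check in the module above.
function tryRequest(w, now) {
  if (now - w.windowStart >= w.windowMs) {
    w.count = 0;
    w.windowStart = now;
  }
  if (w.count >= w.maxRequests) return false;
  w.count++;
  return true;
}

var w = { windowMs: 60000, maxRequests: 100, count: 0, windowStart: 0 };
var allowed = 0;

// 100 requests just before the boundary, 100 just after the reset.
for (var i = 0; i < 100; i++) if (tryRequest(w, 59000)) allowed++;
for (var j = 0; j < 100; j++) if (tryRequest(w, 61000)) allowed++;

// 200 requests accepted within a 2-second span -- double the nominal limit.
console.log(allowed);  // 200
```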

Per-Client Rate Limiting

Client-Keyed Rate Limiter

Each connected client gets their own rate limit bucket. One noisy client cannot consume another client's quota.

// rate-limiters/per-client.js
var TokenBucket = require("./token-bucket");

function PerClientRateLimiter(options) {
  this.defaultCapacity = options.capacity || 60;
  this.defaultRefillRate = options.refillRate || 10;
  this.clients = new Map();
  this.tierLimits = options.tiers || {};

  // Cleanup stale entries every 5 minutes
  var self = this;
  this.cleanupInterval = setInterval(function() {
    self.cleanup();
  }, 300000);
}

PerClientRateLimiter.prototype.getOrCreateBucket = function(clientId, tier) {
  if (!this.clients.has(clientId)) {
    var limits = this.tierLimits[tier] || {
      capacity: this.defaultCapacity,
      refillRate: this.defaultRefillRate
    };

    this.clients.set(clientId, {
      bucket: new TokenBucket({
        capacity: limits.capacity,
        refillRate: limits.refillRate
      }),
      tier: tier || "default",
      createdAt: Date.now(),
      lastAccess: Date.now(),
      totalRequests: 0,
      rejectedRequests: 0
    });
  }

  var entry = this.clients.get(clientId);
  entry.lastAccess = Date.now();
  return entry;
};

PerClientRateLimiter.prototype.tryRequest = function(clientId, tier) {
  var entry = this.getOrCreateBucket(clientId, tier);
  var result = entry.bucket.consume(1);

  entry.totalRequests++;
  if (!result.allowed) {
    entry.rejectedRequests++;
  }

  return Object.assign({}, result, {
    clientId: clientId,
    tier: entry.tier,
    totalRequests: entry.totalRequests,
    rejectedRequests: entry.rejectedRequests
  });
};

PerClientRateLimiter.prototype.getClientStats = function(clientId) {
  var entry = this.clients.get(clientId);
  if (!entry) return null;

  return {
    clientId: clientId,
    tier: entry.tier,
    bucket: entry.bucket.getStatus(),
    totalRequests: entry.totalRequests,
    rejectedRequests: entry.rejectedRequests,
    rejectionRate: entry.totalRequests > 0
      ? ((entry.rejectedRequests / entry.totalRequests) * 100).toFixed(1) + "%"
      : "0%",
    connectedSince: new Date(entry.createdAt).toISOString()
  };
};

PerClientRateLimiter.prototype.getAllStats = function() {
  var stats = [];
  var self = this;
  this.clients.forEach(function(entry, clientId) {
    stats.push(self.getClientStats(clientId));
  });
  return stats;
};

PerClientRateLimiter.prototype.cleanup = function() {
  var staleThreshold = Date.now() - 3600000;  // 1 hour
  var removed = 0;

  this.clients.forEach(function(entry, clientId) {
    if (entry.lastAccess < staleThreshold) {
      this.clients.delete(clientId);
      removed++;
    }
  }.bind(this));

  if (removed > 0) {
    console.log("Rate limiter cleanup: removed " + removed + " stale entries");
  }
};

PerClientRateLimiter.prototype.destroy = function() {
  clearInterval(this.cleanupInterval);
  this.clients.clear();
};

module.exports = PerClientRateLimiter;

Tier Configuration

// config/rate-limit-tiers.js
var tiers = {
  free: {
    capacity: 30,       // 30 requests burst
    refillRate: 5,       // 5 requests/second sustained
    maxConcurrent: 2,    // 2 simultaneous tool calls
    description: "Free tier - basic access"
  },
  standard: {
    capacity: 100,
    refillRate: 20,
    maxConcurrent: 5,
    description: "Standard tier - production use"
  },
  premium: {
    capacity: 500,
    refillRate: 50,
    maxConcurrent: 20,
    description: "Premium tier - high volume"
  },
  internal: {
    capacity: 1000,
    refillRate: 200,
    maxConcurrent: 50,
    description: "Internal services - minimal limits"
  }
};

module.exports = tiers;

Per-Tool Rate Limiting

Tool-Specific Rate Controls

Different tools have different cost profiles. A read_file tool is cheap, a query tool hits your database, and an http_get tool calls external APIs that enforce rate limits of their own. Each tool needs its own limit.

// rate-limiters/per-tool.js
var SlidingWindow = require("./sliding-window");

function PerToolRateLimiter(toolLimits) {
  this.toolLimits = toolLimits || {};
  this.globalLimiters = {};
  this.clientToolLimiters = {};
}

PerToolRateLimiter.prototype.getToolLimit = function(toolName) {
  return this.toolLimits[toolName] || {
    globalPerMinute: 1000,
    perClientPerMinute: 60,
    costWeight: 1
  };
};

PerToolRateLimiter.prototype.tryToolRequest = function(toolName, clientId) {
  var limits = this.getToolLimit(toolName);
  var results = { tool: toolName, clientId: clientId };

  // Check global tool limit
  if (!this.globalLimiters[toolName]) {
    this.globalLimiters[toolName] = new SlidingWindow({
      windowMs: 60000,
      maxRequests: limits.globalPerMinute
    });
  }

  var globalResult = this.globalLimiters[toolName].tryRequest();
  if (!globalResult.allowed) {
    results.allowed = false;
    results.reason = "global_limit";
    results.retryAfter = globalResult.retryAfter;
    results.message = "Tool '" + toolName + "' is at capacity. Try again in " + globalResult.retryAfter + "s.";
    return results;
  }

  // Check per-client tool limit
  var clientKey = clientId + ":" + toolName;
  if (!this.clientToolLimiters[clientKey]) {
    this.clientToolLimiters[clientKey] = new SlidingWindow({
      windowMs: 60000,
      maxRequests: limits.perClientPerMinute
    });
  }

  var clientResult = this.clientToolLimiters[clientKey].tryRequest();
  if (!clientResult.allowed) {
    results.allowed = false;
    results.reason = "client_limit";
    results.retryAfter = clientResult.retryAfter;
    results.message = "You have exceeded the rate limit for '" + toolName + "'. " +
                      "Limit: " + limits.perClientPerMinute + "/min. Try again in " + clientResult.retryAfter + "s.";
    return results;
  }

  results.allowed = true;
  results.remaining = clientResult.remaining;
  results.globalRemaining = globalResult.remaining;
  return results;
};

PerToolRateLimiter.prototype.getToolStats = function() {
  var stats = {};
  var self = this;

  Object.keys(this.globalLimiters).forEach(function(tool) {
    stats[tool] = {
      global: self.globalLimiters[tool].getStatus(),
      limits: self.getToolLimit(tool)
    };
  });

  return stats;
};

module.exports = PerToolRateLimiter;
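The composite `clientId + ":" + toolName` key above gives every (client, tool) pair its own window. A stripped-down sketch of the keying shows that exhausting one tool's budget leaves the client's other tools, and other clients, untouched:

```javascript
// Composite-key lookup: one counter per (client, tool) pair.
var windows = {};

function windowFor(clientId, toolName) {
  var key = clientId + ":" + toolName;
  if (!windows[key]) windows[key] = { count: 0 };
  return windows[key];
}

windowFor("alice", "query").count = 30;  // alice exhausts her query budget

console.log(windowFor("alice", "query").count);      // 30
console.log(windowFor("alice", "read_file").count);  // 0 -- separate per-tool budget
console.log(windowFor("bob", "query").count);        // 0 -- separate per-client budget
```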

Tool Limit Configuration

// config/tool-limits.js
var toolLimits = {
  // Cheap tools - high limits
  echo: {
    globalPerMinute: 5000,
    perClientPerMinute: 200,
    costWeight: 1
  },
  list_files: {
    globalPerMinute: 2000,
    perClientPerMinute: 100,
    costWeight: 1
  },

  // Medium-cost tools
  read_file: {
    globalPerMinute: 1000,
    perClientPerMinute: 60,
    costWeight: 2
  },
  query: {
    globalPerMinute: 500,
    perClientPerMinute: 30,
    costWeight: 5
  },

  // Expensive tools - strict limits
  http_get: {
    globalPerMinute: 100,
    perClientPerMinute: 10,
    costWeight: 10
  },
  ai_analyze: {
    globalPerMinute: 50,
    perClientPerMinute: 5,
    costWeight: 20
  },
  shell_exec: {
    globalPerMinute: 30,
    perClientPerMinute: 5,
    costWeight: 25
  }
};

module.exports = toolLimits;

Concurrency Limiting

Concurrent Request Throttle

Rate limiting controls requests over time. Concurrency limiting controls requests at a single point in time. Both are needed. A client sending 10 simultaneous database queries is different from sending 10 sequential ones.

// rate-limiters/concurrency.js
function ConcurrencyLimiter(options) {
  this.maxConcurrent = options.maxConcurrent || 10;
  this.maxQueue = options.maxQueue || 50;
  this.timeout = options.timeout || 30000;
  this.active = 0;
  this.queue = [];
  this.stats = {
    totalProcessed: 0,
    totalQueued: 0,
    totalRejected: 0,
    totalTimedOut: 0,
    maxConcurrentSeen: 0
  };
}

ConcurrencyLimiter.prototype.acquire = function() {
  var self = this;

  if (this.active < this.maxConcurrent) {
    this.active++;
    if (this.active > this.stats.maxConcurrentSeen) {
      this.stats.maxConcurrentSeen = this.active;
    }
    return Promise.resolve(createRelease(this));
  }

  if (this.queue.length >= this.maxQueue) {
    this.stats.totalRejected++;
    return Promise.reject(new Error(
      "Concurrency queue full. Active: " + this.active +
      ", Queued: " + this.queue.length +
      ", Max concurrent: " + this.maxConcurrent
    ));
  }

  this.stats.totalQueued++;

  return new Promise(function(resolve, reject) {
    var timer = setTimeout(function() {
      var idx = self.queue.findIndex(function(item) { return item.timer === timer; });
      if (idx > -1) {
        self.queue.splice(idx, 1);
        self.stats.totalTimedOut++;
        reject(new Error("Concurrency queue timeout after " + self.timeout + "ms"));
      }
    }, self.timeout);

    self.queue.push({
      resolve: resolve,
      timer: timer
    });
  });
};

ConcurrencyLimiter.prototype.release = function() {
  this.active--;
  this.stats.totalProcessed++;

  if (this.queue.length > 0) {
    var next = this.queue.shift();
    clearTimeout(next.timer);
    this.active++;
    next.resolve(createRelease(this));
  }
};

ConcurrencyLimiter.prototype.getStatus = function() {
  return {
    active: this.active,
    maxConcurrent: this.maxConcurrent,
    queued: this.queue.length,
    maxQueue: this.maxQueue,
    stats: this.stats
  };
};

function createRelease(limiter) {
  var released = false;
  return function() {
    if (!released) {
      released = true;
      limiter.release();
    }
  };
}

module.exports = ConcurrencyLimiter;
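A usage sketch of the acquire()/release contract: this stripped-down semaphore drops the queue cap and timeout from the limiter above but keeps the same slot accounting, so a third caller waits until one of the first two releases.

```javascript
// Minimal semaphore with the same acquire()/release shape as the
// ConcurrencyLimiter above (no queue cap, no timeout).
function Semaphore(max) {
  this.max = max;
  this.active = 0;
  this.queue = [];
}

Semaphore.prototype.acquire = function() {
  var self = this;
  function release() {
    self.active--;
    var next = self.queue.shift();
    if (next) {
      self.active++;
      next(release);  // hand the freed slot to the next waiter
    }
  }
  if (this.active < this.max) {
    this.active++;
    return Promise.resolve(release);
  }
  return new Promise(function(resolve) {
    self.queue.push(resolve);
  });
};

var sem = new Semaphore(2);

sem.acquire().then(function(release) {
  setTimeout(release, 10);  // simulate work, then free the slot
});
sem.acquire().then(function(release) {
  setTimeout(release, 10);
});
sem.acquire().then(function(release) {
  // Only runs after one of the first two callers releases.
  console.log("third caller finally ran");
  release();
});

// Slot accounting is synchronous: two active, one queued.
console.log("active:", sem.active, "queued:", sem.queue.length);  // active: 2 queued: 1
```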

Per-Client Concurrency

// rate-limiters/per-client-concurrency.js
var ConcurrencyLimiter = require("./concurrency");

function PerClientConcurrency(options) {
  this.defaultMax = options.maxConcurrent || 5;
  this.tierLimits = options.tiers || {};
  this.clients = new Map();
}

PerClientConcurrency.prototype.acquire = function(clientId, tier) {
  if (!this.clients.has(clientId)) {
    var maxConcurrent = (this.tierLimits[tier] && this.tierLimits[tier].maxConcurrent) ||
                        this.defaultMax;
    this.clients.set(clientId, new ConcurrencyLimiter({
      maxConcurrent: maxConcurrent,
      maxQueue: maxConcurrent * 3,
      timeout: 30000
    }));
  }

  return this.clients.get(clientId).acquire();
};

PerClientConcurrency.prototype.getClientStatus = function(clientId) {
  var limiter = this.clients.get(clientId);
  return limiter ? limiter.getStatus() : null;
};

module.exports = PerClientConcurrency;

Backpressure Handling

Communicating Limits to Clients

When a rate limit is hit, the response should tell the client exactly when to retry and what limits are in effect.

// rate-limiters/response-headers.js
function setRateLimitHeaders(res, result, options) {
  var prefix = options.headerPrefix || "X-RateLimit";

  res.setHeader(prefix + "-Limit", options.limit || 0);
  res.setHeader(prefix + "-Remaining", result.remaining || 0);

  if (result.resetAt) {
    res.setHeader(prefix + "-Reset", result.resetAt);
  }

  if (!result.allowed) {
    res.setHeader("Retry-After", result.retryAfter || 1);
  }
}

function formatRateLimitError(result) {
  return {
    error: "rate_limit_exceeded",
    message: result.message || "Too many requests. Please slow down.",
    retryAfter: result.retryAfter,
    limit: result.limit,
    remaining: 0,
    tier: result.tier,
    tool: result.tool
  };
}

module.exports = {
  setRateLimitHeaders: setRateLimitHeaders,
  formatRateLimitError: formatRateLimitError
};

MCP-Level Backpressure

For MCP tool invocations, rate limit responses need to be MCP-compatible — they should return an error result, not an HTTP error, because the tool invocation is already inside an established SSE session.

// rate-limiters/mcp-rate-limit-wrapper.js
var PerClientRateLimiter = require("./per-client");
var PerToolRateLimiter = require("./per-tool");
var ConcurrencyLimiter = require("./concurrency");

function McpRateLimitWrapper(options) {
  this.clientLimiter = new PerClientRateLimiter({
    capacity: options.clientCapacity || 60,
    refillRate: options.clientRefillRate || 10,
    tiers: options.tiers
  });

  this.toolLimiter = new PerToolRateLimiter(options.toolLimits || {});
  this.concurrency = new ConcurrencyLimiter({
    maxConcurrent: options.maxConcurrent || 50,
    maxQueue: options.maxQueue || 100,
    timeout: options.timeout || 30000
  });
}

McpRateLimitWrapper.prototype.wrapTool = function(mcpServer, name, description, schema, handler) {
  var self = this;

  mcpServer.tool(name, description, schema, function(params, context) {
    var clientId = context.connectionId || "unknown";
    var tier = context.userTier || "default";

    // Check client rate limit
    var clientResult = self.clientLimiter.tryRequest(clientId, tier);
    if (!clientResult.allowed) {
      return {
        content: [{
          type: "text",
          text: "Rate limit exceeded. You have used your request quota. " +
                "Please wait " + clientResult.retryAfter + " seconds before trying again.\n\n" +
                "Tier: " + clientResult.tier + "\n" +
                "Total requests: " + clientResult.totalRequests
        }],
        isError: true
      };
    }

    // Check tool-specific limit
    var toolResult = self.toolLimiter.tryToolRequest(name, clientId);
    if (!toolResult.allowed) {
      return {
        content: [{
          type: "text",
          text: toolResult.message + "\n\n" +
                "Remaining global capacity: " + (toolResult.globalRemaining || "N/A")
        }],
        isError: true
      };
    }

    // Acquire concurrency slot
    return self.concurrency.acquire().then(function(release) {
      try {
        var result = handler(params, context);

        if (result && typeof result.then === "function") {
          return result.then(function(res) {
            release();
            return res;
          }).catch(function(err) {
            release();
            throw err;
          });
        }

        release();
        return result;
      } catch (err) {
        release();
        throw err;
      }
    }).catch(function(err) {
      if (err.message.indexOf("Concurrency") > -1) {
        return {
          content: [{
            type: "text",
            text: "Server is at maximum capacity. " + err.message +
                  "\nPlease wait a moment and try again."
          }],
          isError: true
        };
      }
      throw err;
    });
  });
};

McpRateLimitWrapper.prototype.getStats = function() {
  return {
    clients: this.clientLimiter.getAllStats(),
    tools: this.toolLimiter.getToolStats(),
    concurrency: this.concurrency.getStatus()
  };
};

module.exports = McpRateLimitWrapper;

Cost-Based Rate Limiting

Weighted Token Consumption

Not all tool calls are equal. A simple echo costs nothing. A database query costs CPU and I/O. An external API call costs money. Weight tokens by cost so expensive operations consume more of the client's budget.

// rate-limiters/cost-based.js
var TokenBucket = require("./token-bucket");

function CostBasedLimiter(options) {
  this.budgetPerMinute = options.budgetPerMinute || 100;
  this.clients = new Map();
  this.toolCosts = options.toolCosts || {};
  this.defaultCost = options.defaultCost || 1;
}

CostBasedLimiter.prototype.getClientBucket = function(clientId) {
  if (!this.clients.has(clientId)) {
    this.clients.set(clientId, new TokenBucket({
      capacity: this.budgetPerMinute,
      refillRate: this.budgetPerMinute / 60  // refill over 1 minute
    }));
  }
  return this.clients.get(clientId);
};

CostBasedLimiter.prototype.tryRequest = function(clientId, toolName) {
  var cost = this.toolCosts[toolName] || this.defaultCost;
  var bucket = this.getClientBucket(clientId);
  var result = bucket.consume(cost);

  return {
    allowed: result.allowed,
    remaining: result.remaining,
    retryAfter: result.retryAfter,
    cost: cost,
    tool: toolName,
    message: result.allowed
      ? null
      : "Budget exhausted. Tool '" + toolName + "' costs " + cost + " tokens. " +
        "Remaining budget: " + result.remaining + ". Retry in " + result.retryAfter + "s."
  };
};

CostBasedLimiter.prototype.getClientBudget = function(clientId) {
  var bucket = this.getClientBucket(clientId);
  var status = bucket.getStatus();
  return {
    clientId: clientId,
    budgetRemaining: status.tokens,
    budgetCapacity: status.capacity,
    percentRemaining: status.percentFull + "%"
  };
};

module.exports = CostBasedLimiter;
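Weighted consumption in miniature: a static budget drained by per-tool costs. The real limiter above also refills the budget over time; this sketch only shows how different costs drain it at different rates.

```javascript
// One shared budget, per-tool costs: expensive tools drain it faster.
var costs = { echo: 1, query: 5, http_get: 10 };
var budget = { tokens: 20 };

function spend(tool) {
  var cost = costs[tool] || 1;
  if (budget.tokens >= cost) {
    budget.tokens -= cost;
    return true;
  }
  return false;
}

console.log(spend("http_get"), budget.tokens);  // true 10
console.log(spend("query"), budget.tokens);     // true 5
console.log(spend("http_get"), budget.tokens);  // false 5  (costs 10, only 5 left)
console.log(spend("echo"), budget.tokens);      // true 4   (cheap calls still fit)
```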

Cost Configuration

// config/tool-costs.js
var toolCosts = {
  // Lightweight operations (1 token)
  echo: 1,
  list_files: 1,
  get_config: 1,
  server_info: 1,

  // Read operations (2-5 tokens)
  read_file: 2,
  list_tables: 2,
  describe_table: 3,
  query: 5,

  // Write operations (5-10 tokens)
  write_file: 5,
  execute_sql: 10,

  // External API calls (10-25 tokens)
  http_get: 10,
  http_post: 15,

  // Expensive operations (20-50 tokens)
  ai_analyze: 25,
  generate_report: 30,
  bulk_import: 50
};

module.exports = toolCosts;

Complete Working Example

A production MCP server with multi-layer rate limiting: per-client, per-tool, concurrency control, and cost-based budgeting.

// server.js - Rate-Limited MCP Server
var express = require("express");
var McpServer = require("@modelcontextprotocol/sdk/server/mcp.js").McpServer;
var SSEServerTransport = require("@modelcontextprotocol/sdk/server/sse.js").SSEServerTransport;
var PerClientRateLimiter = require("./rate-limiters/per-client");
var PerToolRateLimiter = require("./rate-limiters/per-tool");
var ConcurrencyLimiter = require("./rate-limiters/concurrency");
var toolLimits = require("./config/tool-limits");
var tiers = require("./config/rate-limit-tiers");

// Rate limiters
var clientLimiter = new PerClientRateLimiter({
  capacity: 60,
  refillRate: 10,
  tiers: tiers
});

var toolLimiter = new PerToolRateLimiter(toolLimits);

var concurrency = new ConcurrencyLimiter({
  maxConcurrent: 50,
  maxQueue: 100,
  timeout: 30000
});

// Express app
var app = express();
app.use(express.json());

// MCP server
var mcpServer = new McpServer({
  name: "rate-limited-mcp",
  version: "1.0.0"
});

// Helper: wrap tool with rate limiting
function rateLimitedTool(name, description, schema, handler) {
  mcpServer.tool(name, description, schema, function(params, context) {
    var clientId = context.connectionId || "unknown";
    var tier = context.userTier || "standard";

    // Layer 1: Per-client rate limit
    var clientResult = clientLimiter.tryRequest(clientId, tier);
    if (!clientResult.allowed) {
      return {
        content: [{
          type: "text",
          text: "[Rate Limited] Client quota exceeded.\n" +
                "Tier: " + tier + "\n" +
                "Retry after: " + clientResult.retryAfter + "s\n" +
                "Total requests this session: " + clientResult.totalRequests
        }],
        isError: true
      };
    }

    // Layer 2: Per-tool rate limit
    var toolResult = toolLimiter.tryToolRequest(name, clientId);
    if (!toolResult.allowed) {
      return {
        content: [{
          type: "text",
          text: "[Rate Limited] " + toolResult.message
        }],
        isError: true
      };
    }

    // Layer 3: Concurrency limit
    return concurrency.acquire().then(function(release) {
      try {
        var result = handler(params, context);

        if (result && typeof result.then === "function") {
          return result.then(function(res) {
            release();
            return res;
          }).catch(function(err) {
            release();
            return {
              content: [{ type: "text", text: "Error: " + err.message }],
              isError: true
            };
          });
        }

        release();
        return result;
      } catch (err) {
        release();
        return {
          content: [{ type: "text", text: "Error: " + err.message }],
          isError: true
        };
      }
    }).catch(function(err) {
      return {
        content: [{
          type: "text",
          text: "[Capacity] Server at maximum concurrency. " + err.message
        }],
        isError: true
      };
    });
  });
}

// Register rate-limited tools
rateLimitedTool("echo", "Echo a message", {
  message: { type: "string" }
}, function(params) {
  return { content: [{ type: "text", text: "Echo: " + params.message }] };
});

rateLimitedTool("slow_operation", "Simulate a slow database query", {
  delayMs: { type: "number", description: "Simulated delay in milliseconds" }
}, function(params) {
  var delay = Math.min(params.delayMs || 1000, 10000);
  return new Promise(function(resolve) {
    setTimeout(function() {
      resolve({
        content: [{
          type: "text",
          text: "Operation completed after " + delay + "ms"
        }]
      });
    }, delay);
  });
});

rateLimitedTool("check_limits", "Check your current rate limit status", {}, function(params, context) {
  var clientId = context.connectionId || "unknown";
  var clientStats = clientLimiter.getClientStats(clientId);
  var toolStats = toolLimiter.getToolStats();
  var concurrencyStatus = concurrency.getStatus();

  return {
    content: [{
      type: "text",
      text: JSON.stringify({
        client: clientStats,
        concurrency: {
          active: concurrencyStatus.active,
          maxConcurrent: concurrencyStatus.maxConcurrent,
          queued: concurrencyStatus.queued
        },
        tools: toolStats
      }, null, 2)
    }]
  };
});

// Health and monitoring endpoints
app.get("/health", function(req, res) {
  res.json({ status: "healthy" });
});

app.get("/rate-limits/stats", function(req, res) {
  res.json({
    clients: clientLimiter.getAllStats(),
    tools: toolLimiter.getToolStats(),
    concurrency: concurrency.getStatus()
  });
});

app.get("/rate-limits/client/:clientId", function(req, res) {
  var stats = clientLimiter.getClientStats(req.params.clientId);
  if (!stats) {
    res.status(404).json({ error: "Client not found" });
    return;
  }
  res.json(stats);
});

// SSE endpoint
var sessions = {};

app.get("/mcp/sse", function(req, res) {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  res.setHeader("X-Accel-Buffering", "no");

  var transport = new SSEServerTransport("/mcp/messages", res);
  var sessionId = transport.sessionId;

  sessions[sessionId] = {
    transport: transport,
    connectedAt: new Date(),
    ip: req.ip
  };

  console.log("Session opened: " + sessionId +
              " (active: " + Object.keys(sessions).length + ")");

  transport.onclose = function() {
    delete sessions[sessionId];
    console.log("Session closed: " + sessionId);
  };

  mcpServer.connect(transport);
});

app.post("/mcp/messages", function(req, res) {
  var session = sessions[req.query.sessionId];
  if (!session) {
    res.status(400).json({ error: "Invalid session" });
    return;
  }
  session.transport.handlePostMessage(req, res);
});

// Start server
var port = process.env.MCP_PORT || 3100;
app.listen(port, function() {
  console.log("Rate-limited MCP server on port " + port);
  console.log("Rate limit tiers:", Object.keys(tiers).join(", "));
  console.log("Tool-specific limits:", Object.keys(toolLimits).join(", "));
  console.log("Max concurrent: " + concurrency.maxConcurrent);
});

// Graceful shutdown
process.on("SIGTERM", function() {
  clientLimiter.destroy();
  process.exit(0);
});
Run the server and exercise the monitoring endpoints:

# Start the server
node server.js

# Check overall rate limit stats
curl -s http://localhost:3100/rate-limits/stats | jq .

# Check specific client stats
curl -s http://localhost:3100/rate-limits/client/abc123 | jq .

Output from /rate-limits/stats:

{
  "clients": [
    {
      "clientId": "conn_abc123",
      "tier": "standard",
      "bucket": {
        "tokens": 85,
        "capacity": 100,
        "refillRate": 20,
        "percentFull": 85
      },
      "totalRequests": 47,
      "rejectedRequests": 2,
      "rejectionRate": "4.3%",
      "connectedSince": "2026-02-10T10:15:00.000Z"
    }
  ],
  "tools": {
    "echo": {
      "global": { "currentRequests": 23, "maxRequests": 5000, "remaining": 4977 },
      "limits": { "globalPerMinute": 5000, "perClientPerMinute": 200 }
    },
    "query": {
      "global": { "currentRequests": 12, "maxRequests": 500, "remaining": 488 },
      "limits": { "globalPerMinute": 500, "perClientPerMinute": 30 }
    }
  },
  "concurrency": {
    "active": 3,
    "maxConcurrent": 50,
    "queued": 0,
    "maxQueue": 100,
    "stats": {
      "totalProcessed": 158,
      "totalQueued": 5,
      "totalRejected": 0,
      "totalTimedOut": 0,
      "maxConcurrentSeen": 12
    }
  }
}

Common Issues and Troubleshooting

Rate Limits Not Applying to Reconnected Clients

Client disconnects and reconnects with new session ID, bypassing rate limits

Session-based rate limiting resets when the client reconnects. Use a persistent identifier (API key, user ID, IP address) instead of session ID.

// Fix: Key rate limits by API key or user ID, not session
var clientKey = context.userId || context.apiKey || req.ip;
var result = clientLimiter.tryRequest(clientKey, tier);

Memory Growing from Rate Limiter State

Warning: High memory usage - rate limiter tracking 50,000 clients

Without cleanup, rate limiter entries accumulate forever. Implement periodic cleanup of stale entries.

// Fix: Clean up inactive client entries
setInterval(function() {
  clientLimiter.cleanup();  // Remove entries inactive for > 1 hour
}, 300000);  // Every 5 minutes

Sliding Window Using Too Much Memory

Each client window stores every request timestamp — 1000 requests = 1000 timestamps

Sliding window counters store individual timestamps. For high-volume tools, this grows linearly with request count.

// Fix: Use token bucket for high-volume tools (constant memory)
// Use sliding window only where precise counting matters
var limiter = requestsPerMinute > 500
  ? new TokenBucket({ capacity: requestsPerMinute, refillRate: requestsPerMinute / 60 })
  : new SlidingWindow({ windowMs: 60000, maxRequests: requestsPerMinute });

Concurrency Limiter Deadlock on Error

Error: Tool threw exception but concurrency slot was not released

If a tool handler throws before the release function is called, the concurrency slot leaks permanently.

// Fix: Always release in a finally-equivalent pattern
return concurrency.acquire().then(function(release) {
  try {
    var result = handler(params);
    if (result && typeof result.then === "function") {
      return result.then(function(res) { release(); return res; })
                   .catch(function(err) { release(); throw err; });
    }
    release();
    return result;
  } catch (err) {
    release();  // Critical: release on sync errors too
    throw err;
  }
});

Rate Limit Headers Not Reaching Client Through Proxy

Client sees no X-RateLimit-* headers despite server sending them

Nginx and other reverse proxies can hide or strip response headers depending on configuration (for example, proxy_hide_header rules), so the client never sees them.

# Fix: Pass custom headers through
proxy_pass_header X-RateLimit-Limit;
proxy_pass_header X-RateLimit-Remaining;
proxy_pass_header X-RateLimit-Reset;
proxy_pass_header Retry-After;

Best Practices

  • Apply rate limits at multiple layers — per-client limits prevent individual abuse, per-tool limits protect expensive operations, and global limits protect the server as a whole. No single layer catches everything.
  • Return informative rate limit errors — tell the client which limit was hit, how long to wait, and what their current quota is. Vague "too many requests" messages lead to blind retries that make the problem worse.
  • Use token bucket for most scenarios — it handles bursty traffic naturally (full bucket absorbs bursts) while enforcing a sustained rate. Sliding windows are more precise but use more memory.
  • Weight rate limits by operation cost — an echo is not the same as a database query. Assign cost weights to each tool so cheap operations do not consume the same budget as expensive ones.
  • Always release concurrency slots in error paths — a leaked concurrency slot permanently reduces server capacity. Use try-catch patterns that guarantee release regardless of success or failure.
  • Clean up stale rate limiter entries — clients disconnect, but their rate limiter state persists in memory. Run periodic cleanup to remove entries for clients that have not been seen in over an hour.
  • Set concurrency limits based on downstream capacity — if your database pool has 20 connections, do not allow 100 concurrent database query tool invocations. The concurrency limit should reflect actual backend capacity.
  • Log rate limit events for capacity planning — track which clients hit limits, which tools are hottest, and how often the concurrency queue fills up. This data drives decisions about scaling and tier pricing.
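On the client side, informative errors only pay off if callers honor them. This sketch waits retryAfter seconds (with a cap) instead of retrying blindly; it assumes an error payload shaped like formatRateLimitError above, and callTool is a hypothetical stand-in for an MCP client invocation, stubbed here so the demo runs standalone.

```javascript
// Retry loop that respects retryAfter from rate-limit error payloads.
function callWithBackoff(callTool, maxAttempts, sleep) {
  var attempt = 0;
  function tryOnce() {
    attempt++;
    return callTool().then(function(result) {
      if (result.error === "rate_limit_exceeded" && attempt < maxAttempts) {
        var waitMs = Math.min((result.retryAfter || 1) * 1000, 30000);  // cap the wait
        return sleep(waitMs).then(tryOnce);
      }
      return result;
    });
  }
  return tryOnce();
}

// Stub tool: rate-limited on the first two calls, succeeds on the third.
var calls = 0;
function stubTool() {
  calls++;
  return Promise.resolve(calls <= 2
    ? { error: "rate_limit_exceeded", retryAfter: 1 }
    : { ok: true });
}

// Instant "sleep" so the demo finishes immediately; a real client would
// use setTimeout with the computed waitMs.
function noSleep() { return Promise.resolve(); }

callWithBackoff(stubTool, 5, noSleep).then(function(result) {
  console.log("succeeded after " + calls + " attempts");
});
```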
