Rate Limiting
Rate limiting is a technique API providers use to restrict the number of requests a client can make within a defined time window, such as 1,000 calls per minute per API key. It protects backend services from overload and enforces fair usage across tenants.
How it works
When a client’s request count exceeds the limit, the server returns an HTTP 429 (Too Many Requests) response, often with a Retry-After header indicating when the client may resume. Common algorithms include token bucket, leaky bucket, and fixed window counters.
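As an illustration of one of the algorithms named above, here is a minimal token bucket sketch. The class name, parameters, and injectable clock are assumptions made for this example, not taken from any particular API provider. A real deployment would usually keep one bucket per API key in shared storage.

```python
import time


class TokenBucket:
    """Illustrative token bucket: holds up to `capacity` tokens,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.rate = float(rate)
        self.clock = clock              # injectable so the sketch is testable
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if it should get a 429."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With capacity 1,000 and a rate of 1000 / 60 tokens per second, this approximates the "1,000 calls per minute" example while still allowing short bursts up to the bucket size.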
Key facts
- HTTP 429: The standard status code returned when a rate limit is exceeded
- Retry-After header: Indicates how long the client should wait before retrying, given either as a number of seconds or as an HTTP date
- Tier-based limits: Paid plans typically receive higher rate limits than free tiers
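Because Retry-After may carry either a delay in seconds or an HTTP date, a client has to handle both forms. The helper below is a sketch using only the Python standard library; the function name and the choice to return None for unparseable values are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the wait in seconds described by a Retry-After value,
    or None if the value cannot be parsed."""
    value = value.strip()
    # Form 1: a non-negative integer number of seconds, e.g. "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```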
For builders
Builders consuming external APIs should implement exponential backoff and respect rate limit headers to avoid service disruptions. Builders exposing their own APIs should apply rate limits per API key and per IP so that a single bad actor cannot degrade service for all users.
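On the client side, exponential backoff can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `send` is a hypothetical callable returning a `(status_code, headers)` pair, header names are assumed to be stored exactly as "Retry-After", only the seconds form of that header is honored, and the delay values are arbitrary defaults. Random jitter is added so that many clients do not retry at the same instant.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers_dict).
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            # The server told us how long to wait: respect it.
            delay = float(retry_after)
        else:
            # Otherwise double the ceiling each attempt (0.5s, 1s, 2s, ...),
            # cap it, and pick a random delay below it ("full jitter").
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

In practice `send` would wrap whatever HTTP client the application already uses. Keeping it as a parameter means the retry logic stays independent of any specific library.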