Skip to content
Article Issue #5193

AI Gateway

What to know

AI Gateway is a proxy or platform that abstracts multiple LLM inference providers behind a consistent API, enabling applications to switch models, implement fallback routing, enforce rate limits, cache responses, and capture observability data without changing application code; Application code sends requests to the AI gateway using a standardized format (often OpenAI-compatible); An AI gateway is particularly valuable for teams using more than one model provider or anticipating provider migration

AI Gateway, WikiWalls Glossary illustration

« Back to Glossary Index

AI Gateway is a proxy or platform that abstracts multiple LLM inference providers behind a consistent API, enabling applications to switch models, implement fallback routing, enforce rate limits, cache responses, and capture observability data without changing application code. It performs for LLM APIs a role similar to what API gateways do for microservices.

How it works

Application code sends requests to the AI gateway using a standardized format (often OpenAI-compatible). The gateway authenticates the request, applies routing rules based on model name, cost policy, or load, forwards the request to the selected provider, streams or returns the response, logs request and response data, and optionally applies caching for identical prompts. Fallback rules redirect traffic if a provider returns errors or exceeds latency thresholds.

Key facts

  • Open-source options: LiteLLM, LM-Router, and Portkey are widely used open-source AI gateways.
  • Managed options: Braintrust, Helicone, and cloud-native gateways from AWS and GCP offer managed gateway services.
  • Capabilities: Routing, caching, rate limiting, cost tracking, PII scrubbing, and model fallback are standard features.
  • OpenAI compatibility: Most gateways expose an OpenAI-compatible API, enabling drop-in replacement without SDK changes.

For builders

An AI gateway is particularly valuable for teams using more than one model provider or anticipating provider migration. It centralizes authentication management, provides a single point for cost monitoring and budget enforcement, and enables A/B testing of models without code deploys. Caching semantically identical prompts at the gateway layer can yield significant cost savings for high-volume applications with repetitive queries.

Sources

« Back to Definition Index
Administrator · 41 published guides · Joined 2016

Welcome to wikiwalls

The WikiWalls Journal · Free, weekly

One careful fix in your inbox each Wednesday.

No affiliate links inside the diagnosis. No sponsored "top 10". One careful fix per week — unsubscribe in one click.

No tracking pixels · No spam · Edited by a human.