Article Issue #5196

Streaming Completions

What to know

Streaming Completions is a delivery mode for LLM API responses where the server sends each generated token or token chunk to the client as soon as it is produced, using Server-Sent Events (SSE) or chunked HTTP transfer encoding.
The client sets stream=true in the API request.
Streaming should be the default for any user-facing generation feature where responses are longer than a few sentences.

Wikiwalls Team Administrator

May 15, 2026 2 min read


Streaming Completions is a delivery mode for LLM API responses where the server sends each generated token or token chunk to the client as soon as it is produced, using Server-Sent Events (SSE) or chunked HTTP transfer encoding. The client can begin rendering output to users before generation is finished, dramatically improving perceived responsiveness for long responses.

How it works

The client sets stream=true in the API request. The server then sends a series of delta events, each containing one or more newly generated tokens, followed by a final event that marks the end of the stream (the exact terminator varies by provider). The client concatenates the deltas in order to reconstruct the full response. For structured output and function calling, the JSON arrives in fragments, so the client must buffer them and either parse incrementally or wait until the payload is complete before using it.
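The accumulation step described above can be sketched in a few lines. This is a minimal illustration, not any provider's actual wire format: the `{"delta": ...}` payload shape, the `[DONE]` sentinel, and the function names are assumptions made for the example, and it assumes each event carries a single `data:` line.

```python
import json
from typing import Iterable, Iterator

# Assumed end-of-stream marker; real providers use different terminators.
DONE_SENTINEL = "[DONE]"


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines until the done sentinel is seen."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Blank lines separate SSE events; lines starting with ":" are comments
        # (often used as keep-alives).
        if not line or line.startswith(":"):
            continue
        # This sketch ignores other SSE fields such as "event:", "id:", "retry:".
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == DONE_SENTINEL:
            return
        event = json.loads(payload)
        # Hypothetical payload shape: {"delta": "<text>"}.
        yield event.get("delta", "")


def collect(lines: Iterable[str]) -> str:
    """Concatenate all deltas in arrival order to rebuild the full response."""
    return "".join(iter_deltas(lines))


# Simulated stream; in practice these lines would come from the HTTP response body.
sample = [
    'data: {"delta": "Hel"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"delta": "lo, "}\n',
    "\n",
    'data: {"delta": "world"}\n',
    "\n",
    "data: [DONE]\n",
]

full_text = collect(sample)
print(full_text)
```

A user-facing client would typically render each delta as `iter_deltas` yields it rather than waiting for `collect` to return.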

Key facts

Protocol: Server-Sent Events (SSE) over HTTP/1.1 or HTTP/2 is the standard delivery mechanism.
TTFT vs. TPS: Streaming exposes time-to-first-token (TTFT) as a latency metric distinct from throughput in tokens per second (TPS) and from total generation time.
Structured output: Streaming JSON responses requires handling partial JSON parsing; many SDKs provide helpers for this.
Cancellation: Clients can cancel the stream mid-generation so that generation stops early, which can reduce cost by avoiding output tokens the user never sees. Whether cancellation stops billing depends on the provider.
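To make the TTFT and TPS distinction concrete, here is a small sketch of how a client might measure both while consuming a stream. The names `measure_stream` and `StreamMetrics` are hypothetical, each chunk is counted as one token (real chunks may hold several), and TPS is defined here as tokens after the first divided by the time elapsed since the first token, which is one reasonable convention among several.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamMetrics(NamedTuple):
    ttft_seconds: float       # time from request start to first chunk
    tokens_per_second: float  # throughput after the first chunk
    token_count: int          # chunks seen (assumed one token per chunk)


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamMetrics:
    """Consume a stream of chunks and report TTFT and TPS."""
    start = clock()
    first_at = None
    last_at = start
    count = 0
    for _ in chunks:
        now = clock()
        if first_at is None:
            first_at = now
        last_at = now
        count += 1
    if first_at is None:
        # Empty stream: no first token was ever observed.
        return StreamMetrics(float("nan"), 0.0, 0)
    decode_time = last_at - first_at
    tps = (count - 1) / decode_time if decode_time > 0 else 0.0
    return StreamMetrics(first_at - start, tps, count)


# Deterministic demo using a fake clock instead of real network timing.
_ticks = iter([0.0, 0.5, 0.6, 0.7, 0.8])
metrics = measure_stream(["a", "b", "c", "d"], clock=lambda: next(_ticks))
print(metrics)
```

Passing the clock as a parameter keeps the measurement testable; with the default `time.perf_counter`, the same function can wrap a live stream.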

For builders

Streaming should be the default for any user-facing generation feature where responses are longer than a few sentences. The reduction in perceived latency tends to improve user experience and completion rates. For backend pipelines that process completions programmatically rather than displaying them, non-streaming is simpler to implement, and total generation time is about the same. Always implement stream cancellation so you are not charged for a full completion when users navigate away.
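The cancellation advice above can be sketched as a consumer loop that checks a cancel flag between tokens and always closes the stream when it exits. Everything here is illustrative: `fake_stream` stands in for a real HTTP streaming response, and `consume` is a hypothetical helper. With a real client, closing the response is what signals the server to stop, and whether that also stops billing depends on the provider.

```python
import threading
from typing import Callable, Generator, List


def fake_stream(tokens: List[str], closed: List[bool]) -> Generator[str, None, None]:
    """Stand-in for a streaming HTTP response (hypothetical)."""
    try:
        for token in tokens:
            yield token
    finally:
        # A real client would close the underlying connection here.
        closed.append(True)


def consume(
    stream: Generator[str, None, None],
    cancel: threading.Event,
    on_token: Callable[[str], None],
) -> str:
    """Forward tokens until the stream ends or cancel is set, then close it."""
    parts: List[str] = []
    try:
        for token in stream:
            if cancel.is_set():
                break  # stop reading as soon as cancellation is requested
            parts.append(token)
            on_token(token)
    finally:
        stream.close()  # runs the generator's cleanup even on early exit
    return "".join(parts)


# Demo: simulate the user navigating away after two tokens have been shown.
cancel = threading.Event()
closed: List[bool] = []
seen: List[str] = []


def show(token: str) -> None:
    seen.append(token)
    if len(seen) == 2:
        cancel.set()


partial = consume(fake_stream(["a", "b", "c", "d"], closed), cancel, show)
print(partial, closed)
```

In a UI, `cancel.set()` would be wired to a stop button or a page-unload handler rather than to the token callback.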

