Article Issue #5200

Code Completion (AI)

What to know

Code Completion (AI) is the real-time generation of code suggestions by a language model based on the current cursor position, surrounding code context, and optionally repository-level semantic information.
The editor sends a fill-in-the-middle (FIM) prompt, including the code prefix before the cursor and the code suffix after it, to the model.
For teams building internal coding assistant tools or integrating code completion into developer platforms, latency is the dominant quality metric: even highly accurate suggestions lose adoption if they arrive too late in the editing flow.

Wikiwalls Team Administrator

May 15, 2026 2 min read


Code Completion (AI) is the real-time generation of code suggestions by a language model based on the current cursor position, surrounding code context, and optionally repository-level semantic information. Completions range from inline word-level autocomplete to multi-line function generation displayed as ghost text that the developer accepts or ignores.

How it works

The editor sends a fill-in-the-middle (FIM) prompt, including the code prefix before the cursor and the code suffix after it, to the model. The model then generates the text most likely to fill the gap between the two. FIM training, in which models are trained specifically on prefix-suffix-middle triples, significantly improves completion quality compared with left-to-right generation alone, because the model can condition on the code that follows the cursor as well as the code that precedes it.
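The prompt construction described above can be sketched in a few lines. This is a minimal illustration, not any particular editor's implementation: the sentinel strings, the function name build_fim_prompt, and the character limits are all hypothetical. Real FIM-trained models define their own special tokens and may expect a different ordering of the sections, so check the model's documentation before reusing this layout.

```python
# Hypothetical sentinel strings. Each FIM-trained model defines its own
# special tokens, so these placeholders must be swapped for the real ones.
FIM_PREFIX = "<FIM_PREFIX>"
FIM_SUFFIX = "<FIM_SUFFIX>"
FIM_MIDDLE = "<FIM_MIDDLE>"


def build_fim_prompt(
    document: str,
    cursor: int,
    max_prefix_chars: int = 2000,  # assumed limit, chosen for illustration
    max_suffix_chars: int = 500,   # assumed limit, chosen for illustration
) -> str:
    """Build a prefix-suffix-middle prompt around the cursor position."""
    if not 0 <= cursor <= len(document):
        raise ValueError("cursor must lie inside the document")

    # Keep the text closest to the cursor on both sides, since that is
    # the context most relevant to the gap being filled.
    prefix = document[max(0, cursor - max_prefix_chars):cursor]
    suffix = document[cursor:cursor + max_suffix_chars]

    # The model is expected to generate the missing middle after the
    # final sentinel.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    source = "def add(a, b):\n    return \n"
    cursor = source.index("return ") + len("return ")
    print(build_fim_prompt(source, cursor))
```

Truncating by characters keeps the sketch simple; a production system would more likely budget by tokens, using the model's own tokenizer.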

Key facts

FIM format: Fill-in-the-middle uses special tokens to delimit prefix, suffix, and middle sections in the prompt.
Latency requirement: Completions must appear within roughly 100-200 ms to feel instantaneous; smaller specialized models or cached prefixes are used to stay within this budget.
Context sources: Open files, recent edits, linter diagnostics, and retrieved similar code from the repo all improve suggestion relevance.
Ghost text UX: Suggestions appear as dimmed inline text; Tab accepts the full suggestion, and Escape dismisses it.
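Because the latency budget limits how much text can be sent to the model, the context sources listed above usually have to compete for space in the prompt. The sketch below shows one possible way to pick snippets under a size budget. The ContextSnippet type, the priority ranking, and the character budget are assumptions made for illustration and do not describe how any specific tool ranks its context.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ContextSnippet:
    source: str    # e.g. "open_file", "recent_edit", "diagnostic", "retrieved"
    text: str
    priority: int  # hypothetical ranking: a lower number is more important


def assemble_context(
    snippets: Iterable[ContextSnippet], budget_chars: int
) -> List[ContextSnippet]:
    """Greedily select snippets by priority until the budget is used up."""
    chosen: List[ContextSnippet] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.priority):
        cost = len(snippet.text)
        if used + cost > budget_chars:
            # Skip a snippet that does not fit; a smaller, lower-priority
            # snippet later in the list may still fit.
            continue
        chosen.append(snippet)
        used += cost
    return chosen


if __name__ == "__main__":
    candidates = [
        ContextSnippet("retrieved", "x" * 50, priority=3),
        ContextSnippet("recent_edit", "y" * 30, priority=1),
        ContextSnippet("diagnostic", "z" * 40, priority=2),
        ContextSnippet("open_file", "w" * 20, priority=4),
    ]
    for snippet in assemble_context(candidates, budget_chars=100):
        print(snippet.source, len(snippet.text))
```

A greedy pass like this is cheap enough to run on every keystroke, which matters more here than finding the optimal packing.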

For builders

For teams building internal coding assistant tools or integrating code completion into developer platforms, latency is the dominant quality metric: even highly accurate suggestions lose adoption if they arrive too late in the editing flow. Deploying smaller distilled models or using prompt caching on frequently seen file headers can keep completion latency under the perceptible threshold.
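One way to make the latency target concrete is to track tail latency rather than the average, since a few slow completions are what developers notice. The sketch below checks a set of measured latencies against a budget using a nearest-rank percentile. The function names, the 200 ms default, and the choice of the 95th percentile are assumptions for illustration, and the sample values are made up rather than measured.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Return the nearest-rank percentile of the latency samples."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest-rank method: the smallest value with at least pct percent
    # of the samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def within_budget(
    samples_ms: Sequence[float],
    budget_ms: float = 200.0,  # assumed budget, taken from the upper bound above
    pct: float = 95.0,         # assumed service-level percentile
) -> bool:
    """Check whether the chosen percentile stays inside the latency budget."""
    return percentile(samples_ms, pct) <= budget_ms


if __name__ == "__main__":
    # Made-up end-to-end latencies in milliseconds, for illustration only.
    samples = [80, 95, 110, 120, 130, 140, 150, 160, 180, 450]
    print("p50:", percentile(samples, 50))
    print("p95:", percentile(samples, 95))
    print("within budget:", within_budget(samples))
```

In this made-up sample the median looks healthy while the 95th percentile is far over budget, which is the kind of gap an average would hide.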