Code Completion (AI)
Code Completion (AI) is the real-time generation of code suggestions by a language model based on the current cursor position, surrounding code context, and optionally repository-level semantic information; The editor sends a fill-in-the-middle (FIM) prompt, including the code prefix before the cursor and the code suffix after it, to the model; For teams building internal coding assistant tools or integrating code completion into developer platforms, latency is the dominant quality metric: even highly accurate suggestions lose adoption if they arrive too late in the editing flow
Code Completion (AI) is the real-time generation of code suggestions by a language model based on the current cursor position, surrounding code context, and optionally repository-level semantic information. Completions range from inline word-level autocomplete to multi-line function generation displayed as ghost text that the developer accepts or ignores.
How it works
The editor sends a fill-in-the-middle (FIM) prompt, including the code prefix before the cursor and the code suffix after it, to the model. The model generates the most likely continuation for the middle. FIM training, where models are specifically trained on prefix-suffix-middle triples, significantly improves completion quality compared to left-to-right generation alone.
Key facts
- FIM format: Fill-in-the-middle uses special tokens to delimit prefix, suffix, and middle sections in the prompt.
- Latency requirement: Completions must appear within 100-200ms to feel instantaneous; specialized smaller models or cached prefixes are used to meet this.
- Context sources: Open files, recent edits, linter diagnostics, and retrieved similar code from the repo all improve suggestion relevance.
- Ghost text UX: Suggestions appear as dimmed inline text; Tab accepts the full suggestion, and Escape dismisses it.
For builders
For teams building internal coding assistant tools or integrating code completion into developer platforms, latency is the dominant quality metric: even highly accurate suggestions lose adoption if they arrive too late in the editing flow. Deploying smaller distilled models or using prompt caching on frequently seen file headers can keep completion latency under the perceptible threshold.
Sources
- Chen, M., et al. (2021). Evaluating Large Language Models Trained on Code (Codex / HumanEval). arXiv:2107.03374. arxiv.org
- Roziere, B., et al. (2023). Code Llama: Open Foundation Models for Code. arXiv:2308.12950. arxiv.org
- Jimenez, C., et al. (2023). SWE-bench. arXiv:2310.06770. arxiv.org
- GitHub. (2023). The economic impact of the AI-powered developer lifecycle. github.blog
- Anthropic. Research on coding agents. anthropic.com