Article Issue #5176

Fine-Tuning (LLM)

What to know

  • Fine-tuning (LLM) is a training process that takes a pretrained foundation model and continues updating its weights on a curated dataset tailored to a specific task or domain.
  • Fine-tuning supplies the model with labeled input-output pairs and runs gradient descent to minimize prediction error on those examples.
  • Fine-tuning is appropriate when prompt engineering alone cannot achieve required consistency, when latency demands shorter prompts than few-shot examples allow, or when proprietary style and tone must be enforced at scale.

Fine-Tuning (LLM) is a training process that takes a pretrained foundation model and continues updating its weights on a curated dataset tailored to a specific task or domain. The goal is to encode behavior, tone, format preferences, or specialized knowledge directly into the model rather than engineering prompts to elicit that behavior at inference time.

How it works

Fine-tuning supplies the model with labeled input-output pairs and runs gradient descent to minimize prediction error on those examples. Parameter-efficient methods like LoRA (Low-Rank Adaptation) freeze the pretrained weights and train only small adapter matrices, drastically reducing GPU memory requirements. Supervised fine-tuning is often followed by reinforcement learning from human feedback (RLHF) to align outputs with human preferences.
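The mechanics above can be sketched on a toy linear layer. This is a minimal illustration, not how any production library implements LoRA: the helper names (`matvec`, `lora_forward`, `train_adapters`), the 2x2 weight matrix, the squared-error loss, and the three training pairs are all assumptions made for the example. The only details taken from the LoRA method itself are that the pretrained matrix `W` stays frozen, that the update is the product of two small matrices `B` and `A`, and that `B` starts at zero so the adapter is initially a no-op. `A` is normally initialized randomly; a fixed value is used here so the run is reproducible.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    """Return W x + scale * B (A x). W is frozen; A and B are the adapters."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

def total_loss(W, A, B, data, scale=1.0):
    """Halved sum of squared errors over the labeled input-output pairs."""
    loss = 0.0
    for x, y in data:
        pred = lora_forward(W, A, B, x, scale)
        loss += 0.5 * sum((p - t) ** 2 for p, t in zip(pred, y))
    return loss

def train_adapters(W, A, B, data, scale=1.0, lr=0.1, epochs=200):
    """Per-example gradient descent on A and B only; W is never written."""
    rank, d_out = len(A), len(B)
    for _ in range(epochs):
        for x, y in data:
            z = matvec(A, x)                        # low-rank projection
            pred = lora_forward(W, A, B, x, scale)
            err = [p - t for p, t in zip(pred, y)]  # dLoss/dpred
            # Back-propagate the error through B before B is updated.
            back = [sum(err[i] * B[i][k] for i in range(d_out))
                    for k in range(rank)]
            for i in range(d_out):
                for k in range(rank):
                    B[i][k] -= lr * scale * err[i] * z[k]
            for k in range(rank):
                for j in range(len(x)):
                    A[k][j] -= lr * scale * back[k] * x[j]
    return A, B

# Hypothetical toy setup (all values are illustrative assumptions).
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen "pretrained" weights
A = [[0.5, 0.5]]              # rank-1 adapter, fixed init for reproducibility
B = [[0.0], [0.0]]            # zero init: the adapter starts as a no-op
data = [
    ([1.0, 0.0], [1.5, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
    ([1.0, 1.0], [1.5, 1.0]),
]

loss_before = total_loss(W, A, B, data)
train_adapters(W, A, B, data)
loss_after = total_loss(W, A, B, data)
print(loss_before, loss_after)
```

Because `B` starts at zero, the model's first prediction is identical to the base model's, and training only moves the small matrices. Real fine-tuning replaces the squared error with a token-level loss and the hand-written gradients with automatic differentiation, but the division between frozen and trainable parameters is the same.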

Key facts

  • Data requirement: Effective fine-tuning typically requires hundreds to thousands of high-quality examples, not millions.
  • LoRA: Low-Rank Adaptation reduces trainable parameters by injecting small rank-decomposition matrices into attention layers.
  • Catastrophic forgetting: Aggressive fine-tuning on narrow data can degrade general capabilities present in the base model.
  • Hosted fine-tuning: OpenAI and Google (through Vertex AI) offer fine-tuning APIs that handle the infrastructure; some Anthropic models can be fine-tuned through Amazon Bedrock.
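To make the LoRA bullet concrete, the arithmetic below compares trainable parameter counts for a single weight matrix. It is a rough sketch: the 4096 x 4096 layer size and rank 8 are hypothetical values chosen for illustration, and the function names are not from any library. The only assumption about the method is that a rank-r adapter for a d_out x d_in matrix consists of one r x d_in matrix and one d_out x r matrix.

```python
def full_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    """Trainable parameters for a rank-`rank` adapter pair (A and B)."""
    return rank * (d_in + d_out)

# Hypothetical layer size and rank, chosen only for illustration.
d_in = d_out = 4096
rank = 8

full = full_params(d_in, d_out)        # 16,777,216
lora = lora_params(d_in, d_out, rank)  # 65,536
print(f"{lora / full:.4%}")            # prints 0.3906%
```

Under these assumed sizes the adapter trains 1/256 of the parameters in that matrix. The saving for a whole model depends on how many layers receive adapters and on the rank chosen.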

For builders

Fine-tuning is appropriate when prompt engineering alone cannot achieve required consistency, when latency demands shorter prompts than few-shot examples allow, or when proprietary style and tone must be enforced at scale. For most product teams, prompt engineering and RAG should be exhausted before investing in fine-tuning, as the data curation and evaluation overhead is substantial.

Administrator · 41 published guides · Joined 2016
