Article Issue #5185

Embedding (AI)

What to know

  • Embedding (AI) is a fixed-length vector representation of data produced by a neural network, where the geometric relationships between vectors reflect semantic relationships between the original inputs.
  • An embedding model, often a transformer encoder, processes input text and produces a single vector by pooling the token representations at its final layer.
  • Generating embeddings is typically the first step in building a RAG pipeline or semantic search feature.

Embedding (AI) is a fixed-length vector representation of data produced by a neural network, where the geometric relationships between vectors reflect semantic relationships between the original inputs. Text embeddings encode the meaning of a word, sentence, or document as a vector of floating-point numbers, typically with 256 to 4096 dimensions, so that similarity in meaning can be computed with ordinary vector arithmetic.
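
Because every embedding has the same length, raw storage is simple arithmetic. The sketch below assumes each dimension is stored as a 4-byte float32 value, which the text above does not specify, and it ignores index and metadata overhead. The function name is hypothetical.

```python
def embedding_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of dense vectors, assuming float32 (4 bytes) per dimension."""
    return num_vectors * dims * bytes_per_value


# One million 1536-dimensional vectors, before any index or metadata overhead.
total = embedding_bytes(1_000_000, 1536)
print(total / 1e9)  # 6.144 (decimal gigabytes)
```

Halving the dimension count or the bytes per value halves this figure, which is one reason dimensionality matters when choosing a model.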

How it works

An embedding model, often a transformer encoder, processes input text and produces a single vector by pooling the token representations at its final layer, for example by averaging them or by taking the vector of one designated token. The model is trained on large corpora using objectives like contrastive learning, which pulls embeddings for semantically similar pairs closer and pushes dissimilar pairs apart. The resulting embedding space encodes syntactic, semantic, and topical similarity together, so a single comparison between two vectors reflects all three.
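
The pooling step can be sketched in a few lines. This is a minimal illustration of mean pooling, one common choice among several; the function name, the toy token vectors, and the attention-mask convention (1 for real tokens, 0 for padding) are assumptions for the example and do not describe any particular model.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the vectors of real tokens (mask == 1), skipping padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have equal length")
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dims = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dims)]


# Toy final-layer output: three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0, 4.0]
```

Whatever the input length, the output has as many dimensions as one token vector, which is what makes the embedding fixed-length.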

Key facts

  • Dimensionality: Common embedding dimensions are 768 (BERT-base-sized), 1536 (OpenAI text-embedding-3-small), and 3072 (OpenAI text-embedding-3-large).
  • Similarity metric: Cosine similarity and dot product are the standard similarity measures for comparing embeddings; the two produce the same ranking when vectors are normalized to unit length.
  • Models: OpenAI text-embedding-3, Cohere Embed, and open-weight models like nomic-embed-text are widely used.
  • Storage: Embeddings are stored in vector databases, which provide efficient approximate nearest-neighbor search.
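
The similarity measures listed above can be illustrated with plain Python. This sketch uses exact brute-force search over a toy corpus; a vector database would typically replace the `top_k` loop with an approximate index. All function names and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / denom


def normalize(a: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that dot product equals cosine."""
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


def top_k(query: Sequence[float], corpus: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Exact nearest-neighbor search: score every vector, keep the best k."""
    scored = [(cosine_similarity(query, vec), i) for i, vec in enumerate(corpus)]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:k]]


corpus = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.0]
print(top_k(query, corpus, k=2))  # indices 0 and 2 rank highest
```

Scoring every vector costs time proportional to the corpus size, which is why larger collections usually rely on approximate nearest-neighbor indexes.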

For builders

Generating embeddings is typically the first step in building a RAG pipeline or semantic search feature. Selecting an embedding model involves balancing retrieval quality against cost and latency: larger models often retrieve more accurately but cost more per token and respond more slowly. Embeddings are also used for deduplication, clustering, anomaly detection, and intent classification in production systems.
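
As one example of these secondary uses, near-duplicate detection can be sketched as a greedy pass over the vectors. The 0.95 threshold, the function names, and the toy vectors are illustrative assumptions; a real system would tune the threshold on its own data and would likely use an index instead of comparing every pair.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / denom


def deduplicate(vectors: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return indices to keep: a vector is dropped if it is at least
    `threshold` similar to any vector that was already kept."""
    kept: List[int] = []
    for i, vec in enumerate(vectors):
        if all(_cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings: the second vector nearly repeats the first.
docs = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
print(deduplicate(docs))  # [0, 2]
```

The result depends on input order, because the first item seen in each near-duplicate group is the one retained.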

Administrator · 41 published guides · Joined 2016
