Semantic Search
Semantic search is a retrieval method that represents both the query and the documents in a corpus as embedding vectors and returns the documents whose vectors lie nearest to the query vector in embedding space, so matching is based on meaning rather than surface form. For example, a query for ‘how to deploy a container’ can match documents about Kubernetes pod scheduling even if those documents never use the word ‘deploy.’
How it works
An embedding model converts the query into a vector at search time. The same model, or a compatible one, must have been used to embed all documents at indexing time. A vector database then runs an approximate nearest-neighbor (ANN) search and returns the top-k document chunks most similar to the query, typically measured by cosine similarity. Optionally, a cross-encoder model rescores and reorders these initial candidates to improve relevance.
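As a rough illustration of the retrieval step, the sketch below ranks a handful of chunks by cosine similarity to a query vector. It makes several assumptions: the three-dimensional vectors are hand-written stand-ins for real embeddings, the chunk names are made up, and the search is an exact brute-force scan rather than the approximate nearest-neighbor search a vector database would run.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=3):
    """Return the k (chunk_id, score) pairs most similar to query_vec.

    Exact brute-force scan over every vector; a real vector database
    would use an approximate index (such as HNSW) instead.
    """
    scored = [(chunk_id, cosine_similarity(query_vec, vec))
              for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" written by hand for illustration (hypothetical values).
index = {
    "k8s-pod-scheduling": [0.9, 0.1, 0.0],
    "docker-run-basics": [0.8, 0.3, 0.1],
    "sourdough-recipe": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of 'how to deploy a container'.
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, index, k=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

In this toy index the two container-related chunks rank first and the unrelated chunk is left out, even though no keyword matching takes place.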
Key facts
- vs. keyword search: BM25 keyword search excels on exact terminology; semantic search excels on paraphrase and conceptual queries. Hybrid approaches often win.
- Embedding alignment: Query and document embeddings must be produced by the same or compatible models to ensure vector space consistency.
- Chunking matters: Retrieval quality depends heavily on how documents are split; overly large or small chunks degrade precision.
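- Latency: HNSW indexes typically return approximate nearest neighbors in single-digit milliseconds on corpora of millions of vectors, though actual latency depends on vector dimensionality, index parameters, and hardware.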
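To make the chunking point concrete, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_words`, the word-based splitting, and the default sizes are all assumptions chosen for illustration; production pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk. The defaults are
    illustrative, not recommended values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Chunk size and overlap are worth tuning against real queries, since the best values depend on the documents and the embedding model.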
For builders
Semantic search is foundational to RAG-powered applications and, for knowledge base and documentation tools, handles paraphrased and conceptual queries that keyword-only search tends to miss. Combining semantic similarity with BM25 keyword scores in a hybrid retrieval step is widely recommended because it is robust across query types. Reranking the top 20 to 50 candidates with a cross-encoder before returning the final top 5 is a high-ROI quality improvement.
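The hybrid-then-rerank flow described above might be sketched as follows. This is one possible fusion scheme (a weighted sum of min-max normalized scores), not the only one; rank-based methods are a common alternative. The function names, the `alpha` weight, the document IDs, and the scores are hypothetical, and `score_pair` is a placeholder for a real cross-encoder call.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_candidates(semantic, bm25, alpha=0.5, n=50):
    """Fuse semantic and BM25 scores and return the top-n doc IDs.

    alpha weights the semantic score and (1 - alpha) the keyword score.
    A document missing from one result list contributes 0 for that side.
    """
    sem, kw = min_max(semantic), min_max(bm25)
    fused = {
        doc_id: alpha * sem.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(sem) | set(kw)
    }
    # Sort by fused score descending, breaking ties by doc ID for stability.
    return sorted(fused, key=lambda d: (-fused[d], d))[:n]

def rerank(query, candidates, score_pair, top_n=5):
    """Reorder candidates by score_pair(query, doc_id) and keep top_n.

    score_pair is a hypothetical stand-in for a cross-encoder that scores
    a (query, document) pair jointly.
    """
    ranked = sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)
    return ranked[:top_n]

# Made-up scores for four documents.
semantic = {"a": 0.82, "b": 0.79, "c": 0.40}
bm25 = {"b": 7.1, "c": 12.3, "d": 3.0}

candidates = hybrid_candidates(semantic, bm25, alpha=0.6, n=50)

# Toy relevance table in place of a real cross-encoder model.
toy_relevance = {"a": 0.2, "b": 0.9, "c": 0.7, "d": 0.1}
final = rerank("how to deploy a container", candidates,
               lambda query, doc_id: toy_relevance[doc_id], top_n=2)
print(candidates, final)
```

Normalizing before fusing matters because BM25 scores and cosine similarities live on different scales; without it, one signal would dominate the weighted sum.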
Sources
- Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401. arxiv.org
- Gao, Y., et al. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997. arxiv.org
- Reimers, N., Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv:1908.10084. arxiv.org
- Pinecone. What is a vector database? pinecone.io
- Johnson, J., Douze, M., Jegou, H. (2017). Billion-scale similarity search with GPUs (FAISS). Facebook Research. github.com