Vector Database
A vector database is a specialized database that persists embedding vectors and executes approximate nearest-neighbor (ANN) queries efficiently. It lets applications find the items most semantically similar to a query vector in milliseconds, even across millions or billions of stored embeddings. It is the persistence layer behind retrieval-augmented generation (RAG), semantic search, and recommendation systems.
How it works
Vector databases build an ANN index, commonly HNSW (Hierarchical Navigable Small World graphs) or IVF (Inverted File Index), over the stored vectors. At query time, the index lets the database visit a small subset of the search space rather than compute the distance to every stored vector, trading a small amount of recall for a large gain in speed. Most vector databases also store metadata alongside each vector and support filtered queries that combine semantic similarity with structured attributes.
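To make the "small subset of the search space" idea concrete, here is a minimal, pure-Python sketch of an IVF-style index. It is a toy under stated assumptions, not how any particular database implements IVF: the function names, the plain k-means training loop, and the parameters k, nprobe, and top_k are all chosen for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))

def train_centroids(vectors, k, iters=5, seed=0):
    """Plain k-means; stands in for the training step an IVF index needs."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        members = [[] for _ in range(k)]
        for v in vectors:
            members[nearest_centroid(v, centroids)].append(v)
        for c, group in enumerate(members):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def build_lists(vectors, centroids):
    """Inverted lists: for each centroid, the ids of the vectors assigned to it."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, top_k=3, nprobe=2):
    """Scan only the nprobe closest lists. Returns (top ids, vectors scanned)."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:top_k], len(candidates)

def exact_search(query, vectors, top_k=3):
    """Brute-force baseline: compute the distance to every stored vector."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:top_k]

# Synthetic data, for illustration only.
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(1000)]
centroids = train_centroids(vectors, k=16)
lists = build_lists(vectors, centroids)
query = [rng.random() for _ in range(8)]

ids, scanned = ivf_search(query, vectors, centroids, lists, nprobe=2)
print(f"IVF scanned {scanned} of {len(vectors)} vectors; top ids: {ids}")
print(f"Exact top ids: {exact_search(query, vectors)}")
```

Raising nprobe scans more lists, which improves recall at the cost of latency. Setting it to the number of centroids degenerates into the exact scan.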
Key facts
- Dedicated databases: Pinecone, Weaviate, Qdrant, Milvus, and Chroma are purpose-built vector databases.
- Extensions: pgvector for PostgreSQL and Elasticsearch’s kNN search add vector capabilities to existing databases.
- Index types: HNSW provides high recall and low latency but uses significant memory; IVF is more memory-efficient but requires a training step that clusters the vectors into partitions before they can be indexed.
- Hybrid search: Most vector databases support combining ANN with metadata filters or BM25 keyword scoring.
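The hybrid search item can be sketched in a few lines of Python. This is an illustrative sketch only: it uses an exact cosine scan in place of an ANN index, the record layout and function names are hypothetical, and reciprocal rank fusion (RRF) is just one common way to merge a vector ranking with a keyword ranking. The keyword ranking here is hard-coded and assumed to come from a BM25 scorer that is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query, records, where, top_k=2):
    """Pre-filter on metadata, then rank the survivors by similarity (exact scan, not ANN)."""
    hits = [r for r in records if all(r["meta"].get(k) == v for k, v in where.items())]
    hits.sort(key=lambda r: cosine(query, r["vector"]), reverse=True)
    return [r["id"] for r in hits[:top_k]]

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical records: an id, a tiny 2-d embedding, and structured metadata.
records = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.0, 1.0], "meta": {"lang": "en"}},
    {"id": "d", "vector": [0.7, 0.7], "meta": {"lang": "en"}},
]

english_hits = filtered_search([1.0, 0.1], records, where={"lang": "en"})
print(english_hits)  # ['a', 'd']

vector_ranking = ["a", "d", "c"]   # e.g. from the similarity search
keyword_ranking = ["d", "c", "a"]  # assumed output of a BM25 scorer
print(rrf([vector_ranking, keyword_ranking]))  # ['d', 'a', 'c']
```

Record "b" is the second-closest vector to the query but is excluded by the metadata filter. In the fused ranking, "d" comes first because it ranks well in both lists.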
For builders
Choosing between a dedicated vector database and a general-purpose database with a vector extension depends on scale and on how much operational complexity you can take on. For prototypes or smaller datasets of up to a few million vectors, pgvector in an existing PostgreSQL instance is often sufficient. At hundreds of millions of vectors, or with strict latency SLAs, a purpose-built solution with tuned indexing and distributed sharding usually becomes necessary.
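A back-of-the-envelope memory estimate helps show where that threshold comes from. The sketch below rests on assumptions not taken from any specific product: float32 values, 768-dimensional embeddings, and an HNSW graph that stores roughly 2 × M four-byte links per vector on its base layer, with M = 16. Real systems add further overhead for metadata, upper graph layers, and replication.

```python
def raw_vector_bytes(n_vectors, dim, bytes_per_value=4):
    """Bytes needed for the raw embeddings alone (float32 by default)."""
    return n_vectors * dim * bytes_per_value

def hnsw_link_bytes(n_vectors, m=16, bytes_per_link=4):
    """Rough HNSW graph overhead: about 2 * m links per vector on the base layer."""
    return n_vectors * 2 * m * bytes_per_link

def estimate_gib(n_vectors, dim, m=16):
    """Approximate in-memory size of vectors plus graph links, in GiB."""
    total = raw_vector_bytes(n_vectors, dim) + hnsw_link_bytes(n_vectors, m)
    return total / 2**30

for n in (1_000_000, 500_000_000):
    print(f"{n:>11,} vectors x 768 dims: about {estimate_gib(n, 768):,.0f} GiB")
```

Under these assumptions, one million vectors need about 3 GiB, which fits comfortably on a single PostgreSQL host. Five hundred million need roughly 1.5 TiB, which is where sharding across machines or a more compact index starts to matter.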
Sources
- Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401. arxiv.org
- Gao, Y., et al. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997. arxiv.org
- Reimers, N., Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv:1908.10084. arxiv.org
- Pinecone. What is a vector database? pinecone.io
- Johnson, J., Douze, M., Jegou, H. (2017). Billion-scale similarity search with GPUs (FAISS). Facebook Research. github.com