# Vector Databases and Semantic Search in Practice
Vector databases have moved from research curiosity to production infrastructure. They power RAG systems, recommendation engines, image search, and anomaly detection. But choosing and operating a vector database in production requires understanding trade-offs that benchmarks do not capture. Here is what we have learned.
## What Makes Vector Search Different
Traditional search matches keywords. Vector search finds semantic similarity. The query "How do I reset my password?" retrieves documents about "account recovery" and "credential renewal" even though the query and those documents share no keywords.
This works because embedding models convert text (or images, audio, code) into high-dimensional vectors where semantically similar items are close together in vector space.
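As a minimal sketch of that idea, the snippet below embeds a query and a document and measures how close they are with cosine similarity. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model, which are illustrative choices rather than recommendations from this article.

```python
# Sketch: embed two texts and compare them (model choice is illustrative).
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I reset my password?"
doc = "Follow these steps for account recovery and credential renewal."

q_vec, d_vec = model.encode([query, doc])

# Cosine similarity: semantically related texts land close to 1.0.
similarity = np.dot(q_vec, d_vec) / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec))
print(f"cosine similarity: {similarity:.3f}")
```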

## Choosing a Vector Database
The landscape is crowded. Here is our decision framework:
| Requirement | Recommended | Why |
|---|---|---|
| Small scale (< 1M vectors) | pgvector | Runs in your existing PostgreSQL, zero new infrastructure |
| Medium scale (1M-50M) | Qdrant or Weaviate | Purpose-built, excellent filtering, good developer experience |
| Large scale (50M+) | Milvus or Pinecone | Distributed architecture, handles billions of vectors |
| Multi-modal search | Weaviate | Native support for text, image, and cross-modal search |
Our default recommendation for most projects: start with pgvector. You probably already run PostgreSQL, and pgvector handles millions of vectors adequately. Migrate to a dedicated solution only when you hit performance limits.
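If you start there, a minimal sketch might look like the following; it assumes the psycopg 3 driver, the pgvector Python helper package, 1536-dimensional embeddings, and a hypothetical documents table and connection string.

```python
# Minimal pgvector sketch: create a table, insert one embedding, run a
# nearest-neighbour query. DSN, table name, and dimensions are hypothetical.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://localhost/mydb")
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(1536)
    )
""")

embedding = np.random.rand(1536).astype(np.float32)  # stand-in for a real embedding
conn.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    ("password reset guide", embedding),
)

# Nearest-neighbour search: <=> is pgvector's cosine-distance operator.
rows = conn.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (embedding,),
).fetchall()
print(rows)
conn.commit()
```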
## Indexing Strategies
The choice of index determines your speed-accuracy trade-off:
- HNSW (Hierarchical Navigable Small World): best recall, higher memory usage. Our default for most production systems.
- IVF (Inverted File Index): good recall with lower memory. Suitable for very large datasets.
- Product Quantization: extreme compression for massive datasets. Trades accuracy for memory efficiency.
Build parameters matter enormously. For HNSW, we typically use ef_construction=200 and M=16 as starting points, then tune based on recall benchmarks.
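Staying with pgvector as the running example, building an HNSW index with those starting parameters would look roughly like this; the table and index names are hypothetical.

```python
# Sketch: build an HNSW index with the starting parameters above, then raise
# ef_search at query time for higher recall (pgvector's default is 40).
import psycopg

with psycopg.connect("postgresql://localhost/mydb") as conn:
    conn.execute("""
        CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
        ON documents USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 200)
    """)
    # Per-session speed/recall trade-off for subsequent queries.
    conn.execute("SET hnsw.ef_search = 100")
```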
## Embedding Pipeline
Your embedding pipeline is as important as the database:
- Model selection: match your embedding model to your content type and language.
- Chunking: split documents into semantically coherent chunks (512-1024 tokens typically).
- Metadata: store structured metadata alongside vectors for filtered search.
- Batch processing: embed documents in batches for throughput. Real-time embedding for queries.
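A minimal sketch of the chunking and batch-embedding steps above. It approximates token counts with whitespace tokens (a production pipeline would use the embedding model's own tokenizer) and assumes an illustrative sentence-transformers model.

```python
# Sketch: split documents into overlapping ~512-token chunks, then embed in batches.
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size whitespace tokens."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = ["long document text ..."]  # placeholder corpus
chunks = [c for doc in documents for c in chunk_text(doc)]

# Batch embedding for throughput; queries are encoded one at a time at request time.
embeddings = model.encode(chunks, batch_size=64)
```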
## Hybrid Search
Pure vector search has a weakness: it can miss exact keyword matches that matter. Hybrid search combines the best of both worlds:
- Vector similarity for semantic understanding.
- BM25 keyword matching for exact terms, product codes, and proper nouns.
- Reciprocal Rank Fusion to combine the two result sets.
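A minimal sketch of the fusion step; the document IDs and result lists are illustrative, and k=60 is the constant commonly used for RRF.

```python
# Reciprocal Rank Fusion: merge ranked result lists by summing 1 / (k + rank).
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # from vector similarity search
keyword_hits = ["doc1", "doc9", "doc3"]  # from BM25
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# doc1 and doc3 rise to the top because both retrievers rank them highly.
```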
In our benchmarks, hybrid search improves retrieval precision by 8-12% over pure vector search for enterprise content.
## Production Optimization
- Query caching: cache frequent query embeddings to skip re-encoding, saving 20-50 ms per query (see the sketch after this list).
- Prefetching: for re-ranking pipelines, overfetch by 3-5x and then re-rank.
- Index warming: pre-load indexes into memory at startup to avoid cold-start latency.
- Monitoring: track p95 latency, recall, and index freshness. Set alerts on degradation.
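The query cache can be as simple as an in-process LRU keyed by the raw query string. The sketch below assumes an illustrative sentence-transformers model; a shared cache such as Redis would be the natural next step across multiple workers.

```python
# Sketch: query-embedding cache so identical query strings skip re-encoding.
from functools import lru_cache
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

@lru_cache(maxsize=10_000)
def embed_query(query: str) -> tuple[float, ...]:
    # Return a plain tuple so callers get a simple immutable value.
    return tuple(model.encode(query).tolist())

embed_query("how do i reset my password")  # encoded once
embed_query("how do i reset my password")  # served from the cache
```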
## Cost Optimization
Vector databases can be expensive at scale. Strategies:
- Dimensionality reduction: use Matryoshka embeddings or PCA to reduce vector dimensions from 1536 to 512 with minimal quality loss.
- Quantization: compress vectors from float32 to int8 for a 4x memory reduction (see the sketch after this list).
- Tiered storage: keep hot data in memory, cold data on disk with lazy loading.
- Selective indexing: not everything needs to be searchable. Index only documents that users actually search for.
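As a rough illustration of the quantization idea: most vector databases provide scalar quantization natively, so treat this as a sketch of what happens under the hood rather than something to hand-roll.

```python
# Sketch: scalar quantization maps float32 values to 256 levels per dimension
# (one byte each), cutting memory 4x at the cost of small reconstruction error.
import numpy as np

embeddings = np.random.rand(10_000, 1536).astype(np.float32)  # stand-in data

# Per-dimension min/max calibration.
mins = embeddings.min(axis=0)
scales = (embeddings.max(axis=0) - mins) / 255.0

quantized = np.round((embeddings - mins) / scales).astype(np.uint8)  # 1 byte/dim
restored = quantized.astype(np.float32) * scales + mins              # approximate

print(embeddings.nbytes / quantized.nbytes)  # -> 4.0
print(np.abs(embeddings - restored).max())   # small reconstruction error
```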
## Conclusion
Vector databases are a foundational component of modern AI applications. Start simple with pgvector, invest in your embedding pipeline, implement hybrid search for best results, and optimize for production performance as you scale. The technology is mature enough for production — the competitive advantage comes from how well you implement it.