Vector Engine
Overview
The Vector Engine is NeurondB's high-performance approximate nearest neighbor (ANN) search system, designed for billion-scale vector similarity search with millisecond latency. It combines state-of-the-art indexing algorithms (HNSW and IVF) with multiple distance metrics and advanced quantization techniques to deliver production-ready vector search capabilities directly in PostgreSQL.
Key Capabilities
- HNSW Indexing: Hierarchical Navigable Small World graphs for excellent recall and speed
- IVF Indexing: Inverted File indexes for large-scale datasets with efficient memory usage
- Multiple Distance Metrics: Cosine, L2 (Euclidean), inner product, and custom metrics
- Quantization: Scalar and product quantization to reduce memory footprint by 2-16x, depending on the method
- GPU Acceleration: Optional CUDA/ROCm/Metal support for 10-100x faster distance computation
- Adaptive Index Selection: Automatic index type selection based on dataset characteristics
Performance Benchmarks
| Dataset Size | Index Type | Query Latency (p95) | Recall@10 |
|---|---|---|---|
| 1M vectors | HNSW | 2.3ms | 98.5% |
| 10M vectors | HNSW | 4.7ms | 97.2% |
| 100M vectors | IVF | 8.1ms | 94.8% |
| 1B vectors | IVF + PQ | 12.4ms | 92.1% |
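The Recall@10 column is not defined in this section. A minimal sketch, assuming the usual definition (the fraction of the true 10 nearest neighbors that the index also returns), is shown here; the id lists are made up for illustration and are not benchmark data.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors that the ANN result also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


# Hypothetical result lists for one query (ids are illustrative only).
exact = [3, 7, 1, 9, 4, 12, 8, 5, 2, 6]      # from an exact, brute-force scan
approx = [3, 7, 1, 9, 4, 12, 8, 5, 2, 40]    # from the ANN index
print(recall_at_k(approx, exact))            # → 0.9
```

In practice this value is averaged over many queries, with the exact neighbors obtained from a sequential scan on a sample.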
Indexing Algorithms
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph in which each upper layer contains a sparser subset of the nodes in the layer below. A search starts at the top layer, moves greedily toward the query, and descends layer by layer to the dense bottom layer. This hierarchy gives roughly logarithmic search complexity and high recall.
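To illustrate how the layers shrink toward the top, the sketch below uses the level-assignment rule from the original HNSW paper: each node draws its top layer as floor(-ln(U) / ln(m)). Whether NeurondB uses exactly this rule is an assumption, and `m = 16` is a hypothetical value chosen for the example.

```python
import math
import random
from collections import Counter


def random_level(m, rng):
    """Draw the top layer for a new node: floor(-ln(U) / ln(m)), U in (0, 1]."""
    u = 1.0 - rng.random()          # rng.random() is in [0, 1); shift to (0, 1]
    return int(-math.log(u) / math.log(m))


rng = random.Random(0)
m = 16                              # hypothetical graph degree parameter
top_levels = Counter(random_level(m, rng) for _ in range(100_000))

# A node whose top layer is L is present in every layer 0..L,
# so layer sizes shrink by roughly a factor of m per level.
max_level = max(top_levels)
layer_sizes = [
    sum(count for level, count in top_levels.items() if level >= layer)
    for layer in range(max_level + 1)
]
print(layer_sizes)
```

Because each layer holds about 1/m of the nodes in the layer below, the number of layers grows with the logarithm of the dataset size, which is where the near-logarithmic search cost comes from.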
Advantages
- Excellent recall (95-99% for typical configurations)
- Fast query performance (2-5ms for millions of vectors)
- Incremental updates without full rebuilds
- Works well for datasets up to 100M vectors
Configuration
Create HNSW index
-- Basic HNSW index with cosine distance
CREATE INDEX documents_embedding_idx ON documents
USING hnsw (embedding vector_cosine_ops);
-- Tuned HNSW index for high recall
CREATE INDEX documents_embedding_idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);
-- Runtime tuning for query performance
SET hnsw.ef_search = 100; -- Higher = better recall, slower queries
IVF (Inverted File)
IVF partitions the vector space into clusters (centroids) using k-means clustering. At query time, only the nearest clusters are searched, dramatically reducing the search space for large datasets.
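The idea can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not NeurondB's implementation: the helper names (`build_ivf`, `search`) are hypothetical, k-means runs for a fixed five iterations, and the data is random 2-D points.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iters, rng):
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def build_ivf(points, lists, rng):
    """Cluster the points and store each point id in its centroid's list."""
    centroids = kmeans(points, lists, iters=5, rng=rng)
    inverted = [[] for _ in range(lists)]
    for idx, p in enumerate(points):
        inverted[nearest(p, centroids)].append(idx)
    return centroids, inverted


def search(query, points, centroids, inverted, probes, k):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [idx for c in order[:probes] for idx in inverted[c]]
    return sorted(candidates, key=lambda idx: sq_dist(query, points[idx]))[:k]


rng = random.Random(42)
points = [[rng.random(), rng.random()] for _ in range(500)]
centroids, inverted = build_ivf(points, lists=8, rng=rng)
query = [0.5, 0.5]
fast = search(query, points, centroids, inverted, probes=2, k=5)   # approximate
full = search(query, points, centroids, inverted, probes=8, k=5)   # scans every list
print(fast, full)
```

With `probes` equal to the number of lists the search is exhaustive and therefore exact; smaller values scan fewer points at the cost of possibly missing neighbors that fall in unprobed clusters.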
Advantages
- Efficient for very large datasets (100M+ vectors)
- Lower memory footprint than HNSW
- Faster index build time
- Scales to billions of vectors with quantization
Configuration
Create IVF index
-- Basic IVF index
CREATE INDEX documents_embedding_idx ON documents
USING ivf (embedding vector_l2_ops)
WITH (lists = 100);
-- IVF for large datasets (sqrt(num_rows) is a good starting point)
CREATE INDEX documents_embedding_idx ON documents
USING ivf (embedding vector_l2_ops)
WITH (lists = 3162); -- For ~10M vectors
-- Runtime tuning
SET ivf.probes = 20; -- Number of clusters to search
Index Selection Guide
| Dataset Size | Recommended Index | Reason |
|---|---|---|
| < 1M vectors | HNSW | Best recall and speed |
| 1M - 10M vectors | HNSW | Excellent performance, manageable memory |
| 10M - 100M vectors | HNSW or IVF | HNSW if memory allows, IVF under memory constraints |
| 100M+ vectors | IVF + PQ | Memory efficiency with quantization |
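The table, together with the sqrt(num_rows) heuristic for `lists`, can be encoded as a small helper. The function names and the `memory_constrained` flag are hypothetical, and the handling of the exact boundaries (10M and 100M) is an assumption, since the table does not say which row they belong to.

```python
import math


def recommend_index(num_vectors, memory_constrained=False):
    """Mirror the selection table; returns the suggested index type as a string."""
    if num_vectors < 10_000_000:
        return "HNSW"
    if num_vectors < 100_000_000:
        return "IVF" if memory_constrained else "HNSW"
    return "IVF + PQ"


def starting_lists(num_vectors):
    """sqrt(num_rows) starting point for the IVF `lists` parameter."""
    return max(1, round(math.sqrt(num_vectors)))


print(recommend_index(5_000_000))            # → HNSW
print(recommend_index(50_000_000, True))     # → IVF
print(recommend_index(1_000_000_000))        # → IVF + PQ
print(starting_lists(10_000_000))            # → 3162
```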
Distance Metrics
NeurondB supports multiple distance metrics, each optimized for different use cases and data types. The choice of distance metric significantly impacts search quality and should match your embedding model's training objective.
Cosine Distance
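Measures the angle between vectors. Cosine distance is 1 minus cosine similarity, so it ranges over [0, 2]: 0 for vectors pointing in the same direction, 1 for orthogonal vectors, and 2 for opposite vectors. Ideal for text embeddings where direction matters more than magnitude. Most embedding models (OpenAI, Cohere, Sentence Transformers) are optimized for cosine similarity.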
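A minimal sketch of the computation, assuming the `<=>` operator follows the common definition of one minus cosine similarity (consistent with the [0, 2] range stated here); the function name is illustrative.

```python
import math


def cosine_distance(a, b):
    """1 - cos(angle between a and b); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))    # same direction → 0.0
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))    # orthogonal → 1.0
print(cosine_distance([1.0, 0.0], [-4.0, 0.0]))   # opposite direction → 2.0
```

Note that the first pair differs in magnitude but still has distance 0, which is why cosine distance suits embeddings where only direction carries meaning.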
Cosine distance search
-- Cosine distance operator (<=>)
SELECT id, content, embedding <=> query_vec AS distance
FROM documents,
(SELECT embed_text('search query', 'text-embedding-ada-002') AS query_vec) q
ORDER BY distance
LIMIT 10;
L2 (Euclidean) Distance
The straight-line distance between two points in vector space. Lower values indicate greater similarity. Commonly used for image embeddings and spatial data where absolute distances matter.
L2 distance search
-- L2 distance operator (<->)
SELECT id, content, embedding <-> query_vec AS distance
FROM documents,
(SELECT '[0.1, 0.2, 0.3, ...]'::vector AS query_vec) q
ORDER BY distance
LIMIT 10;
Inner Product (Dot Product)
Computes the dot product of two vectors. Higher values indicate greater similarity. For normalized (unit-length) vectors, maximum inner product search (MIPS) produces the same ranking as cosine similarity.
Inner product search
-- Inner product operator (<#>)
SELECT id, content, embedding <#> query_vec AS score
FROM documents,
(SELECT '[0.1, 0.2, 0.3, ...]'::vector AS query_vec) q
ORDER BY score DESC -- Higher is better for inner product
LIMIT 10;
Distance Metric Selection
| Use Case | Recommended Metric | Reason |
|---|---|---|
| Text embeddings (OpenAI, Cohere) | Cosine | Models trained for cosine similarity |
| Image embeddings (CLIP) | Cosine or L2 | Depends on model training |
| Spatial/geometric data | L2 | Preserves actual distances |
| Normalized vectors | Inner Product | Equivalent to cosine, faster computation |
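The last row can be checked numerically. For unit-length vectors, cosine distance equals 1 - a·b and squared L2 distance equals 2 - 2(a·b), so all three metrics rank neighbors identically. The sketch below uses made-up 3-D vectors and illustrative helper names.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors, normalized to unit length before comparison.
query = normalize([0.2, 0.9, 0.4])
docs = [normalize(v) for v in (
    [1.0, 0.1, 0.0], [0.1, 1.0, 0.5], [0.3, 0.3, 0.9], [-0.5, 0.2, 0.1],
)]

ids = range(len(docs))
by_inner_product = sorted(ids, key=lambda i: -dot(query, docs[i]))   # highest first
by_cosine = sorted(ids, key=lambda i: cosine_distance(query, docs[i]))
by_l2 = sorted(ids, key=lambda i: l2(query, docs[i]))
print(by_inner_product, by_cosine, by_l2)   # the three orderings are identical
```

Inner product skips the norm computation, which is the source of the "faster computation" note in the table; the equivalence holds only if every stored vector and every query is normalized.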
Quantization Techniques
Quantization reduces memory footprint and can accelerate search by compressing vector representations. NeurondB supports both scalar quantization (per-dimension) and product quantization (subspace-based).
Scalar Quantization
Reduces precision from 32-bit floats to 8-bit integers (INT8) or 16-bit floats (FP16). INT8 provides a 4x memory reduction and FP16 a 2x reduction, with minimal accuracy loss (typically < 1% recall degradation).
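One common way to do INT8 scalar quantization is symmetric scaling, sketched below. This is an assumption for illustration: the sketch uses a single scale per vector, and NeurondB's actual scheme (for example per-dimension ranges) may differ. The function names are hypothetical.

```python
def quantize_int8(vec):
    """Map floats to integers in [-127, 127] using one scale for the vector."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


vec = [0.12, -0.53, 0.98, 0.0, -0.07]
codes, scale = quantize_int8(vec)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(codes)
print(max_error, scale / 2)   # rounding error stays within half a quantization step
```

Each value then needs 1 byte instead of 4, and the reconstruction error per component is bounded by half the step size, which is why recall usually drops only slightly.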
Scalar quantization
-- Create quantized index (INT8)
CREATE INDEX documents_embedding_idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (quantization = 'int8');
-- Memory savings: 4x reduction
-- Example: 1M vectors × 1536 dims × 4 bytes = 6.1 GB
-- With INT8: 1M × 1536 × 1 byte = 1.5 GB
Product Quantization (PQ)
Divides vectors into subvectors and quantizes each subspace independently. Provides 8-16x memory reduction with slightly higher accuracy loss (typically 2-5% recall degradation). Ideal for billion-scale datasets.
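The encoding step can be sketched as follows. The codebooks here are hard-coded toy values for a 4-dimensional vector split into 2 subvectors with 4 entries each; in a real index they would be learned per subspace (typically with k-means), and the function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec, codebooks):
    """Replace each subvector with the index of its nearest codebook entry."""
    m = len(codebooks)
    sub_len = len(vec) // m
    codes = []
    for j, book in enumerate(codebooks):
        sub = vec[j * sub_len:(j + 1) * sub_len]
        codes.append(min(range(len(book)), key=lambda c: sq_dist(sub, book[c])))
    return codes


def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen entries."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out


# Toy codebooks: 2 subvectors, 4 entries per subvector.
codebooks = [
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [-1.0, -1.0]],
]
vec = [0.9, 0.1, 0.4, 0.6]
codes = pq_encode(vec, codebooks)
print(codes)                        # → [2, 1]
print(pq_decode(codes, codebooks))  # → [1.0, 0.0, 0.5, 0.5]
```

Only the small integer codes are stored per vector, while the codebooks are shared across the whole index; the gap between `vec` and the decoded vector is the source of the recall loss.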
Product quantization
-- Create PQ index
CREATE INDEX documents_embedding_idx ON documents
USING ivf (embedding vector_l2_ops)
WITH (
lists = 1000,
quantization = 'pq',
pq_m = 64, -- Number of subvectors
pq_k = 256 -- Codebook size per subvector
);
-- Memory savings: 8-16x reduction
-- Example: 1B vectors × 1536 dims × 4 bytes = 6.1 TB
-- With PQ: ~400-800 GB
Quantization Trade-offs
| Quantization Type | Memory Reduction | Recall (relative to FP32) | Query Speed | Best For |
|---|---|---|---|---|
| None (FP32) | 1x | 100% | Baseline | Small datasets, maximum accuracy |
| INT8 (Scalar) | 4x | 99%+ | Faster | Medium datasets, balanced |
| FP16 (Scalar) | 2x | 99.5%+ | Similar | GPU acceleration |
| PQ | 8-16x | 95-98% | Faster | Billion-scale datasets |
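The memory figures quoted in this section follow from simple arithmetic on raw vector storage, reproduced below. The calculation covers vector data only and ignores index structures; it uses decimal units (1 GB = 10^9 bytes), which is an assumption that matches the rounded numbers above.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value):
    """Storage for the vector values alone, without index overhead."""
    return num_vectors * dims * bytes_per_value


GB = 10 ** 9    # decimal units
TB = 10 ** 12

fp32_1m = raw_vector_bytes(1_000_000, 1536, 4)
int8_1m = raw_vector_bytes(1_000_000, 1536, 1)
fp32_1b = raw_vector_bytes(1_000_000_000, 1536, 4)

print(f"1M FP32: {fp32_1m / GB:.1f} GB")    # → 1M FP32: 6.1 GB
print(f"1M INT8: {int8_1m / GB:.1f} GB")    # → 1M INT8: 1.5 GB
print(f"1B FP32: {fp32_1b / TB:.1f} TB")    # → 1B FP32: 6.1 TB
# → 1B at 8-16x reduction: 384-768 GB
print(f"1B at 8-16x reduction: {fp32_1b / 16 / GB:.0f}-{fp32_1b / 8 / GB:.0f} GB")
```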
Performance Characteristics
Query Latency
- HNSW: 2-5ms for 1M vectors, 4-8ms for 10M vectors
- IVF: 5-10ms for 100M vectors, 10-15ms for 1B vectors
- With GPU: 2-3x faster for batch queries
Index Build Time
- HNSW: ~10-30 minutes for 10M vectors
- IVF: ~5-15 minutes for 10M vectors
- With GPU: 3-5x faster clustering for IVF
Memory Usage
- HNSW: ~1.5-2x vector size (graph structure overhead)
- IVF: ~1.1-1.3x vector size (centroid storage)
- With INT8: 4x reduction
- With PQ: 8-16x reduction
Optimization Tips
- Use HNSW for datasets < 100M vectors for best recall
- Use IVF + PQ for billion-scale datasets
- Enable GPU acceleration for batch operations
- Tune ef_search (HNSW) or probes (IVF) to balance recall and latency
- Monitor index fragmentation and rebuild when needed
Use Cases
Semantic Search
Find documents, products, or content based on meaning rather than exact keywords.
Semantic search example
-- Find similar documents
SELECT id, title, content
FROM documents
WHERE embedding <=> embed_text('machine learning algorithms', 'text-embedding-ada-002') < 0.3
ORDER BY embedding <=> embed_text('machine learning algorithms', 'text-embedding-ada-002')
LIMIT 10;
Recommendation Systems
Find similar items, users, or content for personalized recommendations.
Image Search
Search images by visual similarity or text descriptions using CLIP embeddings.
Anomaly Detection
Identify outliers by finding vectors far from their nearest neighbors.
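One simple way to turn this into a score is the mean distance from each vector to its k nearest neighbors, sketched below with a brute-force scan over made-up 2-D points. The function name and the choice of `k = 2` are illustrative; on real data the neighbor distances would presumably come from index queries rather than a full scan.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by its mean distance to its k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Four points in a tight cluster and one far away (illustrative data).
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
scores = knn_outlier_scores(points, k=2)
outlier = max(range(len(points)), key=lambda i: scores[i])
print(outlier)   # → 4
```

A threshold on the score (for example a high percentile of all scores) then separates outliers from normal points.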
Clustering
Group similar vectors together for data analysis and organization.
Related Documentation
- Indexing Guide - Detailed index configuration and tuning
- Distance Metrics - Complete distance metric reference
- Quantization - Advanced quantization techniques
- GPU Acceleration - Accelerate vector operations with GPU
- Performance Tuning - Optimize vector search performance