Distance Metrics
L2 Distance (<->)
Euclidean (straight-line) distance between two vectors. Offers a balance of accuracy and performance for normalized embeddings and general semantic search workloads.
Use Cases: Semantic document search, Image similarity, Recommendations
L2 distance query
SELECT id, title, embedding <-> query_embedding AS distance
FROM neurondb_vectors
ORDER BY distance
LIMIT 10;
Tuning Tips
- Normalize embeddings during ingestion to keep magnitudes comparable
- Consider IVF+PQ indexing for billion-scale collections
- Set neurondb.metric_preference = l2 to influence planner choices
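The math behind `<->` is plain Euclidean distance. A minimal sketch in pure Python (not NeuronDB code) of what the operator computes, together with the unit-length normalization the tips above recommend:

```python
import math

def l2_distance(a, b):
    # Euclidean distance: sqrt(sum((a_i - b_i)^2))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    # Scale a vector to unit length so magnitudes are comparable
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = normalize([3.0, 4.0])   # -> [0.6, 0.8]
b = normalize([4.0, 3.0])   # -> [0.8, 0.6]
print(l2_distance(a, b))    # ~0.2828
```

Once embeddings are normalized at ingestion time, L2 ordering agrees with cosine ordering, which is why the two metrics are often interchangeable for semantic search.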
Inner Product (<#>)
Returns the negative inner product, so an ascending `ORDER BY` is equivalent to maximizing the dot product. Ideal when embeddings are already length-normalized or when directional similarity should dominate.
Use Cases: Recommendation ranking, Two-tower retrieval models, Vector reranking pipelines
Inner product query
SELECT id, product_name, embedding <#> embed_text('wireless earbuds') AS score
FROM products
ORDER BY score
LIMIT 20;
Tuning Tips
- Normalize embeddings with embed_text(..., normalize => true)
- Track score variance with pg_stat_insights histograms
- Set neurondb.inner_product_bias to adjust for magnitude break-even points
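The sign flip is the only subtlety here: the score is the dot product negated, so sorting ascending surfaces the best matches first. A pure-Python sketch (illustrative only, not the database's implementation):

```python
def neg_inner_product(a, b):
    # Negative dot product: smaller score = more similar,
    # so an ascending sort returns the best matches first
    return -sum(x * y for x, y in zip(a, b))

query = [1.0, 0.0]
docs = {"doc1": [0.9, 0.1], "doc2": [0.1, 0.9]}
ranked = sorted(docs, key=lambda d: neg_inner_product(query, docs[d]))
print(ranked[0])  # doc1 points in nearly the same direction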
Cosine Distance (<=>)
Angular distance between vectors, computed as 1 - cosine similarity. Works well for text embeddings and hybrid keyword/semantic ranking.
Use Cases: LLM retrieval augmented generation, Support ticket similarity, Knowledge base search
Cosine distance query
SELECT doc_id, summary, embedding <=> embed_text('llm retrieval best practices') AS distance
FROM kb_articles
ORDER BY distance
LIMIT 15;
Tuning Tips
- Combine with CLASSIFIER reranker functions for hybrid scoring
- Monitor neurondb.cosine_precision to adjust GPU/CPU execution balance
- Use neurondb.hybrid_weight to blend cosine and lexical scores
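Because cosine distance divides out vector magnitudes, it ranges from 0 (same direction) to 2 (opposite direction) regardless of embedding scale. A pure-Python sketch of the formula the `<=>` operator evaluates:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

print(cosine_distance([1.0, 0.0], [5.0, 0.0]))  # 0.0: scale is ignored
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0: orthogonal
```

This scale invariance is why cosine is the default choice for text embeddings whose norms vary with input length.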
L1 / Manhattan Distance (<+>)
Sum of absolute differences per dimension. Useful for sparse or quantized embeddings, where L2 can exaggerate outliers.
Use Cases: Anomaly detection, Time-series embeddings, Quantized representations
Manhattan distance query
SELECT sensor_id,
embedding <+> embed_series($1::float4[]) AS divergence
FROM telemetry_vectors
WHERE measurement_window = $2
ORDER BY divergence DESC
LIMIT 5;
Tuning Tips
- Pair with sparsevec or PQ compressed vectors for memory efficiency
- Set neurondb.l1_gpu_threshold to control GPU offload
- Track divergence trends in neurondb_metric_samples view
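The claim that L2 exaggerates outliers while L1 does not is easy to see numerically. A pure-Python sketch comparing the two on the same total per-dimension difference:

```python
import math

def l1_distance(a, b):
    # Manhattan distance: sum of per-dimension absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

zero = [0.0] * 10
spread = [1.0] * 10            # difference spread evenly over 10 dimensions
spike = [10.0] + [0.0] * 9     # same total difference in a single dimension

# L1 scores both vectors identically; L2 penalizes the lone spike ~3x harder
print(l1_distance(zero, spread), l1_distance(zero, spike))  # 10.0 10.0
print(l2_distance(zero, spread), l2_distance(zero, spike))  # ~3.162 10.0
```

Squaring each difference makes a single large deviation dominate L2, whereas L1 weights every dimension linearly, which is why L1 behaves better on sparse or quantized embeddings with occasional large per-dimension errors.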
Hamming Distance (<%>)
Counts differing bits between binary vectors. Designed for binary embeddings, perceptual hashes, and fingerprinting workloads.
Use Cases: Perceptual image dedupe, Audio/video fingerprinting, Security anomaly detection
Hamming distance query
SELECT asset_id,
fingerprint <%> embed_binary($1) AS distance
FROM media_fingerprints
ORDER BY distance
LIMIT 12;
Tuning Tips
- Store fingerprints using neurondb.bit type to minimize storage
- Configure neurondb.hamming_bit_packing to align CPU vectorization
- Use neurondb_distance_profile() to audit distribution per collection
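Hamming distance is just a population count of the XOR of two bit strings. A pure-Python sketch (fingerprints modeled here as integers over a fixed bit width, purely for illustration):

```python
def hamming_distance(a, b):
    # XOR leaves a 1 exactly where the two bit vectors differ;
    # counting those 1s gives the Hamming distance
    return bin(a ^ b).count("1")

fp1 = 0b10110100
fp2 = 0b10010110
print(hamming_distance(fp1, fp2))  # 2 differing bits
```

Because the computation is a single XOR plus a popcount per machine word, Hamming search over packed binary fingerprints is dramatically cheaper than any floating-point metric, which is what makes it suitable for large-scale dedupe and fingerprinting.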
Next Steps
- Indexing Guide - Create indexes for distance metrics
- Quantization - Compress vectors
- Performance - Optimize distance calculations