NeurondB Documentation

Distance Metrics

L2 Distance (<->)

The straight-line (Euclidean) distance between two vectors. Offers a good balance of accuracy and performance for normalized embeddings and general semantic search workloads.

Use Cases: Semantic document search, Image similarity, Recommendations

L2 distance query

-- $1 is the query vector, bound from the application.
SELECT id, title, embedding <-> $1 AS distance
FROM   neurondb_vectors
ORDER  BY distance
LIMIT  10;

Tuning Tips

  • Normalize embeddings during ingestion to keep magnitudes comparable
  • Consider IVF+PQ indexing for billion-scale collections
  • Set neurondb.metric_preference = l2 to influence planner choices
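The first tip above can be done in plain SQL before the vector is stored; a minimal sketch, assuming raw embeddings arrive as float4[] in a staging table (staging and raw_embedding are illustrative names, not part of NeurondB):

-- Divide each component by the vector's L2 norm so magnitudes are comparable.
-- Cast the resulting array to your vector column's type if required.
INSERT INTO neurondb_vectors (id, title, embedding)
SELECT id,
       title,
       ARRAY(SELECT x / sqrt((SELECT sum(y * y) FROM unnest(raw_embedding) AS y))
             FROM   unnest(raw_embedding) WITH ORDINALITY AS t(x, ord)
             ORDER  BY ord)
FROM   staging;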

Inner Product (<#>)

Negative inner product, so that an ascending ORDER BY surfaces the largest dot products first. Ideal when embeddings are already length-normalized or when directional similarity should dominate ranking.

Use Cases: Recommendation ranking, Two-tower retrieval models, Vector reranking pipelines

Inner product query

SELECT id, product_name, embedding <#> embed_text('wireless earbuds') AS score
FROM   products
ORDER  BY score
LIMIT  20;

Tuning Tips

  • Normalize embeddings with embed_text(..., normalize => true)
  • Track score variance with pg_stat_insights histograms
  • Set neurondb.inner_product_bias to adjust for magnitude break-even points
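The normalization tip can also be applied at write time, reusing the embed_text(..., normalize => true) form from the tips above; a sketch against the products table from the query example (description is an assumed source column):

-- Store length-normalized embeddings so <#> ranks purely by direction.
UPDATE products
SET    embedding = embed_text(description, normalize => true);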

Cosine Distance (<=>)

Angular distance between vectors, computed as 1 - cosine similarity. Works well for text embeddings and hybrid keyword/semantic ranking.

Use Cases: LLM retrieval augmented generation, Support ticket similarity, Knowledge base search

Cosine distance query

SELECT doc_id, summary, embedding <=> embed_text('llm retrieval best practices') AS distance
FROM   kb_articles
ORDER  BY distance
LIMIT  15;

Tuning Tips

  • Combine with CLASSIFIER reranker functions for hybrid scoring
  • Monitor neurondb.cosine_precision to adjust GPU/CPU execution balance
  • Use neurondb.hybrid_weight to blend cosine and lexical scores
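When you want the blend to be explicit rather than controlled by neurondb.hybrid_weight, hybrid scoring can also be written by hand with standard PostgreSQL full-text ranking; a sketch in which the 0.7/0.3 weights and the tsv column are illustrative:

-- Blend cosine similarity with a lexical ts_rank score.
-- <=> returns a distance, so convert it to similarity via (1 - distance).
SELECT doc_id,
       summary,
       0.7 * (1 - (embedding <=> embed_text('llm retrieval best practices')))
     + 0.3 * ts_rank(tsv, plainto_tsquery('english', 'llm retrieval best practices'))
         AS hybrid_score
FROM   kb_articles
ORDER  BY hybrid_score DESC
LIMIT  15;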

L1 / Manhattan Distance (<+>)

The sum of absolute per-dimension differences. Useful for sparse or quantized embeddings, where the squared terms in L2 can exaggerate outliers.

Use Cases: Anomaly detection, Time-series embeddings, Quantized representations

Manhattan distance query

SELECT sensor_id,
       embedding <+> embed_series($1::float4[]) AS divergence
FROM   telemetry_vectors
WHERE  measurement_window = $2
ORDER  BY divergence DESC
LIMIT  5;

Tuning Tips

  • Pair with sparsevec or PQ compressed vectors for memory efficiency
  • Set neurondb.l1_gpu_threshold to control GPU offload
  • Track divergence trends in neurondb_metric_samples view
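For intuition, the <+> result can be reproduced in plain SQL by summing absolute per-dimension differences over two array parameters:

-- Manhattan distance by hand: sum of |a_i - b_i| across aligned dimensions.
SELECT sum(abs(a.x - b.x)) AS l1_distance
FROM   unnest($1::float4[]) WITH ORDINALITY AS a(x, i)
JOIN   unnest($2::float4[]) WITH ORDINALITY AS b(x, i) USING (i);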

Hamming Distance (<%>)

Counts differing bits between binary vectors. Designed for binary embeddings, perceptual hashes, and fingerprinting workloads.

Use Cases: Perceptual image dedupe, Audio/video fingerprinting, Security anomaly detection

Hamming distance query

SELECT asset_id,
       fingerprint <% embed_binary($1) AS distance
FROM   media_fingerprints
ORDER  BY distance
LIMIT  12;

Tuning Tips

  • Store fingerprints using neurondb.bit type to minimize storage
  • Configure neurondb.hamming_bit_packing to align CPU vectorization
  • Use neurondb_distance_profile() to audit distribution per collection
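For bit-string data, the same count can be sanity-checked with standard PostgreSQL operators (bit_count is available from PostgreSQL 14):

-- Hamming distance by hand: XOR the bit strings, then count the set bits.
SELECT bit_count(B'10110010' # B'10011010') AS hamming_distance;  -- 2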

Next Steps