NeuronDB Documentation

Hybrid Search

What is Hybrid Search?

Hybrid search combines vector similarity (semantic meaning) with full-text search (keyword matching), so results reflect both the context of a query and its exact terms.

Vector Search Alone

  • ✓ Understands semantic meaning
  • ✓ Finds conceptually similar content
  • ✗ May miss exact keyword matches
  • ✗ Can return loosely related results

Full-Text Search Alone

  • ✓ Precise keyword matching
  • ✓ Fast for exact terms
  • ✗ No semantic understanding
  • ✗ Misses synonyms and context

Hybrid Search = Best of Both

  • Semantic understanding from vector embeddings
  • Precise keyword matching from full-text search
  • Better relevance ranking through combined scoring
  • Handles both conceptual and exact queries

Implementation

Basic Hybrid Query

Hybrid search query

WITH vector_results AS (
  -- Top 20 documents by vector distance to the query embedding
  SELECT id, content,
         embedding <=> embed_text('PostgreSQL replication') AS distance
  FROM documents
  ORDER BY distance
  LIMIT 20
),
text_results AS (
  -- Top 20 documents by full-text rank
  SELECT id, content,
         ts_rank(to_tsvector('english', content),
                 to_tsquery('english', 'PostgreSQL & replication')) AS rank
  FROM documents
  WHERE to_tsvector('english', content) @@ to_tsquery('english', 'PostgreSQL & replication')
  ORDER BY rank DESC
  LIMIT 20
)
-- Keep documents found by either search; a missing score counts as 0
SELECT
  COALESCE(v.id, t.id) AS id,
  COALESCE(v.content, t.content) AS content,
  -- 60% vector similarity (1 - distance) plus 40% text rank
  COALESCE(1.0 - v.distance, 0.0) * 0.6 + COALESCE(t.rank, 0.0) * 0.4 AS score
FROM vector_results v
FULL OUTER JOIN text_results t ON v.id = t.id
ORDER BY score DESC
LIMIT 10;
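The merge step of the query can also be sketched in application code, which makes it easier to see how a document found by only one search is scored. This is a minimal Python sketch, not part of NeuronDB: the function name `merge_hybrid` and the sample ids and scores are hypothetical, and the inputs are assumed to be the distances and ranks already returned by the two searches.

```python
def merge_hybrid(vector_hits, text_hits, w_vec=0.6, w_text=0.4, limit=10):
    """Merge two result sets the way the FULL OUTER JOIN does.

    vector_hits: {doc_id: vector distance}   (assumed input shape)
    text_hits:   {doc_id: full-text rank}    (assumed input shape)
    """
    scores = {}
    # Union of ids plays the role of FULL OUTER JOIN ... ON v.id = t.id
    for doc_id in vector_hits.keys() | text_hits.keys():
        # COALESCE(1.0 - distance, 0.0): no vector hit means similarity 0
        similarity = 1.0 - vector_hits[doc_id] if doc_id in vector_hits else 0.0
        # COALESCE(rank, 0.0): no text hit means rank 0
        rank = text_hits.get(doc_id, 0.0)
        scores[doc_id] = similarity * w_vec + rank * w_text
    # Highest score first; ties broken by id so the order is deterministic
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:limit]


# Hypothetical sample data: document 1 is vector-only, 3 is text-only
vector_hits = {1: 0.2, 2: 0.5}
text_hits = {2: 0.3, 3: 0.1}
merged = merge_hybrid(vector_hits, text_hits)
for doc_id, score in merged:
    print(doc_id, round(score, 3))
```

A document that appears in only one result set keeps its place in the output but receives no contribution from the other search, which is why text-only matches tend to sink when the text rank values are small.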

Scoring Methods

Weighted Sum

Convert the vector distance to a similarity (1 - distance), then add it to the text rank using fixed weights.

Weighted scoring

SELECT id, content,
       (1.0 - vector_distance) * 0.7 + text_rank * 0.3 AS hybrid_score
FROM (
  SELECT id, content,
         embedding <=> embed_text('query') AS vector_distance,
         ts_rank(to_tsvector('english', content), query_fts) AS text_rank
  FROM documents, to_tsquery('english', 'query') AS query_fts
) combined
ORDER BY hybrid_score DESC
LIMIT 10;
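To see how much the weights matter, the same expression can be evaluated by hand for a couple of candidates. The Python sketch below mirrors the SQL formula; the document names and score values are made up for illustration, and `weighted_score` and `rank_candidates` are hypothetical helper names.

```python
def weighted_score(distance, text_rank, w_vec=0.7, w_text=0.3):
    """Same arithmetic as the SQL: (1.0 - distance) * w_vec + text_rank * w_text."""
    return (1.0 - distance) * w_vec + text_rank * w_text


def rank_candidates(candidates, w_vec, w_text):
    """Return document names ordered by weighted score, best first."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(*candidates[name], w_vec, w_text),
        reverse=True,
    )


# Hypothetical candidates: name -> (vector distance, text rank)
candidates = {
    "doc_a": (0.10, 0.02),  # very close in meaning, weak keyword match
    "doc_b": (0.40, 0.60),  # looser in meaning, strong keyword match
}

vector_heavy = rank_candidates(candidates, w_vec=0.7, w_text=0.3)
text_heavy = rank_candidates(candidates, w_vec=0.3, w_text=0.7)
print(vector_heavy, text_heavy)
```

With these sample values the two weightings produce opposite orderings, so it is worth checking candidate weights against real queries rather than assuming one split fits every workload.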

Reciprocal Rank Fusion (RRF)

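Combine the two rankings with Reciprocal Rank Fusion: each document scores 1 / (k + rank) in each list, and the per-list scores are summed. The query below uses k = 60.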

RRF scoring

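WITH vector_ranked AS (
  -- Rank every document by vector distance (1 = closest)
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> embed_text('query')) AS v_rank
  FROM documents
),
text_ranked AS (
  -- Rank keyword matches by ts_rank (1 = best match)
  SELECT id,
         ROW_NUMBER() OVER (
           ORDER BY ts_rank(to_tsvector('english', content),
                            to_tsquery('english', 'query')) DESC
         ) AS t_rank
  FROM documents
  WHERE to_tsvector('english', content) @@ to_tsquery('english', 'query')
)
SELECT COALESCE(v.id, t.id) AS id,
       -- A document missing from one ranking gets 0 from that ranking
       COALESCE(1.0 / (60 + v.v_rank), 0.0) + COALESCE(1.0 / (60 + t.t_rank), 0.0) AS rrf_score
FROM vector_ranked v
FULL OUTER JOIN text_ranked t ON v.id = t.id
ORDER BY rrf_score DESC
LIMIT 10;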
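Because RRF depends only on positions, it is easy to reproduce outside the database, for example when fusing result lists fetched by separate queries. The following Python sketch assumes each ranking is a list of ids ordered best first; the function name `rrf_fuse` and the sample ids are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over any number of ranked id lists.

    Each list contributes 1 / (k + position) for the documents it contains,
    with positions starting at 1. Documents absent from a list get nothing
    from that list.
    """
    scores = {}
    for ranked_ids in rankings:
        for position, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Highest fused score first; ties broken by id for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical rankings: ids ordered best first
vector_order = ["a", "b", "c"]
text_order = ["b", "d"]
fused = rrf_fuse([vector_order, text_order])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

In this sample, "b" ranks first because it appears in both lists, even though it is not the top vector hit. Raw scores never enter the calculation, so no normalization is needed before fusing.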

Best Practices

  • Choose weights to suit your use case (typically 60-70% vector, 30-40% text)
  • Normalize scores from both sources to a common range before combining, since ts_rank values and vector similarities are not on the same scale
  • Consider reranking top-K results with cross-encoders
  • Monitor recall and precision metrics
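Two of these practices are straightforward to sketch in code. The Python below shows min-max scaling as one possible normalization (other schemes exist, and the list above does not prescribe one) and a precision/recall at K calculation for monitoring. All function names and sample values are hypothetical, and the set of relevant documents is assumed to come from your own labeled queries.

```python
def min_max(scores):
    """Rescale {doc_id: score} into the range 0..1.

    If every score is identical there is no spread to scale, so all map to 0.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall of the top-k results against a labeled relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Hypothetical text ranks on a small scale, rescaled before weighting
text_scores = min_max({"a": 0.02, "b": 0.08, "c": 0.05})

# Hypothetical evaluation: search output vs. hand-labeled relevant documents
precision, recall = precision_recall_at_k(
    ranked_ids=["b", "c", "a", "d"], relevant_ids={"a", "b", "x"}, k=3
)
print(text_scores, precision, recall)
```

Min-max scaling is sensitive to outliers in a result set, so it is worth comparing against rank-based fusion on the same labeled queries before settling on one approach.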

Next Steps