Hybrid Search
What is Hybrid Search?
Hybrid search combines vector similarity (semantic meaning) with full-text search (keyword matching), so that results reflect both the context of a query and its exact terms.
Vector Search Alone
- ✓ Understands semantic meaning
- ✓ Finds conceptually similar content
- ✗ May miss exact keyword matches
- ✗ Can return loosely related results
Full-Text Search Alone
- ✓ Precise keyword matching
- ✓ Fast for exact terms
- ✗ No semantic understanding
- ✗ Misses synonyms and context
Hybrid Search = Best of Both
- Semantic understanding from vector embeddings
- Precise keyword matching from full-text search
- Superior relevance through combined scoring
- Handles both conceptual and exact queries
Implementation
Basic Hybrid Query
Hybrid search query
-- Top 20 candidates by vector distance (smaller is closer)
WITH vector_results AS (
    SELECT id, content,
           embedding <=> embed_text('PostgreSQL replication') AS distance
    FROM documents
    ORDER BY distance
    LIMIT 20
),
-- Top 20 candidates by full-text rank
text_results AS (
    SELECT id, content,
           ts_rank(to_tsvector('english', content),
                   to_tsquery('english', 'PostgreSQL & replication')) AS rank
    FROM documents
    WHERE to_tsvector('english', content) @@ to_tsquery('english', 'PostgreSQL & replication')
    ORDER BY rank DESC
    LIMIT 20
)
-- Merge both candidate sets; a document missing from one side contributes 0 for that side
SELECT
    COALESCE(v.id, t.id) AS id,
    COALESCE(v.content, t.content) AS content,
    COALESCE(1.0 - v.distance, 0.0) * 0.6 + COALESCE(t.rank, 0.0) * 0.4 AS score
FROM vector_results v
FULL OUTER JOIN text_results t ON v.id = t.id
ORDER BY score DESC
LIMIT 10;
Scoring Methods
Weighted Sum
Combine vector and text scores with weights.
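The arithmetic behind a weighted sum can be sketched outside the database. The Python below is a minimal illustration, not NeuronDB code: the function name `weighted_sum`, the document ids, and the sample scores are hypothetical, and it assumes distances lie in [0, 1] so that 1 - distance behaves as a similarity.

```python
def weighted_sum(distances, text_ranks, w_vector=0.7, w_text=0.3):
    """Fuse per-document vector distances and text ranks into one score.

    distances:  {doc_id: vector distance}, assumed to lie in [0, 1]
    text_ranks: {doc_id: full-text rank}, assumed to be on a comparable scale
    A document missing from one input contributes 0 for that side.
    """
    scores = {}
    for doc_id in set(distances) | set(text_ranks):
        similarity = 1.0 - distances[doc_id] if doc_id in distances else 0.0
        rank = text_ranks.get(doc_id, 0.0)
        scores[doc_id] = w_vector * similarity + w_text * rank
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data
distances = {"a": 0.10, "b": 0.40, "c": 0.25}
text_ranks = {"b": 0.90, "c": 0.20, "d": 0.60}
ranked = weighted_sum(distances, text_ranks)
```

In this sample, document "b" ranks first even though "a" is the closest vector match, because "b" also has a strong keyword score. Shifting weight toward `w_vector` would favor "a" instead.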
Weighted scoring
SELECT id, content,
       -- 70% vector similarity (1 - distance) plus 30% full-text rank
       (1.0 - vector_distance) * 0.7 + text_rank * 0.3 AS hybrid_score
FROM (
    SELECT id, content,
           embedding <=> embed_text('query') AS vector_distance,
           ts_rank(to_tsvector('english', content), query_fts) AS text_rank
    FROM documents, to_tsquery('english', 'query') AS query_fts
) combined
ORDER BY hybrid_score DESC
LIMIT 10;
Reciprocal Rank Fusion (RRF)
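Combine the two rankings with the RRF algorithm, which scores each document by summing 1 / (k + rank) over the rankings it appears in (k = 60 in the query below).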
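To make the RRF calculation concrete, here is a small Python sketch that works on plain lists of document ids. The function name `rrf` and the sample rankings are hypothetical; the constant 60 is taken from this section's SQL query.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of document ids.

    rankings: list of lists, each ordered best-first
    k:        smoothing constant that dampens the influence of top ranks
    """
    scores = {}
    for ranking in rankings:
        # Positions are 1-based, like ROW_NUMBER() in SQL
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample rankings
vector_order = ["a", "b", "c"]
text_order = ["b", "d", "a"]
fused = rrf([vector_order, text_order])
```

Unlike a query that inner-joins the two rankings, this sketch keeps documents that appear in only one list; they simply receive a single reciprocal term. Which behavior is preferable depends on whether keyword-only or vector-only hits should be allowed into the final results.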
RRF scoring
-- Rank every document by vector distance (1 = closest)
WITH vector_ranked AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> embed_text('query')) AS v_rank
    FROM documents
),
-- Rank keyword matches by full-text rank (1 = best match)
text_ranked AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(...) DESC) AS t_rank
    FROM documents
    WHERE to_tsvector('english', content) @@ to_tsquery('english', 'query')
)
-- 60 is the RRF constant k; the inner join keeps only documents present in both rankings
SELECT v.id,
       1.0 / (60 + v.v_rank) + 1.0 / (60 + t.t_rank) AS rrf_score
FROM vector_ranked v
JOIN text_ranked t ON v.id = t.id
ORDER BY rrf_score DESC
LIMIT 10;
Best Practices
- Use appropriate weights based on your use case (typically 60-70% vector, 30-40% text)
- Normalize scores from both sources before combining
- Consider reranking top-K results with cross-encoders
- Monitor recall and precision metrics
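One possible way to follow the normalization advice above is min-max scaling, sketched here in Python. This is an illustration under stated assumptions rather than a NeuronDB feature: the function name `min_max`, the sample scores, and the 0.6 / 0.4 weights are hypothetical, and other schemes (such as z-scores) may suit a given dataset better.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores equal: treat every document as equally relevant
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


# Hypothetical raw scores on very different scales
similarities = {"a": 0.92, "b": 0.80, "c": 0.86}
text_ranks = {"a": 0.02, "b": 0.10, "c": 0.06}

norm_sim = min_max(similarities)
norm_text = min_max(text_ranks)

# Weighted sum over the normalized scores
hybrid = {d: 0.6 * norm_sim[d] + 0.4 * norm_text[d] for d in similarities}
```

Without normalization, the text ranks in this sample (all at or below 0.10) would barely affect the weighted sum. After scaling, both signals span the same range, so the weights express the intended balance.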
Next Steps
- Hybrid Overview - Detailed hybrid retrieval guide
- Reranking - Improve relevance with reranking
- RAG Pipelines - Build RAG applications