
Blend lexical and semantic ranking for precision

Scoring architecture

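Hybrid retrieval runs several rankers, typically a lexical ranker such as BM25 and a semantic ranker based on vector distance, and merges their results into one ordering. Use the default weighted sum, or build custom scoring pipelines with SQL window functions.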

Weighted hybrid scoring

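WITH hybrid AS (
  SELECT
    lex.id,
    lex.rank AS bm25_rank,
    sem.distance AS cosine_distance,
    -- Assumes lex.rank is a higher-is-better score on a scale comparable to (1 - distance)
    lex.rank * 0.4 + (1 - sem.distance) * 0.6 AS combined_score
  FROM
    lex_ranked_documents lex
    -- Inner join: only documents returned by both rankers are scored
    JOIN sem_ranked_documents sem ON sem.id = lex.id
)
SELECT *
FROM   hybrid
ORDER  BY combined_score DESC
LIMIT  10;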
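
The weighted sum above only works well when both inputs are on comparable scales. As one illustration of a custom pipeline built on window functions, the sketch below uses reciprocal rank fusion (RRF), a different merging technique from the weighted sum, which depends only on rank positions. It runs on plain PostgreSQL. The view name rrf_demo, the inline sample rows standing in for lex_ranked_documents and sem_ranked_documents, and the constant 60 are all assumptions for illustration, not NeuronDB defaults.

```sql
-- Sketch: reciprocal rank fusion with window functions (plain PostgreSQL).
-- rrf_demo and the inline rows are made-up stand-ins for the two rankers.
CREATE TEMP VIEW rrf_demo AS
WITH lex(id, bm25_score) AS (
  VALUES (1, 7.2), (2, 5.1), (3, 0.9)
),
sem(id, cosine_distance) AS (
  VALUES (2, 0.10), (3, 0.25), (4, 0.30)
),
lex_pos AS (
  -- Higher BM25 score is better, so number rows in descending order.
  SELECT id, ROW_NUMBER() OVER (ORDER BY bm25_score DESC) AS pos FROM lex
),
sem_pos AS (
  -- Smaller distance is better, so number rows in ascending order.
  SELECT id, ROW_NUMBER() OVER (ORDER BY cosine_distance ASC) AS pos FROM sem
)
SELECT id,
       -- A document missing from one ranker contributes 0 for that ranker.
       COALESCE(1.0 / (60 + lex_pos.pos), 0)
     + COALESCE(1.0 / (60 + sem_pos.pos), 0) AS rrf_score
FROM   lex_pos
FULL   JOIN sem_pos USING (id);

SELECT id, rrf_score
FROM   rrf_demo
ORDER  BY rrf_score DESC
LIMIT  10;
```

Unlike the inner join in the weighted example, the full join keeps documents that only one ranker returned. With these sample rows, documents 2 and 3 (found by both rankers) sort ahead of documents 1 and 4.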

Metadata filtering

Apply row-level filters and tenant boundaries before scoring to keep results relevant. Use JSONB containment or partitioning for customer isolation.

Tenant scoping

WITH query_input AS (
  SELECT embed_text('high availability failover guide') AS q_emb,
         'enterprise'::text AS tenant
)
SELECT id,
       title,
       metadata ->> 'tenant' AS tenant,
       embedding <-> (SELECT q_emb FROM query_input) AS distance
FROM   knowledge_base,
       query_input
WHERE  metadata ->> 'tenant' = query_input.tenant
ORDER  BY distance
LIMIT  15;
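
The query above filters with `metadata ->> 'tenant'`. The JSONB containment alternative mentioned earlier can be sketched as follows, using only standard PostgreSQL features. The table kb_demo, its rows, and the index name are hypothetical stand-ins for knowledge_base; the vector column is left out so the example runs without NeuronDB.

```sql
-- Sketch: tenant filter via JSONB containment (plain PostgreSQL, no NeuronDB calls).
-- kb_demo and its rows are hypothetical stand-ins for knowledge_base.
CREATE TEMP TABLE kb_demo (
  id       integer PRIMARY KEY,
  title    text    NOT NULL,
  metadata jsonb   NOT NULL
);

INSERT INTO kb_demo (id, title, metadata) VALUES
  (1, 'Failover guide',    '{"tenant": "enterprise", "lang": "en"}'),
  (2, 'Backup basics',     '{"tenant": "startup",    "lang": "en"}'),
  (3, 'Replication setup', '{"tenant": "enterprise", "lang": "de"}');

-- A GIN index with jsonb_path_ops supports the @> containment operator.
CREATE INDEX kb_demo_metadata_idx ON kb_demo USING gin (metadata jsonb_path_ops);

-- @> keeps only rows whose metadata contains the given key/value pair.
SELECT id, title
FROM   kb_demo
WHERE  metadata @> '{"tenant": "enterprise"}'::jsonb
ORDER  BY id;
```

The containment form can be served by a single GIN index on the whole metadata column, whereas the `->>` equality form needs an expression index on that specific key to avoid a sequential scan.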

Reranking with cross-encoders

After retrieving the top K candidates, rerank them with an ONNX cross-encoder for better semantic matching. Combine the reranker with canary weights so that the query fails open, returning the first-stage ordering, if the reranker is unavailable.

Rerank candidates

WITH
initial AS (
  SELECT id,
         title,
         embedding <-> embed_text('PostgreSQL failover') AS distance
  FROM   docs
  ORDER  BY distance
  LIMIT  80
),
ranked AS (
  SELECT id,
         neurondb_rerank(
           model_name => 'cross-encoder-nli-base',
           query      => 'PostgreSQL failover',
           document   => title
         ) AS cross_score
  FROM   initial
)
SELECT id, cross_score
FROM   ranked
ORDER  BY cross_score DESC
LIMIT  15;
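
One possible reading of the fail-open behaviour is to blend the cross-encoder score with the first-stage similarity using a canary weight, and to fall back to the first-stage score alone when the reranker returns nothing. The sketch below shows that idea in plain PostgreSQL. The helper blend_score, its parameters, the sample rows, and the assumption that an unavailable reranker surfaces as NULL are all hypothetical; NeuronDB may signal failures differently, for example by raising an error.

```sql
-- Sketch only: blend_score is a hypothetical helper, not a NeuronDB function.
CREATE FUNCTION pg_temp.blend_score(
  first_stage   double precision,  -- e.g. 1 - distance from the vector search
  cross_score   double precision,  -- reranker output, assumed NULL when unavailable
  canary_weight double precision   -- share of the final score given to the reranker
) RETURNS double precision
LANGUAGE sql IMMUTABLE
AS $$
  SELECT CASE
           WHEN cross_score IS NULL THEN first_stage  -- fail open
           ELSE canary_weight * cross_score
                + (1 - canary_weight) * first_stage
         END
$$;

-- Made-up candidates: document 2 has no reranker score.
SELECT id,
       pg_temp.blend_score(first_stage, cross_score, 0.25) AS final_score
FROM   (VALUES (1, 0.80, 0.20),
               (2, 0.70, NULL),
               (3, 0.60, 0.90)) AS c(id, first_stage, cross_score)
ORDER  BY final_score DESC;
```

Starting with a small canary weight limits how far a new reranker can move results, and setting it to 0 reproduces the first-stage ordering exactly.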

Hybrid Search Function

NeuronDB provides a hybrid_search function that combines vector similarity and full-text search in a single call:

Hybrid search function

-- Hybrid search combining vector and text search
SELECT 
    search_result.id,
    hybrid_search_test.title,
    hybrid_search_test.content,
    search_result.score
FROM hybrid_search_test,
    LATERAL hybrid_search(
        'hybrid_search_test',                    -- table name
        embed_text('database systems', 'all-MiniLM-L6-v2'),  -- query vector
        'database systems',                       -- query text for FTS
        '{}'::text,                              -- additional config
        0.7,                                     -- alpha: vector weight (0-1)
        5                                        -- top K results
    ) AS search_result(id, score)
WHERE hybrid_search_test.id = search_result.id
ORDER BY search_result.score DESC
LIMIT 5;

Function Signature:

hybrid_search(
    table_name   TEXT,     -- Source table name
    query_vector VECTOR,   -- Query embedding vector
    query_text   TEXT,     -- Query text for full-text search
    config       TEXT,     -- Additional configuration (JSON string)
    alpha        REAL,     -- Vector weight (0.0-1.0), 1-alpha = text weight
    top_k        INTEGER   -- Number of results to return
) RETURNS TABLE (
    id    INTEGER,         -- Row ID from source table
    score REAL             -- Combined relevance score
)

Next Steps

Multi-Vector Search - Multiple embeddings per document
Faceted Search - Category-aware filtering
Temporal Search - Time-decay relevance
RAG Playbooks - Complete RAG workflows
Distance Metrics - Tune distance functions
Reranking Guide - Cross-encoder reranking
