NeurondB Documentation
Boost relevance with cross-encoder reranking
End-to-end pipeline
Retrieve top-K vectors, rerank them with cross-encoders, then evaluate with business-specific scoring. This pattern keeps latency predictable while improving quality.
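Conceptually, the pipeline reduces to three steps: retrieve candidates by vector distance, score them with a cross-encoder, and blend the two signals. A minimal Python sketch of that flow, where `retrieve` and `cross_encode` are hypothetical stand-ins for the vector search and the cross-encoder call (they are not NeurondB APIs):

```python
# Illustrative three-stage pipeline: retrieve -> rerank -> blend.
# retrieve() and cross_encode() are stand-ins with canned data; in
# NeurondB the equivalent work happens in-database (see the SQL below).

def retrieve(query, k=80):
    # Stand-in: (doc_id, vector_distance) pairs sorted by distance.
    return [("doc-1", 0.12), ("doc-2", 0.30), ("doc-3", 0.55)][:k]

def cross_encode(query, doc_id):
    # Stand-in: cross-encoder relevance score in [0, 1].
    return {"doc-1": 0.40, "doc-2": 0.90, "doc-3": 0.70}[doc_id]

def rerank(query, k_initial=80, k_final=15, w_cross=0.7, w_vec=0.3):
    candidates = retrieve(query, k_initial)
    scored = [
        (doc_id, w_cross * cross_encode(query, doc_id) + w_vec * (1 - dist))
        for doc_id, dist in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k_final]

print(rerank("PostgreSQL failover"))
```

Note how the blend can reorder the vector ranking: a document that was closest by embedding distance can lose the top spot to one the cross-encoder scores higher.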
Three-stage rerank
WITH initial AS (
    SELECT id,
           content,
           embedding <-> embed_text('PostgreSQL failover') AS distance
    FROM docs
    ORDER BY distance
    LIMIT 80
),
ranked AS (
    SELECT id,
           neurondb_rerank(
               model_name => 'cross-encoder-nli-base',
               query      => 'PostgreSQL failover',
               document   => content
           ) AS cross_score
    FROM initial
),
scored AS (
    SELECT id,
           cross_score,
           distance,
           0.7 * cross_score + 0.3 * (1 - distance) AS final_score
    FROM ranked
    JOIN initial USING (id)
)
SELECT id, final_score
FROM scored
ORDER BY final_score DESC
LIMIT 15;
Batch efficiently
Batch reranking requests to maintain throughput. The NeurondB inference scheduler groups payloads and leverages GPU execution when available.
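The grouping behaviour can be pictured with a short Python sketch. The chunking logic here is illustrative only, not the scheduler's actual implementation; `score_batch` is a hypothetical stand-in for a batched model call:

```python
# Illustrative micro-batching: group candidates into fixed-size batches
# so the model scores 32 payloads per invocation instead of one at a time.

CALLS = []  # records the size of each model invocation, for illustration

def score_batch(query, docs):
    # Stand-in: one model call that scores an entire batch at once.
    CALLS.append(len(docs))
    return [0.5 for _ in docs]

def rerank_in_batches(query, docs, batch_size=32):
    scores = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        scores.extend(score_batch(query, batch))
    return scores
```

With 80 candidates and a batch size of 32, this makes 3 model calls (32 + 32 + 16) rather than 80, which is what keeps per-query latency bounded.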
Tune batching
-- Limit max rerank latency to 40ms
SET neurondb.session_inference_max_latency = '40ms';
-- Process 32 candidates per batch
SET neurondb.session_rerank_batch_size = 32;
Evaluate & guardrail
Monitor reranking quality and set up fallbacks for production reliability.
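A common fallback is to degrade to the raw vector score when the cross-encoder errors out or exceeds its latency budget, so results still rank instead of failing. A minimal sketch of that guardrail; the function names are illustrative, not NeurondB APIs:

```python
# Illustrative guardrail: prefer the cross-encoder score, but fall back
# to the vector-similarity score if the model call fails or times out.

def safe_score(cross_encode, query, doc, vector_score):
    try:
        return cross_encode(query, doc)
    except Exception:
        # Model unavailable or over its latency budget: degrade gracefully.
        return vector_score

def flaky_model(query, doc):
    raise TimeoutError("rerank exceeded latency budget")

print(safe_score(flaky_model, "failover", "doc text", 0.42))  # falls back to 0.42
```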
Quality metrics
SELECT
    AVG(cross_score) AS avg_relevance,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cross_score) AS median_score,
    COUNT(*) FILTER (WHERE cross_score > 0.7) AS high_quality_count
FROM reranked_results;
Reranking Methods
- Cross-Encoder - Neural reranking models
- LLM Reranking - GPT/Claude-powered scoring
- ColBERT - Late interaction models
- Ensemble - Combine multiple strategies
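An ensemble can be as simple as reciprocal rank fusion (RRF) over the individual rankers' orderings. A minimal sketch using the conventional k=60 constant, with made-up rankings standing in for the vector, cross-encoder, and ColBERT stages:

```python
# Illustrative ensemble via reciprocal rank fusion: each ranking
# contributes 1 / (k + rank) to a document's fused score.

def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_ranking  = ["a", "b", "c"]
cross_ranking   = ["b", "a", "c"]
colbert_ranking = ["b", "c", "a"]
print(rrf([vector_ranking, cross_ranking, colbert_ranking]))  # → ['b', 'a', 'c']
```

RRF only needs rank positions, not raw scores, so it sidesteps the problem of calibrating score scales across heterogeneous rankers.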
Next Steps
- Inference Runtime - ONNX model deployment
- Hybrid Retrieval - Combine with hybrid search
- RAG Pipelines - Complete RAG workflows