NeurondB Documentation
RAG Pipeline
What is RAG?
Retrieval-Augmented Generation (RAG) enhances LLM responses by retrieving relevant context from your database before generating answers. This grounds LLM outputs in your actual data, reducing hallucinations and improving accuracy.
RAG Workflow
- User Question: "What is PostgreSQL replication?"
- Retrieve: Find relevant documents using hybrid search
- Rerank: Score and sort results by relevance
- Generate: LLM creates answer using retrieved context
- Response: Return answer with source citations
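The five steps above can be sketched end-to-end in application code. Everything below is a stub — toy word-overlap retrieval, truncation in place of a cross-encoder, prompt echoing in place of an LLM call — and none of these helpers are NeurondB APIs; the sketch only shows the control flow that the SQL later on this page implements in-database.

```python
def retrieve(question, k=10):
    # Stand-in for the retrieval SQL query: returns top-k candidate rows.
    corpus = [
        "PostgreSQL replication copies data from a primary to replicas.",
        "PostgreSQL is an open-source relational database.",
        "Vacuuming reclaims storage occupied by dead tuples.",
    ]
    # Toy relevance: count of shared lowercase words with the question.
    q = set(question.lower().replace("?", "").split())
    scored = [(len(q & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def rerank(question, docs, top_n=3):
    # Stand-in for cross-encoder reranking; here it just truncates.
    return docs[:top_n]

def generate(question, context):
    # Stand-in for the LLM call; echoes the grounded prompt shape.
    prompt = ("Answer using only this context:\n"
              + "\n".join(context)
              + "\nQuestion: " + question)
    return {"answer": prompt, "sources": context}

def rag(question):
    docs = retrieve(question)          # 1-2: question in, candidates out
    docs = rerank(question, docs)      # 3: score and sort by relevance
    return generate(question, docs)    # 4-5: answer plus source citations

result = rag("What is PostgreSQL replication?")
```

Keeping the sources alongside the answer, as the last step does, is what makes citation of retrieved documents possible in the response.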
Implementation
1. Document Ingestion
Ingest documents
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536),
    metadata JSONB
);

-- Generate embeddings during insert
INSERT INTO documents (content, embedding, metadata)
SELECT
    content,
    neurondb_embed(content, 'text-embedding-ada-002'),
    jsonb_build_object('source', 'docs', 'timestamp', now())
FROM source_documents;
2. Retrieval
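The `<=>` operator used in the query below is pgvector's cosine-distance operator (1 minus cosine similarity), so smaller values mean more similar vectors. A pure-Python sketch of the quantity it computes:

```python
import math

def cosine_distance(a, b):
    # What pgvector's <=> operator computes: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Same direction -> distance 0; orthogonal -> distance 1.
d_same = cosine_distance([1.0, 0.0], [2.0, 0.0])
d_orth = cosine_distance([1.0, 0.0], [0.0, 5.0])
```

Because magnitude cancels out, cosine distance depends only on direction, which is why it works well for comparing embeddings of different-length texts.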
Retrieve relevant documents
-- Vector similarity search
SELECT id, content, metadata,
       embedding <=> neurondb_embed('What is PostgreSQL?', 'text-embedding-ada-002') AS distance
FROM documents
ORDER BY distance
LIMIT 10;
3. Generation
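Conceptually, the generation step is just prompt assembly: concatenate the retrieved rows and append the question. A minimal sketch of that assembly (the `build_prompt` helper is illustrative, not a NeurondB function), mirroring the `string_agg` expression in the SQL:

```python
def build_prompt(question, contexts):
    # Mirrors the SQL: instruction || string_agg(content, '\n') || question.
    return (
        "Answer the question using only the provided context: "
        + "\n".join(contexts)
        + "\nQuestion: " + question
    )

prompt = build_prompt(
    "What is PostgreSQL?",
    ["PostgreSQL is a relational database.", "It supports SQL and JSONB."],
)
```

The "using only the provided context" instruction is what grounds the model: it is asked to answer from the retrieved rows rather than from its parametric memory.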
Generate answer with context
-- Combine retrieved context with LLM
WITH retrieved AS (
    SELECT content
    FROM documents
    ORDER BY embedding <=> neurondb_embed('What is PostgreSQL?', 'text-embedding-ada-002')
    LIMIT 5
)
SELECT neurondb_llm_generate(
    'gpt-4',
    'Answer the question using only the provided context: ' ||
    string_agg(content, E'\n') ||
    E'\nQuestion: What is PostgreSQL?'
) AS answer
FROM retrieved;
Hybrid Search
Combine vector search with full-text search for better results.
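The weighted blend used in the query below (0.7 on vector distance, 0.3 on inverted full-text rank, lower score = better) can be sketched in plain Python. The weights and the `/10` rank normalization come from the query itself, not from NeurondB — tune them for your corpus:

```python
def hybrid_score(vector_distance, ts_rank, w_vec=0.7, w_fts=0.3):
    # Lower is better on both sides: cosine distance is already a
    # distance, while ts_rank (higher = more relevant) is inverted and
    # roughly normalized by /10, matching the SQL expression.
    return vector_distance * w_vec + (1.0 - ts_rank / 10) * w_fts

# A doc that is close in vector space and text-relevant beats one that is not.
good = hybrid_score(vector_distance=0.10, ts_rank=5.0)
bad = hybrid_score(vector_distance=0.80, ts_rank=0.5)
```

Blending the two signals lets exact keyword matches (which pure vector search can miss) and semantic matches (which pure full-text search can miss) both contribute to the ranking.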
Hybrid search
-- Combine vector and full-text search
SELECT id, content,
       (embedding <=> query_vec) * 0.7 +
       (1.0 - ts_rank(to_tsvector('english', content), query_fts) / 10) * 0.3 AS score
FROM documents,
     (SELECT neurondb_embed('PostgreSQL replication', 'text-embedding-ada-002') AS query_vec,
             to_tsquery('english', 'PostgreSQL & replication') AS query_fts) q
WHERE to_tsvector('english', content) @@ query_fts
ORDER BY score
LIMIT 10;
Reranking
Use cross-encoder models to rerank initial results for better relevance.
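The two-stage shape — a cheap vector search for ~20 candidates, then an expensive reranker choosing the final 5 — looks like this in outline. `overlap_score` is a toy stand-in for a real cross-encoder, which jointly encodes query and document:

```python
def rerank_top_k(candidates, score_fn, query, keep=5):
    # Second stage: re-score the cheap first-stage candidates with the
    # expensive model and keep only the best few (higher score = better).
    scored = sorted(candidates, key=lambda doc: score_fn(query, doc),
                    reverse=True)
    return scored[:keep]

def overlap_score(query, doc):
    # Toy scorer: shared lowercase words between query and document.
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = [  # pretend these are the top candidates from vector search
    "PostgreSQL streaming replication ships WAL to replicas.",
    "Indexes speed up lookups.",
    "Logical replication replicates row changes.",
]
top = rerank_top_k(docs, overlap_score, "What is PostgreSQL replication",
                   keep=2)
```

Running the cross-encoder on only 20 candidates keeps its per-pair cost bounded while still improving the ordering of what the LLM actually sees.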
Reranking
-- Rerank with cross-encoder
SELECT id, content,
       neurondb_rerank(
           'cross-encoder/ms-marco-MiniLM-L-6-v2',
           content,
           'What is PostgreSQL replication?'
       ) AS relevance_score
FROM (
    SELECT id, content
    FROM documents
    ORDER BY embedding <=> neurondb_embed('What is PostgreSQL replication?', 'text-embedding-ada-002')
    LIMIT 20
) candidates
ORDER BY relevance_score DESC
LIMIT 5;
RAG Components
- LLM Integration - Hugging Face and OpenAI integration
- Document Processing - Text processing and NLP
Next Steps
- Hybrid Search - Advanced hybrid search techniques
- Reranking - Cross-encoder reranking guide
- ML Inference - ONNX model deployment