NeurondB Documentation
RAG Pipeline
What is RAG?
Retrieval-Augmented Generation (RAG) enhances LLM responses by retrieving relevant context from your database before generating answers. This grounds LLM outputs in your actual data, reducing hallucinations and improving accuracy.
RAG Workflow
- User Question: "What is PostgreSQL replication?"
- Retrieve: Find relevant documents using hybrid search
- Rerank: Score and sort results by relevance
- Generate: LLM creates answer using retrieved context
- Response: Return answer with source citations
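The five steps above can be sketched end-to-end in application code. Everything below is a stub — toy word-overlap retrieval, truncation in place of a cross-encoder, prompt echoing in place of an LLM call — and none of these helpers are NeurondB APIs; the sketch only shows the control flow that the SQL later on this page implements in-database.

```python
def retrieve(question, k=10):
    # Stand-in for the retrieval SQL query: returns top-k candidate rows.
    corpus = [
        "PostgreSQL replication copies data from a primary to replicas.",
        "PostgreSQL is an open-source relational database.",
        "Vacuuming reclaims storage occupied by dead tuples.",
    ]
    # Toy relevance: count of shared lowercase words with the question.
    q = set(question.lower().replace("?", "").split())
    scored = [(len(q & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def rerank(question, docs, top_n=3):
    # Stand-in for cross-encoder reranking; here it just truncates.
    return docs[:top_n]

def generate(question, context):
    # Stand-in for the LLM call; echoes the grounded prompt shape.
    prompt = ("Answer using only this context:\n"
              + "\n".join(context)
              + "\nQuestion: " + question)
    return {"answer": prompt, "sources": context}

def rag(question):
    docs = retrieve(question)          # 1-2: question in, candidates out
    docs = rerank(question, docs)      # 3: score and sort by relevance
    return generate(question, docs)    # 4-5: answer plus source citations

result = rag("What is PostgreSQL replication?")
```

Keeping the sources alongside the answer, as the last step does, is what makes citation of retrieved documents possible in the response.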
Implementation
1. Document Ingestion
Ingest documents
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536),
    metadata JSONB
);

-- Generate embeddings during insert
INSERT INTO documents (content, embedding, metadata)
SELECT
    content,
    neurondb_embed(content, 'text-embedding-ada-002'),
    jsonb_build_object('source', 'docs', 'timestamp', now())
FROM source_documents;
2. Retrieval
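The `<=>` operator used in the query below is pgvector's cosine-distance operator (1 minus cosine similarity), so smaller values mean more similar vectors. A pure-Python sketch of the quantity it computes:

```python
import math

def cosine_distance(a, b):
    # What pgvector's <=> operator computes: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Same direction -> distance 0; orthogonal -> distance 1.
d_same = cosine_distance([1.0, 0.0], [2.0, 0.0])
d_orth = cosine_distance([1.0, 0.0], [0.0, 5.0])
```

Because magnitude cancels out, cosine distance depends only on direction, which is why it works well for comparing embeddings of different-length texts.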
Retrieve relevant documents
-- Vector similarity search
SELECT id, content, metadata,
       embedding <=> neurondb_embed('What is PostgreSQL?', 'text-embedding-ada-002') AS distance
FROM documents
ORDER BY distance
LIMIT 10;
3. Generation
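Conceptually, the generation step is just prompt assembly: concatenate the retrieved rows and append the question. A minimal sketch of that assembly (the `build_prompt` helper is illustrative, not a NeurondB function), mirroring the `string_agg` expression in the SQL:

```python
def build_prompt(question, contexts):
    # Mirrors the SQL: instruction || string_agg(content, '\n') || question.
    return (
        "Answer the question using only the provided context: "
        + "\n".join(contexts)
        + "\nQuestion: " + question
    )

prompt = build_prompt(
    "What is PostgreSQL?",
    ["PostgreSQL is a relational database.", "It supports SQL and JSONB."],
)
```

The "using only the provided context" instruction is what grounds the model: it is asked to answer from the retrieved rows rather than from its parametric memory.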
Generate answer with context
-- Combine retrieved context with LLM
WITH retrieved AS (
    SELECT content
    FROM documents
    ORDER BY embedding <=> neurondb_embed('What is PostgreSQL?', 'text-embedding-ada-002')
    LIMIT 5
)
SELECT neurondb_llm_generate(
    'gpt-4',
    'Answer the question using only the provided context: ' ||
    string_agg(content, E'\n') ||
    E'\nQuestion: What is PostgreSQL?'
) AS answer
FROM retrieved;
Hybrid Search
Combine vector search with full-text search for better results.
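The weighted blend used in the query below (0.7 on vector distance, 0.3 on inverted full-text rank, lower score = better) can be sketched in plain Python. The weights and the `/10` rank normalization come from the query itself, not from NeurondB — tune them for your corpus:

```python
def hybrid_score(vector_distance, ts_rank, w_vec=0.7, w_fts=0.3):
    # Lower is better on both sides: cosine distance is already a
    # distance, while ts_rank (higher = more relevant) is inverted and
    # roughly normalized by /10, matching the SQL expression.
    return vector_distance * w_vec + (1.0 - ts_rank / 10) * w_fts

# A doc that is close in vector space and text-relevant beats one that is not.
good = hybrid_score(vector_distance=0.10, ts_rank=5.0)
bad = hybrid_score(vector_distance=0.80, ts_rank=0.5)
```

Blending the two signals lets exact keyword matches (which pure vector search can miss) and semantic matches (which pure full-text search can miss) both contribute to the ranking.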
Hybrid search
-- Combine vector and full-text search
SELECT id, content,
       (embedding <=> query_vec) * 0.7 +
       (1.0 - ts_rank(to_tsvector('english', content), query_fts) / 10) * 0.3 AS score
FROM documents,
     (SELECT neurondb_embed('PostgreSQL replication', 'text-embedding-ada-002') AS query_vec,
             to_tsquery('english', 'PostgreSQL & replication') AS query_fts) q
WHERE to_tsvector('english', content) @@ query_fts
ORDER BY score
LIMIT 10;
Reranking
Use cross-encoder models to rerank initial results for better relevance.
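The two-stage shape — a cheap vector search for ~20 candidates, then an expensive reranker choosing the final 5 — looks like this in outline. `overlap_score` is a toy stand-in for a real cross-encoder, which jointly encodes query and document:

```python
def rerank_top_k(candidates, score_fn, query, keep=5):
    # Second stage: re-score the cheap first-stage candidates with the
    # expensive model and keep only the best few (higher score = better).
    scored = sorted(candidates, key=lambda doc: score_fn(query, doc),
                    reverse=True)
    return scored[:keep]

def overlap_score(query, doc):
    # Toy scorer: shared lowercase words between query and document.
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = [  # pretend these are the top candidates from vector search
    "PostgreSQL streaming replication ships WAL to replicas.",
    "Indexes speed up lookups.",
    "Logical replication replicates row changes.",
]
top = rerank_top_k(docs, overlap_score, "What is PostgreSQL replication",
                   keep=2)
```

Running the cross-encoder on only 20 candidates keeps its per-pair cost bounded while still improving the ordering of what the LLM actually sees.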
Reranking
-- Rerank with cross-encoder
SELECT id, content,
       neurondb_rerank(
           'cross-encoder/ms-marco-MiniLM-L-6-v2',
           content,
           'What is PostgreSQL replication?'
       ) AS relevance_score
FROM (
    SELECT id, content
    FROM documents
    ORDER BY embedding <=> neurondb_embed('What is PostgreSQL replication?', 'text-embedding-ada-002')
    LIMIT 20
) candidates
ORDER BY relevance_score DESC
LIMIT 5;
RAG Components
- LLM Integration - Hugging Face and OpenAI integration
- Document Processing - Text processing and NLP
Next Steps
- Hybrid Search - Advanced hybrid search techniques
- Reranking - Cross-encoder reranking guide
- ML Inference - ONNX model deployment