Understanding Embeddings
What Are Embeddings?
Embeddings are dense vector representations of data (text, images, audio) that capture semantic meaning in a high-dimensional space. Unlike traditional keyword-based representations, embeddings encode contextual relationships, allowing machines to understand similarity and meaning.
Key Concept
Traditional Search: Matches exact keywords → "machine learning" only finds documents with those exact words
Semantic Search (Embeddings): Understands meaning → "machine learning" also finds "neural networks", "AI models", "deep learning"
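A toy keyword matcher shows the limitation of traditional search. The function `keyword_search` and the sample documents below are hypothetical and exist only for illustration. The matcher returns documents that contain every word of the query, which is roughly how exact-keyword search behaves.

```python
def keyword_search(query: str, docs: list[str]) -> list[str]:
    """Hypothetical exact-keyword search: keep docs containing every query word."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.lower().split() for t in terms)]


# Hypothetical sample documents.
docs = [
    "An introduction to machine learning",
    "Training deep learning models with neural networks",
    "Choosing a car insurance policy",
]

print(keyword_search("machine learning", docs))
# ['An introduction to machine learning']
```

The deep-learning document is on-topic, but it does not contain the word "machine", so keyword search misses it. Semantic search compares embeddings instead of words and can rank it as relevant.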
How Embeddings Capture Similarity
Text "cat" → [0.8, 0.2, 0.1, ...] ┐
Text "kitten" → [0.75, 0.25, 0.12, ...] ├─ Close together (similar meaning)
Text "dog" → [0.7, 0.3, 0.15, ...] ┘
Text "car" → [-0.3, 0.9, -0.5, ...] ← Far apart (different concept)
Why Embeddings Matter
Understanding Context
Embeddings capture context, not just individual words. With a contextual model, "bank" gets a different embedding in a sentence about a river than in one about money.
Language Independence
With a multilingual model such as embed-multilingual-v3.0, similar concepts in different languages have similar embeddings, so a query in English can find results in Spanish, French, or Chinese.
Multimodal Capabilities
With a multimodal model, text, images, and audio can be embedded in the same space. CLIP, for example, maps text and images into one space, so you can search for images using text descriptions.
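All of these capabilities rest on comparing vectors, and a common measure of closeness is cosine similarity. The sketch below applies it to the first three components of the toy vectors from How Embeddings Capture Similarity. Those values are illustrative and truncated, not real model output. `cosine_similarity` is a helper defined for this example only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# First three components of the toy vectors from the diagram (illustrative only).
vectors = {
    "cat": [0.8, 0.2, 0.1],
    "kitten": [0.75, 0.25, 0.12],
    "dog": [0.7, 0.3, 0.15],
    "car": [-0.3, 0.9, -0.5],
}

for word in ("kitten", "dog", "car"):
    print(word, round(cosine_similarity(vectors["cat"], vectors[word]), 3))
# kitten 0.997
# dog 0.985
# car -0.123
```

"kitten" and "dog" score close to 1 against "cat", while "car" scores slightly below 0. This matches the "close together" and "far apart" annotations in the diagram.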
Text Embeddings
Generate embeddings from text using various models.
Basic Usage
Generate text embedding
-- Generate embedding
SELECT embed_text('Hello world', 'text-embedding-ada-002');
-- Use in similarity search (embed the query with the same model as the stored embeddings)
SELECT content, embedding <=> embed_text('PostgreSQL vector search', 'text-embedding-ada-002') AS distance
FROM documents
ORDER BY distance
LIMIT 10;
Supported Models
- OpenAI: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
- Cohere: embed-english-v3.0, embed-multilingual-v3.0
- HuggingFace: all-MiniLM-L6-v2, all-mpnet-base-v2
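The similarity query in Basic Usage orders rows by `<=>` distance and keeps the 10 nearest. The sketch below assumes that `<=>` is cosine distance (1 minus cosine similarity), which is its meaning in the pgvector extension. It mirrors that query as an exact, brute-force search over an in-memory list. The `documents` rows, their 3-d vectors, and `query_embedding` are made-up stand-ins for the table and for `embed_text(...)` output.

```python
import heapq
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 10):
    """Exact nearest neighbours: score every row, keep the k smallest distances.

    Mirrors: SELECT content, embedding <=> query AS distance
             FROM documents ORDER BY distance LIMIT k;
    """
    scored = ((cosine_distance(emb, query), content) for content, emb in rows)
    return [(content, dist) for dist, content in heapq.nsmallest(k, scored)]


# Toy "documents" table: (content, embedding) pairs with made-up 3-d vectors.
documents = [
    ("pgvector tutorial", [0.9, 0.1, 0.0]),
    ("Indexing vectors in PostgreSQL", [0.8, 0.3, 0.1]),
    ("Baking sourdough bread", [0.0, 0.2, 0.9]),
]
query_embedding = [1.0, 0.2, 0.0]  # stands in for embed_text('PostgreSQL vector search', ...)

for content, dist in top_k(query_embedding, documents, k=2):
    print(f"{dist:.3f}  {content}")
# 0.004  pgvector tutorial
# 0.020  Indexing vectors in PostgreSQL
```

Without an index, this is the work a sequential scan performs: the query is compared against every row. Indexing exists to avoid that cost on large tables.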
Image Embeddings
Generate embeddings from images using CLIP models.
Image embedding
SELECT embed_image('/path/to/image.jpg', 'CLIP-ViT-B-32');
Batch Generation
Generate embeddings for many items in a single call instead of one call per item, which typically reduces per-request overhead.
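On the client side, batching means grouping inputs into fixed-size chunks so that one model or API call embeds many strings at once. The sketch below assumes a hypothetical `embed_batch` callable that takes a list of strings and returns one vector per string, in order. `fake_embed_batch` is a hypothetical stand-in for a real model, included so the example runs offline. The batch size of 100 is an arbitrary choice.

```python
from typing import Callable, Iterator, Sequence

Vector = list[float]


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[Vector]],  # hypothetical embedder
    batch_size: int = 100,
) -> list[Vector]:
    """Embed texts in batches, preserving input order."""
    vectors: list[Vector] = []
    for batch in chunks(texts, batch_size):
        vectors.extend(embed_batch(list(batch)))
    return vectors


# Hypothetical stand-in for a real embedding model: records each call's size.
calls: list[int] = []


def fake_embed_batch(batch: list[str]) -> list[Vector]:
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]  # 1-d "embedding" = text length


texts = [f"text{i}" for i in range(250)]
vectors = embed_all(texts, fake_embed_batch, batch_size=100)
print(len(vectors), calls)  # 250 [100, 100, 50]
```

Embedding 250 texts takes three calls instead of 250, and the output order matches the input order. That ordering is what lets each vector be written back to the row it came from.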
Batch embeddings
-- Batch text embeddings
SELECT embed_text_batch(
    ARRAY['text1', 'text2', 'text3'],
    'text-embedding-ada-002'
);
-- Use in bulk insert
INSERT INTO documents (content, embedding)
SELECT content, embed_text(content, 'text-embedding-ada-002')
FROM source_documents;
Next Steps
- ONNX Inference - Deploy custom models
- Vector Indexing - Index embeddings for fast search
- RAG Pipelines - Build RAG applications