NeurondB Documentation

Understanding Embeddings

What Are Embeddings?

Embeddings are dense vector representations of data (text, images, audio) that capture semantic meaning in a high-dimensional space. Unlike keyword-based representations, they encode contextual relationships, so items with related meanings map to nearby vectors and similarity can be measured as a distance between them.

Key Concept

Traditional Search: Matches exact keywords → "machine learning" only finds documents with those exact words

Semantic Search (Embeddings): Understands meaning → "machine learning" also finds "neural networks", "AI models", "deep learning"

How Embeddings Capture Similarity

Text "cat"    → [0.8, 0.2, 0.1, ...]     ┐
Text "kitten" → [0.75, 0.25, 0.12, ...]   ├─ Close together (similar meaning)
Text "dog"    → [0.7, 0.3, 0.15, ...]     ┘

Text "car"    → [-0.3, 0.9, -0.5, ...]    ← Far apart (different concept)
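"Close together" and "far apart" are usually measured with cosine similarity. The sketch below is a minimal illustration in plain Python, using only the first three components shown in the illustration above. Real embeddings have hundreds or thousands of components, and the function name is chosen here for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# First three components from the illustration; the "..." parts are not reproduced.
vectors = {
    "cat": [0.8, 0.2, 0.1],
    "kitten": [0.75, 0.25, 0.12],
    "dog": [0.7, 0.3, 0.15],
    "car": [-0.3, 0.9, -0.5],
}

for word in ("kitten", "dog", "car"):
    score = cosine_similarity(vectors["cat"], vectors[word])
    print(f"cat vs {word}: {score:.3f}")
```

On these truncated vectors, "kitten" and "dog" score close to 1 against "cat", while "car" scores below 0.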

Why Embeddings Matter

Understanding Context

Contextual embedding models encode a word together with its surrounding text, so "bank" in a sentence about a river gets a different embedding from "bank" in a sentence about money.

Language Independence

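Multilingual models (for example, embed-multilingual-v3.0) map similar concepts in different languages to nearby embeddings, so a query in English can find results in Spanish, French, or Chinese. English-only models do not provide this.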

Multimodal Capabilities

Models trained on more than one modality place text, images, or audio in a shared space. CLIP, for example, does this for text and images, so a text description can be used to search for images.

Text Embeddings

Generate embeddings from text using various models.

Basic Usage

Generate text embedding

-- Generate embedding
SELECT embed_text('Hello world', 'text-embedding-ada-002');

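-- Use in similarity search (smaller distance = more similar).
-- The query is embedded with the same model as the stored vectors.
SELECT content, embedding <=> embed_text('PostgreSQL vector search', 'text-embedding-ada-002') AS distance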
FROM documents
ORDER BY distance
LIMIT 10;
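To make the ranking step concrete, here is a rough Python sketch of what a query like the one above computes when no index is involved. It assumes that `<=>` means cosine distance (1 minus cosine similarity), as in pgvector; this page does not confirm that. The rows and the query vector are made-up stand-ins, and `nearest` is a hypothetical helper.

```python
import heapq
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query, rows, limit=10):
    """Return up to `limit` (content, distance) pairs, smallest distance first."""
    scored = ((content, cosine_distance(embedding, query)) for content, embedding in rows)
    return heapq.nsmallest(limit, scored, key=lambda pair: pair[1])

# Hypothetical toy rows standing in for the documents table.
rows = [
    ("pgvector indexing guide", [0.9, 0.1, 0.0]),
    ("sourdough baking tips", [0.0, 0.2, 0.9]),
    ("PostgreSQL similarity search", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]  # made-up stand-in for the query embedding

for content, distance in nearest(query, rows, limit=2):
    print(f"{distance:.3f}  {content}")
```

This brute-force scan compares the query against every row, which is why large tables are normally searched through a vector index instead.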

Supported Models

  • OpenAI: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
  • Cohere: embed-english-v3.0, embed-multilingual-v3.0
  • HuggingFace: all-MiniLM-L6-v2, all-mpnet-base-v2
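Each of these models produces vectors of a fixed length, and vectors from different models are not comparable with each other. The sketch below records the default output dimensions published by the model providers and checks a vector against them. `MODEL_DIMENSIONS` and `check_dimension` are hypothetical names for illustration, not part of the extension.

```python
# Default output dimensions as published by each model's provider.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "embed-english-v3.0": 1024,
    "embed-multilingual-v3.0": 1024,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}

def check_dimension(model, embedding):
    """Raise ValueError if `embedding` does not have the length expected for `model`."""
    expected = MODEL_DIMENSIONS[model]
    if len(embedding) != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, got {len(embedding)}"
        )
    return expected

# A zero vector of the right length passes the check.
print(check_dimension("all-MiniLM-L6-v2", [0.0] * 384))
```

A check like this is useful before inserting into a column with a fixed vector size, since switching models usually means re-embedding the stored data.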

Image Embeddings

Generate embeddings from images using CLIP models.

Image embedding

SELECT embed_image('/path/to/image.jpg', 'CLIP-ViT-B-32');

Batch Generation

Generate embeddings for multiple items efficiently.

Batch embeddings

-- Batch text embeddings
SELECT embed_text_batch(
  ARRAY['text1', 'text2', 'text3'],
  'text-embedding-ada-002'
);

-- Use in bulk insert
INSERT INTO documents (content, embedding)
SELECT content, embed_text(content, 'text-embedding-ada-002')
FROM source_documents;
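When the texts come from an application rather than a table, a client would typically split them into fixed-size groups and send one batch call per group. The following is a minimal sketch of that splitting step only. The batch size of 3 is arbitrary, `chunked` is a hypothetical helper, and this page does not state whether embed_text_batch limits the array length.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"text{i}" for i in range(1, 8)]  # text1 .. text7

# Each batch would become the ARRAY[...] argument of one embed_text_batch call.
for batch in chunked(texts, batch_size=3):
    print(batch)
```

The last batch may be shorter than the others, so downstream code should not assume a fixed batch length.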

Next Steps