NeurondB Documentation

Machine Learning & Embeddings

ML Capabilities

In-Database ML Inference

Run ML models directly inside PostgreSQL with zero data movement. Use batch inference for high throughput or real-time predictions for low latency, with automatic GPU acceleration when available.

Embedding Generation

Generate embeddings from text, images, and more. NeurondB supports OpenAI, Cohere, and HuggingFace models, custom model deployment, automatic batching and caching, and multi-modal embeddings (text, image, audio).

Model Management

Deploy and manage ML models efficiently with model versioning and rollback, A/B testing support, resource quota management, and performance monitoring.

Supported Models

Text Embeddings

  • text-embedding-ada-002 (OpenAI) - 1536 dimensions - General text similarity
  • text-embedding-3-small (OpenAI) - 1536 dimensions - Efficient embeddings
  • text-embedding-3-large (OpenAI) - 3072 dimensions - High quality embeddings
  • embed-english-v3.0 (Cohere) - 1024 dimensions - English text
  • embed-multilingual-v3.0 (Cohere) - 1024 dimensions - Multilingual text

Sentence Transformers

  • all-MiniLM-L6-v2 (HuggingFace) - 384 dimensions - Fast, lightweight
  • all-mpnet-base-v2 (HuggingFace) - 768 dimensions - High quality
  • paraphrase-multilingual-MiniLM (HuggingFace) - 384 dimensions - 50+ languages

Multimodal

  • CLIP-ViT-B-32 (OpenAI) - 512 dimensions - Image + text
  • CLIP-ViT-L-14 (OpenAI) - 768 dimensions - High quality image search

ML Functions

embed_text()

Generate text embeddings with automatic caching.

Signature: embed_text(text TEXT, model TEXT DEFAULT 'all-MiniLM-L6-v2') RETURNS vector

Example

SELECT embed_text('Machine learning with PostgreSQL');
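
A common use is to persist embeddings next to the rows they describe. The following is a minimal sketch rather than a documented recipe: the `documents` table and its columns are hypothetical, and `vector(384)` assumes the `vector` type accepts a dimension modifier matching the 384 dimensions of the default `all-MiniLM-L6-v2` model.

```sql
-- Hypothetical table; vector(384) assumes the type accepts a dimension modifier
CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding vector(384)
);

INSERT INTO documents (content)
VALUES ('Machine learning with PostgreSQL'),
       ('Vector search inside the database');

-- Fill in embeddings only for rows that do not have one yet
UPDATE documents
SET    embedding = embed_text(content, 'all-MiniLM-L6-v2')
WHERE  embedding IS NULL;
```

If a different model is passed, the column dimension would need to match that model's output size from the Supported Models list.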

embed_text_batch()

Generate embeddings for multiple texts efficiently.

Signature: embed_text_batch(texts TEXT[], model TEXT DEFAULT 'all-MiniLM-L6-v2') RETURNS vector[]

Example

SELECT embed_text_batch(ARRAY['text1', 'text2'], 'all-MiniLM-L6-v2');
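
To line each input up with its embedding, the returned array can be expanded with PostgreSQL's multi-argument `unnest`. This sketch assumes `embed_text_batch` returns embeddings in the same order as its inputs, which the signature suggests but does not state. The `input`, `content`, and `embedding` names are illustrative only.

```sql
WITH input AS (
    SELECT ARRAY['text1', 'text2'] AS texts
)
SELECT t.content, t.embedding
FROM   input,
       -- unnest() with two arrays pairs elements by position
       unnest(input.texts,
              embed_text_batch(input.texts, 'all-MiniLM-L6-v2'))
           AS t(content, embedding);
```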

train_random_forest_classifier()

Train a Random Forest classifier with GPU support.

Signature: train_random_forest_classifier(table_name TEXT, features_col TEXT, label_col TEXT, n_trees INTEGER, max_depth INTEGER)

Example

SELECT train_random_forest_classifier('training_data', 'features', 'label', 100, 10);
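
Because the call takes five positional arguments, PostgreSQL's named notation can make it easier to read. The sketch below assumes the function is declared with the parameter names shown in the signature. Under that assumption it is equivalent to the positional example above.

```sql
SELECT train_random_forest_classifier(
    table_name   => 'training_data',
    features_col => 'features',
    label_col    => 'label',
    n_trees      => 100,  -- number of trees in the forest
    max_depth    => 10    -- maximum depth of each tree
);
```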

cluster_kmeans()

Cluster vectors with K-means, using GPU acceleration.

Signature: cluster_kmeans(table_name TEXT, vector_column TEXT, k INTEGER, max_iter INTEGER DEFAULT 100)

Example

SELECT cluster_kmeans('documents', 'embedding', 5, 100);
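
Since the signature declares `max_iter` with a default of 100, the fourth argument can be omitted. A short sketch, assuming the default and parameter names are applied as declared:

```sql
-- Same call as above, relying on max_iter DEFAULT 100
SELECT cluster_kmeans('documents', 'embedding', 5);

-- Mixed notation: positional arguments first, then named overrides
SELECT cluster_kmeans('documents', 'embedding', k => 8, max_iter => 50);
```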

Next Steps