NeurondB Documentation
Serve ONNX models directly from PostgreSQL
Load ONNX models
Register models once, version them, and share them across schemas. Use GitHub releases or object storage URLs for centralized distribution.
Register a model
SELECT neurondb_register_model(
name => 'text-embedding-3-small',
version => '1.0.0',
storage_url => 'https://github.com/pgElephant/NeurondB/releases/download/models/text-embedding-3-small.onnx',
runtime => 'onnx',
device => 'auto'
);
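Later builds of the same model can be registered under the same name with a new version string. A minimal sketch, assuming the registry keys entries on (name, version) so both versions remain available side by side; the storage URL below is a placeholder, not a real release asset:

```sql
-- Hypothetical: register a newer build of an already-registered model.
-- Assumes (name, version) is the registry's unique key, so the
-- existing 1.0.0 entry is kept alongside this one.
SELECT neurondb_register_model(
    name        => 'text-embedding-3-small',
    version     => '1.1.0',
    storage_url => 'https://example.com/models/text-embedding-3-small-v1.1.onnx',
    runtime     => 'onnx',
    device      => 'auto'
);
```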
Inspect registry
SELECT name,
version,
metadata ->> 'owner' AS owner,
metadata ->> 'git_commit' AS git_commit,
created_at,
status
FROM neurondb_model_registry
ORDER BY created_at DESC;
GPU batching & scheduling
NeurondB orchestrates micro-batches per GPU worker. Configure queue sizes, max latency, and fallbacks.
PostgreSQL configuration
postgresql.conf
neurondb.gpu_enabled = on
neurondb.gpu_device_ids = '0,1'
neurondb.inference_batch_size = 32
neurondb.inference_max_latency_ms = 25
neurondb.inference_timeout_ms = 1000
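After editing postgresql.conf, the standard PostgreSQL mechanics apply. A minimal check, assuming these GUCs are reloadable rather than restart-only (settings such as neurondb.gpu_enabled may still require a full restart):

```sql
-- Reload the configuration without restarting the server
-- (standard PostgreSQL; not specific to NeurondB).
SELECT pg_reload_conf();

-- Confirm the values took effect. SHOW works for any GUC,
-- including extension-defined parameters like these.
SHOW neurondb.inference_batch_size;
SHOW neurondb.inference_max_latency_ms;
```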
Session-level overrides
SET neurondb.session_inference_batch_size = 16;
SET neurondb.session_inference_max_latency = '15ms';
SELECT neurondb_embed_batch(
model_name => 'text-embedding-3-small',
inputs => ARRAY['vector search', 'pg extension', 'gpu batching']
);
Model caching
Models are automatically cached in shared memory for fast access across sessions.
Cache statistics
SELECT * FROM neurondb_model_cache_stats();
Next Steps
- Embedding Generation - Generate embeddings
- Performance Tuning - Optimize inference
- Model Management - Version and deploy models