NeurondB Documentation

Serve ONNX models directly from PostgreSQL

Load ONNX models

Register models once, version them, and share across schemas. Use GitHub releases or object storage URLs for centralized distribution.

Register a model

SELECT neurondb_register_model(
  name          => 'text-embedding-3-small',
  version       => '1.0.0',
  storage_url   => 'https://github.com/pgElephant/NeurondB/releases/download/models/text-embedding-3-small.onnx',
  runtime       => 'onnx',
  device        => 'auto'
);
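Because models are versioned, a newer build can be registered under the same name and the earlier version remains in the registry. A minimal sketch reusing the call above; the storage URL here is illustrative, not a real release asset:

```sql
-- Register version 1.1.0 alongside the existing 1.0.0
-- (URL is a placeholder for your own release or object store).
SELECT neurondb_register_model(
  name          => 'text-embedding-3-small',
  version       => '1.1.0',
  storage_url   => 'https://example.com/models/text-embedding-3-small-v1.1.onnx',
  runtime       => 'onnx',
  device        => 'auto'
);
```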

Inspect registry

SELECT name,
       version,
       metadata ->> 'owner'     AS owner,
       metadata ->> 'git_commit' AS git_commit,
       created_at,
       status
FROM   neurondb_model_registry
ORDER  BY created_at DESC;
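To resolve the newest registered version of a single model, the same table can be filtered and limited. A sketch assuming one registry row per registered version, as the query above suggests:

```sql
-- Most recently registered version of one model.
SELECT name, version, status
FROM   neurondb_model_registry
WHERE  name = 'text-embedding-3-small'
ORDER  BY created_at DESC
LIMIT  1;
```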

GPU batching & scheduling

NeurondB groups concurrent inference requests into micro-batches per GPU worker. Configure the batch size, the maximum latency a request may wait for a batch to fill, and a timeout fallback.

PostgreSQL configuration

postgresql.conf

neurondb.gpu_enabled = on
neurondb.gpu_device_ids = '0,1'
neurondb.inference_batch_size = 32
neurondb.inference_max_latency_ms = 25
neurondb.inference_timeout_ms = 1000
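After editing postgresql.conf, reload the server and confirm the values took effect. `pg_reload_conf()` and `SHOW` are standard PostgreSQL; whether each of these GUCs is reloadable or restart-only depends on how the extension declares them:

```sql
-- Apply postgresql.conf changes without a full restart
-- (restart-only settings are reported in the server log instead).
SELECT pg_reload_conf();

-- Verify the effective values in the current session.
SHOW neurondb.inference_batch_size;
SHOW neurondb.inference_max_latency_ms;
```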

Session-level overrides

SET neurondb.session_inference_batch_size = 16;
SET neurondb.session_inference_max_latency = '15ms';

SELECT neurondb_embed_batch(
  model_name => 'text-embedding-3-small',
  inputs     => ARRAY['vector search', 'pg extension', 'gpu batching']
);
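Session overrides last only for the current connection. To return to the server defaults mid-session, standard PostgreSQL `RESET` restores each setting to its configured value:

```sql
RESET neurondb.session_inference_batch_size;
RESET neurondb.session_inference_max_latency;

-- Or revert every SET made in this session at once:
RESET ALL;
```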

Model caching

Models are automatically cached in shared memory for fast access across sessions.

Cache statistics

SELECT * FROM neurondb_model_cache_stats();
