GPU Accelerator
Overview
The GPU Accelerator provides optional GPU acceleration for compute-intensive operations using NVIDIA CUDA, AMD ROCm, or Apple Metal. When no GPU is available, operations automatically fall back to CPU, so the same installation works across different hardware configurations.
Key Features
- Multi-Platform Support: CUDA (NVIDIA), ROCm (AMD), and Metal (Apple)
- Automatic Fallback: Seamlessly falls back to CPU when GPU is unavailable
- Parallel Operations: Batch processing with multiple GPU streams
- Memory Management: Efficient GPU memory pooling and allocation
- Zero Configuration: Works out of the box with automatic detection
Performance Improvements
- Batch distance computation: 100x speedup
- K-Means clustering: 23x speedup
- ONNX model inference: 10-15x speedup
- Quantization operations: 50x speedup
- Average GPU latency: 2.3 ms
GPU-Accelerated Operations
| Operation | CUDA | ROCm | Metal | Speedup |
|---|---|---|---|---|
| L2 Distance | ✓ cuBLAS | ✓ rocBLAS | ✓ MPS | 100x (batch) |
| Cosine Distance | ✓ cuBLAS | ✓ rocBLAS | ✓ MPS | 100x (batch) |
| Inner Product | ✓ GEMM | ✓ GEMM | ✓ GEMM | 100x (batch) |
| K-Means Clustering | ✓ Custom | ✓ Custom | ✓ Custom | 23x |
| Quantization (INT8/FP16) | ✓ Kernels | ✓ Kernels | ✓ Kernels | 50x |
| ONNX Inference | ✓ CUDA EP | Partial | ✓ CoreML | 10-15x |
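The table lists BLAS/GEMM backends for the distance operations. One common reason batch distances map well onto matrix multiplication is the identity ‖q − x‖² = ‖q‖² + ‖x‖² − 2 q·x, where all the q·x terms for a batch form a single matrix product. The pure-Python sketch below only illustrates that identity; it is not NeuronDB's kernel, and the function name is hypothetical.

```python
import math

def batch_l2_via_inner_products(queries, vectors):
    """Return a len(queries) x len(vectors) matrix of L2 distances.

    Illustration only: on a GPU the inner products below would be
    computed as one GEMM call instead of nested Python loops.
    """
    q_norms = [sum(c * c for c in q) for q in queries]   # ||q||^2 per query
    v_norms = [sum(c * c for c in v) for v in vectors]   # ||x||^2 per vector
    result = []
    for q, qn in zip(queries, q_norms):
        row = []
        for v, vn in zip(vectors, v_norms):
            dot = sum(a * b for a, b in zip(q, v))        # q . x (the GEMM part)
            squared = max(qn + vn - 2.0 * dot, 0.0)       # clamp rounding noise
            row.append(math.sqrt(squared))
        result.append(row)
    return result

distances = batch_l2_via_inner_products([[0.0, 0.0]], [[3.0, 4.0]])
print(distances)
```

The norms are computed once per vector and reused for every pair, which is why the cost is dominated by the inner-product step.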
Configuration
PostgreSQL Configuration
postgresql.conf
# Add to postgresql.conf
shared_preload_libraries = 'neurondb'
# GPU Configuration (all optional)
neurondb.gpu_enabled = off # Enable GPU (default: off)
neurondb.gpu_backend = 'cuda' # Backend: cuda, rocm, metal (default: cuda)
neurondb.gpu_device = 0 # GPU device ID
neurondb.gpu_batch_size = 8192 # Batch size for GPU ops
neurondb.gpu_streams = 2 # CUDA/HIP/Metal streams
neurondb.gpu_memory_pool_mb = 512 # Memory pool size
neurondb.gpu_fail_open = on # Fallback to CPU on error
neurondb.gpu_kernels = 'l2,cosine,ip' # Enabled kernels
neurondb.gpu_timeout_ms = 30000 # Kernel timeout
SQL Examples
Enable GPU Acceleration
Enable GPU via GUCs
-- Enable GPU via GUCs (requires shared_preload_libraries='neurondb')
SET neurondb.gpu_enabled = on;
SET neurondb.gpu_device = 0; -- select device
SET neurondb.gpu_batch_size = 8192; -- tune for throughput
GPU-Accelerated Distance
Batch GPU distance calculation
-- Batch GPU distance calculation (100x faster)
SELECT vector_l2_distance_gpu(
embedding,
'[0.1, 0.2, ...]'::vector
) FROM documents;
-- GPU cosine distance (smallest distance first)
SELECT vector_cosine_distance_gpu(
features,
query_vector
) FROM products
ORDER BY 1 LIMIT 10;
Automatic CPU Fallback
NeuronDB automatically falls back to CPU execution when the GPU is unavailable, so your application keeps working regardless of hardware configuration. No separate builds or configurations are required.
Fallback Scenarios
- No GPU Available: Automatically uses CPU
- GPU Out of Memory: Falls back to CPU for remaining operations
- GPU Driver Issues: Gracefully degrades to CPU
- Unsupported Operations: CPU execution for operations without GPU kernels
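The fallback scenarios above follow a "fail open" pattern: try the GPU path, and on failure run the CPU path instead while counting the event. The sketch below is a minimal illustration of that pattern, not NeuronDB's implementation. `GpuUnavailable` and `run_with_fallback` are hypothetical names, and re-raising the error when `fail_open` is false is an assumption about what disabling `neurondb.gpu_fail_open` would mean.

```python
class GpuUnavailable(RuntimeError):
    """Hypothetical error standing in for any GPU failure
    (no device, out of memory, driver problem, missing kernel)."""

def run_with_fallback(gpu_fn, cpu_fn, args, fail_open=True, stats=None):
    """Run gpu_fn(*args); on GpuUnavailable, run cpu_fn(*args) instead.

    Returns (result, stats), where stats["fallback_count"] counts
    how many times the CPU path was taken.
    """
    if stats is None:
        stats = {"fallback_count": 0}
    try:
        return gpu_fn(*args), stats
    except GpuUnavailable:
        if not fail_open:
            raise  # assumption: with fail-open disabled, the error surfaces
        stats["fallback_count"] += 1
        return cpu_fn(*args), stats

def gpu_sum(values):
    # Stand-in for a GPU kernel on a machine without a GPU.
    raise GpuUnavailable("no device found")

def cpu_sum(values):
    return sum(values)

result, stats = run_with_fallback(gpu_sum, cpu_sum, ([1, 2, 3],))
print(result, stats)
```

The counter plays the same role as the `fallback_count` column reported by the status function: a non-zero value signals that queries are silently running on CPU.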
Check GPU status
-- Check if GPU is available and enabled
SELECT * FROM neurondb_gpu_status();
-- Returns:
--  gpu_enabled | gpu_backend | gpu_device | gpu_memory_mb | fallback_count
-- -------------+-------------+------------+---------------+----------------
--  true        | cuda        |          0 |          8192 |              0
Building with GPU Support
Using build.sh (Recommended)
Build commands
# CPU-only build (default)
./build.sh
# With GPU support (auto-detects CUDA/ROCm)
./build.sh --with-gpu
# With custom paths
./build.sh --with-gpu --cuda-path /opt/cuda --onnx-path /usr/local
NVIDIA GPU (CUDA)
Install CUDA Toolkit
# Install CUDA Toolkit 12.6
# Ubuntu/Debian
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6
# Build NeuronDB with CUDA
./build.sh --with-gpu
AMD GPU (ROCm)
Install ROCm
# Install ROCm 6.0
# Ubuntu
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/6.0/ jammy main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt-get update
sudo apt-get install -y rocm-dev
# Build NeuronDB with ROCm
./build.sh --with-gpu
Apple Metal (macOS)
Build with Metal
# Metal support is automatically enabled on macOS
# No additional dependencies required
# Build NeuronDB with Metal
./build.sh --with-gpu
Performance Tuning
Batch Size Optimization
Larger batch sizes improve GPU utilization but increase memory usage. Tune based on your GPU memory and query patterns.
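To make the memory trade-off concrete, the input buffer for one batch is roughly batch size × dimensions × bytes per value. The sketch below does that arithmetic; the 768-dimension float32 embeddings are an assumed example, the helper names are hypothetical, and real usage will be higher because output buffers and per-stream copies are not counted.

```python
def batch_bytes(batch_size, dims, bytes_per_value=4):
    """Bytes needed to hold one batch of input vectors (float32 by default)."""
    return batch_size * dims * bytes_per_value

def max_batch_size(budget_mb, dims, bytes_per_value=4):
    """Largest batch whose input buffer fits in budget_mb mebibytes."""
    return (budget_mb * 1024 * 1024) // (dims * bytes_per_value)

# Assumed example: 768-dimensional float32 embeddings.
print(batch_bytes(8192, 768) / (1024 * 1024))   # MiB for the default batch size
print(max_batch_size(512, 768))                 # rows that fit in a 512 MiB budget
```

For the default batch size of 8192 and the assumed 768 dimensions, the input buffer is 8192 × 768 × 4 = 25,165,824 bytes, or 24 MiB; doubling the batch size doubles that figure.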
Optimize batch size
-- Start with default (8192)
SET neurondb.gpu_batch_size = 8192;
-- Increase for high-throughput scenarios
SET neurondb.gpu_batch_size = 16384;
-- Decrease if running out of memory
SET neurondb.gpu_batch_size = 4096;
Memory Pool Configuration
Pre-allocate GPU memory to reduce allocation overhead and improve performance.
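The idea behind pooling can be sketched with a toy fixed-block pool: memory is reserved once up front, and later requests reuse blocks instead of calling the allocator each time. This is only an illustration of the general technique under simplifying assumptions (equal-sized blocks, block indices instead of real device memory); `FixedBlockPool` is a hypothetical name and says nothing about how NeuronDB's pool is implemented.

```python
class FixedBlockPool:
    """Toy pool that hands out equal-sized blocks from a pre-sized region."""

    def __init__(self, pool_bytes, block_bytes):
        self.block_bytes = block_bytes
        # Reserve everything up front; block indices stand in for device memory.
        self.free_blocks = list(range(pool_bytes // block_bytes))

    def acquire(self):
        """Return a free block index, or None when the pool is exhausted."""
        if not self.free_blocks:
            return None  # a caller could shrink its batch or fall back to CPU
        return self.free_blocks.pop()

    def release(self, block):
        """Return a block to the pool so a later request can reuse it."""
        self.free_blocks.append(block)

pool = FixedBlockPool(pool_bytes=1024, block_bytes=256)
block = pool.acquire()
pool.release(block)
print(len(pool.free_blocks))
```

Acquire and release are list operations here, which mirrors why pooling is cheaper than repeated device allocations: the expensive reservation happens once.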
Configure memory pool
-- Allocate 1GB memory pool (adjust based on GPU memory)
SET neurondb.gpu_memory_pool_mb = 1024;
-- For multi-GPU setups, configure per device
SET neurondb.gpu_device = 0;
SET neurondb.gpu_memory_pool_mb = 2048;
Stream Configuration
Multiple streams enable concurrent operations and better GPU utilization.
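One simple way to picture how batches and streams interact is a round-robin plan: the rows are cut into batches, and consecutive batches go to different streams so they can overlap. The sketch below only builds such a plan. Round-robin assignment is an assumption for illustration; the source does not say how NeuronDB schedules work across streams, and the function name is hypothetical.

```python
def assign_batches_to_streams(n_rows, batch_size, n_streams):
    """Split n_rows into [start, end) batches and assign them round-robin.

    Returns one list of (start, end) row ranges per stream.
    """
    plan = [[] for _ in range(n_streams)]
    for index, start in enumerate(range(0, n_rows, batch_size)):
        end = min(start + batch_size, n_rows)   # last batch may be short
        plan[index % n_streams].append((start, end))
    return plan

for stream_id, batches in enumerate(assign_batches_to_streams(20000, 8192, 2)):
    print(stream_id, batches)
```

With 20,000 rows, a batch size of 8192, and 2 streams, there are only three batches, so adding more streams would leave some idle. Extra streams help only when there are enough batches to fill them.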
Configure streams
-- Use 2 streams for concurrent operations
SET neurondb.gpu_streams = 2;
-- Increase for high-throughput scenarios
SET neurondb.gpu_streams = 4;
Related Documentation
- Vector Engine - GPU-accelerated vector search
- ML Engine - GPU-accelerated ML inference
- Embedding Engine - GPU-accelerated embeddings
- Configuration - Complete GPU configuration reference
- Performance Guide - Benchmark GPU vs CPU
- Troubleshooting - Fix GPU issues