GPU Accelerator

Overview

The GPU Accelerator offloads compute-intensive operations to the GPU using NVIDIA CUDA, AMD ROCm, or Apple Metal. GPU support is optional: when no GPU is available, NeuronDB automatically falls back to CPU execution, so it keeps working across different hardware configurations.

Key Features

  • Multi-Platform Support: CUDA (NVIDIA), ROCm (AMD), and Metal (Apple)
  • Automatic Fallback: Seamlessly falls back to CPU when GPU is unavailable
  • Parallel Operations: Batch processing with multiple GPU streams
  • Memory Management: Efficient GPU memory pooling and allocation
  • Automatic Detection: GPU hardware is detected automatically; acceleration is turned on with a single setting (neurondb.gpu_enabled)

Performance Improvements

  • Batch distance computation: up to 100x speedup
  • K-Means clustering: up to 23x speedup
  • ONNX model inference: 10-15x speedup
  • Quantization operations: up to 50x speedup
  • Average GPU latency: 2.3 ms

GPU-Accelerated Operations

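Operation                | CUDA      | ROCm      | Metal     | Speedup
-------------------------+-----------+-----------+-----------+-------------
L2 Distance              | ✓ cuBLAS  | ✓ rocBLAS | ✓ MPS     | 100x (batch)
Cosine Distance          | ✓ cuBLAS  | ✓ rocBLAS | ✓ MPS     | 100x (batch)
Inner Product            | ✓ GEMM    | ✓ GEMM    | ✓ GEMM    | 100x (batch)
K-Means Clustering       | ✓ Custom  | ✓ Custom  | ✓ Custom  | 23x
Quantization (INT8/FP16) | ✓ Kernels | ✓ Kernels | ✓ Kernels | 50x
ONNX Inference           | ✓ CUDA EP | Partial   | ✓ CoreML  | 10-15x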

Configuration

PostgreSQL Configuration

postgresql.conf

# Add to postgresql.conf
shared_preload_libraries = 'neurondb'

# GPU Configuration (all optional)
neurondb.gpu_enabled = off                    # Enable GPU (default: off)
neurondb.gpu_backend = 'cuda'                 # Backend: cuda, rocm, metal (default: cuda)
neurondb.gpu_device = 0                       # GPU device index (0 = first GPU)
neurondb.gpu_batch_size = 8192                # Batch size for GPU operations
neurondb.gpu_streams = 2                      # Number of concurrent streams (CUDA/HIP/Metal)
neurondb.gpu_memory_pool_mb = 512             # GPU memory pool size in MB
neurondb.gpu_fail_open = on                   # Fall back to CPU on GPU error
neurondb.gpu_kernels = 'l2,cosine,ip'         # Enabled kernels (comma-separated)
neurondb.gpu_timeout_ms = 30000               # Kernel timeout in milliseconds

SQL Examples

Enable GPU Acceleration

Enable GPU via GUCs

-- Enable GPU via GUCs (requires shared_preload_libraries='neurondb')
SET neurondb.gpu_enabled = on;
SET neurondb.gpu_device = 0;        -- select device
SET neurondb.gpu_batch_size = 8192;  -- tune for throughput

GPU-Accelerated Distance

Batch GPU distance calculation

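-- Batch GPU L2 distance calculation (up to 100x faster in batch mode)
-- Replace the elided literal with a full query vector of the column's dimension
SELECT vector_l2_distance_gpu(
  embedding,
  '[0.1, 0.2, ...]'::vector
) FROM documents;

-- GPU cosine distance: 10 closest rows (smaller distance = more similar)
-- query_vector is a placeholder for your query vector
SELECT vector_cosine_distance_gpu(
  features,
  query_vector
) AS distance
FROM products
ORDER BY distance LIMIT 10;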
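The operations table lists the distance functions as cuBLAS/rocBLAS/MPS (GEMM) kernels. A common way to batch them is to compute every query-to-vector dot product as one matrix product. Both distances then follow from the identity ||q - v||^2 = ||q||^2 + ||v||^2 - 2 q.v. The pure-Python sketch below illustrates that technique; it is not NeuronDB's kernel code. batch_distances is a hypothetical helper, and it assumes non-zero vectors of equal dimension.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def batch_distances(queries, vectors):
    """Return (l2, cosine) distance matrices of shape len(queries) x len(vectors).

    Hypothetical sketch: both matrices are derived from one matrix of dot
    products (Q times V transposed), which is the part a GPU would hand to
    a GEMM routine. Assumes no zero vectors (cosine would divide by zero).
    """
    # The "GEMM" step: every query against every stored vector.
    dots = [[dot(q, v) for v in vectors] for q in queries]
    # Squared norms, computed once per vector rather than once per pair.
    q_sq = [dot(q, q) for q in queries]
    v_sq = [dot(v, v) for v in vectors]

    l2, cosine = [], []
    for i, row in enumerate(dots):
        # Euclidean distance via ||q||^2 + ||v||^2 - 2 q.v; clamp rounding negatives.
        l2.append([math.sqrt(max(q_sq[i] + v_sq[j] - 2 * d, 0.0))
                   for j, d in enumerate(row)])
        # Cosine distance = 1 - q.v / (||q|| ||v||).
        cosine.append([1 - d / math.sqrt(q_sq[i] * v_sq[j])
                       for j, d in enumerate(row)])
    return l2, cosine
```

On a GPU the dot-product matrix is the expensive part and maps onto a single GEMM call. The norms and the final combination are cheap elementwise work.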

Automatic CPU Fallback

NeuronDB automatically falls back to CPU execution when a GPU is unavailable, so your application keeps working regardless of hardware configuration and without separate builds or configurations. Whether GPU errors trigger a CPU fallback is controlled by neurondb.gpu_fail_open.

Fallback Scenarios

  • No GPU Available: Automatically uses CPU
  • GPU Out of Memory: Falls back to CPU for remaining operations
  • GPU Driver Issues: Gracefully degrades to CPU
  • Unsupported Operations: CPU execution for operations without GPU kernels
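The fallback scenarios above follow a fail-open pattern: try the GPU path, and run the CPU path if it fails. The Python sketch below is a hypothetical illustration of that pattern, not NeuronDB's internals. FailOpenExecutor and GpuUnavailable are invented names. fail_open stands in for neurondb.gpu_fail_open, and fallback_count is modeled on the column reported by neurondb_gpu_status(). In this sketch only GPU errors increment the counter.

```python
class GpuUnavailable(Exception):
    """Raised by the (hypothetical) GPU path when it cannot run."""


class FailOpenExecutor:
    """Hypothetical fail-open dispatcher: GPU first, CPU on error.

    fail_open is an analogy for neurondb.gpu_fail_open; fallback_count is an
    analogy for the fallback_count column of neurondb_gpu_status().
    """

    def __init__(self, gpu_fn, cpu_fn, gpu_enabled=True, fail_open=True):
        self.gpu_fn = gpu_fn          # None models "no GPU kernel for this op"
        self.cpu_fn = cpu_fn
        self.gpu_enabled = gpu_enabled
        self.fail_open = fail_open
        self.fallback_count = 0

    def run(self, *args):
        if not self.gpu_enabled or self.gpu_fn is None:
            # GPU disabled or operation unsupported: go straight to CPU.
            return self.cpu_fn(*args)
        try:
            return self.gpu_fn(*args)
        except GpuUnavailable:
            if not self.fail_open:
                raise  # fail closed: surface the GPU error to the caller
            self.fallback_count += 1
            return self.cpu_fn(*args)
```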

Check GPU status

-- Check if GPU is available and enabled
SELECT * FROM neurondb_gpu_status();

-- Returns:
--  gpu_enabled | gpu_backend | gpu_device | gpu_memory_mb | fallback_count
-- -------------+-------------+------------+---------------+----------------
--  true        | cuda        | 0          | 8192          | 0

Building with GPU Support

Using build.sh (Recommended)

Build commands

# CPU-only build (default)
./build.sh

# With GPU support (auto-detects CUDA or ROCm; Metal is used on macOS)
./build.sh --with-gpu

# With custom paths
./build.sh --with-gpu --cuda-path /opt/cuda --onnx-path /usr/local
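build.sh --with-gpu detects which toolkit is available. The Python heuristic below approximates that kind of detection; it is not the script's actual logic. It assumes nvcc (CUDA) or hipcc (ROCm) is on PATH and treats macOS as Metal. If a toolkit lives in a non-default location, pass its path explicitly, as with --cuda-path. detect_gpu_backend is a hypothetical helper. Its return values match those accepted by neurondb.gpu_backend, plus "cpu" for a CPU-only build.

```python
import platform
import shutil


def detect_gpu_backend(which=shutil.which, system=platform.system):
    """Guess which GPU backend a build could target on this machine.

    Heuristic sketch only (not build.sh): macOS -> metal, nvcc on PATH ->
    cuda, hipcc on PATH -> rocm, otherwise cpu. `which` and `system` are
    injectable so the function can be exercised without real toolkits.
    """
    if system() == "Darwin":
        return "metal"
    if which("nvcc"):
        return "cuda"
    if which("hipcc"):
        return "rocm"
    return "cpu"
```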

NVIDIA GPU (CUDA)

Install CUDA Toolkit

# Install CUDA Toolkit 12.6
# Ubuntu 22.04 (use the matching repository path for other releases)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6

# Build NeuronDB with CUDA
./build.sh --with-gpu

AMD GPU (ROCm)

Install ROCm

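# Install ROCm 6.0
# Ubuntu 22.04 (jammy); apt-key is deprecated, so store the key in a keyring
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.0/ jammy main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt-get update
sudo apt-get install -y rocm-dev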

# Build NeuronDB with ROCm
./build.sh --with-gpu

Apple Metal (macOS)

Build with Metal

# Metal support is automatically enabled on macOS
# No additional dependencies required

# Build NeuronDB with Metal
./build.sh --with-gpu

Performance Tuning

Batch Size Optimization

Larger batch sizes improve GPU utilization but increase memory usage. Tune based on your GPU memory and query patterns.

Optimize batch size

-- Start with default (8192)
SET neurondb.gpu_batch_size = 8192;

-- Increase for high-throughput scenarios
SET neurondb.gpu_batch_size = 16384;

-- Decrease if running out of memory
SET neurondb.gpu_batch_size = 4096;
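Estimating the memory one batch occupies helps decide how far the batch size can grow. The Python sketch below is a rough estimate under stated assumptions. It assumes float32 vector components (4 bytes each) and counts only the batch's input vectors, not query vectors, output buffers, or driver overhead. batch_bytes and max_batch_for_pool are hypothetical helpers, and the 50% headroom is an arbitrary illustrative choice, not a NeuronDB setting.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vector components stored as float32


def batch_bytes(batch_size: int, dim: int) -> int:
    """Bytes for one batch of float32 vectors (input data only, no overhead)."""
    return batch_size * dim * BYTES_PER_FLOAT32


def max_batch_for_pool(pool_mb: int, dim: int, headroom: float = 0.5) -> int:
    """Largest batch whose vectors fit in the `headroom` fraction of the pool.

    headroom=0.5 leaves half the pool for outputs and other buffers (an
    illustrative assumption). pool_mb is treated as MiB.
    """
    usable = int(pool_mb * 1024 * 1024 * headroom)
    return usable // (dim * BYTES_PER_FLOAT32)
```

For example, with 1536-dimensional embeddings (used only as an example dimension), 8192 vectors occupy 48 MiB and 16384 occupy 96 MiB. With a 512 MB pool and 50% headroom, the sketch allows batches of up to 43690 such vectors.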

Memory Pool Configuration

Pre-allocate GPU memory to reduce allocation overhead and improve performance.

Configure memory pool

-- Allocate a 1 GB memory pool (size it to fit within available GPU memory)
SET neurondb.gpu_memory_pool_mb = 1024;

-- On multi-GPU hosts, select the device first, then size its pool
SET neurondb.gpu_device = 0;
SET neurondb.gpu_memory_pool_mb = 2048;
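A memory pool avoids one device allocation per batch: it reserves a single region up front and hands out slices of it. The class below is a hypothetical bump-allocator sketch of that idea in Python and tracks offsets only. The 256-byte alignment is an assumption chosen to match the minimum alignment cudaMalloc guarantees. NeuronDB's pool may be organized differently.

```python
class BumpPool:
    """Hypothetical pre-allocated pool: carve blocks by advancing an offset.

    One reservation of capacity_mb is made up front (analogous to
    neurondb.gpu_memory_pool_mb); reset() recycles the whole pool at once,
    for example after a batch finishes.
    """

    def __init__(self, capacity_mb: int, alignment: int = 256):
        self.capacity = capacity_mb * 1024 * 1024
        self.alignment = alignment
        self.offset = 0

    def alloc(self, nbytes: int):
        """Return the start offset of a new block, or None if the pool is full."""
        # Round the current offset up to the next alignment boundary.
        start = -(-self.offset // self.alignment) * self.alignment
        if start + nbytes > self.capacity:
            return None  # caller could shrink the batch or fall back to CPU
        self.offset = start + nbytes
        return start

    def reset(self):
        """Release every block in one step."""
        self.offset = 0
```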

Stream Configuration

Multiple streams let independent GPU work (data transfers and kernel launches) run concurrently, which improves GPU utilization.

Configure streams

-- Use 2 streams for concurrent operations
SET neurondb.gpu_streams = 2;

-- Increase for high-throughput scenarios
SET neurondb.gpu_streams = 4;
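With several streams, batches can be queued so that one batch's transfers overlap another's kernel execution. The Python sketch below assumes simple round-robin assignment of batches to streams, which may differ from NeuronDB's actual scheduler. assign_to_streams is a hypothetical helper.

```python
def assign_to_streams(n_rows: int, batch_size: int, n_streams: int):
    """Split n_rows into batches and assign them round-robin to streams.

    Returns one list per stream of (start, end) row ranges. On a real GPU,
    each stream would copy its batch in, launch the kernel, and copy results
    out; batches on different streams can overlap in time.
    """
    streams = [[] for _ in range(n_streams)]
    for i, start in enumerate(range(0, n_rows, batch_size)):
        end = min(start + batch_size, n_rows)
        streams[i % n_streams].append((start, end))
    return streams
```

In this model, when a query produces fewer batches than there are streams, the extra streams sit idle. Raising neurondb.gpu_streams therefore mainly helps workloads that span many batches.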
