NeuronDB Documentation

Vector Types & Features

Vector Data Types

NeuronDB provides comprehensive vector data types and operations for AI/ML workloads, with support for dense, sparse, and binary vectors plus GPU acceleration.

vector

Standard dense vector type (float32). Up to 16,000 dimensions. Storage: 4 bytes per dimension.

Use Cases: Semantic search, Recommendation systems, General embeddings

vectorp

Packed vector with Product Quantization. Up to 16,000 dimensions. Storage: Compressed (2x-32x smaller).

Use Cases: Large-scale search, Memory optimization, Cost reduction
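As an illustration (the vectorp(n) type-modifier syntax is assumed to mirror vector(n); NeuronDB's actual DDL may differ), a PQ-compressed column can be declared like a standard one:

-- Hypothetical: store 1536-dimensional embeddings in PQ-compressed form
CREATE TABLE compressed_embeddings (
  id SERIAL PRIMARY KEY,
  embedding vectorp(1536)
);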

halfvec

Half-precision vector (float16). Up to 16,000 dimensions. Storage: 2 bytes per dimension.

Use Cases: Memory-constrained environments, Mobile deployment, Edge computing
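A minimal sketch, assuming halfvec(n) takes a dimension modifier like vector(n):

-- Hypothetical: half-precision embeddings for an edge deployment
-- (384 dims x 2 bytes = ~768 bytes per vector, half the float32 cost)
CREATE TABLE edge_embeddings (
  id SERIAL PRIMARY KEY,
  embedding halfvec(384)
);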

sparsevec

Sparse vector representation. Up to 1,000,000 dimensions. Storage: Only non-zero values stored.

Use Cases: High-dimensional text, Categorical embeddings, TF-IDF vectors
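A sketch of sparse usage, assuming a {index:value,...}/dimensions literal format in the style of pgvector's sparsevec (an assumption for NeuronDB):

-- Hypothetical TF-IDF storage: only the three non-zero entries are stored
CREATE TABLE docs (
  id SERIAL PRIMARY KEY,
  tfidf sparsevec(1000000)
);
INSERT INTO docs (tfidf) VALUES ('{1:0.5,42:1.2,900:0.7}/1000000');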

binaryvec

Binary vector (bit-packed). Up to 64,000 dimensions. Storage: 1 bit per dimension.

Use Cases: Hamming distance, Binary embeddings, Fast filtering
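A sketch assuming a plain bit-string literal for binaryvec (the literal format is an assumption):

-- Hypothetical: 8-bit fingerprints ranked by Hamming distance (<%>)
CREATE TABLE fingerprints (
  id SERIAL PRIMARY KEY,
  fp binaryvec(8)
);
INSERT INTO fingerprints (fp) VALUES ('10110010'), ('10110011');
SELECT id FROM fingerprints ORDER BY fp <%> '10110000'::binaryvec LIMIT 10;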

Vector Operators

Distance Functions

  • <-> - L2 Distance: Euclidean distance (most common)
  • <#> - Inner Product: Negative inner product
  • <=> - Cosine Distance: 1 - cosine similarity
  • <+> - L1 Distance: Manhattan distance
  • <%> - Hamming Distance: Binary vector distance
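The dense-vector operators above can be compared side by side on literals (values shown are the mathematical results; output formatting may differ):

-- Smaller is closer for every operator; <#> returns the negated inner
-- product so that ORDER BY ... ASC still ranks the best matches first
SELECT '[1,2,3]'::vector <-> '[4,5,6]'::vector AS l2,        -- ~5.196
       '[1,2,3]'::vector <#> '[4,5,6]'::vector AS neg_inner, -- -32
       '[1,2,3]'::vector <=> '[4,5,6]'::vector AS cosine,    -- ~0.0254
       '[1,2,3]'::vector <+> '[4,5,6]'::vector AS l1;        -- 9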

Vector Operations

  • + - Addition: Element-wise vector addition
  • - - Subtraction: Element-wise vector subtraction
  • * - Scalar Multiply: Multiply vector by scalar
  • || - Concatenation: Combine vectors

Comparison

  • = - Equals: Exact vector equality
  • <> - Not Equals: Vector inequality
  • @> - Contains: Subvector check
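For example (the @> semantics follow the description above; a contiguous-subsequence check is assumed):

SELECT '[1,2,3]'::vector = '[1,2,3]'::vector;   -- true
SELECT '[1,2,3]'::vector <> '[1,2,4]'::vector;  -- true
SELECT '[1,2,3]'::vector @> '[2,3]'::vector;    -- true if [2,3] is a subvector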

Vector Index Types

IVFFlat

Inverted file index with flat (uncompressed) vector storage. Best for large datasets (1M+ vectors). Recall: ~95-99%. Build time: Fast. Query time: Fast.
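A build sketch, assuming pgvector-style syntax where lists sets the number of inverted-list clusters and ivfflat.probes controls how many are scanned per query (both parameter names are assumptions for NeuronDB):

-- More lists = finer partitioning; a common rule of thumb is lists ~ sqrt(rows)
CREATE INDEX ON embeddings USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);
-- Probing more lists raises recall at the cost of query speed
SET ivfflat.probes = 10;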

HNSW

Hierarchical Navigable Small World. Best for high recall requirements. Recall: ~98-99.9%. Build time: Moderate. Query time: Very Fast.
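A tuning sketch, assuming pgvector-style parameters (m = graph degree, ef_construction = build-time candidate list, hnsw.ef_search = query-time candidate list; all three names are assumptions for NeuronDB):

CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
-- Larger ef_search pushes recall toward the ~99.9% end at some query cost
SET hnsw.ef_search = 100;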

LSH

Locality Sensitive Hashing. Best for approximate search at scale. Recall: ~90-95%. Build time: Very Fast. Query time: Very Fast.
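A hypothetical build (the lsh access-method name and operator class are assumptions, inferred from the naming used for the other index types):

-- LSH trades recall (~90-95%) for very fast build and query times
CREATE INDEX ON embeddings USING lsh (embedding vector_l2_ops);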

Example Usage

Complete example

-- Create table with vector column
CREATE TABLE embeddings (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536)  -- OpenAI ada-002 dimension
);

-- Insert vectors
INSERT INTO embeddings (content, embedding) VALUES
  ('AI and machine learning', '[0.1, 0.2, ...]'),
  ('Database systems', '[0.3, 0.4, ...]');

-- Create HNSW index for fast similarity search
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops);

-- Find similar embeddings
SELECT content, embedding <=> '[0.15, 0.25, ...]'::vector AS distance
FROM embeddings
ORDER BY embedding <=> '[0.15, 0.25, ...]'::vector
LIMIT 5;

-- Vector operations
SELECT embedding + '[0.1, 0.1, ...]'::vector FROM embeddings LIMIT 1;
SELECT embedding * 2.0 FROM embeddings LIMIT 1;
SELECT embedding || '[0.5]'::vector FROM embeddings LIMIT 1;
