NeurondB Documentation
ML Analytics Suite
Clustering Algorithms
K-Means Clustering
Lloyd's k-means algorithm with k-means++ initialization. Use it to find customer segments, topic clusters, and other natural groupings in vector data.
K-Means clustering
-- K-Means clustering
SELECT cluster_kmeans(
'train_data', -- table with vectors
'features', -- vector column
7, -- K clusters
50 -- max iterations
);
-- Project-based training and versioning
SELECT neurondb_train_kmeans_project(
'fraud_kmeans', -- project name
'train_data',
'features',
7,
50
) AS model_id;
-- List models for a project
SELECT version, algorithm, parameters, is_deployed
FROM neurondb_list_project_models('fraud_kmeans')
ORDER BY version;
- Time Complexity: O(n·k·i·d) for n vectors, k clusters, i iterations, and d dimensions
- Initialization: k-means++
- Project Models: Versioned training runs
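To make the algorithm concrete, here is a minimal pure-Python sketch of Lloyd's k-means with k-means++ seeding. It is an illustration of the technique only, not NeurondB's implementation; the helper names (`kmeans`, `kmeans_pp_init`, `sq_dist`) are hypothetical, and `max_iter` simply mirrors the "max iterations" argument of `cluster_kmeans`. Each iteration compares every one of the n points with each of the k centers across d dimensions, which is where the O(n·k·i·d) cost comes from.

```python
# Illustrative sketch only; not NeurondB's implementation. All names are hypothetical.
import bisect
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: the first center is a uniformly random point; each
    later center is drawn with probability proportional to its squared
    distance from the nearest center chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        cum = list(itertools.accumulate(d2))
        if cum[-1] == 0:  # every point already coincides with a center
            centers.append(list(rng.choice(points)))
            continue
        r = rng.random() * cum[-1]
        # First index whose cumulative weight exceeds r; never a zero-weight point.
        centers.append(list(points[bisect.bisect_right(cum, r)]))
    return centers


def kmeans(points, k, max_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps until
    assignments stop changing or max_iter is reached."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: n * k distance evaluations of d dimensions each.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

For example, `kmeans([[0, 0], [0.1, 0], [10, 10], [10.1, 10]], 2)` places the two near-origin points in one cluster and the other two in the second. k-means++ seeding spreads the initial centers apart, which usually means fewer iterations than purely random seeding.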
DBSCAN
Density-based clustering that finds clusters of arbitrary shape. It groups dense regions and automatically labels points in sparse regions as noise.
DBSCAN clustering
SELECT *
FROM cluster_dbscan(
relation => 'train_data',
column_name => 'features',
eps => 0.35,
min_samples => 12,
distance => 'cosine'
);
- No need to specify the number of clusters: DBSCAN derives them from point density
- Handles noise and outliers automatically
- Works well with non-spherical clusters
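The density rule behind `eps` and `min_samples` can be sketched in a few lines of Python. This is a brute-force (O(n²)) illustration of classic DBSCAN, not NeurondB's implementation, and the names `dbscan` and `cosine_distance` are hypothetical. It assumes that `distance => 'cosine'` means 1 minus cosine similarity, that `eps` is the neighborhood radius, and that `min_samples` counts the point itself (the scikit-learn convention); NeurondB's exact conventions may differ.

```python
# Illustrative sketch only; not NeurondB's implementation. All names are hypothetical.
# Assumption: cosine distance = 1 - cosine similarity; min_samples counts the point itself.
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps, min_samples, dist=cosine_distance):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # eps-neighborhood of point i; it includes i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_samples:
                queue.extend(neighbors)  # j is also a core point: keep expanding
    return labels
```

Because clusters grow outward from core points, a chain of dense neighborhoods can follow any shape, which is why DBSCAN handles non-spherical clusters that k-means would split.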
Outlier Detection
Z-Score Outlier Detection
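A z-score measures how many standard deviations a value lies from the mean. How `detect_outliers_zscore` reduces a multi-dimensional embedding to a z-score is not stated here, so the Python sketch below assumes one common reduction: score each vector by its Euclidean distance to the centroid, then standardize those distances. The function name `zscore_outliers` is hypothetical.

```python
# Illustrative sketch only; not NeurondB's implementation. The name is hypothetical.
# Assumption: vectors are scored by their distance to the centroid.
import math
import statistics


def zscore_outliers(vectors, threshold=2.5):
    """Return (index, z) for every vector whose distance to the centroid is
    more than `threshold` population standard deviations from the mean distance."""
    n, d = len(vectors), len(vectors[0])
    centroid = [sum(v[j] for v in vectors) / n for j in range(d)]
    # Reduce each vector to one number: its Euclidean distance to the centroid.
    dists = [math.dist(v, centroid) for v in vectors]
    mu = statistics.fmean(dists)
    sigma = statistics.pstdev(dists)
    if sigma == 0:
        return []  # every vector is equally far from the centroid
    flagged = []
    for i, dist in enumerate(dists):
        z = (dist - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, z))
    return flagged
```

With the population standard deviation, the largest possible |z| among n values is √(n−1), so in this sketch a threshold of 2.5 can only be exceeded when there are at least 8 rows.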
Z-score outliers
SELECT *
FROM detect_outliers_zscore(
(SELECT embedding FROM documents),
2.5 -- threshold (standard deviations from the mean)
);
Isolation Forest
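Isolation forests exploit the fact that outliers are easier to separate: random axis-aligned splits isolate them in fewer steps than normal points. The Python sketch below follows the standard formulation (random feature, random threshold, depth limit of ⌈log₂ ψ⌉ for subsample size ψ, score 2^(−E[h]/c(ψ))). It is not NeurondB's implementation; the function names and the `sample_size=256` default are assumptions, and `n_estimators` plays the role of the SQL argument of the same name.

```python
# Illustrative sketch only; not NeurondB's implementation. Names and defaults are hypothetical.
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among
    n points; normalizes path lengths in isolation forests."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))  # cannot split on a constant feature
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus an estimate for unbuilt subtrees."""
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_forest_scores(points, n_estimators=100, sample_size=256, seed=0):
    """Anomaly score per point in (0, 1]; values near 1 suggest outliers."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_estimators)]
    c = c_factor(psi)
    scores = []
    for x in points:
        mean_depth = sum(path_length(x, t) for t in trees) / n_estimators
        scores.append(2.0 ** (-mean_depth / c) if c > 0 else 0.5)
    return scores
```

More trees reduce the variance of the averaged path length at a linear cost in build and scoring time; the subsample size bounds the depth of each tree.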
Isolation forest
SELECT *
FROM detect_outliers_isolation_forest(
(SELECT embedding FROM documents),
100 -- n_estimators (number of trees)
);
Dimensionality Reduction
PCA (Principal Component Analysis)
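PCA projects centered data onto the directions of greatest variance, the top eigenvectors of the covariance matrix. The Python sketch below finds them with power iteration and deflation; NeurondB may well use a different method (for example an SVD), so treat this purely as an illustration. The function name `pca` and its `iters`/`seed` parameters are hypothetical; `n_components` corresponds to the "target dimensions" argument.

```python
# Illustrative sketch only; not NeurondB's implementation. Names are hypothetical.
import math
import random


def pca(vectors, n_components, iters=200, seed=0):
    """Return (components, projected): the top principal directions and each
    centered vector's coordinates along them."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix, d x d (requires n >= 2).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Power iteration from a random unit vector converges to the
        # eigenvector of cov with the largest remaining eigenvalue.
        vec = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iters):
            w = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in any direction
            vec = [x / norm for x in w]
        eigval = sum(vec[a] * cov[a][b] * vec[b] for a in range(d) for b in range(d))
        components.append(vec)
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [[cov[a][b] - eigval * vec[a] * vec[b] for b in range(d)]
               for a in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return components, projected
```

For points lying close to the line y = 2x, the first component comes out close to ±(1, 2)/√5, and each projected row holds one coordinate per requested component.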
PCA
SELECT *
FROM reduce_pca(
(SELECT embedding FROM documents),
50 -- target dimensions
);
Next Steps
- ML Functions - Complete ML API reference
- Clustering Guide - Detailed clustering documentation
- Performance - Optimize ML operations