Random Forest
Overview
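Random Forest is an ensemble learning method that trains many decision trees on random samples of the training data and combines their outputs: a majority vote for classification, an average for regression. It supports both classification and regression, with GPU acceleration.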
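As background, a random forest gets its diversity from two sources of randomness: each tree is trained on a bootstrap sample (rows drawn with replacement), and each split considers only a random subset of the features. The sketch below shows just those two sampling steps in plain Python so it runs without NeuronDB. It is a generic illustration of the technique, not NeuronDB's implementation; the helper names `bootstrap_rows` and `split_candidates` and the square-root feature rule are assumptions.

```python
import math
import random


def bootstrap_rows(n_rows, rng):
    """Row indices for one tree: n_rows draws with replacement."""
    # Some rows repeat and, for large n_rows, about 1/e (~37%) are never drawn.
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def split_candidates(n_features, rng):
    """Feature indices that a single split is allowed to consider."""
    # sqrt(n_features) is a common default for classification (assumed here).
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(42)
for tree in range(3):  # n_trees = 3, as in the training example below
    rows = bootstrap_rows(10, rng)
    feats = split_candidates(9, rng)
    print(f"tree {tree}: rows={rows} first-split features={feats}")
```

Because every tree sees different rows and every split sees different features, the trees make partly independent errors, which is what averaging or voting then reduces.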
Classification
Train a Random Forest classifier using the unified ML API. The function returns a model_id that you can use for predictions and evaluation.
Train Random Forest classifier
-- Train Random Forest classifier
-- Returns: model_id (integer)
CREATE TEMP TABLE rf_model AS
SELECT neurondb.train(
'default', -- project name
'random_forest', -- algorithm
'training_table', -- source table name
'label', -- target column name
ARRAY['features'], -- feature column names
'{"n_trees": 3}'::jsonb -- hyperparameters
)::integer AS model_id;
-- View the model_id
SELECT model_id FROM rf_model;
Function Signature:
neurondb.train(
    project TEXT,            -- ML project name (e.g., 'default')
    algorithm TEXT,          -- Algorithm name: 'random_forest'
    table_name TEXT,         -- Training data table name
    target_column TEXT,      -- Target/label column name
    feature_columns TEXT[],  -- Array of feature column names
    hyperparameters JSONB    -- Algorithm-specific parameters
) RETURNS INTEGER            -- Returns model_id
Hyperparameters:
- n_trees (integer): Number of trees in the forest. Default: 100. More trees usually improve accuracy, with diminishing returns, but make training and prediction slower.
- max_depth (integer): Maximum depth of each tree. Default: unlimited. Setting a limit helps prevent overfitting.
- min_samples_split (integer): Minimum number of samples a node must contain before it can be split. Default: 2.
- min_samples_leaf (integer): Minimum number of samples required in each leaf node. Default: 1.
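The three size and depth limits act as stopping rules while a tree grows. The sketch below shows how such rules are conventionally checked, following the usual CART-style semantics (for example scikit-learn's); that NeuronDB applies them in exactly this way is an assumption, and `can_split` and `split_is_valid` are hypothetical helpers.

```python
def can_split(n_samples, depth, max_depth=None, min_samples_split=2):
    """True if a node holding n_samples at this depth may be split further."""
    if max_depth is not None and depth >= max_depth:
        return False  # depth limit reached; None means unlimited
    return n_samples >= min_samples_split


def split_is_valid(n_left, n_right, min_samples_leaf=1):
    """True if both children of a candidate split meet the leaf-size minimum."""
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf


print(can_split(50, depth=3, max_depth=3))          # False: too deep
print(can_split(5, depth=1, min_samples_split=10))  # False: too few samples
print(split_is_valid(2, 48, min_samples_leaf=5))    # False: left leaf too small
print(can_split(50, depth=1))                       # True with the defaults
```

Raising min_samples_split or min_samples_leaf, or lowering max_depth, stops growth earlier and yields smaller, less overfit trees.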
Regression
Random Forest automatically detects regression vs classification based on the target column data type. For regression, use a numeric target column.
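The exact detection rule is not specified here. One plausible rule, shown purely as a hypothetical sketch, treats floating-point and NUMERIC target columns as regression and everything else (integer class ids, booleans, text labels) as classification. The type list and the `infer_task` helper are assumptions, not NeuronDB's code.

```python
# Hypothetical rule: NeuronDB's actual detection logic may differ.
REGRESSION_TYPES = {"real", "double precision", "float4", "float8", "numeric", "decimal"}


def infer_task(pg_type):
    """Guess the task from a PostgreSQL type name such as 'numeric(10,2)'."""
    base = pg_type.lower().split("(")[0].strip()  # drop precision/scale
    return "regression" if base in REGRESSION_TYPES else "classification"


for t in ["integer", "text", "boolean", "double precision", "numeric(10,2)"]:
    print(t, "->", infer_task(t))
```

Under a rule like this, an integer target would be treated as class labels, so a continuous quantity stored as an integer may need a cast to a floating-point or NUMERIC type to be trained as regression.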
Train Random Forest regressor
-- Train Random Forest regressor (target column is numeric)
-- Drop the classifier's table first if it exists in this session
DROP TABLE IF EXISTS rf_model;
CREATE TEMP TABLE rf_model AS
SELECT neurondb.train(
'default',
'random_forest',
'training_table',
'target', -- Numeric target column
ARRAY['features'],
'{"n_trees": 3}'::jsonb
)::integer AS model_id;
Prediction
Use the trained model for predictions. The neurondb.predict function takes a model_id and feature vector, returning a prediction value.
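neurondb.predict returns a single NUMERIC value for both tasks. The sketch below shows how a random forest conventionally reduces the individual trees' outputs to that one number: a majority vote over class labels for classification, the mean for regression. It assumes class labels are numeric (as the NUMERIC return type suggests); `aggregate` is a hypothetical helper, not NeuronDB internals.

```python
from collections import Counter
from statistics import mean


def aggregate(tree_outputs, task):
    """Combine per-tree predictions into one number, as a forest would."""
    if task == "classification":
        # Majority vote; on a tie, the label seen first wins.
        label, _ = Counter(tree_outputs).most_common(1)[0]
        return float(label)
    # Regression: average of the trees' predictions.
    return float(mean(tree_outputs))


print(aggregate([1, 0, 1], "classification"))  # 1.0
print(aggregate([2.0, 3.0, 4.0], "regression"))  # 3.0
```

Some implementations average per-class probabilities ("soft voting") instead of counting hard votes; which variant NeuronDB uses is not stated here.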
Make predictions
-- Predict for every row of test_table
SELECT neurondb.predict(
(SELECT model_id FROM rf_model),
features -- vector type feature column
) AS prediction
FROM test_table;
-- Return predictions together with each row's id and features
SELECT
id,
features,
neurondb.predict(
(SELECT model_id FROM rf_model),
features
) AS predicted_label
FROM test_table
ORDER BY id  -- makes the LIMIT deterministic
LIMIT 100;
Function Signature:
neurondb.predict(
    model_id INTEGER,  -- Model ID from neurondb.train()
    features VECTOR    -- Feature vector for prediction
) RETURNS NUMERIC      -- Prediction value (class or regression)
Model Evaluation
Evaluate your trained model on test data to get accuracy, precision, recall, and F1 score metrics.
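For reference, the four classification metrics follow the standard definitions below. This sketch computes them for a binary problem with one chosen positive class; how NeuronDB averages precision, recall and F1 across classes in multiclass problems is not stated here, so treat that part as an assumption. The dictionary keys mirror the JSONB keys the function returns, and `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1_score": f1}


print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In this example there are 2 true positives, 1 false positive and 1 false negative, so accuracy is 3/5 and precision, recall and F1 are all 2/3.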
Evaluate model
-- Evaluate model and get metrics
DROP TABLE IF EXISTS rf_metrics_temp;
CREATE TEMP TABLE rf_metrics_temp (metrics jsonb);
DO $$
DECLARE
mid integer;
metrics_result jsonb;
BEGIN
-- Get model_id
SELECT model_id INTO mid FROM rf_model LIMIT 1;
-- Evaluate model
metrics_result := neurondb.evaluate(
mid, -- model_id
'test_table', -- test data table
'features', -- feature column
'label' -- target column
);
INSERT INTO rf_metrics_temp VALUES (metrics_result);
END $$;
-- Display metrics (a missing key shows as NULL)
SELECT
    k.metric,
    ROUND((m.metrics ->> k.json_key)::numeric, 4) AS value
FROM rf_metrics_temp m
CROSS JOIN (VALUES
    (1, 'Accuracy',  'accuracy'),
    (2, 'Precision', 'precision'),
    (3, 'Recall',    'recall'),
    (4, 'F1 Score',  'f1_score')
) AS k(ord, metric, json_key)
ORDER BY k.ord;
Function Signature:
neurondb.evaluate(
    model_id INTEGER,     -- Model ID from neurondb.train()
    table_name TEXT,      -- Test data table name
    feature_column TEXT,  -- Feature column name
    target_column TEXT    -- Target column name
) RETURNS JSONB           -- Returns metrics as JSONB
Returned Metrics (JSONB):
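- For classification:
  - accuracy: Overall classification accuracy (0-1)
  - precision: Precision score (0-1)
  - recall: Recall score (0-1)
  - f1_score: F1 score, the harmonic mean of precision and recall (0-1)
- For regression:
  - mse: Mean squared error
  - mae: Mean absolute error
  - rmse: Root mean squared error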
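The three regression metrics are computed from the per-row errors (actual minus predicted). A minimal sketch of the standard definitions; `regression_metrics` is a hypothetical helper whose keys mirror the JSONB keys listed above.

```python
import math


def regression_metrics(y_true, y_pred):
    """Mean squared error, mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse)}


print(regression_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

Here the errors are 1, 0 and -2, so MAE is 1.0 and MSE is 5/3. RMSE is in the same units as the target, which makes it easier to interpret than MSE, and it penalizes large errors more heavily than MAE.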
Learn More
For detailed documentation on Random Forest parameters, hyperparameter tuning, feature importance, and GPU optimization, visit: Random Forest Documentation
Related Topics
- Classification - Other classification algorithms
- Model Management - Managing trained models