NeuronDB Documentation

Random Forest

Overview

Random Forest is an ensemble learning method for classification and regression: it trains many decision trees and combines their predictions. NeuronDB's implementation supports GPU acceleration.
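Random Forest conventionally gives every tree its own bootstrap sample of the training rows, drawn with replacement. This makes the trees differ from one another before their predictions are combined. The Python sketch below shows only that sampling step. It illustrates the general technique rather than NeuronDB's internal code, and `bootstrap_sample` is a hypothetical name.

```python
import random
from typing import List, Tuple


def bootstrap_sample(n_rows: int, seed: int) -> Tuple[List[int], List[int]]:
    """Return (in_bag, out_of_bag) row indices for one tree.

    in_bag holds n_rows indices drawn with replacement, so some rows repeat
    and others are missing; out_of_bag lists the rows never drawn
    (roughly 37% of rows for large n_rows).
    """
    rng = random.Random(seed)  # seeded so each tree's sample is reproducible
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


# Hypothetical 3-tree forest (like n_trees = 3): each tree gets its own sample.
for tree_id in range(3):
    in_bag, oob = bootstrap_sample(n_rows=10, seed=tree_id)
    print(f"tree {tree_id}: in-bag={sorted(in_bag)} out-of-bag={oob}")
```

Rows left out of a tree's sample (out-of-bag rows) are commonly used to estimate that tree's error without a separate test set.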

Classification

Train a Random Forest classifier using the unified ML API. The function returns a model_id that you can use for predictions and evaluation.

Train Random Forest classifier

-- Train Random Forest classifier
-- Returns: model_id (integer)
DROP TABLE IF EXISTS rf_model;
CREATE TEMP TABLE rf_model AS
SELECT neurondb.train(
    'default',              -- project name
    'random_forest',        -- algorithm
    'training_table',       -- source table name
    'label',                -- target column name
    ARRAY['features'],      -- feature column names
    '{"n_trees": 3}'::jsonb -- hyperparameters
)::integer AS model_id;

-- View the model_id
SELECT model_id FROM rf_model;

Function Signature:

neurondb.train(
    project         TEXT,    -- ML project name (e.g., 'default')
    algorithm       TEXT,    -- Algorithm name: 'random_forest'
    table_name      TEXT,    -- Training data table name
    target_column   TEXT,    -- Target/label column name
    feature_columns TEXT[],  -- Array of feature column names
    hyperparameters JSONB    -- Algorithm-specific parameters
) RETURNS INTEGER            -- Returns model_id
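Each tree in a classification forest is grown by repeatedly choosing the split that best separates the labels. A common criterion is Gini impurity, as used in CART. This page does not state which criterion NeuronDB uses, so treat the Python sketch below as an illustration of the conventional approach. The names `gini` and `best_threshold` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs: Sequence[float],
                   ys: Sequence[int]) -> Optional[Tuple[float, float]]:
    """Find the threshold on one feature that minimises weighted child impurity.

    Returns (threshold, weighted_gini), where rows with x <= threshold go to
    the left child, or None if all x values are equal (no split possible).
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


print(best_threshold([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0, 0, 0, 1, 1, 1]))
```

A Random Forest conventionally runs this search over a random subset of the feature columns at each node, which further decorrelates the trees.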

Hyperparameters:

  • n_trees (integer): Number of trees in the forest. Default: 100. More trees generally improve accuracy, with diminishing returns, at the cost of slower training and prediction.
  • max_depth (integer): Maximum depth of each tree. Default: unlimited. Setting a limit helps prevent overfitting.
  • min_samples_split (integer): Minimum samples required to split a node. Default: 2.
  • min_samples_leaf (integer): Minimum samples required in a leaf node. Default: 1.
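The hyperparameters above act as stopping rules while each tree grows. Below is a minimal Python sketch of how such rules are conventionally applied. It assumes NeuronDB follows the common convention: the root is at depth 0, and `max_depth=None` stands for the unlimited default. `can_split` and `split_allowed` are hypothetical helpers, not NeuronDB functions.

```python
from typing import Optional


def can_split(n_samples: int, depth: int,
              max_depth: Optional[int] = None,
              min_samples_split: int = 2) -> bool:
    """Return True if a node at `depth` holding `n_samples` rows may be split."""
    if max_depth is not None and depth >= max_depth:
        return False  # depth limit reached: this node becomes a leaf
    return n_samples >= min_samples_split


def split_allowed(n_left: int, n_right: int, min_samples_leaf: int = 1) -> bool:
    """Reject a candidate split that would leave either child too small."""
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf


print(can_split(n_samples=50, depth=0))               # True
print(can_split(n_samples=50, depth=3, max_depth=3))  # False: depth limit
print(can_split(n_samples=1, depth=0))                # False: below min_samples_split
print(split_allowed(1, 49, min_samples_leaf=5))       # False: left child too small
```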

Regression

Random Forest automatically detects regression vs classification based on the target column data type. For regression, use a numeric target column.

Train Random Forest regressor

-- Train Random Forest regressor (target column is numeric)
DROP TABLE IF EXISTS rf_model;  -- replaces any model_id stored earlier in this session
CREATE TEMP TABLE rf_model AS
SELECT neurondb.train(
    'default',
    'random_forest',
    'training_table',
    'target',               -- Numeric target column
    ARRAY['features'],
    '{"n_trees": 3}'::jsonb
)::integer AS model_id;
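For regression, a forest's prediction is conventionally the mean of its trees' individual outputs. The Python sketch below represents each trained tree as a plain callable from a feature vector to a number. This stand-in type and `forest_regress` are hypothetical, chosen only to illustrate the averaging.

```python
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained regression tree.
Tree = Callable[[Sequence[float]], float]


def forest_regress(trees: List[Tree], features: Sequence[float]) -> float:
    """Average the numeric predictions of every tree in the forest."""
    return mean(tree(features) for tree in trees)


# Three toy "trees" (as with n_trees = 3), each a fixed rule on the first feature.
trees: List[Tree] = [
    lambda x: 2.0 * x[0],
    lambda x: 2.0 * x[0] + 1.0,
    lambda x: 2.0 * x[0] - 1.0,
]
print(forest_regress(trees, [4.0]))  # 8.0
```

Averaging many noisy trees reduces variance, which is why a forest usually predicts more stably than any single tree.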

Prediction

Use the trained model to make predictions. The neurondb.predict function takes a model_id and a feature vector and returns a prediction value: a class label for classification models or a numeric estimate for regression models.

Make predictions

-- Predict for every row in test_table
SELECT neurondb.predict(
    (SELECT model_id FROM rf_model),
    features  -- vector type feature column
) AS prediction
FROM test_table;

-- Return each row's id and features alongside its prediction (first 100 rows)
SELECT 
    id,
    features,
    neurondb.predict(
        (SELECT model_id FROM rf_model),
        features
    ) AS predicted_label
FROM test_table
LIMIT 100;

Function Signature:

neurondb.predict(
    model_id INTEGER,  -- Model ID from neurondb.train()
    features VECTOR    -- Feature vector for prediction
) RETURNS NUMERIC      -- Prediction value (class or regression)
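For classification, a forest conventionally combines its trees by majority vote. The minimal Python sketch below assumes each tree's output is already available as a class label. The tie-breaking rule (smallest label wins) is an arbitrary choice for this illustration, not necessarily NeuronDB's, and `majority_vote` is a hypothetical name.

```python
from collections import Counter
from typing import Sequence


def majority_vote(tree_labels: Sequence[int]) -> int:
    """Return the most common label among the trees; ties go to the smallest label."""
    counts = Counter(tree_labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


print(majority_vote([1, 0, 1]))  # 1
print(majority_vote([0, 1]))     # 0 (tie broken toward the smaller label)
```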

Model Evaluation

Evaluate your trained model on test data to get accuracy, precision, recall, and F1 score metrics (or MSE, MAE, and RMSE for regression models).

Evaluate model

-- Evaluate model and store the metrics in a temp table
DROP TABLE IF EXISTS rf_metrics_temp;
CREATE TEMP TABLE rf_metrics_temp AS
SELECT neurondb.evaluate(
    model_id,           -- model_id from neurondb.train()
    'test_table',       -- test data table
    'features',         -- feature column
    'label'             -- target column
) AS metrics            -- JSONB
FROM rf_model
LIMIT 1;

-- Display metrics (a key missing from the JSONB shows as NULL)
SELECT
    format('%-15s', v.metric) AS metric,
    ROUND((m.metrics ->> v.json_key)::numeric, 4) AS value
FROM rf_metrics_temp m
CROSS JOIN (VALUES
    (1, 'Accuracy',  'accuracy'),
    (2, 'Precision', 'precision'),
    (3, 'Recall',    'recall'),
    (4, 'F1 Score',  'f1_score')
) AS v(ord, metric, json_key)
ORDER BY v.ord;

Function Signature:

neurondb.evaluate(
    model_id       INTEGER,  -- Model ID from neurondb.train()
    table_name     TEXT,     -- Test data table name
    feature_column TEXT,     -- Feature column name
    target_column  TEXT      -- Target column name
) RETURNS JSONB              -- Returns metrics as JSONB

Returned Metrics (JSONB):

  • accuracy: Overall classification accuracy (0-1)
  • precision: Precision score (0-1)
  • recall: Recall score (0-1)
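  • f1_score: F1 score, the harmonic mean of precision and recall (0-1)
  • For regression models: mse (mean squared error), mae (mean absolute error), rmse (root mean squared error)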
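The metrics listed above have standard definitions. The Python sketch below computes them from true and predicted values, and its dictionary keys mirror the JSONB keys. Precision, recall, and F1 are computed in their binary form for a chosen positive class. This page does not say how neurondb.evaluate averages them over more than two classes, so the binary form is an assumption. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy plus binary precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1_score": f1}


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Mean squared error, mean absolute error, and root mean squared error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse)}


print(classification_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
print(regression_metrics([3.0, 5.0], [4.0, 3.0]))
```

RMSE is reported in the same units as the target column, which often makes it easier to interpret than MSE.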

Learn More

For detailed documentation on Random Forest parameters, hyperparameter tuning, feature importance, and GPU optimization, see the Random Forest Documentation.
