NeuronDB Documentation

Gradient Boosting

Overview

NeuronDB supports three gradient boosting frameworks (XGBoost, LightGBM, and CatBoost), all exposed through the same unified ML API for training, prediction, and evaluation.

XGBoost

XGBoost (Extreme Gradient Boosting) is a powerful gradient boosting framework. NeuronDB uses the unified ML API for XGBoost training and prediction.

Train XGBoost Model

Train XGBoost classifier

-- Train XGBoost classifier using unified API
CREATE TEMP TABLE xgb_model AS
SELECT neurondb.train(
    'default',              -- project name
    'xgboost',              -- algorithm
    'training_table',       -- source table
    'label',                -- target column (integer for classification)
    ARRAY['features'],      -- feature columns
    '{"max_depth": 6, "n_estimators": 100}'::jsonb  -- hyperparameters
)::integer AS model_id;

-- For regression, use numeric target column
CREATE TEMP TABLE xgb_reg_model AS
SELECT neurondb.train(
    'default',
    'xgboost',
    'training_table',
    'target',               -- numeric target column
    ARRAY['features'],
    '{}'::jsonb             -- use default hyperparameters
)::integer AS model_id;

Function Signature:

neurondb.train(
    project TEXT,
    algorithm TEXT,          -- 'xgboost'
    table_name TEXT,
    target_column TEXT,
    feature_columns TEXT[],
    hyperparameters JSONB
) RETURNS INTEGER            -- Returns model_id

Common XGBoost Hyperparameters:

  • max_depth (integer): Maximum tree depth. Default: 6. Range: 1-20.
  • n_estimators (integer): Number of boosting rounds. Default: 100.
  • learning_rate (float): Step size shrinkage. Default: 0.1. Range: 0.01-1.0.
  • subsample (float): Fraction of samples per tree. Default: 1.0. Range: 0.1-1.0.
  • colsample_bytree (float): Fraction of features per tree. Default: 1.0.
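The hyperparameters above can be combined in a single train call. For example, a more conservative configuration that trades per-round learning for more rounds and row/column subsampling (the values here are illustrative, not recommendations):

-- Illustrative: a more regularized XGBoost configuration
CREATE TEMP TABLE xgb_tuned_model AS
SELECT neurondb.train(
    'default',
    'xgboost',
    'training_table',
    'label',
    ARRAY['features'],
    '{"max_depth": 4, "n_estimators": 200, "learning_rate": 0.05,
      "subsample": 0.8, "colsample_bytree": 0.8}'::jsonb
)::integer AS model_id;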

LightGBM

LightGBM is a fast, distributed gradient boosting framework optimized for efficiency and accuracy.

Train LightGBM model

-- Train LightGBM classifier
CREATE TEMP TABLE lgbm_model AS
SELECT neurondb.train(
    'default',
    'lightgbm',            -- algorithm
    'training_table',
    'label',
    ARRAY['features'],
    '{"num_leaves": 31, "learning_rate": 0.1}'::jsonb
)::integer AS model_id;

Common LightGBM Hyperparameters:

  • num_leaves (integer): Maximum tree leaves. Default: 31.
  • learning_rate (float): Boosting learning rate. Default: 0.1.
  • feature_fraction (float): Random feature subset ratio. Default: 1.0.
  • bagging_fraction (float): Random data subset ratio. Default: 1.0.

CatBoost

CatBoost handles categorical features automatically and is robust to overfitting.

Train CatBoost model

-- Train CatBoost classifier
CREATE TEMP TABLE catboost_model AS
SELECT neurondb.train(
    'default',
    'catboost',            -- algorithm
    'training_table',
    'label',
    ARRAY['features'],
    '{}'::jsonb            -- use defaults
)::integer AS model_id;
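Every train call registers the model in the neurondb.ml_models catalog, which can be queried directly. A sketch, assuming the catalog exposes the model_id and algorithm columns used in the prediction examples on this page:

-- List the most recently trained gradient boosting models
SELECT model_id, algorithm
FROM neurondb.ml_models
WHERE algorithm IN ('xgboost', 'lightgbm', 'catboost')
ORDER BY model_id DESC
LIMIT 5;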

Prediction

Use the unified neurondb.predict function for all gradient boosting models:

Make predictions

-- Predict with trained XGBoost model
SELECT 
    id,
    neurondb.predict(
        (SELECT model_id FROM xgb_model),
        features
    ) AS prediction
FROM test_table
LIMIT 10;

-- Get model from catalog if needed
SELECT 
    id,
    neurondb.predict(
        (SELECT model_id FROM neurondb.ml_models 
         WHERE algorithm = 'xgboost' 
         ORDER BY model_id DESC LIMIT 1),
        features
    ) AS prediction
FROM test_table;

Function Signature:

neurondb.predict(
    model_id INTEGER,   -- Model ID from neurondb.train()
    features VECTOR     -- Feature vector
) RETURNS NUMERIC       -- Prediction value
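Because neurondb.predict returns a NUMERIC, a quick classification accuracy check can be written in plain SQL. This is a sketch, assuming integer class labels and that rounding the numeric prediction yields the predicted class:

-- Sketch: manual accuracy check for a trained classifier
SELECT
    avg((round(neurondb.predict(m.model_id, t.features)) = t.label)::int)
        AS accuracy
FROM test_table t,
     xgb_model m;       -- single-row temp table from the training example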

Model Evaluation

Evaluate gradient boosting models using the unified evaluation API:

Evaluate model

-- Evaluate XGBoost model
DO $$
DECLARE
    mid integer;
    metrics_result jsonb;
BEGIN
    -- Get model_id
    SELECT model_id INTO mid FROM xgb_model LIMIT 1;
    
    -- Evaluate
    metrics_result := neurondb.evaluate(
        mid,
        'test_table',
        'features',
        'label'
    );
    
    -- Display metrics
    RAISE NOTICE 'Accuracy: %', metrics_result->>'accuracy';
    RAISE NOTICE 'Precision: %', metrics_result->>'precision';
    RAISE NOTICE 'Recall: %', metrics_result->>'recall';
    RAISE NOTICE 'F1 Score: %', metrics_result->>'f1_score';
END $$;
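Because evaluation goes through the same unified API for every framework, models can be compared side by side in one query. A sketch, assuming the temp tables from the training examples above still exist in the current session:

-- Sketch: compare metrics across all three gradient boosting models
SELECT 'xgboost' AS algorithm,
       neurondb.evaluate(model_id, 'test_table', 'features', 'label') AS metrics
FROM xgb_model
UNION ALL
SELECT 'lightgbm',
       neurondb.evaluate(model_id, 'test_table', 'features', 'label')
FROM lgbm_model
UNION ALL
SELECT 'catboost',
       neurondb.evaluate(model_id, 'test_table', 'features', 'label')
FROM catboost_model;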

Learn More

For detailed documentation on gradient boosting algorithms, hyperparameter tuning, feature importance, and model comparison, visit: Gradient Boosting Documentation
