NeuronDB Documentation

Gradient Boosting

Overview

NeuronDB supports three gradient boosting frameworks (XGBoost, LightGBM, and CatBoost), all exposed through the same unified ML API for training, prediction, and evaluation.

XGBoost

XGBoost (Extreme Gradient Boosting) is a powerful gradient boosting framework. NeuronDB uses the unified ML API for XGBoost training and prediction.

Train XGBoost Model

Train XGBoost classifier

-- Train XGBoost classifier using unified API
CREATE TEMP TABLE xgb_model AS
SELECT neurondb.train(
    'default',              -- project name
    'xgboost',              -- algorithm
    'training_table',       -- source table
    'label',                -- target column (integer for classification)
    ARRAY['features'],      -- feature columns
    '{"max_depth": 6, "n_estimators": 100}'::jsonb  -- hyperparameters
)::integer AS model_id;

-- For regression, use numeric target column
CREATE TEMP TABLE xgb_reg_model AS
SELECT neurondb.train(
    'default',
    'xgboost',
    'training_table',
    'target',               -- numeric target column
    ARRAY['features'],
    '{}'::jsonb             -- use default hyperparameters
)::integer AS model_id;

Function Signature:

neurondb.train(
    project TEXT,
    algorithm TEXT,          -- 'xgboost'
    table_name TEXT,
    target_column TEXT,
    feature_columns TEXT[],
    hyperparameters JSONB
) RETURNS INTEGER            -- Returns model_id

Common XGBoost Hyperparameters:

  • max_depth (integer): Maximum tree depth. Default: 6. Range: 1-20.
  • n_estimators (integer): Number of boosting rounds. Default: 100.
  • learning_rate (float): Step size shrinkage. Default: 0.1. Range: 0.01-1.0.
  • subsample (float): Fraction of samples per tree. Default: 1.0. Range: 0.1-1.0.
  • colsample_bytree (float): Fraction of features per tree. Default: 1.0.
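The hyperparameters above can be combined in a single train call. For example, a more conservative configuration that trades per-round learning for more rounds and row/column subsampling (the values here are illustrative, not recommendations):

-- Illustrative: a more regularized XGBoost configuration
CREATE TEMP TABLE xgb_tuned_model AS
SELECT neurondb.train(
    'default',
    'xgboost',
    'training_table',
    'label',
    ARRAY['features'],
    '{"max_depth": 4, "n_estimators": 200, "learning_rate": 0.05,
      "subsample": 0.8, "colsample_bytree": 0.8}'::jsonb
)::integer AS model_id;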

LightGBM

LightGBM is a fast, distributed gradient boosting framework optimized for efficiency and accuracy.

Train LightGBM model

-- Train LightGBM classifier
CREATE TEMP TABLE lgbm_model AS
SELECT neurondb.train(
    'default',
    'lightgbm',            -- algorithm
    'training_table',
    'label',
    ARRAY['features'],
    '{"num_leaves": 31, "learning_rate": 0.1}'::jsonb
)::integer AS model_id;

Common LightGBM Hyperparameters:

  • num_leaves (integer): Maximum tree leaves. Default: 31.
  • learning_rate (float): Boosting learning rate. Default: 0.1.
  • feature_fraction (float): Random feature subset ratio. Default: 1.0.
  • bagging_fraction (float): Random data subset ratio. Default: 1.0.

CatBoost

CatBoost handles categorical features automatically and is robust to overfitting.

Train CatBoost model

-- Train CatBoost classifier
CREATE TEMP TABLE catboost_model AS
SELECT neurondb.train(
    'default',
    'catboost',            -- algorithm
    'training_table',
    'label',
    ARRAY['features'],
    '{}'::jsonb            -- use defaults
)::integer AS model_id;
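Every train call registers the model in the neurondb.ml_models catalog, which can be queried directly. A sketch, assuming the catalog exposes the model_id and algorithm columns used in the prediction examples on this page:

-- List the most recently trained gradient boosting models
SELECT model_id, algorithm
FROM neurondb.ml_models
WHERE algorithm IN ('xgboost', 'lightgbm', 'catboost')
ORDER BY model_id DESC
LIMIT 5;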

Prediction

Use the unified neurondb.predict function for all gradient boosting models:

Make predictions

-- Predict with trained XGBoost model
SELECT 
    id,
    neurondb.predict(
        (SELECT model_id FROM xgb_model),
        features
    ) AS prediction
FROM test_table
LIMIT 10;

-- Get model from catalog if needed
SELECT 
    id,
    neurondb.predict(
        (SELECT model_id FROM neurondb.ml_models 
         WHERE algorithm = 'xgboost' 
         ORDER BY model_id DESC LIMIT 1),
        features
    ) AS prediction
FROM test_table;

Function Signature:

neurondb.predict(
    model_id INTEGER,   -- Model ID from neurondb.train()
    features VECTOR     -- Feature vector
) RETURNS NUMERIC       -- Prediction value
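Because neurondb.predict returns a NUMERIC, a quick classification accuracy check can be written in plain SQL. This is a sketch, assuming integer class labels and that rounding the numeric prediction yields the predicted class:

-- Sketch: manual accuracy check for a trained classifier
SELECT
    avg((round(neurondb.predict(m.model_id, t.features)) = t.label)::int)
        AS accuracy
FROM test_table t,
     xgb_model m;       -- single-row temp table from the training example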

Model Evaluation

Evaluate gradient boosting models using the unified evaluation API:

Evaluate model

-- Evaluate XGBoost model
DO $$
DECLARE
    mid integer;
    metrics_result jsonb;
BEGIN
    -- Get model_id
    SELECT model_id INTO mid FROM xgb_model LIMIT 1;
    
    -- Evaluate
    metrics_result := neurondb.evaluate(
        mid,
        'test_table',
        'features',
        'label'
    );
    
    -- Display metrics
    RAISE NOTICE 'Accuracy: %', metrics_result->>'accuracy';
    RAISE NOTICE 'Precision: %', metrics_result->>'precision';
    RAISE NOTICE 'Recall: %', metrics_result->>'recall';
    RAISE NOTICE 'F1 Score: %', metrics_result->>'f1_score';
END $$;
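Because evaluation goes through the same unified API for every framework, models can be compared side by side in one query. A sketch, assuming the temp tables from the training examples above still exist in the current session:

-- Sketch: compare metrics across all three gradient boosting models
SELECT 'xgboost' AS algorithm,
       neurondb.evaluate(model_id, 'test_table', 'features', 'label') AS metrics
FROM xgb_model
UNION ALL
SELECT 'lightgbm',
       neurondb.evaluate(model_id, 'test_table', 'features', 'label')
FROM lgbm_model
UNION ALL
SELECT 'catboost',
       neurondb.evaluate(model_id, 'test_table', 'features', 'label')
FROM catboost_model;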

Learn More

For detailed documentation on gradient boosting algorithms, hyperparameter tuning, feature importance, and model comparison, visit: Gradient Boosting Documentation
