cybooster Documentation

CyBooster - A high-performance gradient boosting implementation using Cython

This package provides:

- BoosterRegressor: For regression tasks
- BoosterClassifier: For classification tasks

class cybooster.BoosterClassifier(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')

Bases: object

Booster classifier.

n_estimators

int number of boosting iterations.

learning_rate

float controls the learning speed at training time.
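As a rough illustration of how n_estimators and learning_rate interact in a boosting loop, the sketch below fits a simple least-squares line to the current residuals at each iteration and adds a shrunken copy of it to the running prediction. It is a generic, pure-Python sketch and not CyBooster's implementation; the helper names fit_line, boost and mse are hypothetical.

```python
def fit_line(x, r):
    """Least-squares line r ~ a + b*x, used here as a stand-in weak learner."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
    return mr - b * mx, b


def boost(x, y, n_estimators=100, learning_rate=0.1):
    """Generic boosting loop: each round fits the residuals of the previous one."""
    pred = [0.0] * len(y)
    learners = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        a, b = fit_line(x, residuals)
        learners.append((a, b))
        # learning_rate shrinks the contribution of every weak learner
        pred = [pi + learning_rate * (a + b * xi) for pi, xi in zip(pred, x)]
    return learners, pred


def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
_, pred_few = boost(x, y, n_estimators=5)
_, pred_many = boost(x, y, n_estimators=100)
```

With a small learning_rate, each iteration removes only a fraction of the remaining error, so more iterations are needed to reach the same fit.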

n_hidden_features

int number of nodes in successive hidden layers.

reg_lambda

float L2 regularization parameter for successive errors in the optimizer (at training time).

alpha

float compromise between L1 and L2 regularization (must be in [0, 1]), for solver == ‘enet’.
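One common way to read reg_lambda and alpha together is as an elastic net penalty that blends the L1 and L2 norms of the coefficients. The sketch below assumes that alpha = 1 means pure L1 and alpha = 0 means pure L2 (the glmnet convention); whether CyBooster uses this orientation or additional scaling factors is not stated here, so treat the function elastic_net_penalty as hypothetical.

```python
def elastic_net_penalty(coefs, reg_lambda=0.1, alpha=0.5):
    """Blend of L1 and L2 penalties; the orientation of alpha is an assumption."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(c) for c in coefs)   # lasso part
    l2 = sum(c * c for c in coefs)    # ridge part
    return reg_lambda * (alpha * l1 + (1.0 - alpha) * l2)


coefs = [1.0, -2.0]
pure_l1 = elastic_net_penalty(coefs, reg_lambda=1.0, alpha=1.0)
pure_l2 = elastic_net_penalty(coefs, reg_lambda=1.0, alpha=0.0)
blend = elastic_net_penalty(coefs, reg_lambda=1.0, alpha=0.5)
```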

row_sample

float percentage of rows chosen from the training set.

col_sample

float percentage of columns chosen from the training set.
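The following sketch shows one way row_sample and col_sample could select a subset of the training data. It assumes both values are fractions in (0, 1] (consistent with the defaults of 1) and that sampling is done without replacement; neither detail is confirmed by this reference, and subsample_indices is a hypothetical helper.

```python
import random


def subsample_indices(n, fraction, rng):
    """Pick round(fraction * n) distinct indices (at least one), sorted."""
    k = max(1, round(fraction * n))
    return sorted(rng.sample(range(n), k))


rng = random.Random(123)               # fixed seed for reproducibility
rows = subsample_indices(10, 0.8, rng)  # 8 of 10 rows
cols = subsample_indices(4, 0.5, rng)   # 2 of 4 columns
everything = subsample_indices(10, 1, rng)  # fraction 1 keeps all rows
```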

dropout

float percentage of hidden-layer nodes dropped during training.

tolerance

float controls early stopping in gradient descent (at training time).

direct_link

bool indicates whether the original features are included (True) in model’s fitting or not (False).

verbose

int displays a progress bar (1) or not (0).

seed

int reproducibility seed for nodes_sim == ‘uniform’, clustering and dropout.

backend

str type of backend; must be in (‘cpu’, ‘gpu’, ‘tpu’)

solver

str type of ‘weak’ learner; currently in (‘ridge’, ‘lasso’, ‘enet’). ‘enet’ is a combination of ‘ridge’ and ‘lasso’ called Elastic Net.

activation

str activation function: currently ‘relu’, ‘relu6’, ‘sigmoid’, ‘tanh’
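For reference, the four activation names correspond to the standard functions sketched below in pure Python. The dictionary ACTIVATIONS is a hypothetical lookup for illustration and is not part of the package.

```python
import math


def relu(z):
    return max(0.0, z)


def relu6(z):
    # ReLU clipped at 6
    return min(max(0.0, z), 6.0)


def sigmoid(z):
    # Numerically stable form that avoids overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    return math.tanh(z)


ACTIVATIONS = {"relu": relu, "relu6": relu6, "sigmoid": sigmoid, "tanh": tanh}
```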

n_clusters

int number of clusters for clustering the features

clustering_method

str clustering method: currently ‘kmeans’, ‘gmm’

cluster_scaling

str scaling method for clustering: currently ‘standard’, ‘robust’, ‘minmax’
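The three scaling names usually refer to the transformations sketched below, applied column by column: centering by the mean and dividing by the standard deviation, centering by the median and dividing by the interquartile range, and mapping to [0, 1]. These definitions follow common convention and are an assumption about what CyBooster applies before clustering; the function names are hypothetical.

```python
import statistics


def standard_scale(col):
    mu = statistics.fmean(col)
    sd = statistics.pstdev(col)   # assumes the column is not constant
    return [(v - mu) / sd for v in col]


def robust_scale(col):
    q1, med, q3 = statistics.quantiles(col, n=4, method="inclusive")
    return [(v - med) / (q3 - q1) for v in col]


def minmax_scale(col):
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]


col = [1.0, 2.0, 3.0, 4.0, 100.0]   # one large outlier
scaled_standard = standard_scale(col)
scaled_robust = robust_scale(col)
scaled_minmax = minmax_scale(col)
```

On this column the outlier squeezes the min-max values of the other points close to 0, while the robust version keeps them spread around the median.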

degree

int degree of features interactions to include in the model

weights_distr

str distribution of weights for constructing the model’s hidden layer; currently ‘uniform’, ‘gaussian’
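The parameters n_hidden_features, activation, weights_distr and direct_link suggest a randomized hidden layer: the inputs are multiplied by randomly drawn weights, passed through the activation, and optionally concatenated with the original features. The sketch below illustrates that reading in pure Python with a ReLU activation. The weight ranges, the function name hidden_layer, and the exact construction are assumptions, not the package's implementation.

```python
import random


def hidden_layer(X, n_hidden_features=5, weights_distr="uniform",
                 direct_link=True, seed=123):
    if weights_distr not in ("uniform", "gaussian"):
        raise ValueError("weights_distr must be 'uniform' or 'gaussian'")
    rng = random.Random(seed)
    n_cols = len(X[0])

    def draw():
        # Ranges below are illustrative assumptions
        if weights_distr == "uniform":
            return rng.uniform(-1.0, 1.0)
        return rng.gauss(0.0, 1.0)

    # One random weight per (input column, hidden node) pair
    W = [[draw() for _ in range(n_hidden_features)] for _ in range(n_cols)]
    features = []
    for row in X:
        hidden = [max(0.0, sum(row[j] * W[j][k] for j in range(n_cols)))
                  for k in range(n_hidden_features)]
        # direct_link keeps the original features next to the hidden ones
        features.append(list(row) + hidden if direct_link else hidden)
    return features


X = [[0.5, -1.0], [2.0, 0.3]]
with_link = hidden_layer(X, direct_link=True)
without_link = hidden_layer(X, direct_link=False)
```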

hist

bool indicates whether histogram features are used or not (default is False)

bins

int or str number of bins for histogram features (same as numpy.histogram, default is ‘auto’)

fit(self, double[:, ::1] X, long[:] y, obj=None)
predict(self, double[:, ::1] X)
predict_proba(self, double[:, ::1] X)
update(self, double[:] X, y, double alpha=0.5)

class cybooster.BoosterRegressor(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')

Bases: object

Booster regressor.

n_estimators

int number of boosting iterations.

learning_rate

float controls the learning speed at training time.

n_hidden_features

int number of nodes in successive hidden layers.

reg_lambda

float L2 regularization parameter for successive errors in the optimizer (at training time).

alpha

float compromise between L1 and L2 regularization (must be in [0, 1]), for solver == ‘enet’

row_sample

float percentage of rows chosen from the training set.

col_sample

float percentage of columns chosen from the training set.

dropout

float percentage of hidden-layer nodes dropped during training.

tolerance

float controls early stopping in gradient descent (at training time).
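A tolerance of this kind is typically used as a stopping rule: iterations end once the loss stops improving by more than the tolerance. The sketch below shows that rule on a one-dimensional gradient descent problem. The exact criterion used by CyBooster is not documented here, so the comparison on successive loss values and the helper name run_until_converged are assumptions.

```python
def run_until_converged(step, loss, state, max_iter=100, tolerance=1e-6):
    """Iterate `step` until the loss changes by less than `tolerance`."""
    previous = loss(state)
    for iteration in range(1, max_iter + 1):
        state = step(state)
        current = loss(state)
        if abs(previous - current) < tolerance:
            return state, iteration   # stopped early
        previous = current
    return state, max_iter


# Minimise (w - 3)^2 by gradient descent with step size 0.1
def loss(w):
    return (w - 3.0) ** 2


def step(w):
    return w - 0.1 * 2.0 * (w - 3.0)


w_final, n_iter = run_until_converged(step, loss, state=0.0)
_, n_iter_loose = run_until_converged(step, loss, state=0.0, tolerance=1e-2)
```

A looser tolerance stops sooner at the cost of a less precise solution.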

direct_link

bool indicates whether the original features are included (True) in model’s fitting or not (False).

verbose

int displays a progress bar (1) or not (0).

seed

int reproducibility seed for nodes_sim == ‘uniform’, clustering and dropout.

backend

str type of backend; must be in (‘cpu’, ‘gpu’, ‘tpu’)

solver

str type of ‘weak’ learner; currently in (‘ridge’, ‘lasso’)

activation

str activation function: currently ‘relu’, ‘relu6’, ‘sigmoid’, ‘tanh’

type_pi

str. type of prediction interval; currently “kde” (default) or “bootstrap”. Used only in self.predict, for self.replications > 0 and self.kernel in (‘gaussian’, ‘tophat’). Default is None.

replications

int. number of replications (if needed) for predictive simulation. Used only in self.predict, for self.kernel in (‘gaussian’, ‘tophat’) and self.type_pi = ‘kde’. Default is None.
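To give an idea of how replications can feed a prediction interval, the sketch below resamples training residuals around a point prediction and reads off empirical quantiles. This is a generic residual-bootstrap illustration only; how CyBooster builds its “kde” or “bootstrap” intervals is not specified here, and bootstrap_interval and its level argument are hypothetical.

```python
import random


def bootstrap_interval(point_pred, residuals, replications=250,
                       level=0.95, seed=123):
    """Empirical interval from point_pred + resampled residuals."""
    rng = random.Random(seed)
    sims = sorted(point_pred + rng.choice(residuals)
                  for _ in range(replications))
    lo_idx = int((1.0 - level) / 2.0 * (replications - 1))
    hi_idx = int((1.0 + level) / 2.0 * (replications - 1))
    return sims[lo_idx], sims[hi_idx]


residuals = [-1.0, -0.5, 0.0, 0.5, 1.0]   # illustrative training residuals
lower, upper = bootstrap_interval(10.0, residuals)
```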

n_clusters

int number of clusters for clustering the features

clustering_method

str clustering method: currently ‘kmeans’, ‘gmm’

cluster_scaling

str scaling method for clustering: currently ‘standard’, ‘robust’, ‘minmax’

degree

int degree of features interactions to include in the model
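One possible reading of degree is the highest order of feature products added to the inputs. The sketch below builds products of distinct features up to that order. Whether CyBooster also includes pure powers (such as squared features) is not stated, so this is an assumption and interaction_features is a hypothetical helper.

```python
from itertools import combinations
from math import prod


def interaction_features(row, degree=2):
    """Products of distinct features, for every order from 2 up to `degree`."""
    out = []
    for order in range(2, degree + 1):
        for idx in combinations(range(len(row)), order):
            out.append(prod(row[i] for i in idx))
    return out


row = [2.0, 3.0, 5.0]
pairs = interaction_features(row, degree=2)
up_to_triples = interaction_features(row, degree=3)
```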

weights_distr

str distribution of weights for constructing the model’s hidden layer; either ‘uniform’ or ‘gaussian’

hist

bool whether to use histogram features or not

bins

int or str number of bins for histogram features (same as numpy.histogram, default is ‘auto’)
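Since bins follows numpy.histogram, it accepts either an integer count or a rule name such as ‘auto’. The sketch below only demonstrates those two forms on a small array; how the resulting histogram is turned into model features inside CyBooster is not covered by this reference.

```python
import numpy as np

x = np.array([0.1, 0.4, 0.5, 0.9, 1.3, 2.2, 2.4, 3.8])

# Integer: that many equal-width bins over [x.min(), x.max()]
counts_4, edges_4 = np.histogram(x, bins=4)

# String: NumPy chooses the number of bins from the data
counts_auto, edges_auto = np.histogram(x, bins="auto")
```

In both cases the edges array has one more element than the counts array.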

fit(self, double[:, ::1] X, double[:] y)
predict(self, double[:, ::1] X)
update(self, double[:] X, y, double alpha=0.5)
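The typed signatures above are Cython memoryviews: double[:, ::1] accepts only a two-dimensional, C-contiguous buffer of float64 values, double[:] a one-dimensional float64 buffer, and long[:] (used for the classifier's y) a buffer of C long integers. Arrays with another dtype or memory layout are rejected when the method is called. The sketch below shows one way to prepare inputs with NumPy before calling fit or predict; it does not import cybooster, and the variable names are illustrative.

```python
import numpy as np

# Integer data in column-major (Fortran) order: not acceptable as double[:, ::1]
X_raw = np.asfortranarray([[1, 2], [3, 4], [5, 6]])

# Convert to float64 and row-major (C-contiguous) layout
X = np.ascontiguousarray(X_raw, dtype=np.float64)

# Regression targets: one-dimensional float64
y_reg = np.asarray([0.5, 1.5, 2.5], dtype=np.float64)

# Classification labels: C long, whose width depends on the platform
y_clf = np.asarray([0, 1, 1], dtype=np.dtype("l"))
```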