cybooster Documentation

CyBooster: a high-performance gradient boosting implementation using Cython.

This package provides:

- BoosterRegressor: for regression tasks
- BoosterClassifier: for classification tasks

class cybooster.BoosterClassifier(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')

Bases: object

Booster classifier.

n_estimators: int. Number of boosting iterations.

learning_rate: float. Controls the learning speed at training time.

n_hidden_features: int. Number of nodes in successive hidden layers.

reg_lambda: float. L2 regularization parameter for successive errors in the optimizer (at training time).

alpha: float. Compromise between L1 and L2 regularization (must be in [0, 1]), for solver == 'enet'.
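The role of `alpha` can be illustrated with the usual elastic net penalty. The sketch below assumes the common convention in which `alpha` weights the L1 term and `1 - alpha` the L2 term, both scaled by `reg_lambda`. The exact form used inside cybooster is not stated in this reference, and `elastic_net_penalty` is a hypothetical helper, not a cybooster function.

```python
def elastic_net_penalty(coefs, reg_lambda=0.1, alpha=0.5):
    """Elastic net penalty under an ASSUMED convention (not confirmed for cybooster):

        reg_lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1 = sum(abs(c) for c in coefs)        # lasso part
    l2_squared = sum(c * c for c in coefs)  # ridge part
    return reg_lambda * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


# alpha = 1 keeps only the L1 term, alpha = 0 only the L2 term.
lasso_only = elastic_net_penalty([1.0, -2.0], reg_lambda=1.0, alpha=1.0)
ridge_only = elastic_net_penalty([1.0, -2.0], reg_lambda=1.0, alpha=0.0)
```

Under this convention, intermediate values of `alpha` interpolate linearly between the two penalties.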

row_sample: float. Fraction of rows sampled from the training set (the default, 1, uses all rows).

col_sample: float. Fraction of columns sampled from the training set (the default, 1, uses all columns).
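Row and column subsampling can be pictured as follows. This is a minimal sketch that assumes sampling without replacement and rounding to the nearest count; `subsample` is a hypothetical helper, and cybooster's own sampling scheme may differ.

```python
import numpy as np


def subsample(X, row_sample=1.0, col_sample=1.0, seed=123):
    """Return a random sub-matrix of X plus the row and column indices kept.

    Assumption: indices are drawn without replacement.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    n_keep_rows = max(1, int(round(row_sample * n_rows)))
    n_keep_cols = max(1, int(round(col_sample * n_cols)))
    # Sorting keeps the original ordering of the retained rows and columns.
    rows = np.sort(rng.choice(n_rows, size=n_keep_rows, replace=False))
    cols = np.sort(rng.choice(n_cols, size=n_keep_cols, replace=False))
    return X[np.ix_(rows, cols)], rows, cols


X_demo = np.arange(40.0).reshape(10, 4)
X_sub, kept_rows, kept_cols = subsample(X_demo, row_sample=0.5, col_sample=0.5)
```

With both fractions left at 1, the helper returns the full matrix unchanged.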

dropout: float. Fraction of hidden-layer nodes dropped at training time (the default, 0, drops none).

tolerance: float. Controls early stopping in gradient descent (at training time).

direct_link: bool. Indicates whether the original features are included (True) in the model's fitting or not (False). The constructor takes it as an int (default 1).

verbose: int. Show a progress bar (1) or not (0); these are currently the only values.

seed: int. Reproducibility seed for nodes_sim == 'uniform', clustering and dropout.

backend: str. Type of backend; must be in ('cpu', 'gpu', 'tpu').

solver: str. Type of 'weak' learner; currently in ('ridge', 'lasso', 'enet'). 'enet' (Elastic Net) is a combination of 'ridge' and 'lasso'.

activation: str. Activation function; currently 'relu', 'relu6', 'sigmoid', 'tanh'.
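The four activation names have standard definitions, sketched below with NumPy. The dictionary name `ACTIVATIONS` is hypothetical and only serves the illustration; how cybooster implements these internally is not shown in this reference.

```python
import numpy as np

# Standard element-wise definitions of the listed activation functions.
ACTIVATIONS = {
    "relu": lambda z: np.maximum(z, 0.0),
    "relu6": lambda z: np.minimum(np.maximum(z, 0.0), 6.0),  # relu capped at 6
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
}

z = np.array([-1.0, 0.0, 3.0, 8.0])
activated = {name: fn(z) for name, fn in ACTIVATIONS.items()}
```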

n_clusters: int. Number of clusters for clustering the features.

clustering_method: str. Clustering method; currently 'kmeans', 'gmm'.

cluster_scaling: str. Scaling method for clustering; currently 'standard', 'robust', 'minmax'.
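The three scaling names are sketched below under their usual definitions (the ones used by scikit-learn's StandardScaler, RobustScaler and MinMaxScaler with default settings). That cybooster follows exactly these definitions is an assumption, and `scale_columns` is a hypothetical helper.

```python
import numpy as np


def scale_columns(X, method="standard"):
    """Column-wise scaling under ASSUMED conventional definitions.

    Note: a constant column would cause a division by zero; this sketch
    does not guard against that.
    """
    if method == "standard":  # zero mean, unit (population) standard deviation
        return (X - X.mean(axis=0)) / X.std(axis=0)
    if method == "minmax":    # rescale to [0, 1]
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / (hi - lo)
    if method == "robust":    # center on the median, scale by the interquartile range
        q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
        return (X - median) / (q3 - q1)
    raise ValueError("method must be 'standard', 'robust' or 'minmax'")


X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
X_minmax = scale_columns(X_demo, "minmax")
X_robust = scale_columns(X_demo, "robust")
```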

degree: int. Degree of feature interactions to include in the model.

weights_distr: str. Distribution of weights for constructing the model's hidden layer; currently 'uniform', 'gaussian'.
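To make `n_hidden_features`, `direct_link`, `activation` and `weights_distr` concrete, here is a minimal sketch of a randomized hidden layer. It assumes the hidden features are an activation applied to a random linear map of the inputs, with the original features concatenated when `direct_link` is true. The weight range, the use of 'relu', and the helper name `hidden_features` are all assumptions, not cybooster's documented behaviour.

```python
import numpy as np


def hidden_features(X, n_hidden_features=5, direct_link=True,
                    weights_distr="uniform", seed=123):
    """Build random hidden-layer features (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    if weights_distr == "uniform":
        # Assumed range [-1, 1); the actual range is not documented here.
        W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden_features))
    elif weights_distr == "gaussian":
        W = rng.standard_normal(size=(n_features, n_hidden_features))
    else:
        raise ValueError("weights_distr must be 'uniform' or 'gaussian'")
    H = np.maximum(X @ W, 0.0)  # 'relu' activation
    # With a direct link, the weak learner sees [original features | hidden features].
    return np.hstack([X, H]) if direct_link else H


X_demo = np.ones((4, 3))
with_link = hidden_features(X_demo, direct_link=True)
without_link = hidden_features(X_demo, direct_link=False)
```

In this picture, a weak learner fitted on `with_link` receives 3 + 5 columns, while one fitted on `without_link` receives only the 5 hidden columns.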

hist: bool. Indicates whether histogram features are used or not (default is False).

bins: int or str. Number of bins for histogram features (same as numpy.histogram, default is 'auto').
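Since `bins` is described as following numpy.histogram, the snippet below shows the two accepted kinds of value on a synthetic column. The final step, mapping each value to a bin index with `np.digitize`, is only one plausible way to derive a binned feature and is an assumption about what "histogram features" means here.

```python
import numpy as np

rng = np.random.default_rng(123)
feature = rng.normal(size=500)  # synthetic column, for illustration only

# bins='auto' lets NumPy choose the number of bins from the data.
counts_auto, edges_auto = np.histogram(feature, bins="auto")

# An integer requests that many equal-width bins.
counts_fixed, edges_fixed = np.histogram(feature, bins=10)

# Assumed use: replace each value by the index (0..9) of the bin it falls in.
bin_index = np.digitize(feature, edges_fixed[1:-1])
```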

fit(self, double[:, ::1] X, long[:] y, obj=None)

predict(self, double[:, ::1] X)

predict_proba(self, double[:, ::1] X)

update(self, double[:] X, y, double alpha=0.5)
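The typed signatures above constrain which NumPy arrays can be passed. In Cython, `double[:, ::1]` denotes a C-contiguous 2D buffer of float64 and `long[:]` a 1D buffer of C `long`, so arrays with another dtype or memory layout raise an error at call time. A minimal sketch of the conversion follows; `as_classifier_inputs` is a hypothetical helper and not part of cybooster.

```python
import numpy as np


def as_classifier_inputs(X, y):
    """Convert X and y to buffers matching `double[:, ::1] X` and `long[:] y`."""
    X_c = np.ascontiguousarray(X, dtype=np.float64)       # C order, float64
    # Type code 'l' is C long, whose width depends on the platform.
    y_c = np.ascontiguousarray(y, dtype=np.dtype("l"))
    if X_c.ndim != 2:
        raise ValueError("X must be 2-dimensional")
    if y_c.ndim != 1 or y_c.shape[0] != X_c.shape[0]:
        raise ValueError("y must be 1-dimensional with one label per row of X")
    return X_c, y_c


# A Fortran-ordered integer matrix is converted to C-ordered float64.
X_raw = np.asfortranarray(np.arange(6).reshape(3, 2))
X_ready, y_ready = as_classifier_inputs(X_raw, [0, 1, 0])
```

The converted arrays can then be passed to `fit`, `predict` and `predict_proba`.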

class cybooster.BoosterRegressor(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')

Bases: object

Booster regressor.

n_estimators: int. Number of boosting iterations.

learning_rate: float. Controls the learning speed at training time.

n_hidden_features: int. Number of nodes in successive hidden layers.

reg_lambda: float. L2 regularization parameter for successive errors in the optimizer (at training time).

alpha: float. Compromise between L1 and L2 regularization (must be in [0, 1]), for solver == 'enet'.

row_sample: float. Fraction of rows sampled from the training set (the default, 1, uses all rows).

col_sample: float. Fraction of columns sampled from the training set (the default, 1, uses all columns).

dropout: float. Fraction of hidden-layer nodes dropped at training time (the default, 0, drops none).

tolerance: float. Controls early stopping in gradient descent (at training time).
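How `n_estimators`, `learning_rate`, `reg_lambda` and `tolerance` interact can be sketched with a bare-bones residual-boosting loop. This is an illustration under stated assumptions, not cybooster's algorithm: the weak learner is a plain ridge regression on centered inputs (no hidden features), and the stopping rule, which halts once the residual norm improves by less than `tolerance`, is assumed. The function names are hypothetical.

```python
import numpy as np


def fit_boosted_ridge(X, y, n_estimators=100, learning_rate=0.1,
                      reg_lambda=0.1, tolerance=1e-6):
    """Boost ridge regressions on successive residuals (illustrative sketch)."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    baseline = y.mean()
    residuals = y - baseline
    gram = Xc.T @ Xc + reg_lambda * np.eye(X.shape[1])  # ridge normal equations
    coef = np.zeros(X.shape[1])
    prev_norm = np.linalg.norm(residuals)
    n_iter = 0
    for _ in range(n_estimators):
        beta = np.linalg.solve(gram, Xc.T @ residuals)   # fit the current residuals
        coef += learning_rate * beta                     # shrunken update
        residuals = residuals - learning_rate * (Xc @ beta)
        n_iter += 1
        norm = np.linalg.norm(residuals)
        if prev_norm - norm < tolerance:                 # assumed early-stopping rule
            break
        prev_norm = norm
    return {"x_mean": x_mean, "baseline": baseline, "coef": coef, "n_iter": n_iter}


def predict_boosted_ridge(model, X):
    return model["baseline"] + (X - model["x_mean"]) @ model["coef"]


rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 2))
y_demo = X_demo @ np.array([1.0, -2.0]) + 3.0
model = fit_boosted_ridge(X_demo, y_demo)
pred = predict_boosted_ridge(model, X_demo)
```

Because every weak learner here is linear, the ensemble collapses into a single coefficient vector. A smaller `learning_rate` shrinks each step, so more iterations are needed to reach the same fit.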

direct_link: bool. Indicates whether the original features are included (True) in the model's fitting or not (False).

verbose: int. Show a progress bar (1) or not (0); these are currently the only values.

seed: int. Reproducibility seed for nodes_sim == 'uniform', clustering and dropout.

backend: str. Type of backend; must be in ('cpu', 'gpu', 'tpu').

solver: str. Type of 'weak' learner; currently in ('ridge', 'lasso').

activation: str. Activation function; currently 'relu', 'relu6', 'sigmoid', 'tanh'.

type_pi: str. Type of prediction interval; currently 'kde' (default) or 'bootstrap'. Used only in self.predict, for self.replications > 0 and self.kernel in ('gaussian', 'tophat'). Default is None.

replications: int. Number of replications (if needed) for predictive simulation. Used only in self.predict, for self.kernel in ('gaussian', 'tophat') and self.type_pi = 'kde'. Default is None.
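The idea behind simulation-based prediction intervals can be sketched with a residual bootstrap. The code assumes that intervals come from adding resampled residuals to the point predictions and taking empirical percentiles over the replications. The helper `bootstrap_interval` and its `level` argument are hypothetical, and cybooster's 'kde' and 'bootstrap' procedures may differ in detail.

```python
import numpy as np


def bootstrap_interval(point_pred, residuals, replications=250, level=95, seed=123):
    """Empirical prediction interval from resampled residuals (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # One row per replication, one column per prediction.
    draws = rng.choice(residuals, size=(replications, point_pred.shape[0]), replace=True)
    simulations = point_pred[None, :] + draws
    tail = (100 - level) / 2
    lower = np.percentile(simulations, tail, axis=0)
    upper = np.percentile(simulations, 100 - tail, axis=0)
    return lower, upper


point_pred = np.array([0.0, 10.0])
residuals = np.array([-1.0, 0.0, 1.0])
lower, upper = bootstrap_interval(point_pred, residuals)
```

More replications give smoother percentile estimates at the cost of extra computation.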

n_clusters: int. Number of clusters for clustering the features.

clustering_method: str. Clustering method; currently 'kmeans', 'gmm'.

cluster_scaling: str. Scaling method for clustering; currently 'standard', 'robust', 'minmax'.

degree: int. Degree of feature interactions to include in the model.

weights_distr: str. Distribution of weights for constructing the model's hidden layer; either 'uniform' or 'gaussian'.

hist: bool. Whether to use histogram features or not.

bins: int or str. Number of bins for histogram features (same as numpy.histogram, default is 'auto').

fit(self, double[:, ::1] X, double[:] y)

predict(self, double[:, ::1] X)

update(self, double[:] X, y, double alpha=0.5)
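A typical fit-then-predict call sequence for the regressor might look like the sketch below. The helper `fit_and_predict` is hypothetical. It takes the booster class as an argument so that it only relies on the constructor and method signatures listed above, and it converts inputs to match `double[:, ::1] X` and `double[:] y`. Treating `obj` as a scikit-learn-style base learner is an assumption, since this reference does not describe that argument.

```python
import numpy as np


def fit_and_predict(booster_cls, base_learner, X_train, y_train, X_test):
    """Fit a booster and predict, converting inputs to the documented buffer types."""
    # double[:, ::1] -> C-contiguous float64 matrix; double[:] -> 1D float64 vector
    X_train = np.ascontiguousarray(X_train, dtype=np.float64)
    X_test = np.ascontiguousarray(X_test, dtype=np.float64)
    y_train = np.ascontiguousarray(y_train, dtype=np.float64).ravel()
    # `base_learner` fills the `obj` argument (ASSUMED to be a base learner).
    model = booster_cls(base_learner, n_estimators=100, learning_rate=0.1)
    model.fit(X_train, y_train)
    return np.asarray(model.predict(X_test))


# Assumed usage (requires cybooster and scikit-learn to be installed):
# from cybooster import BoosterRegressor
# from sklearn.linear_model import Ridge
# preds = fit_and_predict(BoosterRegressor, Ridge(), X_train, y_train, X_test)
```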