cybooster Documentation¶
CyBooster - A high-performance gradient boosting implementation using Cython
This package provides:
- BoosterRegressor: for regression tasks
- BoosterClassifier: for classification tasks
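Both estimators expose typed Cython memoryview signatures (double[:, ::1] for X, long[:] or double[:] for y), so inputs are expected to be C-contiguous NumPy arrays of the matching dtype. Below is a minimal preparation sketch under that assumption (it is not taken from the package's own examples):

```python
import numpy as np

from cybooster import BoosterClassifier, BoosterRegressor

rng = np.random.default_rng(123)

# X must be C-contiguous float64 to match double[:, ::1]
X = np.ascontiguousarray(rng.normal(size=(100, 5)), dtype=np.float64)

# class labels as integers for BoosterClassifier (long[:], typically int64 on 64-bit platforms),
# continuous targets as float64 for BoosterRegressor (double[:])
y_classes = rng.integers(0, 2, size=100).astype(np.int64)
y_targets = rng.normal(size=100).astype(np.float64)
```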
- class cybooster.BoosterClassifier(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')¶
Bases: object
Booster classifier.
- n_estimators¶
int number of boosting iterations.
- learning_rate¶
float controls the learning speed at training time.
- n_hidden_features¶
int number of nodes in successive hidden layers.
- reg_lambda¶
float L2 regularization parameter for successive errors in the optimizer (at training time).
- alpha¶
float compromise between L1 and L2 regularization (must be in [0, 1]), for solver == ‘enet’.
- row_sample¶
float percentage of rows chosen from the training set.
- col_sample¶
float percentage of columns chosen from the training set.
- dropout¶
float percentage of hidden-layer nodes dropped out at training time.
- tolerance¶
float controls early stopping in gradient descent (at training time).
- direct_link¶
bool indicates whether the original features are included in the model’s fitting (True) or not (False).
- verbose¶
int whether to display a progress bar (1) or not (0).
- seed¶
int reproducibility seed for nodes_sim==’uniform’, clustering and dropout.
- backend¶
str type of backend; must be in (‘cpu’, ‘gpu’, ‘tpu’)
- solver¶
str type of ‘weak’ learner; currently in (‘ridge’, ‘lasso’, ‘enet’). ‘enet’ is a combination of ‘ridge’ and ‘lasso’ called Elastic Net.
- activation¶
str activation function: currently ‘relu’, ‘relu6’, ‘sigmoid’, ‘tanh’
- n_clusters¶
int number of clusters for clustering the features
- clustering_method¶
str clustering method: currently ‘kmeans’, ‘gmm’
- cluster_scaling¶
str scaling method for clustering: currently ‘standard’, ‘robust’, ‘minmax’
- degree¶
int degree of features interactions to include in the model
- weights_distr¶
str distribution of weights for constructing the model’s hidden layer; currently ‘uniform’, ‘gaussian’
- hist¶
bool indicates whether histogram features are used or not (default is False)
- bins¶
int or str number of bins for histogram features (same as numpy.histogram, default is ‘auto’)
- fit(self, double[:, ::1] X, long[:] y, obj=None)¶
- predict(self, double[:, ::1] X)¶
- predict_proba(self, double[:, ::1] X)¶
- update(self, double[:] X, y, double alpha=0.5)¶
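A usage sketch for BoosterClassifier. Assumptions not stated in this reference: cybooster is installed, obj is a scikit-learn-style regressor used as the weak (base) learner, and labels are cast to int64 to match long[:]; the hyperparameter values are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

from cybooster import BoosterClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

# match the Cython memoryview signatures: double[:, ::1] for X, long[:] for y
X_train = np.ascontiguousarray(X_train, dtype=np.float64)
X_test = np.ascontiguousarray(X_test, dtype=np.float64)
y_train = np.asarray(y_train, dtype=np.int64)

# obj: base learner boosted at each iteration -- assumed to follow the
# scikit-learn fit/predict API; DecisionTreeRegressor is only an example choice
clf = BoosterClassifier(obj=DecisionTreeRegressor(max_depth=3),
                        n_estimators=50, learning_rate=0.1, seed=123)
clf.fit(X_train, y_train)

preds = clf.predict(X_test)        # predicted class labels
probs = clf.predict_proba(X_test)  # class membership probabilities
print("accuracy:", float(np.mean(preds == y_test)))
```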
- class cybooster.BoosterRegressor(obj, int n_estimators=100, double learning_rate=0.1, int n_hidden_features=5, double reg_lambda=0.1, double alpha=0.5, double row_sample=1, double col_sample=1, double dropout=0, double tolerance=1e-6, int direct_link=1, int verbose=1, int seed=123, str backend='cpu', str activation='relu', str weights_distr='uniform')¶
Bases: object
Booster regressor.
- n_estimators¶
int number of boosting iterations.
- learning_rate¶
float controls the learning speed at training time.
- n_hidden_features¶
int number of nodes in successive hidden layers.
- reg_lambda¶
float L2 regularization parameter for successive errors in the optimizer (at training time).
- alpha¶
float compromise between L1 and L2 regularization (must be in [0, 1]), for solver == ‘enet’
- row_sample¶
float percentage of rows chosen from the training set.
- col_sample¶
float percentage of columns chosen from the training set.
- dropout¶
float percentage of hidden-layer nodes dropped out at training time.
- tolerance¶
float controls early stopping in gradient descent (at training time).
- direct_link¶
bool indicates whether the original features are included in the model’s fitting (True) or not (False).
- verbose¶
int whether to display a progress bar (1) or not (0).
- seed¶
int reproducibility seed for nodes_sim==’uniform’, clustering and dropout.
- backend¶
str type of backend; must be in (‘cpu’, ‘gpu’, ‘tpu’)
- solver¶
str type of ‘weak’ learner; currently in (‘ridge’, ‘lasso’)
- activation¶
str activation function: currently ‘relu’, ‘relu6’, ‘sigmoid’, ‘tanh’
- type_pi¶
str type of prediction interval; currently ‘kde’ or ‘bootstrap’. Used only in self.predict, when self.replications > 0 and self.kernel is in (‘gaussian’, ‘tophat’). Default is None.
- replications¶
int number of replications (if needed) for predictive simulation. Used only in self.predict, when self.kernel is in (‘gaussian’, ‘tophat’) and self.type_pi == ‘kde’. Default is None.
- n_clusters¶
int number of clusters for clustering the features
- clustering_method¶
str clustering method: currently ‘kmeans’, ‘gmm’
- cluster_scaling¶
str scaling method for clustering: currently ‘standard’, ‘robust’, ‘minmax’
- degree¶
int degree of features interactions to include in the model
- weights_distr¶
str distribution of weights for constructing the model’s hidden layer; either ‘uniform’ or ‘gaussian’
- hist¶
bool whether to use histogram features or not
- bins¶
int or str number of bins for histogram features (same as numpy.histogram, default is ‘auto’)
- fit(self, double[:, ::1] X, double[:] y)¶
- predict(self, double[:, ::1] X)¶
- update(self, double[:] X, y, double alpha=0.5)¶
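A corresponding sketch for BoosterRegressor, including a single-observation call to update. The choice of base learner and the reading of update as an online refit on one new row are assumptions for illustration, not prescriptions from this reference.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

from cybooster import BoosterRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

# fit expects double[:, ::1] X and double[:] y
X_train = np.ascontiguousarray(X_train, dtype=np.float64)
X_test = np.ascontiguousarray(X_test, dtype=np.float64)
y_train = np.asarray(y_train, dtype=np.float64)

reg = BoosterRegressor(obj=DecisionTreeRegressor(max_depth=3),
                       n_estimators=100, learning_rate=0.1, seed=123)
reg.fit(X_train, y_train)

y_pred = reg.predict(X_test)
rmse = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))
print("test RMSE:", rmse)

# update takes one observation (double[:] X, scalar y) -- assumed online update
x_new = np.ascontiguousarray(X_test[0], dtype=np.float64)
reg.update(x_new, float(y_test[0]), alpha=0.5)
```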