GradientBoostingClassifier

Gradient boosting is a machine learning method that makes predictions with an ensemble of weak estimators (typically shallow decision trees), each trained in sequence to correct the errors of the ensemble built so far. GradientBoostingClassifier is an implementation of gradient boosting for classification tasks.
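
The core idea can be illustrated with a small, self-contained TypeScript sketch. It is not how datacook implements GradientBoostingClassifier: it handles only binary 0/1 labels, uses depth-1 trees ("stumps") instead of deeper trees, and the shrinkage factor `learningRate` and every function name below are assumptions of the sketch.

```typescript
// Sketch: gradient boosting for binary labels (0/1) with depth-1 regression
// trees ("stumps") fitted to the negative gradient of the log loss.
// Illustrative only; this is not the datacook implementation.

interface Stump {
  feature: number;
  threshold: number;
  leftValue: number; // output when x[feature] <= threshold
  rightValue: number; // output when x[feature] > threshold
}

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));
const mean = (v: number[]): number => v.reduce((a, b) => a + b, 0) / v.length;
const sse = (v: number[], m: number): number => v.reduce((a, b) => a + (b - m) ** 2, 0);

// Choose the single split that minimizes the squared error of the residuals.
function fitStump(x: number[][], r: number[]): Stump {
  const m = mean(r);
  // Fallback: a constant stump (threshold Infinity sends every sample left).
  let best: Stump = { feature: 0, threshold: Infinity, leftValue: m, rightValue: m };
  let bestErr = sse(r, m);
  for (let f = 0; f < x[0].length; f++) {
    for (const row of x) {
      const t = row[f];
      const left: number[] = [];
      const right: number[] = [];
      x.forEach((xi, i) => (xi[f] <= t ? left : right).push(r[i]));
      if (left.length === 0 || right.length === 0) continue;
      const lm = mean(left);
      const rm = mean(right);
      const err = sse(left, lm) + sse(right, rm);
      if (err < bestErr) {
        bestErr = err;
        best = { feature: f, threshold: t, leftValue: lm, rightValue: rm };
      }
    }
  }
  return best;
}

const stumpPredict = (s: Stump, xi: number[]): number =>
  xi[s.feature] <= s.threshold ? s.leftValue : s.rightValue;

interface BoostedModel {
  init: number;
  learningRate: number;
  stumps: Stump[];
}

function fitBoosting(x: number[][], y: number[], nEstimators = 100, learningRate = 0.1): BoostedModel {
  const p = mean(y);
  if (p <= 0 || p >= 1) throw new Error("y must contain both classes 0 and 1");
  const init = Math.log(p / (1 - p)); // log-odds of the positive class
  const scores = x.map(() => init);
  const stumps: Stump[] = [];
  for (let m = 0; m < nEstimators; m++) {
    // Negative gradient of the log loss with respect to the raw score.
    const residuals = y.map((yi, i) => yi - sigmoid(scores[i]));
    const s = fitStump(x, residuals);
    stumps.push(s);
    // Each new stump nudges the raw scores, scaled by the learning rate.
    x.forEach((xi, i) => {
      scores[i] += learningRate * stumpPredict(s, xi);
    });
  }
  return { init, learningRate, stumps };
}

function predictProba(model: BoostedModel, xi: number[]): number {
  const raw = model.stumps.reduce((acc, s) => acc + model.learningRate * stumpPredict(s, xi), model.init);
  return sigmoid(raw);
}

const predict = (model: BoostedModel, x: number[][]): number[] =>
  x.map((xi) => (predictProba(model, xi) >= 0.5 ? 1 : 0));
```

For example, `fitBoosting([[1], [2], [3], [4], [5], [6]], [0, 0, 0, 1, 1, 1], 20)` learns a split at x = 3, and `predict` then labels `[[0], [10]]` as `[0, 1]`. Leaf values here are plain residual means to keep the sketch short; some implementations, such as scikit-learn's, refine each leaf with a Newton step on the log loss instead.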

Import

import * as DataCook from '@pipcook/datacook';
const { GradientBoostingClassifier } = DataCook.Model;

Constructor

const gb = new GradientBoostingClassifier({ nEstimators: 10 });

Option parameters

Parameter	Type	Description
nEstimators	number	Number of estimators (boosting stages) to fit, default = 100
criterion
minSamplesLeaf	number	The minimum number of samples required at a leaf node, default = 1
minImpurityDecrease	number	A node will be split if this split induces a decrease of the impurity greater than or equal to this value
minWeightFractionLeaf
minSamplesSplit	number	Minimum number of samples required to split an internal node, default = 2
validationFraction	number	The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if `nIterNoChange` is set to an integer.
ccpAlpha	number	Complexity parameter used for Minimal Cost-complexity Pruning. The subtree with the largest cost complexity that is smaller than `ccpAlpha` will be chosen. By default, no pruning is performed.
maxDepth	number	Maximum depth of the individual regression trees, default = 3
maxFeatures	number, or {“auto”, “sqrt”, “log2”}	The number of features to consider when looking for the best split: - If an integer, consider `maxFeatures` features at each split. - If not an integer, `maxFeatures` is a fraction and `Math.floor(maxFeatures * nFeatures)` features are considered at each split. - If “auto”, then `maxFeatures=sqrt(nFeatures)`. - If “sqrt”, then `maxFeatures=sqrt(nFeatures)`. - If “log2”, then `maxFeatures=log2(nFeatures)`. - If not set, then `maxFeatures=nFeatures`.
tol	number	Tolerance for early stopping. When the loss does not improve by at least `tol` for `nIterNoChange` iterations (if `nIterNoChange` is set to a number), training stops, default = 1e-4
nIterNoChange	number	Decides whether early stopping is used to terminate training when the validation score is not improving. By default it is not set (`none`), which disables early stopping. If set to a number, a `validationFraction` share of the training data is set aside for validation, and training stops when the validation score has not improved in any of the previous `nIterNoChange` iterations. The split is stratified.
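
The `maxFeatures` rules and the `tol`/`nIterNoChange` stopping rule from the table can be sketched as follows. Both functions are hypothetical helpers, not part of the datacook API. Rounding the `sqrt` and `log2` results down and enforcing at least one feature are assumptions (the table does not say how non-integer results are handled), and the stopping rule is one plausible reading of the `tol` description rather than datacook's exact logic.

```typescript
// Hypothetical helpers illustrating two options from the table above.
// Not part of the datacook API.

type MaxFeatures = number | "auto" | "sqrt" | "log2";

// Number of features examined per split. Flooring and the minimum of 1
// are assumptions of this sketch.
function resolveMaxFeatures(maxFeatures: MaxFeatures | undefined, nFeatures: number): number {
  let k: number;
  if (maxFeatures === undefined) {
    k = nFeatures; // not set: consider every feature
  } else if (maxFeatures === "auto" || maxFeatures === "sqrt") {
    k = Math.floor(Math.sqrt(nFeatures));
  } else if (maxFeatures === "log2") {
    k = Math.floor(Math.log2(nFeatures));
  } else if (Number.isInteger(maxFeatures)) {
    k = maxFeatures; // an absolute count of features
  } else {
    k = Math.floor(maxFeatures * nFeatures); // a fraction of nFeatures
  }
  return Math.max(1, k);
}

// One plausible early-stopping rule: stop once the best validation loss of the
// last `nIterNoChange` iterations fails to beat the best earlier loss by `tol`.
function shouldStopEarly(valLosses: number[], nIterNoChange: number, tol = 1e-4): boolean {
  if (valLosses.length <= nIterNoChange) return false; // not enough history yet
  const earlier = Math.min(...valLosses.slice(0, valLosses.length - nIterNoChange));
  const recent = Math.min(...valLosses.slice(-nIterNoChange));
  return recent > earlier - tol;
}
```

A training loop would call `shouldStopEarly` after each boosting stage, passing the losses measured so far on the `validationFraction` hold-out set.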

Methods

fit

Fit the gradient boosting classifier to training data.

Syntax

async fit(xData: number[][] | Tensor2D, yData: number[] | string[] | boolean[] | Tensor1D): Promise<void>

Parameters

Parameter	Type	Description
xData	Tensor2D | number[][]	Input data of shape (nSamples, nFeatures), as an array or tensor
yData	Tensor1D | number[] | string[] | boolean[]	Target labels, one per sample
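
Because `yData` may hold numbers, strings, or booleans, a classifier has to map labels to class indices before training. The sketch below shows one way to do that and to check that `xData` and `yData` agree in shape. `encodeLabels`, `checkFitInputs`, and the choice to sort classes are assumptions of the sketch, not documented datacook behavior.

```typescript
type Label = number | string | boolean;

// Numbers sort numerically; strings and booleans sort by their string form
// (so false comes before true).
function compareLabels(a: Label, b: Label): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

// Map each label to the index of its class in the sorted list of distinct classes.
function encodeLabels(y: Label[]): { classes: Label[]; indices: number[] } {
  const classes = Array.from(new Set(y)).sort(compareLabels);
  const index = new Map<Label, number>(classes.map((c, i) => [c, i] as [Label, number]));
  return { classes, indices: y.map((label) => index.get(label) as number) };
}

// xData must be (nSamples, nFeatures) with equal-length rows; yData needs one label per row.
function checkFitInputs(xData: number[][], yData: Label[]): void {
  if (xData.length === 0) throw new Error("xData is empty");
  const nFeatures = xData[0].length;
  if (xData.some((row) => row.length !== nFeatures)) {
    throw new Error("all rows of xData must have nFeatures columns");
  }
  if (yData.length !== xData.length) {
    throw new Error(`expected ${xData.length} labels, got ${yData.length}`);
  }
}
```

For instance, `encodeLabels(["dog", "cat", "dog"])` yields classes `["cat", "dog"]` and indices `[1, 0, 1]`.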

predict

Make predictions using gradient boosting model.

async predict(xData: Tensor|RecursiveArray<number>): Promise<Tensor>

Parameters

Parameter	Type	Description
xData	Tensor | RecursiveArray<number>	Input data to predict on, one row per sample

Returns

Promise resolving to a tensor of predictions, one per input sample
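
Internally, a boosted classifier produces raw scores (one per class in multiclass problems) that must be turned into predictions. A common rule, sketched below, applies softmax to each row of scores and picks the most probable class. Whether datacook's `predict` returns class labels or class indices is not stated here, so `classes` and `scoresToLabels` are hypothetical.

```typescript
// Softmax with max-subtraction for numerical stability.
function softmax(scores: number[]): number[] {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Convert per-class raw scores (one row per sample) into class labels.
function scoresToLabels<T>(rawScores: number[][], classes: T[]): T[] {
  return rawScores.map((row) => {
    const probs = softmax(row);
    let best = 0;
    for (let k = 1; k < probs.length; k++) {
      if (probs[k] > probs[best]) best = k;
    }
    return classes[best];
  });
}
```

Since softmax preserves ordering, the argmax of the probabilities equals the argmax of the raw scores; computing probabilities is only necessary when they are also reported.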

fromJson

Load model parameters from a JSON string.

async fromJson(modelJson: string)

Parameters

Parameter	Type	Description
modelJson	string	Model serialized as a JSON string (for example, the output of `toJson`)

toJson

Export model parameters as a JSON string.

async toJson(): Promise<string>

Returns

Promise resolving to the model serialized as a JSON string
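
Saving and restoring a model is a round trip through these two methods: serialize with `toJson`, then load the string into a fresh instance with `fromJson`. The sketch below shows that round trip on a toy model whose async methods are shaped like the documented ones. `TinyBoostedModel` and its state layout are hypothetical and say nothing about the JSON format datacook actually produces.

```typescript
// Hypothetical serialized form: an initial raw score plus weighted stumps.
interface StumpState {
  feature: number;
  threshold: number;
  left: number;
  right: number;
}

interface ModelState {
  init: number;
  learningRate: number;
  stumps: StumpState[];
}

// A toy model exposing async toJson/fromJson methods.
class TinyBoostedModel {
  private state: ModelState;

  constructor(state: ModelState = { init: 0, learningRate: 0.1, stumps: [] }) {
    this.state = state;
  }

  // Raw score = init + learningRate * (sum of stump outputs).
  rawScore(x: number[]): number {
    return this.state.stumps.reduce(
      (acc, s) => acc + this.state.learningRate * (x[s.feature] <= s.threshold ? s.left : s.right),
      this.state.init,
    );
  }

  async toJson(): Promise<string> {
    return JSON.stringify(this.state);
  }

  async fromJson(modelJson: string): Promise<void> {
    const parsed = JSON.parse(modelJson) as ModelState;
    // Minimal validation before replacing the current state.
    if (typeof parsed.init !== "number" || !Array.isArray(parsed.stumps)) {
      throw new Error("modelJson is not a valid model");
    }
    this.state = parsed;
  }
}
```

One caveat applies to any JSON-based format: `JSON.stringify` writes `Infinity` and `NaN` as `null`, so model state containing such values (for example, an unbounded split threshold) must be encoded differently to survive the round trip.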