Kmeans
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
Import
import * as Datacook from 'datacook';
const { KMeans } = DataCook.Model;
Constructor
const kmeans = new KMeans({ nCluster = 5, init = 'kmeans++' });
Option parameters
parameter | type | description |
---|---|---|
nCluster | number | The number of clusters to form as well as the number of centroids,default=8 |
init | “random”| “kmeans++” | Centroids initialize method - ‘kmeans++’: select initial cluster centroids using kmeans++ - ‘random’: randomly select initial centroids default=”kmeans++” |
nInit | number | Number of time the algorithm will be run with different initialization,default=10 |
maxIterTimes | number | Maximum number of iterations of the k-means algorithm for a single run.default=1000 |
tol | number | Relative tolerance with regards to Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence.default=1e-5 |
verbose | boolean | verbosity mode,default=false |
Methods
async fit(xData)
Fit kmeans model
Parameters
Parameter | type | description |
---|---|---|
xData | Tensor| number[][] | input data of shape (nSamples,nFeatures) in type of array or tensor |
Returns
tf.Tensor
async predict(xData)
Predict sample clusters for given input.
Parameters
parameter | type | description |
---|---|---|
xData | Tensor| number[][] | input data of shape (nSamples, nFeatures) in type of array or tensor |
Returns
tf.Tensor
async score(xData)
Get scores for input xData on the kmeans model. score = -inertia, larger score usually represent better fit.
Parameters
parameter | type | description |
---|---|---|
xData | Tensor| number[][] | input data of shape (nSamples, nFeatures) in type of array or tensor |
Returns
tensor of -inertia
async trainOnBatch(xData: FeatureInputType)
Train kmeans model by batch. Here we apply mini-batch kmeans algorithm to update centroids in each iteration. The return value is inertia copmuted for input batch.
parameter | type | description |
---|---|---|
xData | Tensor| number[][] | input data of shape (nSamples, nFeatures) in type of array or tensor |
Returns
inertia for input batch data
Examples
Basic Usage
import * as Datacook from 'datacook';
const { KMeans } = DataCook.Model;
const xData = [
[1, 2], [1, 4], [1, 0],
[10, 2], [10, 4], [10, 0]
];
const kmeans = new KMeans({ nClusters: 3 });
await kmeans.fit(xData);
const predClus = await kmeans.predict(xData);
predClus.print();
// Tensor
// [0, 0, 0, 1, 1, 1]
// save and load model
const modelJSON = await kmeans.toJson();
const kmeans2 = new KMeans({});
kmeans2.fromJson(modelJSON);
const predClus = await kmeans2.predict(xData);
predClus.print();
// Tensor
// [0, 0, 0, 1, 1, 1]
Train on batch
import * as Datacook from 'datacook';
import * as tf from '@tensorflow/tfjs-core';
const { KMeans } = DataCook.Model;
// create dataset
const clust1 = tf.add(tf.mul(tf.randomNormal([ 100, 2 ]), tf.tensor([ 2, 2 ])), tf.tensor([ 5, 5 ]));
const clust2 = tf.add(tf.mul(tf.randomNormal([ 100, 2 ]), tf.tensor([ 2, 2 ])), tf.tensor([ 10, 0 ]));
const clust3 = tf.add(tf.mul(tf.randomNormal([ 100, 2 ]), tf.tensor([ 2, 2 ])), tf.tensor([ -10, 0 ]));
const clusData = tf.concat([ clust1, clust2, clust3 ]);
// fit kmeans model
const kmeans = new KMeans({ nClusters: 3 });
const batchSize = 30;
const epochSize = Math.floor(clusData.shape[0] / batchSize);
for (let i = 0; i < 50; i++) {
const j = Math.floor(i % epochSize);
const batchX = tf.slice(clusData, [j * batchSize, 0], [batchSize ,2]);
await kmeans.trainOnBatch(batchX);
}
const predClus = await kmeans.predict(clusData);
const accuracy = await checkClusAccuracy(predClus);
console.log('accuracy:', accuracy);