Multinomial Naive Bayes
The naive Bayes model is a classical supervised classification algorithm based on Bayes' theorem. It applies the conditional independence assumption: given the class, every pair of features is assumed independent.
\[P(x_1,...,x_n|y) = P(x_1|y)P(x_2|y)...P(x_n|y)\]According to Bayes' theorem, the posterior probability of $y$ given $X$ is:
\[P(y|x_1,...,x_n) = \frac{P(y)P(x_1,...,x_n|y)}{P(x_1,x_2,...,x_n)}\]Multinomial Naive Bayes is a classical variant of the naive Bayes model, often used in text classification tasks. In this model, the conditional probability \(P(x_i\|y)\) is approximated by the smoothed frequency of \(x_i\) in class \(y\):
\[\hat P(x_i|y) = \frac{N_{yi}+\alpha}{N_y + \alpha n}\]where \(N_{yi}\) is the number of times \(x_i\) appears in class \(y\), \(n\) is the number of features, and $\alpha$ is the smoothing parameter. A larger $\alpha$ yields more evenly distributed estimates for all \(P(x_i\|y)\).
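The smoothed estimate above is straightforward to compute directly. A minimal sketch in TypeScript (the function name is illustrative, not part of datacook's API):

```typescript
// Smoothed likelihood: P̂(x_i | y) = (N_yi + α) / (N_y + α·n)
function smoothedLikelihood(
  nYi: number,       // count of feature x_i in class y
  nY: number,        // total feature count in class y
  nFeatures: number, // number of features n (vocabulary size)
  alpha = 1,         // smoothing parameter α
): number {
  return (nYi + alpha) / (nY + alpha * nFeatures);
}

// a word seen 3 times out of 10 in class y, 5-word vocabulary, α = 1
console.log(smoothedLikelihood(3, 10, 5)); // (3 + 1) / (10 + 5) ≈ 0.2667

// an unseen word still gets non-zero probability thanks to smoothing
console.log(smoothedLikelihood(0, 10, 5)); // 1 / 15 ≈ 0.0667
```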
Import
import * as DataCook from '@pipcook/datacook';
const { MultinomialNB } = DataCook.Model.NaiveBayes;
Constructor
const mnb = new MultinomialNB({ alpha: 0.1 });
Option parameters
parameter | type | description |
---|---|---|
alpha | number | smoothing parameter; set it to a number greater than 0 to avoid overfitting and divide-by-zero errors. A larger value yields more evenly distributed probability estimates. Default: 1 |
Methods
train
Train the multinomial naive Bayes model according to the given xData and yData.
Syntax
async train(xData: Array<any> | Tensor, yData: Array<any> | Tensor): Promise<MultinomialNB>
Parameters
parameter | type | description |
---|---|---|
xData | Tensor \| RecursiveArray<number> | Tensor-like of shape (n_samples, n_features), input features |
yData | Tensor \| Array<any> | Tensor-like of shape (n_samples, ), target values |
Returns
MultinomialNB
predict
Make predictions using the trained naive Bayes model.
Syntax
async predict(xData: Tensor | RecursiveArray<number>): Promise<Tensor>
Parameters
parameter | type | description |
---|---|---|
xData | Tensor \| RecursiveArray<number> | Input features |
Returns
Tensor of predicted target values
predictProba
Predict probabilities for each class.
Syntax
async predictProba(xData: Tensor | RecursiveArray<number>): Promise<Tensor>
Parameters
parameter | type | description |
---|---|---|
xData | Tensor \| RecursiveArray<number> | Input features |
Returns
Predicted probabilities
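These probabilities come from combining the class priors with the smoothed likelihoods via Bayes' theorem. A self-contained sketch of that computation (pure TypeScript with a hand-made toy model, independent of datacook's API; the log-space trick avoids underflow on long documents):

```typescript
// Toy multinomial NB: per-class word counts "learned" from training data.
// Rows: classes; columns: vocabulary words.
const classCounts = [
  [4, 1, 0], // counts N_yi for class 0
  [0, 2, 5], // counts N_yi for class 1
];
const priors = [0.5, 0.5];
const alpha = 1;

function predictProba(x: number[]): number[] {
  const n = x.length; // vocabulary size
  // log P(y) + Σ_i x_i · log P̂(x_i | y), computed in log space for stability
  const logJoint = classCounts.map((counts, c) => {
    const nY = counts.reduce((a, b) => a + b, 0);
    return x.reduce(
      (acc, xi, i) => acc + xi * Math.log((counts[i] + alpha) / (nY + alpha * n)),
      Math.log(priors[c]),
    );
  });
  // Normalize: softmax over the log joints gives posterior probabilities
  const m = Math.max(...logJoint);
  const exps = logJoint.map((l) => Math.exp(l - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / z);
}

console.log(predictProba([3, 0, 0])); // heavily favours the first class
```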
fromJson
Load model parameters from a JSON string.
Syntax
async fromJson(modelJson: string)
Parameters
parameter | type | description |
---|---|---|
modelJson | string | model parameters as a JSON string |
toJson
Dump model parameters to a JSON string.
Syntax
async toJson(): Promise<string>
Returns
JSON string of the model
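Together, toJson and fromJson let a trained model be persisted and restored. A schematic round trip with plain JSON to illustrate the idea (the parameter field names below are hypothetical, not datacook's actual serialized schema):

```typescript
// Hypothetical shape of serialized model parameters, for illustration only.
interface ModelParams {
  alpha: number;
  priors: number[];
  featureCounts: number[][];
}

const params: ModelParams = {
  alpha: 1,
  priors: [0.5, 0.5],
  featureCounts: [[4, 1, 0], [0, 2, 5]],
};

const json = JSON.stringify(params);            // what toJson produces: a string
const restored: ModelParams = JSON.parse(json); // what fromJson consumes

console.log(restored.priors[0]); // 0.5
```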
Examples
The following example trains a naive Bayes model for a spam email detection task.
import * as DataCook from '@pipcook/datacook';
/** if in browser, omit 'node-fetch' and 'fs' **/
import fetch from 'node-fetch';
import * as fs from 'fs';
const { MultinomialNB } = DataCook.Model.NaiveBayes;
const { OneHotEncoder } = DataCook.Encoder;
const { CountVectorizer } = DataCook.Text;
const { accuracyScore } = DataCook.Metrics;
const res = await fetch('http://127.0.0.1:4000/datacook/assets/dataset/spam.csv');
const text = await res.text();
const data = text.split('\n').map((d) => d.split(','));
const stopwords = 'i\nme\nmy\nmyself\nwe\nour\nours\nourselves\nyou\nyour\nyours\nyourself\nyourselves\nhe\nhim\nhis\nhimself\nshe\nher\nhers\nherself\nit\nits\nitself\nthey\nthem\ntheir\ntheirs\nthemselves\nwhat\nwhich\nwho\nwhom\nthis\nthat\nthese\nthose\nam\nis\nare\nwas\nwere\nbe\nbeen\nbeing\nhave\nhas\nhad\nhaving\ndo\ndoes\ndid\ndoing\na\nan\nthe\nand\nbut\nif\nor\nbecause\nas\nuntil\nwhile\nof\nat\nby\nfor\nwith\nabout\nagainst\nbetween\ninto\nthrough\nduring\nbefore\nafter\nabove\nbelow\nto\nfrom\nup\ndown\nin\nout\non\noff\nover\nunder\nagain\nfurther\nthen\nonce\nhere\nthere\nwhen\nwhere\nwhy\nhow\nall\nany\nboth\neach\nfew\nmore\nmost\nother\nsome\nsuch\nno\nnor\nnot\nonly\nown\nsame\nso\nthan\ntoo\nvery\ns\nt\ncan\nwill\njust\ndon\nshould\nnow';
const contents = data.map(d => d[1]);
const labels = data.map(d => d[0]);
const countVectorizer = new CountVectorizer(contents, stopwords.split('\n'));
console.log(countVectorizer.wordOrder.length); // vocabulary size
const textVec = countVectorizer.transform(contents);
const mnb = new MultinomialNB();
await mnb.train(textVec, labels);
const yPred = await mnb.predict(textVec);
console.log('accuracy score');
console.log(accuracyScore(yPred, labels));
/** save model files; if in browser, omit the following two lines **/
await fs.promises.writeFile('./model.json', await mnb.toJson());
await fs.promises.writeFile('./vectorizer.json', countVectorizer.toJson());