Multinomial Naive Bayes
The naive Bayes model is a classical supervised classification algorithm based on Bayes' theorem. It applies the conditional independence assumption: given the class, every pair of features is assumed independent.
\[P(x_1,...,x_n|y) = P(x_1|y)P(x_2|y)...P(x_n|y)\]According to Bayes' theorem, the posterior probability of $y$ given $X$ is:
\[P(y|x_1,...,x_n) = \frac{P(y)P(x_1,...,x_n|y)}{P(x_1,x_2,...,x_n)}\]Multinomial Naive Bayes is a classical variant of the naive Bayes model, often used in text classification tasks. In this model, the conditional probability \(P(x_i\|y)\) is approximated by the smoothed frequency of \(x_i\) in class \(y\):
\[\hat P(x_i|y) = \frac{N_{yi}+\alpha}{N_y + \alpha n}\]where \(N_{yi}\) is the number of times \(x_i\) appears in class \(y\), \(n\) is the number of features, and $\alpha$ is the smoothing parameter. A larger $\alpha$ yields more evenly distributed estimates for all \(P(x_i\|y)\).
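The smoothed estimate above is straightforward to compute directly. A minimal sketch in TypeScript (the function name is illustrative, not part of datacook's API):

```typescript
// Smoothed likelihood: P̂(x_i | y) = (N_yi + α) / (N_y + α·n)
function smoothedLikelihood(
  nYi: number,       // count of feature x_i in class y
  nY: number,        // total feature count in class y
  nFeatures: number, // number of features n (vocabulary size)
  alpha = 1,         // smoothing parameter α
): number {
  return (nYi + alpha) / (nY + alpha * nFeatures);
}

// a word seen 3 times out of 10 in class y, 5-word vocabulary, α = 1
console.log(smoothedLikelihood(3, 10, 5)); // (3 + 1) / (10 + 5) ≈ 0.2667

// an unseen word still gets non-zero probability thanks to smoothing
console.log(smoothedLikelihood(0, 10, 5)); // 1 / 15 ≈ 0.0667
```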
Import
import * as DataCook from '@pipcook/datacook';
const { MultinomialNB } = DataCook.Model.NaiveBayes;
Constructor
const mnb = new MultinomialNB({ alpha: 0.1 });
Option parameters
parameter | type | description |
---|---|---|
alpha | number | smoothing parameter; set it to a number greater than 0 to avoid overfitting and divide-by-zero errors. A larger value yields more evenly distributed probability estimates. Default: 1 |
Methods
train
Train the multinomial naive Bayes model according to the given xData and yData.
Syntax
async train(xData: Array<any> | Tensor, yData: Array<any> | Tensor): Promise<MultinomialNB>
Parameters
parameter | type | description |
---|---|---|
xData | Tensor \| RecursiveArray<number> | Tensor-like of shape (n_samples, n_features), input features |
yData | Tensor \| Array<any> | Tensor-like of shape (n_samples, ), target values |
Returns
MultinomialNB
predict
Make predictions using the trained naive Bayes model.
Syntax
async predict(xData: Tensor | RecursiveArray<number>): Promise<Tensor>
Parameters
parameter | type | description |
---|---|---|
xData | Tensor \| RecursiveArray<number> | Input features |
Returns
Tensor of predicted target values
predictProba
Predict probabilities for each class.
Syntax
async predictProba(xData: Tensor | RecursiveArray<number>): Promise<Tensor>
Parameters
parameter | type | description |
---|---|---|
xData | Tensor \| RecursiveArray<number> | Input features |
Returns
Predicted probabilities
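These probabilities come from combining the class priors with the smoothed likelihoods via Bayes' theorem. A self-contained sketch of that computation (pure TypeScript with a hand-made toy model, independent of datacook's API; the log-space trick avoids underflow on long documents):

```typescript
// Toy multinomial NB: per-class word counts "learned" from training data.
// Rows: classes; columns: vocabulary words.
const classCounts = [
  [4, 1, 0], // counts N_yi for class 0
  [0, 2, 5], // counts N_yi for class 1
];
const priors = [0.5, 0.5];
const alpha = 1;

function predictProba(x: number[]): number[] {
  const n = x.length; // vocabulary size
  // log P(y) + Σ_i x_i · log P̂(x_i | y), computed in log space for stability
  const logJoint = classCounts.map((counts, c) => {
    const nY = counts.reduce((a, b) => a + b, 0);
    return x.reduce(
      (acc, xi, i) => acc + xi * Math.log((counts[i] + alpha) / (nY + alpha * n)),
      Math.log(priors[c]),
    );
  });
  // Normalize: softmax over the log joints gives posterior probabilities
  const m = Math.max(...logJoint);
  const exps = logJoint.map((l) => Math.exp(l - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / z);
}

console.log(predictProba([3, 0, 0])); // heavily favours the first class
```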
fromJson
Load model parameters from a JSON string.
Syntax
async fromJson(modelJson: string)
Parameters
parameter | type | description |
---|---|---|
modelJson | string | model parameters as a JSON string |
toJson
Dump model parameters to a JSON string.
Syntax
async toJson(): Promise<string>
Returns
JSON string of the model
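Together, toJson and fromJson let a trained model be persisted and restored. A schematic round trip with plain JSON to illustrate the idea (the parameter field names below are hypothetical, not datacook's actual serialized schema):

```typescript
// Hypothetical shape of serialized model parameters, for illustration only.
interface ModelParams {
  alpha: number;
  priors: number[];
  featureCounts: number[][];
}

const params: ModelParams = {
  alpha: 1,
  priors: [0.5, 0.5],
  featureCounts: [[4, 1, 0], [0, 2, 5]],
};

const json = JSON.stringify(params);            // what toJson produces: a string
const restored: ModelParams = JSON.parse(json); // what fromJson consumes

console.log(restored.priors[0]); // 0.5
```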
Examples
The following example trains a naive Bayes model for a spam email detection task.
import * as DataCook from '@pipcook/datacook';
/** if in browser, omit 'node-fetch' and 'fs' **/
import fetch from 'node-fetch';
import * as fs from 'fs';
const { MultinomialNB } = DataCook.Model.NaiveBayes;
const { OneHotEncoder } = DataCook.Encoder;
const { CountVectorizer } = DataCook.Text;
const { accuracyScore } = DataCook.Metrics;
const res = await fetch('http://127.0.0.1:4000/datacook/assets/dataset/spam.csv');
const text = await res.text();
const data = text.split('\n').map((d) => d.split(','));
const stopwords = 'i\nme\nmy\nmyself\nwe\nour\nours\nourselves\nyou\nyour\nyours\nyourself\nyourselves\nhe\nhim\nhis\nhimself\nshe\nher\nhers\nherself\nit\nits\nitself\nthey\nthem\ntheir\ntheirs\nthemselves\nwhat\nwhich\nwho\nwhom\nthis\nthat\nthese\nthose\nam\nis\nare\nwas\nwere\nbe\nbeen\nbeing\nhave\nhas\nhad\nhaving\ndo\ndoes\ndid\ndoing\na\nan\nthe\nand\nbut\nif\nor\nbecause\nas\nuntil\nwhile\nof\nat\nby\nfor\nwith\nabout\nagainst\nbetween\ninto\nthrough\nduring\nbefore\nafter\nabove\nbelow\nto\nfrom\nup\ndown\nin\nout\non\noff\nover\nunder\nagain\nfurther\nthen\nonce\nhere\nthere\nwhen\nwhere\nwhy\nhow\nall\nany\nboth\neach\nfew\nmore\nmost\nother\nsome\nsuch\nno\nnor\nnot\nonly\nown\nsame\nso\nthan\ntoo\nvery\ns\nt\ncan\nwill\njust\ndon\nshould\nnow';
const contents = data.map(d => d[1]);
const labels = data.map(d => d[0]);
const countVectorizer = new CountVectorizer(contents, stopwords.split('\n'));
console.log(countVectorizer.wordOrder.length); // vocabulary size
const textVec = countVectorizer.transform(contents);
const mnb = new MultinomialNB();
await mnb.train(textVec, labels);
const yPred = await mnb.predict(textVec);
console.log('accuracy score');
console.log(accuracyScore(yPred, labels));
/** save model files; if in browser, omit the following two lines **/
await fs.promises.writeFile('./model.json', await mnb.toJson());
await fs.promises.writeFile('./vectorizer.json', countVectorizer.toJson());