Naive Bayes Classifier

2023-07-09

Smokers are less likely to die of age-related sickness

In this article, we will revisit the example from our Supervised Learning introduction where we needed to identify different fruits in a basket. Imagine being in an orchard, where you gather information about the fruits available. The information is presented in a table with the following attributes for each fruit:

  • Name of the fruit
  • Whether it is yellow or not
  • Whether it is long or not
  • Whether it is sweet or not

Here is a sample table showing the details about the fruits in the orchard:

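| Fruit | Yellow | Long | Sweet |
|---|---|---|---|
| Mango | true | false | true |
| Apple | false | false | false |
| Mango | false | false | true |
| Banana | true | true | false |
| ... | ... | ... | ... |

Total: 1000 rows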

Now, given a fruit from the basket, your task is to determine if it is a mango, banana, or something else.

The core idea behind the Naive Bayes Classifier is that the attributes are assumed to be conditionally independent given the class (here, the type of fruit). This assumption rarely holds exactly in practice, but it keeps the algorithm simple, and the classifier often works well despite it.

To identify a fruit from the basket, say you pick one up and observe that it is yellow, long, and not sweet. How can you use the table to determine what type of fruit it is? Write Y for yellow, L for long, and S’ for not sweet. One way is to compute, for each candidate type X (mango, banana, or others), the joint probability that the fruit is X and is yellow, long, and not sweet. By the chain rule of probability:

P(X, Y, L, S’) = P(Y, L, S’, X) = P(Y|L, S’, X) * P(L|S’, X) * P(S’|X) * P(X)

Here, P(Y|L, S’, X) is the probability that the fruit is yellow given that it is long, not sweet, and of type X. P(L|S’, X) and P(S’|X) are read the same way.

In the Naive Bayes Classifier, we assume that the attributes are conditionally independent given the fruit type, which simplifies the equation to:

P(X, Y, L, S’) = P(Y|X) * P(L|X) * P(S’|X) * P(X)

Here,

  • P(X): Probability of the fruit being X
  • P(Y|X): Probability of the color being yellow given the fruit is X
  • P(L|X): Probability of the length being long given the fruit is X
  • P(S'|X): Probability of the fruit being not sweet given the fruit is X

The only difference between the two equations is that we replace P(Y|L, S’, X) with P(Y|X): once the fruit type X is known, the other attributes are assumed to tell us nothing more about its color. We simplify P(L|S’, X) to P(L|X) in the same way.
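For reference, Bayes’ theorem is what turns these joint probabilities into the quantity we actually want: the probability of X given the observed attributes. Writing the denominator as a sum over M, B, and O (mango, banana, others) assumes that these three types cover every fruit in the basket:

```latex
P(X \mid Y, L, S') = \frac{P(X, Y, L, S')}{P(Y, L, S')}
  = \frac{P(Y \mid X)\, P(L \mid X)\, P(S' \mid X)\, P(X)}
         {\sum_{Z \in \{M, B, O\}} P(Y \mid Z)\, P(L \mid Z)\, P(S' \mid Z)\, P(Z)}
```

Because the denominator is the same for every X, comparing the numerators alone is enough to pick the most likely fruit.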

Based on the details provided, we can create an aggregated table that counts the occurrences of each attribute for mangoes, bananas, and other fruits:

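| Fruit | Yellow | Long | Sweet | Total Pieces |
|---|---|---|---|---|
| Mango | 200 | 0 | 250 | 300 |
| Banana | 350 | 350 | 200 | 400 |
| Others | 150 | 100 | 150 | 300 |
| Total | 700 | 450 | 600 | 1000 |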

In this table, the value at (Mango, Yellow) represents the number of yellow mangoes, and the value at (Mango, Total Pieces) represents the total number of mangoes. We can similarly interpret the other cells.
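The aggregation step can be sketched in Python. This is a minimal illustration, not the author's code: it assumes each row is a (fruit, yellow, long, sweet) tuple and groups every fruit other than mango and banana under "Others". The function name `count_table` and its `known` parameter are hypothetical. Because the full 1000-row table is not reproduced here, the sketch runs on the four sample rows shown earlier.

```python
from collections import defaultdict

# The four sample rows from the orchard table: (fruit, yellow, long, sweet).
SAMPLE_ROWS = [
    ("Mango", True, False, True),
    ("Apple", False, False, False),
    ("Mango", False, False, True),
    ("Banana", True, True, False),
]


def count_table(rows, known=("Mango", "Banana")):
    """Aggregate raw rows into per-class counts of Yellow, Long, Sweet and Total.

    Hypothetical helper: any fruit not listed in `known` is grouped as "Others".
    """
    counts = defaultdict(lambda: {"Yellow": 0, "Long": 0, "Sweet": 0, "Total": 0})
    for fruit, yellow, long_, sweet in rows:
        label = fruit if fruit in known else "Others"
        row = counts[label]
        # Booleans add as 1 (True) or 0 (False).
        row["Yellow"] += yellow
        row["Long"] += long_
        row["Sweet"] += sweet
        row["Total"] += 1
    return dict(counts)


table = count_table(SAMPLE_ROWS)
print(table)
```

Running the same function over all 1000 rows would produce the aggregated table above.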

From this table, we can build a likelihood table by dividing each attribute count by the number of fruits of that type, and each total by the overall total of 1000:

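| Fruit | Yellow | Long | Sweet | Total Pieces |
|---|---|---|---|---|
| Mango | 200/300 | 0/300 | 250/300 | 300/1000 |
| Banana | 350/400 | 350/400 | 200/400 | 400/1000 |
| Others | 150/300 | 100/300 | 150/300 | 300/1000 |
| Total | 700/1000 | 450/1000 | 600/1000 | 1000/1000 |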

In this likelihood table, the value at (Mango, Yellow) represents the probability of the color being yellow given that the fruit is a mango (P(Y|M)), and the value at (Mango, Total Pieces) represents the probability of the fruit being a mango (P(M)). The other cells have similar interpretations.
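As a rough sketch (assuming the counts are stored in a plain dictionary; the names `COUNTS` and `likelihood_table` are illustrative), the likelihood table can be computed with Python's `fractions` module so the values match the fractions above exactly:

```python
from fractions import Fraction

# Aggregated counts from the article's table.
COUNTS = {
    "Mango": {"Yellow": 200, "Long": 0, "Sweet": 250, "Total": 300},
    "Banana": {"Yellow": 350, "Long": 350, "Sweet": 200, "Total": 400},
    "Others": {"Yellow": 150, "Long": 100, "Sweet": 150, "Total": 300},
}


def likelihood_table(counts):
    """Turn counts into P(attribute | fruit) plus the prior P(fruit)."""
    grand_total = sum(row["Total"] for row in counts.values())
    table = {}
    for fruit, row in counts.items():
        n = row["Total"]
        # P(attribute | fruit) = attribute count / number of fruits of this type
        table[fruit] = {attr: Fraction(row[attr], n) for attr in ("Yellow", "Long", "Sweet")}
        # P(fruit) = number of fruits of this type / all fruits
        table[fruit]["Prior"] = Fraction(n, grand_total)
    return table


likelihoods = likelihood_table(COUNTS)
print(likelihoods["Mango"])
```

`Fraction` reduces automatically, so 200/300 is stored as 2/3 and 300/1000 as 3/10.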

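Now we have all the probabilities needed to compute P(X, Y, L, S'). Because the table records sweetness, the probability of not being sweet is its complement: P(S'|X) = 1 − P(S|X).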

For example, let's calculate the joint probability that the fruit is a mango and is yellow, long, and not sweet:

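P(M, Y, L, S’) = 200/300 * 0/300 * 50/300 * 300/1000 = 0

Here 50/300 is the fraction of mangoes that are not sweet (300 − 250 = 50). The product is 0 because none of the mangoes in the table is long.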

Similarly, we can calculate the probabilities for a banana and others:

P(B, Y, L, S’) = 350/400 * 350/400 * 200/400 * 400/1000 = 0.153125  
P(O, Y, L, S’) = 150/300 * 100/300 * 150/300 * 300/1000 = 0.025
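The three joint probabilities can be reproduced with a short Python sketch. It assumes the counts from the aggregated table and an observation dictionary mapping each attribute to True or False; the names `joint_score` and `OBSERVED` are hypothetical. For an absent attribute such as "not sweet", it uses the complement 1 − P(attribute | X):

```python
from fractions import Fraction

# Aggregated counts from the article's table.
COUNTS = {
    "Mango": {"Yellow": 200, "Long": 0, "Sweet": 250, "Total": 300},
    "Banana": {"Yellow": 350, "Long": 350, "Sweet": 200, "Total": 400},
    "Others": {"Yellow": 150, "Long": 100, "Sweet": 150, "Total": 300},
}
GRAND_TOTAL = sum(row["Total"] for row in COUNTS.values())


def joint_score(row, grand_total, observed):
    """P(X, observed attributes) under the naive conditional-independence assumption."""
    n = row["Total"]
    score = Fraction(n, grand_total)  # prior P(X)
    for attr, present in observed.items():
        p = Fraction(row[attr], n)  # P(attr | X)
        # Multiply by P(attr | X) if present, else by P(not attr | X) = 1 - P(attr | X).
        score *= p if present else 1 - p
    return score


OBSERVED = {"Yellow": True, "Long": True, "Sweet": False}
scores = {fruit: joint_score(row, GRAND_TOTAL, OBSERVED) for fruit, row in COUNTS.items()}
print({fruit: float(s) for fruit, s in scores.items()})
```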

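Applying Bayes’ theorem, we divide each joint probability by their sum, which equals P(Y, L, S’) when mango, banana, and others cover all the fruits. The probabilities of the fruit being a mango, banana, or others, given that it is yellow, long, and not sweet, are therefore: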

P(M | Y, L, S’) = 0/(0+0.153125+0.025) = 0  
P(B | Y, L, S’) = 0.153125/(0+0.153125+0.025) ≈ 0.86  
P(O | Y, L, S’) = 0.025/(0+0.153125+0.025) ≈ 0.14
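Normalizing the joint probabilities and picking the largest can be sketched as follows. The joint values are hard-coded from the calculations above (49/320 is 0.153125 and 1/40 is 0.025), and `Fraction` keeps the arithmetic exact; the function name `posteriors` is illustrative:

```python
from fractions import Fraction

# Joint probabilities P(X, Y, L, S') from the calculations above.
JOINT = {
    "Mango": Fraction(0),
    "Banana": Fraction(49, 320),  # 0.153125
    "Others": Fraction(1, 40),    # 0.025
}


def posteriors(joint):
    """Divide each joint probability by their sum, P(Y, L, S')."""
    evidence = sum(joint.values())
    return {fruit: p / evidence for fruit, p in joint.items()}


post = posteriors(JOINT)
prediction = max(post, key=post.get)  # class with the highest posterior
print(prediction, {fruit: round(float(p), 4) for fruit, p in post.items()})
```

The exact posterior for banana is 49/57, which rounds to the 0.86 shown above.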

Since the probability of it being a banana is the highest, we conclude that the fruit is a banana. The same procedure can be applied to every fruit in the basket, although how accurate it is in practice depends on the data and on how well the independence assumption holds.

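One drawback of the Naive Bayes Classifier is its assumption that the attributes are conditionally independent given the class, which often does not hold for real data; for example, a fruit's color and sweetness may both depend on how ripe it is.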

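The Naive Bayes classifier is available in scikit-learn as GaussianNB, MultinomialNB, and BernoulliNB in the sklearn.naive_bayes module. BernoulliNB, which models binary features, is the closest match to the yes/no attributes used in this example.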
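Here is a sketch with BernoulliNB, assuming scikit-learn and NumPy are installed. Since the original 1000 rows are not available, it builds a synthetic dataset that only reproduces the per-fruit counts from the aggregated table (the helper `synthetic_rows` is hypothetical). BernoulliNB estimates its probabilities from per-class counts, so any dataset with these counts yields the same fitted model. BernoulliNB also applies additive (Laplace) smoothing with `alpha=1.0` by default, so its probabilities differ slightly from the hand calculation; in particular, the zero count of long mangoes no longer forces the mango probability to exactly 0.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Per-fruit counts from the aggregated table: (yellow, long, sweet, total pieces).
COUNTS = {
    "Mango": (200, 0, 250, 300),
    "Banana": (350, 350, 200, 400),
    "Others": (150, 100, 150, 300),
}


def synthetic_rows(counts):
    """Build 0/1 rows whose per-fruit attribute counts match `counts`.

    This is NOT the original orchard data, only data with the same totals.
    """
    X, y = [], []
    for fruit, (yellow, long_, sweet, total) in counts.items():
        for i in range(total):
            # The first `yellow` rows are yellow, the first `long_` rows are long, etc.
            X.append([int(i < yellow), int(i < long_), int(i < sweet)])
            y.append(fruit)
    return np.array(X), np.array(y)


X, y = synthetic_rows(COUNTS)
model = BernoulliNB()  # alpha=1.0 by default (Laplace smoothing)
model.fit(X, y)

query = np.array([[1, 1, 0]])  # yellow, long, not sweet
predicted = model.predict(query)[0]
proba = dict(zip(model.classes_, model.predict_proba(query)[0]))
print(predicted, proba)
```

On this data the model also predicts a banana, with a probability of roughly 0.86.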

Bayes’ Theorem image source: Wikipedia