Supervised Learning

2023-07-09

To predict the future, look to the past

Classification Problem

Imagine this scenario: You are handed a basket filled with various types of fruits, but you have absolutely no knowledge about fruits. If I were to randomly select a fruit from the basket and ask you to identify it, would you be able to give me the correct answer? Most likely not, unless you resort to using Google Image Search.

Now, let's consider a different situation. I take you to an orchard where you are shown a wide variety of fruits, and I provide you with the names of each fruit. After this exposure and with my guidance, you would probably become a fast learner and be able to identify most of the fruits from the basket with reasonable accuracy. Your learning ability, combined with the assistance I provide, would contribute to your success.

Now, let's shift gears and think about teaching a computer to perform the same task. Obviously, I can't physically take the computer to the orchard to help it understand. Instead, I would provide the computer with a list of fruits from the orchard, along with some specific features such as color, shape, volume, texture, and so on. I would then ask the computer to learn this information using sophisticated algorithms. Once the computer has learned, it can apply its knowledge to identify the fruits from the basket. In all likelihood, it would also be able to correctly identify most of the fruits.

This type of problem, where we teach a computer to classify objects based on labeled data, is commonly known as a Classification Problem. The learning model is given a set of observations with pre-assigned labels, and its task is to classify new, unseen observations based on what it has learned.

Regression Problem

Let's consider another example. Imagine you own a bicycle rental shop and you want to make an estimate of how many bicycles you will be able to rent on a given day. Since you have experience in this business, you can make an educated guess based on factors like the day of the week, current weather conditions, temperature, humidity, the season, and whether it's a holiday or not. These factors help you form a rough estimate, right?

Now, let's say I provide a computer with quantified data, including the number of bikes rented for each day over the past two years, along with various factors such as the day of the week, weather conditions, temperature, humidity, season, and holidays. I ask the computer to predict the number of bikes your shop will be able to rent for each of the next seven days. Will the computer be able to make these predictions? Certainly, there's a good chance it can.

This type of problem, where the goal is to predict a real number value based on learning from labeled data, is known as a Regression Problem. In this case, you provide the learning model with a set of observations that include the corresponding output values, and then ask it to predict the output for new, unseen observations.

Supervised Learning

These types of problems, namely Classification and Regression, are typically tackled using Supervised Learning models.

In supervised learning, an algorithm is provided with a set of observations that have known desired outputs (known as the training set). The algorithm's task is to learn from this data and create a generalized model that can predict the desired output for any new, unseen observation, whether or not it was part of the training set.

The main distinction between a regression problem and a classification problem lies in the nature of the predicted value. In regression, the goal is to predict a value from a range of continuous values, which can be practically infinite. On the other hand, in classification, the aim is to predict a value from a set of discrete and finite values.

To illustrate this, refer to the image below:

Classification and Regression

Image source: http://ipython-books.github.io

Classification problems can be further categorized into two main types:

  1. Binary Classification: In this type, the goal is to classify an observation into one of two possible classes.
  2. Multiclass Classification Here, the objective is to assign an observation to one of several possible classes (more than two).

There is also another common type of classification problem known as Multi-Label classification. In this scenario, an observation needs to be classified into multiple classes from a finite set of classes.

To understand the distinction between binary and multiclass classification, refer to the image below:

Binary and Multiclass Classification

Image source: Coursera