This blog post covers what was learned in Coursera’s Machine Learning Week 2 class on Supervised Learning. Topics include linear and polynomial regression, overfitting, and how to choose the right model.

## Introduction

Supervised learning is a type of machine learning that uses labels to teach the algorithm what the output should be for given inputs. For example, in a supervised learning algorithm for recognizing handwritten digits, we would provide the algorithm with images of handwritten digits along with their corresponding labels (i.e., the correct digit for each image). The algorithm would then learn to recognize digits by generalizing from these labeled examples.

There are two main types of supervised learning algorithms: regression and classification.

Regression algorithms are used when the output variable is continuous (e.g., predicting the price of a stock). Classification algorithms are used when the output variable is discrete (e.g., predicting whether an email is spam or not).

In this week’s programming assignment, you will apply linear regression and logistic regression to real-world datasets. You will also learn to evaluate models and choose between different types of models for different tasks.

## Linear Regression with One Variable

In this part of the exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities.

You would like to use this data to help you select which city to expand to next.

The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city (in 10,000s) and the second column is the profit of a food truck in that city (in $10,000s). A negative value for profit indicates a loss.

We provide you with some helper functions that will load this data for you from ex1data1.txt into arrays `X` and `y`. In code cell 2, we have already set up the data for linear regression using scikit-learn’s `LinearRegression` class.
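To make the setup concrete, here is a minimal pure-Python sketch of one-variable least squares. It is not the assignment's helper code or scikit-learn's implementation. The names `parse_rows` and `fit_line` are made up for illustration, the sample rows are invented, and the sketch assumes the file stores one comma-separated `population,profit` pair per line.

```python
def parse_rows(text):
    """Parse 'population,profit' lines into two lists of floats (X, y)."""
    X, y = [], []
    for line in text.strip().splitlines():
        population, profit = line.split(",")
        X.append(float(population))
        y.append(float(profit))
    return X, y


def fit_line(X, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    m = len(X)
    mean_x = sum(X) / m
    mean_y = sum(y) / m
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y))
    var_x = sum((xi - mean_x) ** 2 for xi in X)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up rows for illustration only; these are not values from ex1data1.txt.
sample = "1.0,1.0\n2.0,3.0\n3.0,5.0"
X, y = parse_rows(sample)
intercept, slope = fit_line(X, y)
```

The predicted profit for a city is then `intercept + slope * population`, in the same units as the data.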

Your task is to complete the following functions in parts 1-4 below:

## Linear Algebra Review

Supervised learning is a type of machine learning where the algorithm is given training data that includes both input data and desired results, and the algorithm learns how to generalize from the training data to produce desired results for new inputs.

Linear algebra is a mathematics discipline that is central to understanding supervised learning. In this review article, we will briefly introduce some of the key concepts in linear algebra that are used in machine learning, including vectors, matrices, matrix operations, and linear transformations.

## Linear Regression with Multiple Variables

In this second week of the machine learning class, we focus on supervised learning, and more specifically on linear regression with multiple variables. We learn how to best represent our data using multiple features, how to properly assess the quality of our models using training and test sets, and how to avoid overfitting our data.

## Computing Costs

In supervised learning, we need a cost function to guide training. Minimizing it is how we find the line that best fits our training data (in regression) or the decision boundary that best separates it (in classification). The cost function measures how well our model predicts the correct labels for our training data. If our predictions are bad, the cost is high. If our predictions are good, the cost is low.

There are many ways to compute the cost function. One popular method is called the squared error function:

Cost(h(x), y) = 1/2 * (h(x) - y)^2

where h(x) is our hypothesis (the predicted value) for the input x and y is the actual value.
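As a small sketch, the squared error formula translates directly into Python. The function names are hypothetical, and averaging the per-example cost over the dataset in `mean_cost` is an assumption, since the text above only defines the cost for a single example.

```python
def squared_error(prediction, actual):
    """Cost for one example: 1/2 * (h(x) - y)^2."""
    return 0.5 * (prediction - actual) ** 2


def mean_cost(predictions, actuals):
    """Average of the per-example squared-error cost over a dataset."""
    costs = [squared_error(p, a) for p, a in zip(predictions, actuals)]
    return sum(costs) / len(costs)
```

For example, a prediction of 3.0 against an actual value of 1.0 gives a cost of 0.5 * 2.0^2 = 2.0.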

## Gradient Descent

Supervised learning is a method of machine learning where the data used to train the model is labeled. This means that for each piece of data, there is a known output or result. Supervised learning is used for tasks such as classification and regression.

One of the most important concepts in supervised learning is gradient descent. Gradient descent is an optimization algorithm that is used to find a local minimum of a function. In other words, it helps us find the best values for our parameters so that our model can accurately make predictions on new data.

There are two main types of gradient descent: batch gradient descent and stochastic gradient descent. Batch gradient descent uses all of the data to calculate the error and update the parameters at each step. Stochastic gradient descent, on the other hand, only uses one piece of data to calculate the error and update the parameters.

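Both methods have their pros and cons. Each stochastic update is much cheaper to compute, so stochastic gradient descent often makes faster progress on large datasets and is widely used in practice, although its updates are noisier than those of batch gradient descent.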
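The difference between the two variants can be sketched for a one-variable linear model h(x) = theta[0] + theta[1] * x with the squared error cost. This is an illustrative sketch, not the course's reference code: the function names, the learning rate `alpha`, the number of epochs, and the data are all assumptions chosen for the example.

```python
import random


def batch_step(theta, X, y, alpha):
    """One batch update: average the gradient over every example."""
    m = len(X)
    errors = [theta[0] + theta[1] * xi - yi for xi, yi in zip(X, y)]
    grad0 = sum(errors) / m
    grad1 = sum(e * xi for e, xi in zip(errors, X)) / m
    return [theta[0] - alpha * grad0, theta[1] - alpha * grad1]


def stochastic_step(theta, xi, yi, alpha):
    """One stochastic update: use the gradient of a single example."""
    error = theta[0] + theta[1] * xi - yi
    return [theta[0] - alpha * error, theta[1] - alpha * error * xi]


def train(X, y, alpha=0.1, epochs=2000, stochastic=False, seed=0):
    """Fit h(x) = theta[0] + theta[1] * x starting from zeros."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(epochs):
        if stochastic:
            order = list(range(len(X)))
            rng.shuffle(order)  # visit examples in a fresh random order
            for i in order:
                theta = stochastic_step(theta, X[i], y[i], alpha)
        else:
            theta = batch_step(theta, X, y, alpha)
    return theta


# Made-up, noise-free data lying on y = 1 + 2x.
X = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
theta_batch = train(X, y)
theta_sgd = train(X, y, stochastic=True)
```

On this tiny noise-free dataset both variants approach the same parameters. On noisy data the stochastic version would keep fluctuating around the minimum unless the learning rate is reduced over time.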

## Polynomial Regression

In this part of the exercise, you will implement polynomial regression with multiple features to predict house prices. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

You will need to complete PolynomialRegressionCost.m and featureNormalize.m before running gradientDescentMulti.m.
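Normalization matters here because house sizes are roughly a thousand times larger than bedroom counts, which slows gradient descent. The following Python sketch shows one common way to do it (z-score scaling) along with a simple polynomial expansion of a single feature. It assumes, without confirmation from the exercise files, that this is the kind of scaling intended; the function names, the use of the sample standard deviation, and the numbers are illustrative.

```python
import math


def feature_normalize(column):
    """Scale one feature column to zero mean and unit standard deviation."""
    m = len(column)
    mean = sum(column) / m
    # Sample standard deviation (divides by m - 1); this choice is an assumption.
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (m - 1))
    return [(v - mean) / std for v in column], mean, std


def polynomial_features(x, degree):
    """Expand a single value x into [x, x^2, ..., x^degree]."""
    return [x ** p for p in range(1, degree + 1)]


# Made-up house sizes for illustration; not rows from ex1data2.txt.
sizes = [1000.0, 2000.0, 3000.0]
normalized, mean, std = feature_normalize(sizes)
```

The returned `mean` and `std` should be kept so that new inputs can be scaled the same way before prediction. Polynomial terms grow quickly in magnitude, so they benefit from normalization even more than the raw features.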

We start by loading and displaying some values from this dataset:

## Learning Curves

In machine learning, we usually don’t care too much about whether our algorithm understands the “true” model. Instead, we care about whether it can accurately predict on new, unseen data. Intuitively, we can think of this as how close our predictions are to the “ground truth” value.

One way to evaluate our models is to split our dataset into two parts: a training set and a test set. We train our model on the training set, and then evaluate it on the test set. The error rate on the test set is a good proxy for how well our model will do on new, unseen data.
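A minimal sketch of this split in Python, using only the standard library. The function names, the 30% test fraction, and the use of mean squared error as the error measure are assumptions for illustration.

```python
import random


def train_test_split(examples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and split off a held-out test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(predict, examples):
    """Average squared difference between predict(x) and y over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in examples) / len(examples)


# Made-up (x, y) pairs on the line y = 2x, for illustration only.
data = [(float(i), 2.0 * i) for i in range(10)]
train_set, test_set = train_test_split(data)
test_error = mean_squared_error(lambda x: 2.0 * x, test_set)
```

Shuffling before splitting guards against any ordering in the original file leaking into the split.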

However, there is a problem with this approach. If the dataset is very small, the training and test sets may not be representative of the true distribution of the data, so the error measured on the test set can be a poor estimate of how the model will perform in practice.

Learning curves help diagnose this. A learning curve plots the error on the training set and on the test set as a function of the number of examples used for training. With very few training examples, the model can fit them almost perfectly, so training error starts low while test error is high. As more examples are added, training error typically rises and test error falls. If the two curves level off close together at a high error, the model is underfitting and more data is unlikely to help. If a wide gap remains between a low training error and a much higher test error, the model is overfitting and more data may help.
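The data behind a learning curve can be computed with a short loop: train on the first 1, 2, 3, ... examples and record both errors each time. The sketch below does this for a one-variable least-squares line. All names and data points are made up for illustration, and mean squared error stands in for the error measure.

```python
def fit_line(points):
    """Least-squares (intercept, slope) for a list of (x, y) points."""
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:  # a single x value: fall back to a constant prediction
        return mean_y, 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return mean_y - slope * mean_x, slope


def mse(model, points):
    """Mean squared error of an (intercept, slope) model on the points."""
    b, w = model
    return sum((b + w * x - y) ** 2 for x, y in points) / len(points)


def learning_curve(train, test):
    """Return (size, train_error, test_error) for growing training subsets."""
    curve = []
    for size in range(1, len(train) + 1):
        subset = train[:size]
        model = fit_line(subset)
        curve.append((size, mse(model, subset), mse(model, test)))
    return curve


# Made-up points scattered around y = 2x, for illustration only.
train = [(0.0, 0.5), (1.0, 1.5), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
test = [(0.5, 1.0), (2.5, 5.0), (3.5, 7.0)]
curve = learning_curve(train, test)
```

With one or two training points the line fits them exactly, so the training error is zero. It becomes positive once the line has to compromise between more points.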

## Regularization

Supervised learning algorithms require that the training data be labeled in some way so that the algorithm can learn to generalize from it. This labeling can take many forms, but one of the most common is to simply have a set of training examples where each example is labeled with the correct output. In other words, for each input value (x), we also know the correct output value (y).

Once the supervised learning algorithm has learned a generalization from the training data, we can then give it new inputs and have it predict the corresponding output values. We expect these predicted output values to be accurate most of the time, but sometimes they will be off by a little bit. The goal of supervised learning is to find a generalization that accurately predicts outputs for new inputs as much as possible.

One way to think about this process is that the supervised learning algorithm is trying to find a function (f) that takes in an input value (x) and outputs the correct label (y). Ideally, we want this function to be as simple as possible while still accurately describing the relationship between x and y. This is because if the function is too complicated, it can fit noise in the training data and generalize poorly to new inputs (overfitting); if it’s too simple, it might not accurately describe the data (underfitting).

A common way to measure how well a function describes a set of data is to use what’s called a cost function. The cost function gives us a measure of how “far off” our predictions are from the actual labels in the data. We want our cost function to be small, which means our predictions are accurate.

We can think of regularization as a method for “simplifying” our function by penalizing complexity. In other words, we add a term to the cost function that discourages large parameter values, so that the model is less prone to overfitting the training data. There are many ways to regularize, but one common approach is L1 regularization, which penalizes the sum of the absolute values of the parameters.
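As a rough sketch, an L1 penalty can be added to the squared error cost as follows. The function name, the regularization strength `lam`, the choice to leave the intercept `theta[0]` unpenalized, and the exact scaling of the penalty are assumptions for illustration; conventions vary between courses and libraries.

```python
def l1_regularized_cost(theta, X, y, lam):
    """Squared-error cost plus an L1 penalty on the non-intercept parameters.

    theta[0] is the intercept; X holds one feature list per example.
    """
    m = len(X)
    total = 0.0
    for features, target in zip(X, y):
        prediction = theta[0] + sum(t * f for t, f in zip(theta[1:], features))
        total += 0.5 * (prediction - target) ** 2
    data_cost = total / m
    # The penalty grows with the absolute size of each weight, not the intercept.
    penalty = lam * sum(abs(t) for t in theta[1:])
    return data_cost + penalty
```

Setting `lam` to zero recovers the unregularized cost, while larger values trade some training accuracy for smaller weights.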

## Conclusion

Supervised learning is a type of machine learning where the model is built from labeled training data and then used to make predictions on new data. This contrasts with unsupervised learning, where the training data has no labels and the model must find structure in the inputs on its own. Supervised learning is the usual choice when labeled examples are available, because the labels tell the model directly what it should predict.
