If you’re enrolled in Coursera’s Machine Learning course, then you know that regularization is an important concept. But how well do you really understand it? Take this quiz to find out!

## Introduction

In machine learning, we usually try to minimize some cost function J(θ) by changing the parameters in θ. For example, in linear regression, J(θ) is the sum of the squared errors between our hypothesis hθ(x(i)) and the ith training label y(i), and we try to find the value of θ that minimizes J(θ). In doing so, we hope that hθ also does a good job of predicting y on new examples x that we haven’t seen before (i.e. generalizing well).

In order to ensure that hθ generalizes well, i.e. doesn’t just fit the training set well but also has low error on unseen examples, we need to keep the learned hypothesis as simple as possible without sacrificing too much training set accuracy. One way to prevent overfitting is called regularization.

We’ve seen two methods of regularization in this course:

- L2 regularization: The cost function for linear regression becomes J(θ) = (1/2m) ∑ (hθ(x(i)) − y(i))² + (λ/2m) ∑ θⱼ² (by convention, the intercept θ₀ is left out of the penalty). The parameter λ controls how heavily large values of θ are penalized, and therefore how the trade-off between fitting the training data exactly and generalizing to new, unseen data is resolved. There is no universally good default; λ is usually chosen by trying a range of values and keeping the one with the lowest error on a validation set. If you recall, increasing λ will decrease variance but increase bias; conversely, decreasing it will increase variance but decrease bias. You can think of bias as how far hθ tends to be from y on average, while variance measures how much the individual predictions hθ(x) fluctuate from one training set to another.

- Early stopping: This method watches the cost on a held-out validation set as gradient descent iterates. The training cost typically keeps decreasing, but the validation cost eventually bottoms out and begins to rise again; that turning point is roughly where the hypothesis hθ begins overfitting the training set. Halting gradient descent there, and keeping the parameters from the iteration with the best validation cost, balances fitting the training data against overfitting it, so early stopping can improve test-set performance over L2 regularization alone.
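The regularized cost J(θ) described above can be sketched in a few lines of NumPy. This is a sketch under the usual convention that the intercept θ₀ is not penalized; the data here are made up for illustration:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """L2-regularized linear regression cost J(theta).
    Assumes X has a leading column of ones; by convention the
    intercept theta[0] is left out of the penalty term."""
    m = len(y)
    residuals = X @ theta - y
    fit_term = (residuals @ residuals) / (2 * m)
    reg_term = lam * (theta[1:] @ theta[1:]) / (2 * m)
    return fit_term + reg_term

# Made-up data where y = 2x exactly, so the error term vanishes
# and only the penalty on theta remains.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])
print(regularized_cost(theta, X, y, lam=0.0))  # 0.0
print(regularized_cost(theta, X, y, lam=3.0))  # 3 * 2^2 / (2 * 3) = 2.0
```

With λ = 0 the cost is pure training error; raising λ charges the model for the size of θ even when the fit is perfect.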
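Early stopping can likewise be sketched as gradient descent that monitors a validation cost. The `patience` rule and all hyperparameter values here are illustrative choices, not values from the course:

```python
import numpy as np

def fit_with_early_stopping(X_tr, y_tr, X_val, y_val,
                            alpha=0.1, max_iters=5000, patience=10):
    """Gradient descent for linear regression that halts once the
    validation cost stops improving for `patience` steps, and keeps
    the parameters from the best validation iteration."""
    m, n = X_tr.shape
    theta = np.zeros(n)
    best_theta, best_val, bad_steps = theta.copy(), np.inf, 0
    for _ in range(max_iters):
        grad = X_tr.T @ (X_tr @ theta - y_tr) / m
        theta -= alpha * grad
        val_cost = np.mean((X_val @ theta - y_val) ** 2) / 2
        if val_cost < best_val:
            best_val, best_theta, bad_steps = val_cost, theta.copy(), 0
        else:
            bad_steps += 1
            if bad_steps >= patience:
                break  # validation cost no longer improving: stop
    return best_theta

# Made-up noiseless data from y = 1 + 2x; the fit recovers
# theta close to [1, 2].
x_tr = np.array([-1.5, -0.5, 0.5, 1.5])
X_tr = np.column_stack([np.ones_like(x_tr), x_tr])
y_tr = 1 + 2 * x_tr
x_val = np.array([2.0, 3.0])
X_val = np.column_stack([np.ones_like(x_val), x_val])
y_val = 1 + 2 * x_val
theta = fit_with_early_stopping(X_tr, y_tr, X_val, y_val)
```

On noiseless data the validation cost never turns upward, so the loop simply runs until improvements stop; on noisy real data the `patience` check is what catches the upturn.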

## What is Regularization?

Regularization is the process of adding information in order to reduce uncertainty. In the context of learning algorithms, regularization typically refers to adding information, or constraints, in order to reduce overfitting.

Overfitting occurs when a model captures too much of the noise in the training data, resulting in poorer performance on unseen data.

One common approach to regularization is to constrain or penalize the weights of the model. This approach encourages the model to find solutions with smaller weights, which typically leads to simpler models that are less likely to overfit.

## Why is Regularization Important?

Regularization is a technique used to avoid overfitting in machine learning models. Overfitting occurs when a model is too closely fit to the training data, and does not generalize well to new data. This can lead to poor performance on the test set.

Regularization helps to avoid overfitting by penalizing large parameter values. This pushes the model toward simpler solutions, which are more likely to generalize well.

There are two main types of regularization: L1 and L2. L1 regularization penalizes the sum of the absolute values of the weights, while L2 regularization penalizes the sum of the squared weights.
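The two penalties differ only in how they measure the size of the weight vector. For a concrete weight vector (made up for illustration):

```python
import numpy as np

w = np.array([0.5, -2.0, 0.0, 3.0])  # a made-up weight vector
lam = 0.1                            # regularization strength

# L1: sum of absolute values; L2: sum of squares.
l1_penalty = lam * np.sum(np.abs(w))  # 0.1 * (0.5 + 2 + 0 + 3) = 0.55
l2_penalty = lam * np.sum(w ** 2)     # 0.1 * (0.25 + 4 + 0 + 9) = 1.325
```

Note how the L2 penalty weighs the large weight 3.0 much more heavily (9 vs. 3), which is why L2 is especially aggressive about shrinking big weights.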

## Types of Regularization

There are three commonly named types of regularization: weight decay, L1 regularization, and L2 regularization. Weight decay is typically used in neural networks: a term is added to each update (or to the cost function) that shrinks the weights toward zero on every step; for plain gradient descent it is equivalent to L2 regularization. L1 regularization adds a term to the cost function proportional to the sum of the absolute values of the weights. This tends to drive many weights exactly to zero, producing a sparse weight vector, which can be useful if you only want a few features to be used in your model. L2 regularization instead adds a term proportional to the sum of the squared weights. Rather than producing sparsity, it shrinks all weights toward zero, resulting in a more even distribution of small weights, which can be useful if you want all features to contribute to your model.
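The sparsity difference can be seen in the closed-form shrinkage steps associated with the two penalties. This is a sketch using the proximal (soft-threshold) form for L1; the weights are illustrative:

```python
import numpy as np

def l1_prox(w, t):
    """Soft-thresholding, the exact minimizing step for an L1 penalty:
    weights whose magnitude is below the threshold t become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l2_shrink(w, t):
    """The corresponding closed-form step for an L2 penalty:
    every weight is scaled toward zero but none reaches it exactly."""
    return w / (1.0 + t)

w = np.array([0.05, -0.3, 1.2])  # made-up weights
sparse = l1_prox(w, 0.1)         # the small first weight is zeroed out
shrunk = l2_shrink(w, 0.1)       # all weights shrink, none become zero
```

This is the mechanical reason L1 yields sparse models while L2 yields small-but-dense ones.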

## How to Implement Regularization

There are two main ways to regularize a model:

-by adding a penalty on the size of the weights

-by constraining the size of the weights

The first way is called **weight regularization**, and the second way is called **constraint-based regularization**. You can also combine these two methods to get the best of both worlds.

In weight regularization, you add a penalty term to the cost function based on the size of the weights. A penalty proportional to the sum of the squares of the weights is called an **L2 penalty**, and a penalty proportional to the sum of the absolute values of the weights is called an **L1 penalty**. In constraint-based regularization, you instead force the weights to stay small directly, for example by projecting them back inside a fixed-radius norm ball after each update (a max-norm constraint).

You can usually get good performance with either approach. As a rule of thumb, L2 regularization works well when you expect all features to contribute a little, while L1 regularization works well when you expect only a few features to matter, since it drives the remaining weights to zero.
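The two approaches can be sketched as single gradient-descent update rules. This is a sketch; `max_norm`, the step size, and λ are illustrative assumptions:

```python
import numpy as np

def grad_step_penalty(w, grad, alpha, lam):
    """Penalty-based step: the L2 penalty contributes lam * w to the
    gradient, shrinking the weights a little on every update
    (the classic 'weight decay' form)."""
    return w - alpha * (grad + lam * w)

def grad_step_constraint(w, grad, alpha, max_norm):
    """Constraint-based step: take a plain gradient step, then project
    the weights back onto a ball of radius max_norm if they left it."""
    w = w - alpha * grad
    norm = np.linalg.norm(w)
    if norm > max_norm:
        w = w * (max_norm / norm)
    return w
```

The penalty version nudges the weights toward zero continuously, while the constraint version leaves them untouched until they exceed the allowed norm, then rescales them back onto the boundary.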

## Regularization in Practice

In machine learning, we often use regularization techniques to prevent overfitting and help our models generalize to new data. In this quiz, we’ll put some of those techniques into practice on a real dataset.

## Summary

Welcome to the quiz on regularization! You will have 5 questions to complete, each worth 20% of the total quiz grade. Please select the best answer for each question. Good luck!

## Further Reading

Here are some articles that discuss regularization in more depth:

-Introduction to Regularization

-The Benefits and Drawbacks of Regularization

-Regularization: A Clear and Concise Explanation

-Why Does Regularization Work?

And here are some helpful videos on the topic:

-Regularization: The Big Picture (3 mins)

-Regularization: Intuition (7 mins)

-Overfitting, Underfitting and Regularization (10 mins)

## References

-https://www.coursera.org/learn/machine-learning/lecture/6ibnK/regularization

-https://www.coursera.org/learn/machine-learning/supplement/5yT7D/programming-exercise-5

## About the Author

I’m excited to be part of Coursera’s Machine Learning course! My name is Sebastian Thrun and I am a professor at Stanford University. I am also the co-founder and CEO of Udacity, an online education startup that is focused on bringing accessible, affordable, engaging learning experiences to people everywhere.

As part of my work at Udacity, I created and taught the very first MOOC (Massive Open Online Course), which was an online version of my Stanford AI course. This was the first time anyone had taught an online class with thousands of students. It was a great success, and since then Udacity has offered many other MOOCs, all with the same goal of providing access to high quality education for everyone.

The Machine Learning course on Coursera is based on my Stanford course, and includes video lectures, readings, and quizzes. I hope you enjoy it!
