# How to Find Machine Learning Outliers

This blog post will show you how to find outliers in your machine learning data using Python and the scikit-learn library.

## Introduction: How to Find Machine Learning Outliers

In this post, we’ll discuss how to find outliers in machine learning data. We’ll start with a brief discussion of what outliers are and why they’re important. We’ll then talk about some of the methods used to detect outliers in machine learning data. Finally, we’ll discuss some of the potential problems with outlier detection in machine learning.

## Why Find Outliers?

There are many reasons why you might want to find outliers in your data. Maybe you’re looking for unusual patterns that could indicate fraud or error. Maybe you want to clean up your data set before training a machine learning model. Or maybe you’re just curious and want to see what kind of strange things are hiding in your data set!

Whatever your reason, there are a few different ways to go about finding outliers in machine learning data. In this post, we’ll take a look at a few of the most common methods and see how they work in practice.

## How to Find Outliers

An outlier is an observation that lies far from the other observations in a data set. Outliers can occur in either a univariate or a multivariate setting. In a univariate setting, a common rule of thumb flags a point as an outlier when it is more than three standard deviations from the mean. In a multivariate setting, a simple extension of that rule flags a point when it is more than three standard deviations from the mean along at least one feature.
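Here is a minimal sketch of that multivariate rule using NumPy. The array `X` and its values are made up for illustration, and checking each feature separately is only one reading of "at least one direction": it will miss points that are unusual only in combination.

```python
import numpy as np

# Made-up data: 30 rows, 2 features, with one planted extreme value.
X = np.column_stack([
    np.tile([1.0, 2.0, 3.0], 10),   # feature 0 cycles through 1, 2, 3
    np.tile([10.0, 20.0], 15),      # feature 1 alternates 10, 20
])
X[7, 1] = 500.0  # plant an extreme value in row 7, feature 1

# Z-score each column using that column's own mean and standard deviation.
z = (X - X.mean(axis=0)) / X.std(axis=0)

# Flag a row if any of its features is more than 3 standard deviations out.
row_is_outlier = (np.abs(z) > 3).any(axis=1)
outlier_rows = np.flatnonzero(row_is_outlier)
print(outlier_rows)
```

Only row 7 is flagged here, because its second feature sits far from the rest of that column.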

There are various ways to find outliers in your data, and one of the most common is based on a technique called z-score normalization (also known as standardization). Z-score normalization rescales your data so that the mean is 0 and the standard deviation is 1. On this common scale, each value tells you how many standard deviations it sits from the mean, which makes it easy to compare values and find outliers.
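As a rough illustration of the transformation itself, the sketch below uses scikit-learn's `StandardScaler`, which standardizes each column of a feature matrix. The matrix `X` is a made-up example, not data from this post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: rows are samples, columns are features
# on very different scales.
X = np.array([
    [1.0, 200.0],
    [2.0, 220.0],
    [3.0, 240.0],
    [4.0, 260.0],
    [5.0, 280.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # subtract column mean, divide by column std

print(X_scaled.mean(axis=0))  # each column's mean is now approximately 0
print(X_scaled.std(axis=0))   # each column's standard deviation is now approximately 1
```

Note that `StandardScaler` divides by the population standard deviation, so its output matches `X.std(axis=0)` in NumPy with the default settings.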

To find outliers using z-score normalization, you first need to calculate the z-scores for each observation in your data. You can do this by subtracting the mean from each value and then dividing by the standard deviation. Once you have calculated the z-scores, you can identify outliers by looking for values that are greater than 3 or less than -3.
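The steps above can be sketched in a few lines of NumPy. The sample `values` is invented for illustration, and the sketch assumes the population standard deviation (NumPy's default); you may prefer the sample version with `ddof=1`.

```python
import numpy as np

# Invented sample: 20 readings close to 10, plus one suspicious value.
values = np.array([
    10.0, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.1, 9.9,
    10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9,
    25.0,
])

mean = values.mean()
std = values.std()  # population standard deviation (ddof=0)

# Subtract the mean from each value, then divide by the standard deviation.
z_scores = (values - mean) / std

# Keep only the values whose z-score is greater than 3 or less than -3.
outlier_mask = np.abs(z_scores) > 3
outliers = values[outlier_mask]
print(outliers)
```

One design note: the mean and standard deviation are themselves pulled toward extreme values, so in very small samples a single outlier may never reach a z-score of 3.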

Another method for finding outliers is to use authority limits. Authority limits are set at 2 standard deviations above and below the mean, and any point that falls outside them in either direction is flagged. The calculation is the same as before: compute the z-score for each observation by subtracting the mean and dividing by the standard deviation, then identify outliers by looking for values that are greater than 2 or less than -2. Because this cutoff is tighter than 3, it flags more points.
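To see how the choice of cutoff matters, here is a small sketch that expresses the limits in the data's original units instead of as z-scores. The helper name `sigma_limits` and the sample `data` are hypothetical, chosen only for this example.

```python
import numpy as np

def sigma_limits(values, k=2.0):
    """Return (lower, upper) limits at k population standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    center = values.mean()
    spread = values.std()
    return center - k * spread, center + k * spread

# Hypothetical sample with one value that stands apart from the rest.
data = np.array([12.0, 14.0, 15.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 34.0])

lower, upper = sigma_limits(data, k=2.0)
flagged = data[(data < lower) | (data > upper)]
print(lower, upper, flagged)
```

With `k=2.0` the value 34 falls outside the limits and is flagged. With `k=3.0` the same value falls inside them, since its z-score is roughly 2.76, so the cutoff you pick directly changes what counts as an outlier.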

## Outlier Detection Techniques

There are dozens of outlier detection techniques ranging from simple statistical methods to more complex machine learning models. No technique is perfect and each has its own advantages and disadvantages. The key is to select the right technique for your data and your specific outlier detection task.

One of the simplest outlier detection techniques is to compute the z-score for each data point. The z-score is calculated as:

z = (x - μ) / σ

where x is the data point, μ is the mean of the data, and σ is its standard deviation. Data points with a z-score greater than 3 or less than -3 are commonly treated as outliers.

Another common technique is to compute the median absolute deviation (MAD). The MAD is calculated as: