# Anomaly Detection Algorithms

Anomaly detection is a class of semi-supervised (close to unsupervised) learning algorithms widely used in manufacturing, data centre monitoring, fraud detection and, as the name implies, detecting anomalous behaviour in general. It is typically used when we have a highly imbalanced classification problem, for example with roughly 20 anomalous examples (y=1) and 10,000 normal examples (y=0). An example would be identifying faulty aircraft engines from a large number of parameters, where anomalous data might not be available at all or, if it is, makes up less than 0.1% of the dataset.

### Algorithm:

Suppose there are $m$ training examples $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$.

Problem statement: given a new example $x_{test}$, is $x_{test}$ anomalous?

Approach:

Suppose $x_1, x_2, \dots, x_n$ are the features of each training example, i.e. $x \in \mathbb{R}^n$.

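Model $p(x)$ from the data, treating the features as independent so that the density factorises:

$$p(x) = p(x_1; \mu_1, \sigma_1^2)\, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2)$$

or, more compactly,

$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$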

Identify unusual/anomalous examples by checking whether $p(x) < \epsilon$ for some threshold $\epsilon$.
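A minimal sketch of this approach in plain Python, assuming the per-feature parameters $\mu_j$ and $\sigma_j^2$ are already known. The function names `gaussian_pdf` and `p_of_x`, the two-feature parameter values and the threshold `epsilon` are all hypothetical, chosen only for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate normal density p(x; mu, sigma^2); sigma2 is the variance."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def p_of_x(x, mus, sigma2s):
    """p(x) as the product of per-feature Gaussian densities (independence assumed)."""
    p = 1.0
    for xj, mu_j, s2_j in zip(x, mus, sigma2s):
        p *= gaussian_pdf(xj, mu_j, s2_j)
    return p

# Hypothetical parameters for two features, and a hypothetical threshold
mus = [5.0, 10.0]
sigma2s = [1.0, 4.0]
epsilon = 1e-3

typical = [5.2, 9.5]
unusual = [5.1, 20.0]  # second feature lies 5 standard deviations from its mean

print(p_of_x(typical, mus, sigma2s) < epsilon)  # False
print(p_of_x(unusual, mus, sigma2s) < epsilon)  # True
```

A point that is typical in every feature keeps $p(x)$ well above the threshold, while a single feature lying far from its mean is enough to drive the product below $\epsilon$.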

### Gaussian Distribution

An assumption of the above model is that each feature $x_j$ follows a Gaussian (normal) distribution with mean $\mu_j$ and variance $\sigma_j^2$. If a feature is distributed differently, apply a transformation such as $\log(x)$, $\log(x + c)$ or $x^{1/2}$ so that it looks approximately Gaussian.
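To see the effect of such a transformation, the sketch below measures sample skewness (a Gaussian has skewness 0) before and after applying $\log(x)$ to a hypothetical right-skewed feature. The data and the helper `skewness` are illustrative assumptions, not part of the original material.

```python
import math

def skewness(values):
    """Population skewness: mean of (x - mean)^3 divided by std^3."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    third = sum((v - mean) ** 3 for v in values) / m
    return third / var ** 1.5

# Hypothetical right-skewed feature spanning several orders of magnitude
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(v) for v in raw]  # use log(x + c) instead if x can be 0

print(round(skewness(raw), 3))     # clearly positive (long right tail)
print(round(skewness(logged), 3))  # about 0: the log values are evenly spaced
```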

**Normal Distribution**

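If $x$ is normally distributed with mean $\mu$ and variance $\sigma^2$, we write $x \sim \mathcal{N}(\mu, \sigma^2)$, and its density is

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$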
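The density translates directly into code. This sketch (plain Python, with a hypothetical helper `normal_pdf` parameterised by the variance $\sigma^2$ as in the formula) checks two properties: the peak height at $x = \mu$ is $\frac{1}{\sqrt{2\pi}\,\sigma}$, and a simple Riemann sum over a wide interval gives an area of approximately 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(mu, sigma^2) at x; sigma2 is the variance, not the standard deviation."""
    sigma = math.sqrt(sigma2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / (math.sqrt(2 * math.pi) * sigma)

# Peak height at x = mu is 1 / (sqrt(2*pi) * sigma): about 0.3989 for the standard normal
peak = normal_pdf(0.0)

# Left Riemann sum over [-10, 10) with step 0.001; the tails beyond +-10 are negligible
step = 0.001
area = sum(normal_pdf(-10 + i * step) * step for i in range(20_000))

print(peak)
print(round(area, 6))  # 1.0
```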

*(Figures: probability density function; cumulative distribution function.)*

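Parameter estimation: given a dataset $\{x^{(1)}, \dots, x^{(m)}\}$, estimate

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$

These are the maximum likelihood estimates; dividing by $m - 1$ instead of $m$ makes little practical difference when $m$ is large.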
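A sketch of these estimates for a single feature, using a hypothetical helper `estimate_gaussian` and made-up data:

```python
def estimate_gaussian(values):
    """Maximum likelihood estimates of mu and sigma^2 (dividing by m, not m - 1)."""
    m = len(values)
    mu = sum(values) / m
    sigma2 = sum((v - mu) ** 2 for v in values) / m
    return mu, sigma2

mu, sigma2 = estimate_gaussian([1.0, 2.0, 3.0, 4.0])
print(mu, sigma2)  # 2.5 1.25
```

Python's `statistics.pvariance` computes the same $\frac{1}{m}$ estimate, whereas `statistics.variance` divides by $m - 1$.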

### Anomaly Detection Algorithm

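- Choose features $x_j$ that you think might be indicative of anomalous examples.
- Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$ using the formulae $\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$ and $\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$.
- Given a new example $x$, compute $p(x)$ as: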

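$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$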

Flag $x$ as an anomaly if $p(x) < \epsilon$.
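Putting the steps together, here is a small end-to-end sketch in plain Python. The functions `fit`, `p` and `is_anomaly`, the six training rows and the value of `epsilon` are hypothetical; the code assumes independent Gaussian features exactly as in the formula above.

```python
import math

def fit(X):
    """Estimate mu_j and sigma_j^2 for every feature column of X (a list of rows)."""
    m, n = len(X), len(X[0])
    mus = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2s = [sum((row[j] - mus[j]) ** 2 for row in X) / m for j in range(n)]
    return mus, sigma2s

def p(x, mus, sigma2s):
    """p(x) = product over j of N(x_j; mu_j, sigma_j^2), assuming independent features."""
    prob = 1.0
    for xj, mu, s2 in zip(x, mus, sigma2s):
        prob *= math.exp(-((xj - mu) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return prob

def is_anomaly(x, mus, sigma2s, epsilon):
    """Flag x as anomalous when its density falls below the threshold."""
    return p(x, mus, sigma2s) < epsilon

# Hypothetical training data: two features, all normal (non-anomalous) examples
X_train = [[4.8, 10.1], [5.1, 9.7], [5.0, 10.4], [5.3, 9.9], [4.9, 10.2], [4.9, 9.7]]
mus, sigma2s = fit(X_train)
epsilon = 0.01  # hypothetical threshold; in practice tuned on a cross-validation set

print(is_anomaly([5.0, 10.0], mus, sigma2s, epsilon))  # False
print(is_anomaly([6.5, 10.0], mus, sigma2s, epsilon))  # True
```

Because $p(x)$ is a density rather than a probability, it can exceed 1 for very typical points; only its comparison with $\epsilon$ matters.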

**Example: Dividing Data into Training, CV and Test Sets**
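A sketch of one such split, using the sizes from the introduction (10,000 normal and 20 anomalous examples). The 60/20/20 proportions for normal examples, the even division of anomalies between the CV and test sets, keeping anomalies out of the training set, and the helper `split_data` are assumptions of this sketch; the integers stand in for real feature vectors.

```python
import random

def split_data(normal, anomalous, seed=0):
    """Hypothetical split: 60/20/20 of normal examples into train/CV/test,
    with anomalies divided evenly between CV and test. Labels: 0 normal, 1 anomaly."""
    rng = random.Random(seed)
    normal, anomalous = normal[:], anomalous[:]
    rng.shuffle(normal)
    rng.shuffle(anomalous)

    # Integer arithmetic avoids floating-point surprises in the split sizes
    n_train = len(normal) * 6 // 10
    n_cv = len(normal) * 2 // 10
    half = len(anomalous) // 2

    train = [(x, 0) for x in normal[:n_train]]
    cv = [(x, 0) for x in normal[n_train:n_train + n_cv]] + [(x, 1) for x in anomalous[:half]]
    test = [(x, 0) for x in normal[n_train + n_cv:]] + [(x, 1) for x in anomalous[half:]]
    return train, cv, test

# Placeholder examples: 10,000 normal, 20 anomalous
normal = list(range(10_000))
anomalous = list(range(10_000, 10_020))
train, cv, test = split_data(normal, anomalous)
print(len(train), len(cv), len(test))  # 6000 2010 2010
```

Fitting $p(x)$ only on normal training examples and reserving the few anomalies for CV and test lets them be used to evaluate the model and choose $\epsilon$.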

**Anomaly Detection vs Supervised Learning**

*Source material from Andrew Ng's awesome course on Coursera. The material in the videos has been written up in text form so that anyone who wishes to revise a certain topic can do so without going through the entire video lectures.*