Anomaly Detection Algorithms

Anomaly detection is a class of semi-supervised (close to unsupervised) learning algorithms widely used in manufacturing, data centres and fraud detection. It is normally used when we have an imbalanced classification problem with, say, approximately 20 anomalous examples ($y=1$) and 10,000 normal examples ($y=0$). An example would be identifying faulty aircraft engines based on a large number of parameters, where anomalous data might not be available or, if it is available, makes up less than 0.1% of the examples.

Algorithm:

Suppose there are $m$ training examples $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$.

Problem Statement: Is $x_{test}$ anomalous?

Approach:
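Suppose $x_1, x_2, \dots, x_n$ are the features of the training examples.

Model $p(x)$ from the data:

$$p(x) = p(x_1; \mu_1, \sigma_1^2)\, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2)$$

or

$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$

Identify unusual/anomalous examples by checking if $p(x) < \epsilon$.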

Gaussian Distribution

An assumption of the above model is that each feature $x_j$ follows a Gaussian distribution with mean $\mu_j$ and variance $\sigma_j^2$. If a feature is distributed differently, apply a transformation (for example, a logarithm or a power) to bring it closer to a normal distribution.
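As a rough illustration of such a transformation, the sketch below applies a logarithm to a right-skewed feature and compares the sample skewness before and after (a value near 0 suggests a more symmetric, Gaussian-like shape). The synthetic log-normal data, the constant `c` and the helper `skewness` are assumptions made for this example only.

```python
import math
import random

def skewness(values):
    """Sample skewness: roughly 0 for a symmetric (e.g. Gaussian) feature."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    return sum((v - mean) ** 3 for v in values) / (m * var ** 1.5)

random.seed(0)
# Hypothetical right-skewed feature, drawn from a log-normal distribution.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Candidate transformation: log(x + c). The small constant c (assumed) guards against log(0).
c = 1e-9
transformed = [math.log(v + c) for v in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

In practice one would plot a histogram of each feature and try a few transformations until the shape looks roughly bell-curved.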

Normal Distribution

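$$x \sim \mathcal{N}(\mu, \sigma^2)$$

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$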
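The Gaussian density, $\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, is short enough to write by hand. The function name `gaussian_pdf` below is our own choice for this sketch, and it takes the variance $\sigma^2$ rather than $\sigma$.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x; sigma2 is the variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

peak = gaussian_pdf(0.0, 0.0, 1.0)   # density at the mean
tail = gaussian_pdf(3.0, 0.0, 1.0)   # density three standard deviations away
print(peak, tail)
```

The density is highest at the mean and falls off quickly in the tails, which is what lets a low $p(x)$ signal an unusual example.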

Probability Density Function

[Figure: probability density function of the normal distribution]

Cumulative Distribution Function

[Figure: cumulative distribution function of the normal distribution]

Parameter Estimation:

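$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$$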
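A minimal sketch of the estimation step, assuming the training set is a list of $m$ rows with $n$ numeric features each. The per-feature mean and variance use the $1/m$ normalisation; the function name `fit_gaussian_params` and the toy data are illustrative only.

```python
def fit_gaussian_params(X):
    """Return per-feature means and variances (1/m normalisation) for the rows of X."""
    m = len(X)       # number of training examples
    n = len(X[0])    # number of features
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2

# Toy training set: 3 examples, 2 features.
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
mu, sigma2 = fit_gaussian_params(X_train)
print(mu)      # [2.0, 12.0]
print(sigma2)
```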

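Anomaly Detection Algorithm

  1. Choose features $x_j$ that you think might be indicative of anomalous examples.
  2. Fit the parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$ using the parameter estimation formulae.
  3. Given a new example $x$, compute $p(x)$ as:

$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)$$

Flag $x$ as an anomaly if $p(x) < \epsilon$.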
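Putting the three steps together, the sketch below fits the per-feature parameters, multiplies the univariate Gaussian densities to get $p(x)$, and flags an example when $p(x) < \epsilon$. The training data (two made-up sensor features), the value of `epsilon` and all function names are assumptions for illustration, not values from a real system.

```python
import math

def fit(X):
    """Per-feature mean and variance (1/m normalisation) of the training rows."""
    m = len(X)
    columns = list(zip(*X))
    mu = [sum(col) / m for col in columns]
    sigma2 = [sum((v - mj) ** 2 for v in col) / m for col, mj in zip(columns, mu)]
    return mu, sigma2

def p(x, mu, sigma2):
    """Product of independent univariate Gaussian densities, one per feature."""
    prob = 1.0
    for xj, mj, s2 in zip(x, mu, sigma2):
        prob *= math.exp(-((xj - mj) ** 2) / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
    return prob

def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x when its modelled density falls below the threshold epsilon."""
    return p(x, mu, sigma2) < epsilon

# Hypothetical normal examples with two features each.
X_train = [[10.0, 5.0], [10.5, 5.2], [9.5, 4.8], [10.2, 5.1], [9.8, 4.9]]
mu, sigma2 = fit(X_train)
epsilon = 0.01  # assumed threshold

print(is_anomaly([10.1, 5.05], mu, sigma2, epsilon))  # False: close to the training data
print(is_anomaly([13.0, 7.0], mu, sigma2, epsilon))   # True: far from the training data
```

With many features the product can underflow to zero, so summing log-densities and comparing against $\log \epsilon$ is a common variation.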

Example - Dividing data into Train, CV and Test Set

[Figure: example of dividing the data into train, CV and test sets]
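One way such a split could look in code is sketched below, using the 10,000 normal and 20 anomalous examples mentioned earlier. The 60/20/20 split of the normal examples, the even division of the anomalies between CV and test, and the function name `split_for_anomaly_detection` are all assumptions for this illustration. The training set holds only normal examples, since the model $p(x)$ is fit on normal data.

```python
import random

def split_for_anomaly_detection(normal, anomalous, seed=0):
    """Split into an unlabeled train set and labeled (example, y) CV and test sets."""
    rng = random.Random(seed)
    normal, anomalous = normal[:], anomalous[:]   # copy so the inputs stay untouched
    rng.shuffle(normal)
    rng.shuffle(anomalous)

    n_train = len(normal) * 6 // 10   # 60% of normal examples for training
    n_cv = len(normal) * 2 // 10      # 20% for cross-validation, the rest for test
    a_cv = len(anomalous) // 2        # half of the anomalies go to CV, half to test

    train = normal[:n_train]
    cv = [(x, 0) for x in normal[n_train:n_train + n_cv]] + [(x, 1) for x in anomalous[:a_cv]]
    test = [(x, 0) for x in normal[n_train + n_cv:]] + [(x, 1) for x in anomalous[a_cv:]]
    return train, cv, test

# Placeholder IDs standing in for real feature vectors.
normal = list(range(10000))
anomalous = list(range(10000, 10020))
train, cv, test = split_for_anomaly_detection(normal, anomalous)
print(len(train), len(cv), len(test))  # 6000 2010 2010
```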

Anomaly Detection vs Supervised Learning

[Figure: comparison of anomaly detection and supervised learning]

Written on March 24, 2018