# Activation Functions in a Neural Network

An activation function is applied in every neuron of a neural network layer to convert the parametrized expression ($W^Tx+b$) into a number which can be fed into the next layer. The backpropagation algorithm also uses the derivatives of these activation functions to propagate the error back through the network and minimize the final cost.

A good activation function should also help avoid the vanishing gradient problem, in which gradients shrink toward zero as they are propagated back through many layers. If a hard max function such as $\max(0, x)$ is used as the activation function, it induces sparsity in the hidden units. ReLU does not suffer from vanishing gradients the way the sigmoid and tanh functions do, because its derivative is 1 for every positive input. It has also been shown that deep networks can be trained efficiently using ReLU even without pre-training.
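To make the vanishing gradient idea concrete, here is a small Python sketch. It is a simplification under stated assumptions: it ignores the weights and simply multiplies the same activation derivative once per layer. The helper name `chained_factor` and the depth of 10 are illustrative choices, not part of any library.

```python
import math


def sigmoid_prime(z: float) -> float:
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_prime(z: float) -> float:
    """Derivative of ReLU for z != 0 (0 is used at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0


def chained_factor(derivative, z: float, depth: int) -> float:
    """Product of `depth` identical activation derivatives evaluated at z.

    Assumption: weights are ignored, so this only shows how the
    activation derivatives alone scale the backpropagated error.
    """
    return derivative(z) ** depth


# The sigmoid derivative peaks at 0.25 (at z = 0), so even in the best case
# ten layers scale the error by 0.25 ** 10.
sigmoid_factor = chained_factor(sigmoid_prime, 0.0, 10)

# The ReLU derivative is 1 for any positive input, so the factor stays at 1.
relu_factor = chained_factor(relu_prime, 1.0, 10)

print(sigmoid_factor, relu_factor)
```

Even at its most favourable point the sigmoid shrinks the signal by a factor of four per layer, while an active ReLU unit passes it through unchanged.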

### Details about the activation functions

#### Sigmoid Activation Function

The sigmoid function takes any real number and outputs a number between 0 and 1. It approaches 1 for very large positive inputs and 0 for very large negative inputs, and is equal to 0.5 at x=0. This makes it quite useful for computing probabilities.

Function: $\sigma(x) = \frac{1}{1+e^{-x}}$

Derivative of Sigmoid: for $g(z) = \sigma(z)$, $g'(z) = \sigma(z)(1-\sigma(z))$
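The two formulas above can be sketched in plain Python as follows. The names `sigmoid` and `sigmoid_prime` are illustrative, and the branch on the sign of `z` is an added numerical-stability detail (it avoids overflow in `math.exp` for large negative inputs), not something the formulas require.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z),
    # so that exp() is never called on a large positive number.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_prime(z: float) -> float:
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


print(sigmoid(0.0))        # 0.5
print(sigmoid_prime(0.0))  # 0.25
```

Note that the derivative reuses the forward value `s`, which is why backpropagation through a sigmoid is cheap once the forward pass has been computed.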

#### Rectified Linear Unit (ReLU) Activation Function

The advantage of ReLU is the speed of computation. Networks that use functions like sigmoid tend to train slowly because the derivative of the function becomes very small when x is very large or very small, so the weight updates shrink. ReLU is cheap to compute, and is now widely used in most layers and activation units.

ReLU itself is defined at x=0 (it equals 0 there), but its derivative is not. This is OK because you’ll hardly ever encounter the case where you’ll have to calculate the derivative at exactly x=0. Even if you do, the derivative can be calculated at (x+0.01), or simply set to 0 or 1 by convention, and the results will still be accurate.

Function: $f(x) = \max(0, x)$, i.e. $f(x) = x$ if $x > 0$ and $f(x) = 0$ if $x \le 0$
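A minimal Python sketch of ReLU and its derivative follows. The function names are illustrative, and returning 0 for the derivative at exactly x = 0 is an assumed convention (returning 1 would be equally valid).

```python
def relu(x: float) -> float:
    """Rectified linear unit: x for positive inputs, 0 otherwise."""
    return max(0.0, x)


def relu_prime(x: float) -> float:
    """Derivative of ReLU.

    The true derivative is undefined at x == 0; this sketch assumes the
    common convention of returning 0 there.
    """
    return 1.0 if x > 0 else 0.0


print(relu(-2.0), relu(3.5))              # 0.0 3.5
print(relu_prime(-2.0), relu_prime(3.5))  # 0.0 1.0
```

Both functions need only a comparison, with no exponentials, which is where the speed advantage over sigmoid and tanh comes from.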

#### Softmax Activation Function

The softmax function is the most widely used function in the output layer of a neural network when multi-class classification needs to be carried out. The function takes an input in $\mathbb{R}^N$ and produces an output in $\mathbb{R}^N$. Each output lies between 0 and 1, and all the outputs sum to 1, so they can be interpreted as class probabilities.

Function: $S_j = \frac{e^{a_j}}{\sum_{k=1}^N e^{a_k}}$
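The softmax formula can be sketched in plain Python as below. Subtracting the maximum input before exponentiating is an added numerical-stability step, not part of the formula; it leaves the result unchanged because the common factor cancels between numerator and denominator. The function name `softmax` is illustrative.

```python
import math
from typing import List, Sequence


def softmax(a: Sequence[float]) -> List[float]:
    """Return S_j = exp(a_j) / sum_k exp(a_k) for each entry of a."""
    m = max(a)  # shift by the maximum so exp() never overflows
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs)       # largest input gets the largest probability
print(sum(probs))  # approximately 1
```

Without the shift, an input such as `[1000.0, 1000.0]` would raise an `OverflowError` in `math.exp`; with it, the result is simply `[0.5, 0.5]`.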

#### Tanh Activation Function

Function: $g_{\tanh}(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$

Derivative: $g'_{\tanh}(z) = 1 - \tanh^2(z)$
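As a quick check of the two tanh formulas, here is a short Python sketch. `tanh_from_definition` and `tanh_prime` are illustrative names; the first follows the exponential definition literally, so it is only suitable for moderate values of `z` (large inputs would overflow `math.exp`), while the second relies on the standard library's `math.tanh`.

```python
import math


def tanh_from_definition(z: float) -> float:
    """tanh written out as (e^z - e^{-z}) / (e^z + e^{-z}).

    Intended for moderate z only; use math.tanh in practice.
    """
    ez, enz = math.exp(z), math.exp(-z)
    return (ez - enz) / (ez + enz)


def tanh_prime(z: float) -> float:
    """Derivative of tanh: 1 - tanh(z)^2."""
    t = math.tanh(z)
    return 1.0 - t * t


print(tanh_from_definition(0.5), math.tanh(0.5))  # the two should agree closely
print(tanh_prime(0.0))                            # 1.0, the maximum slope
```

Unlike the sigmoid, tanh is centred at zero and its outputs lie between -1 and 1, but its derivative still approaches 0 for inputs of large magnitude.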

Written on December 1, 2017