# Loss Functions

### Loss functions for model accuracy and cost optimization

Loss functions measure how far a prediction $(\hat y)$ is from the actual value $(y)$. There are many ways to measure this distance, and the right choice depends on the problem being solved: each function has its own advantages, and no single function fits every case.

### Mean Squared Error

MSE = $\frac{1}{N}\displaystyle\sum_{i=1}^N (y_i - \hat y_i)^2$

Keras: mean_squared_error(y_true, y_pred)

Best constant which minimizes MSE: Target mean
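As a minimal sketch in plain Python (the helper names `mse` and `constant_mse` and the sample targets are illustrative assumptions, not part of Keras), the following computes MSE and compares two constant predictions, the target mean and the target median:

```python
from statistics import mean, median

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative targets, made up for this example.
y = [1.0, 2.0, 3.0, 10.0]

def constant_mse(c):
    """MSE when every prediction is the same constant c."""
    return mse(y, [c] * len(y))

mse_at_mean = constant_mse(mean(y))      # the mean of y is 4.0
mse_at_median = constant_mse(median(y))  # the median of y is 2.5

print(mse_at_mean, mse_at_median)
```

On this sample the mean gives the smaller MSE, which is consistent with the target mean being the best constant for this loss.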

#### Side Note: Root Mean Squared Error (RMSE)

RMSE = $\sqrt{\frac{1}{N}\displaystyle\sum_{i=1}^N (y_i - \hat y_i)^2 }$
or
RMSE = $\sqrt{MSE}$

Derivative of RMSE:
$\frac{\partial RMSE}{\partial \hat y_i} = \frac{1}{2\sqrt{MSE}}\frac{\partial MSE}{\partial \hat y_i}$
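The chain-rule expression above can be checked numerically. The sketch below is only an illustration: the function names and sample values are assumptions, and it uses $\frac{\partial MSE}{\partial \hat y_i} = -\frac{2}{N}(y_i - \hat y_i)$. Note that the gradient is undefined when the RMSE is exactly zero.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def rmse_grad(y_true, y_pred):
    """Analytic gradient of RMSE with respect to each prediction."""
    n = len(y_true)
    r = rmse(y_true, y_pred)  # sqrt(MSE); must be non-zero
    # dMSE/dy_hat_i = -2/N * (y_i - y_hat_i), scaled by 1 / (2 * sqrt(MSE)).
    return [(-2.0 / n) * (t - p) / (2.0 * r) for t, p in zip(y_true, y_pred)]

# Illustrative values, made up for this example.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.5, 4.0]

analytic = rmse_grad(y_true, y_pred)

# Central finite difference on the first prediction as a sanity check.
eps = 1e-6
up = [y_pred[0] + eps] + y_pred[1:]
down = [y_pred[0] - eps] + y_pred[1:]
numeric = (rmse(y_true, up) - rmse(y_true, down)) / (2 * eps)

print(analytic[0], numeric)
```

Because the RMSE gradient is the MSE gradient scaled by a positive factor, both losses push each prediction in the same direction; only the step size differs.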

#### R-squared $R^2$

$R^2$ is one of the most widely used metrics in linear regression problems, and it is popular because it is easy to interpret: it is the fraction of the variability in the data that the model explains.

$R^2 = 1 - \frac{\frac 1N\sum_{i=1}^N (y_i - \hat y_i)^2}{\frac 1N\sum_{i=1}^N (y_i - \bar y)^2} = 1 - \frac{MSE}{\frac 1N\sum_{i=1}^N (y_i - \bar y)^2}$

where $\bar y$ is the mean of the target values.
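A small sketch of this ratio in plain Python, with an assumed helper name `r_squared` and made-up targets. It checks the two reference points of the metric: a perfect model scores 1, and a model that always predicts the target mean scores 0.

```python
from statistics import mean

def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # must be non-zero
    return 1.0 - ss_res / ss_tot

# Illustrative targets, made up for this example.
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, y_true)                         # predictions equal targets
baseline = r_squared(y_true, [mean(y_true)] * len(y_true))  # always predict the mean

print(perfect, baseline)
```

The $\frac 1N$ factors cancel, so the code works with plain sums of squares.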

### Mean Absolute Error

MAE = $\frac1N \displaystyle\sum_{i=1}^N \lvert y_i-\hat y_i\rvert$

Best Constant: Target Median

MAE is a good choice when the data contains outliers that should not dominate the fit, because each error contributes linearly rather than quadratically. If the extreme values are not outliers but genuine observations that the model should care about, use MSE instead.
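The following sketch (the helper name `mae` and the sample data are assumptions for illustration) computes MAE and compares the median and the mean as constant predictions on data with one extreme value:

```python
from statistics import mean, median

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative targets; 100.0 plays the role of an outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]

mae_at_median = mae(y, [median(y)] * len(y))  # the median of y is 3.0
mae_at_mean = mae(y, [mean(y)] * len(y))      # the mean of y is 22.0

print(mae_at_median, mae_at_mean)
```

The median is barely moved by the extreme value, while the mean is pulled far from most of the data, which is why the median constant gives the lower MAE here.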

### Mean Squared Percentage error (MSPE) & Mean Absolute Percentage Error (MAPE)

MSPE = $\frac{100\%}{N} \displaystyle\sum_{i=1}^N \Big(\frac{y_i - \hat y_i}{y_i} \Big)^2$

MAPE = $\frac{100\%}{N} \displaystyle\sum_{i=1}^N \lvert\frac{y_i - \hat y_i}{y_i}\rvert$
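Both metrics weight each error relative to the size of its target. A minimal sketch, taking MSPE as the mean of squared relative errors and MAPE as the mean of absolute relative errors, each scaled by 100 (the function names and sample values are illustrative assumptions):

```python
def mspe(y_true, y_pred):
    """Mean squared percentage error; every y_true value must be non-zero."""
    n = len(y_true)
    return 100.0 / n * sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error; every y_true value must be non-zero."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))

# Illustrative values: each prediction is off by 10% of its target.
y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]

print(mape(y_true, y_pred))  # about 10
print(mspe(y_true, y_pred))  # about 1
```

An absolute error of 10 on a target of 100 counts the same as an error of 20 on a target of 200. Both metrics are undefined when any target is zero.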

### Mean Squared Logarithmic Error (MSLE) and Root Mean Squared Logarithmic Error

MSLE = $\frac 1N\displaystyle\sum_{i=1}^N (\log(y_i+1)-\log(\hat y_i+1))^2$

RMSLE = $\sqrt{MSLE}$
= $\sqrt{\frac 1N\displaystyle\sum_{i=1}^N (\log(y_i+1)-\log(\hat y_i+1))^2}$
= $RMSE(\log(y_i+1),\log(\hat y_i+1))$ = $\sqrt{MSE(\log(y_i+1),\log(\hat y_i+1))}$

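If the model is trained on the transformed target $\log(y+1)$, its predictions are in log space and must be exponentiated back to the original scale: $\hat y = \exp(\hat z) - 1$, where $\hat z$ is the log-space prediction.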
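The log transform and its inverse can be sketched with `math.log1p` and `math.expm1` from the Python standard library, which compute $\log(1+x)$ and $\exp(x)-1$. The function name `rmsle` and the sample values are assumptions for illustration.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; inputs must be greater than -1."""
    n = len(y_true)
    sq = sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / n)

# Illustrative targets, made up for this example.
y = [0.0, 9.0, 99.0]

# Forward transform: the values a model would be trained on.
z = [math.log1p(v) for v in y]

# Inverse transform: map log-space values back to the original scale.
recovered = [math.expm1(v) for v in z]

print(recovered)
print(rmsle(y, [0.0, 10.0, 90.0]))
```

Because the error is measured on logarithms, RMSLE depends on the ratio $(\hat y_i + 1)/(y_i + 1)$ rather than on the absolute difference.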

### Hinge Loss

• SVM Classification Loss Function
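The page does not give a formula for this loss. As a sketch under the common definition, hinge loss averages $\max(0, 1 - y_i \cdot s_i)$, where $y_i \in \{-1, +1\}$ is the label and $s_i$ is the raw classifier score. The function name and sample values below are assumptions for illustration.

```python
def hinge_loss(y_true, scores):
    """Mean hinge loss; y_true holds -1/+1 labels, scores are raw outputs."""
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y_true, scores)) / len(y_true)

# Illustrative labels and scores, made up for this example.
labels = [1, -1, 1]
scores = [2.0, -0.5, -1.0]

# Per-sample losses: 0.0 (correct, outside the margin),
# 0.5 (correct, but inside the margin), 2.0 (wrong side).
loss = hinge_loss(labels, scores)
print(loss)
```

A sample contributes zero loss only when it is classified correctly with a margin of at least 1.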

### Categorical Hinge

• SVM Multi Class Classification Loss Function

### Binary Crossentropy

• Logistic Binary Classification Loss Function
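As a sketch under the usual definition, binary cross-entropy averages $-[y_i \log p_i + (1 - y_i)\log(1 - p_i)]$, where $y_i \in \{0, 1\}$ is the label and $p_i$ is the predicted probability of class 1. The function name, the clipping constant `eps`, and the sample values are assumptions for illustration.

```python
import math

def binary_crossentropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy with 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Illustrative labels and probabilities, made up for this example.
labels = [1, 0]
probs = [0.9, 0.2]

bce = binary_crossentropy(labels, probs)
print(bce)
```

Confident predictions on the wrong side are penalized heavily because $-\log p$ grows without bound as $p$ approaches 0.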

### Categorical Crossentropy

• Logistic Multi Class Classification Loss Function

Cross-entropy is commonly used to quantify the difference between two probability distributions. Usually the “true” distribution (the one that your machine learning algorithm is trying to match) is expressed in terms of a one-hot distribution.

For example, suppose for a specific training instance, the label is B (out of the possible labels A, B, and C). The one-hot distribution for this training instance is therefore:

| Pr(Class A) | Pr(Class B) | Pr(Class C) |
|-------------|-------------|-------------|
| 0.0         | 1.0         | 0.0         |

You can interpret the above “true” distribution to mean that the training instance has 0% probability of being class A, 100% probability of being class B, and 0% probability of being class C.

Now, suppose your machine learning algorithm predicts the following probability distribution:

| Pr(Class A) | Pr(Class B) | Pr(Class C) |
|-------------|-------------|-------------|
| 0.228       | 0.619       | 0.153       |

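How close is the predicted distribution to the true distribution? That is what the cross-entropy loss determines. Use this formula:

$H(p, q) = -\displaystyle\sum_{k} p_k \log q_k$

where $p$ is the true distribution, $q$ is the predicted distribution, and the sum runs over the classes.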
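For the example above, the cross-entropy $-\sum_k p_k \log q_k$ reduces to $-\log 0.619$, because only the true class (B) has non-zero weight in a one-hot distribution. The sketch below works this through with the natural logarithm; the function name is an assumption for illustration.

```python
import math

def cross_entropy(p_true, q_pred):
    """Cross-entropy between a true distribution and a predicted one."""
    # Terms with p_k == 0 contribute nothing and are skipped.
    return -sum(p * math.log(q) for p, q in zip(p_true, q_pred) if p > 0)

# Distributions from the example: the true label is class B.
p_true = [0.0, 1.0, 0.0]
q_pred = [0.228, 0.619, 0.153]

loss = cross_entropy(p_true, q_pred)
print(loss)  # roughly 0.48
```

The loss falls toward 0 as the probability assigned to the true class approaches 1, and it ignores how the remaining probability is split among the wrong classes.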

Written on December 1, 2017