
Loss Functions

Loss functions for model accuracy and cost optimization

Loss functions measure how far a prediction is from the actual value. There are several ways to measure this distance, and the right choice depends on the context: each function has its own advantages for particular problems, and there is no one-size-fits-all function.

Mean Squared Error


$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

Keras: mean_squared_error(y_true, y_pred)

Best constant which minimizes MSE: Target mean
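As a rough illustration of why the target mean is the best constant prediction under MSE, the sketch below compares the mean against a few other constants. The data and the helper names (`mse`, `constant_mse`) are made up for this example.

```python
from statistics import mean

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

# Illustrative targets (assumed data, not from the post).
y = [3.0, 5.0, 7.0, 9.0]
target_mean = mean(y)

def constant_mse(c):
    """MSE obtained when every prediction is the same constant c."""
    return mse(y, [c] * len(y))

best = constant_mse(target_mean)
others = [constant_mse(c) for c in (4.0, 5.5, 6.5, 8.0)]

print(target_mean, best, others)
```

Every other constant tried here gives a larger MSE than the mean does.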

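Side Note: Root Mean Squared Error (RMSE)

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

Derivative of RMSE:

$$\frac{\partial\,\mathrm{RMSE}}{\partial \hat{y}_i} = \frac{1}{2\sqrt{\mathrm{MSE}}}\,\frac{\partial\,\mathrm{MSE}}{\partial \hat{y}_i}$$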


R-squared ($R^2$) is one of the most widely used metrics in linear regression problems, and it is popular because it is simple to explain: it gives the proportion of the variability in the data that is explained by the model.
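A minimal sketch of RMSE alongside the coefficient of determination (R², the share of variance explained by the model). The function names and sample values are assumptions chosen for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative data (assumed).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(rmse(y_true, y_pred), r_squared(y_true, y_pred))

# Predicting the target mean everywhere explains none of the variance.
baseline_r2 = r_squared(y_true, [6.0] * len(y_true))
```

Because RMSE is a monotonic transform of MSE, a model that minimizes one also minimizes the other; R² rescales the same squared error against a predict-the-mean baseline.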


Mean Absolute Error


$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$

Best constant which minimizes MAE: Target median

MAE is a good choice when your data contains outliers and you do not want them to dominate the loss, because it is more robust to them than MSE. If the extreme values are not outliers but rare, legitimate observations that the model should care about, use MSE instead.
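The robustness claim can be checked with a small sketch: with one extreme value in the targets, the median stays near the bulk of the data while the mean is dragged toward the outlier. The data and helper names are illustrative assumptions.

```python
from statistics import mean, median

def mae(y_true, y_pred):
    """Mean of the absolute differences between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

# Four typical values and one outlier (assumed data).
y = [10.0, 11.0, 12.0, 13.0, 100.0]

med = median(y)   # stays with the bulk of the data
avg = mean(y)     # pulled toward the outlier

mae_with_median = mae(y, [med] * len(y))
mae_with_mean = mae(y, [avg] * len(y))

print(med, avg, mae_with_median, mae_with_mean)
```

Predicting the median yields the lower MAE here, consistent with the median being the best constant for this loss.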

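Mean Squared Percentage Error (MSPE) & Mean Absolute Percentage Error (MAPE)

$$\mathrm{MSPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left(\frac{y_i - \hat{y}_i}{y_i}\right)^2$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$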
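A hedged sketch of both percentage errors, assuming the common definitions in which each error is divided by the true value and the average is scaled by 100. The function names and data are illustrative, and the targets are assumed to be non-zero.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error; targets must be non-zero."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))

def mspe(y_true, y_pred):
    """Mean squared percentage error; targets must be non-zero."""
    n = len(y_true)
    return 100.0 / n * sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred))

# Both predictions are off by 10% of the target, despite different absolute errors.
y_true = [100.0, 10.0]
y_pred = [90.0, 9.0]

print(mape(y_true, y_pred), mspe(y_true, y_pred))
```

Because the errors are relative, an error of 10 on a target of 100 and an error of 1 on a target of 10 contribute equally.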



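Mean Squared Logarithmic Error (MSLE) and Root Mean Squared Logarithmic Error (RMSLE)

$$\mathrm{RMSLE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(\log(y_i + 1) - \log(\hat{y}_i + 1)\bigr)^2} = \mathrm{RMSE}\bigl(\log(y + 1), \log(\hat{y} + 1)\bigr) = \sqrt{\mathrm{MSE}\bigl(\log(y + 1), \log(\hat{y} + 1)\bigr)}$$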

When the model is trained on the transformed target log(y + 1), its predictions must be exponentiated and shifted back, exp(prediction) - 1, to return to the original scale.
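A minimal sketch of RMSLE and of the round trip between the original and log scales, using `math.log1p` (log(1 + x)) and its inverse `math.expm1` (exp(x) - 1). The function name `rmsle` and the sample values are assumptions for illustration.

```python
import math

def rmsle(y_true, y_pred):
    """RMSE computed on log(1 + y); inputs must be greater than -1."""
    n = len(y_true)
    squared = ((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared) / n)

# Round trip: transform a target to log scale, then back to the original scale.
y = 99.0
z = math.log1p(y)        # what a model trained on log(y + 1) would predict
restored = math.expm1(z)  # back on the original scale

# A prediction whose log(1 + y) is off by exactly 1 gives an RMSLE of about 1.
example = rmsle([math.e - 1.0], [math.e ** 2 - 1.0])

print(restored, example)
```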

Hinge Loss

  • SVM Classification Loss Function
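As a sketch, the hinge loss for labels in {-1, +1} and raw (unsquashed) scores is usually written as max(0, 1 - y * score), averaged over the samples. The function name and data below are illustrative assumptions.

```python
def hinge(y_true, y_score):
    """Average of max(0, 1 - y * score) for labels in {-1, +1}."""
    losses = [max(0.0, 1.0 - t * s) for t, s in zip(y_true, y_score)]
    return sum(losses) / len(losses)

# Correct with margin, correct but inside the margin, and wrong (assumed data).
y_true = [1, -1, 1]
y_score = [2.0, -0.5, -1.0]

loss = hinge(y_true, y_score)
print(loss)
```

Only the first sample contributes zero loss: a prediction must be on the correct side by a margin of at least 1 to be free of penalty.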

Categorical Hinge

  • SVM Multi Class Classification Loss Function
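One common multi-class formulation, which to my understanding matches the one Keras uses for its categorical hinge, compares the score of the true class with the highest score among the other classes. The sketch below assumes one-hot labels; the function name and data are illustrative.

```python
def categorical_hinge(y_true, y_score):
    """max(0, best wrong-class score - true-class score + 1) for one-hot y_true."""
    pos = sum(t * s for t, s in zip(y_true, y_score))          # true-class score
    neg = max((1.0 - t) * s for t, s in zip(y_true, y_score))  # best other score
    return max(0.0, neg - pos + 1.0)

one_hot = [0.0, 1.0, 0.0]  # true class is the second one (assumed example)

inside_margin = categorical_hinge(one_hot, [0.2, 0.9, 0.1])
well_separated = categorical_hinge(one_hot, [-1.0, 3.0, 0.0])

print(inside_margin, well_separated)
```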

Binary Crossentropy

  • Logistic Binary Classification Loss Function
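A minimal sketch of binary cross-entropy, assuming labels in {0, 1} and predicted probabilities of the positive class. The clipping constant `eps` is an assumption added to avoid log(0); the function name and data are illustrative.

```python
import math

def binary_crossentropy(y_true, p_pred, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)] with probabilities clipped."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Both predictions assign probability 0.9 to the correct class (assumed data).
confident = binary_crossentropy([1, 0], [0.9, 0.1])
# Both predictions assign probability 0.1 to the correct class.
wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(confident, wrong)
```

Confidently wrong predictions are penalized far more heavily than confidently correct ones are rewarded.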

Categorical Crossentropy

  • Logistic Multi Class Classification Loss Function

Cross-entropy is commonly used to quantify the difference between two probability distributions. Usually the “true” distribution (the one that your machine learning algorithm is trying to match) is expressed in terms of a one-hot distribution.

For example, suppose for a specific training instance, the label is B (out of the possible labels A, B, and C). The one-hot distribution for this training instance is therefore:

Pr(Class A)   Pr(Class B)   Pr(Class C)
0.0           1.0           0.0

You can interpret the above “true” distribution to mean that the training instance has 0% probability of being class A, 100% probability of being class B, and 0% probability of being class C.

Now, suppose your machine learning algorithm predicts the following probability distribution:

Pr(Class A)   Pr(Class B)   Pr(Class C)
0.228         0.619         0.153

How close is the predicted distribution to the true distribution? That is what the cross-entropy loss determines. For a true distribution p and a predicted distribution q, it is given by:

$$H(p, q) = -\sum_{x} p(x)\,\log q(x)$$
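A minimal sketch of that computation for the example distributions, using the natural logarithm (an assumption; base 2 would give the result in bits rather than nats). The function name `cross_entropy` is illustrative.

```python
import math

def cross_entropy(p_true, q_pred):
    """H(p, q) = -sum over classes of p(x) * log q(x); zero-probability terms are skipped."""
    return -sum(p * math.log(q) for p, q in zip(p_true, q_pred) if p > 0)

p_true = [0.0, 1.0, 0.0]        # one-hot label: class B
q_pred = [0.228, 0.619, 0.153]  # predicted probabilities for A, B, C

loss = cross_entropy(p_true, q_pred)
print(round(loss, 3))  # about 0.48
```

With a one-hot label, only the true class contributes, so the loss reduces to -log(0.619): the closer the predicted probability of class B is to 1, the closer the loss is to 0.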

Written on December 1, 2017