# MSE relevance as a metric when errors < 1

Data Science Asked by Gwalchaved on December 5, 2020

I’m trying to build my first models for regression after taking MOOCs on deep learning. I’m currently working on a dataset whose labels are between 0 and 2. Again, this is a regression task, not classification.

The low y values imply that the loss for each sample is quite low, always < 1. My question is about the relevance of MSE as a metric in such a case: since the loss is < 1, squaring it results in an even smaller value, making the metric drop very rapidly. In this case, would it be more relevant to use MAE? Or should I multiply the y values so that the order of magnitude of a sample loss is > 1?

I found this nice article about regression metrics, but didn’t find the answer in it. Thanks for your help.

I'd use relative RMSE: $$\sqrt{\frac{1}{n} \sum \frac{(\text{Predicted} - \text{True})^2}{\text{True}^2}}$$ In this case, a value close to 0 implies a good model, regardless of the scale of the true values.

Similarly, you can try relative MAE.
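As a rough illustration of the two relative metrics, here is a minimal sketch in plain Python. The function names `relative_rmse` and `relative_mae` and the sample values are my own assumptions, not part of any library. One caveat: with labels between 0 and 2, a true value at or near 0 makes the relative error undefined or very large, so the sketch refuses zeros outright.

```python
import math


def relative_rmse(y_true, y_pred):
    """sqrt(mean(((pred - true) / true)^2)). Hypothetical helper name."""
    if any(t == 0 for t in y_true):
        raise ValueError("relative error is undefined when a true value is 0")
    ratios = [((p - t) / t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(ratios) / len(ratios))


def relative_mae(y_true, y_pred):
    """mean(|pred - true| / |true|). Hypothetical helper name."""
    if any(t == 0 for t in y_true):
        raise ValueError("relative error is undefined when a true value is 0")
    ratios = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(ratios) / len(ratios)


# Made-up example: every prediction is off by 10% of the true value.
y_true = [0.5, 1.0, 2.0]
y_pred = [0.55, 0.9, 2.2]

rel_rmse = relative_rmse(y_true, y_pred)
rel_mae = relative_mae(y_true, y_pred)

# The same data expressed in units 100 times larger gives the same score.
scaled_rel_rmse = relative_rmse(
    [100 * t for t in y_true], [100 * p for p in y_pred]
)

print(rel_rmse, rel_mae, scaled_rel_rmse)
```

Both metrics come out at about 0.1 here, i.e. a 10% typical error, and rescaling the targets does not change them.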

Correct answer by Suren on December 5, 2020

If your only concern is small error values, why not simply scale the output by some constant?

• The idea is to multiply all the actual values by some factor, e.g. 10*y_actual.
• Next, train your model on the scaled values.
• To make a prediction in the original range, scale the outputs back: y_scale_original = y_prediction / 10
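The steps above can be sketched in plain Python as follows. The factor `SCALE = 10` comes from the example above; the sample values and helper names are assumptions, and the "model" is replaced by fixed predictions to keep the sketch self-contained. It also shows what the scaling does to the metrics: MSE is multiplied by `SCALE**2` and MAE by `SCALE`, so the numbers get larger but comparisons between models stay the same.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


SCALE = 10  # constant factor from the example above

# Made-up targets in the original range and stand-in predictions.
y_true = [0.2, 1.0, 1.8]
y_pred = [0.3, 0.8, 1.9]

# Step 1: scale the targets (a real model would be trained on these).
y_true_scaled = [SCALE * t for t in y_true]
# Stand-in for the predictions of a model trained on the scaled targets.
y_pred_scaled = [SCALE * p for p in y_pred]

mse_orig, mae_orig = mse(y_true, y_pred), mae(y_true, y_pred)
mse_scaled = mse(y_true_scaled, y_pred_scaled)
mae_scaled = mae(y_true_scaled, y_pred_scaled)

# Step 3: map predictions back to the original range.
y_pred_back = [p / SCALE for p in y_pred_scaled]

print(mse_orig, mse_scaled)  # the second is SCALE**2 times the first
print(mae_orig, mae_scaled)  # the second is SCALE times the first
```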

Answered by Burger on December 5, 2020

MSE and Standard deviation

Mean squared error shows how much error we have across all our points. The goal is indeed to reduce it; in your case, however, the error would already be small.

One way to judge the relevance of your RMSE (the square root of the MSE) is to compare it to the standard deviation of the target values.

Imagine that the standard deviation is lower than your learned model's RMSE. Then taking the mean as the prediction for every test point (X_test) would give a better answer than trying to predict the value with your estimator.

In other words, imagine a naive regressor that assigns the mean value to every point. If this estimator yields a lower RMSE than your model, which should have learned something, then your model is very bad, since the naive estimator beats it.
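A minimal sketch of that comparison, using only the standard library. All values are made up, and `y_model` stands in for the predictions of whatever estimator you trained. The last lines check the link to the standard deviation: predicting the test mean for every test point gives an RMSE equal to the population standard deviation of the test targets.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Made-up data: training targets, test targets, and model predictions.
y_train = [0.2, 0.8, 1.0, 1.4, 1.6]
y_test = [0.4, 1.1, 1.6, 0.9]
y_model = [0.5, 1.0, 1.5, 1.0]  # stand-in for your estimator's output

# Naive regressor: always predict the mean of the training targets.
baseline_value = statistics.mean(y_train)
baseline_pred = [baseline_value] * len(y_test)

model_rmse = rmse(y_test, y_model)
baseline_rmse = rmse(y_test, baseline_pred)
model_beats_baseline = model_rmse < baseline_rmse

# Predicting the test mean everywhere yields the standard deviation.
test_mean_rmse = rmse(y_test, [statistics.mean(y_test)] * len(y_test))
test_std = statistics.pstdev(y_test)

print(model_rmse, baseline_rmse, model_beats_baseline)
print(test_mean_rmse, test_std)
```

An RMSE that looks small in absolute terms is only meaningful once it is clearly below this baseline.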

Start from this logic...

I would love for you to think through what I said; however, if you lose hope of figuring it out, check this.

Why not use MAE

MAE has its own benefits, so choosing it arbitrarily is pointless. MAE is mostly used when the data contains outliers or noise and we do not want to give much weight to those spikes in magnitude.

MSE vs. MAE (L2 loss vs. L1 loss): in short, the squared error is easier to optimize, but the absolute error is more robust to outliers.
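To make the robustness point concrete, here is a small sketch with made-up per-sample errors, all below 1 in magnitude except for one outlier. It compares how much each metric grows when the outlier is added; the variable names are my own.

```python
def mse_from_errors(errors):
    """Mean of squared errors."""
    return sum(e ** 2 for e in errors) / len(errors)


def mae_from_errors(errors):
    """Mean of absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


# Made-up per-sample errors (prediction minus true value).
errors_clean = [0.1, -0.2, 0.1, -0.1]
errors_with_outlier = errors_clean + [1.5]  # one badly predicted sample

mse_clean = mse_from_errors(errors_clean)
mae_clean = mae_from_errors(errors_clean)
mse_outlier = mse_from_errors(errors_with_outlier)
mae_outlier = mae_from_errors(errors_with_outlier)

# How many times larger each metric becomes once the outlier is included.
mse_growth = mse_outlier / mse_clean
mae_growth = mae_outlier / mae_clean

print(mse_clean, mae_clean)
print(mse_growth, mae_growth)
```

On the clean errors the MSE is smaller than the MAE, as expected when every error is below 1, yet the single outlier inflates the MSE far more than the MAE.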

Answered by ombk on December 5, 2020
