Preprocessing: StandardScaler() Do we really need mean to be zero?

Data Science Asked by Shivam Arora on March 18, 2021

For instance, many elements used in the objective function of a
learning algorithm (such as the RBF kernel of Support Vector Machines
or the l1 and l2 regularizers of linear models) assume that all
features are centered around zero and have variance in the same order.

This is from the scikit-learn preprocessing documentation.

Can someone please specify which elements they are referring to when they say the mean is expected to be zero?

I understand that the variance should be in a similar range for the algorithm to give the same significance to each feature. But is there anything that also necessarily expects zero mean in the data?

In other words, if I know the variance of each feature is already in the same range, can something still go wrong, given that it says some "elements" expect zero mean?

2 Answers

The reason is the same.
I assume you understand how features on very different scales can create issues.

But scaling by the standard deviation alone will not always bring them onto a similar scale, because the standard deviation depends on the spread of a feature, not on its magnitude.
So if a feature has very large values but a small range, simply dividing it by its standard deviation will not help.

Let's check an example with two features

import numpy as np
np.set_printoptions(precision=2)

feat_A = np.array([1, 2, 3, 4, 5])                                     # small values
feat_B = np.array([10000000, 10000001, 10000002, 10000003, 10000005])  # huge values, tiny spread

# Case I  - Only Std
print(feat_A/feat_A.std())
print(feat_B/feat_B.std())

# Case II - Mean and Std
print((feat_A-feat_A.mean())/feat_A.std())
print((feat_B-feat_B.mean())/feat_B.std())

Output

Case - I - [As you can see, this doesn't solve the problem]

[0.71 1.41 2.12 2.83 3.54]
[5812381.94 5812382.52 5812383.1 5812383.68 5812384.84]

Case - II - [This is what we were looking for]

[-1.41 -0.71 0. 0.71 1.41]
[-1.28 -0.7 -0.12 0.46 1.63]
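
As a side note (not part of the original answer), scikit-learn's StandardScaler performs exactly this Case II transform, subtracting each column's mean and dividing by its standard deviation, so the example above can be reproduced in a single call. A minimal sketch with the same two features:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.column_stack([
    [1, 2, 3, 4, 5],                                     # feat_A
    [10000000, 10000001, 10000002, 10000003, 10000005],  # feat_B
]).astype(float)

# Matches the Case II output column by column
print(StandardScaler().fit_transform(X))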

Answered by 10xAI on March 18, 2021

It seems very unlikely that centering would hurt, so I'd suggest just doing it anyway.

Theoretically, in a generalized linear model with regularization, no, centering won't change anything. This is because the intercept term can absorb any changes; shifting $x$ by 100 can simply be rewritten: $$ 15 + 0.2(x-100) = 15 - 0.2\cdot 100 + 0.2x = -5 + 0.2x,$$ so that essentially the same model exists for the shifted data with the same coefficient penalty. Of course, that suggests the first nontrivial example where failing to center will hurt performance: if you don't fit an intercept term!
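
Here is a minimal sketch (not from the answer's notebook) of that intercept-absorption argument, using sklearn's Ridge, which by default fits an unpenalized intercept; the synthetic data and the shift of 100 are arbitrary choices for illustration:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 15 + 0.2 * x.ravel() + rng.normal(scale=0.1, size=200)

# Fit on the original feature and on the feature shifted by 100
m1 = Ridge(alpha=1.0).fit(x, y)
m2 = Ridge(alpha=1.0).fit(x + 100, y)

print(m1.coef_, m1.intercept_)  # slope ~0.2, intercept ~15
print(m2.coef_, m2.intercept_)  # same slope; intercept ~-5 absorbs the shift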

However, in trying to test that in sklearn, I ran into some trouble (notebook): scaling with and without centering gives different results in penalized logistic regression, depending on the solver! (saga gives different results, but lbfgs gives nearly-identical coefficients.) I'm not sure yet whether this is some numerical issue (which I have seen before, but with datasets of much more varying scales) or a misunderstanding on my part. Penalized linear regression seems to work fine.

Finally, to SVMs with the RBF kernel. My understanding of nonlinear-kernel SVMs is rather limited, but the RBF kernel should be invariant to centering. However, see Data Centering in Feature Space (pdf), which posits that centering in the kernel space can be helpful. Whether for theoretical or numerical reasons, both the classification and regression versions in the above-linked notebook show different results after centering.
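
To illustrate why the RBF kernel itself is invariant to centering (a sketch added here, not part of the original answer): the kernel value depends only on the pairwise distances $\|x_i - x_j\|$, which do not change when the same mean vector is subtracted from every sample. The data, sizes, and default gamma below are arbitrary:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=2.0, size=(10, 3))  # features far from zero mean

K_raw = rbf_kernel(X)
K_centered = rbf_kernel(X - X.mean(axis=0))

print(np.allclose(K_raw, K_centered))  # True: the Gram matrix is unchanged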

Answered by Ben Reiniger on March 18, 2021
