TransWikia.com

Standardization with positive and negative values

Data Science Asked on September 5, 2021

I have a data set that has a few columns such as:

Total cost: mean = 3,000,000

Percent complete: mean = 50

final profit %: mean = 14

I know that with such different orders of magnitude I should standardize the data before fitting a linear regression (using Python and sklearn). The problem is that the data contains negative values that I need to keep, so I don't know which type of standardization to use. The only two I am familiar with are log transformations and StandardScaler, both of which I think get rid of negatives.

2 Answers

You can use normalization (z-score standardization). It rescales each column so that its mean is 0 and its standard deviation is 1, and the result contains both positive and negative values.

$X_{Normalised} = \frac{X - \mu}{\sigma}$

Here $\mu$ is your original mean and $\sigma$ is your standard deviation.
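As a rough illustration of the formula, here is a minimal NumPy sketch. The profit values are made up for the example (they are not from the asker's data set) and were chosen only so that their mean is 14 and one of them is negative.

```python
import numpy as np

# Hypothetical "final profit %" values, including a negative one (illustrative only).
profit_pct = np.array([-10.0, 5.0, 14.0, 20.0, 41.0])

mu = profit_pct.mean()    # original mean
sigma = profit_pct.std()  # population standard deviation (ddof=0)

# Apply X_normalised = (X - mu) / sigma element-wise.
z = (profit_pct - mu) / sigma

print(z)                  # values below the mean are negative, values above are positive
print(z.mean(), z.std())  # approximately 0 and 1
```

Values below the mean map to negative numbers and values above it to positive numbers, so the sign information relative to the mean is kept rather than discarded.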

Answered by SrJ on September 5, 2021

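You can still use StandardScaler(): it does not remove negative values. It subtracts each column's mean and divides by its standard deviation, so values below the mean simply come out negative. If you think you have a few outliers and want to reduce their influence, you can also look at RobustScaler(), which centers on the median and scales by the interquartile range.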
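A small sketch of both scalers, assuming a made-up array with the three columns from the question (total cost, percent complete, final profit %). The numbers are hypothetical, and the last row is deliberately an outlier in total cost.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical rows: [total cost, percent complete, final profit %] (illustrative only).
X = np.array([
    [2_500_000.0, 40.0, -12.0],
    [3_000_000.0, 50.0, 14.0],
    [3_200_000.0, 55.0, 20.0],
    [2_800_000.0, 45.0, 10.0],
    [9_000_000.0, 60.0, 38.0],  # outlier in total cost
])

# StandardScaler: per column, subtract the mean and divide by the standard deviation.
X_std = StandardScaler().fit_transform(X)

# RobustScaler: per column, subtract the median and divide by the interquartile range.
X_rob = RobustScaler().fit_transform(X)

print(X_std)  # columns now have mean 0 and std 1; negative entries are present
print(X_rob)  # columns now have median 0; the outlier has less effect on the scale
```

When the scaled features feed a regression, the scaler would normally be fit on the training split only and then applied to the test split with transform().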

Answered by Donald S on September 5, 2021
