AnswerBun.com

Can features that are the same in every sample contribute to learning?

Data Science Asked on January 4, 2022

For simplicity, let’s say that I am monitoring 4 sensors for an ongoing metric.

The first column is the sensor ID and the second column is the sensor type.

[
  [
    [0, 0, 0.123],
    [1, 0, 0.456],
    [2, 1, 0.789],
    [3, 1, 0.555]
  ],
  [
    [0, 0, 0.987],
    [1, 0, 0.654],
    [2, 1, 0.321],
    [3, 1, 0.666]     
  ],
  [
    [0, 0, 0.591],
    [1, 0, 0.824],
    [2, 1, 0.760],
    [3, 1, 0.888]      
  ]
]

If the first two columns are always the same values, will a CNN or an LSTM be able to learn from these columns or are they just redundant?

In my mind, the sensor ID could correspond to a position on the map where different metrics are observed. Or the sensor type could correspond to some sort of sensitivity in the metric. But am I just kidding myself if they are the same in every sample?

I don’t want to provide unnecessary dimensionality to the model.

One Answer

A feature adds no value if it takes the same value across the entire training set.

Say you are using a global health dataset to model life expectancy: the country code can then be a useful feature, since it might carry hidden information.


But if you run the same analysis for a single country, e.g. India, keeping a country feature that has only one value (India) is of no use.
It shows no variance; compared with the previous example, the variance has shifted from the country level to the state level.
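As a rough illustration of the point about variance, the sketch below counts distinct values per column, first for a multi-country table and then for the India-only subset. The records, the column order and the helper name `distinct_counts` are made up for this example and are not taken from any real dataset.

```python
# Hypothetical records: (country, state, life_expectancy).
# All values are invented purely to illustrate the idea.
records = [
    ("India", "Kerala", 75.0),
    ("India", "Bihar", 69.0),
    ("Japan", "Tokyo", 84.0),
    ("Brazil", "Bahia", 74.0),
]


def distinct_counts(rows):
    """Return the number of distinct values in each column."""
    return [len(set(col)) for col in zip(*rows)]


# Global dataset: the country column varies, so it can carry information.
global_counts = distinct_counts(records)

# Single-country subset: the country column collapses to one value.
india_only = [r for r in records if r[0] == "India"]
india_counts = distinct_counts(india_only)

print(global_counts)  # [3, 4, 4]
print(india_counts)   # [1, 2, 2]
```

A column whose distinct count is 1 cannot help separate one row from another, which is why the country feature becomes useless in the single-country case while the state column still varies.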

In your data, I can see multiple sensor IDs. If every sample will always contain these 4 values, and there is no explicit assumption that the 3rd column depends on the 1st, you can remove the feature.
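One way to check this on the data from the question is sketched below, in plain Python. The helpers `constant_columns` and `drop_columns` are hypothetical names introduced here; the sketch assumes every sample has the same number of rows in the same sensor order, as in the question.

```python
# The three samples from the question; each row is [sensor_id, sensor_type, reading].
samples = [
    [[0, 0, 0.123], [1, 0, 0.456], [2, 1, 0.789], [3, 1, 0.555]],
    [[0, 0, 0.987], [1, 0, 0.654], [2, 1, 0.321], [3, 1, 0.666]],
    [[0, 0, 0.591], [1, 0, 0.824], [2, 1, 0.760], [3, 1, 0.888]],
]


def constant_columns(samples):
    """Indices of columns whose value at each row position is identical in every sample."""
    first = samples[0]
    n_rows, n_cols = len(first), len(first[0])
    return [
        c
        for c in range(n_cols)
        if all(s[r][c] == first[r][c] for s in samples for r in range(n_rows))
    ]


def drop_columns(samples, cols):
    """Return a copy of the samples without the given column indices."""
    drop = set(cols)
    return [
        [[v for c, v in enumerate(row) if c not in drop] for row in sample]
        for sample in samples
    ]


const = constant_columns(samples)
reduced = drop_columns(samples, const)

print(const)       # [0, 1]
print(reduced[0])  # [[0.123], [0.456], [0.789], [0.555]]
```

After dropping the two constant columns, each sensor is still identified implicitly by its row position within the sample, so the ordering of the readings has to stay fixed.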

Answered by 10xAI on January 4, 2022
