The distributions of my train and test datasets are different. How do I fix this?

Question

I am new to data science and would like some help understanding my problem. For instance, I have two non-stationary signals recorded under the same condition (figure 1).

I acquired them at different times (in the morning and in the afternoon). When I applied the Kolmogorov-Smirnov test, the null hypothesis was rejected. I don't understand why the distributions differ if I did not change any parameters of my acquisition system.

This is the main problem in my analysis: because of it, none of the machine learning classification algorithms I tried gives me a usable model (they overfit).

I read something about covariate shift (*I saw this post) and the Kullback-Leibler Importance Estimation Procedure, but I don't know if they will really work.

*'Different Test Set and Training Set Distribution'

Piotr Rarus - Reinstate Monica · Answer

How does it compare over longer timespans? Can you still observe noticeable drift?
To some extent, this is usual behaviour.
Here is a guide for KL divergence. It's just a tool you can use to compare probability distributions.
Exploration is as important as hard statistics. To me, those plots look all right. There is no defined point that tells you when you can or cannot do things. That's the whole point of ML: you don't have a priori knowledge. It is best to run some models and check the validation metrics.
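
As a rough illustration of comparing two recordings, here is a minimal sketch that runs a two-sample Kolmogorov-Smirnov test and a histogram-based KL divergence estimate side by side. The `morning` and `afternoon` arrays are synthetic stand-ins, not the asker's data, and the helper `kl_divergence_hist`, the bin count, and the pseudo-count are illustrative assumptions. Keep in mind that with thousands of samples the KS test tends to reject even for small differences, so the size of the statistic is usually more informative than the p-value alone.

```python
import numpy as np
from scipy.stats import entropy, ks_2samp


def kl_divergence_hist(a, b, bins=30, pseudo_count=0.5):
    """Estimate KL(P_a || P_b) in nats from histograms on a shared grid.

    Hypothetical helper: bins and pseudo_count are arbitrary choices and
    the estimate depends on both.
    """
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # The pseudo-count keeps empty bins from producing log(0);
    # scipy's entropy() normalizes both vectors to sum to 1.
    return float(entropy(p + pseudo_count, q + pseudo_count))


rng = np.random.default_rng(0)
# Synthetic stand-ins for the two recordings (assumed, not real data).
morning = rng.normal(loc=0.0, scale=1.0, size=5000)
afternoon = rng.normal(loc=0.5, scale=1.5, size=5000)

result = ks_2samp(morning, afternoon)
kl_shifted = kl_divergence_hist(morning, afternoon)
# Baseline: two halves of the same recording, to see how large the
# estimate is when there is no real shift.
kl_same = kl_divergence_hist(morning[:2500], morning[2500:])

print(f"KS statistic={result.statistic:.3f}, p-value={result.pvalue:.3g}")
print(f"KL(morning || afternoon)={kl_shifted:.4f}")
print(f"KL between two halves of morning={kl_same:.4f}")
```

Comparing against the same-recording baseline gives a sense of how much divergence is just sampling noise, which is one way to judge whether the drift between sessions is large enough to matter.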
