# The train and test datasets have different distributions: how can I fix this?

Data Science Asked by Natalie Chaves on September 5, 2020

I am new to data science and would like some help understanding my problem. For instance, I have two non-stationary signals recorded under the same condition (figure 1).

I acquired them at different times (in the morning and in the afternoon). When I applied the Kolmogorov-Smirnov test, the null hypothesis was rejected. I don't understand why the distributions differ when I did not change any parameters of my acquisition system.
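For readers who want to reproduce this kind of check, here is a minimal sketch of a two-sample Kolmogorov-Smirnov test using `scipy.stats.ks_2samp`. The arrays `morning` and `afternoon` are synthetic stand-ins (an assumption, not the data from the question), and the 0.05 significance level is also just an illustrative choice.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two recordings (not the asker's data):
# same spread, but the second sample has a slightly shifted mean.
morning = rng.normal(loc=0.0, scale=1.0, size=2000)
afternoon = rng.normal(loc=0.3, scale=1.0, size=2000)

# Null hypothesis: both samples are drawn from the same distribution.
result = ks_2samp(morning, afternoon)
ks_stat, p_value = result.statistic, result.pvalue

alpha = 0.05  # illustrative significance level
reject_null = p_value < alpha
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3g}, reject H0: {reject_null}")
```

Note that with thousands of samples even a small shift is enough for the test to reject, so a rejection on its own does not say how large the difference is.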

This is the main problem in my analysis: because of it, I cannot get any machine learning algorithm to produce a usable classification model (it overfits).

I read something about covariate shift (I saw this post) and the Kullback-Leibler Importance Estimation Procedure (KLIEP), but I don't know if it will really work.

• How does it compare over longer timespans? Can you still observe noticeable drift?
• To some extent, this is usual behaviour.
• Here is a guide for KL divergence. It's just a tool you can use to compare probability distributions.
• Exploration is just as important as hard stats. To me those plots look all right. There is no defined point that tells you when you can or cannot do things. That's the whole point of ML: you don't have a priori knowledge. It's best to run some models and check the validation metrics.
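As a rough illustration of the KL divergence point above, the sketch below estimates KL(P || Q) from two 1-D samples by binning them on a shared histogram. The function name `kl_divergence`, the bin count, the smoothing constant `eps`, and the synthetic data are all assumptions made for this example; it is one simple estimator, not the only way to do it.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=30, eps=1e-10):
    """Estimate KL(P || Q) from two 1-D samples using a shared histogram."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)  # same bin edges for both samples
    p_counts, _ = np.histogram(p_samples, bins=edges)
    q_counts, _ = np.histogram(q_samples, bins=edges)
    # Smooth with eps so empty bins do not produce log(0) or division by zero.
    p = (p_counts + eps) / (p_counts + eps).sum()
    q = (q_counts + eps) / (q_counts + eps).sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
# Hypothetical signals: the second one has a shifted mean.
morning = rng.normal(loc=0.0, scale=1.0, size=5000)
afternoon = rng.normal(loc=0.5, scale=1.0, size=5000)

kl_same = kl_divergence(morning, morning)     # identical samples
kl_shift = kl_divergence(morning, afternoon)  # shifted samples
print(f"KL(same) = {kl_same:.4f}, KL(shifted) = {kl_shift:.4f}")
```

The estimate depends on the number of bins and on `eps`, so it is more useful for comparing pairs of recordings against each other than as an absolute number.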

Answered by Piotr Rarus - Reinstate Monica on September 5, 2020
