TransWikia.com

Influence of trend on (supposedly) correlated time series

Data Science: asked by Viktor Katzy on October 1, 2021

TL;DR: What is the impact of a linear trend on the correlation between time series that are (most likely) not spuriously correlated?


I’m currently trying to reconstruct/cross-validate an analysis delivered by one of my company's contractors.

The data consists of sensor time series (approx. 3.5m timestamps). The goal was to find the signals most highly correlated with one specific signal.

Despite not being an expert in data science, I was able to reproduce their data cleaning (drop columns with zero variance, interpolate linearly over smaller gaps, drop the remaining columns that still contain NaN values). After that, however, I'm not sure whether I can confirm their findings.
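For reference, the three cleaning steps could look roughly like this in pandas. This is a minimal sketch, not the contractor's actual code: the function name `clean_sensor_frame` and the `max_gap` threshold for what counts as a "smaller gap" are hypothetical, and all columns are assumed to be numeric.

```python
import numpy as np
import pandas as pd


def clean_sensor_frame(df: pd.DataFrame, max_gap: int = 5) -> pd.DataFrame:
    """Hypothetical reconstruction of the cleaning steps described above.

    max_gap is an assumed limit on consecutive missing samples to interpolate.
    """
    # 1. Drop columns with zero variance (constant sensors; all-NaN columns
    #    give a NaN variance and are dropped too).
    df = df.loc[:, df.var() > 0]
    # 2. Linearly interpolate interior NaNs, filling at most max_gap in a row.
    #    In longer gaps the remaining NaNs are left in place.
    df = df.interpolate(method="linear", limit=max_gap, limit_area="inside")
    # 3. Drop every column that still contains NaN (long gaps or edge gaps).
    return df.dropna(axis=1)


# Small usage example with made-up data
raw = pd.DataFrame({
    "constant": [1.0] * 8,
    "short_gap": [0.0, 1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0],
    "long_gap": [0.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 7.0],
})
cleaned = clean_sensor_frame(raw, max_gap=2)
print(cleaned)
```

Because `limit` only fills the first `max_gap` values of a longer gap, such columns keep some NaNs and are removed in step 3, which matches the "drop remaining columns containing NaN" rule.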

Apparently they computed a simple Pearson correlation, like

corr = df.corrwith(df['DesiredSignal'])

Yet, looking at the data, the signals definitely seem to be trended.

When I then apply a detrending function like

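from scipy import signal
import pandas as pd

# Remove the least-squares linear trend from every column (axis=0 runs along time)
df_d = signal.detrend(df.to_numpy(), axis=0)
# Keep the original index and column names so corrwith can align on them
df_n = pd.DataFrame(df_d, index=df.index, columns=df.columns)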

and apply corrwith to this new DataFrame, I get totally different results (e.g. significantly more strongly negative correlations).
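A small synthetic sketch shows how much a shared trend alone can drive `corrwith`. The column names, the trend slope of 0.01 per sample and the unit-variance noise are invented for illustration; every column is independent noise, so any correlation beyond chance comes only from the trends.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Synthetic stand-in for the sensor data (hypothetical column names):
# all noise terms are independent, only the trends differ between columns.
rng = np.random.default_rng(0)
n = 10_000
t = np.arange(n)
trend = 0.01 * t  # assumed linear trend, large relative to the noise

df = pd.DataFrame({
    "DesiredSignal": trend + rng.normal(size=n),
    "SameTrend": trend + rng.normal(size=n),
    "OppositeTrend": -trend + rng.normal(size=n),
    "NoTrend": rng.normal(size=n),
})

# Pearson correlation of every column with the target, as in the question
raw = df.corrwith(df["DesiredSignal"])

# Remove the least-squares linear trend from each column, then correlate again
df_n = pd.DataFrame(signal.detrend(df.to_numpy(), axis=0),
                    index=df.index, columns=df.columns)
detrended = df_n.corrwith(df_n["DesiredSignal"])

print(pd.DataFrame({"raw": raw, "detrended": detrended}).round(3))
```

On data like this, the SameTrend and OppositeTrend columns show raw correlations close to +1 and -1 purely because of the trend direction, and both collapse to roughly zero once the trend is removed, so the two runs can legitimately disagree.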

My question now is: can I trust the contractor's findings, or are they rendered invalid by not considering the influence of trends on correlation, or am I getting something completely wrong?

One Answer

Q1: What impact does a linear trend have on the correlation between non-spurious time series?

The four most common measures of correlation are Pearson, Kendall rank, Spearman and point-biserial (the last of which is not applicable to this type of problem, since it requires one of the variables to be dichotomous). For simplicity, I'll only explain how a trend affects Pearson correlation.

Let's assume $X$ is a sinusoidal time series without trend, $x_t = \sin t$; $Y$ is $X$ with an upward linear trend, $y_t = t + \sin t$; and $Z$ is the mirror image of $Y$, i.e. $X$ inverted and given a downward linear trend, $z_t = -(t + \sin t) = -y_t$. All series have identical timestamps and the same unit of measurement (for ease of plotting):

[Figure: Line plot of the time series signals]

One of the assumptions behind Pearson correlation is linearity: when the two series are plotted against each other on a scatter plot, the points should follow a straight line:

[Figure: Scatter plots of Y vs X and Z vs Y]

As you can see, $X$ and $Y$ do not satisfy this condition, so Pearson correlation is the wrong measure for them, whereas $Y$ and $Z$ do, since they lie exactly on a straight line. Why does this matter?
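A quick numerical check of the three series compares their full-sample Pearson coefficients using NumPy's `np.corrcoef`. The sampling grid (2,000 points over ten periods, $0 \le t \le 20\pi$) is an assumption, since the answer does not specify one.

```python
import numpy as np

# Assumed sampling: 2,000 points covering ten periods (0 to 20*pi)
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t)            # X: sinusoid without trend
y = t + np.sin(t)        # Y: X with an upward linear trend
z = -(t + np.sin(t))     # Z: the negation of Y

# Pearson r over the full sample for both pairs
r_xy = np.corrcoef(x, y)[0, 1]
r_yz = np.corrcoef(y, z)[0, 1]
print(f"r(X, Y) = {r_xy:.3f}")  # small, despite Y containing X exactly
print(f"r(Y, Z) = {r_yz:.3f}")  # -1: Y and Z lie on a straight line
```

Even though $Y$ contains $X$ exactly, $r_{XY}$ comes out close to zero because the trend dominates the variance of $Y$.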

Pearson's coefficient, $r_{XY} = \operatorname{cov}(X, Y) / (\sigma_X \sigma_Y)$, measures how closely the points cluster around the least-squares line of best fit between the two series. If the relationship is not linear, the coefficient does not describe it accurately. This can be shown by plotting the Pearson correlation coefficient computed over the samples up to time $t$, as $t$ increases:

[Figure: Coefficient vs time for XY and YZ]
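The figure is assumed here to show the coefficient computed on an expanding window, i.e. over all samples from the start up to time $t$. Under that assumption, and on a hypothetical grid of 2,000 points over ten periods, the curves can be sketched as follows (the helper `expanding_pearson` is hypothetical):

```python
import numpy as np

# Hypothetical sampling: ten periods, 2,000 points
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t)
y = t + np.sin(t)
z = -y


def expanding_pearson(a, b, min_periods=10):
    """Pearson r of a[:k] and b[:k] for k = min_periods, ..., len(a)."""
    return np.array([np.corrcoef(a[:k], b[:k])[0, 1]
                     for k in range(min_periods, len(a) + 1)])


r_xy = expanding_pearson(x, y)
r_yz = expanding_pearson(y, z)
print(f"X-Y: first {r_xy[0]:.3f}, last {r_xy[-1]:.3f}")
print(f"Y-Z: min {r_yz.min():.3f}, max {r_yz.max():.3f}")
```

Over the first few samples $y_t \approx 2x_t$, so $r_{XY}$ starts near 1 and then falls towards zero as the trend takes over, while $r_{YZ}$ stays at $-1$ throughout.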

Notably, $X$ and $Y$ also violate the monotonic-relationship assumption of Spearman and Kendall rank correlation, so none of these methods can meaningfully measure the correlation between $X$ and $Y$ unless the data are first transformed to satisfy the underlying assumptions, for example by detrending, as you did in your question.
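The rank-based measures and the detrending fix can be checked with SciPy on a hypothetical grid of 2,000 points over ten periods. `stats.spearmanr`, `stats.kendalltau`, `stats.pearsonr` and `signal.detrend` are standard SciPy functions; the series are the ones defined above.

```python
import numpy as np
from scipy import signal, stats

# Hypothetical sampling: ten periods, 2,000 points
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t)
y = t + np.sin(t)
z = -y

# Y and Z are perfectly inversely monotonic, so both rank measures give -1
rho_yz, _ = stats.spearmanr(y, z)
tau_yz, _ = stats.kendalltau(y, z)

# Y is monotonic in t while X oscillates, so their rank correlation is weak
rho_xy, _ = stats.spearmanr(x, y)

# Removing the least-squares linear trend from Y recovers its link with X
y_detrended = signal.detrend(y)
r_x_ydet, _ = stats.pearsonr(x, y_detrended)

print(f"Spearman(Y, Z) = {rho_yz:.3f}, Kendall(Y, Z) = {tau_yz:.3f}")
print(f"Spearman(X, Y) = {rho_xy:.3f}")
print(f"Pearson(X, detrended Y) = {r_x_ydet:.3f}")
```

Subtracting the fitted line from $Y$ leaves almost exactly the sinusoid, so its Pearson correlation with $X$ is close to 1.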

Linear trends, therefore, do not simply push a correlation up or down. What matters is whether the data, with or without the trend, satisfy the assumptions of the correlation measure you need to use, and you have to react accordingly.

Q2: Can I trust the findings of a contractor who doesn't take these types of trends into account?

To paraphrase Hanlon's razor:

It is better to assume ignorance than malicious intent.

If you share your feedback, the analyst will have an opportunity to explain why they chose a particular approach, and a chance to realise that what they did was incorrect or that they misunderstood the requirements and/or limitations of the project.

Hopefully, this leads to a more positive outcome, given you want the best results and the analyst wants to provide the best service.

Correct answer by mwtmurphy on October 1, 2021
