TransWikia.com

Reading values from a column into a variable and then correlating using Python

Data Science Asked by Taylrl on August 13, 2021

I want to be able to correlate values from various IDs, where the date is the same with one another.

The data looks something like this;

ID      Time(secs)   Date
AAAA    1            01/01/1990
AAAA    6            02/01/1990
AAAA    5            03/01/1990
AAAA    2            04/01/1990
AAAA    4            05/01/1990
BBBB    2            01/01/1990
BBBB    4            02/01/1990
BBBB    6            03/01/1990
BBBB    3            04/01/1990
CCCC    3            01/01/1990
CCCC    4            02/01/1990
CCCC    1            03/01/1990
CCCC    6            04/01/1990
DDDD    7            01/01/1990
DDDD    4            02/01/1990
DDDD    5            03/01/1990
DDDD    3            04/01/1990

I want to find the correlation coefficient between each combination of these IDs, where there are matching dates. N.B. Not all of the IDs have the same date;

ID             CorrCoef
AAAA>BBBB      ????
AAAA>CCCC
AAAA>DDDD
BBBB>CCCC
BBBB>DDDD
CCCC>DDDD

I think I need to feed the data from each ID into a variable and then run the following;

data1.corr(data2)

One Answer

If i understand correctly you can do something like this:

In [42]: df
Out[42]:
      ID  Time(secs)        Date
0   AAAA           1  01/01/1990
1   AAAA           6  02/01/1990
2   AAAA           5  03/01/1990
3   AAAA           2  04/01/1990
4   AAAA           4  05/01/1990
5   BBBB           2  01/01/1990
6   BBBB           4  02/01/1990
7   BBBB           6  03/01/1990
8   BBBB           3  04/01/1990
9   CCCC           3  01/01/1990
10  CCCC           4  02/01/1990
11  CCCC           1  03/01/1990
12  CCCC           6  04/01/1990
13  DDDD           7  01/01/1990
14  DDDD           4  02/01/1990
15  DDDD           5  03/01/1990
16  DDDD           3  04/01/1990

In [43]: df.pivot(index='Date', columns='ID', values='Time(secs)').corr()
Out[43]:
ID        AAAA      BBBB      CCCC      DDDD
ID
AAAA  1.000000  0.778924 -0.336336 -0.368964
BBBB  0.778924  1.000000 -0.609449 -0.257143
CCCC -0.336336 -0.609449  1.000000 -0.609449
DDDD -0.368964 -0.257143 -0.609449  1.000000

Virtual helper DF:

In [44]: df.pivot(index='Date', columns='ID', values='Time(secs)')
Out[44]:
ID          AAAA  BBBB  CCCC  DDDD
Date
01/01/1990   1.0   2.0   3.0   7.0
02/01/1990   6.0   4.0   4.0   4.0
03/01/1990   5.0   6.0   1.0   5.0
04/01/1990   2.0   3.0   6.0   3.0
05/01/1990   4.0   NaN   NaN   NaN

Answered by MaxU on August 13, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP