TransWikia.com

How to determine relationship categorical and numerical data

Cross Validated Asked by onhalu on January 9, 2021

I have two data frames and I would like to determine correlation or another type of relationship between them.

The first data frame is consumption of soft drinks weekly (in liters).

Soft drinks consumption

ID  WEEK1   WEEK2   WEEK3 ......  WEEK 28
1   1.8     2       2.5   ......  5
2   0.7     0.3     0.5   ......  1
.
.
.
1108 1      2       3    ......   1

So I have liters of soft drinks consumed on sample of 1108 people.

Than I have data frame that shows that a same person had an infection in actual week.(1 is infection, 0 no infection)

Infection
ID  WEEK1   WEEK2   WEEK3 ......  WEEK 28
1   1       0       0     ......  1
2   0       0       0     ......  0
.
.
.
1108 1      0       0    ......   0

Could you suggest me some method, test or type of linear regression, which I could use to determine relationship between consumption of soft drinks and infections?

I would like to use R software, so if you could suggest and methodology I would very appreciate that.

One Answer

You could build what is known as a classifier. Predict 'Infection' from 'Consumption'.

In statistics, we use a capital letter $P$ for probability, as your prior. For probability densities a small letter $p$ is used. Bayes rule is what you should basically compute:

$ begin{split} P(Infec mid Cons) =& &frac{p(Cons mid Infec) P(Infec)}{p(Cons mid Infec) P(Infec) + p(Cons mid neg Infec) P(neg Infec)} end{split} $

Here, the density $p(Cons mid Infec)$ can be the normal distribution of Comsumption for the infected cases, the density $p(Cons mid neg Infec)$ can be the normal distribution of Comsumption for the cases without infection. The terms $P(Infec)$ and $P(neg Infec)$ are the prior probabilities of being infected, or not.

The densities may be normally distributed, or follow a different continuous distribution. You can perform a discriminant analysis in the software R, in order to make these calculations. The better the predictions (the accuracy) the stronger the relationship between your two variables.

Answered by Match Maker EE on January 9, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP