TransWikia.com

Question on ANOVA and Correlation/Association

Data Science Asked on July 16, 2021

I’ve been working on examining statistical relationships between variables:

  1. Pearson’s, Spearman’s for continuous variables
  2. Kendall’s Tau, Cramer’s V for ordinal/nominal variables.

I know there are many more ways. Recently I read about ANOVA and hypothesis testing. It seems similar to measuring correlation and association. In fact, I can’t tell whether it is just another way of doing the same thing or something entirely different. Most explanations of ANOVA seem a bit more complicated than most explanations of correlation or association.

For example, I know that Pearson’s r is the covariance of two variables scaled by the product of their standard deviations. And ANOVA stands for Analysis Of Variance. So it appears to me that it’s the same sort of thing, but I can’t tell for sure.

Will someone please shed some light on this technique, what it is used for, and how it contrasts with measuring correlation?

One Answer

  • About what ANOVA is used for: it answers whether the differences between the mean values of several data samples (groups) are statistically significant or could plausibly be due to randomness. It is therefore a significance test: it tells you whether the group means differ by more than chance alone would explain. A drawback is that it does not tell you which data sample/s differ from the rest, or by how much (useful source). You can think of the process as follows (as described in Practical Statistics for Data Scientists):
  1. Combine all the data together in a single box
  2. Shuffle and draw out n resamples of m values each (where n is the number of data samples and m the number of data points in each sample)
  3. Record the mean of each of the n groups
  4. Record the variance among the n group means
  5. Repeat steps 2–4 many times (say 1,000)
  What proportion of the time did the resampled variance exceed the observed variance? This is the p-value.
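The resampling procedure above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name `permutation_anova`, the seed, and the three small groups are made up for the example, and the test statistic is the sample variance among the group means, as in the steps listed.

```python
import random
from statistics import mean, variance


def permutation_anova(groups, n_resamples=1000, seed=0):
    """Permutation test on the variance among group means (hypothetical helper)."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]

    # Observed statistic: variance among the means of the original groups.
    observed = variance([mean(g) for g in groups])

    # Step 1: combine all the data together in a single box.
    pooled = [value for g in groups for value in g]

    exceed = 0
    for _ in range(n_resamples):
        # Step 2: shuffle and draw out groups of the original sizes.
        rng.shuffle(pooled)
        resampled_means = []
        start = 0
        for size in sizes:
            # Step 3: record the mean of each resampled group.
            resampled_means.append(mean(pooled[start:start + size]))
            start += size
        # Step 4: record the variance among the resampled group means.
        if variance(resampled_means) > observed:
            exceed += 1

    # Step 5: proportion of resamples whose variance exceeded the observed one.
    return observed, exceed / n_resamples


# Made-up data: groups a and b look alike, group c has a clearly higher mean.
group_a = [10.1, 9.8, 10.3, 10.0, 9.9]
group_b = [10.2, 10.0, 9.7, 10.1, 10.0]
group_c = [12.0, 11.8, 12.3, 12.1, 11.9]

observed, p_value = permutation_anova([group_a, group_b, group_c])
print(f"variance among group means: {observed:.3f}, p-value: {p_value:.3f}")
```

A small p-value here only says that at least one group mean differs from the others; it does not say which group or by how much, which is the drawback mentioned above.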
  • On the other hand, a direct measure of correlation gives you a number (between -1 and 1 for Pearson’s r) that tells you how strongly, and in which direction, two variables vary linearly along with each other.
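For contrast, here is a minimal sketch of Pearson’s r computed as the covariance scaled by the two standard deviations, as described in the question. The helper name `pearson_r` and the data are invented for illustration.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's r: sample covariance divided by the product of the sample standard deviations."""
    mean_x, mean_y = mean(x), mean(y)
    # Sample covariance (n - 1 in the denominator, matching stdev).
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    return cov_xy / (stdev(x) * stdev(y))


# Made-up paired measurements with a roughly linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)                          # close to +1: strong positive linear relationship
r_flipped = pearson_r(x, [-v for v in y])    # same strength, opposite sign
print(f"r = {r:.4f}, r with y negated = {r_flipped:.4f}")
```

Note the difference in inputs: correlation works on paired values of two variables and returns a strength and direction, whereas the ANOVA-style test compares the means of one numeric variable across several groups and returns a p-value.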

Correct answer by German C M on July 16, 2021
