In R, why do the p-values from anova() change when you add more predictors?

Asked by M. Smith on Cross Validated, August 4, 2020

When conducting linear regression in R, I am trying to understand how certain p-values are calculated and what they represent. So far this is my understanding:

The p-values from summary() correspond to t-tests of the marginal impact of each variable, given that all the other variables are already in the model. These tests correspond to Type III sums of squares.
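
For instance, in a minimal sketch like the following (df is a hypothetical name for a data frame holding these variables), each row of the coefficient table is a t-test of that term given all of the others:

fit <- lm(soma ~ ht2 + wt2 + ht9, data = df)
summary(fit)  # each t-test is conditional on every other term in the model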

The anova() function instead uses F-tests, which are sequential tests based on Type I (sequential) sums of squares. For example, suppose we have the following output:

Analysis of Variance Table

Response: soma
          Df Sum Sq Mean Sq F value  Pr(>F)   
ht2        1  0.071  0.0710  0.1289 0.72073   
wt2        1  4.635  4.6349  8.4196 0.00504 **
ht9        1  3.779  3.7792  6.8651 0.01090 * 
Residuals 66 36.333  0.5505                   
---

The p-values test the significance of ht2 in the presence of the intercept only; of wt2 in the presence of only the intercept and ht2; and of ht9 in the presence of the intercept, ht2, and wt2.
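
For concreteness, a table like the one above is the kind of output produced by a call such as this (again with df as a placeholder name for the data):

fit3 <- lm(soma ~ ht2 + wt2 + ht9, data = df)
anova(fit3)  # sequential (Type I) F-tests, in the order the terms appear in the formula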

Is this understanding correct? And if it is, then why do the p-values change when we add additional variables? For example:

Analysis of Variance Table

Response: soma
          Df  Sum Sq Mean Sq F value    Pr(>F)    
ht2        1  0.0710  0.0710  0.2072 0.6504835    
wt2        1  4.6349  4.6349 13.5353 0.0004772 ***
ht9        1  3.7792  3.7792 11.0363 0.0014695 ** 
wt9        1 14.0746 14.0746 41.1018 1.878e-08 ***
Residuals 65 22.2581  0.3424                      
---

Adding the wt9 variable decreased the p-value for ht2 (from 0.72073 to 0.6504835). But if this row is just testing the significance of ht2 in the presence of nothing but the intercept, shouldn't the p-value be identical in both tables?
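
One thing I did check: the Sum Sq for ht2 is identical (0.0710) in both tables, so the change must come from elsewhere in the F computation. Dividing each term's Mean Sq by the residual Mean Sq of its own fit reproduces the printed F values up to rounding (numbers copied from the tables above):

# three-predictor fit: residual Mean Sq = 0.5505
0.0710 / 0.5505  # ~ 0.129, the F value for ht2 in the first table
# four-predictor fit: residual Mean Sq = 0.3424
0.0710 / 0.3424  # ~ 0.207, the F value for ht2 in the second table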

Thanks in advance for any clarifications!
