Bioinformatics Asked by RMM on September 30, 2021

How would one determine the significance of a variable in a glm model?

If, for example, I have a data frame like the one below, how would I determine whether the origin of the sample has a significant effect on the value? (The value is the number of enzymes capable of degrading the substrate, if that matters.)

```
Substrate  variable  value  origin
cellulose  M09       8      free
mannan     M12       2      free
glycogen   M65       2      free
chitin     M87       4      free
cellulose  M90       2      isolate
mannan     M78       1      isolate
glycogen   M21       4      isolate
chitin     M21       1      isolate
```

So far I have tried:

```
library(MASS)  # glm.nb() is provided by MASS
mcomp = glm.nb(value ~ origin, data = my_data)
summary(mcomp)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9625  -0.9047  -0.9047   0.1212   3.5232

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.01657    0.06571  -0.252  0.80097
originisolate -0.21911    0.08180  -2.679  0.00739 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.3418) family taken to be 1)

    Null deviance: 2053.5 on 2679 degrees of freedom
Residual deviance: 2046.3 on 2678 degrees of freedom
AIC: 6517.5

Number of Fisher Scoring iterations: 1

        Theta: 0.3418
    Std. Err.: 0.0186
2 x log-likelihood: -6511.4590
```

So free becomes the intercept, and isolate is significantly different from that. Does this mean origin has a significant effect on the value?

Would the better approach be to do the following?:

```
mcomp = glm.nb(value ~ origin + Substrate, data = comb_data)
summary(aov(mcomp))

              Df Sum Sq Mean Sq F value Pr(>F)
origin         1     23   22.55   6.612 0.0102 *
Substrate     44   1445   32.84   9.631 <2e-16 ***
Residuals   2634   8981    3.41
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

This shows me that origin and substrate have an effect on value if I understand correctly?

There is no better method; it's a matter of what you want to test, i.e. what your question is.

Using `anova()` or `aov()` tests the terms collectively. For example, in your model with Substrate, the null hypothesis is that the Substrate coefficients are all zero, meaning `cellulose = 0, mannan = 0, ...`
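As a rough sketch of what a collective test can look like for a negative binomial model, one option is a likelihood-ratio comparison of two nested `glm.nb()` fits, one with and one without the Substrate term. The data frame `sim` below is simulated purely for illustration (the counts, group sizes and dispersion are assumptions, not the asker's data), and the model names `m_origin` and `m_full` are hypothetical.

```r
library(MASS)  # provides glm.nb()

# Simulated stand-in for the asker's data; every value here is made up.
set.seed(1)
sim <- expand.grid(
  Substrate = c("cellulose", "mannan", "glycogen", "chitin"),
  origin    = c("free", "isolate"),
  rep       = 1:50
)
mu <- ifelse(sim$origin == "free", 4, 2)
sim$value <- rnbinom(nrow(sim), size = 1.5, mu = mu)

# Nested models: without and with the Substrate term.
m_origin <- glm.nb(value ~ origin, data = sim)
m_full   <- glm.nb(value ~ origin + Substrate, data = sim)

# Likelihood-ratio test of all Substrate coefficients at once.
lrt <- anova(m_origin, m_full)
print(lrt)
```

With four substrates there are three non-reference Substrate coefficients, so the comparison has 3 degrees of freedom.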

If the question is, "do the isolate samples have a higher value than the free samples?", then you can use your first model, where `free` is set as the reference and you test whether the effect of `isolate` is non-zero. Likewise, you can do this for substrate by setting one of them as your reference. You can also do other pairwise comparisons using this model.
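A minimal sketch of the reference-level idea, again on simulated data (the data frame `sim`, the column `origin2` and the chosen means are assumptions for illustration only). The coefficient is on the log scale, so `exp()` converts it to a ratio of expected counts, and `relevel()` changes which group acts as the reference.

```r
library(MASS)  # provides glm.nb()

# Made-up counts for two origins; not the asker's data.
set.seed(2)
sim <- data.frame(
  origin = factor(rep(c("free", "isolate"), each = 200)),
  value  = rnbinom(400, size = 1.5, mu = rep(c(4, 2), each = 200))
)

# "free" is the first factor level, so it is the reference (intercept).
fit <- glm.nb(value ~ origin, data = sim)
coef(summary(fit))["originisolate", ]  # Wald z test: isolate vs. free

# Ratio of expected counts, isolate relative to free.
exp(coef(fit)["originisolate"])

# Making "isolate" the reference flips the sign of the comparison.
sim$origin2 <- relevel(sim$origin, ref = "isolate")
fit2 <- glm.nb(value ~ origin2, data = sim)
coef(fit2)["origin2free"]
```

The same `relevel()` call can be applied to a Substrate factor to choose which substrate the others are compared against.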

If the question is, "does origin have a significant effect on value, after controlling for substrate?", then you can use your second model.
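One way to pose the "after controlling for substrate" question, sketched here under the assumption that a likelihood-ratio test between nested `glm.nb()` fits is acceptable: fit the model with Substrate only, then add origin and compare. The data are simulated and the effect sizes in `base_mu` are invented for the example.

```r
library(MASS)  # provides glm.nb()

set.seed(3)
sim <- expand.grid(
  Substrate = c("cellulose", "mannan", "glycogen", "chitin"),
  origin    = c("free", "isolate"),
  rep       = 1:50
)
# Made-up means: substrate and origin both shift the expected count.
base_mu <- c(cellulose = 6, mannan = 2, glycogen = 3, chitin = 4)
mu <- base_mu[as.character(sim$Substrate)] *
  ifelse(sim$origin == "isolate", 0.6, 1)
sim$value <- rnbinom(nrow(sim), size = 2, mu = mu)

m_sub  <- glm.nb(value ~ Substrate, data = sim)
m_both <- glm.nb(value ~ Substrate + origin, data = sim)

# Does adding origin improve the fit once Substrate is in the model?
adj <- anova(m_sub, m_both)
print(adj)

# The Wald test for the single origin coefficient in the adjusted model.
coef(summary(m_both))["originisolate", ]
```

Because origin has only two levels, the likelihood-ratio test and the Wald z test address the same single coefficient and will usually agree closely.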

Answered by StupidWolf on September 30, 2021

On a second viewing of the question: from what I can see, -0.22 as the coefficient of origin is a strong negative association, so yes, it has a major impact. It's not how I would have done it, but that looks to be the result.

On first viewing:

I'm going to throw my hat in here. We don't know what 'origin' is about; in any case, just put everything, i.e. each substrate and the origin, into the same regression calculation. Check for low residuals and preferably do a Q-Q plot; transform your data if this doesn't look good.
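A small sketch of the residual check mentioned above, assuming a `glm.nb()` fit and deviance residuals (both are choices made for this example; the data are simulated). For count models the residuals are not expected to be exactly normal, so the plot is a rough guide rather than a formal test.

```r
library(MASS)  # provides glm.nb()

# Made-up counts; not the asker's data.
set.seed(4)
sim <- data.frame(
  origin = factor(rep(c("free", "isolate"), each = 200)),
  value  = rnbinom(400, size = 1.5, mu = rep(c(4, 2), each = 200))
)
fit <- glm.nb(value ~ origin, data = sim)

# Deviance residuals against normal quantiles.
res <- residuals(fit, type = "deviance")
qqnorm(res)
qqline(res)
```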

The key thing you are missing is your regression weights; without those I can't say very much. If the regression weight for 'origin' is near zero, then it has essentially no impact. If the regression weight of 'origin' is positive and greater than everything else, I assume there are skewed distributions of 'substrates' between the 'origins'. If the regression weight of 'origin' is negative but still greater in magnitude than all other regression weights, then it is adversely affecting the 'value' you are seeking.

I don't know the experiment, the biological system or really the 'substrate' assays, so I can't comment any further.

The two issues I have are:

- Doing an ANOVA on the output of a regression analysis doesn't make much sense to me. It is not something I would do, nor something in ML or GLM I've encountered.
- Are you doing pairwise substrate/origin calculations? I presume not, but just in case, this is not how GLM works.

Answered by M__ on September 30, 2021
