TransWikia.com

P-value correction when evaluating correlation between gene and miRNA expression

Bioinformatics Asked on August 19, 2021

First of all, I apologize if the question is very basic; I am taking my first steps in bioinformatics.

Data information

We are evaluating the correlation (using the Pearson, Kendall or Spearman method) between gene expression and miRNA expression using the corAndPvalue function of the WGCNA package.

The result is a DataFrame with one row for every gene x miRNA combination and the following columns:

Gene     miRNA      Correlation  P-value
Gen_1    miRNA_1     0.959       0.00311
Gen_1    miRNA_2    -0.039       0.1041
Gen_1    miRNA_3    -0.344       0.0021
Gen_2    miRNA_1     0.1333      0.00451
Gen_2    miRNA_2     0.877       0.07311
...

Question

Considering the huge number of correlation tests we are going to run, we need to adjust the p-values so that we do not report correlations that arise by chance. Bonferroni does not seem to be the best solution, so we would use the Benjamini-Hochberg (BH) method. The question is:

Should the BH correction for the Gen_1 x miRNA_1 combination consider only the p-values of the combinations that include Gen_1 (Option 1), or the p-values of all gene x miRNA combinations (Option 2)?

For example, let's assume an expression dataset of 20,000 genes and another of 15,000 miRNAs.

Option 1:

To adjust Gen_1 x miRNA_1 we would use 15,000 p-values (Gen_1 x miRNA_1, Gen_1 x miRNA_2, …, Gen_1 x miRNA_15000).

Option 2:

To adjust Gen_1 x miRNA_1 we would use 300,000,000 p-values (Gen_1 x miRNA_1, Gen_1 x miRNA_2, …, Gen_1 x miRNA_15000, Gen_2 x miRNA_1, Gen_2 x miRNA_2, …, Gen_2 x miRNA_15000 and so on).
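To make the difference between the two options concrete, here is a minimal sketch in plain Python that applies BH to the five example rows from the table above, once per gene (Option 1) and once over all rows (Option 2). The helper name `bh_adjust` is hypothetical and written only for this illustration; in practice a library routine such as R's `p.adjust` or Statsmodels' `fdrcorrection` would normally be used instead.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the running minimum keeps
    # the adjusted values monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# The five example rows from the table (gene, miRNA, raw p-value).
rows = [
    ("Gen_1", "miRNA_1", 0.00311),
    ("Gen_1", "miRNA_2", 0.1041),
    ("Gen_1", "miRNA_3", 0.0021),
    ("Gen_2", "miRNA_1", 0.00451),
    ("Gen_2", "miRNA_2", 0.07311),
]

# Option 2: one BH pass over every gene x miRNA p-value.
global_adj = bh_adjust([p for _, _, p in rows])

# Option 1: a separate BH pass for each gene.
per_gene_adj = [None] * len(rows)
for gene in sorted({g for g, _, _ in rows}):
    idx = [i for i, (g, _, _) in enumerate(rows) if g == gene]
    gene_adj = bh_adjust([rows[i][2] for i in idx])
    for i, adj in zip(idx, gene_adj):
        per_gene_adj[i] = adj

for (gene, mirna, p), a1, a2 in zip(rows, per_gene_adj, global_adj):
    print(f"{gene} x {mirna}: raw={p:.5f} per-gene={a1:.5f} global={a2:.5f}")
```

The two options give different adjusted values for the same row because both the number of tests and the rank of each p-value change with the family being corrected. Option 1 only controls the false discovery rate within each gene's own set of tests, not across the full list of gene x miRNA pairs that would be reported.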

Supplementary question

The documentation of the fdrcorrection function in the Python Statsmodels library suggests that, for negatively correlated tests (which could be frequent in an mRNA x miRNA correlation analysis), Benjamini-Yekutieli would work better. Is that right? Or would the Benjamini-Hochberg method be appropriate in this case?
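For context, the two procedures differ only by a constant factor: Benjamini-Yekutieli (BY) multiplies each BH adjusted p-value by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m, which makes it valid under arbitrary dependence at the cost of power. In Statsmodels, `fdrcorrection` selects between them with `method="indep"` (BH) and `method="negcorr"` (BY). The sketch below reimplements both in plain Python purely for illustration; `fdr_adjust` is a hypothetical helper, not the Statsmodels code.

```python
import math


def fdr_adjust(pvals, method="bh"):
    """Adjust p-values with Benjamini-Hochberg ("bh") or Benjamini-Yekutieli ("by")."""
    if method not in ("bh", "by"):
        raise ValueError("method must be 'bh' or 'by'")
    m = len(pvals)
    # BY scales every BH value by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m.
    c_m = sum(1.0 / k for k in range(1, m + 1)) if method == "by" else 1.0
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


# Raw p-values from the example table.
pvals = [0.00311, 0.1041, 0.0021, 0.00451, 0.07311]
bh = fdr_adjust(pvals, "bh")
by = fdr_adjust(pvals, "by")

# Approximate c(m) for m = 300,000,000 tests: ln(m) + Euler-Mascheroni constant.
c_large = math.log(300_000_000) + 0.5772156649

for p, a, b in zip(pvals, bh, by):
    print(f"raw={p:.5f} BH={a:.5f} BY={b:.5f}")
print(f"approximate BY factor for 300,000,000 tests: {c_large:.2f}")
```

With 300,000,000 tests the factor c(m) is about 20, so BY adjusted p-values would be roughly 20 times the BH ones (before capping at 1). That is the price of the weaker dependence assumption.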

Any kind of help would be much appreciated, thanks in advance!

One Answer

I asked the same question on the Cross Validated forum and got an excellent answer!

The important part:

You need to correct for all of the comparisons you are doing. So if that's 300,000,000 comparisons you need to correct for that many multiple comparisons.

For more information, check the answer in the link above.
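To give a sense of scale, the sketch below works out the number of tests from the example in the question and the per-test cutoffs that follow from it. The level `alpha = 0.05` is an assumption chosen for illustration, and `bh_critical_value` is a hypothetical helper name.

```python
n_genes = 20_000
n_mirnas = 15_000
m = n_genes * n_mirnas  # total number of gene x miRNA tests

alpha = 0.05  # assumed target level, not taken from the question

# Bonferroni compares every raw p-value with the same cutoff.
bonferroni_cutoff = alpha / m


def bh_critical_value(rank, m, alpha):
    """BH step-up cutoff for the p-value with the given rank (1 = smallest)."""
    return rank / m * alpha


print(f"number of tests: {m:,}")
print(f"Bonferroni cutoff: {bonferroni_cutoff:.3e}")
for rank in (1, 1_000, 1_000_000):
    print(f"BH cutoff at rank {rank:,}: {bh_critical_value(rank, m, alpha):.3e}")
```

BH rejects every hypothesis up to the largest rank whose sorted p-value falls at or below its cutoff, so the cutoff for the smallest p-value matches Bonferroni and then relaxes as the rank grows.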

Correct answer by Genarito on August 19, 2021
