
Are GSEA and other gene set enrichment analyses supposed to yield extremely different results from one another?

Bioinformatics Asked by Darío Rocha on November 5, 2020

I recently ran four gene set enrichment analyses in R on the same dataset (TCGA breast cancer), comparing two intrinsic subtypes. The methods I used were:

  • MIGSA, which imports the mGSZ package and combines it with an SEA algorithm, using RNA-seq TMM-normalized counts as input, since the software supports them. All parameters at their defaults.
  • mGSZ, using RNA-seq TMM-normalized log2 CPM. All parameters at their defaults.
  • fgsea in preranked mode, using as ranking metric the log fold change from a limma differential expression analysis on RNA-seq TMM-normalized counts. All parameters at their defaults (see the sketch after this list).
  • GSVA + limma: I extracted per-sample enrichment scores from RNA-seq TMM-normalized log2 CPM, then ran a differential expression analysis treating the gene set enrichment scores as if they were ordinary gene expression values. I know this is not a formal approach, but I wanted the comparison. All parameters at their defaults.

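For concreteness, here is a minimal sketch of what I mean by the fgsea preranked step; the voom call and the object names (counts, group, genesets) are illustrative placeholders rather than a verbatim copy of my script:

    library(edgeR)
    library(limma)
    library(fgsea)

    # counts: gene x sample matrix of raw RNA-seq counts
    # group:  factor with the two intrinsic subtypes
    dge <- DGEList(counts = counts, group = group)
    dge <- calcNormFactors(dge, method = "TMM")    # TMM normalization

    design <- model.matrix(~ group)
    v <- voom(dge, design)                         # log2 CPM with precision weights
    fit <- eBayes(lmFit(v, design))
    tt <- topTable(fit, coef = 2, number = Inf)

    # rank genes by the limma log fold change
    ranks <- sort(setNames(tt$logFC, rownames(tt)), decreasing = TRUE)

    # genesets: named list of gene ID vectors (KEGG + GO:BP/MF/CC)
    res_fgsea <- fgsea(pathways = genesets, stats = ranks, maxSize = 500)
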
The gene sets tested are the full combined list of KEGG, GO:BP, GO:MF and GO:CC. In all cases I trimmed the collection to gene sets of fewer than 500 genes, which leaves a total of 21723 gene sets tested.
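The trimming itself is nothing more elaborate than this (kegg_sets, gobp_sets, gomf_sets and gocc_sets stand for the named lists of gene IDs):

    # combine the collections into one named list of gene ID vectors
    genesets <- c(kegg_sets, gobp_sets, gomf_sets, gocc_sets)

    # keep only gene sets with fewer than 500 genes
    genesets <- Filter(function(gs) length(gs) < 500, genesets)
    length(genesets)    # 21723 in my case
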
Sadly, the results are confusing. For starters, mGSZ is for some reason cropping the output and returning results for only around 9700 of those gene sets; I have already emailed the package maintainer about that. But that is not the biggest problem: the most shocking thing is that none of the analyses correlates well with any of the others. Here I plotted the -log10(adjusted p-value + 0.0001) of each analysis against every other analysis, with lines indicating -log10(0.05)=1.3 and -log10(0.01)=2.
[Figure: pairwise scatter plots of -log10(adjusted p-value + 0.0001) for the four methods, with reference lines at 1.3 and 2]
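The plotting code behind that figure is essentially the following (pvals stands for a data frame with one column of adjusted p-values per method, rows matched by gene set ID):

    # pvals: data.frame, one adjusted-p-value column per method, rows aligned by gene set
    logp <- -log10(pvals + 0.0001)

    pairs_idx <- combn(ncol(logp), 2)                 # the 6 method pairs
    op <- par(mfrow = c(2, 3), mar = c(4, 4, 1, 1))
    for (k in seq_len(ncol(pairs_idx))) {
      i <- pairs_idx[1, k]
      j <- pairs_idx[2, k]
      plot(logp[[i]], logp[[j]], pch = 16, cex = 0.3,
           xlab = colnames(logp)[i], ylab = colnames(logp)[j])
      abline(h = c(1.3, 2), v = c(1.3, 2), lty = 2)   # -log10(0.05) and -log10(0.01)
    }
    par(op)
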
I was expecting the results to be somewhat different, but here I cannot see anything even remotely close to a correlation between them. It is also evident that MIGSA and GSVA+limma report far more significantly enriched gene sets than the other two methods.
I have gone through the code time and again, and I cannot find any errors. I have also searched for studies doing this kind of direct comparison between methods, without luck so far. Has anybody experienced this kind of inconsistency between gene set enrichment results from different tools? What could I possibly be doing wrong?

One Answer

Each of these methods does something different, so it is reasonable to expect different results. The bottom line is that there isn't a single question being asked when you run an "enrichment test". For example, you can test whether a set of genes is more highly expressed than some other list of genes, or whether it is expressed above the mean level of all other genes.
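As a toy illustration (this is none of the methods above, just base R), two different questions about the same gene set can give very different p-values on the same data:

    set.seed(1)
    # toy per-gene statistic (e.g. a moderated t statistic) for 1000 genes
    gene_stat <- rnorm(1000)
    in_set <- c(rep(TRUE, 50), rep(FALSE, 950))    # a 50-gene set
    gene_stat[in_set] <- gene_stat[in_set] + 0.3   # mild shift for the set

    # Question 1: is the set shifted relative to the other genes? (competitive flavour)
    wilcox.test(gene_stat[in_set], gene_stat[!in_set])$p.value

    # Question 2: are the genes in the set shifted away from zero at all? (self-contained flavour)
    t.test(gene_stat[in_set], mu = 0)$p.value
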

What's more, you use the Gene Ontology, which has an underlying structure (a directed acyclic graph) that none of the methods you used takes into account, but which could change the results if it were. KEGG pathways, on the other hand, do not have this structure, so when you apply a method to a mixture of both sources you have to decide whether to use that structure or not.
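For instance, assuming the Bioconductor GO.db annotation package (the GO ID below is just an arbitrary example), every term carries a set of ancestor terms, so a gene annotated to a specific term implicitly belongs to all of its ancestors too, something a flat list of gene sets ignores:

    library(GO.db)         # Bioconductor annotation package exposing the GO graph
    library(AnnotationDbi)

    go_id <- "GO:0006915"                        # "apoptotic process", just an example
    Term(GOTERM[[go_id]])                        # human-readable term name
    ancestors <- as.list(GOBPANCESTOR[go_id])[[1]]
    head(ancestors)                              # BP terms this one rolls up into
    length(ancestors)
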

This is also complicated by the fact that, for most of these methods, all you can really compare is the p-value, which depends on the underlying assumptions of each method (a uniform, normal or binomial distribution?).

The latest comprehensive study comparing enrichment methods that I know of, covering some of the methods you used and more, is this one: Toward a gold standard for benchmarking gene set enrichment analysis.

Last, what would a correlation tell you here? It is known that there is little agreement between methods. You couldn't say that one method is equivalent to another unless you tested several datasets and always found a correspondence between the p-values.
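If you want to quantify agreement rather than eyeball it, something along these lines is a start (res_a and res_b stand for data frames of per-gene-set adjusted p-values from two of your methods):

    # res_a, res_b: data frames with columns geneset and padj from two methods
    common <- intersect(res_a$geneset, res_b$geneset)
    pa <- res_a$padj[match(common, res_a$geneset)]
    pb <- res_b$padj[match(common, res_b$geneset)]

    # rank-based correlation of the full p-value profiles
    cor(pa, pb, method = "spearman", use = "complete.obs")

    # Jaccard overlap of the gene sets each method calls significant
    sig_a <- common[pa < 0.05]
    sig_b <- common[pb < 0.05]
    length(intersect(sig_a, sig_b)) / length(union(sig_a, sig_b))
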

Correct answer by llrs on November 5, 2020
