How many single-copy genes are needed to construct a phylogenetic tree?

Bioinformatics, asked on December 18, 2020

I ran OrthoMCL with the genomes of the whole Streptomyces genus (from NCBI) as input, but I only got 22 orthologous groups (Streptomyces genome sizes vary from 2 Mb to 15 Mb, which shows how much the genomes differ). The analyst from the bioinformatics company told me that a phylogenetic tree built from these 22 orthologous groups would not be reasonable, and that I had better use close to 1000 orthologous groups to build the tree. My question is: how do I know whether I have enough orthologous groups to construct a tree?
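A useful first check is how many groups actually pass a single-copy filter, and how that number changes if a few missing genomes per group are tolerated. The sketch below assumes the usual OrthoMCL groups file layout, where each line reads "group_id: taxon|protein_id taxon|protein_id ...". The function names, the min_fraction parameter and the taxon labels in any example are hypothetical. A group is kept only if no taxon contributes more than one protein (no paralogs) and at least min_fraction of all genomes are represented.

```python
from collections import Counter


def parse_groups(lines):
    """Parse OrthoMCL-style group lines ('group_id: taxon|protein ...').

    Returns a dict mapping each group ID to the list of taxon labels of its
    members (one entry per protein, so duplicates indicate paralogs).
    """
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        groups[group_id.strip()] = [m.split("|", 1)[0] for m in members.split()]
    return groups


def single_copy_groups(groups, taxa, min_fraction=1.0):
    """Return IDs of groups that are single-copy in every taxon present and
    cover at least min_fraction of the given taxa (1.0 = all genomes)."""
    selected = []
    for group_id, member_taxa in groups.items():
        counts = Counter(member_taxa)
        if any(n > 1 for n in counts.values()):
            continue  # at least one genome has paralogs: not single-copy
        present = sum(1 for t in taxa if t in counts)
        if present >= min_fraction * len(taxa):
            selected.append(group_id)
    return selected
```

Relaxing min_fraction (for example to 0.9) trades some missing data in the concatenated alignment for many more usable loci, which is often how larger marker sets are assembled across a diverse genus.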

The one thing that strikes me is that OrthoMCL was written under David Roos's supervision, which really means it was written with Toxoplasma gondii in mind, i.e. a eukaryotic parasite with notable levels of gene duplication. They tested it against Plasmodium falciparum, but that isn't surprising because both are members of the Apicomplexa. Personally, I would suggest looking at a tool more oriented towards bacteria, although I'd like to see the OrthoMCL output.

Prokka, which is designed for bacteria, might give better results here. Personally, though, if the primary purpose is to construct a phylogeny, I would extract the orthologues/homologues directly, i.e. I would parse the annotated genes that are not "hypothetical".
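A minimal sketch of that filtering step, assuming each genome's annotation is available as a tab-separated table with a header row containing a "product" column (such as the per-genome .tsv that Prokka writes; check the column names in your version). The helper names are hypothetical. It drops anything labelled hypothetical and keeps product names that occur exactly once in every genome.

```python
import csv
from collections import Counter


def load_products(tsv_path):
    """Read one genome's annotation table and return its product names,
    skipping hypothetical proteins. Assumes a tab-separated file whose
    header includes a 'product' column."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            row["product"]
            for row in reader
            if row.get("product") and "hypothetical" not in row["product"].lower()
        ]


def shared_single_copy(products_by_genome):
    """Return product names present exactly once in every genome."""
    counts = [Counter(products) for products in products_by_genome.values()]
    if not counts:
        return []
    candidates = set(counts[0])
    for c in counts[1:]:
        candidates &= set(c)  # keep only products seen in all genomes
    return sorted(p for p in candidates if all(c[p] == 1 for c in counts))
```

Matching on product strings is crude: paralogs can share a name and true orthologues can be named differently, so the resulting list is best treated as a candidate marker set to be checked by sequence similarity before alignment.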

