How to define an outgroup to build a robust amino acid tree

I am building a robust amino acid phylogeny with adequate support values (see previous post here).

This is a brief summary of what I have done:

  1. I performed a BLASTp search against the full NCBI protein database, using a protein sequence from species A as the query.
  2. I downloaded all the hits in FASTA format and added the sequences I identified with the same query in my target genome, which is closely related to species A.
  3. I then aligned the sequences using the Compute – Alignment – MUSCLE option in Seqotron and saved the alignment in PHYLIP format.
  4. I tried to run this command line in RAxML to build a robust maximum likelihood tree:

raxmlHPC -m PROTGAMMAILG -n output.tre -o outgroup-p 10000 -s ~/Desktop/alignment_file.phy

However, the program showed the following error:

Error, the outgroup name "outgroup-p" you specified can not be found in the alignment, exiting ….

Question: how can I define an outgroup for my phylogenetic tree, and how do I include it in my alignment?

I appreciate your help and suggestions!

2 Answers

I think you have a typo in your command:

raxmlHPC -m PROTGAMMAILG -n output.tre -o outgroup -p 10000 -s /Users/username/Desktop/alignment_file.phy

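Try the above, where username is your username. The most important bit is the space between the word outgroup and -p: without it, RAxML reads "outgroup-p" as the outgroup name, which is exactly what the error message reports. Note that outgroup itself has to be replaced with the exact name of a sequence in your alignment. The -p 10000 option only sets the random number seed, so it has no effect on the outgroup; as far as I know, some RAxML versions will not run without a seed, so it is safest to keep it. I'll check my code later.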
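Since RAxML looks up the name given to -o among the taxon names in the alignment, it can help to list those names before running the analysis. The following is only a minimal sketch, assuming a PHYLIP file whose first line is the header (number of sequences, then alignment length) and whose next lines each start with a taxon name followed by whitespace. The function name list_taxa is a hypothetical helper, not part of RAxML.

```shell
# Hypothetical helper: print the taxon names found in a PHYLIP alignment.
# Assumes line 1 is the header "<number of sequences> <alignment length>"
# and that the next <number of sequences> lines each begin with a name.
list_taxa() {
    awk 'NR == 1 { n = $1; next }   # read the sequence count from the header
         NR <= n + 1 { print $1 }   # print the first field of each name line
        ' "$1"
}

# Usage (path taken from the question):
#   list_taxa ~/Desktop/alignment_file.phy
```

Whichever name is passed to -o should match one of the printed names character for character. Names that contain spaces would be cut off by this sketch, so treat its output as a quick check rather than a full PHYLIP parser.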

Correct answer by M__ on September 25, 2021

To clear up a possible misunderstanding about RAxML: RAxML (like most, if not all, other maximum likelihood phylogenetic inference programs that use reversible models of sequence evolution) infers unrooted trees. This is because, for a reversible model of sequence evolution, changing the position of the root does not affect the likelihood of the tree. Therefore, as you will see in the RAxML manual, specifying an outgroup is explicitly a display option: it doesn't affect the inference in any way. This means that it's usually easier to infer the tree in RAxML without specifying any outgroup, and then use any tree viewing software (e.g. FigTree, phylotree.js, or equivalent) to reroot the tree as you need to.

Answered by NatWH on September 25, 2021

