TransWikia.com

Plotting distance tree from blastn output

Bioinformatics Asked on October 11, 2020

I’m trying to plot a simple distance tree from my blastn output with nj (neighbor joining), like the tree view on NCBI. As far as I understand, I should:

  1. extract all the HSPs from each alignment
  2. realign them using MUSCLE
  3. generate a distance matrix from the MSA file
  4. plot the tree using nj
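For readers unsure what the `nj` step in this plan actually computes, here is a minimal, stdlib-only Python sketch of the neighbor-joining algorithm (Saitou and Nei's method) applied to a distance matrix, i.e. steps 3 and 4. It is an illustration, not a replacement for an established implementation such as `nj()` in R's ape package or MEGA; the function name, the input layout (a label list plus a square matrix) and the internal `_u` node names are my own choices.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names:  list of unique taxon labels (must not start with "_u", used internally).
    matrix: square, symmetric list of lists of pairwise distances in the same order.
    """
    # Store distances by unordered pair so d[frozenset((x, y))] works either way round.
    d = {
        frozenset((a, b)): matrix[i][j]
        for i, a in enumerate(names)
        for j, b in enumerate(names)
        if i < j
    }
    nodes = list(names)
    newick = {n: n for n in nodes}  # cluster label -> Newick text for that cluster
    counter = 0

    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r(x): total distance from x to every other active node.
        r = {x: sum(d[frozenset((x, k))] for k in nodes if k != x) for x in nodes}
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        x, y = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dxy = d[frozenset((x, y))]
        # Branch lengths from x and y to their new parent node u.
        lx = dxy / 2 + (r[x] - r[y]) / (2 * (n - 2))
        ly = dxy - lx
        counter += 1
        u = f"_u{counter}"
        newick[u] = f"({newick[x]}:{lx:g},{newick[y]}:{ly:g})"
        # Distance from the new node u to each remaining node k.
        for k in nodes:
            if k not in (x, y):
                d[frozenset((u, k))] = (d[frozenset((x, k))] + d[frozenset((y, k))] - dxy) / 2
        nodes = [k for k in nodes if k not in (x, y)] + [u]

    # Two clusters left: connect them with the remaining distance.
    a, b = nodes
    return f"({newick[a]},{newick[b]}:{d[frozenset((a, b))]:g});"


if __name__ == "__main__":
    labels = ["a", "b", "c", "d", "e"]
    dist = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(labels, dist))
```

Run as a script, it prints `((c:4,(a:2,b:3):3),(d:2,e:1):2);`. This toy matrix is additive, so the branch lengths add back up to the input distances exactly. Ties in the Q-criterion are broken by list order here, so other tools may print an equivalent tree in a different arrangement.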

However, there are a couple of problems I am not sure about:

1) When there are multiple HSPs in one alignment, which one do I include in the tree?

2) If two sequences map to two different parts of the query sequence and do not overlap, then what should I do?

One Answer

Specific questions

  1. When there are multiple HSPs in one alignment, which one do I include in the tree?

You can include them all: NJ is not computationally intensive, so running a lot of sequences is fine. If you want, you can remove identical sequences, either programmatically (these days I use pandas in Python) or with something like RAxML. It's honestly not critical for an NJ tree (in fact, the more the better), provided you can still read the labels. A tree-viewing program such as FigTree lets you collapse clades to make the tree more readable.
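If you do keep every HSP, you need a way to pull them out of the BLAST output and, optionally, drop duplicates before realigning. The sketch below does this in plain Python rather than pandas, and assumes blastn was run with `-outfmt "6 sseqid sstart send sseq"` (BLAST+ tabular output with a custom column list). The `subject_start-end` labelling scheme and the function names are hypothetical choices; with pandas, `DataFrame.drop_duplicates(subset=...)` on a sequence column would do the same deduplication.

```python
def hsps_from_tabular(lines):
    """Yield (label, ungapped subject sequence) for each HSP in BLAST+ tabular output.

    Assumes blastn was run with -outfmt "6 sseqid sstart send sseq", so every
    non-comment line is one HSP with exactly those four tab-separated columns.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        sseqid, sstart, send, sseq = line.rstrip("\n").split("\t")
        # Label by subject and coordinates so several HSPs from one hit stay distinct;
        # strip pairwise-alignment gaps because the HSPs will be realigned anyway.
        yield f"{sseqid}_{sstart}-{send}", sseq.replace("-", "")


def collapse_identical(records):
    """Keep the first label for each distinct sequence (case-insensitive).

    Returns (unique, merged): unique is a list of (label, sequence) pairs and
    merged maps each kept label to the labels whose sequences were identical to it.
    """
    first_label = {}
    unique, merged = [], {}
    for label, seq in records:
        key = seq.upper()
        if key in first_label:
            merged[first_label[key]].append(label)
        else:
            first_label[key] = label
            merged[label] = []
            unique.append((label, seq))
    return unique, merged


def to_fasta(records):
    """Format (label, sequence) pairs as FASTA text, e.g. as input for MUSCLE."""
    return "".join(f">{label}\n{seq}\n" for label, seq in records)


if __name__ == "__main__":
    # Made-up data: two HSPs from one hit plus an identical HSP from a second hit.
    example = [
        "subjA\t1\t7\tACGT-ACG\n",
        "subjA\t40\t47\tTTGACCAA\n",
        "subjB\t5\t11\tACGTACG\n",
    ]
    unique, merged = collapse_identical(hsps_from_tabular(example))
    print(to_fasta(unique), end="")
    print(merged)
```

The `merged` mapping records which labels were folded together, so you can still annotate the collapsed tips later (for example in FigTree). The FASTA text can be saved to a file and passed to MUSCLE.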

  2. If two sequences map to two different parts of the query sequence and do not overlap, what should I do?

Hmm, good question. Do three alignments: one with, e.g., the 5' fragment, one with the 3' fragment, and a joint alignment of both. Under other, non-pairwise tree-building algorithms this would be disastrous; NJ gets around the problem because it works only from pairwise distances. HOWEVER, ask yourself whether you feel the resulting tree is meaningful. I would argue it is not, because there are no homologous sites between the 5' and 3' partial sequences, and phylogenetic models describe point mutations at homologous sites. NJ will give you a tree nonetheless, and will even bootstrap it.
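To make the "no homologous sites" point concrete: NJ needs a distance for every pair of sequences, and a distance can only be estimated from alignment columns where both sequences have a residue. The minimal sketch below computes an uncorrected p-distance using pairwise deletion of gaps (one common gap treatment, assumed here). For a 5' and a 3' fragment that share no aligned columns it returns `None`, which is exactly the matrix cell a tree program would have to guess or reject. The function name and the toy sequences are illustrative.

```python
def p_distance(a, b, gap="-"):
    """Uncorrected p-distance between two aligned sequences, with pairwise gap deletion.

    Only columns where neither sequence has a gap are compared. Returns None when
    no such column exists, i.e. when the distance is undefined.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # pairwise deletion: skip any column with a gap in either sequence
        compared += 1
        differences += x != y
    if compared == 0:
        return None  # no shared (homologous) columns, so nothing to estimate from
    return differences / compared


if __name__ == "__main__":
    # Toy alignment: a query plus a 5' fragment and a 3' fragment that never overlap.
    query = "ACGTACGTAC"
    frag5 = "ACGTA-----"
    frag3 = "-----CGTTC"
    print(p_distance(query, frag5))  # fragment vs query
    print(p_distance(query, frag3))
    print(p_distance(frag5, frag3))  # fragment vs fragment
```

Each fragment gets a distance to the query (0.0 and 0.2 here), but the fragment-to-fragment distance is `None`: any tree that places them relative to each other is relying on something other than shared sites.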

Generic question: you can actually do this with one of the newer features of NCBI's BLAST:

  • Go to BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi
  • Enter your sequence into the box (it doesn't accept PDB codes alone)
  • Choose the protein database. When I first did this calculation I used SwissProt, thinking there would be a lot of sequences; I then used "nr".
  • Under the algorithm parameters, set "Max target sequences" to 50 (the default is too many).
  • Click "BLAST".
  • Once the search is complete, the top of the results page shows the links "Other reports: Search Summary [Taxonomy reports] [Distance tree of results]".
  • Click on "Distance tree of results"
  • The tree page loads automatically. It aligns your sequences and builds a tree, by default with Fast Minimum Evolution, but there is also the option of a Neighbor Joining tree (recommended) ...
  • Click "Tool", "Download", "PDF" ...

You can generate a distance matrix and/or build a neighbor-joining phylogeny in MEGA10 (MEGA X); the Tamura 3-parameter model is a good choice. The bootstrap option is important. It's very easy to use, although the input format is a bit weird:
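MEGA computes this distance for you, but if it helps to see what the Tamura 3-parameter model actually does, here is a minimal Python sketch of Tamura's (1992) formula for one pair of aligned DNA sequences. It corrects separately for transitions (P) and transversions (Q) and accounts for unequal GC content through C. Assumptions: gaps and ambiguity codes are simply skipped pairwise, and the function name `tamura_3p` is my own; MEGA's handling of gaps and missing data is configurable and may differ.

```python
import math


def tamura_3p(a, b):
    """Tamura (1992) 3-parameter distance between two aligned DNA sequences.

    d = -C ln(1 - P/C - Q) - 0.5 (1 - C) ln(1 - 2Q), where
      P = proportion of transitions (A<->G, C<->T),
      Q = proportion of transversions,
      C = g1 + g2 - 2 g1 g2, with g1, g2 the GC content of each sequence.
    Only columns where both sequences have A, C, G or T are compared.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    purines = {"A", "G"}
    # A transition changes purine->purine or pyrimidine->pyrimidine.
    transitions = sum(1 for x, y in pairs if x != y and (x in purines) == (y in purines))
    differences = sum(1 for x, y in pairs if x != y)
    p = transitions / n
    q = (differences - transitions) / n
    g1 = sum(x in "GC" for x, _ in pairs) / n
    g2 = sum(y in "GC" for _, y in pairs) / n
    c = g1 + g2 - 2 * g1 * g2
    if c == 0:
        raise ValueError("GC content makes the distance undefined")
    arg1 = 1 - p / c - q
    arg2 = 1 - 2 * q
    if arg1 <= 0 or arg2 <= 0:
        raise ValueError("sequences too divergent: distance undefined (saturation)")
    return -c * math.log(arg1) - 0.5 * (1 - c) * math.log(arg2)


if __name__ == "__main__":
    # One A->G transition in ten sites; GC content 0.5 vs 0.6.
    print(tamura_3p("ACGTACGTAC", "GCGTACGTAC"))
```

When both sequences have 50% GC, C = 0.5 and the formula reduces to the Kimura 2-parameter distance, which makes a handy sanity check.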

#MEGA
Title: Stuff
#Sequence1
ACTAGACGT
#Sequence2
ACCTTAGGA

etc ...
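If your aligned sequences are already in Python (for example parsed from MUSCLE's output), a small helper can write them in the layout above. This sketch follows the example: a `#MEGA` line, a `Title:` line, then each name with `#` written directly in front of it, followed by its sequence. The helper name is hypothetical, and replacing spaces in names with underscores is an assumption about how MEGA parses labels, not something stated in the answer.

```python
def to_mega(sequences, title):
    """Render aligned sequences in the simple MEGA layout.

    sequences: dict mapping sequence name -> aligned sequence string.
    Output: a #MEGA line, a Title line, then '#name' and the sequence for each entry.
    """
    lines = ["#MEGA", f"Title: {title}"]
    for name, seq in sequences.items():
        # Assumption: a label ends at the first whitespace, so join words with "_".
        lines.append("#" + "_".join(name.split()))
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_mega({"Sequence1": "ACTAGACGT", "Sequence2": "ACCTTAGGA"}, "Stuff"), end="")
```

Save the result with a `.meg` extension and open it in MEGA to compute distances or an NJ tree.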

Answered by Michael on October 11, 2020
