AnswerBun.com

How to create Phylogenetic Trees from fasta files in Python or R?

Bioinformatics Asked on October 3, 2021

I have around a hundred FASTA files (and will collect several thousand) containing DNA sequences at more than 50x coverage. What is a recommended method for constructing a phylogenetic tree? I am looking for solutions in Python or R.

I found that Phylo from Biopython only handles trees that have already been calculated.

4 Answers

The obvious single answer is the R package "ape". It gives you access to PhyML for tree building and to Clustal/MUSCLE for building alignments. It does this by calling the external programs, so the paths to the binaries are important. It also includes several distance methods, such as NJ and BIONJ. Its distance approaches don't look mainstream to me, but I could be wrong.

ape has some nice functions. The tree sorting in particular looks very useful, although I need to read through the documentation with much greater care. Personally, I wouldn't perform a core phylogenetic analysis within R, because the standalone programs are sufficient and the analysis is computationally intensive.

https://cran.r-project.org/web/packages/ape/ape.pdf

Answered by M__ on October 3, 2021

I would not look for a package for this, but instead build a small pipeline calling external tools with something like the following workflow:

  • Cluster the ~100 sequences with CD-HIT-EST/PSI-CD-HIT or many other options
  • Take all the sequences that form one individual cluster and build a multiple sequence alignment (MSA) with MAFFT/ClustalOmega or similar
  • Take the MSA and build a phylogenetic tree with a Maximum-Likelihood approach like iqtree or similar
  • Visualize the tree file with Jalview or similar

Of course, this is rather general, and depending on exactly what you're doing you may want a different workflow and/or different tools. You should also explore the parameter space; do not assume the defaults are necessarily good choices.
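The workflow above could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions: all sequences have first been concatenated into a single FASTA file, the binaries are on the PATH under the names cd-hit-est, mafft and iqtree (IQ-TREE 2 is often installed as iqtree2), and the largest cluster is the one of interest. The helper names parse_clstr, read_fasta and run_pipeline are hypothetical, and the 0.95 identity threshold is just a placeholder to tune.

```python
import re
import subprocess
from pathlib import Path


def parse_clstr(text):
    """Group sequence names by cluster, given the text of a CD-HIT .clstr file."""
    clusters = []
    for line in text.splitlines():
        if line.startswith(">Cluster"):
            clusters.append([])  # a new cluster starts here
        else:
            # Member lines look like: "0\t1200nt, >seqA... *"
            match = re.search(r">(.+?)\.\.\.", line)
            if match and clusters:
                clusters[-1].append(match.group(1))
    return clusters


def read_fasta(text):
    """Return {name: sequence}; the name is the header up to the first whitespace."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def run_pipeline(fasta, workdir, identity=0.95):
    """Cluster, align the largest cluster, build an ML tree; return the tree path."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # 1. Cluster. "-d 0" keeps names up to the first space in the .clstr file,
    #    which CD-HIT writes next to the output as "<output>.clstr".
    reps = workdir / "representatives.fasta"
    subprocess.run(["cd-hit-est", "-i", str(fasta), "-o", str(reps),
                    "-c", str(identity), "-d", "0"], check=True)
    clusters = parse_clstr(Path(str(reps) + ".clstr").read_text())

    # 2. Write the members of one cluster (assumption: the largest) to a FASTA file.
    records = read_fasta(Path(fasta).read_text())
    members = max(clusters, key=len)
    cluster_fasta = workdir / "cluster.fasta"
    cluster_fasta.write_text("".join(f">{n}\n{records[n]}\n" for n in members))

    # 3. Align. MAFFT prints the alignment to stdout, so redirect it to a file.
    aligned = workdir / "cluster.aln.fasta"
    with open(aligned, "w") as out:
        subprocess.run(["mafft", "--auto", str(cluster_fasta)], stdout=out, check=True)

    # 4. Maximum-likelihood tree. IQ-TREE writes "<alignment>.treefile".
    subprocess.run(["iqtree", "-s", str(aligned)], check=True)
    return Path(str(aligned) + ".treefile")
```

Calling run_pipeline("all_sequences.fasta", "work") (both names are placeholders) would return the path of a Newick file that can be opened in a tree viewer. Every tool is run with near-default settings here, which is exactly what the caveat above warns against relying on.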

Answered by Chris_Rands on October 3, 2021

I agree with Chris Rands that a reasonable approach would be to call external tools.

However, if you really want to do the phylogeny from within Python, you could use the P4 package, which is a bit complicated to handle but gives you lots of options for building MCMC-based Bayesian phylogenies:

https://github.com/pgfoster/p4-phylogenetics

You would still need something else to align the sequences beforehand.

To visualize the tree using Python, you could use the ETE Toolkit, which is likely more powerful than what you can find in Biopython: http://etetoolkit.org/

Answered by bli on October 3, 2021

(If I understand your situation correctly.) https://www.rdocumentation.org/packages/seqinr/versions/3.6-1/topics/read.alignment shows how to use the function read.alignment, which can read FASTA, MSF, and other alignment formats. The docs provide the example read.alignment(file = system.file("sequences/LTPs128_SSU_aligned_First_Two.fasta", package = "seqinr"), format = "fasta", whole.header = TRUE), but you can use the code below (which assumes the sequences in the file are already aligned) to go from reading the alignment to computing the distances, building the neighbor-joining tree, and then plotting it.

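library("seqinr")  # read.alignment(), dist.alignment()
library("ape")     # nj() and the plot method for trees

# Read an aligned FASTA file; `format` must match the actual file format.
fasta.res <- read.alignment(file = "geneticAlignment.fasta", format = "fasta")
# Pairwise distances based on sequence identity.
fasta.res.dist.alignment <- dist.alignment(fasta.res, matrix = "identity")
# Neighbor-joining tree built from the distance matrix.
fasta.res.dist.alignment.nj <- nj(fasta.res.dist.alignment)
plot(fasta.res.dist.alignment.nj, main = "from fasta files")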
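For anyone who prefers Python, the same two steps (pairwise distances from an alignment, then neighbor joining) can be sketched with the standard library alone. This is a rough illustration, not a replacement for ape or a dedicated tool. The function names p_distance, distance_matrix and neighbor_joining are made up for this example, and the distance is the plain proportion of differing sites, which need not match the values that seqinr's dist.alignment reports.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """seqs: {name: aligned sequence}. Returns {frozenset({a, b}): distance}."""
    return {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}


def neighbor_joining(names, dist):
    """Return an unrooted neighbor-joining tree as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)   # internal nodes are stored as their Newick sub-strings
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))   # branch from i to the new node
        lj = dij - li                                  # branch from j to the new node
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Three nodes left: connect them to one central node.
    x, y, z = nodes
    dxy, dxz, dyz = d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))]
    lx, ly, lz = (dxy + dxz - dyz) / 2, (dxy + dyz - dxz) / 2, (dxz + dyz - dxy) / 2
    return f"({x}:{lx:.4f},{y}:{ly:.4f},{z}:{lz:.4f});"
```

With seqs as a dictionary of aligned sequences, neighbor_joining(list(seqs), distance_matrix(seqs)) gives a Newick string that any tree viewer can read. Ties in the Q criterion are broken by input order, and NJ can produce negative branch lengths, so treat the result as a quick look rather than a final tree.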

Answered by Vass on October 3, 2021
