Bioinformatics Asked on October 3, 2021
I have around a hundred FASTA files (and will collect several thousand) with DNA sequences at 50x or higher coverage. What is a recommended method for constructing a phylogenetic tree? Solutions in Python or R are sought.
Biopython only handles already calculated trees.
The obvious single answer is the R package "ape". It gives you access to PhyML for tree building and Clustal/MUSCLE for building alignments. These are external programs, so the paths to their binaries are important. ape also includes several distance methods, such as NJ and BIONJ. Its distance approaches don't look mainstream to me, however, but I could be wrong.
Some of the functions within ape are cool. The tree sorting is very cool, and I need to read through it with much greater care. Personally, I wouldn't perform a core phylogenetic analysis within R, because the standalone programs are sufficient and the analysis is computationally intensive.
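For anyone unsure what a distance method such as NJ actually does, here is a minimal pure-Python sketch of neighbor joining. It is only an illustration of the algorithm, not ape's implementation; the function name `neighbor_joining` and the four-taxon distance matrix are made up for the example, and the Newick branch lengths are rounded to three decimals.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Build an unrooted NJ tree and return it as a Newick string.

    labels: list of taxon names (at least three).
    dist:   symmetric matrix (list of lists) of pairwise distances.
    """
    n = len(labels)
    if n < 3:
        raise ValueError("need at least three taxa")

    # Working copy of the distances, keyed by node id so new nodes can be added.
    d = {i: {j: float(dist[i][j]) for j in range(n)} for i in range(n)}
    newick = dict(enumerate(labels))  # node id -> Newick text of its subtree
    active = list(range(n))
    next_id = n

    while len(active) > 3:
        m = len(active)
        # Row sums over the currently active nodes.
        r = {i: sum(d[i][j] for j in active if j != i) for i in active}
        # Pick the pair that minimises the Q criterion (first pair wins ties).
        i, j = min(
            combinations(active, 2),
            key=lambda p: (m - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (m - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        d[u] = {u: 0.0}
        # Distances from the new node to every other active node.
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        newick[u] = f"({newick[i]}:{li:.3f},{newick[j]}:{lj:.3f})"
        active = [k for k in active if k not in (i, j)] + [u]

    # Join the last three nodes at a central trifurcation.
    a, b, c = active
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({newick[a]}:{la:.3f},{newick[b]}:{lb:.3f},{newick[c]}:{lc:.3f});"


# Hypothetical additive distances for four taxa.
example_labels = ["A", "B", "C", "D"]
example_dist = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 3],
    [6, 7, 3, 0],
]
print(neighbor_joining(example_labels, example_dist))
# (C:1.000,D:2.000,(A:1.000,B:2.000):3.000);
```

With real, non-additive data NJ can produce negative branch lengths, and for thousands of sequences a compiled implementation will be far faster than a sketch like this.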
Answered by M__ on October 3, 2021
I would not look for a package for this, but instead build a small pipeline calling external tools with something like the following workflow:
Of course, this is rather general, and depending on exactly what you're doing you may want a different workflow and/or different tools. You should also explore the parameter space; do not assume the defaults are necessarily good choices.
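As a rough idea of what "a small pipeline calling external tools" could look like in Python, here is a sketch using `subprocess`. The tool choices (MAFFT for alignment, FastTree for tree inference) are my assumptions for illustration only and are not necessarily the ones intended above. The sketch also assumes that each FASTA file contributes the sequence(s) that should become tips of the tree, and that the executables are on your PATH under the names `mafft` and `FastTree`. The helper names and output file names are hypothetical.

```python
import subprocess
from pathlib import Path


def concatenate_fasta(fasta_files, combined_path):
    """Write all input FASTA records into one multi-FASTA file."""
    with open(combined_path, "w") as out:
        for path in fasta_files:
            text = Path(path).read_text()
            # Make sure each file ends with a newline before the next header.
            out.write(text if text.endswith("\n") else text + "\n")
    return combined_path


def build_commands(combined, alignment, tree):
    """Return (command, output file) pairs; both tools write to stdout."""
    return [
        # Assumed aligner: MAFFT, letting it choose a strategy automatically.
        (["mafft", "--auto", str(combined)], str(alignment)),
        # Assumed tree builder: FastTree on nucleotides with a GTR model.
        (["FastTree", "-nt", "-gtr", str(alignment)], str(tree)),
    ]


def run_pipeline(fasta_files, workdir):
    """Concatenate, align, and build a tree; returns the Newick file path."""
    workdir = Path(workdir)
    combined = concatenate_fasta(fasta_files, workdir / "combined.fasta")
    steps = build_commands(
        combined, workdir / "aligned.fasta", workdir / "tree.nwk"
    )
    for command, output in steps:
        with open(output, "w") as handle:
            # check=True raises CalledProcessError if a tool fails.
            subprocess.run(command, stdout=handle, check=True)
    return workdir / "tree.nwk"
```

Calling `run_pipeline(sorted(Path("fasta_dir").glob("*.fasta")), "results")` would then run both steps, where `fasta_dir` and `results` are placeholder directory names. Keeping the commands in one function makes it easy to swap tools or add parameters as you explore the options.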
Answered by Chris_Rands on October 3, 2021
I agree with Chris Rands that a reasonable approach would be to call external tools.
However, if you really want to do the phylogeny from within Python, you could use the P4 package, which is a bit complicated to handle but gives you many options for building MCMC-based Bayesian phylogenies:
You would still need something else to align the sequences beforehand.
To visualize the tree using Python, you could use the ete toolkit, which is likely more powerful than what you can find in Biopython: http://etetoolkit.org/
Answered by bli on October 3, 2021
(if I understand your situation correctly)
https://www.rdocumentation.org/packages/seqinr/versions/3.6-1/topics/read.alignment shows how to use the function read.alignment, which can read FASTA, MSF and other alignment formats. The docs provide this example:

    read.alignment(file = system.file("sequences/LTPs128_SSU_aligned_First_Two.fasta", package = "seqinr"), format = "fasta", whole.header = TRUE)

You can use the code below (it assumes the sequences in the file are already aligned) to go from reading the alignment to computing the distances, building the neighbor-joining phylogenetic tree, and then plotting the tree:

    library("Biostrings")
    library("seqinr")
    library("ape")
    library("phylogram")
    library("dendextend")

    # Read an aligned FASTA file (only seqinr and ape are used below)
    fasta.res <- read.alignment(file = "geneticAlignment.fasta", format = "fasta")
    # Pairwise distances based on sequence identity
    fasta.res.dist.alignment <- dist.alignment(fasta.res, matrix = "identity")
    # Neighbor-joining tree from the distance matrix
    fasta.res.dist.alignment.nj <- nj(fasta.res.dist.alignment)
    plot(fasta.res.dist.alignment.nj, main = "from fasta files")
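Since Python was also asked for, here is a hedged sketch of the distance step in pure Python. According to the seqinr documentation, the "identity" matrix holds the square root of (1 - fraction of identical positions), and that is what is computed here. How gaps are treated in this sketch (as ordinary characters) is my assumption and may not match seqinr exactly. The function names are hypothetical.

```python
import math


def read_fasta(path):
    """Parse an aligned FASTA file into a dict of {header: sequence}."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].strip()
                records[name] = []
            elif line and name is not None:
                records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def identity_distance(seq_a, seq_b):
    """Return sqrt(1 - identity) for two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # Assumption: gap characters are compared like any other symbol.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return math.sqrt(1.0 - matches / len(seq_a))


def distance_matrix(records):
    """Return (names, matrix) with all pairwise identity distances."""
    names = list(records)
    matrix = [
        [identity_distance(records[a], records[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Two toy aligned sequences differing at one of four positions.
names, matrix = distance_matrix({"s1": "ACGT", "s2": "ACGA"})
print(names, matrix)
# ['s1', 's2'] [[0.0, 0.5], [0.5, 0.0]]
```

The resulting matrix can be passed to any neighbor-joining implementation, in the same way the R code passes the output of dist.alignment to nj.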
Answered by Vass on October 3, 2021