
MSA (protein) with Biopython or something else?

Bioinformatics Asked by CuriousTree on June 24, 2021

I am very new to bioinformatics (and Python in general), but I would like to use Python in a Jupyter notebook to analyse enzymes more efficiently, in terms of both structure and function. What is the best program or source code for multiple sequence alignments of amino acid sequences, to identify conserved binding sites etc.? I see that Biopython has a few ways of creating alignments, but I have the impression that it is more focused on nucleotide sequences?

One Answer

In my personal experience, MUSCLE is the easiest program to use in conjunction with Biopython. Biopython features a command-line wrapper for this program, which makes running it straightforward. Make sure to download the MUSCLE binary for your operating system from the drive5 website and save it somewhere. For example, if you are using Jupyter on Linux (copying into /usr/local/bin may require administrator rights, e.g. sudo):

!wget https://www.drive5.com/muscle/downloads3.8.31/muscle3.8.31_i86linux64.tar.gz
!tar -xzvf muscle3.8.31_i86linux64.tar.gz
!cp muscle3.8.31_i86linux64 /usr/local/bin
!chmod 755 /usr/local/bin/muscle3.8.31_i86linux64
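Before calling MUSCLE from Biopython, it can help to confirm from Python that the binary is actually where you put it and is executable. This is a minimal sketch, assuming the install path used in the shell steps above; `find_muscle` is a hypothetical helper, and the fallback to a program named `muscle` on your PATH is an assumption for installs that renamed the binary.

```python
import os
import shutil


def find_muscle(path="/usr/local/bin/muscle3.8.31_i86linux64"):
    """Return a usable MUSCLE executable path, or None if none is found."""
    # Prefer the explicit install location, but only if it is a file with execute permission
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    # Hypothetical fallback: a binary named 'muscle' somewhere on PATH (None if absent)
    return shutil.which("muscle")
```

If `find_muscle()` returns None, the chmod or copy step most likely did not succeed.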

Then, you can run MUSCLE like so:

from Bio.Align.Applications import MuscleCommandline

def runMUSCLE(infile, outfile):
    muscle_exe = "/usr/local/bin/muscle3.8.31_i86linux64"  # where MUSCLE was installed above
    muscle_cline = MuscleCommandline(muscle_exe,
                                     input=infile,   # FASTA file with the sequences to align
                                     out=outfile,    # file the alignment is written to
                                     clwstrict=True  # ClustalW output (easier to read); omit it to get FASTA
                                     )
    stdout, stderr = muscle_cline()  # runs MUSCLE; raises ApplicationError on a non-zero exit
    return stdout, stderr
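Recent Biopython releases emit a deprecation warning for the Bio.Application command-line wrappers (MuscleCommandline included) and recommend building the command yourself and running it with the subprocess module. A minimal sketch of that route, assuming the MUSCLE 3.8.31 options `-in`, `-out` and `-clwstrict` that the wrapper above maps to (MUSCLE 5 changed its command-line options, so check `muscle -help` for your version); `build_muscle_cmd` and `run_muscle` are hypothetical helper names:

```python
import subprocess


def build_muscle_cmd(muscle_exe, infile, outfile, clustal=True):
    """Build a MUSCLE 3.8 command line equivalent to the MuscleCommandline call above."""
    cmd = [muscle_exe, "-in", infile, "-out", outfile]
    if clustal:
        # -clwstrict writes ClustalW format; without it MUSCLE 3.8 writes FASTA
        cmd.append("-clwstrict")
    return cmd


def run_muscle(muscle_exe, infile, outfile, clustal=True):
    """Run MUSCLE; raises CalledProcessError if it exits with a non-zero status."""
    cmd = build_muscle_cmd(muscle_exe, infile, outfile, clustal)
    # MUSCLE reports progress on stderr; capture it instead of flooding the notebook
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stderr
```

Usage would look like `run_muscle("/usr/local/bin/muscle3.8.31_i86linux64", "enzymes.fasta", "enzymes.aln")`, where the two file names are placeholders.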

You don't need to specify that your sequences are amino acids (MUSCLE detects the sequence type automatically by default); however, keep in mind that the input file must be in FASTA format.
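Because the input has to be FASTA, here is a minimal, standard-library-only sketch for writing your sequences to a FASTA file, reading an aligned FASTA back (assuming you drop `clwstrict` so MUSCLE writes FASTA), and listing the alignment columns that are identical in every sequence as a first pass at candidate conserved sites. Biopython's `Bio.SeqIO.write(records, path, "fasta")` and `Bio.AlignIO.read(path, "fasta")` (or `"clustal"`) handle the file formats if you prefer them. The helper names are hypothetical, and "conserved" here deliberately means identical and gap-free; similarity-based scoring would also credit conservative substitutions.

```python
def write_fasta(sequences, path, width=60):
    """Write a {name: sequence} dict to a FASTA file, wrapping lines at `width` residues."""
    with open(path, "w") as handle:
        for name, seq in sequences.items():
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def read_fasta(path):
    """Read a FASTA (or aligned FASTA) file into a {header: sequence} dict."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].strip()  # keep the full header text as the key
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def conserved_columns(aligned):
    """Return 1-based alignment columns where all sequences share the same residue (no gaps)."""
    seqs = list(aligned.values())
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for i, residues in enumerate(zip(*seqs), start=1):
        first = residues[0].upper()
        # Skip gap columns; otherwise require every residue to match the first one
        if first != "-" and all(r.upper() == first for r in residues):
            columns.append(i)
    return columns
```

For example, `conserved_columns(read_fasta("enzymes.afa"))` on a placeholder alignment file gives positions you can map back to each enzyme's own numbering by counting its non-gap residues.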

Answered by albertr on June 24, 2021
