Finding annotated counterpart after BLASTn with efetch (Biopython)

Bioinformatics Asked by MWP on April 18, 2021

I am building a pipeline to identify unknown transcripts. After a local BLASTn search of the transcripts, I have a long list of hits against different genomes. For each hit I have the accession of the genome, the start/end coordinates, and the strand on which the hit was found. Now I would like to find out whether that genome is annotated within the coordinates returned by BLAST.

I found that I can query each genome with efetch via Biopython, but it returns the whole annotation even though I limit the request to the specific coordinates with:

Entrez.efetch(db="nuccore", id="CP054847.1", strand="2", seq_start="326053", seq_end="326786", retmode="xml") 

Of course, I could parse the XML and look for annotations within the given range, but that would be computationally intensive, since I expect 10,000+ hits to be searched. The call also seems to return the same results for the plus and minus strands, even though the annotations should be strand-specific.
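The range and strand check described above can be sketched independently of how the annotation is parsed. This is a minimal illustration, not the actual pipeline: the `Feature` tuple and the `overlapping` helper are hypothetical names, coordinates are assumed to be 1-based and inclusive, and strands are assumed to be encoded as +1/-1.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical minimal view of one annotated feature."""
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: int  # +1 for plus, -1 for minus
    label: str


def overlapping(features: List[Feature], hit_start: int, hit_end: int,
                hit_strand: Optional[int] = None) -> List[Feature]:
    """Return the features that overlap a BLAST hit.

    BLAST may report minus-strand hits with start > end, so the
    coordinates are sorted first. If hit_strand is given, only
    features on that strand are kept.
    """
    lo, hi = sorted((hit_start, hit_end))
    return [
        f for f in features
        if f.start <= hi and f.end >= lo
        and (hit_strand is None or f.strand == hit_strand)
    ]


# Made-up features for illustration only
features = [
    Feature(100, 200, 1, "geneA"),
    Feature(150, 300, -1, "geneB"),
    Feature(400, 500, 1, "geneC"),
]
both_strands = overlapping(features, 250, 180)
minus_only = overlapping(features, 250, 180, hit_strand=-1)
```

With many hits per genome, grouping the hits by accession and parsing each annotation once would probably reduce the cost more than speeding up the overlap test itself.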

I have also considered using the standalone efetch command-line tool, but it would be easier to integrate the search into the pipeline with Biopython.

Do you have any idea why Entrez.efetch returns the whole XML, or is there a fast way to filter the output? Do you perhaps have an alternative for this step? I would appreciate any suggestions!
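One thing that may be worth checking: the E-utilities documentation names the range parameters `seq_start` and `seq_stop`, so a `seq_end` argument may simply be ignored by the server. The sketch below builds the request arguments from one BLAST hit under that assumption. The helper names `efetch_region_kwargs` and `fetch_region` are hypothetical, and requesting a GenBank flat file (`rettype="gb"`) instead of XML is a choice made for this example, not something the question requires.

```python
def efetch_region_kwargs(accession, start, end, strand):
    """Build keyword arguments for Entrez.efetch from one BLAST hit.

    Assumes the E-utilities parameter names seq_start / seq_stop and
    the strand codes 1 (plus) and 2 (minus).
    """
    # BLAST may report minus-strand hits with start > end
    lo, hi = sorted((int(start), int(end)))
    strand_code = {"plus": 1, "+": 1, "minus": 2, "-": 2}[strand]
    return {
        "db": "nuccore",
        "id": accession,
        "seq_start": lo,
        "seq_stop": hi,
        "strand": strand_code,
        "rettype": "gb",     # GenBank flat file, an assumption of this sketch
        "retmode": "text",
    }


def fetch_region(accession, start, end, strand, email):
    """Fetch the annotated sub-record for one hit (needs network access)."""
    # Imported here so the helper above can be used without Biopython
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(
        **efetch_region_kwargs(accession, start, end, strand)
    )
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()


# Arguments for the hit from the question, no request is sent here
params = efetch_region_kwargs("CP054847.1", 326786, 326053, "minus")
```

If the server honours the range, the features of the returned record should be limited to those touching the requested region, which could then be inspected through `record.features`. That behaviour is worth confirming on a single hit before running all of them.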
