TransWikia.com

Given a .gb file and a locus - how to get relevant annotations in Python?

Bioinformatics Asked on April 13, 2021

Given a .gb file and a specific locus in the genome – how can I retrieve the relevant annotations in Python (i.e., annotations that include that locus)?

I could retrieve the features using:

```python
from Bio import SeqIO
features = SeqIO.read('my_gb_file.gb', 'gb').features
```

and then scan them to find the relevant ones, but it feels like reinventing the wheel.
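If "locus" means a genomic coordinate, that scan takes only a few lines, because Biopython location objects support Python's in operator for integer positions. The following is a minimal sketch, not a dedicated library function. The helper name features_at and the demo features are hypothetical, and positions are assumed to be 0-based Python indices, so subtract 1 from a 1-based GenBank coordinate first.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def features_at(record, position, skip_types=("source",)):
    """Return the features whose location covers a 0-based `position`.

    Hypothetical helper. Feature types in `skip_types` are ignored, because
    the 'source' feature spans the whole record and would always match.
    """
    hits = []
    for feature in record.features:
        if feature.type in skip_types or feature.location is None:
            continue
        # `int in location` checks each part of a joined (compound) location,
        # so a position inside an intron of a join(...) is not reported.
        if position in feature.location:
            hits.append(feature)
    return hits


# Real use: record = SeqIO.read("my_gb_file.gb", "gb")  (needs `from Bio import SeqIO`)
# Demo on a small in-memory record with made-up features, so no file is needed.
record = SeqRecord(Seq("A" * 100), id="demo")
record.features = [
    SeqFeature(FeatureLocation(0, 100, strand=1), type="source"),
    SeqFeature(FeatureLocation(10, 40, strand=1), type="CDS",
               qualifiers={"locus_tag": ["DEMO_0001"]}),
    SeqFeature(FeatureLocation(50, 80, strand=-1), type="gene",
               qualifiers={"locus_tag": ["DEMO_0002"]}),
]

print([f.qualifiers["locus_tag"][0] for f in features_at(record, 20)])  # → ['DEMO_0001']
```

End coordinates are exclusive, as in Python slicing, so position 40 is not inside the 10..40 CDS in this demo.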

Is there a function in Biopython that does that?
Or in any other well-maintained package?

One Answer

Biopython is the main package for this. It's only a few lines of code, so you are not reinventing the wheel. Unfortunately, this feels like a homework question, so per the no-homework policy nobody can write the code for you.

But pointers are okay...

So you want to iterate across the feature table of a record (record.features, a list in Biopython) and find the feature whose feature.qualifiers['locus_tag'][0] matches your query.

The things to watch out for are:

  • filter by feature type (only CDS?)
  • some entries do not have the key among their qualifiers, so add error handling (try: ... except KeyError: pass)
  • the qualifier values of a feature are lists, so add a [0]
  • multiprocessing.Pool might help if speed is a worry (which it really shouldn't be). Neither asyncio nor threading will help: asyncio runs in a single thread, and CPython's global interpreter lock keeps CPU-bound threads on one core.
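Putting those pointers together, a minimal sketch of the loop might look like this. It assumes "locus" refers to the GenBank /locus_tag qualifier; find_by_locus_tag and the demo features are hypothetical. It catches only KeyError and IndexError rather than a bare Exception, so genuine bugs are not silently swallowed.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def find_by_locus_tag(record, locus_tag, feature_type="CDS"):
    """Return features of `feature_type` whose /locus_tag equals `locus_tag`.

    Hypothetical helper. Pass feature_type=None to search all feature types.
    """
    hits = []
    for feature in record.features:
        # Filter by type: a gene and its CDS usually carry the same locus_tag.
        if feature_type is not None and feature.type != feature_type:
            continue
        try:
            # Qualifier values are lists of strings, hence the [0].
            value = feature.qualifiers["locus_tag"][0]
        except (KeyError, IndexError):
            # Features such as 'source' have no locus_tag; skip them.
            continue
        if value == locus_tag:
            hits.append(feature)
    return hits


# Real use: record = SeqIO.read("my_gb_file.gb", "gb")  (needs `from Bio import SeqIO`)
# Demo record with made-up features so the sketch runs without a file.
record = SeqRecord(Seq("A" * 100), id="demo")
record.features = [
    SeqFeature(FeatureLocation(0, 100, strand=1), type="source"),
    SeqFeature(FeatureLocation(10, 40, strand=1), type="gene",
               qualifiers={"locus_tag": ["DEMO_0001"]}),
    SeqFeature(FeatureLocation(10, 40, strand=1), type="CDS",
               qualifiers={"locus_tag": ["DEMO_0001"]}),
]

print([f.type for f in find_by_locus_tag(record, "DEMO_0001")])  # → ['CDS']
print([f.type for f in find_by_locus_tag(record, "DEMO_0001", None)])  # → ['gene', 'CDS']
```

Setting feature_type=None returns both the gene and the CDS entries, which may be closer to "all annotations for this locus".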

Answered by Matteo Ferla on April 13, 2021
