Bioinformatics Asked by Mauri1313 on April 27, 2021
I have a text file that contains a list of IDs (314 sequences):
AVP78031.1
AVP78042.1
ATO98108.1
ATO98120.1
ATO98132.1
...
My goal is to write a script (perhaps in Python or Perl) that checks, for each ID in the list, whether it is a nucleotide or a protein sequence.
For example:
AVP78031.1 -> protein (this is actually a nucleotide sequence; I changed it to protein just to illustrate the output).
AVP78042.1 -> nucleotide
ATO98108.1 -> nucleotide
ATO98120.1 -> nucleotide
ATO98132.1 -> nucleotide
Any ideas on how to write such a script?
Thanks, everybody!
If these are all GenBank or RefSeq accessions, you can use Entrez Direct for this as shown below:
$ cat accs.txt
ATO98108.1
ATO98120.1
ATO98132.1
AVP78031.1
AVP78042.1
$ cat accs.txt | epost -db nuccore -format acc | efetch -format acc
## no output because none of them are nucleotide accessions
$ cat accs.txt | epost -db protein -format acc | efetch -format acc
AVP78042.1
AVP78031.1
ATO98132.1
ATO98120.1
ATO98108.1
NOTE: This works only if the accessions are currently live, because epost does not find suppressed accessions or superseded versions. For example:
$ cat accs.txt
NM_002826.3
NM_002826.4
NM_002826.5
$ cat accs.txt | epost -db nuccore -format acc | efetch -format acc
NM_002826.5
Here, all three accessions are valid nucleotide accessions, but only the last one, NM_002826.5, is live.
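To get the per-ID labels the question asks for, the outputs of the two epost/efetch runs can be combined. The following is a minimal sketch in Python, assuming the accessions returned by each database have already been read into lists. The function name `label_accessions` and the `unknown` label are illustrative and not part of Entrez Direct.

```python
def label_accessions(requested, nucleotide_hits, protein_hits):
    """Label each requested accession by the database that returned it.

    requested       -- accessions from the input file, in input order
    nucleotide_hits -- accessions returned by the nuccore query
    protein_hits    -- accessions returned by the protein query
    """
    nucleotide = set(nucleotide_hits)
    protein = set(protein_hits)
    labels = {}
    for acc in requested:
        if acc in nucleotide:
            labels[acc] = "nucleotide"
        elif acc in protein:
            labels[acc] = "protein"
        else:
            # Returned by neither database, e.g. the accession is not live.
            labels[acc] = "unknown"
    return labels


if __name__ == "__main__":
    # In-memory stand-ins for the input file and the two efetch outputs.
    requested = ["AVP78031.1", "NM_002826.3", "NM_002826.5"]
    labels = label_accessions(requested, ["NM_002826.5"], ["AVP78031.1"])
    for acc in requested:
        print(f"{acc} -> {labels[acc]}")
```

Anything labelled `unknown` is worth checking by hand, since it may simply be an accession that is no longer live rather than an invalid one.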
An alternative is to use the accession prefixes documented by NCBI and come up with an appropriate regular expression.
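As a rough illustration of the regular-expression approach, here is a minimal sketch in Python. It assumes the common accession formats: RefSeq prefixes such as NM_ or NP_, GenBank proteins as three letters plus five or seven digits, and GenBank nucleotides as one letter plus five digits or two letters plus six or eight digits. The prefix sets are a hand-picked subset and WGS and other formats are not covered, so anything unmatched is reported as `unknown`. The function name `classify` is illustrative.

```python
import re

# RefSeq prefixes (two letters before an underscore); an assumed,
# non-exhaustive subset of the prefixes NCBI documents.
REFSEQ_NUCLEOTIDE = {"AC", "NC", "NG", "NM", "NR", "NT", "NW", "NZ", "XM", "XR"}
REFSEQ_PROTEIN = {"AP", "NP", "WP", "XP", "YP"}

REFSEQ = re.compile(r"^([A-Z]{2})_[A-Z0-9]+(\.\d+)?$")
# GenBank protein: 3 letters + 5 or 7 digits, optional version.
GENBANK_PROTEIN = re.compile(r"^[A-Z]{3}(\d{5}|\d{7})(\.\d+)?$")
# GenBank nucleotide: 1 letter + 5 digits, or 2 letters + 6 or 8 digits.
GENBANK_NUCLEOTIDE = re.compile(r"^([A-Z]\d{5}|[A-Z]{2}(\d{6}|\d{8}))(\.\d+)?$")


def classify(accession):
    """Guess 'nucleotide', 'protein' or 'unknown' from the accession format."""
    acc = accession.strip().upper()
    match = REFSEQ.match(acc)
    if match:
        prefix = match.group(1)
        if prefix in REFSEQ_NUCLEOTIDE:
            return "nucleotide"
        if prefix in REFSEQ_PROTEIN:
            return "protein"
        return "unknown"
    if GENBANK_PROTEIN.match(acc):
        return "protein"
    if GENBANK_NUCLEOTIDE.match(acc):
        return "nucleotide"
    return "unknown"


if __name__ == "__main__":
    for acc in ["AVP78031.1", "ATO98108.1", "NM_002826.5"]:
        print(f"{acc} -> {classify(acc)}")
```

Unlike the Entrez Direct approach, this only inspects the format of the ID: it does not confirm that the accession exists, but it also works for versions that are no longer live.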
Correct answer by vkkodali on April 27, 2021