How to find all WGS assemblies accessions of a species

Bioinformatics Asked by Oren Milman on December 12, 2020

Some background

Similar to the OP of https://www.biostars.org/p/377840/, I would like to programmatically BLAST a sequence against a local database of all WGS assemblies.
Since hosting every WGS assembly isn’t feasible for the average biology lab server (correct me if I am wrong), I plan to use ncbi-acc-download to download all WGS assemblies of the species of interest (not a popular species like E. coli, so it should be feasible). Then, I will create a BLAST database from the downloaded assemblies and BLAST the sequence against it.
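To make the plan concrete, here is a minimal Python sketch that only builds the command lines for the three steps (download, makeblastdb, blastn) and prints them. The accession and all file and database names are hypothetical placeholders, and it assumes the downloaded FASTA files have been concatenated into a single file before makeblastdb runs.

```python
import shlex


def download_cmd(accession):
    # ncbi-acc-download fetches one record; FASTA is what makeblastdb needs.
    return ["ncbi-acc-download", "--format", "fasta", accession]


def makeblastdb_cmd(fasta_path, db_name):
    # Build a nucleotide BLAST database from the combined FASTA file.
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_path, db_name, out_path):
    # -outfmt 6 writes tabular hits, which are easy to parse afterwards.
    return ["blastn", "-query", query_path, "-db", db_name,
            "-outfmt", "6", "-out", out_path]


def plan(accessions, fasta_path, db_name, query_path, out_path):
    # One download per accession, then one database build, then one search.
    cmds = [download_cmd(acc) for acc in accessions]
    cmds.append(makeblastdb_cmd(fasta_path, db_name))
    cmds.append(blastn_cmd(query_path, db_name, out_path))
    return cmds


if __name__ == "__main__":
    # All names below are placeholders, not real accessions or paths.
    for cmd in plan(["ACCESSION_1"], "assemblies.fasta", "wgs_db",
                    "query.fasta", "hits.tsv"):
        print(shlex.join(cmd))
```

Each list could be passed to subprocess.run once the printed commands look right.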

My question

How can I find the accessions of all WGS assemblies of a species?


My current plan is to search the NCBI Assembly database using Entrez and a search term such as "wgs"[Properties] AND txid1337[orgn:exp].
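For illustration, such a search could be sent to the E-utilities esearch endpoint roughly as follows. This is a sketch, not a tested pipeline: the function names are made up for this example, the taxon ID 1337 is the one from the search term above, and the JSON field names assume the usual esearch response layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def assembly_term(taxid):
    # Same search term as in the text, with the taxon ID filled in.
    return f'"wgs"[Properties] AND txid{taxid}[orgn:exp]'


def esearch_url(db, term, retmax=500):
    # retmode=json asks esearch for a JSON response instead of XML.
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def parse_ids(payload):
    # Returns (total hit count, UIDs contained in this response).
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def search_assemblies(taxid):
    # Performs the network request; nothing runs until this is called.
    with urlopen(esearch_url("assembly", assembly_term(taxid))) as resp:
        return parse_ids(resp.read().decode("utf-8"))
```

Note that the returned IDs are Assembly UIDs, not accessions, so a further lookup (e.g. esummary) would be needed to get the assembly accessions.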
EDIT: IIUC, this approach might miss some WGS assemblies. See my answer.

I am worried (and thus ask for your help) that this isn’t the right approach, because there seem to be at least 3 other places in which assemblies can be found:

One Answer

There seem to be WGS assemblies that can't be found in the NCBI Assembly database, e.g.: https://www.ncbi.nlm.nih.gov/nuccore/1779902990. I guess that such assemblies also cannot be found in the assembly_summary.txt files that are described in https://ftp.ncbi.nlm.nih.gov/genomes/README_assembly_summary.txt.

My current best guess is that each WGS assembly has a "WGS master record" in the NCBI Nuccore database. To find all WGS assemblies of a taxon whose UID in the NCBI Taxonomy database is 1337, search the NCBI Nuccore database using Entrez and the search term "wgs master"[Properties] AND txid1337[orgn:exp].
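A minimal sketch of collecting the master-record accessions with this search term, paging through the results, might look like the following. It assumes esearch's idtype=acc option (which makes Nuccore searches return accession.version strings rather than numeric UIDs); the helper names and the page size are arbitrary choices for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def wgs_master_term(taxid):
    # Same search term as in the answer, with the taxon ID filled in.
    return f'"wgs master"[Properties] AND txid{taxid}[orgn:exp]'


def fetch_page(term, retstart, retmax):
    # One esearch request against Nuccore; returns the "esearchresult" dict.
    params = {"db": "nuccore", "term": term, "idtype": "acc",
              "retstart": retstart, "retmax": retmax, "retmode": "json"}
    with urlopen(ESEARCH + "?" + urlencode(params)) as resp:
        return json.loads(resp.read().decode("utf-8"))["esearchresult"]


def all_accessions(term, page_size=500, fetch=fetch_page):
    # Keep requesting pages until every hit reported by "count" is collected.
    accessions = []
    while True:
        result = fetch(term, len(accessions), page_size)
        page = result["idlist"]
        accessions.extend(page)
        if not page or len(accessions) >= int(result["count"]):
            return accessions
```

Calling all_accessions(wgs_master_term(1337)) would perform the actual requests. For large taxa it would be sensible to pause between pages, since NCBI limits unauthenticated clients to a few requests per second.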

Answered by Oren Milman on December 12, 2020
