TransWikia.com

Getting Unique Identifier List for GEO Datasets NCBI

Bioinformatics Asked by Pawan Verma on June 24, 2021

Aim: Download the "Unique Identifier List" for the following GEO DataSets query.

Query: ("Expression profiling by high throughput sequencing"[DataSet Type] AND ("Homo sapiens"[Organism] OR "Mus musculus"[Organism] OR "rattus norvegicus"[Organism])) AND ("2020/01/01"[PDAT] : "3000"[PDAT])

That is, all RNA-seq studies on human, mouse, or rat published on GEO from 1 January 2020 onward ("3000"[PDAT] acts as an open-ended upper bound on the publication date).

Problem: I need the list of GSE IDs for these ~9k datasets, but when I try to download the ID list, the browser loads a blank page and nothing happens. Clicking "Next Page" also returns an error.
I have been trying for the last 3-4 days without success.

Steps I use to generate the file:

"Send To" -> "File" -> "Format" (Unique Identifier List) -> "Sort By" (Default Order) -> "Create File"

2 Answers

You can use Entrez Direct for this. The following returns the Unique Identifiers (UIDs), which are bare integers rather than accessions.

$ geo_query='"Expression profiling by high throughput sequencing"[DataSet Type] AND ("Homo sapiens"[Organism] OR "Mus musculus"[Organism] OR "rattus norvegicus"[Organism]) AND ("2020/01/01"[PDAT] : "3000"[PDAT])'
$ esearch -db gds -query "$geo_query" | efetch -format uid > gds_results.txt 
$ wc -l gds_results.txt 
9981 gds_results.txt
$ head -n2 gds_results.txt 
200134092
200120931
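If you already have gds_results.txt, the integer UIDs can be mapped to GSE accessions offline. The sketch below assumes NCBI's GEO numbering convention that a Series record in the gds database has UID 200000000 plus its GSE number (so 200134092 would correspond to GSE134092); please verify this against a few esummary results before relying on it. `uid_to_gse` is a hypothetical helper, not part of Entrez Direct.

```shell
# uid_to_gse: hypothetical helper, not part of Entrez Direct.
# Assumption: GEO Series UIDs in the gds database are 200000000 + the GSE number.
# Reads one UID per line on stdin; UIDs outside 200000000-299999999
# (other GEO record types) are skipped.
uid_to_gse() {
  awk '$1 >= 200000000 && $1 < 300000000 { printf "GSE%d\n", $1 - 200000000 }'
}

# Example usage:
#   uid_to_gse < gds_results.txt > gse_from_uids.txt
```

Because the conversion relies on a numbering convention rather than on the record contents, the esummary/xtract route is the safer source of truth; this is mainly a quick cross-check.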

If you want the GSE accessions instead, you can use the Entrez Direct `xtract` command to parse the XML returned by `esummary`:

$ esearch -db gds -query "$geo_query" | esummary | xtract -pattern DocumentSummary -first Accession > gse_accs.txt
$ wc -l gse_accs.txt 
9981 gse_accs.txt
$ head -n2 gse_accs.txt 
GSE165829
GSE165824
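Entrez Direct takes care of retrieving the ~10k summaries in batches, which is what the web "Send to" export appears to struggle with. If you call the E-utilities directly instead (for example with curl), you would run ESearch with `usehistory=y`, take the returned `WebEnv` and `query_key`, and page through ESummary with `retstart`/`retmax`. The sketch below only prints the ESummary URL for each batch; `print_esummary_pages` is a hypothetical helper, and the `WebEnv` value in the usage line is a placeholder, not a real session.

```shell
# print_esummary_pages: hypothetical helper, not part of Entrez Direct.
# Prints one ESummary URL per batch of records stored on the NCBI history server.
# Arguments: total record count, batch size, WebEnv string, query_key.
print_esummary_pages() {
  total=$1 batch=$2 webenv=$3 qkey=$4
  base="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
  start=0
  while [ "$start" -lt "$total" ]; do
    printf '%s?db=gds&query_key=%s&WebEnv=%s&retstart=%d&retmax=%d\n' \
      "$base" "$qkey" "$webenv" "$start" "$batch"
    start=$((start + batch))
  done
}

# Example usage (placeholder WebEnv):
#   print_esummary_pages 9981 500 "MCID_placeholder" 1
```

Without an API key, NCBI allows about 3 E-utilities requests per second, so a loop that fetches these URLs should pause between requests.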

Correct answer by vkkodali on June 24, 2021

I would try doing this with the GEOquery Bioconductor package:

https://bioconductor.org/packages/release/bioc/html/GEOquery.html

Answered by k1sauce on June 24, 2021
