TransWikia.com

Get list of urls of GSM data set of a GSE set

Bioinformatics Asked by user432797 on January 6, 2021

I have a GEO series, GSE104279 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE104279).

I want to make a table of sample IDs and FTP URLs to use as a table in galaxy.org

I know that we can use ENA to get a specific arrangement of columns:
https://www.ebi.ac.uk/ena/browser/

I tried to get:

SampleID    Group   URL

so I used:
https://www.ebi.ac.uk/ena/browser/view/PRJNA412223

But nothing is showing.

Is there a way to get these URLs in the arrangement above?

One Answer

You can use Entrez Direct for this as follows:

esearch -db gds -query 'GSE104279' \
  | esummary \
  | xtract -pattern DocumentSummary \
    -if 'entryType' -equals 'GSM' \
    -def 'NA' -element Accession title summary FTPLink

This will return a table with data similar to this:

GSM2580330      18_Z13_2_d5_Zika        cortical organoids_Zika_5d      ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM2580nnn/GSM2580330/
GSM2580329      17_Z13_2_d5_Control     cortical organoids_mock_5d      ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM2580nnn/GSM2580329/
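To get closer to the SampleID / Group / URL layout asked for in the question, the tab-delimited output can be reshaped with awk. This is only a minimal sketch: the function name to_sample_table is hypothetical, and using the third column (the summary field) as the group label is an assumption that may not suit every series.

```shell
# Hypothetical helper: reshape the four-column, tab-delimited xtract output
# (Accession, title, summary, FTPLink) read from stdin into a three-column
# SampleID / Group / URL table with a header row.
# Assumption: column 3 (the summary field) is a usable group label.
to_sample_table() {
  awk -F '\t' 'BEGIN { OFS = "\t"; print "SampleID", "Group", "URL" }
               NF >= 4 { print $1, $3, $4 }'
}

# Usage (requires Entrez Direct and network access):
#   esearch -db gds -query 'GSE104279' \
#     | esummary \
#     | xtract -pattern DocumentSummary \
#         -if 'entryType' -equals 'GSM' \
#         -def 'NA' -element Accession title summary FTPLink \
#     | to_sample_table > samples.tsv
```

Splitting on tabs rather than on whitespace matters here, because titles and summaries can contain spaces.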

To download sequence reads, you should follow links to SRA. Using Entrez Direct you can do this as follows:

esearch -db gds -query 'GSE104279' \
  | elink -target sra \
  | efetch -format runinfo

This will return a comma-delimited table containing SRA identifiers and an FTP path to the SRA data. These FTP paths won't lead you to FASTQ files, though. To download data in FASTQ format, pass the SRA run identifiers, which have the form SRR### (or ERR### or DRR###), to the fastq-dump or fasterq-dump tools from the SRA Toolkit.
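The hand-off from the runinfo table to the SRA Toolkit could look roughly like the sketch below. The function name run_accessions and the fastq output directory are hypothetical, and the filter assumes the run accession is the first comma-separated field of each data row.

```shell
# Hypothetical helper: pull run accessions (SRR/ERR/DRR) out of a runinfo CSV
# read from stdin. Header rows and blank lines are skipped because their first
# field does not look like a run accession.
# Assumption: the accession is the first comma-separated field of each row.
run_accessions() {
  awk -F ',' '$1 ~ /^[SED]RR[0-9]+$/ { print $1 }'
}

# Usage (requires Entrez Direct, the SRA Toolkit, and network access):
#   esearch -db gds -query 'GSE104279' \
#     | elink -target sra \
#     | efetch -format runinfo \
#     | run_accessions \
#     | while read -r run; do
#         fasterq-dump --outdir fastq "$run"
#       done
```

Downloading one run at a time keeps a failure on one accession from stopping the others.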

Correct answer by vkkodali on January 6, 2021
