How can I obtain FTP links to studies in ENA?

Bioinformatics Asked on August 22, 2021

How can I programmatically obtain FTP links to RNA-seq FASTQ files in ENA? Here’s an example of a link that I would be interested in obtaining:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR824/000/SRR8240860/SRR8240860_1.fastq.gz

In particular, is there a tool that, given the BioProject ID (here, PRJNA506829), would give me all FTP links for the runs in the project, or would I need to write a web scraper to do it?

3 Answers

pysradb can fetch ENA/SRA fastq/bam links (if available):

$ pysradb metadata SRR8240860 --detailed
run_accession study_accession experiment_accession experiment_title                                               experiment_desc                                                organism_taxid  organism_name           library_strategy library_source  library_selection library_layout sample_accession sample_title instrument   total_spots total_size  run_total_spots run_total_bases run_alias      sra_url_alt1                                    sra_url_alt2                                    sra_url                                                                                  experiment_alias source_name strain/genotype developmental stage ena_fastq_http ena_fastq_http_1                                                                 ena_fastq_http_2                                                                 ena_fastq_ftp ena_fastq_ftp_1                                                                     ena_fastq_ftp_2                                                                    
SRR8240860    SRP170618       SRX5059122           GSM3487689: fer-15 Day1 rep1; Caenorhabditis elegans; RNA-Seq  GSM3487689: fer-15 Day1 rep1; Caenorhabditis elegans; RNA-Seq  6239            Caenorhabditis elegans  RNA-Seq          TRANSCRIPTOMIC  cDNA              PAIRED         SRS4075216       N/A          HiSeq X Ten  40494415    4755049874  40494415        12148324500     GSM3487689_r1  gs://sra-pub-src-3/SRR8240860/RRA0719_R2.fq.gz  s3://sra-pub-src-3/SRR8240860/RRA0719_R2.fq.gz  https://sra-downloadb.st-va.ncbi.nlm.nih.gov/sos2/sra-pub-run-3/SRR8240860/SRR8240860.1  GSM3487689       whole worm  fer-15(b26ts)   Adult day 1         N/A            http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR824/000/SRR8240860/SRR8240860_1.fastq.gz  http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR824/000/SRR8240860/SRR8240860_2.fastq.gz  N/A           era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR824/000/SRR8240860/SRR8240860_1.fastq.gz  era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR824/000/SRR8240860/SRR8240860_2.fastq.gz

Answered by rightskewed on August 22, 2021

ENA has an API that you can query programmatically.

If you only need files for a few projects, you can do it by hand: search for the accession ID in the ENA browser (which leads to https://www.ebi.ac.uk/ena/browser/view/PRJNA506829), restrict the "show selected columns" option to fastq_ftp only, and click "download tsv" to get the list of FTP links.
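
For the programmatic route, here is a minimal bash sketch. It assumes the ENA Portal API `filereport` endpoint with the `read_run` result type and the `fastq_ftp` field, none of which the answer above names. It also assumes `curl` and `awk` are installed. The function names are hypothetical. The `fastq_ftp` column is expected to hold semicolon-separated paths without the `ftp://` scheme, so the parser adds it.

```shell
#!/usr/bin/env bash

# Build the (assumed) ENA Portal API filereport URL for a project/study accession.
ena_filereport_url() {
    local accession="$1"
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp&format=tsv\n' "$accession"
}

# Read a filereport TSV on stdin and print one ftp:// link per line.
# The fastq_ftp column is located by its header name rather than by position.
tsv_to_ftp_links() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
        col && $col != "" {
            n = split($col, files, ";")
            for (j = 1; j <= n; j++) print "ftp://" files[j]
        }'
}

# Fetch the report for one accession and print all FTP links (needs network access).
ena_ftp_links() {
    curl -fsS "$(ena_filereport_url "$1")" | tsv_to_ftp_links
}
```

With these functions sourced, `ena_ftp_links PRJNA506829 > links.txt` should write one link per line, which can then be passed to a downloader of your choice.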

Answered by Pallie on August 22, 2021

Here are some bash commands I used for that purpose. Prepare a text file listing the run accessions (SRR/ERR) you want, one per line, and loop over it. I used prozilla (proz) to speed up the downloads, but you can use wget instead.

# Read one run accession per line from list_of_accessions.
while read -r index; do
    [ -n "$index" ] || continue      # skip blank lines
    prefix=${index:0:6}              # first six characters, e.g. SRR824
    # Accessions longer than 9 characters get an extra directory level:
    # the characters after the 9th, left-padded with zeros to three digits.
    case ${#index} in
        9)  dir="$prefix/$index" ;;
        10) dir="$prefix/00${index:9}/$index" ;;
        11) dir="$prefix/0${index:9}/$index" ;;
        *)  dir="$prefix/${index:9}/$index" ;;
    esac
    # Download both mates of the paired-end run.
    for mate in 1 2; do
        proz -k=6 --no-curses "ftp.sra.ebi.ac.uk/vol1/fastq/$dir/${index}_${mate}.fastq.gz"
    done
done < list_of_accessions
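
The directory rule that the loop encodes can be isolated into a small function and checked against the link in the question. This is only a sketch: the function name `ena_fastq_url` is hypothetical, and it assumes paired-end runs whose files carry the `_1`/`_2` suffix. Single-end runs are typically named without that suffix, so for mixed projects it is safer to query ENA for the links than to construct them.

```shell
#!/usr/bin/env bash

# Print the ENA FTP URL for one mate file of a run accession.
# Usage: ena_fastq_url SRR8240860 1
ena_fastq_url() {
    local acc="$1" mate="$2"
    local prefix="${acc:0:6}"   # first six characters, e.g. SRR824
    local extra="${acc:9}"      # characters after the 9th; empty for 9-character accessions
    local subdir=""
    if [ -n "$extra" ]; then
        # Left-pad the trailing digits to three characters (10# forces base 10,
        # so values such as "08" are not parsed as octal).
        subdir="$(printf '%03d' "$((10#$extra))")/"
    fi
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s%s/%s_%s.fastq.gz\n' \
        "$prefix" "$subdir" "$acc" "$acc" "$mate"
}
```

For the run in the question, `ena_fastq_url SRR8240860 1` reproduces the example link, so a download becomes `wget "$(ena_fastq_url SRR8240860 1)"`.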

Answered by thomas duge de bernonville on August 22, 2021
