TransWikia.com

Downloading SRA Files from AWS

Bioinformatics Asked on December 6, 2020

I want to download the original BAM files that the authors uploaded to SRA. Normally I would just use sam-dump, but the files have problems that seem related to this issue. Since, according to the entry, AWS S3 also hosts the original BAM files, I thought I could download them directly.

Example SRA Entry


NCBI documentation implies that I can’t download these files directly, but that I can freely copy them to other AWS locations within the same region. To this end, I created my own S3 bucket (mm-mneuron) and am now trying to copy from the SRA bucket to mine. Here’s what I tried:

import boto3
import botocore

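# Resource API; credentials and region come from the default AWS configuration
s3 = boto3.resource('s3')

# Source object in the SRA bucket, in the CopySource dict form
bam_file = {
  'Bucket': 'sra-pub-src-6',
  'Key': 'SRR5253957/RPI25_0.bam'
}

# Destination: my own bucket
my_bucket = s3.Bucket('mm-mneuron')

# Managed copy; it first sends HeadObject to the source to learn its size
my_bucket.copy(bam_file, 'RPI25_0.bam')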

This fails with:

botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

That is, it seems I can’t read from the SRA bucket. I’ve tested downloading from and uploading to my own bucket, so I know I have write permissions there. I’m not sure what else to try.
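To see which side of the copy is failing, the read and destination checks can be issued as separate requests. This is a minimal sketch assuming boto3 with configured AWS credentials; `http_status` and `main` are hypothetical helpers, not part of boto3. HeadObject requires read (s3:GetObject) permission on the source, so a 403 there points at the SRA side; S3 also returns 403 instead of 404 for a missing key when the caller lacks s3:ListBucket. HeadBucket on the destination only shows that the bucket is reachable, not that it is writable.

```python
from typing import Any, Callable, Optional


def http_status(call: Callable[..., Any], **params: Any) -> Optional[int]:
    """Run one S3 request and return its HTTP status code.

    `call` is a bound client method such as `s3.head_object` or
    `s3.head_bucket` (hypothetical helper; `params` are passed through).
    """
    try:
        reply = call(**params)
        # Successful boto3 replies carry the status in ResponseMetadata.
        return reply["ResponseMetadata"]["HTTPStatusCode"]
    except Exception as err:
        # botocore's ClientError exposes the parsed error reply as `.response`;
        # anything without one (e.g. missing credentials) is re-raised.
        response = getattr(err, "response", None)
        if not isinstance(response, dict):
            raise
        return response.get("ResponseMetadata", {}).get("HTTPStatusCode")


def main() -> None:
    # Needs boto3 and AWS credentials; call main() explicitly to run it.
    import boto3

    s3 = boto3.client("s3")
    # Read side: the same request Bucket.copy() sends to the source first.
    print("source object:", http_status(
        s3.head_object, Bucket="sra-pub-src-6", Key="SRR5253957/RPI25_0.bam"))
    # Destination side: reachability only, not write permission.
    print("destination bucket:", http_status(s3.head_bucket, Bucket="mm-mneuron"))
```

A 403 for the source alongside a 200 for the destination would match the asker's reading that the problem is read access to the SRA bucket.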

How can I access the SRA data on S3?

One Answer

A member of the SRA submission staff pointed out that using

prefetch --type all SRR5253957

will download the original files. In this case, that means running the command on an EC2 instance in the same region as the S3 bucket (here, us-east-1), with SRA Toolkit installed and configured to work from AWS (as per this documentation).
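If the download is driven from a Python script on that EC2 instance, the prefetch call can be wrapped with `subprocess`. This is a minimal sketch that assumes SRA Toolkit is already installed and configured as described above; `prefetch_command` and `run_prefetch` are hypothetical names, and the only prefetch option used is the `--type all` flag from the answer.

```python
from __future__ import annotations

import shutil
import subprocess


def prefetch_command(accession: str, file_type: str = "all") -> list[str]:
    """Build the `prefetch --type <file_type> <accession>` argument list."""
    return ["prefetch", "--type", file_type, accession]


def run_prefetch(accession: str) -> None:
    """Invoke SRA Toolkit's prefetch; raise if it is missing or fails."""
    if shutil.which("prefetch") is None:
        raise FileNotFoundError(
            "prefetch not found on PATH; install and configure SRA Toolkit first")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(prefetch_command(accession), check=True)
```

For example, `run_prefetch("SRR5253957")` runs the same command as above and raises `subprocess.CalledProcessError` if prefetch exits with an error. Passing the arguments as a list avoids shell quoting issues.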

Unfortunately, the particular files I am concerned with are not currently accessible, but this approach should work in most situations.

Answered by merv on December 6, 2020
