Bioinformatics Asked on September 26, 2021
I need to obtain sample data from modern humans in FASTA format. I only need a few megabytes of data from each individual. I currently use a script that downloads the CRAM file from ftp.1000genomes.ebi.ac.uk and then processes it to produce the FASTA file.
The problem is that CRAM files are large, slow to download, and slow to process. It takes days to get the samples.
Is there a better way to get these samples in FASTA format?
The script already uses samtools to retrieve only the part of the BAM file it needs, but that doesn't help much. The CRAM files are still gigabytes in size for the few megabytes of data that I need.
I have the same problem with data from the 1000 Genomes Project.
You can download HGDP data in FASTQ format here: https://www.internationalgenome.org/data-portal/data-collection/hgdp
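Since only a few megabytes per individual are needed, the FASTQ can be converted to FASTA as a stream and cut off once enough bases have been written. The following is a minimal sketch, not a tested pipeline: it assumes each record spans exactly four lines (header, sequence, `+`, quality), and the function name `fastq_to_fasta` and the `max_bases` parameter are made up for illustration.

```python
from typing import BinaryIO


def fastq_to_fasta(fastq: BinaryIO, fasta: BinaryIO, max_bases: int) -> int:
    """Copy reads from a FASTQ stream to FASTA until max_bases is reached.

    Assumes four-line FASTQ records. Returns the number of bases written,
    which may slightly exceed max_bases because reads are never truncated.
    """
    written = 0
    while written < max_bases:
        header = fastq.readline()
        if not header:
            break  # end of input
        if not header.startswith(b"@"):
            raise ValueError("expected a FASTQ header line starting with '@'")
        seq = fastq.readline().rstrip(b"\r\n")
        fastq.readline()  # '+' separator line, ignored
        fastq.readline()  # quality line, ignored
        # FASTA record: '>' + read name, then the sequence on one line
        fasta.write(b">" + header[1:].rstrip(b"\r\n") + b"\n" + seq + b"\n")
        written += len(seq)
    return written
```

If the downloaded file is gzip-compressed, it could be opened with `gzip.open(path, "rb")` from the Python standard library and passed in as `fastq`, with the output opened via `open(out_path, "wb")`. Because the function stops reading as soon as the limit is reached, the rest of the file is never decompressed.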
Correct answer by Dan Bolser on September 26, 2021