Obtaining HGDP project data in fasta format

Bioinformatics Asked on September 26, 2021

I need to obtain sample data from modern humans in FASTA format. I only need a few megabytes of data from each individual. I currently use a script that downloads the CRAM file from here (…) and then processes it to obtain the FASTA file.
The problem is that CRAM files are large, slow to download and slow to process, so it takes days to get the samples.
Is there a better way to get these samples in FASTA format?

The script already uses samtools to retrieve only the part of the BAM file it needs, but that doesn't help much: the CRAM files are still gigabytes in size, while I only need a few megabytes of data from each.
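One thing that may matter here is whether the full CRAM is downloaded before subsetting. `samtools view` also accepts a remote URL plus a region. When an index (`.crai`) is reachable next to the file, htslib can then fetch only the index and the data overlapping that region. Whether the HGDP server supports the HTTP range requests this relies on is an assumption. The sketch below only builds and runs the two samtools commands (region extraction, then `samtools fasta`). The function names, the output prefix and the example URL/region are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_region_commands(cram_url: str, region: str, out_prefix: str,
                          reference: Optional[str] = None) -> List[List[str]]:
    """Return [view_cmd, fasta_cmd] for a region-only extraction (hypothetical helper)."""
    view = ["samtools", "view", "-b", "-o", f"{out_prefix}.bam"]
    if reference is not None:
        # CRAM decoding needs the reference the file was compressed against;
        # without -T, htslib falls back to its REF_PATH/REF_CACHE lookup.
        view += ["-T", reference]
    # The region goes after the input; samtools then uses the index (.crai)
    # to read only the part of the file overlapping the region.
    view += [cram_url, region]
    # Convert the extracted reads to FASTA; samtools fasta writes to stdout.
    fasta = ["samtools", "fasta", f"{out_prefix}.bam"]
    return [view, fasta]


def run_region_to_fasta(commands: List[List[str]], fasta_path: str) -> None:
    """Run the commands built above, saving the FASTA output to fasta_path."""
    view, fasta = commands
    subprocess.run(view, check=True)
    with open(fasta_path, "w") as out:
        subprocess.run(fasta, check=True, stdout=out)
```

For example, `run_region_to_fasta(build_region_commands("https://example.org/HGDP_sample.cram", "chr20:1000000-2000000", "sample", reference="ref.fa"), "sample.fa")` would extract one 1 Mb window (URL, region and file names are placeholders). Keeping command construction separate from execution makes it easy to loop over many individuals or inspect the commands before running them.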

I have the same problem with data from the 1000 Genomes Project.

One Answer

You can download HGDP data in FASTQ format here:

Correct answer by Dan Bolser on September 26, 2021
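The answer points to FASTQ rather than FASTA. Because the question only needs a few megabytes per individual, one option is to read the compressed FASTQ as a stream and stop early rather than processing the whole file. The following is a minimal sketch, assuming gzip-compressed FASTQ with standard four-line records (no wrapped sequence lines). The function names are hypothetical, and `max_bases` is an illustrative cap on how many bases to keep per individual.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) pairs from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (not needed for FASTA)
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        parts = header[1:].split()
        yield (parts[0] if parts else ""), seq


def fastq_to_fasta(src: TextIO, dst: TextIO, max_bases: int) -> int:
    """Write reads as FASTA, stopping before the read that would exceed max_bases.

    Returns the number of bases written.
    """
    written = 0
    for name, seq in fastq_records(src):
        if written + len(seq) > max_bases:
            break
        dst.write(f">{name}\n{seq}\n")
        written += len(seq)
    return written


def convert_file(fastq_gz_path: str, fasta_path: str, max_bases: int) -> int:
    """Convert the first max_bases bases of a .fastq.gz file to a FASTA file."""
    with gzip.open(fastq_gz_path, "rt") as src, open(fasta_path, "w") as dst:
        return fastq_to_fasta(src, dst, max_bases)
```

For instance, `convert_file("sample.fastq.gz", "sample.fa", 5_000_000)` would keep roughly 5 Mb of sequence (file names are placeholders). Note that the first reads in a FASTQ file are not a random sample of the genome, so this suits cases where any few megabytes of sequence will do.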

