Obtaining HGDP project data in fasta format

Bioinformatics Asked on September 26, 2021

I need to obtain sample data from modern humans in FASTA format. I only need a few megabytes of data from each individual. I currently use a script that downloads each individual's CRAM file and then processes it to obtain the FASTA file.
The problem is that cram files are large, slow to download and slow to process. It takes days to get the samples.
Is there a better way to get these samples in fasta format?

The script already uses samtools to retrieve only the part of the CRAM file it needs, but that doesn’t help much. The CRAM files are still gigabytes in size, while I need only a few megabytes of data.

I have the same problem with data from the 1000 Genomes Project.
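For reference, the region-based retrieval described above can be sketched roughly as follows. This is a minimal illustration and not the actual script: the function names, the URL, the region string and the file names are all hypothetical. It assumes samtools is on the PATH, that an index (.crai) is available next to the CRAM file, and that a matching reference FASTA is available locally.

```python
import subprocess


def region_to_fasta_commands(cram_url, region, reference, out_prefix):
    """Build the samtools commands for one sample (hypothetical helper).

    Returns the region-extraction command, the FASTA-conversion command,
    and the path the FASTA output is meant to be written to.
    """
    bam_path = f"{out_prefix}.bam"
    fasta_path = f"{out_prefix}.fa"
    # samtools view: -b writes BAM, -T names the reference FASTA used to
    # decode the CRAM, -o names the output file. The trailing region
    # argument restricts the output to reads overlapping that region.
    view_cmd = ["samtools", "view", "-b", "-T", reference,
                "-o", bam_path, cram_url, region]
    # samtools fasta writes the extracted reads as FASTA records to stdout.
    fasta_cmd = ["samtools", "fasta", bam_path]
    return view_cmd, fasta_cmd, fasta_path


def run_region_to_fasta(cram_url, region, reference, out_prefix):
    """Run both commands, redirecting the FASTA records into a file."""
    view_cmd, fasta_cmd, fasta_path = region_to_fasta_commands(
        cram_url, region, reference, out_prefix)
    subprocess.run(view_cmd, check=True)
    with open(fasta_path, "w") as out:
        subprocess.run(fasta_cmd, stdout=out, check=True)
    return fasta_path
```

A call such as `run_region_to_fasta("https://example.org/sample.cram", "chr1:1-1000000", "ref.fa", "sample")` would then produce `sample.fa`, with the example URL standing in for a real sample location.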

One Answer

You can download HGDP data in FASTQ format here:

Correct answer by Dan Bolser on September 26, 2021
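Since the question asks for FASTA rather than FASTQ, a conversion step is still needed after downloading. The following is a minimal sketch of one way to do it, not part of the original answer: it copies records from the start of a FASTQ file into FASTA until a base budget is reached. The function names and the `max_bases` parameter are hypothetical, and the code assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from itertools import islice


def fastq_head_to_fasta(lines, max_bases):
    """Yield FASTA lines for leading FASTQ records until max_bases is reached.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    it = iter(lines)
    total = 0
    while total < max_bases:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            break  # end of input, or a truncated final record
        header, seq, _plus, _qual = record
        if not header.startswith("@"):
            raise ValueError(f"not a FASTQ header: {header!r}")
        yield ">" + header[1:]  # FASTA headers start with '>' instead of '@'
        yield seq
        total += len(seq)


def convert(fastq_gz_path, fasta_path, max_bases):
    """Write the first records of a gzipped FASTQ file as FASTA."""
    with gzip.open(fastq_gz_path, "rt") as src, open(fasta_path, "w") as dst:
        for line in fastq_head_to_fasta(src, max_bases):
            dst.write(line + "\n")
```

For example, `convert("sample.fastq.gz", "sample.fa", 5_000_000)` would keep roughly the first five megabases of reads. Because the generator stops reading once the budget is met, the same function could in principle be fed from a streamed download so that only the beginning of the file is transferred. Note that this yields whichever reads come first in the file, not reads from a particular genomic region.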
