Obtaining Whole Genetic Sequence

Question

As an end user who wants my own data, specifically my raw DNA sequence (WGS, whole genome sequencing): how or where does one obtain such DNA data so that a biology hobbyist can perform bioinformatics on it?
Which keywords best describe this to search engines when looking for online WGS providers?
It would be nice to have a USB stick mailed to me after using such a home DNA kit. What file format is most commonly offered? (VCF?)
Am I looking too far ahead into the future?

Liam McIntyre · Accepted Answer

Are you looking too far ahead into the future? No. This is certainly possible now. The gold standard is Illumina or BGI short-read WGS. Long-read sequencing can capture some extra data but is very noisy.
How or where does one obtain such DNA data? There are now many companies. You want one that uses NGS (next-generation sequencing) with reads of at least 150 bp and guarantees a mean read depth of at least 30x. In my opinion you also want one that gives you ownership of your data. Ideally they will give you FASTQ files (raw reads), a BAM (the reads mapped to a reference) and a VCF (variant file). A good company will ask you whether you want to donate your data to science. Be warned that these are not defaults. Many (most?) companies won't give you your raw data, citing medico-legal risk, and will then sell it on to pharma companies. I think these guys are OK (as they are in Europe, with good privacy laws): https://www.dantelabs.com/ but please do your own research.
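To put the "mean of 30x read depth" figure in perspective, here is a rough back-of-the-envelope sketch. It assumes a human genome size of about 3.1 billion base pairs (an approximation introduced here, not a number from this answer) and uses the simple relation depth = number of reads × read length / genome size. The function names are made up for illustration.

```python
import math

# Assumption: approximate human genome size in base pairs (about 3.1 Gbp).
HUMAN_GENOME_BP = 3_100_000_000


def mean_depth(num_reads, read_length_bp, genome_size_bp=HUMAN_GENOME_BP):
    """Expected mean depth: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_depth, read_length_bp, genome_size_bp=HUMAN_GENOME_BP):
    """Number of reads required to reach a target mean depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    n = reads_needed(30, 150)
    print(f"reads needed for 30x at 150 bp: {n:,}")
    print(f"mean depth with that many reads: {mean_depth(n, 150):.1f}x")
```

Under these assumptions, 30x with 150 bp reads works out to roughly 620 million individual reads, which is why the resulting FASTQ and BAM files are large. Real depth varies along the genome, so this is only an average.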
Lastly, these data could include information pertaining to your health and your family's health. It is always good to speak to a genetic counsellor to understand these things.

Kamil S Jaron · Answer

"DNA data" comes in several forms, depending on the technology used to produce them.
Companies like 23andMe use SNP chips, and those are available for only a rather limited number of species. To be completely honest, I don't even know what raw data they would provide, but I suspect it could be a VCF file.
If the chosen technology is DNA sequencing: it is provided by a few dozen sequencing centres around the world, some of them associated with universities. DNA sequencing generates files that contain individual sequencing reads with assigned quality scores ("how much the sequencer trusts each base"); the typical formats are FASTQ (Illumina) or BAM (long-read technologies). Usually you would also need to find a suitable reference in one of the public databases, for instance the NCBI portal for the human reference. Then you map your reads and do variant calling. Of course, what comes next depends on what you are interested in.
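To make the "reads with assigned quality scores" idea concrete, here is a minimal sketch of reading FASTQ records in Python. It assumes the simple four-lines-per-record layout and the Phred+33 quality encoding used by current Illumina output; the function names and the example read are invented for illustration, and a real project would more likely use an established library.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def phred_scores(quality_string, offset=33):
    """Decode one quality character per base into a Phred score (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # A made-up single read, only to show the record layout.
    example = io.StringIO("@read1\nACGT\n+\nII5!\n")
    for name, seq, qual in read_fastq(example):
        for base, q in zip(seq, phred_scores(qual)):
            print(name, base, q, error_probability(q))
```

A higher score means the sequencer trusts the base more: a score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.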
If you don't care about your own DNA and are more interested in the life around you, you can check NCBI for all sorts of genomic data. If you are interested in understanding how raw sequencing data work, EBI is a good place to look. Also, every paper generating or using sequencing data should mention where in the public databases you can find them.
Finally, let me warn you: some genomic analyses are quite computationally heavy (not all, though). If you don't have a specific question in mind, it would be wise to start with bacterial sequencing.
Hope you will have fun with DNA data!
