Difference between genome assembly and genome sequence alignment to a reference to find structural variants

Question

I'm trying to determine the differences between, and the relative benefits of, genome assembly and genome sequence alignment when trying to identify structural variants or transposons in populations.
I've been scouring the internet but have only really come across the difference between short vs long reads and de novo vs reference-based assembly.
My understanding is that to identify structural variation within a population there seem to be 2 main comparative genomic methods, the first being what the 1KGP and SGDP did: sequence the whole genome, align the reads to the reference genome, and end up with a BAM file.
The second is to assemble personal genomes and then compare or align the assemblies to each other and to the reference genome, for example using Lastz/LiftOver/ChainNets. Example: 10.1016/j.gene.2005.09.031
Thanks in advance.

Chris_Rands · Answer

the first being what the 1KGP and SGDP did: sequence the whole
genome, align the reads to the reference genome, and end up with a BAM
file.

If you have a well-defined reference genome (e.g. human, mouse) and you are interested in population-level genetic variation, then this is the main approach. If you sequence a new human genome in the classical way (i.e. short reads at ~30X coverage), de novo assembly is generally pointless, and you can map the reads to the reference much more rapidly. The read-mapping approach has the large advantage that when you get to the variant-calling stage, you can use information about both the depth of sequencing and the base quality scores.
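To make the point about depth and base quality concrete, here is a minimal toy sketch in Python. It is not how any real variant caller works (those use probabilistic models); the function names, the pileup representation as `(base, phred_quality)` pairs, and all thresholds are assumptions chosen purely for illustration.

```python
from collections import Counter


def summarize_site(pileup, min_base_quality=20):
    """Summarize one reference position.

    pileup is a list of (base, phred_quality) pairs, one per read covering
    the site. Bases below the quality threshold are ignored.
    Returns (usable_depth, Counter of usable bases).
    """
    kept = [base for base, qual in pileup if qual >= min_base_quality]
    return len(kept), Counter(kept)


def call_site(pileup, ref_base, min_depth=8, min_alt_fraction=0.2,
              min_base_quality=20):
    """Very naive call: 'no-call', 'ref', or 'alt:<base>'.

    All thresholds are arbitrary illustrative values (assumptions).
    """
    depth, counts = summarize_site(pileup, min_base_quality)
    if depth < min_depth:
        return "no-call"  # too few trustworthy reads to say anything
    alt_counts = {b: n for b, n in counts.items() if b != ref_base}
    if not alt_counts:
        return "ref"
    alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
    if alt_n / depth >= min_alt_fraction:
        return f"alt:{alt_base}"
    return "ref"


# Well-supported heterozygous-looking site: 6 ref reads, 5 good alt reads.
good_site = [("A", 35)] * 6 + [("G", 34)] * 5 + [("G", 8)]
# Alt bases seen only at low quality: likely sequencing errors.
noisy_site = [("A", 35)] * 10 + [("G", 5)] * 3
# Too shallow to call.
shallow_site = [("G", 35)] * 3

print(call_site(good_site, "A"))     # alt:G
print(call_site(noisy_site, "A"))    # ref
print(call_site(shallow_site, "A"))  # no-call
```

The noisy and shallow sites are exactly the cases where per-read evidence helps: once reads are collapsed into an assembly, that evidence is no longer available.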

The second is to assemble personal genomes and then compare or align
the assemblies to each other and to the reference genome, for example using
Lastz/LiftOver/ChainNets.

This is the traditional comparative genomics approach used for non-model organisms and for conducting evolutionary rate comparisons at the cross-species level. You discard information on sequencing depth and quality scores, but you can compare many species' genomes at once, genome-wide. You also need to worry less about sequencing errors, because you are not looking for relatively rare population-level variation but for the more common (depending on your species) variation between species.
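As a rough illustration of the assembly-comparison side, the sketch below scans a gapped pairwise alignment of two already-assembled sequences and reports insertions and deletions relative to the first one. The function name, the `-` gap convention, and the toy sequences are assumptions for illustration; real pipelines work from whole-genome aligner output rather than plain strings.

```python
def indels_from_alignment(ref_aln, qry_aln, min_length=1):
    """Report indels in a gapped pairwise alignment, relative to ref.

    ref_aln and qry_aln are equal-length strings using '-' for gaps.
    Returns a list of (kind, ref_position, length), where ref_position is
    the number of reference bases consumed before the event starts
    (i.e. a 0-based reference coordinate).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    events = []
    ref_pos = 0
    current = None  # [kind, start, length] of the event being extended

    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q != "-":
            kind = "insertion"  # extra sequence in the query
        elif q == "-" and r != "-":
            kind = "deletion"   # reference sequence missing from the query
        else:
            kind = None         # aligned column (match or mismatch)

        # Close the running event when the column type changes.
        if current is not None and kind != current[0]:
            events.append(tuple(current))
            current = None
        if kind is not None:
            if current is None:
                current = [kind, ref_pos, 0]
            current[2] += 1
        if r != "-":
            ref_pos += 1

    if current is not None:
        events.append(tuple(current))
    return [e for e in events if e[2] >= min_length]


ref = "ACGT--ACGTACGT"
qry = "ACGTTTAC----GT"
print(indels_from_alignment(ref, qry))
# [('insertion', 4, 2), ('deletion', 6, 4)]
print(indels_from_alignment(ref, qry, min_length=3))
# [('deletion', 6, 4)]
```

Note that nothing here knows how many reads supported each base or how good they were; the only input is the finished sequences, which is the trade-off described above.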
