TransWikia.com

What are some ways to error correct Oxford Nanopore long read sequencing?

Bioinformatics Asked on August 3, 2021

I am sequencing long-read genomic sequences to assemble MHC region haplotypes in a non-model organism, using the Cas9 Sequencing Kit (SQK-CS9109) with Flongle adapters in the MinION from Oxford Nanopore Technologies (ONT). However, the error rate of the resulting ONT genomic reads has been over 14%.

I know that error correction can be done using Illumina sequencing, and I was hoping someone could explain how this is accomplished.

Additionally, are there bioinformatic tools for correcting these errors? Has anyone managed to reduce an error rate this high using bioinformatic tools?

One Answer

How are you evaluating sequencing error rate? My most recent re-calls of 2017 sequences are demonstrating median single-read accuracy over 96%.
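One common way to put a number on per-read error rate, assuming the reads have been aligned to some trusted sequence and the aligner reports a CIGAR string and an `NM` (edit distance) tag, is to divide the edit distance by the number of alignment columns. The sketch below is only an illustration of that calculation; the function name `alignment_error_rate` is hypothetical and is not part of any tool mentioned here.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_error_rate(cigar: str, edit_distance: int) -> float:
    """Return edit distance divided by the number of alignment columns.

    Alignment columns are counted from the M, =, X, I and D operations;
    clipped bases (S, H), skipped regions (N) and padding (P) are ignored.
    """
    columns = sum(
        int(length)
        for length, op in CIGAR_OP.findall(cigar)
        if op in "M=XID"
    )
    if columns == 0:
        raise ValueError("CIGAR string contains no aligned columns")
    return edit_distance / columns


# 90 aligned bases + 5 inserted + 5 deleted = 100 columns, 14 edits.
print(alignment_error_rate("10S90M5I5D", 14))
```

Different definitions (for example, counting each gap once rather than each gapped base) give different numbers for the same reads, so it is worth checking which one is behind a figure like 14%.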

Before considering Illumina, it would be worth doing an initial correction using nanopore-only reads, so that you get the best possible results out of the nanopore signal itself. First, make sure that the reads have been re-basecalled using the latest high-accuracy basecaller (currently guppy v5.0.11, with the "sup" models):

guppy_basecaller -c dna_r9.4.1_450bps_sup.cfg ... # for 9.4.1 reads
guppy_basecaller -c dna_r10.3_450bps_sup.cfg ... # for 10.3 reads

If you don't have any reference sequences available, reads can be corrected using the other reads by using canu:

canu -correct -p nonModel -d nonModel genomeSize=10m -nanopore reads.fastq

If you do have reference sequences available, then reads can be corrected using megalodon. I don't have any experience with megalodon, but have seen a couple of papers demonstrating that it has good accuracy, especially for methylation calling.

After all that, if you're still not happy with the accuracy, you can use Pilon for correction using the Illumina reads. Pilon corrects a reference sequence rather than individual reads, so you'll need to create your haplotype assembly first before using Pilon. Even when using Pilon, I'd still recommend only using the Illumina reads for local INDEL correction. Pileups of nanopore reads are very accurate for substitutions, especially when considering [lack of] strand bias when evaluating variants.

Update:

Given that you're interested in determining haplotypes across the MHC, which is something that performs poorly for short reads even for well-annotated genomes, I think you're better off sticking with reference-free correction, and categorising haplotypes while taking this into account. If you just need to be able to distinguish haplotypes, I recommend doing homopolymer compression on the resultant sequences, as homopolymer inaccuracies are the most common error mode for nanopore reads. Even if that doesn't get you all the way to perfect haplotypes, it might get close enough that something like CD-HIT can be used for binning similar sequences.
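Homopolymer compression simply collapses every run of identical bases to a single base, so that two reads which differ only in homopolymer length become identical. A minimal Python sketch follows; the function name `homopolymer_compress` is made up for illustration, and the input is assumed to be a plain sequence string with no FASTA header or line breaks.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical bases to a single base.

    The sequence is upper-cased first so that soft-masked (lower-case)
    bases are treated the same as their upper-case equivalents.
    """
    return "".join(base for base, _run in groupby(seq.upper()))


# Two reads that differ only in the lengths of a T run and an A run.
read_a = "ACGTTTTAAC"
read_b = "ACGTTTAAAC"
print(homopolymer_compress(read_a))
print(homopolymer_compress(read_b))
```

Both reads compress to the same string, which is the point: homopolymer-length errors no longer separate sequences that come from the same haplotype. The trade-off is that real differences in homopolymer length between haplotypes are lost as well.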

HLA typing in humans can be done from nanopore reads (e.g. see here and here), but don't expect perfect base-level accuracy. What I'm suggesting is something similar to a typing process, but without an established reference, i.e. binning reads based on their similarity to each other.
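To make "binning reads based on their similarity to each other" a bit more concrete, here is a toy Python sketch of greedy binning: sequences are processed longest first, and each one joins the first bin whose representative it resembles, or starts a new bin. This is not CD-HIT's actual algorithm or interface; the k-mer Jaccard similarity, the function names, and the default `k` and `threshold` values are all assumptions chosen for illustration.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_bins(seqs, k=5, threshold=0.8):
    """Group sequence indices into bins by k-mer similarity.

    Sequences are visited longest first. Each one is compared with the
    representative (first member) of every existing bin and joins the
    first bin that meets the threshold; otherwise it founds a new bin.
    """
    bins = []  # list of (representative k-mer set, member indices)
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    for i in order:
        ks = kmers(seqs[i], k)
        for rep, members in bins:
            if jaccard(ks, rep) >= threshold:
                members.append(i)
                break
        else:
            bins.append((ks, [i]))
    return [members for _rep, members in bins]


reads = [
    "ACGTACGGTCAGTCAATGC",
    "ACGTACGGTCAGTCAATGC",
    "TTGACCATGGACTTGCAAGT",
]
print(greedy_bins(reads))
```

In practice the input would be the corrected, homopolymer-compressed sequences, and the threshold would need tuning against how different the haplotypes are expected to be.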

Correct answer by gringer on August 3, 2021
