TransWikia.com

Finding simple sequence from reads with significant overlap

Bioinformatics Asked by Ryan Ward on March 13, 2021

I wrote a "script" that iteratively pulls reads out of a huge FASTQ file by finding homology to the previous sequence.

It should be relatively easy to overlap them and assemble the gene; however, traditional assemblers like SPAdes fail because the coverage is so low.

I am hesitant to feed SPAdes faked reads just to boost coverage when I know I already have all the data needed to overlap them into a single sequence.

The reads overlap like this:

GATCAAACATC
     AACATCAGTTAG
         TCAGTTAG
              TAGAGGATAGC
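If the reads come out of the extraction in layout order, as the iterative, homology-to-previous-sequence approach suggests, they can be merged one at a time: skip any read already contained in the growing contig, otherwise append whatever extends past its longest suffix-prefix overlap. Below is a minimal Python sketch under those assumptions (exact matches, reads in order). The function names and the 3-base minimum overlap are hypothetical choices, not part of the original post.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right` (0 if below min_len)."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_in_order(reads: list[str], min_overlap: int = 3) -> str:
    """Extend a contig read by read, assuming the reads are already in layout order."""
    contig = reads[0]
    for read in reads[1:]:
        if read in contig:
            continue  # contained read (e.g. TCAGTTAG) adds no new sequence
        k = suffix_prefix_overlap(contig, read, min_overlap)
        if k == 0:
            raise ValueError(f"no overlap of >= {min_overlap} bases for read {read!r}")
        contig += read[k:]  # append only the part that extends past the overlap
    return contig


reads = ["GATCAAACATC", "AACATCAGTTAG", "TCAGTTAG", "TAGAGGATAGC"]
print(merge_in_order(reads))  # GATCAAACATCAGTTAGAGGATAGC
```

With real reads, sequencing errors break exact matching and a 3-base overlap will often match by chance, so a practical version would need a longer minimum overlap and some tolerance for mismatches.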

What kind of assembler is out there that I can feed this kind of terribly formatted data?

One Answer

Miniasm will probably work for this; it's designed to be an ultra-fast, greedy assembler, and it has been used by people doing real-time assemblies from MinION data. Because of its greedy nature, it can misassemble regions where the overlapping fragments are ambiguous, so it's worth confirming the assembly with an independent method as well.
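To illustrate what "greedy" means here, this is a textbook greedy overlap merge: repeatedly join the two sequences with the longest suffix-prefix overlap until nothing overlaps. It is a minimal Python sketch for exact-match toy reads, not miniasm's actual algorithm (miniasm works from approximate all-vs-all read overlaps); `overlap`, `greedy_assemble` and the `min_overlap` threshold are hypothetical names.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Greedy merge: repeatedly join the pair with the longest exact overlap."""
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    # Reads contained in another read add no sequence, so discard them up front.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(contigs) > 1:
        best_k, best_pair = 0, None
        for i, j in permutations(range(len(contigs)), 2):
            k = overlap(contigs[i], contigs[j], min_overlap)
            if k > best_k:  # ties keep the first pair found: the greedy weak spot
                best_k, best_pair = k, (i, j)
        if best_pair is None:
            break  # nothing left overlaps by >= min_overlap; report separate contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs


# The reads from the question, deliberately shuffled: greedy merging needs no order.
shuffled = ["TAGAGGATAGC", "GATCAAACATC", "TCAGTTAG", "AACATCAGTTAG"]
print(greedy_assemble(shuffled))  # ['GATCAAACATCAGTTAGAGGATAGC']
```

The tie-breaking line is where ambiguity bites: when two candidate joins have equally long overlaps (for example across a repeat), the sketch simply takes the first one it found, which is the kind of misassembly the answer warns about.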

By default, it requires overlap regions to be covered by at least 3 reads when joining (option -c), but I assume this could be reduced to 2 reads, as in your example (where the minimum coverage across the overlapping regions is 2).
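To make the coverage arithmetic concrete, here is a small Python sketch that stacks the question's reads at the offsets implied by the indentation of the layout (0, 5, 9 and 14) and counts how many reads cover each base. It only illustrates the depth numbers; it does not reproduce how miniasm applies -c internally, and `coverage_profile` is a hypothetical helper.

```python
def coverage_profile(placed: list[tuple[int, str]]) -> list[int]:
    """Per-base read depth for reads placed at known 0-based offsets."""
    length = max(offset + len(seq) for offset, seq in placed)
    cov = [0] * length
    for offset, seq in placed:
        for pos in range(offset, offset + len(seq)):
            cov[pos] += 1  # this read covers base `pos`
    return cov


# Offsets taken from the indentation in the question's layout.
placed = [(0, "GATCAAACATC"), (5, "AACATCAGTTAG"), (9, "TCAGTTAG"), (14, "TAGAGGATAGC")]
cov = coverage_profile(placed)
print(cov)
print(min(c for c in cov if c > 1))  # lowest depth inside overlapped stretches: 2
```

The profile drops to 1 at both ends, which lie outside any overlap, and to 2 in the overlapped stretches, which is where the minimum overlapping coverage of 2 comes from.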

Correct answer by gringer on March 13, 2021
