Bioinformatics Asked on June 21, 2021
As far as I understand, in most assembly programs the scaffolding step uses paired-end information to get from contigs (contiguous sequences) to scaffolds (longer sequences that may contain N-filled gaps).
My assembly software of choice, MEGAHIT, uses paired-end information to build the contigs, but it does not output standard scaffolds. So I am wondering about the following related things:
- Is it meaningful to run a scaffolding program on the output of MEGAHIT? I imagine there might be some instances in which paired-end information could span a gap.
- Which software would you recommend for it? (I've tried SOAPdenovo2 and SSPACE, but they appear not to be actively maintained, so I have the issue of 'it doesn't work and I can't do anything about it'.)
- Could relevant information (e.g. two contigs being connected by paired-end information) be recovered by alternative and perhaps more user-friendly means, such as exploring the assembly graph with Bandage?
Thank you for your time!
Update 2: It looks like your approach has actually been suggested here as one way to use the PE information. I guess MEGAHIT may not really be using the PE information anyway. I still think it's kind of weird, but other people do suggest it, so maybe it's worth trying what Torsten suggests.
I think that scaffolding with PE information from the same reads that were used as input to MEGAHIT is somewhat sketchy. You could still try it, but I'd be worried about artifacts, just because de novo assemblers are so heuristic.
However, I think that it is perfectly ok to scaffold using orthogonal data. Here are some examples of orthogonal data:
The tools used in each case would be somewhat different. For a little more information about how you might use this, here is a recent review of the technologies.
Full disclosure: I work for a company that sells Hi-C kits for such applications.
Update: I realized that I missed one part of the question. I think that visually exploring the assembly using e.g. Bandage is always a good idea. Quite possibly you can make some scaffolding decisions that way, but to me it sounds somewhat painful to do, especially in a metagenome, where there are going to be a lot of multiply-branching collapsed regions.
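If you want a quick, non-visual check of whether paired-end reads connect contigs at all, one option is to map the reads back to the MEGAHIT contigs with any aligner that writes SAM and count the pairs whose mates land on different contigs. Below is a minimal sketch of that tally, not a scaffolder. The function name `count_contig_links` and the `min_mapq` threshold are my own assumptions for illustration, and only the MAPQ of the first mate is checked.

```python
from collections import Counter


def count_contig_links(sam_lines, min_mapq=20):
    """Count read pairs whose two mates map to different contigs.

    sam_lines: any iterable of SAM-formatted text lines (e.g. an open file).
    Returns a Counter keyed by a sorted (contig_a, contig_b) tuple.
    """
    links = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname, mapq, rnext = fields[2], int(fields[4]), fields[6]
        # Keep only paired reads where both mates are mapped, and drop
        # secondary (0x100) and supplementary (0x800) alignments.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        # Use only the first mate (0x40) so each pair is counted once.
        if not flag & 0x40:
            continue
        if mapq < min_mapq:
            continue
        # RNEXT is "=" when the mate is on the same contig, "*" if unknown.
        if rnext in ("=", "*") or rnext == rname:
            continue
        links[tuple(sorted((rname, rnext)))] += 1
    return links
```

You would call it on an open SAM file and look at `links.most_common()`. Contig pairs supported by many read pairs are candidates to inspect more closely in Bandage, while pairs supported by only one or two reads are more likely to be mapping noise or repeats, particularly in a metagenome.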
Answered by Maximilian Press on June 21, 2021
You can use SOAPdenovo-Fusion to scaffold contigs produced by MEGAHIT as suggested by one of the developers: https://github.com/aquaskyline/SOAPdenovo2
Answered by Robvh on June 21, 2021