Genome scaffolding

Question

I have assembled a virus genome using Ray, resulting in approximately 5000 contigs. Now I want to scaffold those contigs to get the full genome (I am aware that there might be some gaps in the genome).
I found one tool called Medusa to do this. Unfortunately, this tool cannot take a file above 50 MB, and my contig file is about 156 MB.
Are there any other tools that can perform reference-based scaffolding?
It would be great if anyone knows about genome scaffolding packages in Bioconda.

Colin Davenport · Answer

What you mean is assembly-based scaffolding, as opposed to scaffolding with reads that carry long-distance information, such as mate-pair or long jumping distance libraries.

You could reduce your contig file to just the longest contigs, since they carry the most information. 156 MB of data is probably overkill for a virus.
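A minimal sketch of that filtering step, using only the Python standard library. The function names, the `top_n` cutoff and the toy input are assumptions for illustration; how many contigs to keep depends on your data.

```python
import io


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contigs(records, top_n):
    """Return the top_n longest (header, sequence) records, longest first."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)[:top_n]


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` bases."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Toy in-memory input. For a real assembly, pass an open file handle
# instead, e.g. open("contigs.fasta") (hypothetical file name).
demo = io.StringIO(">contig_1\nACGTAC\n>contig_2\nA\n>contig_3\nACGTACGTAC\n")
kept = longest_contigs(read_fasta(demo), top_n=2)
out = io.StringIO()
write_fasta(kept, out)
print(out.getvalue(), end="")
```

Note that sorting holds all records in memory, which should be fine for a 156 MB file on most machines.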

There are other programs out there.

https://github.com/ksahlin/BESST

https://github.com/institut-de-genomique/MaGuS

Your mileage will vary.

Kamil S Jaron · Answer

The number of contigs and the total assembly size suggest that there was probably more in the sequencing run than a single virus strain. Does the total assembly length correspond to the expected genome size of the reference strain? Have you tried to BLAST some of your contigs? Could you have contamination or something similar?
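A quick way to run that sanity check is to compute a few summary numbers from the contig lengths. This is a rough sketch: the function name is made up for illustration, the lengths below are toy values rather than real data, and `expected_genome_size` is a placeholder you would replace with the size of your reference strain.

```python
def assembly_stats(lengths):
    """Return contig count, total length, longest contig and N50."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        # N50: length of the contig at which the running sum of the
        # longest contigs first reaches half of the total length.
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_length": total,
        "longest": ordered[0] if ordered else 0,
        "n50": n50,
    }


# Toy contig lengths and a placeholder reference size (both assumptions).
toy_lengths = [10, 8, 5, 3, 2]
expected_genome_size = 14

stats = assembly_stats(toy_lengths)
ratio = stats["total_length"] / expected_genome_size
print(stats)
print(f"assembly is {ratio:.1f}x the expected genome size")
```

A ratio far above 1 would point to several strains, host sequence or contamination in the assembly.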

Maybe you can just order the contigs against the reference using something like Artemis, or use QUAST, which compares your assembly to the reference and shows nice stats (just to be sure that your assembly makes sense so far). Or you could use MUMmer directly (it is the alignment tool running on the backend of Medusa and QUAST).

This does not directly answer your question of how to scaffold, but I think you should first figure out whether your contigs make sense before you scaffold them. Also, ordering your contigs against a reference in Artemis is practically the same thing as scaffolding, since you won't know the sequence in between in either case.
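To illustrate why ordering by reference amounts to scaffolding, here is a small sketch that takes one alignment hit per contig (for example, extracted by hand from an aligner's coordinate output), sorts the contigs by reference position, flips the reverse-strand ones and joins them with runs of N. The `Hit` type, the function names and the fixed `gap_size` are assumptions for illustration, not the method of any of the tools mentioned here; the real gap lengths are unknown.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    contig: str     # contig name
    ref_start: int  # leftmost aligned position on the reference
    strand: str     # "+" or "-"


COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def order_and_join(contigs, hits, gap_size=100):
    """Join contigs in reference order, separated by gap_size Ns.

    contigs: dict mapping contig name to sequence.
    hits: one best Hit per placed contig; contigs without a hit are skipped.
    """
    pieces = []
    for hit in sorted(hits, key=lambda h: h.ref_start):
        seq = contigs[hit.contig]
        pieces.append(reverse_complement(seq) if hit.strand == "-" else seq)
    return ("N" * gap_size).join(pieces)


# Toy example: c3 has no hit on the reference and is left out.
toy_contigs = {"c1": "AAAC", "c2": "GGT", "c3": "TTTT"}
toy_hits = [Hit("c2", 500, "-"), Hit("c1", 10, "+")]
scaffold = order_and_join(toy_contigs, toy_hits, gap_size=3)
print(scaffold)
```

Contigs that overlap on the reference, or that align in several places, would need extra handling that this sketch leaves out.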

L R Joshi · Answer

I found that the Viral NGS Pipeline from the Broad Institute can actually do this. There is a Python script called order_and_orient which serves this purpose. Here is the link: viral ngs assembly
