Genome scaffolding

Bioinformatics Asked on December 22, 2020

I have assembled a virus genome using Ray resulting in approximately 5000 contigs. Now I want to build a scaffold of those contigs to get the full genome (I am aware of the fact that there might be some gaps in the genome).

I found one tool called Medusa to do this. Unfortunately, this tool cannot take a file above 50 MB and my contig file is about 156 MB.

Are there any other tools that can perform reference-based scaffolding?

It would be great if anyone knows about genome scaffolding packages in Bioconda.

3 Answers

What you mean is assembly-based scaffolding, as opposed to scaffolding with reads that carry long-distance information, such as mate-pair or long-jumping-distance libraries.

You could reduce your contig file to just the longest contigs, since they carry the most information. 156 MB of data is probably overkill for your virus.
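As a rough illustration of that filtering step, here is a minimal sketch in plain Python. The function names, the `min_length` cutoff of 1000 bp and the optional `top_n` limit are assumptions made for the example, not values recommended anywhere in this thread; pick thresholds that suit your data.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_contigs(
    records: Iterable[Tuple[str, str]],
    min_length: int = 1000,          # hypothetical cutoff, adjust to your data
    top_n: Optional[int] = None,     # optionally keep only the N longest
) -> List[Tuple[str, str]]:
    """Keep contigs of at least min_length, sorted longest first."""
    kept = [(h, s) for h, s in records if len(s) >= min_length]
    kept.sort(key=lambda rec: len(rec[1]), reverse=True)
    return kept if top_n is None else kept[:top_n]


def write_fasta(records: Iterable[Tuple[str, str]], path: str) -> None:
    """Write records to path, one sequence per line."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n{seq}\n")
```

Usage would look something like `with open("contigs.fasta") as fh: write_fasta(longest_contigs(read_fasta(fh)), "contigs.filtered.fasta")`, where both file names are placeholders.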

There are other programs out there.

Your mileage will vary.

Answered by Colin Davenport on December 22, 2020

The number of contigs and the total assembly size suggest that there was probably more in the sequencing run than a single virus strain. Does the total assembly length correspond to the expected genome size of the reference strain? Have you tried to BLAST some of your contigs? Could there be contamination or something similar?
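To make the size comparison concrete, here is a small sketch that summarises a list of contig lengths. The function name `assembly_stats` and the returned keys are made up for the example, and the expected genome size has to come from your own reference.

```python
from typing import Dict, Iterable


def assembly_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Summarise contig lengths: count, total length, longest contig, N50."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    # N50: length of the contig at which the running sum of the
    # longest-first contigs reaches half of the total assembly length.
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_length": total,
        "longest": ordered[0] if ordered else 0,
        "n50": n50,
    }
```

If `total_length` is many times larger than the reference genome, that supports the suspicion that the assembly contains more than one organism.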

Maybe you can just sort the contigs against the reference using something like Artemis. QUAST compares your assembly to a reference and shows nice stats (just to be sure that your assembly makes sense so far). Or you could use MUMmer directly (which is the alignment tool running on the backend of Medusa and QUAST).

This does not directly answer your question "How to scaffold"; however, I think you should first figure out whether your contigs make sense before you scaffold them. Also, sorting your contigs against a reference with Artemis is practically the same thing as scaffolding, since in either case you won't know the sequence in between.
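The idea of sorting contigs by their position on the reference can be sketched as follows. This is only an illustration: the `Hit` tuple is a hypothetical, simplified alignment record (it is not the output format of MUMmer or any other tool), and the rule of keeping the longest hit per contig is an assumption made for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical, simplified alignment of a contig to the reference."""
    contig: str
    ref_start: int
    ref_end: int
    strand: str  # "+" or "-"


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def sort_contigs_by_reference(
    contigs: Dict[str, str], hits: List[Hit]
) -> List[Tuple[str, str]]:
    """Order contigs by reference start and flip minus-strand contigs.

    Contigs without any hit are left out of the result.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        span = hit.ref_end - hit.ref_start
        current = best.get(hit.contig)
        # Assumption: the longest alignment decides a contig's placement.
        if current is None or span > current.ref_end - current.ref_start:
            best[hit.contig] = hit
    ordered = sorted(best.values(), key=lambda h: h.ref_start)
    return [
        (h.contig,
         contigs[h.contig] if h.strand == "+" else reverse_complement(contigs[h.contig]))
        for h in ordered
    ]
```

The gaps between neighbouring contigs stay unknown, which is the point made above: the ordering alone tells you nothing about the sequence in between.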

Answered by Kamil S Jaron on December 22, 2020

I found that the Viral NGS Pipeline from the Broad Institute can actually do this. There is a Python script called order_and_orient which serves this purpose. Here is the link: viral ngs assembly

Answered by L R Joshi on December 22, 2020
