
How to annotate optimally a fungal genome without RNA-seq evidence?

Bioinformatics: asked on January 30, 2021

Genome information:

  1. Genome size ~50 Mb
  2. 2300+ contigs
  3. No pre-trained Augustus parameters exist for this species
  4. There are several well-annotated RefSeq genomes from this genus.
  5. Several RNA-seq datasets exist for this genus, but none for this species.

My current strategy:

EST evidence:

  1. I selected RNA-seq datasets covering as many species of the genus as
    possible and assembled them de novo with Trinity.
  2. Only the longest transcripts were kept.
  3. All transcript files were concatenated into a single Total_transcript.fasta file, and redundancy was reduced with cd-hit-est using the options -c 0.8 -n 5 -s 0.8.
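A minimal Python sketch of steps 2 and 3 before clustering, assuming Trinity's usual header convention (e.g. ">TRINITY_DN1000_c115_g5_i1 len=247 ...", where everything before the final "_i<N>" names the gene). The sample names and the sample-name prefix are hypothetical; the prefix only keeps identical Trinity IDs from different assemblies from colliding once the files are concatenated.

```python
import re
from typing import Dict, Iterator, List, Tuple

def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def longest_per_gene(text: str) -> Dict[str, Tuple[str, str]]:
    """Map Trinity gene ID -> (transcript ID, sequence) of its longest isoform.

    Assumes Trinity-style IDs ending in "_i<N>" (an assumption, not checked).
    """
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(text):
        tid = header.split()[0]
        gene = re.sub(r"_i\d+$", "", tid)  # strip the isoform suffix
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (tid, seq)
    return best

def pool_assemblies(assemblies: Dict[str, str]) -> str:
    """Concatenate per-sample longest isoforms into one FASTA string.

    IDs are prefixed with the (hypothetical) sample name so that the same
    Trinity ID coming from two different assemblies stays unique.
    """
    out: List[str] = []
    for sample, text in assemblies.items():
        for tid, seq in longest_per_gene(text).values():
            out.append(f">{sample}_{tid}\n{seq}")
    return "\n".join(out) + "\n"
```

The pooled text would then go through cd-hit-est as in step 3.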

Protein evidence:

  1. All fungal proteins from SwissProt were downloaded and reduced with cd-hit using the options -c 0.8 -n 5 -s 0.8.

  2. Protein sets from the RefSeq genomes of this genus were downloaded, concatenated, and reduced the same way.
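To make the -c and -s options above concrete, here is a toy Python version of the greedy scheme CD-HIT is built on: sequences are processed longest first, and each one either joins the first representative it matches or founds a new cluster. This is an illustration, not CD-HIT itself: the real tool uses short-word filtering (-n) and alignment, whereas this sketch approximates identity with difflib, dividing matched residues by the length of the shorter sequence (the denominator CD-HIT uses for its default global identity).

```python
from difflib import SequenceMatcher
from typing import Dict, List

def identity(a: str, b: str) -> float:
    """Crude identity proxy: matched characters / length of the shorter seq."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))

def greedy_cluster(seqs: Dict[str, str], c: float = 0.8,
                   s: float = 0.8) -> Dict[str, List[str]]:
    """Return {representative ID: [member IDs]}; longer sequences seed clusters.

    c: identity threshold (like -c).
    s: length-difference cutoff (like -s): a sequence may only join a
       representative if it is at least s times the representative's length.
    """
    clusters: Dict[str, List[str]] = {}
    # Longest first, so every representative is the longest in its cluster.
    for sid in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        seq = seqs[sid]
        for rep in clusters:
            rep_seq = seqs[rep]
            if len(seq) < s * len(rep_seq):
                continue  # too short relative to this representative
            if identity(seq, rep_seq) >= c:
                clusters[rep].append(sid)  # redundant: join this cluster
                break
        else:
            clusters[sid] = [sid]  # no match: becomes a new representative
    return clusters
```

Only the representatives (the dictionary keys) would be written out as the reduced evidence set.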

I plan to annotate the genome with Maker2 using the evidence above.
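For the Maker run itself, evidence files are set in the maker_opts.ctl template that maker -CTL generates. The Python helper below is a hypothetical sketch that fills in key=value lines while keeping their trailing comments; the option names used (genome, altest, protein, protein2genome) are as I recall them from that template, so check them against the generated file. Because the transcripts come from other species of the genus, altest= (EST/cDNA from an alternate organism) may suit them better than est=. All file names are placeholders.

```python
import re
from typing import Dict, List

# Hypothetical first-pass settings; file names are placeholders.
FIRST_PASS_OPTIONS = {
    "genome": "genome.fasta",
    "altest": "Total_transcript.cdhit.fasta",  # transcripts from related species
    "protein": "all_proteins.cdhit.fasta",     # pooled, cd-hit-reduced proteins
    "protein2genome": "1",                     # build models from protein hits
}

def set_ctl_options(ctl_text: str, options: Dict[str, str]) -> str:
    """Set key=value pairs in a Maker control file, keeping trailing comments.

    Raises KeyError if a requested key is absent from the template, so a
    misremembered option name fails loudly instead of being ignored.
    """
    seen = set()
    out: List[str] = []
    for line in ctl_text.splitlines():
        m = re.match(r"^([A-Za-z0-9_]+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in options:
            key, comment = m.group(1), m.group(3) or ""
            sep = " " if comment else ""
            line = f"{key}={options[key]}{sep}{comment}"
            seen.add(key)
        out.append(line)
    missing = set(options) - seen
    if missing:
        raise KeyError(f"options not found in control file: {sorted(missing)}")
    return "\n".join(out) + "\n"
```

Raising on unknown keys is deliberate: a silently ignored option would leave Maker running without that evidence.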

My concern with this strategy: Maker2 calls Augustus for ab initio gene prediction, but there is no pre-trained Augustus model for this species. How should I deal with this?

Any other suggestions would be appreciated. Thanks!

One Answer

The Maker documentation does include some instructions for training ab initio gene predictors, but it assumes an abundant EST database is available. (Assumptions about what kinds of sequences will be available for a draft genome assembly have changed drastically in the last decade.)

It may be worth exploring an iterative approach in any case. You can do a first-pass annotation with Maker using the evidence you described, along with ab initio predictions from Augustus (and I'd recommend SNAP as well). Maker scores each gene model with the annotation edit distance (AED), which measures how well the model agrees with the evidence (0.0 is best, 1.0 is worst). From this first pass, if you can identify a few hundred reliable gene models with good AED scores (or perhaps as many as one or two thousand), that should be plenty of training data. You could then use these gene models to train a new Augustus model, following either Maker's instructions or Augustus's own. (In my experience, it was quite a bit of work either way.)
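A simplified Python sketch of the AED step. aed() applies the definition AED = 1 - (SN + SP)/2 on base-level coverage, where SN is the fraction of evidence bases covered by the model and SP the fraction of model bases supported by evidence; low_aed_mrnas() pulls candidate training models out of Maker's GFF3 output via the _AED attribute Maker writes on mRNA features. The 0.25 cutoff is an arbitrary placeholder, not a recommendation from the answer, and Maker's own AED is computed over its evidence alignments, so values from aed() are only illustrative.

```python
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (GFF3 convention)

def covered(intervals: Iterable[Interval]) -> Set[int]:
    """Set of genomic positions covered by a list of intervals."""
    return {p for start, end in intervals for p in range(start, end + 1)}

def aed(model: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """Base-level AED = 1 - (SN + SP) / 2 (simplified: ignores strand/frame)."""
    m, e = covered(model), covered(evidence)
    if not m or not e:
        return 1.0
    overlap = len(m & e)
    sn = overlap / len(e)  # share of evidence explained by the model
    sp = overlap / len(m)  # share of the model supported by evidence
    return 1 - (sn + sp) / 2

def low_aed_mrnas(gff3_lines: Iterable[str],
                  max_aed: float = 0.25) -> List[Tuple[str, float]]:
    """(ID, AED) of mRNA features with _AED <= max_aed, best first."""
    hits = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequences follow; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "_AED" in attrs and float(attrs["_AED"]) <= max_aed:
            hits.append((attrs.get("ID", "?"), float(attrs["_AED"])))
    return sorted(hits, key=lambda hit: hit[1])
```

Sorting by AED makes it easy to take the best few hundred models, in line with the training-set size suggested above.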

Answered by Daniel Standage on January 30, 2021
