TransWikia.com

Samtools Index: Chromosome Blocks not Continuous

Bioinformatics Asked on October 17, 2020

I am working with short-read whole-genome sequences from the NCBI’s SRA. I have aligned and sorted all of my short-read sequences and am attempting to index each sequence into .bai format using samtools index, but am running into a couple of errors.

I unpacked the original .sra files in the following manner:

fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files SRR6509138.sra

I then aligned each fastq paired-end read, removed duplicate reads, and converted each to bam format like so:

bwa mem xiphophorus_birchmanni_10x_12Sep2018_yDAA6.fasta SRR6509136_1.fastq SRR6509136_2.fastq | samblaster -e -r| samtools view -Sb - > blasted_SRR6509136.bam

Some of my files did not come as paired-end reads. For those files, I used the –ignoreUnmated flag on samblaster.

I then sorted each bam file like so:

samtools sort blasted_SRR6649368.bam -o sorted_SRR6649368.bam -n

I am attempting to obtain a bai file for each bam using the following command:

samtools index sorted_SRR6649368.bam sorted_SRR6649368.bam.bai

This is the error I run into for the unpaired reads:

[E::hts_idx_push] Chromosome blocks not continuous
[E::sam_index] Read 'HWI-ST387:164:D0CJWACXX:4:1101:1129:13264' with ref_name='ScdB1pO_646;HRSCAF=880', ref_length=33092979, flags=16, pos=23260108 cannot be indexed
samtools index: failed to create index for "sorted_SRR791885.bam": No such file or directory

This is the error I run into for the paired reads:

[E::hts_idx_push] Unsorted positions on sequence #79: 11159020 followed by 11158717
samtools index: failed to create index for "sorted_SRR6649368.bam"

Can anyone help me figure out why this is happening?

One Answer

samtools sort blasted_SRR6649368.bam -o sorted_SRR6649368.bam -n

These error messages indicate that the reads are not sorted by coordinate — in particular, that the reads mapped to ScdB1pO_646;HRSCAF=880 are not all together, and that the reads at positions 11159020 < 11158717 are not sorted by position on their chromosome.

This is because samtools sort -n has been used to sort the reads by name instead. Remove -n to sort by position, which is what is needed to prepare a BAM file for indexing with samtools index.

Correct answer by John Marshall on October 17, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP