hisat2 --rna-strandness option and downstream htseq-count analysis

Question

I have some questions about the hisat2 --rna-strandness option and how its output affects downstream analysis.

As I understand it, the --rna-strandness option makes hisat2 add an XS tag indicating which strand (+ or -) a transcript comes from, for use in downstream transcriptome assembly. I have a paired-end stranded sequencing library that was aligned to the genome using hisat2 without specifying --rna-strandness (in other words, with the default, unstranded). The reads were then assigned to genes using htseq-count, this time with "-s reverse" specified because the assay is strand-specific.

Would aligning with the hisat2 default (unstranded) and then counting with htseq-count -s reverse affect the alignment and counting results for a strand-specific assay? Since --rna-strandness serves transcriptome assembly through the XS tags it generates, and htseq-count does not use XS tags for counting, I presume there should be no practical impact. Could you shed light on this, in case I am overlooking other aspects of how these tools are used?

To help verify this, I re-aligned and re-counted the reads from 2 samples with --rna-strandness RF switched on in hisat2. The alignment rates and the htseq-count special counters are given below for assessment.

Overall alignment rate of Sample 1: 94.52% (--rna-strandness RF) vs. 94.12% (--rna-strandness unstranded)
Overall alignment rate of Sample 2: 94.57% (--rna-strandness RF) vs. 94.15% (--rna-strandness unstranded)

Feature counts of Sample 1 (following --rna-strandness RF + -s reverse):
__no_feature               6327294
__ambiguous                2954776
__too_low_aQual            3784481
__not_aligned               688856
__alignment_not_unique     4858182

Feature counts of Sample 1 (following --rna-strandness unstranded + -s reverse):
__no_feature               6291151
__ambiguous                2911298
__too_low_aQual            4075017
__not_aligned               754400
__alignment_not_unique    16136045

Feature counts of Sample 2 (following --rna-strandness RF + -s reverse):
__no_feature               5417882
__ambiguous                1708510
__too_low_aQual            3532352
__not_aligned               564596
__alignment_not_unique     2859501

Feature counts of Sample 2 (following --rna-strandness unstranded + -s reverse):
__no_feature               5359434
__ambiguous                1676091
__too_low_aQual            3813344
__not_aligned               623122
__alignment_not_unique     2891792

These results look comparable to me across pipelines.
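
One way to put a number on "comparable" is to compute the percent change of each special counter between the two runs. Below is a minimal Python sketch that does this with the figures quoted above. It assumes the four blocks are, in order, Sample 1 RF, Sample 1 default, Sample 2 RF and Sample 2 default; the dictionary layout and the function name `relative_change` are only illustrative.

```python
# Special counters copied from the htseq-count output quoted in this post.
# "rf"      = hisat2 run with --rna-strandness RF
# "default" = hisat2 run without --rna-strandness (unstranded)
COUNTS = {
    "Sample 1": {
        "rf": {
            "__no_feature": 6327294,
            "__ambiguous": 2954776,
            "__too_low_aQual": 3784481,
            "__not_aligned": 688856,
            "__alignment_not_unique": 4858182,
        },
        "default": {
            "__no_feature": 6291151,
            "__ambiguous": 2911298,
            "__too_low_aQual": 4075017,
            "__not_aligned": 754400,
            "__alignment_not_unique": 16136045,
        },
    },
    "Sample 2": {
        "rf": {
            "__no_feature": 5417882,
            "__ambiguous": 1708510,
            "__too_low_aQual": 3532352,
            "__not_aligned": 564596,
            "__alignment_not_unique": 2859501,
        },
        "default": {
            "__no_feature": 5359434,
            "__ambiguous": 1676091,
            "__too_low_aQual": 3813344,
            "__not_aligned": 623122,
            "__alignment_not_unique": 2891792,
        },
    },
}


def relative_change(rf, default):
    """Percent change of each counter going from the default run to the RF run."""
    return {name: 100.0 * (rf[name] - default[name]) / default[name] for name in rf}


for sample, runs in COUNTS.items():
    print(sample)
    for name, pct in relative_change(runs["rf"], runs["default"]).items():
        print(f"  {name:<24}{pct:+9.2f}%")
```

Any counter that shifts far more than the others would stand out in this listing and is worth rechecking against the raw htseq-count output.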

Thanks
Guan

swbarnes2 · Answer

If you reran the command with the correct settings, just leave it at that. (It is not at all clear to me that strandedness RF is correct.)
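
A rough way to check the strandedness is to take reads that overlap a gene of known strand and see which protocol their SAM FLAGs agree with. Here is a minimal Python sketch of that idea, assuming the usual convention that under RF read 1 aligns antisense to the transcript and read 2 aligns sense. The function names and the toy records are made up for illustration; they are not taken from your data.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification.
FLAG_REVERSE = 0x10  # read is aligned to the reverse strand
FLAG_READ1 = 0x40    # read is the first in its pair
FLAG_READ2 = 0x80    # read is the second in its pair


def implied_strand_rf(flag):
    """Transcript strand implied by one read if the library is RF (reverse)."""
    read_strand = "-" if flag & FLAG_REVERSE else "+"
    if flag & FLAG_READ1:
        # Read 1 is antisense to the transcript under RF.
        return "+" if read_strand == "-" else "-"
    if flag & FLAG_READ2:
        # Read 2 is sense to the transcript under RF.
        return read_strand
    raise ValueError("FLAG does not mark the read as first or second in pair")


def tally(records):
    """Count reads agreeing with RF vs. FR.

    records: iterable of (flag, gene_strand) for reads overlapping a gene
    whose strand is known from the annotation.
    """
    votes = Counter()
    for flag, gene_strand in records:
        votes["RF" if implied_strand_rf(flag) == gene_strand else "FR"] += 1
    return votes


# Hypothetical toy input: (FLAG, strand of the overlapping gene).
toy = [(83, "+"), (163, "+"), (99, "-"), (147, "-"), (99, "+")]
print(tally(toy))
```

If the large majority of reads vote for one label, that is the setting to use; a split close to even would suggest an unstranded library.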

If you want people to tell you whether you ran the commands correctly, you need to post the commands you used.
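
To show the level of detail that helps, here is a sketch of what the two pipelines described in the question might look like, written as a small Python script that only builds and prints the command lines. The index name, FASTQ names, GTF name and output names are hypothetical placeholders, and any other options you actually used (threads, sorting, feature type and so on) are unknown and left out.

```python
import shlex

# Hypothetical placeholders: none of these names come from the question.
INDEX = "genome_index"
READS_1 = "sample_R1.fastq.gz"
READS_2 = "sample_R2.fastq.gz"
GTF = "annotation.gtf"


def hisat2_cmd(sam_out, strandness=None):
    """hisat2 paired-end alignment; strandness=None keeps the unstranded default."""
    cmd = ["hisat2", "-x", INDEX, "-1", READS_1, "-2", READS_2, "-S", sam_out]
    if strandness is not None:
        cmd += ["--rna-strandness", strandness]
    return cmd


def htseq_cmd(sam_in):
    """htseq-count on a SAM file with reverse-stranded counting (-s reverse)."""
    return ["htseq-count", "-f", "sam", "-s", "reverse", sam_in, GTF]


# Pipeline A: default (unstranded) alignment, then -s reverse counting.
# Pipeline B: --rna-strandness RF alignment, then -s reverse counting.
for label, sam, strandness in [("A", "default.sam", None), ("B", "rf.sam", "RF")]:
    print(f"# pipeline {label}")
    print(shlex.join(hisat2_cmd(sam, strandness)))
    print(shlex.join(htseq_cmd(sam)) + f" > counts_{label}.txt")
```

Posting the real equivalents of these lines, together with the library prep kit, is usually enough for someone to say whether RF and -s reverse are the right choices.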
