TransWikia.com

How does Picard's MarkDuplicates handle unmapped reads?

Bioinformatics Asked by init_js on August 22, 2021

Our BAM files are created according to a "lossless" alignment procedure [1] from the Broad Institute GATK documentation, which involves re-adding the unaligned/unmapped reads into an aligned BAM using Picard’s MergeBamAlignment.

The resulting BAM files contain both the mapped and the unmapped reads. These files are then sorted with SortSam [2], so that the sort order in the header becomes:

@HD VN:1.6 SO:coordinate
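For context, the SAM specification defines coordinate order as RNAME (in @SQ order) followed by POS, with every record whose RNAME is `*` placed after all records that have a reference. An unmapped read whose mate is mapped usually carries the mate's RNAME and POS, so it sorts next to the mate. A minimal Python sketch of that ordering, assuming made-up reference names and records, with a hypothetical helper `coordinate_key`:

```python
# Reference order as it would appear in the @SQ header lines (made-up names).
REFERENCE_ORDER = ["chr1", "chr2"]


def coordinate_key(record):
    """Approximate SO:coordinate: RNAME by @SQ order, then POS; RNAME '*' sorts last."""
    qname, flag, rname, pos = record
    if rname == "*":
        return (len(REFERENCE_ORDER), 0)
    return (REFERENCE_ORDER.index(rname), pos)


# (QNAME, FLAG, RNAME, POS) tuples; FLAG bit 0x4 means "segment unmapped".
records = [
    ("r3", 77, "*", 0),        # unmapped, mate also unmapped: no position at all
    ("r2", 99, "chr2", 500),   # mapped
    ("r1", 73, "chr1", 100),   # mapped, mate unmapped
    ("r1", 133, "chr1", 100),  # unmapped mate, placed at r1's position
]

sorted_records = sorted(records, key=coordinate_key)
for rec in sorted_records:
    print(rec)
```

In this sketch the fully unplaced read ends up at the very end, while the unmapped mate of `r1` stays beside it. This only illustrates the ordering; it says nothing about what MarkDuplicates does with either kind of read.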

How does MarkDuplicates handle the unmapped reads of a BAM file containing both unmapped and mapped reads?

Note that MarkDuplicates normally seems to take the BAM’s ordering into account: it accepts arguments such as --ASSUME_SORT_ORDER X. However, it’s not specified whether reads without a position are ignored or have to be compared with all other possible reads.


Disclaimer: I initially posted this question on the GATK forum [3], but I’m reaching out here in the hope of a broader audience.

Citations:

One Answer

From the Picard documentation:

DUPLICATION METRICS: Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords.

UNMAPPED_READS The total number of unmapped reads examined. (Primary, non-supplemental)

MarkDuplicates won't alter the flags on these reads, but it will count them in the summary report it generates. You should be able to test this yourself with a small set of mapped + unmapped reads.
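One way to run that test is to dump the MarkDuplicates output to SAM text (for example with `samtools view -h`) and check whether any record has both the unmapped bit (0x4) and the duplicate bit (0x400) set in its FLAG field. A minimal sketch in Python, assuming the SAM text is already available as a list of lines; the function name `summarize_unmapped` and the example records are made up for illustration:

```python
UNMAPPED = 0x4     # SAM FLAG bit: segment unmapped
DUPLICATE = 0x400  # SAM FLAG bit: PCR or optical duplicate


def summarize_unmapped(sam_lines):
    """Return (unmapped_count, unmapped_and_duplicate_count) for SAM text lines."""
    unmapped = 0
    unmapped_dup = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & UNMAPPED:
            unmapped += 1
            if flag & DUPLICATE:
                unmapped_dup += 1
    return unmapped, unmapped_dup


# Made-up records standing in for real MarkDuplicates output.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF",
    "r2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF",  # mapped, duplicate bit set
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",               # unmapped
    "r3\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",              # unmapped
]

result = summarize_unmapped(example)
print(result)
```

On real output, a second number of 0 would be consistent with the statement that unmapped reads are never flagged as duplicates, and the first number can be compared against the UNMAPPED_READS column of the metrics file.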

Answered by James Hawley on August 22, 2021
