TransWikia.com

Add tags to the read pairs in a bam file

Bioinformatics Asked by SPPearce on May 27, 2021

I have a bam file from paired-end sequencing, in which each R1 has the RX tag assigned (via UMItools group) denoting a Unique Molecular Identifier (UMI).

I want to call molecular consensus reads using fgbio but fgbio GroupReadsByUmi gives me an error, because it expects both R1 and R2 to have the RX tag set.

I would like a tool that will go through my bam file and assign each R2 the same RX tag as the R1.

One Answer

The best thing would be to figure out how to do this in UMItools. I don't know that tool at all, but it seems that it is supposed to be able to deal with / tag both reads in paired data. e.g. did you use --paired flag, etc.

I would suggest investigating that route first before you look at the next part of my answer.


Per Devon Ryan's comment suggestion I would suggest a workflow similar to this:

  • sort bam file by query name (samtools sort -n) to get the R1s immediately following the R2s
  • Write a short script/command that
    1. reads through the bam file and for each line
    2. looks up the tag in the R1, prints R1 line
    3. reads the R2, substitutes the RX tag in, prints R2 line
  • pipe that through samtools view and possibly samtools sort back to a bam file of the same type that you started with.

My pysam is a little rusty but something like this might work

import pysam

bam = pysam.AlignmentFile("bam", "rb")
# print header
print(bam.header)

tag = None
for read in bam:
    r2 = False
    if tag:
        read.assign_umi_tag() # assign to read- not sure where/how
        r2 = True
    else:
        tag = read.lookup_umi_tag()  # i don't know where this tag is in bam format
    print(read)
    if r2:
        tag = None

I haven't tested that in any way, caveat emptor.

I am not personally familiar with UMItools and how it assigns tags so can't suggest anything more specific, sorry.

Answered by Maximilian Press on May 27, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP