Bioinformatics Asked by user438383 on April 25, 2021
I have merged two different .bam files in order to simulate sample contamination, so each read can come from one of two samples, as shown by the read group info:
@RG ID:0 PL:ILLUMINA SM:LP4100018-DNA_C11_Proband PU:HGY3WDSXX:1:none
@RG ID:1 PL:ILLUMINA SM:LP4100018-DNA_C11_Proband PU:HGY3WDSXX:2:none
@RG ID:2 PL:ILLUMINA SM:LP4100018-DNA_C11_Proband PU:HGY3WDSXX:3:none
@RG ID:3 PL:ILLUMINA SM:LP4100018-DNA_C11_Proband PU:HGY3WDSXX:4:none
@RG ID:0-11EFC00B PL:ILLUMINA SM:LP4100018-DNA_E11_Proband PU:HGY3WDSXX:1:none
@RG ID:1-B8A1099 PL:ILLUMINA SM:LP4100018-DNA_E11_Proband PU:HGY3WDSXX:2:none
@RG ID:2-330086F PL:ILLUMINA SM:LP4100018-DNA_E11_Proband PU:HGY3WDSXX:3:none
@RG ID:3-7681F092 PL:ILLUMINA SM:LP4100018-DNA_E11_Proband PU:HGY3WDSXX:4:none
I’d like to check that the correct proportion of reads originates from each sample.
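Because each sample is spread over four read groups, the per-sample proportion needs the @RG ID to SM mapping from the header. Below is a minimal sketch using awk, assuming the header fields are tab-separated as the SAM format requires (the @RG lines above appear space-separated, probably from copy-paste). The function name `rg_sample_fractions` is hypothetical, and reads whose RG ID is missing from the header are reported under `unknown`.

```shell
# Hypothetical helper: per-sample read counts and fractions.
# Reads SAM text *including the header* on stdin, e.g.
#   samtools view -h example.bam | rg_sample_fractions
rg_sample_fractions() {
  awk -F '\t' '
    # Header: remember which sample (SM) each read group (ID) belongs to.
    $1 == "@RG" {
      id = ""; sm = ""
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "ID:") id = substr($i, 4)
        else if (substr($i, 1, 3) == "SM:") sm = substr($i, 4)
      }
      sample[id] = sm
      next
    }
    /^@/ { next }  # skip all other header lines
    # Alignment records: find the RG:Z: tag among the optional fields (12+).
    {
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "RG:Z:") {
          rg = substr($i, 6)
          s = (rg in sample) ? sample[rg] : "unknown"
          n[s]++
          total++
          break
        }
      }
    }
    # Print sample, read count and fraction of all RG-tagged reads.
    END {
      for (s in n) printf "%s\t%d\t%.4f\n", s, n[s], n[s] / total
    }
  ' | sort
}
```

The output has one line per sample (name, count, fraction), so the two proportions can be compared directly against the mixing ratio used for the simulation.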
Currently I am using:
samtools view example.bam | rev | cut -f 1 | rev > output.txt
This works, but it is not very elegant, and it only works because the RG tag happens to be the last field of each record in the .bam.
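One way to remove the dependence on column order is to search the optional fields (column 12 onward) for the `RG:Z:` tag and tally the IDs in a single pass. This is a sketch under that assumption; the function name `count_rg` is hypothetical, and reads without an RG tag are simply skipped.

```shell
# Hypothetical helper: count reads per read group from SAM text on stdin.
# Scans the optional fields (column 12 onward) for the RG:Z: tag, so it does
# not matter where RG appears in the record.
#   samtools view example.bam | count_rg
count_rg() {
  awk -F '\t' '
    /^@/ { next }  # ignore header lines if samtools view -h was used
    {
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "RG:Z:") {
          n[substr($i, 6)]++
          break
        }
      }
    }
    END { for (id in n) printf "ID:%s %d\n", id, n[id] }
  ' | sort
}
```

This prints lines in the `ID:0 1000` style requested below, reading the BAM only once.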
Is there a quick way to tabulate the number of reads in each read group ID? For example, to produce output like:
ID:0 1000
ID:1 2000
ID:2 3000
...
A solution in samtools would be ideal, along the lines of the output produced by samtools stats.
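A samtools-only variant can take the IDs from the header (`samtools view -H`) and count each read group with `samtools view -c -r ID`, where `-r` restricts output to one read group and `-c` prints a count instead of records. This is a sketch; the function name `rg_counts_samtools` is hypothetical.

```shell
# Hypothetical helper: per-read-group counts using only samtools.
# IDs come from the @RG header lines; `samtools view -c -r ID` counts the
# records in that read group.
#   rg_counts_samtools example.bam
rg_counts_samtools() {
  bam=$1
  samtools view -H "$bam" \
    | awk -F '\t' '$1 == "@RG" {
        for (i = 2; i <= NF; i++)
          if (substr($i, 1, 3) == "ID:") print substr($i, 4)
      }' \
    | while read -r id; do
        printf 'ID:%s %s\n' "$id" "$(samtools view -c -r "$id" "$bam")"
      done
}
```

The trade-off is that every read group triggers a separate pass over the BAM (eight passes here), whereas a single awk pass over `samtools view` output reads it once. Note also that `-c` counts all records, including secondary and supplementary alignments; adding `-F 0x900` to the counting command would count each read only once.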