Sequence alignment using BWT

Question

My Problem:
Skipping some specific background, what I want to do is judging whether some soft-clipping sequences are the same, which may result by the same SV event. Colored bases in Fig.1 is an example of soft-clipping sequences.

I use BWA -MEM as the aligner. And I am trying to develop an SV caller actually, what I want to do here is refining PE(Paired-Ends) signal. If it is SR(Split-Reads), I don't need to do this, because of SR and its supplemental alignments coming from the same read. But PE signal generated from some cluster procedures need some filters. And I think K-mer is not a good way to do this which adapted by some SV callers. For me, as a bioinformatic fool, imitation and implementation may be a good learning process. It seems still a long way to go.
My solution:
I suppose this is a sequence alignment problem. Then I use FM-indexing based on BWT to do this. My inexact/approximate matching implementation is referred from BWA.
My workflow:

Extract these soft-clipping sequences
Choose the longest one as the reference
Iterate others to match with the reference, if the number of unmatched is large than 1, it means these sequences are not the same.

Question 1:
Is that sounds reasonable to you?  Any suggestions?

Question 2:
If it is OK, how should I implement the penalty scoring scheme?
Here, because these sequences should come from the same SV event, the main differences(Mismatch, InDel) should arise from sequencing errors, not small genomic variants. So I want to set a higher gap open penalty and even higher gap extend penalty, like
mismatch_penalty = 4
gapopen_penalty = 11
gapextend_penalty = 20
alignment penalty cutoff = 30

The later means that one alignment with two more gaps will be considered as spurious alignment.
How do I optimize these parameters?

Liam McIntyre · Answer

You just need to check the flag for suplementary mapping. If there is suplementary mapping then use the coordinates; if they are all the same the seq is the same and you just have one event, if different reads have different suplementary mapping locations then there's multiple events. The problem is more complex than this though so just use an existing tool.

Answered by Liam McIntyre on August 27, 2021

Sequence alignment using BWT

One Answer

Add your own answers!

Ask a Question