Why do I get cytosine to guanine/adenine transitions in bisulphite treated sequences?

Question

I got my sequencing results (bisulphite treated and non treated sequences of same species Allium cepa) and now I have to do analysis in Cymate online tool. I prepared all sequences as it is written in original Cymate paper (master sequence and clones - aligned in ClustalW, blunt ended).
My problem is that I got gaps in results (image is attached, it is from Cymate Result) on positions 27, 50, 58 etc.

When I looked back at my original alignment FASTA file and compared those positions, I noticed that those gaps are from C to A, or C to G transitions (and obviously Cymate outputs only C to C, and C to T cases).
Does anyone know why do I get those C to A, or C to G, are those some mismatches, should I manually remove it from alignment, or did I made some other mistake?
Any advice from you would be very useful for me!
My FASTA file alignment, which I uploaded to Cymate, is here (if anyone wants to check). First sequence is master sequence (non-treated, orifinal sequence) and the rest are clones (bisulphite treated sequences).

Consensus_cepa__(289_bp)
GAAAGCCAACCACCACATCCGCCATCCCTCACAGTATGCCAACGAGCAGCTGAATGACTCCGAACAACGCAAAGCATGACGCCCAAACCGACGAACACCCAACCGAAAGGCCACAAGCGCAACGTGCATTCCAAAACCCCATGAACCACCGAATCCTGCAATGCACACCACGCTCGACGCCATTCGCTACAGACCACATCGACACGAGCGCCAAGCCATCCATCACCCGAAGCCATCCACCCACTCACAAACATACCACGAAGCACAGCAAATATGAAGACAAACCCTC
MV_A_SP6_izrezani_s_primerima__(295_bp)
AAAAATCAATCTCCACATCCGCCATCACTCACAGTATATTTAC-AGTAGATAAATAA-TTAAAATAACGCAAAACATAACGCCCAAACAGACGTACTCTCAACCTAATAACCTCGAACGCAACTTACATTCAAAAACTCGATAATTCACGAAATTCTGCAATTCACACCAAA--TATTGCATTTCGCTACGTTCTTCATCGACACGAAAACCAAAATATCCATTGCCAAAAATCATTCAGACGCTCACTGAAATAACACAAAACACATCAAA-ATAATAACAAACTTTC
MV_B_SP6_izrezani_s_primerima__(295_bp)
AAAAATCAATCTCCACATCCGCCATCACTCACAGTATATTTAC-AGTAGATAAATAA-TTAAAATGACGCAAAACATAACGCCCAAACAGACGTACTCTCAACCTAATGACCTTGAACGCAACTTACATTCAAAAACTCGATGATTCACGGAATTCTGCAATTCACACCAAA--TATCACATTTCGCTACGTTCTTCATCAACACGAAAACCAAAATATCCATTACCAGAAATCATTCAGACGCTCACTGAAATAACACAAAACACATCAAA-ATAATAACAAACTTTC
MV_C_SP6_izrezani_s_primerima__(291_bp)
AAAAATCAATCTCCACATCCGCCATCACTCACAGTATATTTAC-AGTAAATAAATAA-TTAAAATAACGCAAAACATGACACCCAAACAGACATACTCTCAACCTAATAACCTCGAACGCAACTTACATTCAAAAACTCGATAATTCACGGAATTCTGCAATTCACACCAAA--TATCGCATTTCGCTACGTTCTTCATCGACACGAGAACCAAAATATCCATTGCCAGAAATCATTCAGACGCTCACTGAAATAACACAAAACACATCAAA-ATAATAACAAACTTTC

Maximilian Press · Answer

I quickly aligned your sequences, and as far as I can tell, the answer is that your "consensus" sequence is a poor match for the 3 other sequences:
CLUSTAL format alignment by MAFFT L-INS-i (v7.310)

Consensus_cepa_ gaaagccaaccaccacatccgccatccctcacagtatgccaacgagcagctgaatgactc
MV_A_SP6_izreza aaaaatcaatctccacatccgccatcactcacagtatatttac-agtagataaataa-tt
MV_C_SP6_izreza aaaaatcaatctccacatccgccatcactcacagtatatttac-agtaaataaataa-tt
MV_B_SP6_izreza aaaaatcaatctccacatccgccatcactcacagtatatttac-agtagataaataa-tt
                .***..***.* ************** **********... ** **.*. *.***.* *.

Consensus_cepa_ cgaacaacgcaaagcatgacgcccaaaccgacgaacacccaaccgaaaggccacaagcgc
MV_A_SP6_izreza aaaataacgcaaaacataacgcccaaacagacgtactctcaacctaataacctcgaacgc
MV_C_SP6_izreza aaaataacgcaaaacatgacacccaaacagacatactctcaacctaataacctcgaacgc
MV_B_SP6_izreza aaaatgacgcaaaacataacgcccaaacagacgtactctcaacctaatgaccttgaacgc
                 .**..*******.***.**.******* ***. ** *.***** ** ..** ..*.***

Consensus_cepa_ aacgtgcattccaaaaccccatgaaccaccgaatcctgcaatgcacaccacgctcgacgc
MV_A_SP6_izreza aacttacattcaaaaactcgataattcacgaaattctgcaattcacaccaaatattgcat
MV_C_SP6_izreza aacttacattcaaaaactcgataattcacggaattctgcaattcacaccaaatatcgcat
MV_B_SP6_izreza aacttacattcaaaaactcgatgattcacggaattctgcaattcacaccaaatatcacat
                *** *.***** *****.* **.* .*** .***.******* ******* .. . .*..

Consensus_cepa_ cattcgctacagaccacatcgacacgagcgccaagccatccatcacccgaagccatccac
MV_A_SP6_izreza --ttcgctacgttcttcatcgacacgaaaaccaaaatatccattgccaaaaatcattcag
MV_C_SP6_izreza --ttcgctacgttcttcatcgacacgagaaccaaaatatccattgccagaaatcattcag
MV_B_SP6_izreza --ttcgctacgttcttcatcaacacgaaaaccaaaatatccattaccagaaatcattcag
                  ********.  *. ****.******. .****. .******..** .**..***.**

Consensus_cepa_ ccactcacaaacataccacgaagcacagcaaatatgaagacaaaccctc
MV_A_SP6_izreza acgctcactgaaataacacaaaacacatcaaa-ataataacaaactttc
MV_C_SP6_izreza acgctcactgaaataacacaaaacacatcaaa-ataataacaaactttc
MV_B_SP6_izreza acgctcactgaaataacacaaaacacatcaaa-ataataacaaactttc
                 *.***** .* *** ***.**.**** **** **.* .******..**

I would suggest trying to come up with a different, more appropriate consensus/reference sequence. This one does not appear to be appropriate.
Granted, I have no idea how you came up with this consensus sequence (it certainly is not a consensus of the other sequences you show), and possibly there is something else going on that I am not understanding.
This does not seem to have anything to do with the bisulfite conversion, as the data also include changes that cannot be attributable to bisulfite, gaps, etc.
Knowing more about the whole workflow would help.
UPDATE
Also, while I'm not familiar with the workflow, could such gaps not be related to methylation on the opposite strand? E.g. CGTA-->TACG.
Is there a particular methylation context that you are looking at? CpGs only?
In some cases there is support for weird variants (e.g. consensus is not an outgroup).
If it's sanger data, it's always a good idea to directly inspect traces.

Why do I get cytosine to guanine/adenine transitions in bisulphite treated sequences?

One Answer

Add your own answers!

Ask a Question