AnswerBun.com

Finding discriminating positions for a sequence alignment column

Bioinformatics Asked by space_gladiator on March 25, 2021

I don’t have any theoretical background in biology, so forgive me if my question is a bit.. well..dumb. I’m trying to use a Monte Carlo approach to find the discriminating position of a given sequence alignment column. So essentially, given a sequence alignment, I want to find the position at which we can distinguish active and inactive conformations.

So, here is my code for the Monte Carlo method:

def MonteCarlo(S, max_iter, threshold):
    """
    Computes the discriminating position for a sequence column

    Input: S = sequence column, max_iter = max iterations, threshold = threshold of acceptance for boltzmann distribution (between 0 and 1)
    Output: 2 classes
    """
    n = len(S)
    S_copy = S.copy()
    num_class1 = int(0.30*n) #number in class1
    print(num_class1)
    for _ in range(num_class1): #intialisation
        rand_position = random.randint(0,len(S)-1)
        if S_copy[rand_position] == 'X':
            rand_position = random.randint(0,len(S)) #reselect if used position
        S_copy[rand_position] = 'X' 

    #Monte Carlo with Insertions and Deletions

    s_entropy = entropy(S)
    n_iter = 0 #counter for iterations
    class_1 = dict() #class 1 determined by X's
    class_2 = dict() #class 2 is rest of S
    for i in range(len(S)):
        if S_copy[i] == 'X':
            class_1[i] =S[i]
        else:
            class_2[i] =S[i]

    res = score(S, class_1, class_2)

    while n_iter < max_iter:
        r = random.random()

        if r < 0.25: #25% chance to do an insertion
          (c1,c2,pos) = Insertion(S, class_1, class_2)

        elif r < 0.5: #25% chance to do a deletion
          (c1,c2,pos) = Deletion(S, class_1, class_2)

        else: # 50% chance to do a swap
          (c1,c2,pos) = Swap(S, class_1, class_2)

        res2 = score(S, c1, c2)

        #Boltzmann Distribution coefficient 
        delta_S = abs(res2 - res)
        lambda_, N = 0.2, 19
        size = 6000
        ind1 = random.randint(0,size - 1)
        rv = boltzmann.rvs(lambda_, N, size=size)/N 
        beta = rv[ind1] #pick float between 0 and 1 with boltzmann distribution
        if  (res2 > res) or (beta*delta_S > threshold): #if better score i.e. higher entropy gain
            res = res2
            class_1 = c1
            class_2 = c2

        n_iter += 1


    return class_1, class_2, pos

Here I’m using the Boltzmann distribution as a probability measure. All the other functions (swap, insertion, deletion, entropy) calculate pretty much what the name says (if you need more clarification on my side, I’d be happy to provide it!)

Let me explain what my code is trying to do: Given a sequence alignment (as a list) say [‘M’,’M’,’M’,…,’L’,’L’,’L’] I first set a bunch of random positions and mark them out as ‘X’ (the number of Xs are determined by num_class1). Next, in the while loop I do swaps,insertions, and deletions with the goal of obtaining a better score. This meansI’m adding, removing, or swapping positions between class1 and class2. Eventually, my code returns two classes and the discriminating position.

While my code works, I don’t think it’s giving the right result. When I use a base case example of a list with two letters ‘M’ and ‘L with a clear discriminating position, I get random results.

Could someone explain theoretically (or via code) how one can find the discriminating position using a Monte Carlo method as above?

Thanks!

One Answer

Without asking further questions, my suspicion would be how indels (gaps) are counted in your code against the aligment. Its the classic place for alignment mis-position errors. In other words, based on my experience your calculation probably performs fine, but linking the calculated alignment position against the actual observed alignment position is classic area of logical errors and needs careful monitoring of variables using code breakpoints within your loops.

Answered by M__ on March 25, 2021

Add your own answers!

Related Questions

Principle of TMT Tags in Multiplex Proteomics

1  Asked on August 22, 2021 by osaama-shehzad

     

How I change a function not to need an input

1  Asked on August 22, 2021 by exhausted

 

How do I get cromwell to create soft-links?

0  Asked on August 22, 2021 by greg-dougherty

 

How can I restrict the search space when querying GEO?

1  Asked on August 22, 2021 by an-ignorant-wanderer

 

Ask a Question

Get help from others!

© 2023 AnswerBun.com. All rights reserved. Sites we Love: PCI Database, MenuIva, UKBizDB, Menu Kuliner, Sharing RPP, SolveDir