Finding discriminating positions for a sequence alignment column

Question

I don't have any theoretical background in biology, so forgive me if my question is a bit.. well..dumb. I'm trying to use a Monte Carlo approach to find the discriminating position of a given sequence alignment column. So essentially, given a sequence alignment, I want to find the position at which we can distinguish active and inactive conformations. So, here is my code for the Monte Carlo method: def MonteCarlo(S, max_iter, threshold): """ Computes the discriminating position for a sequence column Input: S = sequence column, max_iter = max iterations, threshold = threshold of acceptance for boltzmann distribution (between 0 and 1) Output: 2 classes """ n = len(S) S_copy = S.copy() num_class1 = int(0.30*n) #number in class1 print(num_class1) for _ in range(num_class1): #intialisation rand_position = random.randint(0,len(S)-1) if S_copy[rand_position] == 'X': rand_position = random.randint(0,len(S)) #reselect if used position S_copy[rand_position] = 'X' #Monte Carlo with Insertions and Deletions s_entropy = entropy(S) n_iter = 0 #counter for iterations class_1 = dict() #class 1 determined by X's class_2 = dict() #class 2 is rest of S for i in range(len(S)): if S_copy[i] == 'X': class_1[i] =S[i] else: class_2[i] =S[i] res = score(S, class_1, class_2) while n_iter < max_iter: r = random.random() if r < 0.25: #25% chance to do an insertion (c1,c2,pos) = Insertion(S, class_1, class_2) elif r < 0.5: #25% chance to do a deletion (c1,c2,pos) = Deletion(S, class_1, class_2) else: # 50% chance to do a swap (c1,c2,pos) = Swap(S, class_1, class_2) res2 = score(S, c1, c2) #Boltzmann Distribution coefficient delta_S = abs(res2 - res) lambda_, N = 0.2, 19 size = 6000 ind1 = random.randint(0,size - 1) rv = boltzmann.rvs(lambda_, N, size=size)/N beta = rv[ind1] #pick float between 0 and 1 with boltzmann distribution if (res2 > res) or (beta*delta_S > threshold): #if better score i.e. higher entropy gain res = res2 class_1 = c1 class_2 = c2 n_iter += 1 return class_1, class_2, pos Here I'm using the Boltzmann distribution as a probability measure. All the other functions (swap, insertion, deletion, entropy) calculate pretty much what the name says (if you need more clarification on my side, I'd be happy to provide it!) Let me explain what my code is trying to do: Given a sequence alignment (as a list) say ['M','M','M',...,'L','L','L'] I first set a bunch of random positions and mark them out as 'X' (the number of Xs are determined by num_class1). Next, in the while loop I do swaps,insertions, and deletions with the goal of obtaining a better score. This meansI'm adding, removing, or swapping positions between class1 and class2. Eventually, my code returns two classes and the discriminating position. While my code works, I don't think it's giving the right result. When I use a base case example of a list with two letters 'M' and 'L with a clear discriminating position, I get random results. Could someone explain theoretically (or via code) how one can find the discriminating position using a Monte Carlo method as above? Thanks!

M__ · Answer

Without asking further questions, my suspicion would be how indels (gaps) are counted in your code against the aligment. Its the classic place for alignment mis-position errors. In other words, based on my experience your calculation probably performs fine, but linking the calculated alignment position against the actual observed alignment position is classic area of logical errors and needs careful monitoring of variables using code breakpoints within your loops.

Answered by M__ on March 25, 2021

Finding discriminating positions for a sequence alignment column

One Answer

Add your own answers!

Ask a Question