TransWikia.com

Peak-calling in CLIP: What is the effect of RNA-concentration?

Biology Asked by KaPy3141 on February 6, 2021

I hope it's OK to repost my question from 8 months ago from Bioinformatics Stack Exchange, which is still in beta:
https://bioinformatics.stackexchange.com/questions/10730/peak-calling-in-clip-what-is-the-effect-of-rna-concentration


Question:

I would like to ask how RNA concentration influences the way the various CLIP peak callers evaluate peaks. The peak callers I would like to understand are Piranha, CIMS, PARalyzer and the one used in eCLIP (CLIPper). (After weeks of literature research I still can't find an answer!)


More details:

Obviously, if the concentration of an RNA is too low, it can't be cross-linked to the tested protein and no cross-link peaks will be found, independent of the peak caller and independent of whether there are physicochemical interaction hot-spots (peaks).

But what is the consequence of increasing the RNA concentration from a "sufficient" level to a high or extremely high level?

Which of the peak callers will find more peaks on an RNA the higher its concentration, and which peak callers could be considered robust? (By robust I mean that the score of a peak is independent of the RNA concentration, and that the number of peaks called for a particular RNA-protein interaction depends mainly on the number of "real interaction hot-spots" and does not necessarily increase with concentration.)

Also, let's assume the interaction frequency/density is quite uniform along the RNA (no physical hot-spots). Which peak callers would then almost necessarily start calling (more) peaks the higher the concentration of the RNA in the experiment?


Some considerations:

I know that there are many RNAs that don't interact with any proteins (according to the databases) and that there are some RNAs that interact basically everywhere along their sequence with almost all tested proteins. (This observation could potentially be caused by concentration dependence!)

CIMS and PARalyzer, to my understanding, evaluate peaks using the fitted cross-link-induced mutation rate at specific sites. This ratio (mutated reads / total reads) seems robust to concentration, at least in theory.
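
To make this intuition concrete, here is a minimal toy simulation. It is not the CIMS or PARalyzer algorithm; it only assumes that each read covering a site carries a cross-link-induced mutation independently with a fixed probability. The rate of 0.2, the read depths and the function names are all hypothetical values chosen for illustration.

```python
import random


def simulate_site(total_reads, mutation_rate, rng):
    """Toy model: each read at the site is mutated independently with mutation_rate."""
    mutated = sum(1 for _ in range(total_reads) if rng.random() < mutation_rate)
    return mutated, total_reads


def mutation_ratio(mutated, total):
    """Fraction of reads carrying the mutation (0.0 if the site has no coverage)."""
    return mutated / total if total else 0.0


rng = random.Random(0)   # fixed seed so the run is reproducible
rate = 0.2               # hypothetical cross-link-induced mutation rate

# Same site, increasing coverage (a stand-in for increasing RNA concentration).
ratios = {
    depth: mutation_ratio(*simulate_site(depth, rate, rng))
    for depth in (100, 1000, 10000)
}

for depth, ratio in ratios.items():
    print(f"{depth:>6} reads: mutated/total = {ratio:.3f}")
```

Under these assumptions the expected ratio stays at the underlying rate regardless of depth; only the scatter around it shrinks as coverage grows. A caller that scores the ratio alone would therefore be concentration-independent, but one that also tests the statistical significance of the ratio could still gain power, and so call more sites, at higher coverage.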

Piranha, as I understand it, fits the raw read counts to a sum of many individual binomial functions, treating the bulk of the reads as background noise and evaluating only local peaks in read counts relative to that background. This seems robust to the effects of very high concentrations.
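
The distinction I have in mind can be sketched with a toy comparison. This is not Piranha's model or that of any other tool named here: it simply contrasts a fixed, experiment-wide read-count cutoff with a cutoff defined relative to the RNA's own mean coverage. The read profile, the cutoff of 10 reads and the 2-fold enrichment are hypothetical numbers.

```python
def bins_above_fixed_threshold(counts, threshold):
    """Count bins whose read count reaches a fixed, experiment-wide cutoff."""
    return sum(1 for c in counts if c >= threshold)


def bins_above_local_fold(counts, fold):
    """Count bins enriched at least `fold` times over this RNA's own mean coverage."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0
    return sum(1 for c in counts if c >= fold * mean)


# Hypothetical read profile along one RNA: flat background plus one hot-spot.
profile = [2, 3, 2, 12, 3, 2, 2, 3]

fixed_calls = []
local_calls = []
for scale in (1, 5, 25):  # stand-in for increasing RNA concentration
    scaled = [c * scale for c in profile]
    fixed_calls.append(bins_above_fixed_threshold(scaled, threshold=10))
    local_calls.append(bins_above_local_fold(scaled, fold=2))

print("fixed cutoff:   ", fixed_calls)
print("local fold-test:", local_calls)
```

In this sketch the fixed cutoff ends up calling every bin once the RNA is abundant enough, while the local fold test keeps calling only the single hot-spot, because scaling all counts by the same factor leaves the ratio to the local mean unchanged. Whether a real peak caller behaves more like the first or the second case depends on how its background is estimated, which is exactly what I am asking about.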

I would be thankful for any comments or references!
