# If a gene is expressed at a level of 1/1200 compared to the average gene, how is probability 50:50 that we have a read mapped to it?

Bioinformatics Asked by Bio314 on December 20, 2020

"To calculate the probability that a read will map to a specific gene, we can assume an average gene size of 4000 nt (100 M nt divided by 25,000 genes). At 30 M reads equivalent to 30× coverage, at single read 100 nt (or paired-end read 50 nt) length, we can expect a single read to map to the average expressed and length gene 4000 nt× 30 coverage/100 nt 1200 times. Thus, if the gene is expressed at a level of 1/1200 compared to the average gene, then we have a 50:50 probability to have a read map to it."

I understand that a single read would map to the average expressed 1200 times but I don’t understand how the 50:50 probability is calculated. Please could you explain how this probability is determined.

I actually disagree with that.. I guess I write this down as a discussion.

The expected value for an average gene is 1200. For this gene at 1/1200 expression, you expect 1 read.

However, because of sampling, sometimes you get 0, 1 or 2. If we assume the sampling follows a poisson distribution, we can calculate the probability of getting 0, 1 , 2 ... reads:

plot(dpois(0:10,lambda=1),ylab="Probability",xlab="No of reads")


The probabilities are like:

data.frame(n=0:5,p=dpois(0:5,lambda=1))

n           p
1 0 0.367879441
2 1 0.367879441
3 2 0.183939721
4 3 0.061313240
5 4 0.015328310
6 5 0.003065662


The probability of getting exactly 0 or 1 read should be ~0.37. The probability of getting at least 1 read would be 1 - P(n=0) = 1- 0.367879441 = 0.6321206

Answered by StupidWolf on December 20, 2020

## Related Questions

### Unable to open .bam file in C++ using SeqAn due to ‘seqan::UnknownExtensionError’

3  Asked on August 10, 2021

### How to export summary data from Bowtie2 for MultiQC to read?

1  Asked on August 7, 2021

### What are some ways to error correct Oxford Nanopore long read sequencing?

1  Asked on August 3, 2021

### BioPython – Retrieve sequence records from pubmed database

0  Asked on August 3, 2021

### How to plot a PCOA biplot with OTU loadings as arrows

0  Asked on August 2, 2021 by gal-t

### Nonbonded interactions on the GPU

1  Asked on July 31, 2021

### CIBERSORT runtime error eval failed

1  Asked on July 28, 2021 by dr_hope

### How can I get bedtools to tell me which genes are being expressed?

1  Asked on July 27, 2021

### Receptor-ligand database

3  Asked on July 23, 2021

### Standard Way to Preprocess Gene Expression?

1  Asked on July 23, 2021 by wedgeantilles

### Is there a way to measure cell line similarity using python?

1  Asked on July 22, 2021 by yousef

### How to convert a SDF molecular structure file to a pandas dataframe with Python?

1  Asked on July 21, 2021

### cmapPy exception

1  Asked on July 20, 2021 by blue-bells

2  Asked on July 19, 2021 by c-zeil

### Understanding “Measurement of mRNA abundance using RNA-seq data” by Wagner et al 2012

1  Asked on July 19, 2021

### Understanding ViennaRNA RNAdistance scoring table

1  Asked on July 18, 2021

### Viral genome finishing

1  Asked on July 17, 2021

### Installing HiC-Pro issues

2  Asked on July 16, 2021

### “IndexError: list index out of range” in snakemake

0  Asked on July 15, 2021 by vitaly

### How to remove sequences from a fasta file using a sequence ID list which contains a space within the id?

2  Asked on July 15, 2021