TransWikia.com

Statistics: How are protein species distributed over cell types?

Biology Asked on April 4, 2021

There are roughly 10,000 to 20,000 protein species in the human proteome (although I have also seen estimates of 500,000 to 1,000,000). Furthermore, there are roughly 200 different cell types in the human body. My question is:

How are the protein species distributed over the cell types?

This means specifically:

How many proteins are expressed by n cell types?

How many cell types express n proteins?

These numbers are probably not known exactly, because not every protein that a given cell type expresses may have been identified.

But there might be evidence about the general form of the two distribution curves. Are they Poisson distributions, and do they look more like this?

[Figure: example distribution curve]

Or this?

[Figure: example distribution curve]

Or that?

[Figure: example distribution curve]

Or some other kind of distribution, e.g. Gaussian or even multimodal?

One Answer

From my searches, there is no single resource that includes an atlas of all human proteins produced across all human cell types. However, there are several recent mass spectrometry studies that look at cell-type-resolved proteomes for specific human and mouse tissues that provide some insight into the distribution of proteins across different cells.

A Cell-type-resolved Liver Proteome

This study identified between 6,200 and 8,500 mouse gene products in each of four cell types -- hepatocytes, hepatic stellate cells, Kupffer cells, and liver sinusoidal endothelial cells -- and a total of 10,075 gene products across all four cell types. Figure 1D shows that there is significant overlap between the proteomes: 5,246 proteins (52.1%) are shared by all four cell types, and only 1,451 proteins (14.4%) are exclusive to a single cell type. Besides cell-resident proteins, the authors also report secreted proteins unique to hepatocytes and Kupffer cells. These results largely corroborate an earlier publication, Cell-Type-Resolved Quantitative Proteomics of Murine Liver, where the authors reported that 8,338 of 11,520 proteins (72.4%) were common to the five hepatic cell types tested.

Cell type- and brain region-resolved mouse brain proteome

This study looked at four types of isolated neurons from specific regions of the mouse brain as well as five types of primary cultured neurons. Of 13,061 total proteins, 10,529 (80.6%) are common to the five cultured cell types tested, and only 194 (1.5%) are unique to a single cell type. Figure 2D makes clear that protein abundance across cell types, not just binary presence/absence, is important to cell identity. (More on that below...)

Region and cell-type resolved quantitative proteomic map of the human heart

Getting to the human proteins mentioned in your question, this publication analyzed three cardiac cell types and adipose fibroblasts across 16 anatomical regions, giving both spatial and functional perspectives on protein distribution in the human heart. Figure 5A, like Figure 1D in the mouse liver proteome paper, shows extensive overlap of proteins between cell types: of 11,163 total proteins, 7,965 (71.4%) are common to all cell types (including adipose fibroblasts), and 617 (5.5%) are unique to one type of cell. Interestingly, and perhaps not surprisingly, the subsets of proteins unique to specific cells are enriched for cell surface markers. (Related: The in silico human surfaceome)
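
The "shared by all" and "unique to one cell type" figures quoted for the liver, brain, and heart studies can be reproduced from any set of per-cell-type protein lists. Below is a minimal sketch of that bookkeeping, not the method used in any of the papers. The function name `overlap_summary`, the toy cell-type labels, and the protein IDs `P1` to `P5` are all made up for illustration.

```python
from collections import Counter


def overlap_summary(proteomes):
    """Summarize overlap between proteomes.

    proteomes: dict mapping a cell-type label to a set of protein IDs
    (hypothetical input format, assumed for this sketch).
    """
    n_types = len(proteomes)
    # For every protein, count how many cell types contain it.
    counts = Counter(p for ids in proteomes.values() for p in ids)
    total = len(counts)
    shared = sum(1 for c in counts.values() if c == n_types)
    exclusive = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "shared_by_all": shared,
        "exclusive": exclusive,
        "pct_shared": round(100 * shared / total, 1),
        "pct_exclusive": round(100 * exclusive / total, 1),
    }


# Toy data only; these are not real proteomes.
toy = {
    "hepatocyte": {"P1", "P2", "P3", "P4"},
    "stellate": {"P1", "P2", "P5"},
    "kupffer": {"P1", "P2", "P3"},
}
summary = overlap_summary(toy)
print(summary)
```

The percentages are taken relative to the union of all proteins detected in any cell type, which matches how the totals above (for example 7,965 of 11,163) are expressed.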

Social network architecture of human immune cells unveiled by quantitative proteomics

These authors identified more than 10,000 different proteins across 28 primary human hematopoietic cell populations, with three or four biological replicates per cell type, including 17 distinct types of immune cells. They identified an average of 9,500 proteins per cell type, and, for the immune cells, generated both "steady state" and "activated" proteomes, giving insight into how the proteome changes both across cell types and within each type between states.

To directly answer your question,

How are the protein species distributed over the cell types?

I took the data from Supplementary Table 6, averaged the replicates, converted protein copy-number values to a binary presence/absence matrix, and computed the number of immune cell types (1 to 17) in which each unique protein ID is present. The inset graph combines cell type and activation state labels to ask whether the effect of activation dominates over cell-type differences in proteome divergence.

[Figure: immune cell proteome comparison, binary]

Presence/Absence, no threshold -- Unique protein distribution across 17 primary immune cell types. A protein is considered "present" in a cell type if its average copy-number value is greater than zero. Data from Rieckmann et al. Nat Immunol. 2017.
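
For readers who want to repeat the presence/absence count, here is a rough sketch of the steps described above (average replicates, call a protein present if the average is above zero, count cell types per protein). It is not the original analysis script. The record layout, the function names, and the toy values are assumptions, and the real supplementary table will need its own parsing.

```python
from collections import Counter, defaultdict
from statistics import mean


def cell_types_per_protein(records, threshold=0.0):
    """Count the cell types in which each protein is 'present'.

    records: iterable of (protein_id, cell_type, replicate_copy_numbers),
    with each (protein, cell type) pair appearing once. This layout is a
    hypothetical one chosen for the sketch.
    """
    n_types = defaultdict(int)
    for protein, _cell_type, replicates in records:
        # Average the replicates, then binarize against the threshold.
        if mean(replicates) > threshold:
            n_types[protein] += 1
    return dict(n_types)


def distribution(n_types_by_protein):
    """Map n -> number of proteins present in exactly n cell types."""
    return dict(sorted(Counter(n_types_by_protein.values()).items()))


# Toy copy numbers only; not values from the published dataset.
records = [
    ("A", "T cell", [120.0, 80.0, 100.0]),
    ("A", "B cell", [90.0, 110.0, 95.0]),
    ("A", "monocyte", [10.0, 0.0, 5.0]),
    ("B", "T cell", [0.0, 0.0, 0.0]),
    ("B", "B cell", [300.0, 280.0, 310.0]),
    ("B", "monocyte", [0.0, 0.0, 0.0]),
    ("C", "T cell", [0.0, 0.0, 0.0]),
]
per_protein = cell_types_per_protein(records)
hist = distribution(per_protein)
print(per_protein, hist)
```

Proteins that are never above the threshold (protein `C` in the toy data) drop out entirely, so the histogram starts at n = 1, consistent with the 1 to 17 range used in the plot.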

Because the inferred protein copy-numbers from the mass spectrometry data cover a high dynamic range, I also looked at the distribution of proteins that had average copy-number values at least double the zero-depleted cell-type-specific median copy-number value.

[Figure: median threshold, immune cell proteomes]

Highly abundant proteins -- Unique protein distribution across 17 primary immune cell types. Protein-cell pairs are only counted if the average protein copy-number is at least double the median protein copy-number for that cell type. Data from Rieckmann et al. Nat Immunol. 2017.
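
The abundance filter can be sketched in the same spirit. The code below reads "zero-depleted median" as the median over the non-zero average copy numbers within one cell type, which is an interpretation rather than something confirmed by the original script. The function name `abundant_proteins`, the nested-dict input, and the toy numbers are likewise assumptions.

```python
from statistics import median


def abundant_proteins(avg_copy, factor=2.0):
    """Return, per cell type, the proteins passing the abundance cutoff.

    avg_copy: dict mapping cell type -> dict of protein ID -> average
    copy number (hypothetical layout for this sketch).
    A protein passes if its average is at least `factor` times the
    median of the non-zero averages in that same cell type.
    """
    result = {}
    for cell_type, proteins in avg_copy.items():
        nonzero = [v for v in proteins.values() if v > 0]
        if not nonzero:
            # Nothing detected in this cell type, so nothing can pass.
            result[cell_type] = set()
            continue
        cutoff = factor * median(nonzero)
        result[cell_type] = {p for p, v in proteins.items() if v >= cutoff}
    return result


# Toy averages only; not values from the published dataset.
avg_copy = {
    "T cell": {"A": 100.0, "B": 0.0, "C": 10.0, "D": 20.0},
    "B cell": {"A": 5.0, "B": 50.0, "C": 10.0, "D": 0.0},
}
abundant = abundant_proteins(avg_copy)
print(abundant)
```

Because the cutoff is computed separately for each cell type, a protein can count as highly abundant in one cell type and not in another even when its absolute copy number is similar, which is what lets cell-type-specific patterns show up in the second plot.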

Taken together, these distributions suggest a few things:

  • for this diverse set of immune cells, a majority of proteins are present in all cell types when looking at strict presence/absence data
  • after subsetting for highly abundant proteins, a bimodal distribution appears, suggesting that protein abundance is a better metric for functional comparison of proteomes than a simple binary metric
  • the proteomes of one cell type between different activation states are more similar than the proteomes of different cell types in the same state

For anyone interested, I've made the cleaned-up data available on Dropbox.

A more complete answer might combine the data from the immune cell and heart cell publications to get a sense of proteome concordance across tissues, but, just from a cursory analysis, differences in protein labels between the datasets would make comparison tedious, so I'll leave that to someone else!

Answered by acvill on April 4, 2021
