Specific cell type identification in Single Cell Sequencing

Bioinformatics Asked on September 27, 2021

In order to define which cell is of which type, we need to identify a set of rules. For instance, neurons should express one of the following: Thy1, Rbfox3, MAP2, Camk2b, Gad1, Cck, Reln, and should not express any of the following: CD45, Tmem119, CD11b, … and others. Maybe Rbfox3 should always be expressed in any neuron. As I understand it, these rules need to be compiled manually from the literature for each cell type. I assume that many people have already faced this issue and may have come up with software to do this. Is there any software to which we could supply a matrix of single-cell expression data and have cell types assigned with some degree of certainty?

I know that maybe we can also use heatmaps for that, like:

cell type specific genes – heatmap using rank based approach

But again, this seems like too much manual work. Besides, I do not think we can distinguish different neuronal types based on a heatmap alone.

4 Answers

I do not know of such software.

However, I believe this effort is a bit misdirected. The purpose of single-cell sequencing is to get a better understanding of cells; their heterogeneity and functional diversity or developmental / biological processes such as differentiation, using a higher "resolution" method. In other words, if we had the methods you are asking for, there would be no need for the single-cell experiment.

Running your data through a pipeline that relies on prior knowledge risks forcing ideas onto the data, rather than seeing what the data tell you. It would be better to try to understand and explain the biologically relevant heterogeneity and diversity in your cells, together with a critical comparison of cell type characteristics (gene expression, pathways) with what is described in the literature. This question is also very similar to one you asked previously, which has already been answered.

It is also good to recognize that your approach rests on many implicit assumptions: for example, that cells can be clearly categorized as +/- for the expression of each gene. It also ignores the systems level (networks and pathways), again forcing concepts onto the data.

That said, one method would be to construct a table listing cell type markers as described in the linked answer above, then write a script that determines a cutoff value for gene expression in your data (see this), and then ranks cell types for each cell. For example, you can measure the number or proportion of uniquely expressed marker genes (although that is very simplistic).
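The marker-table approach above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a published method: the marker table, the function name assignCellTypes and the default cutoff of 0 are all hypothetical. Each cell is scored by the fraction of each cell type's markers it expresses above the cutoff and is assigned the best-scoring type, or "Unassigned" if no marker is detected.

```r
# Hypothetical marker table: one row per (cell type, marker gene) pair.
# Ptprc encodes CD45; a real table would be compiled from the literature.
markerTable <- data.frame(
  Cell_type = c("Neuron", "Neuron", "Neuron", "Microglia", "Microglia"),
  gene      = c("Rbfox3", "Thy1", "Map2", "Tmem119", "Ptprc"),
  stringsAsFactors = FALSE
)

# Hypothetical helper. 'expr' is a genes x cells matrix with gene rownames.
assignCellTypes <- function(expr, markers, cutoff = 0) {
  types <- unique(markers$Cell_type)
  scores <- matrix(0, nrow = ncol(expr), ncol = length(types),
                   dimnames = list(colnames(expr), types))
  for (ct in types) {
    genes <- intersect(markers$gene[markers$Cell_type == ct], rownames(expr))
    if (length(genes) == 0) next  # none of this type's markers were measured
    # Fraction of this type's markers expressed above the cutoff in each cell
    scores[, ct] <- colMeans(expr[genes, , drop = FALSE] > cutoff)
  }
  best <- max.col(scores, ties.method = "first")
  top <- scores[cbind(seq_len(nrow(scores)), best)]
  label <- types[best]
  label[top == 0] <- "Unassigned"  # no marker of any type detected
  data.frame(cell = colnames(expr), Cell_type = label, score = top,
             stringsAsFactors = FALSE)
}

# Toy data: 5 genes x 3 cells (made-up counts for illustration only)
expr <- matrix(c(5, 3, 2, 0, 0,
                 0, 0, 0, 4, 6,
                 0, 0, 0, 0, 0),
               nrow = 5,
               dimnames = list(c("Rbfox3", "Thy1", "Map2", "Tmem119", "Ptprc"),
                               c("cell1", "cell2", "cell3")))
res <- assignCellTypes(expr, markerTable)
print(res)
```

Only positive markers are used here; negative markers (genes that should not be expressed) could be handled with a similar score that is subtracted, but that is left out to keep the sketch short.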

Another option, if you are familiar with machine learning, is to train a classifier on an annotated dataset, and then use that on new data.
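As an illustration of the classifier idea, here is a minimal nearest-centroid classifier in base R. It is only one simple choice of classifier, swapped in here for brevity (published tools use more sophisticated models), and the function names trainCentroids and predictCentroids are hypothetical. The annotated dataset is summarised as one mean expression profile per label, and each new cell receives the label of the closest profile by squared Euclidean distance over the genes both datasets share.

```r
# Hypothetical helper: one mean profile (centroid) per annotated label.
# 'train' is a genes x cells matrix; 'labels' has one entry per cell.
trainCentroids <- function(train, labels) {
  sapply(split(seq_along(labels), labels),
         function(idx) rowMeans(train[, idx, drop = FALSE]))
}

# Hypothetical helper: label each new cell with its nearest centroid.
predictCentroids <- function(centroids, newdata) {
  genes <- intersect(rownames(centroids), rownames(newdata))
  cen <- centroids[genes, , drop = FALSE]
  # Squared Euclidean distance from each new cell (column) to every centroid
  d <- apply(newdata[genes, , drop = FALSE], 2,
             function(cell) colSums((cen - cell)^2))
  setNames(colnames(cen)[apply(d, 2, which.min)], colnames(newdata))
}

# Toy annotated data: 2 genes x 4 cells with made-up values
train <- matrix(c(9, 1, 8, 0, 1, 9, 0, 8), nrow = 2,
                dimnames = list(c("geneA", "geneB"), paste0("t", 1:4)))
labels <- c("TypeA", "TypeA", "TypeB", "TypeB")
newdata <- matrix(c(7, 2, 1, 6), nrow = 2,
                  dimnames = list(c("geneA", "geneB"), c("x", "y")))
pred <- predictCentroids(trainCentroids(train, labels), newdata)
print(pred)
```

In practice both datasets would need the same normalisation and gene identifiers, and training is usually restricted to informative genes.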

Also see the convenience function below. It requires a table of marker genes for each cell cluster (i.e. the Seurat::FindAllMarkers() output) and a reference data frame listing genes and their corresponding cell types in its HGNC_symbol and Cell_type columns, and it returns a table listing the clusters, their marker genes and the corresponding cell types.

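getCelltypes <- function(markers, reference) {
  # markers: Seurat::FindAllMarkers() output (has 'cluster' and 'gene' columns)
  # reference: data frame with 'HGNC_symbol' and 'Cell_type' columns
  marker.celltype <- markers
  # Look up each marker gene's cell type in the reference (NA if not listed)
  marker.celltype$Cell_type <- reference$Cell_type[match(marker.celltype$gene, reference$HGNC_symbol)]
  marker.celltype
}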

Many variants of this reference table and function can be created; feel free to modify them.
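As one example of such a variant, the sketch below collapses the marker-level annotation into a single label per cluster by majority vote: each cluster gets the cell type supported by the largest number of its marker genes. The function name clusterCelltypes is hypothetical; it assumes the same inputs as above (a markers table with cluster and gene columns, and a reference with HGNC_symbol and Cell_type columns). Ties go to the cell type that sorts first alphabetically, which is an arbitrary choice.

```r
# Hypothetical variant: one cell type label per cluster by majority vote.
clusterCelltypes <- function(markers, reference) {
  ct <- reference$Cell_type[match(markers$gene, reference$HGNC_symbol)]
  keep <- !is.na(ct)  # ignore marker genes that are not in the reference
  votes <- table(cluster = markers$cluster[keep], Cell_type = ct[keep])
  n <- apply(votes, 1, max)
  label <- colnames(votes)[apply(votes, 1, which.max)]
  label[n == 0] <- NA  # cluster has no marker gene listed in the reference
  data.frame(cluster = rownames(votes), Cell_type = label,
             n_markers = unname(n), stringsAsFactors = FALSE)
}

# Toy inputs shaped like FindAllMarkers() output and the reference table
markers <- data.frame(cluster = factor(c(0, 0, 0, 1, 1)),
                      gene = c("RBFOX3", "SNAP25", "PTPRC", "CX3CR1", "P2RY12"),
                      stringsAsFactors = FALSE)
reference <- data.frame(
  HGNC_symbol = c("RBFOX3", "SNAP25", "PTPRC", "CX3CR1", "P2RY12"),
  Cell_type = c("Neuron", "Neuron", "Immune", "Microglia", "Microglia"),
  stringsAsFactors = FALSE)
res <- clusterCelltypes(markers, reference)
print(res)
```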

In the future, I think we can expect classifier software that uses reference data from the Human Protein Atlas, the Human Cell Atlas (HCA) and similar projects.

Correct answer by Peter on September 27, 2021

SingleR, a novel computational method for unbiased cell type recognition in scRNA-seq data.

Answered by Shrek on September 27, 2021

singleCellNet is a computational method that uses supervised machine learning for quantitative cell type annotation. It also enables cross-platform and cross-species comparisons, and it is actively supported.

Here is the GitHub repository and the vignette.

Answered by yuqi_yuqi on September 27, 2021

We have recently published a web-based tool called CIPR (cluster identity predictor) that helps with cluster annotation. You can select one of 7 reference datasets (some contain neurons) or upload a custom reference gene expression file for predicting cluster identity. You can read the manuscript for more information about how the algorithm works, but in a nutshell, it compares differentially expressed genes or global expression patterns within your data to known reference datasets and creates quick visual outputs. I find it helpful for interactive analyses where I try different clustering parameters and such.
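The general idea of comparing global expression patterns with a reference can be sketched in a few lines of base R. This is a simplified illustration, not CIPR's actual algorithm (see the manuscript for that): the function name annotateClusters is hypothetical, and Spearman correlation of average cluster profiles is just one of several possible similarity measures. Each cluster is assigned the reference cell type whose profile correlates best with the cluster's mean expression over the shared genes.

```r
# Hypothetical helper. 'expr': genes x cells matrix; 'clusters': one entry
# per cell; 'reference': genes x cell-types matrix of reference profiles.
annotateClusters <- function(expr, clusters, reference) {
  # Mean expression profile of each cluster (genes x clusters)
  means <- sapply(split(seq_along(clusters), clusters),
                  function(idx) rowMeans(expr[, idx, drop = FALSE]))
  genes <- intersect(rownames(means), rownames(reference))
  # Spearman correlation of every cluster profile with every reference profile
  r <- cor(means[genes, , drop = FALSE], reference[genes, , drop = FALSE],
           method = "spearman")
  best <- max.col(r, ties.method = "first")
  data.frame(cluster = rownames(r), Cell_type = colnames(r)[best],
             rho = r[cbind(seq_len(nrow(r)), best)], stringsAsFactors = FALSE)
}

# Toy data: 4 genes, 4 cells in two clusters, two reference profiles
expr <- matrix(c(9, 7, 2, 1,  11, 9, 0, 0,  1, 0, 8, 12,  0, 2, 10, 9),
               nrow = 4, dimnames = list(paste0("g", 1:4), NULL))
clusters <- c("a", "a", "b", "b")
reference <- cbind(Neuron = c(10, 8, 1, 0), Microglia = c(0, 1, 9, 10))
rownames(reference) <- paste0("g", 1:4)
res <- annotateClusters(expr, clusters, reference)
print(res)
```

With real data, the comparison would typically be restricted to informative genes (for example, highly variable or differentially expressed ones), and both datasets must use the same gene identifiers.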

This article is a good resource that compares different cluster calling algorithms.

Answered by Atakan on September 27, 2021
