I've been trying to write R code that converts gene accession numbers to gene names (from RNA-seq data). I've looked at all the related questions and tried to modify my code accordingly, but for some reason it's still not working. Here is my code, where charg is a character vector of the gene accession IDs in the data set:
    charg <- resdata$genes
    head(charg)
    library(biomaRt)
    ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
    theBM <- getBM(attributes='ensembl_gene_id','hgnc_symbol',
                   filters = 'external_gene_name',
                   values = charg,
                   mart = ensembl)
    resdata <- merge.data.frame(resdata, theBM, by.x="genes",by.y="ensembl_gene_id")
Here’s some output (where I’m struggling):
    > head(charg)
    "ENSG00000261150.2" "ENSG00000164877.18" "ENSG00000120334.15"
    "ENSG00000100906.10" "ENSG00000182759.3" "ENSG00000124145.6"
    > dim(theBM)
    0 1
    > head(theBM)
    ensembl_gene_id
    <0 rows> (or 0-length row.names)
    > dim(resdata)
    20381 11
    > resdata <- merge.data.frame(resdata, theBM, by.x="genes",by.y="ensembl_gene_id")
    > dim(resdata) #after merge
    0 11 #isn't correct -- just row names! where'd my genes go?
This is the code to get a look-up table to convert between Ensembl ID and HGNC:
    library(biomaRt)
    ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    # Escape the dot so that only the version suffix (e.g. ".2") is removed
    theBM <- getBM(attributes = c('ensembl_gene_id', 'hgnc_symbol'),
                   filters = c('ensembl_gene_id'),
                   values = gsub("\\..*", "", charg),
                   mart = ensembl)
What Devon posted is correct but is missing a c() around the attributes values.
For further help, please provide the content of resdata, which you should always do when posting a question, since we cannot read minds.
"Does not work", by the way, is not a proper error description.
Once you have the output do:
    # Strip the version suffix so the IDs match the unversioned ensembl_gene_id column
    resdata$genes <- gsub("\\..*", "", resdata$genes)
    merge(x = theBM, by.x = "ensembl_gene_id", y = resdata, by.y = "genes")
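To see what the merge does without querying Ensembl, here is a minimal sketch on toy data. The baseMean column and the SYMBOL_A / SYMBOL_B names are made-up placeholders, not real values from resdata or HGNC. It also shows that a plain merge() silently drops genes missing from the look-up table, while all.x = TRUE keeps them with an NA symbol.

```r
# Toy stand-ins for the real objects; baseMean and the symbols are hypothetical.
resdata <- data.frame(
  genes = c("ENSG00000261150.2", "ENSG00000164877.18", "ENSG00000120334.15"),
  baseMean = c(10.5, 200.1, 35.0),
  stringsAsFactors = FALSE
)
theBM <- data.frame(
  ensembl_gene_id = c("ENSG00000164877", "ENSG00000120334"),
  hgnc_symbol = c("SYMBOL_A", "SYMBOL_B"),
  stringsAsFactors = FALSE
)

# Remove the version suffix so the keys in both tables have the same form
resdata$genes <- sub("\\..*$", "", resdata$genes)

# Inner join: only genes present in both tables survive
inner <- merge(x = theBM, y = resdata,
               by.x = "ensembl_gene_id", by.y = "genes")

# Left join: every row of resdata is kept; unmatched genes get NA as symbol
left <- merge(x = resdata, y = theBM,
              by.x = "genes", by.y = "ensembl_gene_id", all.x = TRUE)

print(dim(inner))
print(left)
```

Whether you want the inner or the left join depends on whether genes without an HGNC symbol should stay in the results table.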
Note that I had to go to that SE cross-post to get the content of resdata; this is not how it should go. Please post all relevant data up front in the future, otherwise your questions might get downvoted and closed. Please also avoid cross-posting. If you provide proper information, you usually get a good answer in time.
Edit: Just realized you also cross-posted this to Biostars, twice even. Please stop this. I closed the Biostars posts and gave my two cents on this behaviour over there.
Correct answer by ATpoint on June 23, 2021
Those aren't external_gene_name's, they're ensembl_gene_id_versions:
    theBM <- getBM(attributes='ensembl_gene_id','hgnc_symbol',
                   filters = 'ensembl_gene_id_version',
                   values = charg,
                   mart = ensembl)
Note that you'll get more hits if you strip the gene ID versions off:
    # Keep only the part before the first "." of each ID
    charg2 = sapply(strsplit(charg, '.', fixed=TRUE), function(x) x[1])
    theBM = getBM(attributes='ensembl_gene_id','hgnc_symbol',
                  filters = 'ensembl_gene_id',
                  values = charg2,
                  mart = ensembl)
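As a quick offline check of the version stripping, the sketch below uses three of the IDs from the question and compares the strsplit() approach with an escaped regular expression. It also shows why the dot has to be escaped or matched with fixed = TRUE: an unescaped "." is a wildcard and wipes out the whole string. The variable names by_split, by_regex and unescaped are only illustrative.

```r
charg <- c("ENSG00000261150.2", "ENSG00000164877.18", "ENSG00000120334.15")

# Split on a literal dot and keep the first piece of each ID
by_split <- sapply(strsplit(charg, ".", fixed = TRUE), function(x) x[1])

# Same result with a regular expression; "\\." matches a literal dot
by_regex <- sub("\\..*$", "", charg)

# Pitfall: an unescaped "." matches any character, so everything is removed
unescaped <- gsub("..*", "", charg)

print(by_split)
print(identical(by_split, by_regex))
print(unescaped)
```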
Answered by Devon Ryan on June 23, 2021