Bioinformatics Asked by Equinox on June 23, 2021
This question has also been asked on Biostars and StackOverflow
I’ve been trying to code (in R) a way to convert gene accession numbers to gene names (from RNAseq data). I’ve looked at all the related questions and tried to modify my code accordingly, but for some reason it’s still not working. Here is my code, where charg is a character vector of the gene accession IDs of the data set resdata:
charg <- resdata$genes
head(charg)
library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
theBM <- getBM(attributes='ensembl_gene_id','hgnc_symbol',
filters = 'external_gene_name',
values = charg,
mart = ensembl)
resdata <- merge.data.frame(resdata, theBM, by.x="genes",by.y="ensembl_gene_id")
Here’s some output (where I’m struggling):
> head(charg)
[1] "ENSG00000261150.2" "ENSG00000164877.18" "ENSG00000120334.15"
[4] "ENSG00000100906.10" "ENSG00000182759.3" "ENSG00000124145.6"
> dim(theBM)
[1] 0 1
> head(theBM)
[1] ensembl_gene_id
<0 rows> (or 0-length row.names)
> dim(resdata)
[1] 20381 11
> resdata <- merge.data.frame(resdata, theBM, by.x="genes",by.y="ensembl_gene_id")
> dim(resdata) #after merge
[1] 0 11 #isn't correct -- just row names! where'd my genes go?
Thank you.
This is the code to get a look-up table to convert between Ensembl ID and HGNC:
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
theBM <- getBM(attributes=c('ensembl_gene_id','hgnc_symbol'),
filters = c('ensembl_gene_id'),
values = gsub("\\..*", "", charg),
mart = ensembl)
What Devon posted is correct, but it is missing a c() around the attribute values.
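To see why the missing c() matters, here is a minimal sketch using a toy function. toy_query and its parameters are hypothetical and are not biomaRt's real signature; the point is only how R matches arguments. Without c(), the second string is not part of attributes at all: it is passed as a separate positional argument and lands in the next unfilled parameter.

```r
# Toy stand-in for a query function (hypothetical, NOT biomaRt's getBM signature).
toy_query <- function(attributes, filters = "", extra = NULL) {
  list(attributes = attributes, filters = filters, extra = extra)
}

# Without c(): "hgnc_symbol" is a separate positional argument,
# so it fills the next unmatched parameter (here: filters).
wrong <- toy_query(attributes = "ensembl_gene_id", "hgnc_symbol")

# With c(): both strings are passed together as one character vector.
right <- toy_query(attributes = c("ensembl_gene_id", "hgnc_symbol"))

length(wrong$attributes)  # 1
wrong$filters             # "hgnc_symbol"
length(right$attributes)  # 2
```

This would be consistent with the single-column result in the question, where only ensembl_gene_id came back.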
For further help, please provide the content of resdata, which you should always do when posting a question, since we cannot read minds. By the way, "does not work" is not a proper error description.
Once you have the output do:
resdata$genes <- gsub("\\..*", "", resdata$genes)
merge(x = theBM,
by.x = "ensembl_gene_id",
y = resdata,
by.y = "genes")
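The merge step can be tried offline with toy data. In the sketch below the counts and the symbols (SYMBOL_A, SYMBOL_B) are made-up placeholders, and the small theBM data frame merely stands in for what getBM() might return. It shows that versioned and unversioned IDs never match, and that all.y = TRUE is one way to keep genes that have no entry in the look-up table.

```r
# Hypothetical expression table: IDs taken from the question, counts made up.
resdata <- data.frame(
  genes = c("ENSG00000261150.2", "ENSG00000164877.18", "ENSG00000120334.15"),
  count = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical look-up table standing in for getBM() output.
# The symbols are placeholders, not real HGNC symbols.
theBM <- data.frame(
  ensembl_gene_id = c("ENSG00000261150", "ENSG00000164877"),
  hgnc_symbol = c("SYMBOL_A", "SYMBOL_B"),
  stringsAsFactors = FALSE
)

# Versioned keys never equal unversioned keys, so the inner join is empty.
before <- merge(theBM, resdata, by.x = "ensembl_gene_id", by.y = "genes")
nrow(before)  # 0

# Strip the version suffix, then merge again.
resdata$genes <- sub("\\..*", "", resdata$genes)
inner <- merge(theBM, resdata, by.x = "ensembl_gene_id", by.y = "genes")
nrow(inner)  # 2

# all.y = TRUE keeps rows of resdata without a match; their symbol is NA.
keep_all <- merge(theBM, resdata, by.x = "ensembl_gene_id", by.y = "genes",
                  all.y = TRUE)
nrow(keep_all)                    # 3
sum(is.na(keep_all$hgnc_symbol))  # 1
```

Whether to keep unmatched genes is a design choice: the default inner join silently drops them, which may be surprising when comparing row counts before and after.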
Note that I had to go to that SE crosspost to get the content of resdata; this is not how this goes. Please post all relevant data up front in the future, otherwise your questions might get downvoted and closed. Please also avoid cross-posting. If you provide proper information, you usually get a good answer in time.
Edit: Just realized you also cross-posted this to Biostars, twice even. Please stop this. I closed the Biostars posts and gave my two cents on this behaviour over there.
Correct answer by ATpoint on June 23, 2021
Those aren't external_gene_name's, they're ensembl_gene_id_versions:
theBM <- getBM(attributes='ensembl_gene_id','hgnc_symbol',
filters = 'ensembl_gene_id_version',
values = charg,
mart = ensembl)
Note that you'll get more hits if you strip the gene ID versions off:
charg2 = sapply(strsplit(charg, '.', fixed=TRUE), function(x) x[1])
theBM = getBM(attributes='ensembl_gene_id','hgnc_symbol',
filters = 'ensembl_gene_id',
values = charg2,
mart = ensembl)
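The version stripping can be checked without querying Ensembl. This is a small sketch using three of the IDs from the question; the regular expressions are my own assumption about what a versioned human Ensembl gene ID looks like (ENSG, digits, a dot, a version number).

```r
charg <- c("ENSG00000261150.2", "ENSG00000164877.18", "ENSG00000120334.15")

# Do all IDs look like versioned Ensembl gene IDs?
all_versioned <- all(grepl("^ENSG[0-9]+\\.[0-9]+$", charg))
all_versioned  # TRUE

# Two equivalent ways to drop the ".<version>" suffix.
by_split <- sapply(strsplit(charg, ".", fixed = TRUE), function(x) x[1])
by_regex <- sub("\\.[0-9]+$", "", charg)

identical(by_split, by_regex)  # TRUE
by_regex
# "ENSG00000261150" "ENSG00000164877" "ENSG00000120334"
```

Note the escaped dot in the regular expression: an unescaped "." matches any character, so a pattern like "..*" would blank out the whole ID.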
Answered by Devon Ryan on June 23, 2021