
Annotating gene names or gene IDs to a dataframe containing SNPs?

Bioinformatics Asked on May 29, 2021

I have a large data frame (an Excel file) of SNPs with genotyping data. I need to filter it to get the SNP information for one specific gene. The list is too long to do this manually. I was thinking of annotating the SNPs with gene names so that I can filter by gene name. Does anyone know of a good R package for this task?

One Answer

I assume your SNP file contains genomic positions - please clarify if not. For example, you could have rsID, chromosome, and position columns that contain "rs3088379", "chr1", and "54225281".

Approach #1: use R, as you suggest.

ensembldb is an R package that may be what you're looking for: https://bioconductor.org/packages/release/bioc/html/ensembldb.html.

You could map the genomic positions of your SNPs to gene names by adapting the package's coordinate-mapping vignette: https://bioconductor.org/packages/release/bioc/vignettes/ensembldb/inst/doc/coordinate-mapping.html. Once the positions are mapped to genes, you can filter by your gene name.

Approach #2: use bedtools.

You can convert your Excel file of genomic positions to tab-delimited BED format (overview: https://bedtools.readthedocs.io/en/latest/content/overview.html). Your first SNP would then become

chr1 54225280 54225281 rs3088379

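Make sure you convert 1-based positions to 0-based coordinates! BED start positions are 0-based and end positions are exclusive, so a SNP at position 54225281 spans 54225280 to 54225281. You could then download the coordinates of your gene in BED format from your favorite source, like the UCSC Table Browser.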
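As a rough illustration of the coordinate conversion, here is a minimal Python sketch. The function name `snp_to_bed_line` is made up for this example, and it assumes each SNP is a single-base variant with a 1-based position, as in the rs3088379 example above.

```python
def snp_to_bed_line(chrom: str, pos_1based: int, name: str) -> str:
    """Format a single-base SNP as one tab-delimited BED line.

    Hypothetical helper: assumes pos_1based is the usual 1-based
    genomic position of a single-nucleotide variant.
    """
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    start = pos_1based - 1  # BED start is 0-based
    end = pos_1based        # BED end is exclusive, so it equals the 1-based position
    return "\t".join([chrom, str(start), str(end), name])


# The example SNP from the answer
print(snp_to_bed_line("chr1", 54225281, "rs3088379"))
```

Applying this to every row of the exported table and writing the lines to a file should give a SNP BED file that bedtools can read.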

Then you run bedtools intersect with your SNP BED file and your gene BED file. The output contains only the SNPs that fall within that gene.
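On the command line this would be something like `bedtools intersect -a snps.bed -b gene.bed`, where both file names are placeholders. To show what that overlap test amounts to, here is a small pure-Python sketch. It is not bedtools itself, only an illustration of the half-open interval logic; the `BedRecord` type, the gene name `GENE_X`, and the gene coordinates are all invented for the example.

```python
from typing import List, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a: BedRecord, b: BedRecord) -> bool:
    """True if two half-open BED intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def snps_in_gene(snps: List[BedRecord], gene: BedRecord) -> List[BedRecord]:
    """Keep only the SNPs that overlap the gene interval."""
    return [snp for snp in snps if overlaps(snp, gene)]


# Made-up gene interval, NOT real coordinates for any gene
gene = BedRecord("chr1", 54225000, 54226000, "GENE_X")

snps = [
    BedRecord("chr1", 54225280, 54225281, "rs3088379"),      # inside the made-up interval
    BedRecord("chr1", 60000000, 60000001, "snp_outside"),     # same chromosome, outside
    BedRecord("chr2", 54225280, 54225281, "snp_other_chrom"), # different chromosome
]

print([snp.name for snp in snps_in_gene(snps, gene)])
```

For a real data set bedtools will be much faster and handles multi-gene BED files; the sketch is only meant to make the coordinate logic concrete.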

Correct answer by rk13 on May 29, 2021
