TransWikia.com

How to import large .bed, .gff, .vcf, .paf, .sam files into an SQL database?

Bioinformatics Asked on June 3, 2021

Are there best practices for loading bioinformatics file formats such as VCF, BED, GFF, and SAM into SQL databases? I am wondering how people out there do that efficiently.

All of these formats are tab-separated, so in principle the following should work. I feel weird about it, though, since most people I know don’t use MySQL to work with these files.

LOAD DATA LOCAL INFILE 'bed.bed'
INTO TABLE `bed-file`
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(list of the columns)
SET creation_date = STR_TO_DATE(@creation_date, '%m/%d/%y');
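For comparison, the same bulk-load idea can be sketched without MySQL. This is only a minimal illustration under some assumptions: it uses Python's built-in sqlite3 module instead of MySQL, keeps only the three mandatory BED columns, and the function name load_bed, the table name bed_file, and the column names are made up for the example.

```python
import csv
import itertools
import sqlite3


def load_bed(conn, path, batch_size=50_000):
    """Bulk-insert chrom/start/end from a BED file; returns the row count.

    Hypothetical helper: table and column names are illustrative only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bed_file ("
        "chrom TEXT NOT NULL, "
        "chrom_start INTEGER NOT NULL, "
        "chrom_end INTEGER NOT NULL)"
    )
    total = 0
    with open(path, newline="") as handle:
        # Drop blank lines and the optional comment/track/browser header lines.
        lines = (
            line for line in handle
            if line.strip() and not line.startswith(("#", "track", "browser"))
        )
        # QUOTE_NONE: BED has no quoting rules, so treat quote marks literally.
        reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = ((r[0], int(r[1]), int(r[2])) for r in reader)
        while True:
            # Pull at most batch_size rows so memory use stays bounded.
            batch = list(itertools.islice(rows, batch_size))
            if not batch:
                break
            with conn:  # commit one transaction per batch
                conn.executemany(
                    "INSERT INTO bed_file VALUES (?, ?, ?)", batch
                )
            total += len(batch)
    return total
```

Called as load_bed(sqlite3.connect("features.db"), "bed.bed"), this streams the file rather than reading it whole. Committing once per batch instead of once per row is usually what keeps inserts of this kind fast.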

One Answer

Answer from @liam-mcintyre converted from comment:

I don't use Dask, as it unfortunately doesn't support enough pandas functionality. With pandas I do it with read_csv; if the file is big, I read it in chunks and send the chunks to separate threads. If you want to ask a specific question with example data, then I can show code.
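The chunked read_csv approach could look roughly like the sketch below. It is not the commenter's actual code, and it rests on several assumptions: the target is a SQLite database reached through the standard sqlite3 module, only the first three BED columns are kept, the file has no track or browser header lines, and the names load_bed_chunked, bed_file, and BED_COLUMNS are invented for illustration. The chunks are written one after another rather than handed to separate threads.

```python
import sqlite3

import pandas as pd

# Illustrative names for the three mandatory BED fields.
BED_COLUMNS = ["chrom", "chrom_start", "chrom_end"]


def load_bed_chunked(conn, path, table="bed_file", chunksize=100_000):
    """Stream a BED file into `table` one DataFrame chunk at a time."""
    chunks = pd.read_csv(
        path,
        sep="\t",
        header=None,        # BED files carry no column header row
        usecols=[0, 1, 2],  # keep chrom, start, end only
        dtype={0: str},     # keep names such as "1" and "X" as text
        comment="#",        # skip comment lines
        chunksize=chunksize,
    )
    total = 0
    for chunk in chunks:
        chunk.columns = BED_COLUMNS
        # Append each chunk; the table is created on the first call.
        chunk.to_sql(table, conn, if_exists="append", index=False)
        total += len(chunk)
    return total
```

With chunksize set, read_csv returns an iterator of DataFrames, so only one chunk is held in memory at a time.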

Answered by gringer on June 3, 2021
