TransWikia.com

How do I download the mitochondrial haplogroup datasets for human genetics online?

Bioinformatics Asked on February 2, 2021

I seem to have landed on the mitomap.org site, but I don’t know what to make of it, what to do with it, or how to get the genomes onto my computer. It sounds like the genomes are stored in GenBank, and that mitomap simply lists which of the GenBank genomes are human mitochondrial haplogroup genomes.

First of all, I can’t find a simple list of links or list of names of the haplogroups the mitomap has… Can you show me where I can get or download that? They say:

This brings our total number of FL sequences to 51,673, and the number of CR sequences to 74,660. Our SNVs now total 19,227.

I don’t know what FL or CR (Control Region sequences?) mean, but I am expecting to see lists of genomes or something. What am I to do with this information? Where do I find this data? Is this it? If I click on one of the results, I see this. It has the path nuccore/MT742594.1; can I find this on some NCBI FTP server somewhere? I don’t see anything with that folder structure on the GenBank FTP server.

Basically my question is:

  1. Where is the complete list of human mitochondrial haplogroup genomes?
  2. Where can I download all of those human mitochondrial haplogroup genomes so I can get them onto my computer?

One Answer

Hmmm, I am not experienced enough in biology or bioinformatics to truly understand the question, but I am guessing you want to download the FASTA or GenBank file containing the sequence for the accession number you listed above. There are two ways to do this:

  1. Using your browser, if you only want to download a small number of files. Open the link below, click the "Send to" button on the right-hand side of the page, choose the destination and format appropriate for you, and then click "Create File".

https://www.ncbi.nlm.nih.gov/nuccore/MT742594.1

  2. If you are familiar with Python and Biopython, you can download multiple files at once in whatever format you want. The short script below does this for a single record; it is adapted from Biopython's documentation on biopython.org.

    import os

    from Bio import Entrez
    from Bio import SeqIO

    # Placeholder address: always tell NCBI who you are
    Entrez.email = "your.name@example.com"

    accession = "MT742594.1"
    filename = accession + ".gb"

    if not os.path.isfile(filename):
        # Download the GenBank record only if it is not already on disk
        net_handle = Entrez.efetch(
            db="nucleotide", id=accession, rettype="gb", retmode="text"
        )
        with open(filename, "w") as out_handle:
            out_handle.write(net_handle.read())
        net_handle.close()
        print("Saved")

    print("Parsing...")
    record = SeqIO.read(filename, "gb")
    print(record)

You can make a list of the accession numbers of the sequences you want to download and pass that list as the id argument. If you are curious about how the code works, visit biopython.org.
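As a rough sketch of the list-of-accessions idea, something like the following should work. The function names `chunked` and `download_genbank`, the output directory, the batch size of 50 and the half-second pause are all my own assumptions for illustration, not anything prescribed by Biopython or NCBI. Records are re-written through `SeqIO.write`, so the saved files may differ slightly in layout from what the browser download gives you.

```python
import os
import time


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download_genbank(accessions, out_dir="genbank_records", batch_size=50,
                     email="your.name@example.com"):
    """Save one GenBank file per accession and return the paths written.

    out_dir, batch_size and email are illustrative defaults (assumptions).
    """
    # Imported here so chunked() can be used without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # placeholder: always tell NCBI who you are
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for batch in chunked(list(accessions), batch_size):
        # Several accessions can go in one request as a comma-separated string.
        handle = Entrez.efetch(
            db="nucleotide", id=",".join(batch), rettype="gb", retmode="text"
        )
        for record in SeqIO.parse(handle, "gb"):
            path = os.path.join(out_dir, record.id + ".gb")
            SeqIO.write(record, path, "genbank")
            written.append(path)
        handle.close()
        time.sleep(0.5)  # conservative pause between batches
    return written


if __name__ == "__main__":
    # The two accession numbers that appear in this thread.
    for path in download_genbank(["MT742594.1", "MG762674"]):
        print("Saved", path)
```

Batching keeps each request to a manageable size if your list runs to thousands of accessions; for a handful of records a single call is enough.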

Answered by Neeleshwar Pandey on February 2, 2021
