AnswerBun.com

How do I obtain a vcf read from a pyvcf.Reader?

Bioinformatics Asked on April 4, 2021

I wondered how would I be able to obtain variant based on chrom and location range.

Optimally, this function would be like:

import vcf
vcf_reader = vcf.Reader(filename=some_file_name)
vcf_read = vcf.variants_between(CHROM, START, END)

Does anyone how to do so?

One Answer

I think vcf_reader.fetch is what you want. You have to index the file first though.

If starting from regular text VCF you can run:

bgzip example.vcf
tabix example.vcf.gz

Then the Python code is almost the same as yours:

import vcf
vcf_reader = vcf.Reader(filename="example.vcf.gz")
vcf_read = vcf_reader.fetch("NC_023663.1", 7803980, 7804000)
for record in vcf_read:
    print(record)

For this example VCF you get:

Record(CHROM=NC_023663.1, POS=7803981, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803982, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803983, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803984, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803985, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803986, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803987, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803988, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803989, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803990, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803991, REF=A, ALT=[G, <*>])
Record(CHROM=NC_023663.1, POS=7803992, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803993, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803994, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803995, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803996, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803997, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803998, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803999, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7804000, REF=T, ALT=[<*>])

API reference: https://pyvcf.readthedocs.io/en/latest/API.html#vcf.Reader.fetch

Note the "zero-based, half-open coordinate system" it describes (which is why 7803980 in the second Python argument becomes POS=7803981 from the VCF but then it stops at POS=7804000).

Correct answer by Jesse on April 4, 2021

Add your own answers!

Related Questions

Viral Metagenomics

1  Asked on November 1, 2020 by l-r-joshi

     

Issues with AutoDock Vina

0  Asked on October 18, 2020 by ibio_rep1

       

Swapping to effect increasing allele in case/control studies

0  Asked on October 9, 2020 by dale-handley

     

Error When Using biocLite as an installer in rpy2 python library

1  Asked on September 27, 2020 by abiologist

     

Seurat DE t.test

1  Asked on August 11, 2020 by vdu12345

       

Number of reactions per metabolic pathway

0  Asked on August 11, 2020 by mmphysics

         

Find all the bases for given reference position

0  Asked on August 8, 2020 by diesel__100

   

Convert VCF to genotype table

1  Asked on July 30, 2020 by snowflake

       

Ask a Question

Get help from others!

© 2023 AnswerBun.com. All rights reserved. Sites we Love: PCI Database, MenuIva, UKBizDB, Menu Kuliner, Sharing RPP, SolveDir