TransWikia.com

How do I obtain a vcf read from a pyvcf.Reader?

Bioinformatics Asked on April 4, 2021

I wondered how would I be able to obtain variant based on chrom and location range.

Optimally, this function would be like:

import vcf
vcf_reader = vcf.Reader(filename=some_file_name)
vcf_read = vcf.variants_between(CHROM, START, END)

Does anyone how to do so?

One Answer

I think vcf_reader.fetch is what you want. You have to index the file first though.

If starting from regular text VCF you can run:

bgzip example.vcf
tabix example.vcf.gz

Then the Python code is almost the same as yours:

import vcf
vcf_reader = vcf.Reader(filename="example.vcf.gz")
vcf_read = vcf_reader.fetch("NC_023663.1", 7803980, 7804000)
for record in vcf_read:
    print(record)

For this example VCF you get:

Record(CHROM=NC_023663.1, POS=7803981, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803982, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803983, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803984, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803985, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803986, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803987, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803988, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803989, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803990, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803991, REF=A, ALT=[G, <*>])
Record(CHROM=NC_023663.1, POS=7803992, REF=G, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803993, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803994, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803995, REF=T, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803996, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803997, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803998, REF=C, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7803999, REF=A, ALT=[<*>])
Record(CHROM=NC_023663.1, POS=7804000, REF=T, ALT=[<*>])

API reference: https://pyvcf.readthedocs.io/en/latest/API.html#vcf.Reader.fetch

Note the "zero-based, half-open coordinate system" it describes (which is why 7803980 in the second Python argument becomes POS=7803981 from the VCF but then it stops at POS=7804000).

Correct answer by Jesse on April 4, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP