
VCF spec: is it possible to have other alleles in addition to the MISSING value ('.') in ALTS?

Bioinformatics Asked by revl on August 22, 2021

I’m writing a VCF parser, so I have to consider and handle all corner cases regardless of how contrived they may seem. The specification is a bit unclear about the MISSING value (‘.’) in the ALTS column:

Options are base Strings made up of the bases A,C,G,T,N,*, (case insensitive) or a MISSING value ‘.’ (no variant) or an angle-bracketed ID String (“<ID>”) or a breakend replacement string.

I’ve seen examples with a single dot in the ALTS column:

4       31789170        PTV021  G       .       77      PASS    .

The question is whether the following data lines are also valid:

1       12345   ID1     A       .,T,.   22.88   PASS    .
1       12346   ID2     G       C,.     22.88   PASS    .

In other words, does MISSING indicate that the entire ALTS field is missing, or does it mean that there’s a missing allele?

By extension, how do I represent the case when there’s a single dot in the ALTS field (as in the first example)? Is it an empty list (because the whole field is MISSING) or is it a list containing a MISSING value? In other words, is it [] or ["."]?
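
To make the two readings concrete, here is a minimal Python sketch of each. The function names and the choice of None for a missing allele are hypothetical, picked only for illustration; neither function is claimed to be what the specification requires.

```python
from typing import List, Optional

MISSING = "."  # the VCF MISSING value


def parse_alt_whole_field(alt: str) -> List[str]:
    """Reading 1: a lone '.' means the whole field is missing, so return []."""
    if alt == MISSING:
        return []
    # Any other content is split on commas and kept verbatim,
    # so a '.' mixed in with other alleles stays as the string '.'.
    return alt.split(",")


def parse_alt_per_allele(alt: str) -> List[Optional[str]]:
    """Reading 2: every '.' is one missing allele, represented here as None."""
    return [None if allele == MISSING else allele for allele in alt.split(",")]


if __name__ == "__main__":
    for field in (".", ".,T,.", "C,."):
        print(field, parse_alt_whole_field(field), parse_alt_per_allele(field))
```

Under the first reading a single dot becomes an empty list, and it is unclear what a dot next to other alleles should mean. Under the second reading a single dot becomes a one-element list, and lines such as the ID1 and ID2 examples parse without a special case.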
