VCF Specification

November 12, 2021

The current version of VCF is 4.3. However, your files are likely formatted as version 4.1 because that is the version available on Teton. The format is documented in detail in the official VCF specification.

1. VCF Header

The header is usually quite large and defines each field used in the file. The most important header line is the one that starts with #CHROM: it names the columns, including one column per sample. Let’s look at it.

zgrep "#CHROM" vcf/CroVir.vcf.gz > vcf.header

vim vcf.header


We first sent the #CHROM header line to a text file and then, inside vim, replaced all tabs with line breaks (for example with :%s/\t/\r/g) so the column names are easier to read.
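The same reshaping can also be done without opening an editor. The following is a minimal sketch, not the exact commands used in class: it works on a toy header line with two made-up sample names (S1 and S2) so that it runs anywhere, and the commented line shows how the same idea would apply to the real file.

```shell
# Toy stand-in for the #CHROM line; the sample names S1 and S2 are hypothetical.
header=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' > "$header"

# Turn every tab into a line break and number the resulting column names.
tr '\t' '\n' < "$header" | cat -n

# The first 9 columns are fixed by the VCF format; the rest are samples.
n_columns=$(tr '\t' '\n' < "$header" | wc -l | tr -d ' ')
n_samples=$((n_columns - 9))
echo "columns: $n_columns, samples: $n_samples"

rm -f "$header"

# On the real file, the equivalent one-liner would be:
#   zgrep -m1 '^#CHROM' vcf/CroVir.vcf.gz | tr '\t' '\n' | cat -n
```

Numbering the lines with cat -n is a convenience: the number next to each sample name is its column position in every data line of the VCF.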

2. The FORMAT Field

This is one of the most important fields in the VCF file. It is a colon-separated list of keys that defines which values are reported, and in what order, in each sample column.
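Keys defined in the VCF specification include GT (genotype), DP (read depth), and GQ (genotype quality); which keys actually appear in your file depends on the software that produced it. The sketch below shows how the FORMAT column lines up with a sample column. The record is invented for illustration and is not taken from CroVir.vcf.gz.

```shell
# One made-up VCF data line: FORMAT is column 9, the first sample is column 10.
record=$(printf '2\t1000\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:12:99')

# Split both columns on ":" and pair each FORMAT key with the sample's value.
pairs=$(printf '%s\n' "$record" | awk -F'\t' '{
    n = split($9, keys, ":")
    split($10, vals, ":")
    for (i = 1; i <= n; i++) printf "%s=%s\n", keys[i], vals[i]
}')
echo "$pairs"
```

For this toy record the sample is heterozygous (GT=0/1), is covered by 12 reads (DP=12), and has a genotype quality of 99 (GQ=99).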

3. Using VCFTools

VCFTools is a fast command-line program for summarizing, parsing, and filtering VCF files. Let’s run a few examples of what this program can do:

3.1 Summarize VCF Files

module load swset gcc vcftools perl

vcftools --gzvcf CroVir.vcf.gz
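When run with only an input file, vcftools reports how many individuals and sites it found. As a rough cross-check, the same two numbers can be counted by hand. This is a sketch on a small invented VCF (two hypothetical samples, three sites), not on the class data.

```shell
# Build a tiny plain-text VCF; all values here are invented for illustration.
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n' >> "$vcf"
printf '2\t200\t.\tC\tT,G\t50\tPASS\t.\tGT\t1/2\t0/1\n' >> "$vcf"
printf '2\t900\t.\tG\tA\t50\tPASS\t.\tGT\t1/1\t0/1\n' >> "$vcf"

# Sites are the lines that do not start with "#".
n_sites=$(grep -vc '^#' "$vcf")

# Individuals are the columns after the 9 fixed ones on the #CHROM line.
n_individuals=$(grep -m1 '^#CHROM' "$vcf" | awk -F'\t' '{ print NF - 9 }')

echo "individuals: $n_individuals, sites: $n_sites"
rm -f "$vcf"
```

For a gzipped file, the same counts can be taken by decompressing to a pipe first (for example with zcat) instead of reading the file directly.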

3.2 Filter Loci

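# Keep only biallelic sites (exactly two alleles: REF plus one ALT)
vcftools --gzvcf CroVir.vcf.gz --min-alleles 2 --max-alleles 2
# Write per-site allele frequencies to out.frq (an output option, not a filter)
vcftools --gzvcf CroVir.vcf.gz --freq
# Keep only sites on chromosome 2
vcftools --gzvcf CroVir.vcf.gz --chr 2
# Thin sites so that no two are within 20,000 bp of each other
vcftools --gzvcf CroVir.vcf.gz --thin 20000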
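As written, the filtering commands only report how many sites pass; adding --recode together with --out and an output prefix is how VCFtools writes the kept sites to a new VCF. To make the logic of two of these filters concrete, here is a sketch that imitates the biallelic and chromosome filters with awk on a small invented VCF. It illustrates what the filters select and is not how VCFtools is implemented.

```shell
# Tiny invented VCF: one site on chromosome 1, two on chromosome 2.
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n' >> "$vcf"
printf '2\t200\t.\tC\tT,G\t50\tPASS\t.\tGT\t1/2\t0/1\n' >> "$vcf"
printf '2\t900\t.\tG\tA\t50\tPASS\t.\tGT\t1/1\t0/1\n' >> "$vcf"

# Like --min-alleles 2 --max-alleles 2: one REF allele plus exactly one ALT
# allele, so the ALT column is neither "." nor a comma-separated list.
biallelic=$(grep -v '^#' "$vcf" | awk -F'\t' '$5 != "." && $5 !~ /,/ { print $1 ":" $2 }')

# Like --chr 2: keep sites whose CHROM column is exactly "2".
chr2=$(grep -v '^#' "$vcf" | awk -F'\t' '$1 == "2" { print $1 ":" $2 }')

echo "biallelic sites:"; echo "$biallelic"
echo "chromosome 2 sites:"; echo "$chr2"
rm -f "$vcf"
```

The site at 2:200 has two ALT alleles (T,G), so it passes the chromosome filter but not the biallelic one.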