VCF Utilities

Automates various utilities for VCF-formatted files. Current utilities include: obtaining a list of the chromosomes within a VCF-based file, obtaining a list of the samples within a VCF-based file, concatenating multiple VCF-based files, merging multiple VCF-based files, and sorting a VCF-based file.

Command-line Usage

The VCF utilities function may be called using the following command:


Example usage

Concatenate multiple VCF files:

python --vcfs chr21.vcf.gz chr22.vcf.gz --utility concatenate

Merge multiple VCF files:

python --vcfs chr22.ceu.vcf.gz chr22.yri.vcf.gz --utility merge
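The two examples combine files along different axes. chr21.vcf.gz and chr22.vcf.gz hold different variants for the same samples, so their records are stacked (concatenate). chr22.ceu.vcf.gz and chr22.yri.vcf.gz hold the same variants for different samples, so their genotype columns are joined (merge). The toy sketch below illustrates only that distinction on in-memory records; it is not how the utility is implemented, and the Table layout, sample names, and positions are all made up for illustration.

```python
from typing import Dict, List, Tuple

# A toy in-memory "VCF": (sample names, [(variant key, genotypes)]),
# where the variant key is (CHROM, POS, REF, ALT).
Key = Tuple[str, int, str, str]
Table = Tuple[List[str], List[Tuple[Key, List[str]]]]


def concatenate(tables: List[Table]) -> Table:
    """Stack the records of files that share the same samples."""
    samples = tables[0][0]
    records: List[Tuple[Key, List[str]]] = []
    for file_samples, file_records in tables:
        if file_samples != samples:
            raise ValueError("concatenate expects identical sample lists")
        records.extend(file_records)
    return samples, records


def merge(tables: List[Table]) -> Table:
    """Join the genotype columns of files that share the same variants."""
    samples: List[str] = []
    merged: Dict[Key, List[str]] = {}
    for file_samples, file_records in tables:
        samples.extend(file_samples)
        for key, genotypes in file_records:
            merged.setdefault(key, []).extend(genotypes)
    # Every variant must now carry one genotype per sample.
    for key, genotypes in merged.items():
        if len(genotypes) != len(samples):
            raise ValueError(f"variant {key} is missing from at least one file")
    return samples, list(merged.items())


# Same samples, different variants: stack the records.
chr21 = (["CEU_01"], [(("21", 100, "G", "A"), ["0/0"])])
chr22 = (["CEU_01"], [(("22", 150, "A", "G"), ["0/1"])])
print(concatenate([chr21, chr22]))

# Same variants, different samples: join the genotype columns.
ceu = (["CEU_01"], [(("22", 150, "A", "G"), ["0/1"])])
yri = (["YRI_01"], [(("22", 150, "A", "G"), ["0/0"])])
print(merge([ceu, yri]))
```

The strict checks (identical samples for concatenate, identical variants for merge) mirror the assumptions stated for each utility; real VCF tools handle mismatches with their own rules.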


Input Command-line Arguments

--vcf <input_filename>
Argument used to define the filename of the VCF file.
--vcfs <input_filename> <input1_filename, input2_filename, etc.>
Argument used to define the filename of the VCF file(s). May be used multiple times.
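Because --vcfs accepts several filenames and may be repeated, every occurrence has to be collected into one list. A minimal sketch of how such arguments could be declared with Python's argparse, assuming nothing about the real parser; build_parser is a hypothetical name, and action="extend" requires Python 3.8 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the input arguments described above."""
    parser = argparse.ArgumentParser(description="VCF utilities (sketch)")
    # --vcf takes exactly one filename.
    parser.add_argument("--vcf", help="Filename of a single VCF file")
    # --vcfs takes one or more filenames; action="extend" lets the flag be
    # repeated and flattens every occurrence into a single list.
    parser.add_argument("--vcfs", nargs="+", action="extend", default=[],
                        help="Filenames of one or more VCF files")
    return parser


args = build_parser().parse_args(
    ["--vcfs", "chr21.vcf.gz", "chr22.vcf.gz", "--vcfs", "chrX.vcf.gz"]
)
print(args.vcfs)  # ['chr21.vcf.gz', 'chr22.vcf.gz', 'chrX.vcf.gz']
```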

Output Command-line Arguments

--out <output_filename>
Argument used to define the complete output filename. Overrides --out-prefix.
--out-prefix <output_prefix>
Argument used to define the output prefix (i.e. the filename without the file extension).
Argument used to define if previous output should be overwritten.
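The precedence between --out and --out-prefix can be expressed as a small rule. The sketch below assumes, without confirmation from this document, that a prefix is completed by appending the --out-format value as the extension, and that it is an error to give neither argument (the real tool may fall back to its own default). resolve_output_filename is a hypothetical name.

```python
from typing import Optional


def resolve_output_filename(out: Optional[str], out_prefix: Optional[str],
                            out_format: str = "vcf.gz") -> str:
    """Hypothetical helper: pick the output filename from the CLI arguments."""
    # --out is the complete filename and overrides --out-prefix.
    if out:
        return out
    # Assumption: the extension is simply the --out-format value.
    if out_prefix:
        return f"{out_prefix}.{out_format}"
    # Assumption: no default is invented here; the real tool may differ.
    raise ValueError("either --out or --out-prefix is required in this sketch")


print(resolve_output_filename(None, "chr21_22"))        # chr21_22.vcf.gz
print(resolve_output_filename("merged.bcf", "ignored"))  # merged.bcf
```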

Utility Command-line Specification

--utility <sample-list, chr-list, concatenate, merge, sort>
Argument used to define the desired utility. Current utilities include: creation of a file of the samples within the VCF (sample-list); creation of a file of the chromosomes within the VCF (chr-list); combine multiple VCF files with different variants but the same samples (concatenate); combine multiple VCF files with different samples but the same variants (merge); or sort a single VCF file (sort).
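The sample-list and chr-list utilities rely on the fixed layout of a VCF file: meta-information lines start with "##", the column header line starts with "#CHROM" and lists sample names after the nine fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT), and each data line starts with its chromosome. A minimal standard-library sketch of both lists follows; it is not the utility's own code, the function names are hypothetical, and the toy input uses made-up sample names and positions.

```python
import gzip
import io
from typing import List, TextIO, Tuple


def open_vcf(path: str) -> TextIO:
    """Open a plain or bgzip-compressed VCF as text (BGZF is valid gzip)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def list_samples_and_chromosomes(handle: TextIO) -> Tuple[List[str], List[str]]:
    """Return (sample names, chromosomes in order of first appearance)."""
    samples: List[str] = []
    chromosomes: List[str] = []
    seen = set()
    for line in handle:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information lines and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "#CHROM":
            # Sample columns follow the 9 fixed columns CHROM..FORMAT.
            samples = fields[9:]
        elif fields[0] not in seen:
            seen.add(fields[0])
            chromosomes.append(fields[0])
    return samples, chromosomes


# Toy input with made-up sample names and positions.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B\n"
    "21\t100\t.\tG\tA\t.\tPASS\t.\tGT\t0/0\t0/1\n"
    "21\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t1/1\n"
    "22\t150\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n"
)
print(list_samples_and_chromosomes(demo))
# (['SAMPLE_A', 'SAMPLE_B'], ['21', '22'])
```

For a file on disk, the same function can be called as list_samples_and_chromosomes(open_vcf("chr22.vcf.gz")); this reads the whole file, whereas an indexed file would allow a faster lookup of contig names.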

Additional Utility Command-line Arguments

--record-merge-mode <none, snps, indels, both, all, id>
Argument used to define the type of multiallelic records to create. Only usable with the merge utility.
Argument used to define that missing records should be converted to the reference allele. Only usable with the merge and concatenate utilities.
--out-format <vcf, vcf.gz, bcf>
Argument used to define the desired output format. Formats include: uncompressed VCF (vcf); compressed VCF (vcf.gz) [default]; and BCF (bcf). Only usable with the merge and concatenate utilities.
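The values accepted by --record-merge-mode (none, snps, indels, both, all, id) match those of the -m/--merge option of bcftools merge, which suggests, although this document does not say so, that the merge and concatenate utilities may wrap bcftools. Under that assumption only, the hypothetical helper below shows how the arguments could be translated into a bcftools command line, using the bcftools -O/--output-type codes (v = uncompressed VCF, z = compressed VCF, b = compressed BCF). It builds the command without running it.

```python
from typing import List, Optional, Sequence

# bcftools -O/--output-type codes for the formats listed above.
OUTPUT_TYPE = {"vcf": "v", "vcf.gz": "z", "bcf": "b"}
# Values accepted by bcftools merge -m/--merge that match --record-merge-mode.
MERGE_MODES = {"none", "snps", "indels", "both", "all", "id"}


def build_bcftools_command(utility: str, vcfs: Sequence[str], out: str,
                           out_format: str = "vcf.gz",
                           record_merge_mode: Optional[str] = None) -> List[str]:
    """Hypothetical translation of the CLI arguments into a bcftools call."""
    if utility not in ("merge", "concatenate"):
        raise ValueError("only merge and concatenate are sketched here")
    if out_format not in OUTPUT_TYPE:
        raise ValueError(f"unsupported output format: {out_format}")
    subcommand = "merge" if utility == "merge" else "concat"
    cmd = ["bcftools", subcommand, "-O", OUTPUT_TYPE[out_format], "-o", out]
    if record_merge_mode is not None:
        # --record-merge-mode is documented as merge-only.
        if utility != "merge":
            raise ValueError("--record-merge-mode is only usable with merge")
        if record_merge_mode not in MERGE_MODES:
            raise ValueError(f"unknown merge mode: {record_merge_mode}")
        cmd += ["-m", record_merge_mode]
    return cmd + list(vcfs)


print(build_bcftools_command(
    "merge", ["chr22.ceu.vcf.gz", "chr22.yri.vcf.gz"],
    "chr22.merged.vcf.gz", record_merge_mode="both"))
```

The resulting list could be passed to subprocess.run(cmd, check=True); note that bcftools merge expects bgzip-compressed, indexed input files.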