vcf_utilities.py: VCF Utilities
Automates various utilities for VCF-formatted files. These currently include: obtaining a list of the chromosomes within a VCF-based file, obtaining a list of the samples within a VCF-based file, concatenating multiple VCF-based files, merging multiple VCF-based files, and sorting a VCF-based file.
Command-line Usage
The VCF utilities function may be called using the following command:
python vcf_utilities.py
Example usage
Concatenate multiple VCF files:
python vcf_utilities.py --vcfs chr21.vcf.gz chr22.vcf.gz --utility concatenate
Merge multiple VCF files:
python vcf_utilities.py --vcfs chr22.ceu.vcf.gz chr22.yri.vcf.gz --utility merge
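When the script is driven from another Python program, the documented flags can be assembled into an argument list. The following is a minimal sketch, not part of the package: `build_command` and its parameters are hypothetical, the script is assumed to sit in the current directory as `vcf_utilities.py`, and the single-file utilities are assumed to take their input through `--vcf`.

```python
import sys


def build_command(utility, vcfs, out=None, script="vcf_utilities.py"):
    """Assemble an argument list for the documented command-line interface.

    Hypothetical helper: only the flags (--vcf, --vcfs, --utility, --out)
    come from the documentation; everything else is an assumption.
    """
    cmd = [sys.executable, script, "--utility", utility]
    if utility in ("concatenate", "merge"):
        # Multi-file utilities receive their inputs through --vcfs.
        cmd += ["--vcfs", *vcfs]
    else:
        # Assumption: single-file utilities receive one input through --vcf.
        if len(vcfs) != 1:
            raise ValueError(f"{utility} expects exactly one VCF file")
        cmd += ["--vcf", vcfs[0]]
    if out is not None:
        cmd += ["--out", out]
    return cmd


concat_cmd = build_command("concatenate", ["chr21.vcf.gz", "chr22.vcf.gz"])
# Show the command without the interpreter path.
print(" ".join(concat_cmd[1:]))
```

The resulting list can be passed to `subprocess.run(concat_cmd, check=True)` to execute the utility.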
Input Command-line Arguments
- --vcf <input_filename>
- Argument used to define the filename of the VCF file.
- --vcfs <input1_filename> <input2_filename> [...]
- Argument used to define the filename of the VCF file(s). May be used multiple times.
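An argument that accepts several filenames and may also be repeated can be expressed with `argparse`. This is an illustrative sketch only; the script's actual parser is not shown in this document, and the use of `action="extend"` (available from Python 3.8) is an assumption.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--vcf", help="Filename of a single VCF file")
# nargs="+" accepts several filenames after one flag; action="extend"
# accumulates the filenames when the flag is given more than once.
parser.add_argument("--vcfs", nargs="+", action="extend",
                    help="Filenames of one or more VCF files")

args = parser.parse_args(
    ["--vcfs", "chr21.vcf.gz", "chr22.vcf.gz", "--vcfs", "chrX.vcf.gz"]
)
print(args.vcfs)  # ['chr21.vcf.gz', 'chr22.vcf.gz', 'chrX.vcf.gz']
```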
Output Command-line Arguments
- --out <output_filename>
- Argument used to define the complete output filename. Overrides --out-prefix.
- --out-prefix <output_prefix>
- Argument used to define the output prefix (i.e., the filename without a file extension).
- --overwrite
- Argument used to define whether previous output should be overwritten.
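The precedence between these output arguments can be sketched as follows. `resolve_output` is a hypothetical helper: the default prefix, the way the extension is appended, and the decision to raise an error when a file exists without `--overwrite` are all assumptions rather than the script's confirmed behaviour.

```python
import os


def resolve_output(out=None, out_prefix="out", extension="vcf.gz",
                   overwrite=False):
    """Return the output filename following the documented precedence."""
    # --out is a complete filename and takes priority over --out-prefix.
    if out is not None:
        filename = out
    else:
        # Assumption: the extension is appended to the prefix with a dot.
        filename = f"{out_prefix}.{extension}"
    # Assumption: without --overwrite, existing output is never replaced.
    if os.path.exists(filename) and not overwrite:
        raise FileExistsError(
            f"{filename} exists; use --overwrite to replace it"
        )
    return filename
```

For example, `resolve_output(out_prefix="merged")` yields `merged.vcf.gz` under these assumptions, while `resolve_output(out="final.bcf", out_prefix="merged")` yields `final.bcf`.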
Utility Command-line Specification
- --utility <sample-list, chr-list, concatenate, merge, sort>
- Argument used to define the desired utility. Current utilities include: creating a file of the samples within the VCF (sample-list); creating a file of the chromosomes within the VCF (chr-list); combining multiple VCF files with different variants but the same samples (concatenate); combining multiple VCF files with different samples but the same variants (merge); and sorting a single VCF file (sort).
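To make the sample-list and chr-list outputs concrete, the sketch below extracts the same information from the lines of an uncompressed VCF. It relies only on the VCF layout (sample names follow the nine fixed columns of the `#CHROM` header line) and is not the script's implementation; the function names and the example records are made up for illustration.

```python
def list_samples(vcf_lines):
    """Return the sample names found on the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Nine fixed columns (CHROM .. FORMAT) precede the sample names.
            return line.rstrip("\n").split("\t")[9:]
    return []


def list_chromosomes(vcf_lines):
    """Return chromosome names in order of first appearance."""
    chroms = []
    for line in vcf_lines:
        # Skip meta-information, header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in chroms:
            chroms.append(chrom)
    return chroms


# Made-up records used purely for illustration.
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_A\tsample_B\n",
    "chr21\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n",
    "chr21\t250\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t0/1\n",
    "chr22\t75\t.\tG\tA\t.\tPASS\t.\tGT\t0/0\t0/1\n",
]
print(list_samples(example_vcf))      # ['sample_A', 'sample_B']
print(list_chromosomes(example_vcf))  # ['chr21', 'chr22']
```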
Additional Utility Command-line Arguments
- --record-merge-mode <none, snps, indels, both, all, id>
- Argument used to define the type of multiallelic records to create. Only usable with the merge utility.
- --record-missing-as-ref
- Argument used to define that missing records should be converted to the reference allele. Only usable with the merge and concatenate utilities.
- --out-format <vcf, vcf.gz, bcf>
- Argument used to define the desired output format. Formats include: uncompressed VCF (vcf); compressed VCF (vcf.gz) [default]; and BCF (bcf). Only usable with the merge and concatenate utilities.
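The effect of --record-missing-as-ref during a merge can be pictured with a toy data model. This is a conceptual sketch only: it assumes that a "missing record" is a variant absent from one input file and that it is filled with a homozygous-reference genotype (`0/0`) instead of a missing genotype (`./.`). The function, the dictionary layout, and the sample names are hypothetical and do not reflect how the script stores or merges records.

```python
def merge_genotypes(file_a, file_b, missing_as_ref=False):
    """Merge two {variant: {sample: genotype}} tables with different samples."""
    samples_a = sorted({s for gts in file_a.values() for s in gts})
    samples_b = sorted({s for gts in file_b.values() for s in gts})
    # Genotype written for samples whose file lacks the variant.
    fill = "0/0" if missing_as_ref else "./."
    merged = {}
    for variant in sorted(set(file_a) | set(file_b)):
        row = {}
        for samples, table in ((samples_a, file_a), (samples_b, file_b)):
            record = table.get(variant, {})
            for sample in samples:
                row[sample] = record.get(sample, fill)
        merged[variant] = row
    return merged


# Made-up genotypes: the second variant is absent from the second file.
ceu = {("chr22", 100): {"ceu_1": "0/1"}, ("chr22", 200): {"ceu_1": "1/1"}}
yri = {("chr22", 100): {"yri_1": "0/0"}}

print(merge_genotypes(ceu, yri)[("chr22", 200)])
# {'ceu_1': '1/1', 'yri_1': './.'}
print(merge_genotypes(ceu, yri, missing_as_ref=True)[("chr22", 200)])
# {'ceu_1': '1/1', 'yri_1': '0/0'}
```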