Four Gamete Test Function

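The four-gamete test is a method for determining whether recombination has occurred between a pair of variants. Each individual's phased haplotype is defined by its alleles at the two sites. For two biallelic sites at most four such haplotypes (gametes) are possible, and the pair fails the test when all four are observed.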
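The pairwise check described above can be sketched in a few lines of Python. This is a minimal illustration, not the four_gamete implementation: the function name `passes_four_gamete` and the sample data are hypothetical, sites are assumed to be biallelic, and missing data is not handled.

```python
def passes_four_gamete(site_a, site_b):
    """Return True if fewer than four distinct two-site haplotypes occur.

    Each argument is a sequence of alleles, one entry per phased haplotype,
    in the same haplotype order. Assumes biallelic sites and no missing data.
    """
    gametes = set(zip(site_a, site_b))
    return len(gametes) < 4


# Hypothetical alleles at three sites across six phased haplotypes.
site_1 = [0, 0, 1, 1, 0, 1]
site_2 = [0, 0, 1, 1, 1, 1]  # with site_1: gametes 00, 11, 01 (three)
site_3 = [0, 1, 0, 1, 0, 1]  # with site_1: gametes 00, 01, 10, 11 (four)

print(passes_four_gamete(site_1, site_2))  # True
print(passes_four_gamete(site_1, site_3))  # False
```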


In this illustration of the four-gamete test, the haplotypes of the samples from 197337 to 199256 (highlighted in green) pass the four-gamete test. In comparison, the haplotypes from 196944 to 197337 and from 199256 to 199492 (highlighted in red) both fail the test because all four possible haplotypes are observed.

Given phased input with individual variants over a region of the genome, four_gamete finds an interval within those variants that passes the four-gamete filtering criteria, then returns either that interval or an output file containing the variants in that interval.
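One way to picture the interval search is a left-to-right scan that grows a run of sites for as long as every pair inside it passes the test. The sketch below is an assumption about how such a scan could look, not a description of four_gamete's internals. The names `compatible` and `compatible_intervals`, the positions, and the haplotype data are all hypothetical.

```python
def compatible(col_a, col_b):
    """True if two biallelic sites show fewer than four gametes."""
    return len(set(zip(col_a, col_b))) < 4


def compatible_intervals(positions, columns):
    """Return maximal (first_pos, last_pos) runs of mutually compatible sites.

    positions: sorted site positions.
    columns:   one allele list per site, each ordered by haplotype.
    """
    intervals = []
    n = len(positions)
    prev_end = -1
    for start in range(n):
        end = start
        # Extend while the next site is compatible with every site in the run.
        while end + 1 < n and all(
            compatible(columns[k], columns[end + 1])
            for k in range(start, end + 1)
        ):
            end += 1
        # A run ending no further right than the previous one is nested in it.
        if end > prev_end:
            intervals.append((positions[start], positions[end]))
            prev_end = end
    return intervals


# Hypothetical data: four sites, six phased haplotypes.
positions = [100, 200, 300, 400]
columns = [
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
print(compatible_intervals(positions, columns))
# [(100, 100), (200, 300), (400, 400)]
```

In this toy data only the middle pair of sites is compatible, so the scan reports one two-site interval flanked by single-site intervals. A real run would then filter these by the minimum informative site count.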

A common use of this function is to input a VCF file that contains variants for individuals at a single locus; the output is a VCF that contains a subset of these variants. A full VCF can be used with --vcfreg, where the second argument is a BED file with one or more regions. In that case the output is either a VCF for the four-gamete-passing regions or a new BED file with the truncated regions.
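When comparing VCF positions with BED regions, keep in mind that VCF positions are 1-based while BED intervals are 0-based and half-open. The helper below is a hypothetical sketch of that conversion for a passing interval. Whether four_gamete writes its truncated BED regions in exactly this way is an assumption, and `interval_to_bed_line` is not part of the tool.

```python
def interval_to_bed_line(chrom, first_pos, last_pos):
    """Format a 1-based inclusive VCF interval as a 0-based half-open BED line."""
    return f"{chrom}\t{first_pos - 1}\t{last_pos}"


# Hypothetical chromosome name; positions are used only as example values.
print(interval_to_bed_line("chr1", 197337, 199256))
```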

Input Arguments

--vcfs <input_vcf_1>...<input_vcf_n>
Input name of one or more VCF files, where each VCF represents a locus.
--vcfreg <input_vcf> <BED file>
Input name of VCF file containing genome data and name of BED file with regions to be analyzed.

Output Arguments

--out <output_filename>
Name for output file.
--out-prefix <output_prefix>
If multiple files are output, this option is required to set a prefix for the output files.

Interval Arguments

--numinf <minimum informative site count>
The returned region must have at least n informative sites (default: 1).
If set, returns intervals with at least one recombination event instead of regions with no recombination.
This script will generate a list of valid regions with no recombination. Selecting this option will return a single interval as specified by other arguments
Returns all valid intervals, either as a list of intervals or multiple output files

Single Returned Region Arguments

Select one of: --rani

Returns random interval (default)
Returns random interval, with probability of interval proportional to interval length
Return first interval with enough informative sites
Return last interval with enough informative sites
Return interval with most informative sites
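The five selection rules listed above can be sketched as a single function. This is an illustrative sketch only: `pick_interval`, its mode names, and the example intervals are hypothetical and do not correspond to the tool's actual flags. Interval length is assumed to be measured in base pairs.

```python
import random


def pick_interval(intervals, mode="random", min_inf=1, rng=random):
    """Choose one interval from a list sorted by start position.

    intervals: list of (start, end, n_informative) tuples.
    mode:      hypothetical names for the five selection rules.
    min_inf:   minimum informative site count for an eligible interval.
    """
    eligible = [iv for iv in intervals if iv[2] >= min_inf]
    if not eligible:
        return None
    if mode == "random":
        return rng.choice(eligible)
    if mode == "random_by_length":
        # Probability of each interval is proportional to its length.
        weights = [end - start + 1 for start, end, _ in eligible]
        return rng.choices(eligible, weights=weights, k=1)[0]
    if mode == "first":
        return eligible[0]
    if mode == "last":
        return eligible[-1]
    if mode == "most_informative":
        return max(eligible, key=lambda iv: iv[2])
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical intervals: (start, end, informative site count).
intervals = [(100, 100, 1), (200, 300, 3), (400, 450, 2)]
print(pick_interval(intervals, "first", min_inf=2))             # (200, 300, 3)
print(pick_interval(intervals, "last", min_inf=2))              # (400, 450, 2)
print(pick_interval(intervals, "most_informative", min_inf=2))  # (200, 300, 3)
```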

Other Arguments

Removes multi-allelic sites from analysis
Include sites with missing data in analysis
Extend region to include non-informative variants between an edge variant and a variant that breaks the four-gamete criteria
Include informative variants from overlapping regions