eigenstrat_fstats.py: F-statistics Analysis

Automates the calculation of several admixture statistics: Patterson's D, the F4 statistic, the F4-ratio statistic, and the F3 statistic.

Command-line Usage

The admixture statistics automator may be called using the following command:

eigenstrat_fstats.py

Example usage

Command line to calculate Patterson's D:

eigenstrat_fstats.py --eigenstrat-prefix snps --calc-admix-statistic D --admix-w-pop French --admix-x-pop Yoruba --admix-y-pop Vindija --admix-z-pop Chimp

Command line to calculate the F4-ratio:

eigenstrat_fstats.py --eigenstrat-prefix snps --calc-admix-statistic F4-ratio --admix-a-pop Altai --admix-b-pop Vindija --admix-c-pop Yoruba --admix-x-pop French --admix-o-pop Chimp

Dependencies

Input Command-line Arguments

--eigenstrat-prefix <input_prefix>
Argument used to define the filename prefix shared by the genotype file (.geno), the individual file (.ind), and the SNP file (.snp). Should not be used alongside the specific file arguments (e.g. --geno).
--geno <geno_filename>
Argument used to define the filename of the eigenstrat genotype file (.geno). Must be called alongside --ind and --snp. Cannot be called alongside --eigenstrat-prefix.
--ind <ind_filename>
Argument used to define the filename of the eigenstrat individual file (.ind). Must be called alongside --geno and --snp. Cannot be called alongside --eigenstrat-prefix.
--snp <snp_filename>
Argument used to define the filename of the eigenstrat SNP file (.snp). Must be called alongside --geno and --ind. Cannot be called alongside --eigenstrat-prefix.
--model-file <model_filename>
Argument used to define the model file. Please note that this argument cannot be used with the individual-based filters.
--model <model_str>
Argument used to define the model (i.e. the individual(s) to include and/or the populations for relevant statistics). May be used with any statistic. Please note that this argument cannot be used with the --pop-file argument or the individual-based filters.
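
The relationship between --eigenstrat-prefix and the three explicit file arguments can be sketched in Python as follows. This is a minimal illustration, not the script's own code: the function name resolve_eigenstrat_files and its error messages are hypothetical, and it assumes the prefix is joined directly to the .geno, .ind, and .snp extensions.

```python
def resolve_eigenstrat_files(prefix=None, geno=None, ind=None, snp=None):
    """Return (geno, ind, snp) filenames from a prefix or three explicit files.

    Hypothetical helper: mirrors the rules described above, not the
    script's actual implementation.
    """
    explicit = [geno, ind, snp]
    if prefix is not None:
        # The prefix form excludes the specific file arguments.
        if any(name is not None for name in explicit):
            raise ValueError(
                "--eigenstrat-prefix cannot be combined with --geno/--ind/--snp"
            )
        return (prefix + ".geno", prefix + ".ind", prefix + ".snp")
    # Without a prefix, all three files must be given together.
    if any(name is None for name in explicit):
        raise ValueError("--geno, --ind, and --snp must be given together")
    return (geno, ind, snp)


print(resolve_eigenstrat_files(prefix="snps"))
```

Under these assumptions, the prefix snps used in the examples above would refer to snps.geno, snps.ind, and snps.snp.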

Output Command-line Arguments

--out <output_filename>
Argument used to define the complete output filename; overrides --out-prefix. Cannot be used if multiple output files are created.
--out-prefix <output_prefix>
Argument used to define the output prefix (i.e. filename without file extension).
--overwrite
Argument used to specify that previous output should be overwritten.

Statistic Command-line Specification

--calc-admix-statistic <D, F4, F4-ratio, F3>
Argument used to define the admixture statistic to be calculated: Patterson's D (D), F4 statistic (F4), F4-ratio statistic (F4-ratio), or F3 statistic (F3). See below for details on the arguments required by each statistic.
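
For orientation, the W, X, Y, and Z labels used by D and F4 can be read against the standard frequency-based definitions of these statistics (Patterson et al. 2012). The sketch below is an assumption-laden illustration rather than the script's estimator: the function names f4 and patterson_d are hypothetical, the inputs are per-SNP allele frequencies already computed for each population, and bias corrections and block-jackknife standard errors are omitted.

```python
def f4(w, x, y, z):
    """Mean over SNPs of (w - x) * (y - z), given per-SNP allele frequencies."""
    terms = [(wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z)]
    return sum(terms) / len(terms)


def patterson_d(w, x, y, z):
    """Normalized form of F4: the same numerator summed over SNPs,
    divided by sum((w + x - 2wx) * (y + z - 2yz))."""
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    return num / den


# Toy allele frequencies at two SNPs (illustrative values only).
w = [1.0, 0.0]
x = [0.0, 0.0]
y = [1.0, 0.5]
z = [0.0, 0.5]
print(f4(w, x, y, z), patterson_d(w, x, y, z))
```

Both statistics share the numerator (w - x)(y - z), which is why they take the same four population labels.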

Statistic Command-line Requirements

Each admixture statistic requires a specific set of population-label arguments. These labels specify which population represents each position in the statistic. For instance, the argument '--admix-w-pop CEU' assigns the CEU population to the W label of Patterson's D and the F4 statistic. The arguments are described under Additional Statistic Command-line Arguments.

--calc-admix-statistic D
Requires: --admix-w-pop/--admix-w-pop-file, --admix-x-pop/--admix-x-pop-file, --admix-y-pop/--admix-y-pop-file, and --admix-z-pop/--admix-z-pop-file.
--calc-admix-statistic F4
Requires: --admix-w-pop/--admix-w-pop-file, --admix-x-pop/--admix-x-pop-file, --admix-y-pop/--admix-y-pop-file, and --admix-z-pop/--admix-z-pop-file.
--calc-admix-statistic F4-ratio
Requires: --admix-a-pop/--admix-a-pop-file, --admix-b-pop/--admix-b-pop-file, --admix-c-pop/--admix-c-pop-file, --admix-x-pop/--admix-x-pop-file, and --admix-o-pop/--admix-o-pop-file.
--calc-admix-statistic F3
Requires: --admix-a-pop/--admix-a-pop-file, --admix-b-pop/--admix-b-pop-file, and --admix-c-pop/--admix-c-pop-file.
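
The requirements listed above can be summarized as a lookup table. The following sketch is illustrative only: the names REQUIRED_LABELS and missing_labels are hypothetical, and a label is assumed to count as supplied whether it came from the --admix-<label>-pop or the --admix-<label>-pop-file form.

```python
# Population labels required by each statistic, as listed above.
REQUIRED_LABELS = {
    "D": ["w", "x", "y", "z"],
    "F4": ["w", "x", "y", "z"],
    "F4-ratio": ["a", "b", "c", "x", "o"],
    "F3": ["a", "b", "c"],
}


def missing_labels(statistic, supplied):
    """Return the required labels (in documented order) absent from `supplied`."""
    return [label for label in REQUIRED_LABELS[statistic] if label not in supplied]


# Example: an F4-ratio call that only set A, B, and X.
print(missing_labels("F4-ratio", {"a", "b", "x"}))
```

A check of this kind would report C and O as missing for the example call, since the F4-ratio needs all five labels.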

Additional Statistic Command-line Arguments

--admix-w-pop <w_pop_str> <w_pop1_str, w_pop2_str, etc.>
Argument used to define the population(s) to represent W in the supported admixture statistic. This argument may be used multiple times if desired. If multiple populations are given, the statistic is repeated until each population has represented W.
--admix-w-pop-file <w_pop_filename>
Argument used to define a file of population(s) to represent W in the supported admixture statistic. If the file lists multiple populations, the statistic is repeated until each population has represented W.
--admix-x-pop <x_pop_str> <x_pop1_str, x_pop2_str, etc.>
Argument used to define the population(s) to represent X in the supported admixture statistic. This argument may be used multiple times if desired. If multiple populations are given, the statistic is repeated until each population has represented X.
--admix-x-pop-file <x_pop_filename>
Argument used to define a file of population(s) to represent X in the supported admixture statistic. If the file lists multiple populations, the statistic is repeated until each population has represented X.
--admix-y-pop <y_pop_str> <y_pop1_str, y_pop2_str, etc.>
Argument used to define the population(s) to represent Y in the supported admixture statistic. This argument may be used multiple times if desired. If multiple populations are given, the statistic is repeated until each population has represented Y.
--admix-y-pop-file <y_pop_filename>
Argument used to define a file of population(s) to represent Y in the supported admixture statistic. If the file lists multiple populations, the statistic is repeated until each population has represented Y.
--admix-z-pop <z_pop_str> <z_pop1_str, z_pop2_str, etc.>
Argument used to define the population(s) to represent Z in the supported admixture statistic. This argument may be used multiple times if desired. If multiple populations are given, the statistic is repeated until each population has represented Z.
--admix-z-pop-file <z_pop_filename>
Argument used to define a file of population(s) to represent Z in the supported admixture statistic. If the file lists multiple populations, the statistic is repeated until each population has represented Z.
--admix-a-pop <a_pop_str> <a_pop1_str, a_pop2_str, etc.>
Argument used to define the population(s) to represent A in the supported admixture statistic. This argument may be used multiple times if desired. If multiple populations are given, the statistic is repeated until each population has represented A.
--admix-a-pop-file <a_pop_filename>
Argument used to define a file of population(s) to represent A in the supported admixture statistic. If the file lists multiple populations, the statistic is repeated until each population has represented A.
--admix-b-pop <b_pop_str> <b_pop1_str, b_pop2_str, etc.>
Argument used to define the population(s) to represent B in the supported admixture statistic. This argument may be used multiple times if desired. If multiple populations are given, the statistic is repeated until each population has represented B.
--admix-b-pop-file <b_pop_filename>
Argument used to define a file of population(s) to represent B in the supported admixture statistic. If the file lists multiple populations, the statistic is repeated until each population has represented B.
--admix-c-pop <c_pop_str> <c_pop1_str, c_pop2_str, etc.>
Argument used to define the population(s) to represent C in the supported admixture statistic. This argument may be used multiple times if desired. If multiple populations are given, the statistic is repeated until each population has represented C.
--admix-c-pop-file <c_pop_filename>
Argument used to define a file of population(s) to represent C in the supported admixture statistic. If the file lists multiple populations, the statistic is repeated until each population has represented C.
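
The repetition described for these arguments can be pictured as iterating over population assignments. The sketch below is an assumption, not a description of the script's internals: the helper name expand_population_sets is hypothetical, and it assumes that when several labels each receive multiple populations, every combination is run (a Cartesian product). The text above only states that each population takes a turn representing its label.

```python
from itertools import product


def expand_population_sets(**pops_by_label):
    """Yield one {label: population} assignment per run of the statistic.

    Hypothetical helper: assumes all combinations across labels are run.
    """
    labels = list(pops_by_label)
    for combo in product(*(pops_by_label[label] for label in labels)):
        yield dict(zip(labels, combo))


# Two populations for W, one each for X, Y, and Z: two runs of the statistic.
runs = list(
    expand_population_sets(
        w=["French", "Han"], x=["Yoruba"], y=["Vindija"], z=["Chimp"]
    )
)
for run in runs:
    print(run)
```

In this example, Han is an illustrative second population; the statistic would be computed once with French as W and once with Han as W, with X, Y, and Z unchanged.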