plink_linkage_disequilibrium.py: Linkage Disequilibrium Analysis¶
Automates the calculation of multiple LD statistics using Plink.
Command-line Usage¶
The LD statistics automator may be called using the following command:
plink_ld.py
Example usage¶
Command line to calculate r2 and the absolute value of Lewontin's D-prime statistic in table format:
plink_ld.py --ped-prefix hapmap1 --ld-format table --ld-statistic r2 --table-d-statistic dprime
Input Command-line Arguments¶
- --ped-prefix <input_prefix>
- Argument used to define the filename prefix shared by the ped file (.ped) and the map file (.map). Should not be used alongside the specific file arguments (e.g. --ped).
- --ped <ped_filename>
- Argument used to define the filename of the plink ped file (.ped). Must be called alongside --map. Cannot be called alongside --ped-prefix.
- --map <map_filename>
- Argument used to define the filename of the plink map file (.map). Must be called alongside --ped. Cannot be called alongside --ped-prefix.
- --binary-ped-prefix <input_prefix>
- Argument used to define the filename prefix shared by the binary ped file (.bed), the fam file (.fam), and the bim file (.bim). Should not be used alongside the specific file arguments (e.g. --binary-ped).
- --binary-ped <binary_ped_filename>
- Argument used to define the filename of the plink binary ped file (.bed). Must be called alongside --fam and --bim. Cannot be called alongside --binary-ped-prefix.
- --fam <fam_filename>
- Argument used to define the filename of the plink fam file (.fam). Must be called alongside --binary-ped and --bim. Cannot be called alongside --binary-ped-prefix.
- --bim <bim_filename>
- Argument used to define the filename of the plink bim file (.bim). Must be called alongside --binary-ped and --fam. Cannot be called alongside --binary-ped-prefix.
- --allow-extra-chr
- Argument used to force invalid chromosome names to be accepted.
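The combination rules for the input arguments above can be summarised as a small check. This is a minimal sketch, not the script's actual validation code: the function name `check_input_args` and the dictionary layout (option names without leading dashes) are assumptions made for illustration.

```python
def check_input_args(args):
    """Return a list of problems found in a dict of input options.

    `args` maps option names (without leading dashes) to values.
    This layout is hypothetical and used for illustration only.
    """
    problems = []
    given = {name for name, value in args.items() if value is not None}

    # Each tuple: (prefix option, file options that must appear together)
    groups = [
        ("ped-prefix", {"ped", "map"}),
        ("binary-ped-prefix", {"binary-ped", "fam", "bim"}),
    ]

    for prefix, files in groups:
        used_files = files & given
        if prefix in given and used_files:
            # A prefix option excludes its specific file options
            listed = ", ".join(f"--{name}" for name in sorted(used_files))
            problems.append(f"--{prefix} cannot be combined with {listed}")
        elif used_files and used_files != files:
            # Specific file options must be given as a complete set
            missing = ", ".join(f"--{name}" for name in sorted(files - used_files))
            problems.append(f"missing {missing}")
    return problems


print(check_input_args({"ped-prefix": "hapmap1"}))
print(check_input_args({"ped": "hapmap1.ped"}))
```

An empty list means the combination is consistent with the rules listed above; the second call reports that --map is missing.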
Output Command-line Arguments¶
- --out-format <output_format>
- Argument used to define the output format. Supported formats include: gzip compressed (gzipped); standard uncompressed (standard); single-precision binary (bin32); and double-precision binary (bin64). Please note that both binary formats are only supported with --ld-format square. By default, gzip-compressed files are produced.
- --out <output_filename>
- Argument used to define the complete output filename; overrides --out-prefix. Cannot be used if multiple output files are created.
- --out-prefix <output_prefix>
- Argument used to define the output prefix (i.e. filename without file extension).
- --overwrite
- Argument used to define if previous output should be overwritten.
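The restriction on binary output can be expressed as a short check. This is a minimal sketch under the rules stated above, not the script's own code; the function name `check_out_format` is hypothetical.

```python
BINARY_FORMATS = {"bin32", "bin64"}
OUT_FORMATS = {"gzipped", "standard"} | BINARY_FORMATS


def check_out_format(ld_format, out_format="gzipped"):
    """Validate an --out-format value against the chosen --ld-format.

    The default mirrors the documented default (gzip compressed).
    """
    if out_format not in OUT_FORMATS:
        raise ValueError(f"unknown --out-format: {out_format}")
    # Binary output is documented as supported only with the square format
    if out_format in BINARY_FORMATS and ld_format != "square":
        raise ValueError(f"--out-format {out_format} requires --ld-format square")
    return out_format


print(check_out_format("table"))
print(check_out_format("square", "bin32"))
```

Calling `check_out_format("table", "bin64")` raises a `ValueError`, since the binary formats are only documented for the square format.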
Basic LD Command-line Arguments¶
- --ld-statistic <r, r2>
- Argument used to define the correlation statistic to report. Two options are supported: the raw inter-variant allele count correlations (r) and squared correlations (r2).
- --ld-format <table, square, square-zero, triangle, inter-chr>
- Argument used to define the matrix result format. Five formats are supported: the matrix as a limited window in table format (table); a symmetric matrix (square); a square matrix in which the cells of the upper right triangle are zeroed out (square-zero); only the lower triangle of the matrix (triangle); the matrix with all pairs in a table (inter-chr).
- --ld-window-snps <snp_int>
- Argument used to define the maximum number of SNPs between LD comparisons.
- --ld-window-kb <snp_int>
- Argument used to define the maximum distance in kb between LD comparisons.
- --ld-window-cm <snp_int>
- Argument used to define the maximum distance in cM between LD comparisons.
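When calling the script from another Python program, the basic LD arguments can be assembled into an argument list. The sketch below is an illustration only: the helper name `build_ld_command` and the window values are assumptions, and only flags documented above are used.

```python
import shlex

LD_STATISTICS = ("r", "r2")
LD_FORMATS = ("table", "square", "square-zero", "triangle", "inter-chr")


def build_ld_command(ped_prefix, ld_statistic, ld_format,
                     window_snps=None, window_kb=None, window_cm=None):
    """Assemble a plink_ld.py argument list from the documented options."""
    if ld_statistic not in LD_STATISTICS:
        raise ValueError(f"unsupported --ld-statistic: {ld_statistic}")
    if ld_format not in LD_FORMATS:
        raise ValueError(f"unsupported --ld-format: {ld_format}")

    command = ["plink_ld.py", "--ped-prefix", ped_prefix,
               "--ld-statistic", ld_statistic, "--ld-format", ld_format]

    # Window limits are optional; add only those that were set
    windows = (("--ld-window-snps", window_snps),
               ("--ld-window-kb", window_kb),
               ("--ld-window-cm", window_cm))
    for flag, value in windows:
        if value is not None:
            command += [flag, str(value)]
    return command


# Arbitrary example values for the window limits
command = build_ld_command("hapmap1", "r2", "table",
                           window_snps=10, window_kb=1000)
print(shlex.join(command))
```

The resulting list could be handed to `subprocess.run(command, check=True)`, assuming plink_ld.py is available on the PATH.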
Table Command-line Arguments¶
Please note that the following arguments may only be used with --ld-format table.
- --table-d-statistic <dprime, dprime-signed, d>
- Argument used to add the specified D statistic to table-formatted results. Three options are supported: the absolute value of Lewontin's D-prime statistic (dprime); Lewontin's D-prime statistic (dprime-signed); and the value of D prior to division by Dmax (d).
- --table-in-phase
- Argument used to add in-phase allele pairs to table-formatted results.
- --table-maf
- Argument used to add MAF values to table-formatted results.
- --table-r2-threshold <r2_float>
- Argument used to define the r2 threshold for filtering variant pairs.
- --table-snp <snp_str> <snp1_str, snp2_str, etc.>
- Argument used to define one or more SNP(s) for LD analysis. This argument may be used multiple times if desired.
- --table-snps <snp_filename>
- Argument used to define a file with one or more SNP(s) for LD analysis.
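Since the table arguments are only valid with --ld-format table, a caller may want to detect misuse before running the script. The following is a minimal sketch; the function name `table_only_violations` and the dictionary layout (option names without leading dashes) are assumptions, not part of the script.

```python
TABLE_ONLY = ("table-d-statistic", "table-in-phase", "table-maf",
              "table-r2-threshold", "table-snp", "table-snps")


def _is_set(value):
    # Treat None and False as "not given"; 0.0 is a valid threshold
    return value is not None and value is not False


def table_only_violations(ld_format, options):
    """List table-only options that were set without --ld-format table."""
    if ld_format == "table":
        return []
    return [f"--{name}" for name in TABLE_ONLY if _is_set(options.get(name))]


print(table_only_violations("table", {"table-maf": True}))
print(table_only_violations("square", {"table-maf": True,
                                       "table-r2-threshold": 0.0}))
```

The first call returns an empty list; the second reports both table-only options, because the square format was requested.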