# stat_sampler.py: STAT File Sampler

As a single statistic file may include far more loci/windows than a technique is capable of analyzing, it is often necessary to sample the loci/windows from the file. Given a statistic file and a sampling scheme, stat_sampler will generate a pseudorandomly sampled file.

In this illustration of the sampling process, the loci found within Data.VCF are pseudorandomly sampled using the coordinates found within the given statistic file.

Two pseudorandom sampling schemes are provided: i) a random sampler that randomly selects loci/windows and ii) a uniform sampler that samples evenly across equal-sized bins of the given statistic. Please note that all sampling is done without replacement.
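The two schemes can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used by stat_sampler.py: the function names `random_sample` and `uniform_sample` and the `seed` parameter are hypothetical, and "equal-sized bins" is assumed here to mean equal-width bins over the range of the statistic, with the same number of rows drawn from each bin.

```python
import random


def random_sample(values, sample_size, seed=None):
    """Pick sample_size row indices at random, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(values)), sample_size))


def uniform_sample(values, sample_size, bins, seed=None):
    """Draw the same number of rows from each equal-width bin of the statistic.

    Assumption: bins are equal-width intervals between the minimum and
    maximum statistic value; the real tool may define them differently.
    """
    if sample_size % bins != 0:
        raise ValueError("sample_size must be divisible by the number of bins")
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins

    # Assign every row index to the bin its statistic value falls in.
    binned = [[] for _ in range(bins)]
    for idx, value in enumerate(values):
        # The maximum value would land one past the end, so clip it into the last bin.
        b = min(int((value - lo) / width), bins - 1) if width > 0 else 0
        binned[b].append(idx)

    per_bin = sample_size // bins
    chosen = []
    for members in binned:
        if len(members) < per_bin:
            raise ValueError("a bin holds fewer rows than requested per bin")
        # random.Random.sample never returns the same element twice.
        chosen.extend(rng.sample(members, per_bin))
    return sorted(chosen)
```

With 20 samples and four bins, as in the uniform example further down, this sketch would draw five rows per bin. A bin with too few rows raises an error rather than silently returning a smaller sample, which is one possible design choice among several.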

For BED-based sampling, please see ../Utilities/bed_utilities.rst.

## Command-line Usage

The statistic sampler may be called using the following command:

stat_sampler.py


### Example usage

Randomly sample 20 windows from the windowed Fst statistic file merged_chr1_10000.windowed.weir.fst:

stat_sampler.py --statistic-file examples/files/merged_chr1_10000.windowed.weir.fst --calc-statistic windowed-weir-fst --sampling-scheme random --sample-size 20


Uniformly sample 20 windows across four bins from the windowed pi statistic file merged_chr1_10000.windowed.pi:

stat_sampler.py --statistic-file examples/files/merged_chr1_10000.windowed.pi --calc-statistic window-pi --sampling-scheme uniform --uniform-bins 4 --sample-size 20


## Input Command-line Arguments

--statistic-file <statistic_filename>
Argument used to define the filename of the statistic file for sampling.

## Output Command-line Arguments

--out <output_filename>
Argument used to define the complete output filename; overrides --out-prefix. Cannot be used if multiple output files are created.
--out-prefix <output_prefix>
Argument used to define the output prefix (i.e., the filename without its file extension).
--overwrite
Argument used to specify whether previous output should be overwritten.
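The way these three arguments interact can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual logic: the helper `resolve_output`, the default prefix `out`, and the `.sampled` extension are all hypothetical, and raising an error when the file exists without --overwrite is assumed behaviour.

```python
import os


def resolve_output(out=None, out_prefix="out", extension=".sampled", overwrite=False):
    """Return the output path implied by --out, --out-prefix and --overwrite.

    Hypothetical helper: the default prefix and extension are placeholders.
    """
    # --out is a complete filename and takes precedence over --out-prefix.
    path = out if out is not None else out_prefix + extension

    # Refuse to clobber earlier output unless --overwrite was given (assumed behaviour).
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} already exists; pass --overwrite to replace it")
    return path
```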