We developed a novel approximate likelihood-based method that can be applied to estimate haplotypes of freshwater-adaptive alleles in any Threespine Stickleback genome. The program takes information from multi-SNP loci, each made up of SNPs with polarized alleles; i.e., for each SNP within a multi-SNP haplotype, we have identified which allele increases in frequency in the freshwater environment. See the section "Jackpot carriers increased in frequency in Scout Lake" in the main paper and supplementary section 6 for more details on these loci.

There are two versions of the program, both provided in the code folder. The first (fw_caller_V1.py) takes BAM files and estimates the genotype likelihoods from the read data; it takes hours to run. The second (fw_caller_V2.py) takes a VCF with genotype probabilities (emitted by the imputation program Beagle 4.0) and is faster. As we stated in the main paper, both approaches produced similar haplotype genotypes.

The run, which should take less than a minute, reads NumPy arrays containing the genotype calls for all time points; these can be found in the folder /data/Genotypes. The NumPy arrays were generated with fw_caller_V1.py and can be used to generate panel F of Figure 1 and also Figure 2. The results from this run are the plots used to generate panel F of Figure 1. The BAM files used in our study have been uploaded to SRA under PRJNA1231081. I have also included a Beagle 4.0 imputed VCF for SC2014 as an example input for fw_caller_V2.py.

**The remaining scripts were used for other analyses in the paper, as indicated below.** We used the functions in the dadi.py module to generate the site frequency spectra in Figure 3C, Figure 5, and those in the supplementary section. Figure 4 was generated with relatedness.py.
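To give a feel for how a haplotype genotype might be called from per-SNP genotype probabilities at a polarized multi-SNP locus, here is a minimal sketch. It is not the code in fw_caller_V1.py or fw_caller_V2.py. The function name `call_haplotype_genotype`, the `floor` parameter, and the simplifying assumption that SNPs within a locus are perfectly linked and contribute independently to a composite likelihood are all hypothetical and chosen only for illustration. Each row holds the probabilities that an individual carries 0, 1, or 2 copies of the freshwater allele at one SNP.

```python
import math


def call_haplotype_genotype(snp_probs, floor=1e-6):
    """Return (best_genotype, normalized_likelihoods) for one multi-SNP locus.

    snp_probs: iterable of [P(0), P(1), P(2)] per SNP, where the index is the
    number of freshwater alleles (alleles are assumed to be polarized already).
    floor: hypothetical lower bound that keeps log(0) from occurring.
    """
    log_lik = [0.0, 0.0, 0.0]
    for probs in snp_probs:
        if len(probs) != 3:
            raise ValueError("each SNP needs three genotype probabilities")
        # Assumption: under perfect linkage, haplotype genotype g implies
        # SNP genotype g, so each SNP contributes log P(SNP genotype = g).
        for g in range(3):
            log_lik[g] += math.log(max(probs[g], floor))

    # Rescale by the maximum before exponentiating, for numerical stability.
    peak = max(log_lik)
    weights = [math.exp(value - peak) for value in log_lik]
    total = sum(weights)
    normalized = [w / total for w in weights]

    best = max(range(3), key=normalized.__getitem__)
    return best, normalized


# Made-up probabilities for three SNPs in one locus (illustration only).
example_locus = [
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
]
genotype, support = call_haplotype_genotype(example_locus)
print("copies of the freshwater haplotype:", genotype)
```

Summing log probabilities across SNPs lets a locus with several moderately informative SNPs yield a more decisive call than any single SNP would, which is the general motivation for working at the haplotype level. The actual programs may weight or model the SNPs differently.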