Data and code for the article "Population genetic structure and patterns of hybridization of the rare Lupinus tidestromii and its congener L. chamissonis (Fabaceae) inform seed sourcing strategies for population augmentation and reintroduction"

The RMarkdown Notebook "Lupinus_analysis_final.Rmd" contains code to replicate all analyses performed in R and to recreate most figures. Output from external software (ADMIXTURE, SplitsTree4) is read into R for further analysis and visualization. Figure 1 was created entirely in ArcGIS Pro. Figure 4 was created from ADMIXTURE output, which was processed in R and then loaded into ArcGIS Pro. Analyses were performed in RStudio 2023.03.0+386 "Cherry Blossom" using R version 4.3.0. All packages used in these analyses, with their version information, are specified in the RMarkdown file.

Overview of Contents

ADMIXTURE_results folder: contains all output from the ADMIXTURE analysis, including species-specific analyses in subfolders.
Figures folder: contains all figures produced for the journal article as .EPS files, plus subfolders containing .PDF and .PNG versions of all figures.
NewHybrids_PC folder: contains all output from the NewHybrids analysis performed in R using the implementation from dartR. To rerun these analyses, the NewHybrids source files should be placed in this folder.
vcf folder: contains the vcf files of sequence data from each step of filtering, as described in the Methods section of the journal article.
barcodes_samplenames.csv: identification information for each genetic sample, linking the sequence data to the other sample data.
cp_group.csv: output from the chloroplast DNA group identification analysis, linking each sample to a cpDNA group.
LTLC2023_site_data.csv: information about each sampled site.
Lupinus.dist.netout.nex: neighbor-net network output from the SplitsTree4 network analysis.
Lupinus.dist.nex: simple Euclidean distances between pairs of individuals, used as input to SplitsTree4 for network analysis.
Lupinus.query.Lupinus.megablast.out: output of a megablast search of the sequences in the reference.fasta file against the NCBI "nt" database (2025-03-31), in format 6, containing the fields qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids sscinames sblastnames.
Lupinus_names.csv: harmonizes sample names from the field data with the final sample names.
Lupinus_names_LC.RData: harmonized sample names for L. chamissonis samples only.
Lupinus_names_LT.RData: harmonized sample names for L. tidestromii samples only.
pie_chart_coords.csv: coordinates for plotting the pie charts from Figure 4 in ArcGIS Pro.
reference.fasta: de novo reference containing 36769 contig sequences generated by dDocent from ddRAD sequence data of 532 samples of Lupinus tidestromii and L. chamissonis.

Data Availability

Site and sample GPS coordinates have been anonymized by truncating them to the first decimal place to protect populations of the endangered L. tidestromii. This anonymization results in a discrepancy between the version of Figure 5 produced by this code and the version that may appear in publication. Exact location data can be made available upon reasonable request. Raw sequence data are available from the European Nucleotide Archive (https://www.ebi.ac.uk/ena) under accession number PRJEB72106 and individual sample numbers ERS17855825 - ERS17856375.
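
The megablast output listed above is a tab-separated table with a fixed column order, so it can be inspected without the R notebook. The following is a minimal sketch in Python, not part of the authors' workflow. It assumes the file has no header row and that its columns follow exactly the field order given above. The function names and the demonstration rows are hypothetical and invented purely for illustration; they are not taken from the dataset.

```python
import csv
from io import StringIO

# Column order as listed in the file description for the megablast output.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "staxids", "sscinames", "sblastnames",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_blast_tabular(handle):
    """Read tab-separated BLAST hits into a list of dicts with typed numeric fields."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    hits = []
    for row in reader:
        # Skip blank lines and comment lines, if any are present.
        if not row or row[0].startswith("#"):
            continue
        if len(row) != len(BLAST_COLUMNS):
            raise ValueError(f"expected {len(BLAST_COLUMNS)} fields, got {len(row)}")
        hit = dict(zip(BLAST_COLUMNS, row))
        for field in INT_FIELDS:
            hit[field] = int(hit[field])
        for field in FLOAT_FIELDS:
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Keep, for each query sequence, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Invented rows for illustration only (NOT real results from the dataset).
DEMO_ROWS = (
    "contig_1\tsubject_A\t95.0\t180\t9\t0\t1\t180\t20\t199\t1e-70\t280.0\t12345\tExample species A\texample group\n"
    "contig_1\tsubject_B\t99.0\t200\t2\t0\t1\t200\t5\t204\t1e-98\t360.0\t12346\tExample species B\texample group\n"
    "contig_2\tsubject_C\t90.5\t150\t14\t1\t3\t152\t40\t189\t2e-50\t210.0\t12347\tExample species C\texample group\n"
)

if __name__ == "__main__":
    demo_hits = parse_blast_tabular(StringIO(DEMO_ROWS))
    for query, hit in best_hit_per_query(demo_hits).items():
        print(query, hit["sseqid"], hit["sscinames"], hit["bitscore"])
```

To apply the same functions to the deposited file, open it in text mode and pass the file object to `parse_blast_tabular`. The fields staxids, sscinames, and sblastnames are kept as plain strings because a single hit can list several values in one field.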