Context

This Zenodo repository stems from the publication "Targeted sequencing enhances detection of pangolin trafficking hotspots and dynamics of both domestic and global trade markets" by Heighton et al.

Article Summary

Pangolins are among the most trafficked mammals globally, facing critical threats from illegal wildlife trade and habitat loss. Effective conservation of wild pangolin populations requires the ability to genetically identify distinct lineages and trace the origins of seized individuals. To address this, we developed and applied a targeted gene-capture sequencing approach optimised for low-quality DNA, such as that from confiscated and museum specimens. Our bait set was designed to target genomic regions of high evolutionary and adaptive value across all eight extant pangolin species. Using this approach, we generated population-level genomic data for over 700 individuals, with a focus on the three most heavily traded species: the white-bellied (Phataginus tricuspis), Sunda (Manis javanica), and Chinese pangolins (Manis pentadactyla). We present a comprehensive, geo-referenced population genomics dataset spanning the range of these species, providing new insight into biogeographic population structure. Our findings reveal distinct regional trade patterns and highlight several international trafficking hotspots. Moreover, we detect overlap in the sourcing patterns of domestic and international trade, indicating that localised markets may feed into global trafficking networks. This dataset enhances our understanding of pangolin population structure, informs targeted interventions, and offers a framework for the future integration of seizure data into conservation planning.

Dataset & Code breakdown

1) Gene-capture references

We provide the gene-capture bait references for the Sunda (Manis javanica - "Manis_javanica_BAITS_Reference_40K_kit") and white-bellied (Phataginus tricuspis - "Phataginus_tricuspis_BAITS_Reference_40K_kit") pangolins.
These two references served as the basis for designing the gene-capture probes used for targeted sequencing. They were also used as the Asian and African references for mapping target-sequenced samples. Together they represent around 1,332 sequences (~1.2 Mbp of nuclear genome data), which were used to design 38,557 baits (70 nt long) at 3X tiling density and 98% sequence similarity (between the two groups). More details of the design can be found in the published article.

The target-sequenced raw reads of each sample used in this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB93883 (https://www.ebi.ac.uk/ena/browser/view/PRJEB93883).

2) RScripts & Underlying Data

We provide the RScripts used to analyse and plot the outputs of downstream analyses (post SNP-calling) for each species. The underlying data for these scripts are provided as examples so that you can run them on the white-bellied pangolin (Phataginus tricuspis). These relate to key figures in the article and include:

(i) Plotting ADMIXTURE barplots and mapping them as pie charts across the species range - "Heighton_admixture_and_pie_charts_maps.R"
Underlying data include "PTri_metadata_ADMIXTURE.txt" (sample metadata), "PTri_synthetic_ADMIXTURE.vcf" (a synthetic VCF file of 500 SNPs for running the scripts), and "PTri_PTri_ADMIXTURE.X.Q" (the ADMIXTURE output of admixture proportions for a given K; we provide files for the first 6 values of K).

(ii) Plotting trade tracing hotspots, their trade distances, and their error heatmaps across the species range - "Heighton_Tracing_trade.R"
Underlying data include "PTri_metadata_Locator_All.txt" (sample metadata) and "PTri_predicted_Locations_LargeRange_All.txt" (Locator 'predlocs' output files, concatenated into a single text file).

(iii) Measuring trade tracing error using a modified version of plot_locator (Battey et al. 2020 - https://doi.org/10.7554/eLife.54507) - "Heighton_plot_locator.R"
Underlying data include "PTri_metadata_Locator_LargeRange.txt" (sample metadata) and "PTri_pd_table.csv" (the collected, 'true' sampling location vs the sampling location predicted by Locator in its 'predlocs' output files).

Additional information about each file can be found in its corresponding script. Additional information on the analyses can be found in the manuscript.
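
To illustrate the kind of comparison described in (iii), the following is a minimal Python sketch that computes the great-circle distance between a 'true' and a predicted sampling location for each sample. It is not taken from "Heighton_plot_locator.R" and may differ from how the script measures error. The field names (sample_id, true_lon, true_lat, pred_lon, pred_lat) are hypothetical and do not necessarily match the columns of "PTri_pd_table.csv", and the coordinates in the example are made up for demonstration only.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tracing_errors(rows):
    """Map each sample ID to the distance (km) between true and predicted location.

    `rows` is an iterable of dicts using the hypothetical keys
    sample_id, true_lon, true_lat, pred_lon, pred_lat.
    """
    return {
        r["sample_id"]: haversine_km(
            r["true_lon"], r["true_lat"], r["pred_lon"], r["pred_lat"]
        )
        for r in rows
    }


if __name__ == "__main__":
    # Made-up coordinates, NOT values from the dataset.
    example = [
        {"sample_id": "S1", "true_lon": 9.7, "true_lat": 4.0,
         "pred_lon": 10.2, "pred_lat": 4.4},
        {"sample_id": "S2", "true_lon": -1.6, "true_lat": 6.7,
         "pred_lon": -1.6, "pred_lat": 6.7},
    ]
    errors = tracing_errors(example)
    for sample, km in errors.items():
        print(f"{sample}: {km:.1f} km")
    print(f"median error: {statistics.median(errors.values()):.1f} km")
```

With real data, the rows could be read from the CSV file with Python's csv.DictReader after mapping its actual column names to the hypothetical keys used here and converting the coordinates to floats.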