This Evaluator requests expression predictions for sequences in THP-1 monocytes and Jurkat T cells. It then computes the log2 fold change (log2FC) between the alternate and reference sequence predictions, and evaluates performance by calculating the Pearson correlation between the measured and predicted values.

Included in Engreitz_evaluator.sif:
- Scripts to process the data and connect to predictors in the GAME API
- Scripts to parse the returned predictions and calculate performance metrics
- All software dependencies

/evaluator_data folder contents:
/Jurkat
- all_jurkat_sequences.tsv: SPDI ID; Reference Sequence; Alternate Sequence. Sequences are ~2000 bp long (depending on indel size).
- all_Jurkat.tsv: concatenated variants from individual variant files in SPDI format. 332 total variants.
/THP1
- all_THP1_sequences.tsv: SPDI ID; Reference Sequence; Alternate Sequences. Sequences are ~2000 bp long (depending on indel size).
- all_THP1.tsv: concatenated variants from individual variant files in SPDI format. 392 total variants.
/SPDI_toseq
- instructions.tsv: How to run the Rscript to pull sequences from SPDI IDs (hg38)
- .yaml to create the conda environment for the Rscript
- parse_Engreitz_data.py
- File Specification_VariantEffectsFiles.png: Details about information in the variant files

How to run:
apptainer run --containall -B /path_to/evaluator_data/:/evaluator_data -B /path_to/prediction_folder/:/predictions Engreitz_evaluator.sif HOST PORT /predictions

Notes:
- The main evaluator script (Engreitz_evaluator.py) reads the *_sequences.tsv files from their respective folders and sends two separate requests to the connected Predictor.
- Sequences with duplicated SPDI IDs are sent to the Predictors only once to minimize duplicated computation. The returned values are then merged with the measured values to calculate the correlation.

Additional information can be found on GitHub: Genomic API for Model Evaluation
Original publication: Martyn et al. (2025)
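The scoring steps described above (deduplicating SPDI IDs, computing log2FC from the alternate and reference predictions, merging with the measured values, and computing the Pearson correlation) can be sketched in a few lines of Python. This is an illustrative sketch only, not the code in Engreitz_evaluator.py: the function names, the in-memory data layout, and the placeholder IDs are hypothetical, and it assumes predictions are positive, linear-scale expression values. If a Predictor returns log-scale values, the fold change would be a difference rather than a ratio.

```python
import math


def unique_ids(measured_rows):
    """Return each SPDI ID once, in first-seen order (hypothetical helper)."""
    return list(dict.fromkeys(spdi for spdi, _ in measured_rows))


def log2_fold_change(ref_pred, alt_pred):
    """log2(alt / ref). Assumes both predictions are positive, linear-scale values."""
    return math.log2(alt_pred / ref_pred)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def evaluate(measured_rows, predictions):
    """Merge per-ID predicted log2FC back onto every measured row, then correlate.

    measured_rows: list of (spdi_id, measured_value); an ID may appear more than once.
    predictions:   dict mapping spdi_id -> (ref_prediction, alt_prediction),
                   one entry per unique ID.
    """
    predicted_lfc = {
        spdi: log2_fold_change(ref, alt) for spdi, (ref, alt) in predictions.items()
    }
    pairs = [(m, predicted_lfc[spdi]) for spdi, m in measured_rows if spdi in predicted_lfc]
    measured, predicted = zip(*pairs)
    return pearson(measured, predicted)


# Placeholder IDs and values, not real variants or measurements.
measured_rows = [("v1", 1.0), ("v2", -1.0), ("v1", 1.0), ("v3", 0.0)]
ids_to_request = unique_ids(measured_rows)  # each sequence pair is requested once
predictions = {"v1": (1.0, 2.0), "v2": (2.0, 1.0), "v3": (1.0, 1.0)}
r = evaluate(measured_rows, predictions)
```

Keeping the predictions keyed by SPDI ID means a repeated ID costs one request but still contributes one row per measurement to the correlation, which matches the behaviour described in the notes.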