## Overview

This package contains the following files and folders:

- `data` - Folder containing raw LLM answers, code extracted from those answers, and simulation output plus run times for runnable LLM-generated models as well as for the NetLogo baseline.
- `00_extract_code.ipynb` - Code that extracts Python code from LLM-generated answers.
- `01_check_code.ipynb` - Code that performs a quick smoke test of each LLM-generated function to detect potential errors, yielding `results_temp.csv`.
- `02_run_models.ipynb` - Code that performs full runs of the LLM-generated models that did not error in the previous step, saving the simulation and run time results to the `data` folder.
- `03_model_comparison.ipynb` - Code that checks whether the simulation outputs of the LLM-generated implementations (those that did not error out) are statistically indistinguishable from the NetLogo baseline, yielding `results.csv`.
- `04_scores_analysis.ipynb` - Code that analyzes the execution and validation results, yielding `results_final.csv`.
- `05_time_analysis.ipynb` - Code that analyzes the execution time of the LLM-generated implementations, comparing it (for coarse context) with the runtime of the NetLogo baseline.
- `06_code_quality.ipynb` - Code that performs the static code analysis of the successful (score = 6) LLM-generated implementations of PPHPC.
- `LICENSE_CODE.txt` - The license for the included code.
- `LICENSE_DATA.txt` - The license for the included data.
- `nlexps.in.xml` - Configuration template for the NetLogo baseline model, used by `run_netlogo.sh`.
- `pphpc.nlogo` - The baseline NetLogo model, included for completeness.
- `prompt.txt` - The prompt used in this study.
- `README.md` - This file.
- `requirements.txt` - Specifies the Python dependencies required for running the included notebooks.
- `results.csv` - Scores (and errors, if any) for all tested models and seeds on a wide 1-9 scale.
- `results_final.csv` - Final scores (and errors, if any) for all tested models and seeds on a compacted (and more representative) 1-6 scale, as presented in the paper.
- `results_temp.csv` - Initial scores (and errors, if any) for all tested models and seeds, on a 1-7 scale (if an error occurred) or NA if the model runs and has to be further analyzed.
- `ruff_rules.txt` - The Ruff rules employed for detecting flaws and formatting errors in the code quality analysis.
- `run_netlogo.sh` - Script to perform the reproducible executions of the baseline NetLogo model.

## Reproducibility of data analysis

These datasets and workflows constitute the supplementary materials of the following research paper:

> Fachada, N., Fernandes, D., Fernandes, C. M., & Matos-Carvalho, J. P. (2026). Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study. arXiv preprint. https://doi.org/10.48550/arXiv.2602.10140

Specifically, the data and code quality analysis presented in the paper can be reproduced with the Jupyter notebooks included in this package.

## Licenses

The code in the Jupyter notebooks is made available under the MIT license (see `LICENSE_CODE.txt`). The non-code materials are made available under a CC-BY 4.0 license (see `LICENSE_DATA.txt`).
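As a sketch, one way to reproduce the analysis is to install the pinned dependencies and execute the numbered notebooks in order. This assumes a Unix-like shell, Python 3 with `venv`, and the package root as the working directory; the choice of `jupyter nbconvert` as the notebook runner is an assumption, as this package does not prescribe a specific tool.

```shell
# Hypothetical reproduction workflow; any tool that executes the
# notebooks in numeric order should work equally well.
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt   # install the required Python dependencies

# Run the notebooks in their numbered order (00 through 06), since each
# step consumes files produced by the previous one.
for nb in 0*.ipynb; do
    jupyter nbconvert --to notebook --execute --inplace "$nb"
done
```

The lexicographic `0*.ipynb` glob matches the numbered naming scheme of the notebooks, so the shell expands them in pipeline order without an explicit list.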