Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair

2025 · 2 citations · Journal Article

Authors

Qiong Feng · Nanjing University of Science and Technology

Xiaotian Ma · Nanjing University of Science and Technology

Jiayi Sheng · Nanjing University of Science and Technology

Ziyuan Feng · Nanjing University of Science and Technology

Wei Song · Nanjing University of Science and Technology

Peng Liang · Wuhan University

Abstract

LLMs have garnered considerable attention for their potential to streamline Automated Program Repair (APR). LLM-based approaches can either insert the correct code using an infilling-style technique or directly generate patches when provided with buggy methods, aiming for plausible patches that pass all tests. However, most LLM-based APR methods rely on a single type of software information, such as issue descriptions or error stack traces, without fully leveraging a combination of diverse software artifacts. Human developers, in contrast, often use a range of information (such as debugging data, issue discussions, and error stack traces) to diagnose and fix bugs. Despite this, many LLM-based approaches do not explore which specific types of software information best assist in localizing and repairing software bugs. Addressing this gap is crucial for advancing LLM-based APR techniques. To investigate this and mimic the way human developers fix bugs, we propose DEVLoRe (short for DEVeloper Localization and Repair). In this framework, LLMs first use issue content (description and discussion) and error stack traces to localize buggy methods, then rely on debug information from the buggy methods, together with issue content and error stack traces, to localize buggy lines and generate valid patches. We evaluated the effectiveness of issue content, error stack traces, and debugging information in bug localization and automatic program repair. Our results show that while issue content and error stack traces are particularly effective in assisting LLMs with fault localization and program repair, respectively, different types of software artifacts complement each other in addressing various bugs. By incorporating these three types of artifacts and using the Defects4J v2.0 dataset for evaluation, DEVLoRe successfully localizes 49.3% of single-method bugs and generates 56.0% plausible patches. Additionally, DEVLoRe can localize 47.6% of non-single-method bugs and generates 14.5% plausible patches. Moreover, our framework streamlines the end-to-end process from buggy source code to a complete repair, achieving repair rates of 39.7% and 17.1% for single-method and non-single-method bugs, respectively, and outperforming current state-of-the-art APR methods. Furthermore, we re-implemented and evaluated our framework on SWE-bench Lite, demonstrating its effectiveness by resolving 9 unique issues compared to other state-of-the-art frameworks that use the same or more advanced models. We also discussed whether a leading framework for Python code can be directly applied to Java code, or vice versa. The source code and experimental results of this work for replication are available at https://github.com/XYZboom/DEVLoRe.
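The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `BugArtifacts` container, the prompt wording, the function names, and the `llm` callable are all hypothetical, and for brevity the sketch folds buggy-line localization and patch generation into a single second-stage prompt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BugArtifacts:
    """Hypothetical container for the three artifact types named in the abstract."""
    issue_content: Optional[str] = None  # issue description and discussion
    stack_trace: Optional[str] = None    # error stack trace from a failing test
    debug_info: Optional[str] = None     # debugging data gathered in the buggy method


def _sections(pairs):
    # Keep only the artifacts that are actually available for this bug.
    return "\n\n".join(f"## {title}\n{body}" for title, body in pairs if body)


def method_localization_prompt(artifacts: BugArtifacts, candidates: list[str]) -> str:
    # Stage 1: issue content and stack trace only; debug info is not used yet.
    context = _sections([
        ("Issue content", artifacts.issue_content),
        ("Error stack trace", artifacts.stack_trace),
        ("Candidate methods", "\n".join(candidates)),
    ])
    return context + "\n\nName the method most likely to contain the bug."


def repair_prompt(artifacts: BugArtifacts, buggy_method_source: str) -> str:
    # Stage 2: all three artifacts plus the source of the localized method.
    context = _sections([
        ("Issue content", artifacts.issue_content),
        ("Error stack trace", artifacts.stack_trace),
        ("Debug information", artifacts.debug_info),
        ("Buggy method", buggy_method_source),
    ])
    return context + "\n\nIdentify the buggy lines and propose a patch."


def localize_and_repair(
    artifacts: BugArtifacts,
    methods: dict[str, str],      # method name -> source code (assumed input shape)
    llm: Callable[[str], str],    # any text-in, text-out model wrapper
) -> Optional[str]:
    answer = llm(method_localization_prompt(artifacts, list(methods))).strip()
    if answer not in methods:
        return None  # the model named a method we do not know about
    return llm(repair_prompt(artifacts, methods[answer]))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, and omitting empty sections makes it easy to compare runs that use different subsets of the artifacts.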

Topics & Keywords

Software Testing and Debugging Techniques · Software Reliability and Analysis Research · Software Engineering Research

Publication Details

Published in: ACM Transactions on Software Engineering and Methodology

DOI: 10.1145/3770581

Field-Weighted Citation Impact: 5.10
