In the rapidly evolving landscape of open-source software, code cleanliness has become a critical metric for assessing project quality and maintainability, and understanding how it evolves is essential for assessing long-term sustainability and identifying best practices. This study investigates trends in code cleanliness across representative projects from three major open-source communities (Apache, Google, and Spring), selected for their popularity and high commit frequency. We focus on indicators drawn from project-specific coding guidelines that are also relevant for evaluating AI-generated code: cyclomatic complexity, lines of code per function and per file, line length, and naming practices. By parsing abstract syntax trees, applying best-practice thresholds to classify modifications as positive or negative, and integrating diff information from commit logs, we establish precise mappings between functions and code lines before and after each modification. Our analysis shows that complexity and lines of code grow as projects mature, largely because of feature growth and fault-tolerance logic in distributed systems; overall, continuous refactoring by key contributors can curb this rise. Projects also differ in their coding standards and statement usage: conditional and exception-handling statements correlate with higher complexity, while variable declarations and expressions track method size and line length, suggesting that disciplined statement choices support cleaner code in practice.
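As a rough illustration of how AST-based metrics and best-practice thresholds can label a modification as positive or negative, the following is a minimal Python sketch, not the study's implementation. Several assumptions are made here. First, it parses Python source with Python's own `ast` module as a stand-in for parsing the Java code of the studied projects. Second, it uses a simplified McCabe count (one plus each decision point). Third, the threshold values, the "total excess over thresholds" scoring, and the names `THRESHOLDS`, `measure`, and `classify` are all hypothetical. Finally, it matches functions across versions by name instead of using the commit-diff line mapping described above.

```python
import ast
from dataclasses import dataclass

# Hypothetical thresholds; real values would come from each project's coding guidelines.
THRESHOLDS = {"complexity": 10, "loc": 50, "max_line_length": 100}

# Node types counted as one decision point each in this simplified McCabe count.
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


@dataclass
class FunctionMetrics:
    complexity: int
    loc: int
    max_line_length: int


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points (nested defs included)."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand in `a and b and c` adds one more branch.
            score += len(node.values) - 1
    return score


def measure(source: str) -> dict:
    """Map each function name to its metrics (hypothetical helper)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    result = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            body = lines[node.lineno - 1 : node.end_lineno]
            result[node.name] = FunctionMetrics(
                complexity=cyclomatic_complexity(node),
                loc=len(body),
                max_line_length=max(len(line) for line in body),
            )
    return result


def excess(m: FunctionMetrics, thresholds: dict) -> int:
    """Total amount by which a function exceeds the thresholds (hypothetical scoring)."""
    return sum(max(0, getattr(m, key) - limit) for key, limit in thresholds.items())


def classify(before: str, after: str, thresholds: dict = THRESHOLDS) -> dict:
    """Label functions present in both versions as positive, negative, or neutral changes."""
    old, new = measure(before), measure(after)
    labels = {}
    for name in old.keys() & new.keys():  # name matching stands in for diff-based mapping
        delta = excess(new[name], thresholds) - excess(old[name], thresholds)
        labels[name] = "positive" if delta < 0 else "negative" if delta > 0 else "neutral"
    return labels
```

For example, rewriting a function with an `if`/`elif` chain and an `and` condition (complexity 4) as a single arithmetic expression (complexity 1) would be labeled `"positive"` under a complexity threshold of 2, and the reverse change `"negative"`. Scoring by total excess, not by a count of violated thresholds, means that reducing complexity from 15 to 12 still registers as an improvement even though both values exceed a limit of 10.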