<ns3:p>Background: Data preparation is a fundamental aspect of data engineering and a prerequisite for later tasks such as data visualization, reporting, and training machine learning models. Despite the recurring patterns in data transformation processes, the specific steps often vary depending on the project context, data sources, and application domain.
Methods: To address these challenges, this paper presents a flexible and extensible framework that enables the coordinated execution of modular data processing steps defined in a configuration file. By adopting a declarative, configuration-driven approach, the framework promotes modular, step-by-step development while substantially improving code reuse, maintainability, and adaptability. The framework also supports basic iterative execution constructs, such as loops and limited recursion, within the data pipeline definitions to accommodate more complex workflows.
Results: By enabling the reuse of existing code snippets, the framework shifts development effort toward enhancing and refining a shared code base, rather than repeatedly creating project-specific, disposable implementations. The long-term benefits of this approach become increasingly apparent as the system evolves: as more generalized modules and functions are developed, they reduce duplication and improve maintainability without sacrificing flexibility.
Conclusions: To assess the effectiveness of the framework, we apply cyclomatic complexity as a metric, demonstrating how the proposed approach affects development effort across several relatively simple, real-world data engineering scenarios.</ns3:p>
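The configuration-driven pipeline idea described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual framework: the registry, the configuration schema, and the function names (`STEP_REGISTRY`, `PIPELINE_CONFIG`, `run_pipeline`) are all hypothetical. It shows a declarative step list, executed in order, with a basic `loop` construct of the kind the abstract mentions.

```python
# Hypothetical sketch of a configuration-driven data pipeline runner.
# All names and the configuration schema are illustrative assumptions,
# not the framework's real API.

# Registry mapping step names (as referenced in the configuration)
# to reusable processing functions.
STEP_REGISTRY = {
    "strip_whitespace": lambda rows: [r.strip() for r in rows],
    "drop_empty": lambda rows: [r for r in rows if r],
    "uppercase": lambda rows: [r.upper() for r in rows],
}

# A declarative pipeline definition: plain data, not code.
# A "loop" entry repeats a sub-pipeline a fixed number of times,
# standing in for the iterative constructs the abstract describes.
PIPELINE_CONFIG = [
    {"step": "strip_whitespace"},
    {"step": "drop_empty"},
    {"loop": {"times": 1, "steps": [{"step": "uppercase"}]}},
]

def run_pipeline(config, data):
    """Execute the configured steps, in order, against `data`."""
    for entry in config:
        if "loop" in entry:  # basic iterative construct
            for _ in range(entry["loop"]["times"]):
                data = run_pipeline(entry["loop"]["steps"], data)
        else:
            data = STEP_REGISTRY[entry["step"]](data)
    return data

print(run_pipeline(PIPELINE_CONFIG, ["  alice ", "", "bob"]))
# → ['ALICE', 'BOB']
```

Because the pipeline is plain data, a new project can reuse every registered step and change only the configuration, which is the reuse effect the abstract attributes to the declarative approach.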