Accurate prediction of IT project costs is crucial for successful project planning, budgeting, and resource allocation. However, conventional cost estimation methods, such as Function Point Analysis and expert-based evaluation, frequently fail to produce reliable estimates, especially in developing countries like Kazakhstan, where historical project data is scarce or incomplete. This study investigates how ensemble machine learning algorithms, notably Random Forest and Gradient Boosting, can be used to predict IT project costs when data is limited. To address the data shortage, the study applies synthetic data generation techniques, producing extended datasets that model varied project scenarios while retaining the statistical properties observed in real-world cases. The models take key project variables, such as team size, project complexity, development process, and project size, as inputs for cost prediction. Experimental results show that ensemble approaches outperform conventional estimation techniques in predictive accuracy. Random Forest achieved the lowest mean absolute error (MAE = 0.09) and the highest coefficient of determination (R² = 0.603). Furthermore, feature importance analysis shows that project size and development time are the most influential factors in cost estimation. The findings demonstrate the usefulness of ensemble learning for capturing complex, nonlinear relationships among project variables and offer a feasible approach to improving cost estimation when high-quality historical data is unavailable. This work contributes to the development of intelligent decision support systems and offers practical insights for IT project managers and policymakers in emerging economies who want to improve IT project budgeting and planning.
Published in: Herald of Kazakh-British technical university
Volume 23, Issue 1, pp. 107-116
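For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how mean absolute error and the coefficient of determination are commonly computed. It is not the authors' code: the function names and the sample values are hypothetical, and the assumption that costs are expressed on a normalized scale is ours, not stated in the abstract.

```python
# Illustrative only: standard textbook definitions of MAE and R^2.
# The sample data below is hypothetical, not taken from the study.

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical project costs, assumed to be on a normalized 0..1 scale.
actual = [0.20, 0.40, 0.60, 0.80]
predicted = [0.25, 0.35, 0.70, 0.70]

mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
print(f"MAE = {mae:.3f}, R2 = {r2:.3f}")
```

A lower MAE means predictions are closer to actual costs on average, while R² measures the share of variance in actual costs that the model explains, so the two metrics can rank models differently.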