When learning to program, students’ efforts can be hampered by a variety of misconceptions pertaining to fundamental programming concepts, which can prevent them from developing appropriate mental models of these concepts. This can create a barrier to learning and subsequently impact students’ confidence. As such, it is necessary to identify students who are likely to require support with learning to program at the earliest opportunity. This investigation utilises data collected from 285 first-year undergraduate computer science students to examine the potential for using a pre-course aptitude test to predict the results of students’ first introductory programming assessment, thereby providing an indication of which students would benefit most from additional support from the outset. The aptitude test, which was developed as part of this investigation, collates information on students’ backgrounds and prior experiences, their perceived levels of confidence, and their likelihood of holding appropriate mental models for several core programming concepts. The data collected using the aptitude test were subsequently used to train a variety of regression and classification models to explore their potential for predicting students’ assessment results. This culminated in the selection of a Random Forest Regressor and a Random Forest Classifier, which were refined using Sequential Feature Selection and then validated against a holdout test set to assess their generalisability. The Random Forest Classifier achieved a good level of performance during training (AUC = 0.8688, F1 = 0.8353, accuracy = 0.7450). However, this performance reduced when evaluated on the holdout test set (AUC = 0.7670, F1 = 0.7020, accuracy = 0.7020), demonstrating a moderate degree of overfitting, likely due to an imbalance in the classes being predicted and the limited amount of data available.
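The classification pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic data, feature counts, and hyperparameters are all assumptions standing in for the real aptitude-test features and assessment outcomes.

```python
# Hypothetical sketch: a Random Forest Classifier refined with forward
# Sequential Feature Selection and evaluated (AUC, F1, accuracy) on a
# held-out test set. Synthetic data stands in for the aptitude-test
# responses; all sizes and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score

# Synthetic stand-in: 285 "students", 20 candidate features, and a
# binary outcome (e.g. above/below a pass threshold).
X, y = make_classification(n_samples=285, n_features=20,
                           n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)

# Forward selection keeps the feature subset that most improves
# cross-validated performance, mirroring the refinement step.
sfs = SequentialFeatureSelector(clf, n_features_to_select=8,
                                direction="forward", cv=5)
sfs.fit(X_train, y_train)
X_train_sel = sfs.transform(X_train)
X_test_sel = sfs.transform(X_test)

clf.fit(X_train_sel, y_train)
proba = clf.predict_proba(X_test_sel)[:, 1]
pred = clf.predict(X_test_sel)

print(f"AUC      = {roc_auc_score(y_test, proba):.4f}")
print(f"F1       = {f1_score(y_test, pred):.4f}")
print(f"accuracy = {accuracy_score(y_test, pred):.4f}")
```

The gap between training and holdout scores reported above is exactly what such a holdout evaluation is designed to expose.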
In contrast, the Random Forest Regressor exhibited a generally consistent level of performance between training (RMSE = 0.1616, MAE = 0.1209) and testing (RMSE = 0.1713, MAE = 0.1396). Although a sizeable margin of error remains, the results suggest that the Random Forest Regressor is not overfitting the data and has the potential to serve as a guide for identifying students who would benefit from additional support. This work contributes a novel pre-course aptitude-testing approach that integrates students’ mental models, background factors, and perceived levels of confidence to enable early identification of students who may require additional support through the prediction of introductory programming assessment results. As such, these findings provide a foundation for future work to develop targeted support interventions that can be integrated into introductory programming modules.
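The regression counterpart can be sketched in the same way. Again, this is only an illustrative assumption of the setup: the target is rescaled to [0, 1] to mimic a normalised assessment mark so that RMSE and MAE are on a comparable scale to the figures quoted above.

```python
# Hypothetical sketch: a Random Forest Regressor predicting a
# normalised assessment mark in [0, 1], reported with RMSE and MAE
# on a held-out test set. Synthetic data and hyperparameters are
# illustrative assumptions, not the study's actual setup.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

X, y = make_regression(n_samples=285, n_features=20, n_informative=8,
                       noise=10.0, random_state=42)
# Rescale the target to [0, 1] to mimic a normalised assessment mark.
y = (y - y.min()) / (y.max() - y.min())

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

reg = RandomForestRegressor(n_estimators=200, random_state=42)
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

# RMSE penalises large errors more heavily; MAE gives the typical
# absolute error, so MAE <= RMSE always holds.
rmse = np.sqrt(mean_squared_error(y_test, pred))
mae = mean_absolute_error(y_test, pred)
print(f"RMSE = {rmse:.4f}")
print(f"MAE  = {mae:.4f}")
```

Since the target lies in [0, 1], an RMSE of roughly 0.17 as reported above corresponds to a typical prediction error of about 17 percentage points on the assessment mark, which frames the "sizeable margin of error" concretely.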