Early prediction of students at risk in a virtual learning environment using ensemble machine learning techniques
Date
2021-12-13
Authors
Soobramoney, Ranjin
Abstract
Students at risk (SAR) are those students who are considered to have a higher probability of
failing academically or dropping out of an academic programme. The literature reveals that
SAR is a global problem at Higher Education Institutions (HEIs). A high failure rate not only
harms the reputation of HEIs but, if left unchecked, can be detrimental to them.
The problem of identifying SAR is a pervasive and persistent one. However, early
identification of SAR will allow for timely and focused interventions, thereby reducing the
problem. Various techniques have been used by HEIs to identify SAR. The traditional
statistical approach is one such technique. One of the key challenges with this technique,
however, is that it often requires a large amount of manual analysis of the data to predict SAR,
which in turn makes early prediction of SAR more computationally challenging. To
overcome some of the challenges of the traditional statistical approach, machine learning-based
techniques have been proffered to predict SAR. Since machine learning (ML) models are based
on the input data rather than the underlying problem, they are expected to have better predictive
capabilities than traditional statistical models. Several ML-based techniques have been applied
to predict SAR with varying degrees of success. This study proposes the use of ensemble ML
techniques for early and accurate prediction of SAR using students’ demographic and weekly
online Virtual Learning Environment (VLE) data. Aggregating the predictions of a group of
ML classifiers is expected to provide better generalization performance than any
individual classifier on its own. The use of ensemble ML techniques in this study is therefore
expected to provide an improved solution to the problem of predicting SAR. To this end, this study focused
on training forty different ML predictive models, one for each week of the semester, using
twenty-five different ML classifiers. Each model was trained using students’ demographic data
combined with data from their weekly interactions with a VLE. Based on the training results,
four classifiers, namely AdaBoostClassifier, LGBMClassifier, RandomForestClassifier, and
XGBClassifier, were selected as base learners for the ensemble classifier. Hyperparameter
optimization was performed using Random Search on each of the four classifiers. These
classifiers were then used to create a voting classifier ensemble for each of the forty weeks,
with 10-fold cross-validation used to evaluate the predictive models. The results show
that the voting classifier ensemble method outperformed the individual classifiers overall across
the forty weeks and can thus provide an improved solution to the problem of predicting SAR.
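
The aggregation step described above can be illustrated with a minimal sketch. The abstract does not state whether hard (majority) or soft (probability-averaging) voting was used, so both are shown here as assumptions. The function names `hard_vote` and `soft_vote`, the 0.5 threshold, and the toy predictions are hypothetical and are not taken from the thesis. The sketch uses only the Python standard library and stands in for the trained base learners with hand-written outputs.

```python
from collections import Counter
from statistics import mean


def hard_vote(labels_per_classifier):
    """Majority vote over class labels.

    labels_per_classifier[c][s] is the label that classifier c assigns
    to student s (1 = at risk, 0 = not at risk).
    """
    n_students = len(labels_per_classifier[0])
    combined = []
    for s in range(n_students):
        votes = Counter(clf[s] for clf in labels_per_classifier)
        # Pick the most frequent label; how ties are broken is an
        # implementation detail of this sketch, not of the thesis.
        combined.append(votes.most_common(1)[0][0])
    return combined


def soft_vote(probs_per_classifier, threshold=0.5):
    """Average each classifier's predicted probability of 'at risk',
    then label a student at risk when the mean reaches the threshold."""
    n_students = len(probs_per_classifier[0])
    averaged = [
        mean(clf[s] for clf in probs_per_classifier)
        for s in range(n_students)
    ]
    return [1 if p >= threshold else 0 for p in averaged]


# Hypothetical outputs of four base learners for three students in one week.
example_labels = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
]
example_probs = [
    [0.90, 0.20, 0.60],
    [0.70, 0.40, 0.40],
    [0.40, 0.10, 0.70],
    [0.80, 0.60, 0.55],
]

if __name__ == "__main__":
    print(hard_vote(example_labels))
    print(soft_vote(example_probs))
```

In a setup like the one described, one such ensemble would be built per week, with each base learner trained on the demographic and VLE data available up to that week.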
Description
Submitted in fulfillment of the requirements for the Degree of Master of Information and Communication Technology, Durban University of Technology, Durban, South Africa, 2021.
Keywords
Students at Risk, Ensemble learning, Lazypredict, Machine Learning Algorithms, Virtual Learning Environment
DOI
https://doi.org/10.51415/10321/4072