Early prediction of students at risk in a virtual learning environment using ensemble machine learning techniques

Soobramoney, Ranjin

dc.contributor.advisor	Singh, Alveen
dc.contributor.author	Soobramoney, Ranjin	en_US
dc.date.accessioned	2022-06-15T12:32:28Z
dc.date.available	2022-06-15T12:32:28Z
dc.date.issued	2021-12-13
dc.description	Submitted in fulfillment of the requirements for the Degree of Masters of Information and Communication Technology, Durban University of Technology, Durban, South Africa, 2021.	en_US
dc.description.abstract	Students at risk (SAR) are students considered to have a higher probability of failing academically or dropping out of an academic programme. The literature shows that SAR is a global problem at Higher Education Institutions (HEIs). A high failure rate not only harms an HEI's reputation but, if left unchecked, can be detrimental to the institution itself. Identifying SAR is a pervasive and persistent problem; however, identifying them early allows timely and focused interventions that can reduce it. HEIs have used various techniques to identify SAR, one of which is the traditional statistical approach. A key challenge with this approach, however, is that it often requires extensive manual analysis of the data to predict SAR, which in turn makes early prediction of SAR more computationally challenging. To overcome some of these challenges, machine learning (ML) techniques have been proposed for predicting SAR. Because ML models are based on the input data rather than on the underlying problem, they are expected to have better predictive capabilities than traditional statistical models. Several ML techniques have been applied to predict SAR, with varying degrees of success. This study proposes the use of ensemble ML techniques for early and accurate prediction of SAR using students’ demographic data and weekly online Virtual Learning Environment (VLE) data. Aggregating the predictions of a group of ML classifiers is expected to generalize better than any of the individual classifiers on its own, so ensemble techniques are expected to improve the prediction of SAR. To this end, the study trained forty ML predictive models, one for each week of the semester, using twenty-five different ML classifiers. 
Each model was trained on students’ demographic data combined with data from their weekly interactions with the VLE. Based on the training results, four classifiers (AdaBoostClassifier, LGBMClassifier, RandomForestClassifier, and XGBClassifier) were selected as base learners for the ensemble. The hyperparameters of each of the four classifiers were optimized using Random Search. The tuned classifiers were then combined into a voting classifier ensemble for each of the forty weeks, and 10-fold cross-validation was used to evaluate the predictive models. The results show that, overall, the voting classifier ensemble outperformed the individual classifiers across the forty weeks and can thus provide an improved solution to the problem of predicting SAR.	en_US
dc.description.level	M	en_US
dc.format.extent	126 p	en_US
dc.identifier.doi	https://doi.org/10.51415/10321/4072
dc.identifier.uri	https://hdl.handle.net/10321/4072
dc.language.iso	en	en_US
dc.subject	Students at Risk	en_US
dc.subject	Ensemble learning	en_US
dc.subject	Lazypredict	en_US
dc.subject	Machine Learning Algorithms	en_US
dc.subject	Virtual Learning Environment	en_US
dc.subject.lcsh	Computer-assisted instruction--South Africa	en_US
dc.subject.lcsh	Academic achievement	en_US
dc.subject.lcsh	Underprepared college students--South Africa	en_US
dc.subject.lcsh	Web-based instruction	en_US
dc.title	Early prediction of students at risk in a virtual learning environment using ensemble machine learning techniques	en_US
dc.type	Thesis	en_US
local.sdg	SDG04

Files

Original bundle

Name:: Soobramoney_R_2021_Redacted.pdf
Size:: 5.14 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 2.22 KB
Format:: Item-specific license agreed upon to submission

Collections

Theses and dissertations (Accounting and Informatics)