
Faculty of Accounting and Informatics

Permanent URI for this community: http://ir-dev.dut.ac.za/handle/10321/1


Search Results

Now showing 1 - 3 of 3
  • Data mining and machine learning : a study of the CO2 emission trends in South Africa
    (2024) Mohamed, Ghulam Masudh; Patel, Sulaiman Saleem; Naicker, Nalindren
    This study addresses the pressing global issue of elevated carbon dioxide emissions (CO2E), with a particular focus on South Africa (SA), which ranks amongst the world's top emitters and is the largest emitter in Africa. By introducing a novel integration of Change-point Analysis (CPA) and Machine Learning (ML) techniques, this research addresses significant gaps in CO2E trend analysis. Unlike previous studies, this research applies CPA methodologies within the distinct context of SA, employing algorithms like cumulative sum (CUSUM) and Bootstrap analysis to pinpoint crucial change-points in CO2E data specific to the country. The Bootstrap analysis determines the confidence levels associated with each detected change. Additionally, this study sought to validate historical trends and predict future patterns using ML models, with a specific focus on employing the AdaBoost ensemble learning technique. Drawing on insights from a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-based systematic review, the research selects input variables based on the factors identified as significant contributors to CO2E, ensuring the models capture the relevant variables effectively. The results of the systematic review highlight energy production and economic growth as key drivers of CO2E, thus validating their selection as input data for constructing the CPA and ML models. To conduct this study, secondary data was obtained from the World Bank's Open Data initiative data repository, a common source for environmental research. This selection was justified by a literature review, which highlighted the reliability and applicability of this data source. The CPA results reveal significant change-points in electricity generation, economic growth, and CO2E, with an average confidence level of 94%, indicating the accuracy of this analytical approach. Moreover, the CPA results emphasise the relationship between economic growth, electricity production, and CO2E in SA.
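The CUSUM-plus-bootstrap idea mentioned in the abstract can be illustrated with a minimal sketch. It follows one common formulation of change-point analysis (cumulative sums of deviations from the mean, with a shuffle-based bootstrap for the confidence level) and is not necessarily the exact procedure used in the study. The function names and the synthetic series in the usage note are assumptions made for illustration only.

```python
import random


def cusum(values):
    """Cumulative sums of deviations from the series mean, starting at 0."""
    mean = sum(values) / len(values)
    sums = [0.0]
    for v in values:
        sums.append(sums[-1] + (v - mean))
    return sums


def cusum_range(values):
    """Spread of the CUSUM path; a large spread suggests a shift in the mean."""
    s = cusum(values)
    return max(s) - min(s)


def detect_change_point(values, n_bootstrap=1000, seed=0):
    """Return (index, confidence) for the single most likely change.

    index      -- 0-based position of the first observation after the change
    confidence -- share of shuffled series whose CUSUM range is smaller
                  than the observed one (hypothetical helper, not a library API)
    """
    rng = random.Random(seed)
    s = cusum(values)
    observed = max(s) - min(s)
    # The CUSUM path is furthest from zero at the last point before the change.
    index = max(range(1, len(s)), key=lambda i: abs(s[i]))
    shuffled = list(values)
    below = 0
    for _ in range(n_bootstrap):
        rng.shuffle(shuffled)  # shuffling destroys any ordering in time
        if cusum_range(shuffled) < observed:
            below += 1
    return index, below / n_bootstrap
```

For example, calling `detect_change_point([10] * 10 + [20] * 10)` on this synthetic series places the change at index 10, where the level shifts. Applying the same function recursively to the segments on either side of a detected change is one way to look for further change-points.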
Before forecasting future CO2E trends, the effectiveness of the AdaBoost regressor in enhancing model performance was benchmarked against traditional ML algorithms, including Linear regression, Polynomial regression, Bayesian Linear regression and K-Nearest Neighbors (KNN) regression, to determine the most effective technique for forecasting CO2E. The researcher evaluated model performance using key regression ML performance metrics, including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), coefficient of determination (R²) score, and an additional accuracy score introduced by the researcher. Notably, the AdaBoost models demonstrated superior performance, with an average RMSE score of 10,143.17 kilotons (kt), MAE score of 9,642.64 kt, R² of 0.90, and accuracy of 96.74%. The study also revealed that, on average, models trained using the AdaBoost algorithm outperformed traditional ML models. They achieved a reduction in RMSE score of 6,417.29 kt, a decrease in MAE score of 4,358.09 kt, an increase in R² score of 0.07 and an improvement in accuracy of 0.60%. Additionally, a comparative analysis of the repeated holdout methods and cross-validation techniques was conducted, with results revealing that repeated holdout had a more significant impact on model performance. After excluding outliers, the average improvement in cross-validation results, due to the repeated holdout method, was a decrease of 783.32 kt for RMSE, a reduction of 1,289.39 kt for MAE, and an increase of 0.88% for accuracy. The extent to which the repeated holdout method improved the performance of ML models that were integrated with cross-validation techniques was correlated with the initial model performance. For ML models with RMSE and MAE scores equal to or exceeding 15,000 kt, the findings indicate that the repeated holdout methods studied should enhance performance by at least 2,000 kt.
Similarly, an improvement of nearly 3% or higher in accuracy was noted when the cross-validation value for this metric was 94% or lower. Based on these results, the AdaBoost model integrated with repeated holdout was selected as the optimal model for forecasting CO2E in SA from 2021 to 2027. The forecasted CO2E trends validate that energy production and economic growth are indeed the primary drivers of CO2E in SA, as previously highlighted by the CPA model. This underscores the importance of addressing these factors to effectively mitigate carbon emissions in the country. Moreover, the forecasted results indicate that SA is unlikely to meet the global temperature limit of 1.5 degrees Celsius by 2030, given the trajectory showing a shortfall in achieving the target level of 334 million tonnes (Mt) of CO2E, agreed upon in the Paris Agreement. However, the country did meet its CO2E commitments outlined in the 2030 National Development Plan, showing some progress towards environmental sustainability. Nonetheless, the failure to meet these targets at their lower ranges suggests the need for further efforts to reduce carbon emissions, which is crucial for aligning with the Paris Agreement objectives and achieving net-zero emissions by 2050. This highlights the importance of ongoing initiatives to enhance environmental policies and practices in SA. Future research should focus on integrating load-shedding dynamics into the analysis to examine and confirm its effects on energy production, economic growth, and CO2E in SA. Additionally, future research should focus on forecasting future change-points for the socio-economic indicators or variables utilised in this study. This can help policymakers anticipate fluctuations and devise proactive strategies to address environmental and economic challenges effectively. It is also recommended that future research consider the output of renewable energy production when analysing CO2E trends.
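The regression metrics named in the abstract (RMSE, MAE and R²) and the repeated holdout procedure can be sketched as below. This is an illustrative outline only: the model is a plain least-squares line standing in for the AdaBoost regressor, the function names and default parameters are assumptions, and the researcher's additional accuracy score is not reproduced because the abstract does not define it.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def repeated_holdout(xs, ys, repeats=20, test_fraction=0.3, seed=0):
    """Average RMSE and MAE over several random train/test splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(2, int(round(n * test_fraction)))
    rmse_scores, mae_scores = [], []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every repeat
        test, train = idx[:n_test], idx[n_test:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in test]
        y_pred = [a + b * xs[i] for i in test]
        rmse_scores.append(rmse(y_true, y_pred))
        mae_scores.append(mae(y_true, y_pred))
    return sum(rmse_scores) / repeats, sum(mae_scores) / repeats
```

Unlike k-fold cross-validation, which partitions the data once into disjoint folds, repeated holdout draws a new random split on every repeat, so an observation may appear in several test sets. Averaging over the repeats reduces the dependence of the score on any single split.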
  • A bisociated research paper recommendation model using BiSOLinkers
    (Insight Society, 2022-01-01) Maake, Benard M.; Ojo, Sunday O.; Zuva, Keneilwe; Mzee, Fredrick A.
    In the current era of information overload, it is nearly impossible to obtain relevant knowledge from massive information repositories without using information retrieval and filtering tools. The academic field receives a large volume of research articles daily, making it virtually impossible for researchers to trace and retrieve important articles for their research work. Unfortunately, the tools used to search, retrieve and recommend relevant research papers suggest similar articles based on the user profile characteristics, resulting in the overspecialization problem whereby recommendations are boring, similar, and uninteresting. We attempt to address this problem by recommending research papers from domains considered unrelated and unconnected. This is achieved through identifying bridging concepts that can bridge these two unrelated domains through their outlying concepts (BiSOLinkers). We modeled a bisociation framework using graph theory and text mining technologies. Machine learning algorithms were utilized to identify outliers within the dataset, and the accuracy achieved by most algorithms was between 96.30% and 99.49%, suggesting that the classifiers accurately classified and identified the outliers. We additionally utilized the Latent Dirichlet Allocation (LDA) algorithm to identify the topics bridging the two unrelated domains at their point of intersection. BisoNets were finally generated, conceptually demonstrating how the two unrelated domains were linked, necessitating cross-domain recommendations. Hence, it is established that recommender systems' overspecialization can be addressed by combining bisociation, topic modeling, and text mining approaches.
  • A meta-analysis of educational data mining for predicting students performance in programming
    (The Science and Information Organization, 2021-02) Moonsamy, Devraj; Naicker, Nalindren; Adeliyi, Timothy T.; Ogunsakin, Ropo E.
    An essential skill amid the 4th industrial revolution is the ability to write good computer programs. Therefore, higher education institutions are offering computer programming as a module not only in computer-related programmes but in other programmes as well. However, the number of students who underperform in programming is significantly higher than in non-programming modules. It is, therefore, crucial to be able to accurately predict the performance of students pursuing programming, since this will help in identifying students who may underperform so that the necessary support interventions can be timeously put in place to assist these students. The objective of this study is therefore to obtain the most effective Educational Data Mining approaches used to identify those students who may underperform in computer programming. The PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analysis) approach was used in conducting the meta-analysis. The databases searched were ACM, Google Scholar, IEEE, ProQuest, ScienceDirect and Scopus. A total of 11 scientific research publications were included in the meta-analysis for this study from 220 articles identified through database searching. The residual amount of heterogeneity was high (τ² = 0.03; heterogeneity I² = 99.46% with heterogeneity chi-square = 1210.91, degrees of freedom = 10 and P < 0.001). The estimated pooled performance of the algorithms was 24% (95% CI: 13%, 35%). Meta-regression analysis indicated that none of the included moderators influenced the heterogeneity of the studies. The result of effect estimates against its standard error indicated publication bias with a P-value of 0.013. These meta-analysis findings indicated that the pooled estimate of algorithms is high.
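The heterogeneity statistics quoted in the abstract can be illustrated with a minimal sketch of Cochran's Q and the I² statistic in their standard textbook form (inverse-variance weights, I² = (Q − df) / Q). The function names are assumptions, and the study's software may have used a different estimator (for example, one for residual heterogeneity after meta-regression), so this sketch is not expected to reproduce the reported figures.

```python
def pooled_fixed_effect(effects, variances):
    """Inverse-variance weighted mean of the study effect estimates."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    pooled = pooled_fixed_effect(effects, variances)
    return sum((e - pooled) ** 2 / v for e, v in zip(effects, variances))


def i_squared(q, df):
    """I² as a percentage: share of variation attributed to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0
```

Here `df` is the number of studies minus one. I² is truncated at zero because Q can fall below its degrees of freedom by chance, which would otherwise give a negative percentage.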