Faculty of Accounting and Informatics
Permanent URI for this community: http://ir-dev.dut.ac.za/handle/10321/1
Item: Data mining and machine learning : a study of the CO2 emission trends in South Africa (2024)
Mohamed, Ghulam Masudh; Patel, Sulaiman Saleem; Naicker, Nalindren
This study addresses the pressing global issue of elevated carbon dioxide emissions (CO2E), with a particular focus on South Africa (SA), which ranks amongst the world's top emitters and is the largest emitter in Africa. By introducing a novel integration of Change-point Analysis (CPA) and Machine Learning (ML) techniques, this research addresses significant gaps in CO2E trend analysis. Unlike previous studies, this research applies CPA methodologies within the distinct context of SA, employing algorithms like cumulative sum (CUSUM) and Bootstrap analysis to pinpoint crucial change-points in CO2E data specific to the country. The Bootstrap analysis determines the confidence levels associated with each detected change. Additionally, this study sought to validate historical trends and predict future patterns using ML models, with a specific focus on employing the AdaBoost ensemble learning technique. Drawing on insights from a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-based systematic review, the research selects input variables based on the factors identified as significant contributors to CO2E, ensuring the models capture the relevant variables effectively. The results of the systematic review highlight energy production and economic growth as key drivers of CO2E, thus validating their selection as input data for constructing the CPA and ML models. To conduct this study, secondary data was obtained from the World Bank's Open Data initiative data repository, a common source for environmental research. This selection was justified by a literature review, which highlighted the reliability and applicability of this data source.
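The CUSUM-and-bootstrap procedure named in this abstract can be illustrated with a minimal sketch. It assumes the common formulation in which cumulative sums of deviations from the series mean locate a single change-point, and the confidence level is the share of shuffled (bootstrap) series whose CUSUM range is smaller than the observed one. The function names, the toy series and the parameter values are hypothetical and are not taken from the study.

```python
import random


def cusum(values):
    """Cumulative sums of deviations from the mean, starting at S_0 = 0."""
    mean = sum(values) / len(values)
    sums = [0.0]
    for v in values:
        sums.append(sums[-1] + (v - mean))
    return sums


def cusum_range(values):
    """Spread of the CUSUM path; a large spread suggests a shift in the mean."""
    sums = cusum(values)
    return max(sums) - min(sums)


def detect_change_point(values, n_bootstrap=1000, seed=0):
    """Return (change_index, confidence) for a single change in the mean.

    change_index is the 0-based position of the first observation after the
    change. confidence is the fraction of shuffled series whose CUSUM range
    is smaller than the observed range.
    """
    rng = random.Random(seed)
    sums = cusum(values)
    observed = max(sums) - min(sums)
    # The CUSUM path is furthest from zero at the last point before the change.
    change_index = max(range(1, len(sums)), key=lambda i: abs(sums[i]))
    shuffled = list(values)
    below = 0
    for _ in range(n_bootstrap):
        rng.shuffle(shuffled)  # reordering destroys any time structure
        if cusum_range(shuffled) < observed:
            below += 1
    return change_index, below / n_bootstrap


if __name__ == "__main__":
    # Hypothetical toy series with a level shift after the tenth value.
    series = [10.0] * 10 + [20.0] * 10
    index, confidence = detect_change_point(series)
    print(f"change begins at index {index}, confidence {confidence:.1%}")
```

In practice the same routine would be applied recursively to the segments on either side of each detected change to find further change-points.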
The CPA results reveal significant change-points in electricity generation, economic growth, and CO2E, with an average confidence level of 94%, indicating the accuracy of this analytical approach. Moreover, the CPA results emphasise the relationship between economic growth, electricity production, and CO2E in SA. Before forecasting future CO2E trends, the effectiveness of the AdaBoost regressor in enhancing model performance was benchmarked against traditional ML algorithms, including Linear regression, Polynomial regression, Bayesian Linear regression and K-Nearest Neighbors (KNN) regression, to determine the most effective technique for forecasting CO2E. The researcher evaluated model performance using key regression performance metrics, including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), the coefficient of determination (R2) score, and an additional accuracy score introduced by the researcher. Notably, the AdaBoost models demonstrated superior performance, with an average RMSE score of 10,143.17 kilotons (kt), MAE score of 9,642.64 kt, R2 of 0.90, and accuracy of 96.74%. The study also revealed that, on average, models trained using the AdaBoost algorithm outperformed traditional ML models: they reduced the RMSE score by 6,417.29 kt, reduced the MAE score by 4,358.09 kt, increased the R2 score by 0.07, and improved accuracy by 0.60%. Additionally, a comparative analysis of the repeated holdout methods and cross-validation techniques was conducted, with results revealing that repeated holdout had a more significant impact on model performance. After excluding outliers, the repeated holdout method improved on the cross-validation results by an average of 783.32 kt for RMSE, 1,289.39 kt for MAE, and 0.88% for accuracy.
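The standard metrics and the repeated holdout idea mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: a one-variable least-squares line stands in for the AdaBoost and baseline regressors, the researcher's additional accuracy score is not reproduced because the abstract does not define it, and all function names, the split fraction and the number of repeats are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares line for one input variable; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def repeated_holdout_rmse(xs, ys, repeats=50, test_fraction=0.3, seed=0):
    """Average test RMSE over several independent random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, int(len(xs) * test_fraction))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        preds = [slope * xs[i] + intercept for i in test_idx]
        scores.append(rmse([ys[i] for i in test_idx], preds))
    return sum(scores) / len(scores)
```

Unlike k-fold cross-validation, where every observation is tested exactly once, repeated holdout draws each split independently, so an observation may appear in several test sets or in none.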
The extent to which the repeated holdout method improved the performance of ML models integrated with cross-validation techniques was correlated with the initial model performance. For ML models with RMSE and MAE scores of 15,000 kt or more, the findings indicate that the repeated holdout methods studied should enhance performance by at least 2,000 kt. Similarly, an improvement in accuracy of nearly 3% or more was noted when the cross-validation value for this metric was 94% or lower. The AdaBoost model, integrated with repeated holdout, was selected as the optimal model, as evidenced by the results, for forecasting CO2E in SA from 2021 to 2027. The forecasted CO2E trends validate that energy production and economic growth are indeed the primary drivers of CO2E in SA, as previously highlighted by the CPA model. This underscores the importance of addressing these factors to effectively mitigate carbon emissions in the country. Moreover, the forecasted results indicate that SA is unlikely to meet the global temperature limit of 1.5 degrees Celsius by 2030, given the trajectory showing a shortfall in achieving the target level of 334 million tonnes (Mt) of CO2E, agreed upon in the Paris Agreement. However, the country did meet its CO2E commitments outlined in the 2030 National Development Plan, showing some progress towards environmental sustainability. Nonetheless, the failure to meet these targets at their lower ranges suggests the need for further efforts to reduce carbon emissions, which is crucial for aligning with the Paris Agreement objectives and achieving a zero net emission rate by 2050. This highlights the importance of ongoing initiatives to enhance environmental policies and practices in SA. Future research should focus on integrating load-shedding dynamics into the analysis to examine and confirm their effects on energy production, economic growth, and CO2E in SA.
Additionally, future research should focus on forecasting future change-points for the socio-economic indicators or variables utilised in this study. This can help policymakers anticipate fluctuations and devise proactive strategies to address environmental and economic challenges effectively. It is also recommended that future research consider the output of renewable energy production when analysing CO2E trends.

Item: Experimental comparison of support vector machines with random forests for hyperspectral image land cover classification (Indian Academy of Sciences, 2014-06-12)
Marwala, T.; Abe, B. T.; Olugbara, Oludayo O.
The performances of regular support vector machines and random forests are experimentally compared for hyperspectral imaging land cover classification. Special characteristics of hyperspectral imaging datasets present diverse processing problems to be resolved under robust mathematical formalisms such as image classification. As a result, the pixel purity index algorithm is used to obtain endmember spectral responses from the Indiana pine hyperspectral image dataset. The generalized reduced gradient optimization algorithm is thereafter executed on the research data to estimate fractional abundances in the hyperspectral image and thereby obtain the numeric values for land cover classification. The Waikato environment for knowledge analysis (WEKA) data mining framework is selected as a tool to carry out the classification process by using support vector machines and random forests classifiers. Results show that the performance of support vector machines is comparable to that of random forests. This study makes a positive contribution to the problem of land cover classification by exploring the generalized reduced gradient method, support vector machines, and random forests to improve producer accuracy and overall classification accuracy.
The performance comparison of these classifiers is valuable for a decision maker to consider tradeoffs in method accuracy versus method complexity.

Item: Exploring first-year engineering student perceptions of the engineering librarian as an IL instructor in multimodal teaching and learning environments (Emerald, 2023-12-08)
Omarsaib, Mousin
This study aims to explore first-year engineering students’ perceptions of the engineering librarian as an instructor in multimodal environments related to Information Literacy (IL) topics, teaching strategy, content evaluation, organising, planning and support.
Design/methodology/approach: A quantitative approach was used through a survey instrument based on an online questionnaire. Questions were adopted and modified from a lecturer evaluation survey. A simple random sampling technique was used to collect data from first-year cohorts of engineering students in 2020 and 2022.
Findings: Respondents’ perception of the engineering librarian as an instructor in multimodal learning environments was good. Findings revealed students’ learning experiences were aligned with IL instruction even though the environment changed from blended to online. However, a recurring theme was a lack of access to technology.
Practical implications: These findings may help in developing and strengthening the teaching identity of academic librarians as instructors in multimodal learning environments.
Originality/value: To the best of the author’s knowledge, this study is novel in that it evaluates the teaching abilities of an academic librarian in multimodal environments through the lens of students.

Item: Hyperspectral image classification using random forests and neural networks (International Association of Engineers, 2012)
Abe, B. T.; Olugbara, Oludayo O.; Marwala, T.
Spectral unmixing of hyperspectral images is based on the knowledge of a set of unknown endmembers.
Unique characteristics of hyperspectral datasets enable different processing problems to be resolved using robust mathematical logic such as image classification. Consequently, the pixel purity index is used to find endmembers from the Washington DC mall hyperspectral image dataset. The generalized reduced gradient algorithm is used to estimate fractional abundances in the hyperspectral image dataset. The WEKA data mining tool is selected to construct random forests and neural networks classifiers from the set of fractional abundances. The performances of these classifiers are experimentally compared for hyperspectral data land cover classification. Results show that random forests give better classification accuracy when compared to neural networks. The study proffers a solution to the problem associated with land cover classification by exploring the generalized reduced gradient approach with learning classifiers to improve overall classification accuracy. The classification accuracy comparison of classifiers is important for a decision maker to consider tradeoffs in accuracy and complexity of methods.

Item: An implementation of SAP enterprise resource planning : a case study of the South African revenue services and taxation sectors (Informa UK Limited, 2023)
Aroba, Oluwasegun Julius; Abayomi, Abdultaofeek
An SAP enterprise resource planning (ERP) system is a software system that assists organizations in automating and managing fundamental business processes for ideal performance. This research study aims to ameliorate the business operation problems of the South African Revenue Services (SARS) and Taxation sectors. Involving tax sectors in the preliminary stages of SAP ERP design and implementation saves SARS’ clients some resources such as money and time while also allowing various departments to improve their tax technology ecosystem.
To address the associated financial, operational, technical, and compliance challenges, an ERP implementation requiring significant support from strategy execution is suggested. This article presents the proposed model for designing and implementing an ERP, with a case study of the South African Revenue Services (SARS) and other taxation sectors, together with the benefits of an ERP system within the taxation sector, the implementation challenges, and proposed solutions. It utilises data from a survey conducted among 50 SARS employees and taxpayers outside the organization. The proposed ERP system will enhance the connectivity of all operations within SARS, with central access for all departments rather than silos of business operations. From our analysis, the Cronbach's alpha score of 0.85 obtained, which is greater than the 0.7 minimum, supports the proposed solution of an SAP ERP mobile app for inclusivity in the operational processes for both SARS employees and other taxpayers.

Item: A modelling approach to elephant and tree population dynamics for a small game farm (2005)
Stretch, Anne-Marie
Throughout Africa, growing human populations and the resulting loss of wildlife habitat are a critical issue for most animal species. It is increasingly common for privately owned small or medium-sized farms to reintroduce wildlife on their land, and such protected areas are fast becoming the only refuges available to wild animals. However, a comprehensive understanding of the complex ecological processes taking place is vital for the effective management of restricted areas and the conservation of biodiversity. Due to the enormous complexity of an ecological system and the long periods of the related dynamics, it is very difficult to analyse the interaction between animal and plant populations without suitable computer models.
In this thesis, the dynamics between elephant and trees (a major food source) are considered using computer simulations.
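The abstract does not state which model the thesis uses, so the following is only a generic consumer-resource sketch of how elephant and tree populations might be simulated year by year. It assumes logistic tree growth, a saturating per-elephant browsing rate, and an elephant carrying capacity proportional to the number of trees. Every function name, parameter and value here is hypothetical.

```python
def simulate(trees, elephants, years,
             tree_growth=0.1,          # assumed intrinsic tree growth rate per year
             tree_capacity=10000.0,    # assumed maximum number of trees on the farm
             browse_max=50.0,          # assumed trees destroyed per elephant per year
             half_saturation=1000.0,   # assumed tree count at which browsing is half of browse_max
             elephant_growth=0.05,     # assumed intrinsic elephant growth rate per year
             trees_per_elephant=200.0  # assumed trees needed to sustain one elephant
             ):
    """Return a list of (trees, elephants) tuples, one per year, starting at year 0."""
    history = [(trees, elephants)]
    for _ in range(years):
        # Logistic regrowth of trees minus saturating browsing damage.
        growth = tree_growth * trees * (1.0 - trees / tree_capacity)
        browsing = browse_max * elephants * trees / (trees + half_saturation)
        # Elephant numbers move towards a capacity set by the available trees.
        capacity = max(trees, 1e-9) / trees_per_elephant
        change = elephant_growth * elephants * (1.0 - elephants / capacity)
        # Populations cannot fall below zero.
        trees = max(0.0, trees + growth - browsing)
        elephants = max(0.0, elephants + change)
        history.append((trees, elephants))
    return history


if __name__ == "__main__":
    for year, (t, e) in enumerate(simulate(5000.0, 10.0, 20)):
        print(f"year {year:2d}: trees {t:8.1f}, elephants {e:6.2f}")
```

Running the simulation with and without elephants shows how browsing pressure changes the tree trajectory under these assumed parameters.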