Machine learning : a data-point approach to solving misclassifications in the imbalanced Credit Card Datasets

Mqadi, Nhlakanipho Michael

dc.contributor.advisor	Naicker, N.
dc.contributor.advisor	Adeliyi, Timothy Temitope
dc.contributor.author	Mqadi, Nhlakanipho Michael	en_US
dc.date.accessioned	2022-01-20T04:56:29Z
dc.date.available	2022-01-20T04:56:29Z
dc.date.issued	2021-10-30
dc.description	Submitted in fulfilment of the requirement for the degree of Master of Information and Communications Technology, Durban University of Technology, Durban, South Africa, 2021.	en_US
dc.description.abstract	Machine learning (ML) uses algorithms capable of iterating over massive datasets to analyse past behaviour with the aim of predicting future outcomes. Financial institutions are using ML to detect Credit Card Fraud (CCF) by learning, from historical credit card transaction data, the patterns that distinguish legitimate from fraudulent actions. CCF has negatively affected the market economic order, contributing to low consumer confidence in financial institutions and loss of interest from investors. CCF losses continue to increase every year despite existing efforts to prevent fraud, and amount to billions of dollars lost annually. ML techniques consume large volumes of historical credit card transaction data as examples for learning. In ordinary credit card datasets, there are far fewer fraudulent transactions than legitimate transactions. In dealing with the credit card data imbalance problem, the ideal solution must have low bias, low variance, and high accuracy. The aim of this study was to provide an in-depth experimental investigation of the effect of using the data-point approach to resolve the class misclassification problem in imbalanced credit card datasets. The study focused on finding a novel way to handle imbalanced data, to improve the performance of ML algorithms in identifying fraud or anomaly patterns in massive amounts of financial transaction records, where the class distribution was imbalanced. The experiment led to the introduction of two unique multi-level hybrid data-point approach solutions, namely, Feature Selection with Near Miss Undersampling and Feature Selection with SMOTE-based Oversampling. The results were obtained using four widely used ML algorithms, namely, Random Forest, Support Vector Machine, Decision Tree, and Logistic Regression, to build the classifiers.
These algorithms were implemented for classification of credit card datasets, and their performance was assessed using selected performance metrics. The findings show that using the data-point approach improved the predictive accuracy of the ML fraud detection solution.	en_US
dc.description.level	M	en_US
dc.format.extent	122 p	en_US
dc.identifier.doi	https://doi.org/10.51415/10321/3797
dc.identifier.uri	https://hdl.handle.net/10321/3797
dc.language.iso	en	en_US
dc.subject	Machine learning (ML)	en_US
dc.subject	Credit Card Fraud (CCF)	en_US
dc.subject	Credit card datasets	en_US
dc.subject	ML fraud detection	en_US
dc.subject.lcsh	Credit cards	en_US
dc.subject.lcsh	Data sets	en_US
dc.subject.lcsh	Database management	en_US
dc.subject.lcsh	Credit cards--Security measures--South Africa	en_US
dc.subject.lcsh	Credit card fraud	en_US
dc.subject.lcsh	Identity theft--South Africa--Prevention	en_US
dc.title	Machine learning : a data-point approach to solving misclassifications in the imbalanced Credit Card Datasets	en_US
dc.type	Thesis	en_US
local.sdg	SDG07
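
The two resampling ideas named in the abstract, Near Miss undersampling and SMOTE-based oversampling, can be illustrated with a minimal sketch. This is not the thesis implementation, which this record does not show. It assumes the NearMiss-1 variant, Euclidean distance, and toy two-feature data; the function names `smote` and `near_miss`, the parameter `k`, and the sample values are hypothetical, and the feature-selection stage is omitted.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (the row itself excluded)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def near_miss(majority, minority, n_keep, k=3):
    """NearMiss-1 (assumed variant): keep the majority rows whose mean
    distance to their k nearest minority rows is smallest."""
    def score(row):
        nearest = sorted(math.dist(row, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=score)[:n_keep]


# Toy, made-up data: 20 "legitimate" rows and 4 "fraud" rows, two features each.
legit = [[float(i), float(i % 5)] for i in range(20)]
fraud = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.5], [2.5, 2.0]]

# Oversampling: grow the fraud class until it matches the legitimate class.
oversampled_fraud = fraud + smote(fraud, n_new=len(legit) - len(fraud))

# Undersampling: shrink the legitimate class until it matches the fraud class.
undersampled_legit = near_miss(legit, fraud, n_keep=len(fraud))
```

Either balanced set could then be passed to a classifier such as the four listed in the abstract. In practice, a maintained library implementation would normally be preferred over hand-written resampling.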

Files

Original bundle

Name: MqadiN_Masters_2021.pdf
Size: 3.99 MB
Format: Adobe Portable Document Format

Collections

Theses and dissertations (Accounting and Informatics)