Meta-heuristic search methods for big data analytics and visualization of frequently changed patterns
Date
2019-03-20
Abstract
Throughout the world, data plays a prominent role in decisions relevant to the socio-economic growth of organizations. As organizations grow, they tend to use diverse technologies or platforms to collect data and make it readily available for quick decision-making. These technologies have led to exponential growth in data, so that managing this data within a limited time interval becomes increasingly complex at every stage, from preprocessing to visualization. Beyond the problem of managing this growth, the search for a suitable method to manage certain aspects of frequently changed data has been overlooked. These frequent changes in data form the topic of interest of this thesis. Consequently, there is a need for a framework that both manages big data at the different stages of processing, from preprocessing to visualization, and handles frequently changed data. The need arises because traditional methods and algorithms are limited to finding frequent patterns of frequently occurring items; they overlook frequently changed data, whose numeric and time dimensions can provide interesting business insights. Additionally, traditional visualization methods struggle with performance scalability and response time. This thesis addresses these limitations with a meta-heuristic, bio-inspired approach modelled on the observed behavior and characteristics of two animals, namely the kestrel and the dung beetle. The motivation for choosing these animals is their ability to explore, exploit and adapt to different situations in their natural environment. The development of the computational model and its testing with actual data were formulated as a six-step procedure. Following these six steps, the proposed computational model was evaluated against selected comparative algorithms, namely BAT, WSA-MP, PSO, Firefly and ACO.
The main findings, based on the optimal values obtained, suggest that the proposed computational models performed best against the comparative meta-heuristic algorithms on the test datasets when handling frequently changed data during the data preprocessing, pattern discovery and visualization stages. Further statistical tests, using the Wilcoxon signed-rank test, were conducted on the optimal results from the comparative meta-heuristic algorithms. This statistical procedure was used to select the best algorithm without making any underlying assumption about the accuracy of the results from the comparative meta-heuristic algorithms. Theoretically, the study extends item-frequency frameworks by including the time and numeric dimensions of item occurrence. Practically, the contribution of the study lies in finding frequently changed patterns in big data analytics. Additionally, the concept of the half-life of substances or trails was applied as part of the computational model, and this also forms part of the unique contribution of this thesis. The half-life represents the lifetime of interestingness of recently discovered patterns. In summary, this thesis is about the mathematical formulation of animal behavior and characteristics into an implementable big data management algorithm and its application to frequently changed patterns.
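The half-life idea mentioned above can be illustrated with a minimal sketch. It is not the thesis's formulation: it only assumes the standard exponential half-life law, under which a quantity halves after every half-life period. The function names, the pattern names, the ages and the 0.25 threshold are all hypothetical and chosen purely for illustration.

```python
def trail_weight(age, half_life):
    """Fraction of the initial interestingness remaining after `age` time units.

    Standard half-life decay: the value halves every `half_life` units.
    This is an illustrative assumption, not the thesis's exact model.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if age < 0:
        raise ValueError("age must be non-negative")
    return 0.5 ** (age / half_life)


def still_interesting(age, half_life, threshold=0.25):
    """True while the decayed weight is at or above a hypothetical threshold."""
    return trail_weight(age, half_life) >= threshold


# Hypothetical patterns mapped to their age (time since discovery).
pattern_ages = {"price_rise": 0.0, "stock_drop": 10.0, "old_trend": 40.0}

# Decayed weight of each pattern, assuming a half-life of 10 time units.
weights = {
    name: trail_weight(age, half_life=10.0)
    for name, age in pattern_ages.items()
}
```

Under these assumptions a pattern discovered one half-life ago keeps half of its initial weight, so older patterns gradually drop below the threshold and stop counting as interesting.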
Description
Submitted in fulfilment of the requirements of the degree of Doctor of Philosophy in Information Technology (IT), Durban University of Technology, Durban, South Africa. 2019.
DOI
https://doi.org/10.51415/10321/3372