Faculty of Accounting and Informatics
Permanent URI for this community: http://ir-dev.dut.ac.za/handle/10321/1
Item: Recognition of speech emotion in Call Centre conversations in a multilingual environment (2021-10-15) Zvarevashe, Kudakwashe
The use of customer call centres has increased exponentially in the modern business world, and call centres are at the heart of marketing in the customer services industry. Previous studies have shown that the quality of service customers receive from a call centre shapes how they view the company. Relying on suggestion boxes to crowdsource customer views on call centre services is not adequate and, at times, may not give a correct record of the services in question. Speech emotion recognition has therefore been applied in customer call centres as a tool for evaluating customer service perception, emotion, and sentiment. This approach presents several advantages; for instance, the performance of call centre agents can be adequately scrutinised because their emotions can be classified automatically using machine learning methods for emotion recognition. In recent times, various techniques and methods have been used to develop robust speech emotion recognition systems for customer call centres, but the primary problem with these novel applications is that most of them do not perform well in multilingual environments. In addition, most of the proposed models do not properly recognise the fear archetype of emotion. The effectiveness of a speech emotion recognition system depends largely on the strength of the features used. Consequently, the purpose of this research was to discover the most efficacious features for recognising speech emotion in call centre conversations. To that end, this thesis reports on the development of hybrid acoustic features based on spectral and prosodic descriptors.
The set of hybrid features proposed in this study comprises the logarithm of energy, fundamental frequency, zero-crossing rate, spectral roll-off point, spectral flux, spectral centroid, spectral compactness, spectral variability, fast Fourier transform, Mel frequency cepstral coefficients, and linear prediction cepstral coefficients. Furthermore, this thesis reports on the development of a novel stacked ensemble machine learning algorithm based on a combination of inducers and ensemble classifiers. The discovery of effective speech emotion features and the development of an efficient machine learning algorithm are essential stages of effective speech emotion recognition in call centre conversations. The proposed speech emotion recognition methods, based on feature extraction and feature classification for applications in call centre conversations, were verified and validated through a series of experiments. This was accomplished by testing the crafted hybrid acoustic features on five distinct speech emotion databases. The acoustic features were evaluated against deep learning auto-generated features and a hybrid of popular acoustic features. In addition, a set of four ensemble algorithms was evaluated against the newly invented stacked ensemble algorithm. The performance of the stacked ensemble algorithm developed in this study was analysed using the widely used statistical evaluation metrics of accuracy, precision, F-score, area under the receiver operating characteristic curve, and computation time. The results demonstrate that the newly developed stacked ensemble algorithm, coupled with the crafted hybrid acoustic features, has consistently performed better than many other state-of-the-art algorithms and speech features across various standard speech corpora.
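As an illustration only, four of the descriptors named above (logarithm of energy, zero-crossing rate, spectral centroid, and spectral roll-off point) can be computed for a single audio frame roughly as sketched below. The abstract does not give the exact definitions or tooling used in the thesis, so the formulas follow common textbook conventions, and every function name, the naive DFT, and the 85% roll-off fraction are assumptions made for this sketch.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def log_energy(frame, eps=1e-12):
    """Natural log of the frame energy; eps guards against log(0) on silence."""
    return math.log(sum(x * x for x in frame) + eps)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, sample_rate, n, fraction=0.85):
    """Lowest frequency in Hz below which `fraction` of the magnitude lies.

    The 0.85 default is an assumed, commonly used value.
    """
    threshold = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= threshold:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n


def frame_features(frame, sample_rate):
    """Collect the four descriptors for one frame into a dictionary."""
    n = len(frame)
    mags = magnitude_spectrum(frame)
    return {
        "log_energy": log_energy(frame),
        "zero_crossing_rate": zero_crossing_rate(frame),
        "spectral_centroid": spectral_centroid(mags, sample_rate, n),
        "spectral_rolloff": spectral_rolloff(mags, sample_rate, n),
    }


# Usage: a synthetic 1 kHz tone sampled at 8 kHz stands in for a speech frame.
tone = [math.sin(2 * math.pi * 1000 * t / 8000 + math.pi / 8) for t in range(64)]
print(frame_features(tone, 8000))
```

In practice, per-frame values like these would be aggregated over an utterance and combined with the remaining descriptors before classification; a production system would also use an FFT library rather than the naive DFT shown here.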