
Recognition of speech emotion in Call Centre conversations in a multilingual environment

Date

2021-10-15

Authors

Zvarevashe, Kudakwashe

Abstract

The use of customer call centres has increased exponentially in the modern business world, and call centres are at the heart of marketing in the customer services industry. Previous studies have shown that the quality of service that customers receive from call centres shapes how they view the company. Reliance on suggestion boxes to crowdsource customer views on call centre services is inadequate and, at times, may not give an accurate record of the services in question. Therefore, speech emotion recognition has been applied in customer call centres as a tool for evaluating customer service perception, emotion, and sentiment. This approach presents several advantages; for instance, the performance of call centre agents can be adequately scrutinised because their emotions can be classified automatically using machine learning methods for emotion recognition. In recent times, various techniques and methods have been used to develop robust speech emotion recognition systems for customer call centres, but the primary problem with these applications is that most of them do not perform well in multilingual environments. In addition, most of the proposed models do not properly recognise the fear archetype of emotion. The effectiveness of a speech emotion recognition system depends largely on the strength of the features used. Consequently, the purpose of this research was to discover the most effective features for recognising speech emotion in call centre conversations. To this end, this thesis reports on the development of hybrid acoustic features based on spectral and prosodic descriptors. The set of hybrid features proposed in this study comprises the logarithm of energy, fundamental frequency, zero-crossing rate, spectral roll-off point, spectral flux, spectral centroid, spectral compactness, spectral variability, fast Fourier transform, Mel frequency cepstral coefficients, and linear prediction cepstral coefficients. Furthermore, this thesis reports on the development of a novel stacked ensemble machine learning algorithm based on a combination of inducers and ensemble classifiers. The discovery of effective speech emotion features and the development of an efficient machine learning algorithm are essential stages of effective speech emotion recognition in call centre conversations. The verification and validation of the proposed speech emotion recognition methods, based on feature extraction and feature classification for applications in call centre conversations, were carried out through a series of experiments. This was accomplished by testing the crafted hybrid acoustic features on five distinct speech emotion databases. The acoustic features were evaluated against deep learning auto-generated features and a hybrid of popular acoustic features. In addition, a set of four ensemble algorithms was evaluated against the newly developed stacked ensemble algorithm. The performance of the stacked ensemble algorithm developed in this study was analysed using the widely used evaluation metrics of accuracy, precision, F-score, area under the receiver operating characteristic curve, and computation time. The results demonstrated that the newly developed stacked ensemble algorithm, coupled with the crafted hybrid acoustic features, consistently performed better than many other state-of-the-art algorithms and speech features across various standard speech corpora.
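
The abstract names the spectral and prosodic descriptors that make up the hybrid feature set. The following is a minimal sketch of frame-level extraction and utterance-level aggregation, assuming Python with librosa and NumPy (the thesis does not state its extraction toolkit); spectral compactness and spectral variability, which librosa does not provide directly, are omitted, and the raw LPC coefficients are used as a stand-in for the LPCC step.

# Sketch: utterance-level hybrid acoustic features (spectral + prosodic).
# Assumes librosa/NumPy; mean/std aggregation over frames is illustrative.
import numpy as np
import librosa

def hybrid_acoustic_features(path, sr=16000, n_mfcc=13, lpc_order=12):
    y, sr = librosa.load(path, sr=sr)
    S = np.abs(librosa.stft(y))                                   # FFT-based magnitude spectrogram
    log_energy = np.log(librosa.feature.rms(y=y) + 1e-10)         # logarithm of energy
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)                 # fundamental frequency
    zcr = librosa.feature.zero_crossing_rate(y)                   # zero-crossing rate
    rolloff = librosa.feature.spectral_rolloff(S=S, sr=sr)        # spectral roll-off point
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)      # spectral centroid
    flux = np.sqrt(np.sum(np.diff(S, axis=1) ** 2, axis=0))       # spectral flux between frames
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # Mel frequency cepstral coefficients
    lpc = librosa.lpc(y, order=lpc_order)                         # LPC coefficients (basis of LPCC)

    # Collapse frame-level descriptors into one fixed-length vector per utterance.
    def stats(x):
        x = np.atleast_2d(x)
        return np.concatenate([x.mean(axis=-1).ravel(), x.std(axis=-1).ravel()])

    return np.concatenate([stats(log_energy), stats(f0), stats(zcr),
                           stats(rolloff), stats(centroid), stats(flux),
                           stats(mfcc), lpc[1:]])

For the stacked ensemble of inducers combined through a meta-learner, a generic sketch using scikit-learn's StackingClassifier is given below; the base learners, meta-classifier, and the synthetic stand-in data are illustrative placeholders, not the thesis's actual configuration or corpora. The metrics mirror those reported in the abstract.

# Sketch: stacked ensemble over utterance-level feature vectors (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, f1_score, roc_auc_score

# Stand-in for the hybrid acoustic feature matrix X and emotion labels y.
X, y = make_classification(n_samples=500, n_features=40, n_informative=20,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    test_size=0.2, random_state=0)

stack = make_pipeline(
    StandardScaler(),
    StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
proba = stack.predict_proba(X_test)
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred, average="macro"))
print("F-score  :", f1_score(y_test, pred, average="macro"))
print("ROC-AUC  :", roc_auc_score(y_test, proba, multi_class="ovr"))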

Description

Submitted in fulfilment of the requirements of the Degree of Doctor of Philosophy in Information Technology, Durban University of Technology, Durban, South Africa, 2021.

Keywords

Customer call centres, Speech emotion recognition

DOI

https://doi.org/10.51415/10321/3801
