Automatic speech recognition of the isiZulu language

Reddy, Seren
Shezi, Nokwanda
2022-01-19
2022-01-19
2021-12-01
https://hdl.handle.net/10321/3794

Submitted in fulfillment of the degree of Master of Engineering at the Department of Electronic and Computer Engineering in the Faculty of Engineering and the Built Environment at the Durban University of Technology, 2021.

A key component of artificial intelligence is human-to-machine communication. Such communication has been realised through virtual assistants such as Apple's Siri, Google Now and Amazon's Alexa. This technology is made possible through Automatic Speech Recognition (ASR). Only in recent years have previously marginalised or developing countries started researching ASR for their indigenous languages. This research focuses on ASR in isiZulu, one of South Africa's most widely spoken indigenous languages. The research involves two main fields of study: digital signal processing (DSP) and machine learning (ML). DSP was applied in word boundary estimation and feature extraction. Machine learning was used to convert the word boundary estimation problem to a classification problem, as well as for word recognition. Word boundary estimation achieved an accuracy of 68.4%, which is on par with current research. Mel-frequency cepstral coefficients (MFCCs) were used for feature extraction from the speech, and deep neural networks were chosen for the ML component. For the detection and classification of a word in a sentence, the trained neural network was tested by considering the effect of including and excluding explicit boundaries on the overall recognition. Word recognition accuracy with manually demarcated boundaries was 78.18%. In-sentence recognition accuracy without demarcated boundaries was 17.74%, while an accuracy of 23.28% was achieved without demarcated boundaries using classification. While in-sentence recognition accuracy was low for both algorithms, the accurately recognised words were determined by different heuristics. Other factors, such as the complex differences between the indigenous isiZulu language and other more commonly spoken languages, are also highlighted, and further research avenues are proposed.

132 p
en
Automatic speech recognition
Zulu language--Data processing
Signal processing--Digital techniques
Natural language processing (Computer science)
Machine learning
Thesis
https://doi.org/10.51415/10321/3794