Automatic speech recognition of the isiZulu language

Shezi Nokwanda

dc.contributor.advisor	Reddy, Seren
dc.contributor.author	Shezi, Nokwanda	en_US
dc.date.accessioned	2022-01-19T10:56:29Z
dc.date.available	2022-01-19T10:56:29Z
dc.date.issued	2021-12-01
dc.description	Submitted in fulfillment of the degree of Master of Engineering at the Department of Electronic and Computer Engineering in the Faculty of Engineering and the Built Environment at the Durban University of Technology, 2021.	en_US
dc.description.abstract	A key component of artificial intelligence is human-to-machine communication. Such communication has been realised through virtual assistants such as Apple's Siri, Google Now and Amazon's Alexa. This technology is made possible through Automatic Speech Recognition (ASR). Only in recent years have previously marginalised or developing countries started researching ASR for their indigenous languages. This research focuses on ASR in isiZulu, which is one of South Africa's most spoken indigenous languages. The research involves two main fields of study, i.e., digital signal processing (DSP) and machine learning (ML). DSP was applied in word boundary estimation and feature extraction. Machine learning was used to convert the word boundary estimation problem to a classification problem as well as for word recognition. Word boundary estimation achieved an accuracy of 68.4%, which is on par with the current research. The Mel-frequency cepstrum coefficient (MFCC) was used for the feature extraction of the speech and deep neural networks were chosen for the ML component. For the detection and classification of a word in a sentence, the trained neural network was tested by considering the effect of including and excluding explicit boundaries on the overall recognition. Word recognition accuracy with manually demarcated boundaries was 78.18%. In-sentence recognition accuracy without demarcated boundaries was 17.74%, while an accuracy of 23.28% was achieved without demarcated boundaries using classification. While in-sentence recognition accuracy was low for both algorithms, the accurately recognised words were determined by different heuristics. Other factors, such as the complex differences between the indigenous isiZulu language and other more commonly spoken languages, are also highlighted and further research avenues are proposed.	en_US
dc.description.level	M	en_US
dc.format.extent	132 p	en_US
dc.identifier.doi	https://doi.org/10.51415/10321/3794
dc.identifier.uri	https://hdl.handle.net/10321/3794
dc.language.iso	en	en_US
dc.subject.lcsh	Automatic speech recognition	en_US
dc.subject.lcsh	Zulu language--Data processing	en_US
dc.subject.lcsh	Signal processing--Digital techniques	en_US
dc.subject.lcsh	Natural language processing (Computer science)	en_US
dc.subject.lcsh	Machine learning	en_US
dc.title	Automatic speech recognition of the isiZulu language	en_US
dc.type	Thesis	en_US
local.sdg	SDG17

Files

Original bundle

Name: Nokwanda Shezi.pdf
Size: 11.91 MB
Format: Adobe Portable Document Format

Collections

Theses and dissertations (Engineering and Built Environment)