
Automatic speech recognition of the isiZulu language

dc.contributor.advisor: Reddy, Seren
dc.contributor.author: Shezi Nokwanda [en_US]
dc.date.accessioned: 2022-01-19T10:56:29Z
dc.date.available: 2022-01-19T10:56:29Z
dc.date.issued: 2021-12-01
dc.description: Submitted in fulfillment of the degree of Master of Engineering at the Department of Electronic and Computer Engineering in the Faculty of Engineering and the Built Environment at the Durban University of Technology, 2021. [en_US]
dc.description.abstract: A key component of artificial intelligence is human-to-machine communication. Such communication has been realised through virtual assistants such as Apple's Siri, Google Now and Amazon's Alexa. This technology is made possible through Automatic Speech Recognition (ASR). Only in recent years have previously marginalised or developing countries started researching ASR for their indigenous languages. This research focuses on ASR in isiZulu, which is one of South Africa's most spoken indigenous languages. The research involves two main fields of study, namely digital signal processing (DSP) and machine learning (ML). DSP was applied in word boundary estimation and feature extraction. Machine learning was used to convert the word boundary estimation problem to a classification problem as well as for word recognition. Word boundary estimation achieved an accuracy of 68.4%, which is on par with the current research. The Mel-frequency cepstrum coefficient (MFCC) was used for the feature extraction of the speech and deep neural networks were chosen for the ML component. For the detection and classification of a word in a sentence, the trained neural network was tested by considering the effect of including and excluding explicit boundaries on the overall recognition. Word recognition accuracy with manually demarcated boundaries was 78.18%. In-sentence recognition accuracy without demarcated boundaries was 17.74%, while an accuracy of 23.28% was achieved without demarcated [boundaries] using classification. While the in-sentence recognition accuracy of both algorithms was low, the accurately recognised words were determined by different heuristics. Other factors, such as the complex differences between the indigenous isiZulu language and other more commonly spoken languages, are also highlighted and further research avenues are proposed. [en_US]
dc.description.level: M [en_US]
dc.format.extent: 132 p [en_US]
dc.identifier.doi: https://doi.org/10.51415/10321/3794
dc.identifier.uri: https://hdl.handle.net/10321/3794
dc.language.iso: en [en_US]
dc.subject.lcsh: Automatic speech recognition [en_US]
dc.subject.lcsh: Zulu language--Data processing [en_US]
dc.subject.lcsh: Signal processing--Digital techniques [en_US]
dc.subject.lcsh: Natural language processing (Computer science) [en_US]
dc.subject.lcsh: Machine learning [en_US]
dc.title: Automatic speech recognition of the isiZulu language [en_US]
dc.type: Thesis [en_US]
local.sdg: SDG17

Files

Original bundle

Name: Nokwanda Shezi.pdf
Size: 11.91 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission