Statistical pattern recognition based on LVQ artificial neural networks : application to TATA box motif
dc.contributor.advisor | Bajic, Vladimir B. | |
dc.contributor.author | Wang, Haiyan | en_US |
dc.date.accessioned | 2017-01-31T06:46:01Z | |
dc.date.available | 2017-01-31T06:46:01Z | |
dc.date.issued | 2000 | |
dc.description | Dissertation submitted in compliance with the requirements for the Master's Degree in Technology in the Department of Electrical Engineering (Light Current), Technikon Natal, Durban, South Africa, 2000. | en_US |
dc.description.abstract | The computational analysis of eukaryotic promoters is among the most important and complex research domains that may contribute to complete gene identification. Current methods for promoter recognition are not yet sufficiently developed. Eukaryotic promoters contain a number of short motifs that may be used in promoter recognition, and good computational models of these motifs can be crucial for increasing the efficiency of promoter recognition programs. This study proposes a combined statistical and LVQ neural network system as a computational model of the TATA box motif of eukaryotic promoters. The methodology is universal and applicable to any short functional motif in DNA. Statistical analysis of the core TATA motif hexamer and its neighboring hexamers shows strong regularities that can be used in motif recognition. Moreover, the positional distribution of the TATA motif, in terms of its distance from the transcription start site, is very regular and is used in the statistical modeling. Furthermore, the matching score of the position weight matrix for the motif is used as part of the model. Based on these statistical properties, a novel LVQ classifier for TATA motif recognition is developed. A characteristic of the method is that a genetic algorithm is used to find good initial weights for the LVQ system, while the two LVQ networks are fine-tuned with the lvq? algorithm. The final computational model achieves 67.8% correct recognition on the test set with less than 1% false recognition. The model is then evaluated on promoter recognition using an independent test set, where it outperforms three other promoter recognition programs.
It is shown that promoter recognition based on recognizing TATA motifs with this new model is superior to recognition based on the currently used position weight matrix description of the motif. | en_US |
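The abstract combines a position weight matrix (PWM) matching score with LVQ classifiers. The Python sketch below illustrates those two ingredients under stated assumptions. The aligned sites are hypothetical toy data, not the thesis training set. The log2-odds scoring against a uniform background with a pseudocount of 1 is a common PWM convention, not necessarily the one used in the thesis. The prototype update is the textbook LVQ1 rule, shown for illustration only; it is not the lvq? fine-tuning or the genetic-algorithm initialization described in the abstract. All function names are hypothetical.

```python
import math

BASES = "ACGT"

def pwm_from_sites(sites, background=0.25, pseudocount=1.0):
    """Build a log2-odds position weight matrix from equal-length aligned sites (ACGT only)."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        counts = {b: sum(site[i] == b for site in sites) for b in BASES}
        total = len(sites) + 4 * pseudocount
        # Smoothed base frequency at position i, relative to a uniform background.
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / background) for b in BASES})
    return pwm

def pwm_score(pwm, window):
    """Matching score: sum of per-position log-odds values for one window."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_match(pwm, sequence):
    """Return (score, offset) of the highest-scoring window; sequence must be at least len(pwm) long."""
    width = len(pwm)
    return max((pwm_score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

def lvq1_step(prototypes, labels, x, y, alpha):
    """One LVQ1 update: pull the nearest prototype toward x if its label equals y, otherwise push it away."""
    dists = [sum((p_i - x_i) ** 2 for p_i, x_i in zip(p, x)) for p in prototypes]
    k = dists.index(min(dists))
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] = [p_i + sign * alpha * (x_i - p_i) for p_i, x_i in zip(prototypes[k], x)]
    return k

# Hypothetical aligned TATA-like hexamers (toy data).
sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG", "CATAAA"]
pwm = pwm_from_sites(sites)
score, offset = best_match(pwm, "GGCTATAAAAGGC")
print("best window at offset", offset, "score", round(score, 2))

# Hypothetical 2-D feature prototypes for a two-class LVQ network.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
labels = ["non-TATA", "TATA"]
winner = lvq1_step(prototypes, labels, [0.8, 0.9], "TATA", alpha=0.5)
print("updated prototype", winner, prototypes[winner])
```

In this sketch the two-dimensional vectors are placeholders for whatever features a model derives from the hexamer statistics, the distance to the transcription start site and the PWM score; the abstract does not specify the encoding, so none is assumed here.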
dc.description.level | M | en_US |
dc.format.extent | 120 p | en_US |
dc.identifier.doi | https://doi.org/10.51415/10321/1861 | |
dc.identifier.other | 124035 | |
dc.identifier.uri | http://hdl.handle.net/10321/1861 | |
dc.language.iso | en | en_US |
dc.subject.lcsh | Genetics | en_US |
dc.subject.lcsh | Neural networks (Computer science) | en_US |
dc.subject.lcsh | Human genome | en_US |
dc.subject.lcsh | Information storage and retrieval systems | en_US |
dc.subject.lcsh | Punched card systems | en_US |
dc.title | Statistical pattern recognition based on LVQ artificial neural networks : application to TATA box motif | en_US |
dc.type | Thesis | en_US |