Sparse coding for speech recognition

dc.contributor.advisor: Barnard, E. [en]
dc.contributor.email: willie.smit@gmail.com [en]
dc.contributor.postgraduate: Smit, Willem Jacobus [en]
dc.date.accessioned: 2013-09-07T15:36:20Z
dc.date.available: 2008-12-11 [en]
dc.date.available: 2013-09-07T15:36:20Z
dc.date.created: 2008-09-02 [en]
dc.date.issued: 2008-12-11 [en]
dc.date.submitted: 2008-11-11 [en]
dc.description: Thesis (PhD)--University of Pretoria, 2008. [en]
dc.description.abstract: The brain is a complex organ with great computational power. Recent research in neurobiology helps scientists to better understand how the brain works, especially how it represents, or codes, external signals. This research shows that the neural code is sparse: a sparse code is one in which few neurons participate in the representation of a signal. Neurons communicate with each other by sending pulses, or spikes, at certain times. The spikes sent between several neurons over time are called a spike train. A spike train contains all the important information about the signal that it codes. This thesis shows how sparse coding can be used for speech recognition. The recognition process consists of three parts. First, the speech signal is transformed into a spectrogram. Next, a sparse code that represents the spectrogram is found: the spectrogram serves as the input to a linear generative model, and the output of the model is a sparse code that can be interpreted as a spike train. Lastly, a spike train model recognises the words that are encoded in the spike train. Algorithms that search for sparse codes to represent signals require many computations. We therefore propose an algorithm that is more efficient than current algorithms. It makes it possible to find sparse codes in reasonable time if the spectrogram is fairly coarse. The system achieves a word error rate of 19% with a coarse spectrogram, while a system based on Hidden Markov Models achieves a word error rate of 15% on the same spectrograms. [en]
dc.description.availability: unrestricted [en]
dc.description.department: Electrical, Electronic and Computer Engineering [en]
dc.identifier.citation: a 2008 [en]
dc.identifier.other: D535/gm [en]
dc.identifier.upetdurl: http://upetd.up.ac.za/thesis/available/etd-11112008-151309/ [en]
dc.identifier.uri: http://hdl.handle.net/2263/29409
dc.language.iso: en
dc.publisher: University of Pretoria [en_ZA]
dc.rights: © University of Pretoria 2008 D535/ [en]
dc.subject: Mathematical optimization [en]
dc.subject: Spike train classification [en]
dc.subject: Spike train [en]
dc.subject: Speech recognition [en]
dc.subject: Sparse code [en]
dc.subject: Linear generative model [en]
dc.subject: Sparse code measurement [en]
dc.subject: Dictionary training [en]
dc.subject: Overcomplete dictionary [en]
dc.subject: Spectrogram [en]
dc.subject: UCTD [en_US]
dc.title: Sparse coding for speech recognition [en]
dc.type: Thesis [en]

Files

Original bundle

Name: 00front.pdf
Size: 40.13 KB
Format: Adobe Portable Document Format

Name: 01chapters1-2.pdf
Size: 487.53 KB
Format: Adobe Portable Document Format

Name: 02chapters3-4.pdf
Size: 241.03 KB
Format: Adobe Portable Document Format

Name: 03references.pdf
Size: 59 KB
Format: Adobe Portable Document Format

Name: Complete.pdf
Size: 796.51 KB
Format: Adobe Portable Document Format