Phoneme duration modelling for speaker verification

Van Heerden, Charl Johannes

dc.contributor.advisor	Barnard, E.	en
dc.contributor.postgraduate	Van Heerden, Charl Johannes	en
dc.date.accessioned	2013-09-07T01:04:42Z
dc.date.available	2009-06-29	en
dc.date.available	2013-09-07T01:04:42Z
dc.date.created	2009-04-15	en
dc.date.issued	2009-06-29	en
dc.date.submitted	2009-06-26	en
dc.description	Dissertation (MEng)--University of Pretoria, 2009.	en
dc.description.abstract	Higher-level features are considered a potential remedy for transmission line and cross-channel degradations, currently some of the biggest problems associated with speaker verification. Phoneme durations in particular are not altered by these factors; thus a robust duration model would be a particularly useful addition to traditional cepstral-based speaker verification systems. In this dissertation we investigate the feasibility of phoneme durations as a feature for speaker verification. Simple speaker-specific triphone duration models are created to statistically represent the phoneme durations. Durations are obtained from a hidden Markov model (HMM) based automatic speech recognition system and are modeled using single-mixture Gaussian distributions. These models are applied in a speaker verification system (trained and tested on the YOHO corpus) and found to be a useful feature, even when used in isolation. When fused with acoustic features, verification performance increases significantly. A novel speech rate normalization technique is developed in order to remove some of the inherent intra-speaker variability (due to differing speech rates). Speech rate variability has a negative impact on both speaker verification and automatic speech recognition. Although the duration modelling seems to benefit only slightly from this procedure, the fused system performance improvement is substantial. Other factors known to influence the duration of phonemes are incorporated into the duration model. Utterance-final lengthening is known to be a consistent effect and thus “position in sentence” is modeled. “Position in word” is also modeled since triphones do not provide enough contextual information. This is found to improve performance since the durations of some vowels are particularly sensitive to their position in the word. Data scarcity becomes a problem when building speaker-specific duration models.
By using information from available data, unknown durations can be predicted in an attempt to overcome the data scarcity problem. To this end we develop a novel approach to predict unknown phoneme durations from the values of known phoneme durations for a particular speaker, based on the maximum likelihood criterion. This model is based on the observation that phonemes from the same broad phonetic class tend to co-vary strongly, but that there are also significant cross-class correlations. This approach is tested on the TIMIT corpus and found to be more accurate than using back-off techniques.	en
dc.description.availability	unrestricted	en
dc.description.department	Electrical, Electronic and Computer Engineering	en
dc.identifier.citation	2008 Please cite as follows Van Heerden, CJ 2008, Phoneme duration modelling for speaker verification, MEng dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/25869 >	en
dc.identifier.other	E1309/gm	en
dc.identifier.upetdurl	http://upetd.up.ac.za/thesis/available/etd-06262009-150945/	en
dc.identifier.uri	http://hdl.handle.net/2263/25869
dc.language.iso		en
dc.publisher	University of Pretoria	en_ZA
dc.rights	©University of Pretoria 2008 Please cite as follows Van Heerden, CJ 2008, Phoneme duration modelling for speaker verification, MEng dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://upetd.up.ac.za/thesis/available/etd-06262009-150945/ > E1309/	en
dc.subject	Eigen vectors	en
dc.subject	Speech rate normalization	en
dc.subject	Speaker verification	en
dc.subject	Phoneme durations	en
dc.subject	Duration modeling	en
dc.subject	Prosodic features	en
dc.subject	Hidden Markov models	en
dc.subject	Gaussian mixture models	en
dc.subject	Maximum likelihood	en
dc.subject	UCTD	en_US
dc.title	Phoneme duration modelling for speaker verification	en
dc.type	Dissertation	en
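As a rough illustration of the duration model described in the abstract, the sketch below fits one Gaussian (mean and variance) per triphone from a speaker's enrolment durations and scores a test utterance by the average log-likelihood ratio between the speaker model and a background model. The background model, the variance floor `var_floor`, the back-off to the background model for triphones the speaker never produced, the function names, and HTK-style labels such as `s-ih+k` are all assumptions made for this example, not details taken from the dissertation. Durations (in seconds) are assumed to come from a forced alignment produced by an HMM recogniser.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_duration_models(observations, var_floor=1e-4):
    """Fit a single Gaussian (mean, variance) per triphone label.

    observations: iterable of (triphone_label, duration_seconds) pairs,
    assumed to come from a forced alignment. var_floor (an assumed value,
    in seconds squared) keeps sparsely observed triphones from getting
    near-zero variances.
    """
    by_label = defaultdict(list)
    for label, dur in observations:
        by_label[label].append(dur)
    models = {}
    for label, durs in by_label.items():
        var = pvariance(durs) if len(durs) > 1 else 0.0
        models[label] = (mean(durs), max(var, var_floor))
    return models


def gaussian_logpdf(x, mu, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def duration_llr(test_obs, speaker, background):
    """Average per-phone log-likelihood ratio: speaker model vs background model."""
    scores = []
    for label, dur in test_obs:
        if label not in background:
            continue  # no model at all for this triphone, so it is skipped
        bg = background[label]
        # Hypothetical back-off: use the background model when the
        # speaker has no model for this triphone (contributes an LLR of 0).
        spk = speaker.get(label, bg)
        scores.append(gaussian_logpdf(dur, *spk) - gaussian_logpdf(dur, *bg))
    return sum(scores) / len(scores) if scores else 0.0
```

A positive score favours the claimed speaker. In a complete system the score would be compared with a tuned threshold, or fused with the score of a cepstral system as the abstract describes; the fusion weights are not specified here.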
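The prediction of unknown durations can be illustrated under one simple assumption: that a speaker's vector of mean phoneme durations is jointly Gaussian across speakers, with population mean μ and covariance Σ estimated from training speakers. Under that assumption the maximum likelihood estimate of the unobserved entries u, given the observed entries k, is the conditional mean μ_u + Σ_uk Σ_kk^(-1) (x_k - μ_k). The sketch below implements this standard conditional-Gaussian estimate with NumPy; it is not claimed to be the exact model of the dissertation, and `fit_joint_gaussian` and `predict_unknown` are hypothetical names. A back-off estimate, by contrast, would simply return μ_u and ignore the speaker's known durations.

```python
import numpy as np


def fit_joint_gaussian(speaker_means):
    """Estimate a joint Gaussian over phoneme durations.

    speaker_means: array of shape (n_speakers, n_phonemes) holding each
    training speaker's mean duration for every phoneme (assumed complete).
    """
    mu = speaker_means.mean(axis=0)
    sigma = np.cov(speaker_means, rowvar=False)
    return mu, sigma


def predict_unknown(mu, sigma, known_idx, known_vals):
    """Conditional-mean (ML) prediction of the phonemes not in known_idx."""
    known_idx = np.asarray(known_idx)
    unknown_idx = np.setdiff1d(np.arange(len(mu)), known_idx)
    s_kk = sigma[np.ix_(known_idx, known_idx)]
    s_uk = sigma[np.ix_(unknown_idx, known_idx)]
    # Solve the linear system instead of explicitly inverting S_kk.
    delta = np.linalg.solve(s_kk, np.asarray(known_vals) - mu[known_idx])
    return unknown_idx, mu[unknown_idx] + s_uk @ delta
```

For example, if every speaker's durations scale with a common speaking-rate factor, a slow speaker's known durations pull the predicted duration of an unseen phoneme upward, whereas back-off leaves it at the population mean. This is the kind of co-variation between phonemes that the abstract relies on.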