Cross-language acoustic adaptation for automatic speech recognition

Nieuwoudt, Christoph

dc.contributor.advisor	Botha, Elizabeth C.	en
dc.contributor.postgraduate	Nieuwoudt, Christoph	en
dc.date.accessioned	2013-09-07T09:38:51Z
dc.date.available	2005-01-06	en
dc.date.available	2013-09-07T09:38:51Z
dc.date.created	2000-04-08	en
dc.date.issued	2006-01-06	en
dc.date.submitted	2005-01-06	en
dc.description	Thesis (PhD (Electrical, Electronic and Computer Engineering))--University of Pretoria, 2006.	en
dc.description.abstract	Speech recognition systems have been developed for the major languages of the world, yet for the majority of languages there are currently no large vocabulary continuous speech recognition (LVCSR) systems. The development of an LVCSR system for a new language is very costly, mainly because a large speech database has to be compiled to robustly capture the acoustic characteristics of the new language. This thesis investigates techniques that enable the re-use of acoustic information from a source language, in which a large amount of data is available, in implementing a system for a new target language. The assumption is that too little data is available in the target language to train a robust speech recognition system on that data alone, and that use of acoustic information from a source language can improve the performance of a target language recognition system. Strategies for cross-language use of acoustic information are proposed, including training on pooled source and target language data, adaptation of source language models using target language data, adaptation of multilingual models using target language data, and transformation of source language data to augment target language data for model training. These strategies are allied with Bayesian and transformation-based techniques, usually used for speaker adaptation, as well as with discriminative learning techniques, to present a framework for cross-language re-use of acoustic information. Extensions to current adaptation techniques are proposed to improve the performance of these techniques specifically for cross-language adaptation. A new technique for transformation-based adaptation of variance parameters and a cost-based extension of the minimum classification error (MCE) approach are proposed. Experiments are performed for a large number of approaches from the proposed framework for cross-language re-use of acoustic information.
Relatively large amounts of English speech data are used in conjunction with smaller amounts of Afrikaans speech data to improve the performance of an Afrikaans speech recogniser. Results indicate that a significant reduction in word error rate (between 26% and 50%, depending on the amount of Afrikaans data available) is possible when English acoustic data is used in addition to Afrikaans speech data from the same database (i.e. both sets of data were recorded under the same conditions and the same labelling process was used). For same-database experiments, best results are achieved for approaches that train models on pooled source and target language data and then perform further adaptation of the models using Bayesian or discriminative techniques on target language data only. Experiments are also performed to evaluate the use of English data from a different database than the Afrikaans data. Peak reductions in word error rate of between 16% and 35% are delivered, depending on the amount of Afrikaans data available. Best results are achieved for an approach that performs a simple transformation of source model parameters using target language data, and then performs Bayesian adaptation of the transformed model on target language data.	en
dc.description.availability	unrestricted	en
dc.description.department	Electrical, Electronic and Computer Engineering	en
dc.identifier.citation	Nieuwoudt, C 2000, Cross-language acoustic adaptation for automatic speech recognition, PhD thesis, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/26974 >	en
dc.identifier.upetdurl	http://upetd.up.ac.za/thesis/available/etd-01062005-071829/	en
dc.identifier.uri	http://hdl.handle.net/2263/26974
dc.language.iso		en
dc.publisher	University of Pretoria	en_ZA
dc.rights	© 2000, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.	en
dc.subject	Minimum classification error adaptation	en
dc.subject	Transformation-based adaptation	en
dc.subject	Bayesian adaptation	en
dc.subject	Cross-language acoustic adaptation	en
dc.subject	Multilingual speech recognition	en
dc.subject	UCTD	en_US
dc.title	Cross-language acoustic adaptation for automatic speech recognition	en
dc.type	Thesis	en