Cross-language acoustic adaptation for automatic speech recognition

Files

00front.pdf (2.01 MB)

01chapter1.pdf (3.64 MB)

02chapter2.pdf (4.36 MB)

03chapter3.pdf (11.87 MB)

04chapter4.pdf (8.04 MB)

Date

2006-01-06

Authors

Nieuwoudt, Christoph

Publisher

University of Pretoria

Abstract

Speech recognition systems have been developed for the major languages of the world, yet for the majority of languages there are currently no large vocabulary continuous speech recognition (LVCSR) systems. The development of an LVCSR system for a new language is very costly, mainly because a large speech database has to be compiled to robustly capture the acoustic characteristics of the new language. This thesis investigates techniques that enable the re-use of acoustic information from a source language, in which a large amount of data is available, in implementing a system for a new target language. The assumption is that too little data is available in the target language to train a robust speech recognition system on that data alone, and that use of acoustic information from a source language can improve the performance of a target language recognition system. Strategies for cross-language use of acoustic information are proposed, including training on pooled source and target language data, adaptation of source language models using target language data, adapting multilingual models using target language data and transforming source language data to augment target language data for model training. These strategies are allied with Bayesian and transformation-based techniques, usually used for speaker adaptation, as well as with discriminative learning techniques, to present a framework for cross-language re-use of acoustic information. Extensions to current adaptation techniques are proposed to improve the performance of these techniques specifically for cross-language adaptation. A new technique for transformation-based adaptation of variance parameters and a cost-based extension of the minimum classification error (MCE) approach are proposed. Experiments are performed for a large number of approaches from the proposed framework for cross-language re-use of acoustic information. 
Relatively large amounts of English speech data are used in conjunction with smaller amounts of Afrikaans speech data to improve the performance of an Afrikaans speech recogniser. Results indicate that a significant reduction in word error rate (between 26% and 50%, depending on the amount of Afrikaans data available) is possible when English acoustic data is used in addition to Afrikaans speech data from the same database (i.e. both sets of data were recorded under the same conditions and the same labelling process was used). For same-database experiments, best results are achieved for approaches that train models on pooled source and target language data and then perform further adaptation of the models using Bayesian or discriminative techniques on target language data only. Experiments are also performed to evaluate the use of English data from a different database than the Afrikaans data. Peak reductions in word error rate of between 16% and 35% are delivered, depending on the amount of Afrikaans data available. Best results are achieved for an approach that performs a simple transformation of source model parameters using target language data, and then performs Bayesian adaptation of the transformed model on target language data.
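
As an illustration of the Bayesian adaptation idea mentioned in the abstract, the following is a minimal sketch of the standard textbook maximum a posteriori (MAP) update for a single Gaussian mean, in which a source language mean serves as the prior and target language frames pull the estimate towards the new language. It is not taken from the thesis: the function name map_adapt_mean, the prior weight tau and the toy data are assumptions made for illustration only.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Textbook MAP update of one Gaussian mean (illustrative sketch).

    prior_mean : mean vector from the source language model (the prior)
    frames     : target language feature vectors
    posteriors : occupation probability of this Gaussian for each frame
    tau        : hypothetical prior weight; larger values trust the source more
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)

    # Posterior-weighted sum of the target language frames.
    weighted_sum = [0.0] * dim
    for gamma, frame in zip(posteriors, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the target data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy usage: a source mean at the origin and target frames all at (2, 2).
source_mean = [0.0, 0.0]
few_frames = [[2.0, 2.0]] * 10
many_frames = [[2.0, 2.0]] * 90

print(map_adapt_mean(source_mean, few_frames, [1.0] * 10))
print(map_adapt_mean(source_mean, many_frames, [1.0] * 90))
```

With little target data the adapted mean stays close to the source language prior, and as more target frames are observed it moves towards the target data average. This matches the abstract's premise that source language information helps most when target language data is scarce.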

Description

Thesis (PhD (Electrical, Electronic and Computer Engineering))--University of Pretoria, 2006.

Keywords

Minimum classification error adaptation, Transformation-based adaptation, Bayesian adaptation, Cross-language acoustic adaptation, Multilingual speech recognition, UCTD

Citation

Nieuwoudt, C 2000, Cross-language acoustic adaptation for automatic speech recognition, PhD thesis, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/26974 >

URI

http://hdl.handle.net/2263/26974

Collections

Theses and Dissertations (University of Pretoria)
Theses and Dissertations (Electrical, Electronic and Computer Engineering)
