Abstract:
Speaker accent influences the accuracy of automatic speech recognition (ASR) systems. Knowledge of accent-based acoustic variations can therefore be used in the development of more robust systems. This project investigates the differences between first language (L1) and second language (L2) English in South Africa with respect to vowels and diphthongs. The study is specifically aimed at L2 English speakers with an African mother tongue, for instance speakers of isiZulu, isiXhosa, Tswana or South Sotho. The vowel systems of English and African languages, as described in the linguistic literature, are compared to predict the expected deviations of L2 South African English from L1. A number of vowels and diphthongs from L1 and L2 speakers are acoustically compared and the results are correlated with the linguistic predictions. The comparison is first made in formant space, using the first three formants as estimated by the Split Levinson algorithm. The L1 vowel centroids and diphthong trajectories in this three-dimensional space are then compared to their L2 counterparts using analysis of variance. The second analysis method is based on simple hidden Markov models (HMMs) using Mel-scaled cepstral features. Each HMM models a vowel or diphthong from one of the two speaker groups, and analysis of variance is again used to compare the L1 and L2 HMMs. Significant differences are found in the vowel and diphthong qualities of the two language groups, which support linguistically predicted effects such as vowel substitution, peripheralisation and changes in diphthong strength. The long-term goal of this project is to enable the adaptation of existing L1 English recognition systems to perform equally well on South African L2 English.
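
To make the formant-space comparison concrete, the following is a minimal sketch of a one-way analysis of variance applied separately to each of the first three formants of a single vowel. It is an illustration only and not necessarily the procedure used in this study: the per-formant treatment, the function names `one_way_anova_f` and `compare_formants`, and the formant values are all assumptions, and the values are invented rather than measured. Formant estimation (for example with the Split Levinson algorithm) is assumed to have been done beforehand.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of sample lists, one per speaker group.
    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def compare_formants(l1_tokens, l2_tokens):
    """Run the ANOVA independently on F1, F2 and F3 (a simplification)."""
    results = {}
    for i, name in enumerate(("F1", "F2", "F3")):
        l1_values = [token[i] for token in l1_tokens]
        l2_values = [token[i] for token in l2_tokens]
        results[name] = one_way_anova_f([l1_values, l2_values])
    return results


# Hypothetical (F1, F2, F3) values in Hz for one vowel; NOT data from the study.
l1_tokens = [(300, 2300, 3000), (310, 2250, 2950), (290, 2350, 3050), (305, 2280, 3010)]
l2_tokens = [(350, 2100, 2900), (360, 2150, 2950), (340, 2050, 2850), (355, 2120, 2920)]

for name, (f_stat, df_b, df_w) in compare_formants(l1_tokens, l2_tokens).items():
    print(f"{name}: F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value for a given formant indicates that the L1 and L2 group means differ by more than the within-group scatter would suggest; judging significance additionally requires comparing F against the F distribution with the reported degrees of freedom.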