Abstract:
This dissertation investigates the cost-complexity-performance relationship between two automatic language identification systems. The first is a state-of-the-art archi¬tecture, trained on about three hours of phonetically hand-labelled telephone speech obtained from the recognised OGLTS corpus. The second system, introduced by our¬selves, is a simpler design with a smaller, less complex parameter space. It is a vector quantisation-based approach which bears some resemblance to a system suggested by Sugiyama. Though trained on the same data, it has no need for any labels and is therefore less costly. A number of experiments are performed to find quasi-optimal parameters for the two systems. In further experiments the systems are evaluated and compared on a set of ten two-language tasks, spanning five languages. The more com¬plex system is shown to have a substantial performance advantage over the simpler design - 81% versus 65% on 40 seconds of speech. However, both results are well under reported state-of-the-art performance of 94% and would suggest that our systems can benefit from additional attention to implementation detail and optimisation of various parameters. Given the above, our suggested architecture may potentially provide an adequate solution where the high development cost associated with state-of-the-art technology and the necessary training corpora are prohibitive.