This paper describes a series of experiments that compare different approaches to training a speaker-independent continuous-speech digit recognizer using the CSLU Toolkit. Comparisons are made between the Hidden Markov Model (HMM) and Neural Network (NN) approaches. In addition, a description of the CSLU Toolkit research environment is given.
The CSLU Toolkit is a research and development software environment for creating and using spoken language systems in telephone and PC applications. In particular, we describe in detail the CSLU-HMM, CSLU-NN, and CSLU-FBNN development environments used to implement our experiments, and we compare their recognition results.
Our speech corpus is OGI 30K-Numbers, a collection of spontaneous ordinal and cardinal numbers, continuous digit strings, and isolated digit strings. The utterances were recorded over the telephone by having a large number of people recite their ZIP code, street address, or other numeric information. This corpus presents a very noisy and difficult recognition task.
Our best results (98% word recognition, 92% sentence recognition), obtained with the FBNN architecture, suggest that the CSLU Toolkit is effective for building real-world speech recognition systems.
Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers
Publication type:
Contribution to conference proceedings
Source:
Interactive Voice Technology for Telecommunications Applications, 1998. 1998 IEEE 4th Workshop IVTTA-ETWR '98, pp. 135–140, Turin, Italy, 29-30 September, 1998
Date:
1998
Resource Identifier:
http://www.cnr.it/prodotto/i/241059
http://www.pd.istc.cnr.it/Papers/PieroCosi/cp-IVTTA98.pdf
Language:
English