Yuhas
The visual speech signals from around the mouth complement the acoustic speech signals and can often be used by humans to improve speech intelligibility. How can these visual speech signals be used to improve the performance of an automatic speech recognition system in a noisy environment? This thesis attempts to answer this question by looking at the performance of humans and then exploring methods of automatically processing visual speech signals.
This thesis uses two approaches to automatically interpret mouth images with neural networks. In the first approach, the visual signals are treated categorically and an attempt is made to identify phonemes directly from the images. This demonstrates that the neural network can obtain speech information from these visual speech signals. At the same time, it raises questions as to the most efficient form of this information. In the second approach, the neural networks are asked to estimate the acoustic speech signal directly from the visual signals. This interpretation does not require any symbolic identification of the speech signals and provides a way of reducing the ambiguity in a noise-degraded acoustic signal. It is shown that vowel recognition rates can be significantly improved by augmenting the noise-degraded acoustic signals with acoustic estimates obtained from the visual signals. The results obtained with the neural network are compared with human performance and with more traditional estimation and pattern recognition algorithms.
A central component of this work is the use of simulated parallel distributed architectures. Traditional serial digital computers require us to retreat to the symbolic level of computation rather quickly. In contrast, parallel distributed architectures allow us to explore alternative approaches that use distributed representations that would otherwise be computationally prohibitive. One reason that humans are able to understand speech in noise is that we are able to process and integrate information from multiple sources at the same time. Massively parallel architectures may provide the power to replicate these processes.