Tamura
The authors propose a method of detecting and normalizing pattern position by a three-layer backpropagation network (BPN). First, using a one-dimensional signal, they experimentally clarify why restoring a shifted position with a BPN is troublesome. Next, they show that cascading two networks makes it possible to detect and normalize the signal position, and they demonstrate the method's usefulness. Finally, the method is applied to a two-dimensional image, and its validity is confirmed.
We compare the classification ability of a BPN (backpropagation neural network) and the k-NN (k-nearest neighbor) classification method, using voice data and patellar subluxation images. The average recognition rate of the BPN was 9.2 percent higher than that of the k-NN method. Although k-NN classification is simple in theory, its classification time was fairly long, which suggests that real-time recognition would be difficult. The BPN, by contrast, has a long learning time but a very short recognition time. Especially when the samples have many dimensions, the BPN can be said to outperform k-NN in classification ability.
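To see why k-NN classification time is long while BPN recognition is fast, here is a minimal brute-force k-NN sketch. The function name, the Euclidean distance, and the default k=3 are assumptions for illustration, not details from the paper. Every query is compared against every stored sample, so the cost per query grows with both the number of samples and their dimensionality, whereas a trained BPN needs only a fixed number of weighted sums per query.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs (hypothetical format).
    The whole training set is scanned for every query.
    """
    # Euclidean distance from the query to every stored sample, nearest first.
    dists = sorted((math.dist(features, query), label) for features, label in train)
    # Majority vote over the labels of the k closest samples.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
print(knn_classify(train, (0.05, 0.1)))  # two of the three nearest samples are "a"
```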
The authors describe a method of detecting and normalizing pattern location by a three-layer backpropagation network (BPN). First, using one-dimensional signals, they experimentally clarify the difficulty of compensating for a shifted position with a BPN. Next, they propose a new method that solves this problem by cascading two networks, in which the signal location is detected and then normalized, and they demonstrate its usefulness. They show that the weight distribution has the characteristics of a Fourier-series expansion. Finally, they apply the method to two-dimensional images and compare three methods. The two-dimensional location signal method, which expands the pattern and location signals into two dimensions, is best in terms of convergence during the learning phase. Another method considered cascades one-dimensional networks: it normalizes the pattern first in the horizontal direction by simply binding one-dimensional networks, and then in the vertical direction in the same way. Although its correct-answer rate for position normalization is lower than that of the others, it needs no learning in two-dimensional space and its normalization processing is the fastest.
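The papers perform detection and normalization with a cascade of trained BPNs. Because the learned weights are reported to resemble a Fourier-series expansion, a plain discrete Fourier transform makes a convenient non-neural stand-in for illustrating the same two-stage cascade. In the sketch below, the first stage estimates the circular position of a single pattern from the phase of the first Fourier coefficient, and the second stage shifts the signal so that this position moves to index 0. This is a swapped-in technique, not the authors' network; it assumes a circular one-dimensional signal containing one roughly symmetric pattern, and the function names are hypothetical.

```python
import cmath
import math


def first_harmonic(signal):
    """DFT coefficient X[1] = sum_n x[n] * exp(-2*pi*i*n/N)."""
    n_samples = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * n / n_samples) for n, x in enumerate(signal))


def detect_position(signal):
    """Stage 1: estimate the pattern's circular position from the phase of X[1].

    A circular shift by s multiplies X[1] by exp(-2*pi*i*s/N), so for a
    symmetric pattern the phase encodes its centre position.
    """
    n_samples = len(signal)
    phase = cmath.phase(first_harmonic(signal))  # value in [-pi, pi]
    return round(-phase * n_samples / (2 * math.pi)) % n_samples


def normalize_position(signal):
    """Stage 2: circularly shift the signal so the detected position moves to index 0."""
    shift = detect_position(signal)
    return signal[shift:] + signal[:shift]


base = [0, 1, 2, 1, 0, 0, 0, 0]
# Every circular shift of the same pattern normalizes to the same output.
print([normalize_position(base[s:] + base[:s]) for s in range(3)])
```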
Neural networks have been studied intensively, from the viewpoint of learning, for identifying individuals from facial images, analyzing facial expressions, and similar tasks. The authors carried out classification experiments using 8×8, 16×16, and 32×32 mosaic facial images. Male/female classification achieved a correct rate of 87% on previously unseen 8×8 images, which are very difficult to analyze.
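A mosaic facial image can be read as a block-averaged, low-resolution version of the face. The abstract does not say how the mosaics were produced, so the sketch below assumes square grayscale images (lists of rows) whose side is a multiple of the mosaic size, with each mosaic cell being the mean of its pixel block; the function name is hypothetical. Flattening an 8×8 mosaic gives the 64 input values a network would receive.

```python
def mosaic(image, size):
    """Downsample a square grayscale image to size x size by averaging pixel blocks."""
    n = len(image)
    if n % size:
        raise ValueError("image side must be a multiple of the mosaic size")
    block = n // size
    out = []
    for bi in range(size):
        row = []
        for bj in range(size):
            # All pixels covered by mosaic cell (bi, bj).
            cells = [image[i][j]
                     for i in range(bi * block, (bi + 1) * block)
                     for j in range(bj * block, (bj + 1) * block)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


img = [[0, 0, 4, 4],
       [0, 0, 4, 4],
       [8, 8, 2, 2],
       [8, 8, 2, 2]]
print(mosaic(img, 2))  # each 2x2 block collapses to its mean
```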
Automatic syllable recognition is degraded by noise and by similar-sounding syllables; lip-reading may improve recognition. The authors propose such a system for a finite vocabulary of words, using a microphone and an X-Y tracker. The tracker performs the initial image processing and outputs coordinate pairs, which are processed together with the sound data. Weight coefficients that set the relative importance of the visual and voice data are introduced and selected to give the best performance on a set of registered words. Experiments show that the proposed system improves the recognition rate in the presence of continuous noise, by about 10% at an SNR of 35 dB and about 20% at an SNR of 26 dB.
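One simple reading of the weight coefficients is a linear blend of per-word voice and visual scores, with the weight chosen by trying candidate values on registered-word samples and keeping the most accurate one. The sketch below is only that reading: the score dictionaries, the linear form, the candidate grid, and all function names are assumptions, since the abstract does not specify how the coefficients are applied or searched.

```python
def fuse_scores(audio, visual, w):
    """Blend per-word scores: w weights the voice data, (1 - w) the visual data."""
    return {word: w * audio[word] + (1 - w) * visual[word] for word in audio}


def recognize(audio, visual, w):
    """Return the registered word with the highest fused score."""
    fused = fuse_scores(audio, visual, w)
    return max(fused, key=fused.get)


def select_weight(samples, candidates):
    """Pick the candidate weight that recognizes the most samples correctly.

    `samples` is a list of (audio_scores, visual_scores, true_word) triples.
    Ties go to the earliest candidate.
    """
    def correct_count(w):
        return sum(recognize(a, v, w) == word for a, v, word in samples)
    return max(candidates, key=correct_count)


samples = [
    ({"a": 0.9, "i": 0.1}, {"a": 0.8, "i": 0.2}, "a"),  # both cues agree
    ({"a": 0.6, "i": 0.4}, {"a": 0.1, "i": 0.9}, "i"),  # noisy voice, clear lips
    ({"a": 0.3, "i": 0.7}, {"a": 0.6, "i": 0.4}, "i"),  # clear voice, misleading lips
]
print(select_weight(samples, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

Neither extreme (voice only at w = 1, lips only at w = 0) gets all three samples right, which is the situation the weighting is meant to handle.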
This paper describes a neural approach intended to improve the performance of an automatic speech recognition system for unrestricted speakers by using not only voice sound features but also image features of the mouth shape. In particular, the authors used natural voice samples and mouth-shape images acquired in an ordinary environment, rather than in a soundproof room or under specific lighting conditions. The FFT power spectrum of the acoustic speech was used as the voice feature. In addition, gray-level images, binary images, and geometrical shape features of the mouth were used as compensatory information and compared to determine which kinds of image features were effective. The method can be applied not only to improving voice recognition but also to aiding communication for hearing-impaired people.
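The sketch below shows one way to build a combined network input from an FFT-style power spectrum and a binary mouth image. It computes the spectrum with a direct DFT for self-containment (an FFT gives the same values faster). The frame length, the use of only the non-negative-frequency bins, the threshold of 128, the assumption that the mouth opening is darker than the surrounding lips and skin, and the function names are all illustrative choices, not details from the paper.

```python
import cmath
import math


def power_spectrum(frame):
    """|X[k]|^2 for k = 0..N/2, via a direct DFT of one speech frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def binarize(image, threshold):
    """Flattened binary mouth image: 1 where the gray level is below the threshold."""
    return [1 if pixel < threshold else 0 for row in image for pixel in row]


def combined_features(frame, mouth_image, threshold=128):
    """Concatenate voice and mouth-shape features into one input vector."""
    return power_spectrum(frame) + binarize(mouth_image, threshold)


frame = [1, 0, -1, 0]            # one cosine cycle over four samples
mouth = [[200, 50], [30, 220]]   # tiny stand-in for a gray-level mouth image
print(len(combined_features(frame, mouth)))  # 3 spectral bins + 4 pixels
```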
The paper describes a neural approach intended to improve the performance of a voice recognition system for unrestricted speakers by using not only voice sound features but also image features of the mouth shape. The FFT power spectrum of the acoustic speech was used as the voice feature. In addition, the gray-level image, binary image, and geometrical shape features of the mouth were used as compensatory information, and they were compared to determine which kinds of image features are effective for voice recognition by a neural network. For unrestricted speakers, a vowel recognition rate of about 80 percent was obtained using voice features alone; this rose to about 92 percent when voice features were combined with binary images. The method can be applied not only to improving voice recognition but also to aiding communication for hearing-impaired people.
This paper describes a neural approach intended to improve the performance of a voice recognition device by using not only voice sound features but also image features of the mouth shape. The FFT power spectrum was used as the voice feature. In addition, the gray-level image, binary image, and geometrical shape features of the mouth were tested and compared to determine which kinds of features are effective for voice recognition by a neural network. For unrestricted speakers, a vowel recognition rate of about 80% was obtained using voice features only, but this increased to about 92% when voice features plus binary images were used. The method can be applied not only to improving voice recognition but also to aiding communication for hearing-impaired people.
The article covers lip contour extraction by the zero-crossing method and lip contour tracking.
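The abstract names the zero-crossing method without further detail. Its usual meaning is Marr-Hildreth style edge detection, where edges lie at sign changes of a second-derivative (Laplacian) response. The sketch below assumes a grayscale image given as a list of rows, uses the 4-neighbour discrete Laplacian, omits the Gaussian smoothing normally applied first, and does not attempt the contour-tracking step. The function names are hypothetical.

```python
def laplacian(image):
    """4-neighbour discrete Laplacian; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    lap = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap[i][j] = (image[i - 1][j] + image[i + 1][j]
                         + image[i][j - 1] + image[i][j + 1]
                         - 4 * image[i][j])
    return lap


def zero_crossings(image):
    """Mark pixels whose Laplacian changes sign toward the right or lower neighbour."""
    lap = laplacian(image)
    h, w = len(lap), len(lap[0])
    edges = [[0] * w for _ in range(h)]
    for i in range(h - 1):
        for j in range(w - 1):
            # A strictly negative product means a sign change between neighbours.
            if lap[i][j] * lap[i][j + 1] < 0 or lap[i][j] * lap[i + 1][j] < 0:
                edges[i][j] = 1
    return edges


# Vertical step edge: dark left half, bright right half.
step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
for row in zero_crossings(step):
    print(row)
```

On the step image, the interior rows are marked in the column just before the intensity jump, while the border rows stay unmarked because the Laplacian is not computed there.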