<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PSL</title>
	<atom:link href="http://mambo.ucsc.edu/feed" rel="self" type="application/rss+xml" />
	<link>http://mambo.ucsc.edu</link>
	<description>Perceptual Science Lab</description>
	<lastBuildDate>Tue, 25 Oct 2011 17:58:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Goldschen</title>
		<link>http://mambo.ucsc.edu/goldschen.html</link>
		<comments>http://mambo.ucsc.edu/goldschen.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:44:51 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=306</guid>
		<description><![CDATA[Goldschen Goldschen, Alan J (1993) Continuous Automatic Speech Recognition by Lipreading, Ph.D. Dissertation, George Washington University, Washington, D.C., September 1993. This study describes the design and implementation of a novel continuous speech recognizer that uses optical information from the oral-cavity shadow [...]]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/goldschen.jpg" alt="" /> Goldschen</h1>
<ul>
<li> Goldschen, Alan J (1993) <em>Continuous Automatic Speech Recognition by Lipreading</em>, Ph.D. Dissertation, George Washington University, Washington, D.C., September 1993. This study describes the design and implementation of a novel continuous speech recognizer that uses optical information from the oral-cavity shadow of a speaker. The system uses hidden Markov models (HMMs) trained to discriminate optical information and achieves a recognition rate of 25.3 percent on 150 test sentences. This is the first system to accomplish continuous optical automatic speech recognition (OASR). This level of performance &#8211; without the use of syntactical, semantic, or any other contextual guide to the recognition process &#8211; indicates that OASR may be used as a major supplement for robust multi-modal recognition in noisy environments. Additionally, new features important for OASR were discovered, and novel approaches to vector quantization, training, and clustering were utilized.
<p>This study contains three major components. First, it hypothesizes 35 static and dynamic optical features to characterize the shadow of the speaker&#8217;s oral cavity. Using the corresponding correlation matrix and a principal component analysis, the study discarded 22 oral-cavity features. The remaining 13 oral-cavity features are mostly dynamic features, unlike the static features used by previous researchers. Second, the study merged phonemes that appear optically similar on the speaker&#8217;s oral-cavity region into visemes. The visemes were objectively analyzed and discriminated using HMM and clustering algorithms. Most significantly, the visemes for the speaker, obtained through computation, are consistent with the phoneme-to-viseme mapping discussed by most lipreading experts. This similarity, in a sense, verifies the selection of oral-cavity features. Third, the study trained the HMMs to recognize, without a grammar, a set of sentences having a perplexity of 150, using visemes, trisemes (triplets of visemes), and generalized trisemes (clustered trisemes). The system achieved recognition rates of 2 percent, 12.7 percent, and 25.3 percent using, respectively, viseme HMMs, triseme HMMs, and generalized triseme HMMs.</p>
<p>The study concludes that methodologies used in this investigation demonstrate the need for further research on continuous OASR and on the integration of optical information with other recognition methods.  While this study focuses on the feasibility, validity, and segregated contribution of exclusively continuous OASR, future highly robust recognition systems should combine optical and acoustic information with syntactic, semantic and pragmatic aids.</li>
<li> Garcia, Oscar, Goldschen, Alan J., &amp; <a href="../psl/petajan.html">Petajan, Eric D.</a> (1992) Feature extraction for optical automatic speech recognition or automatic lipreading. <em>Technical Report GWU-IIST-92-32</em>, Department of Electrical Engineering and Computer Science, George Washington University, Washington, D.C., November 1992. There is evidence that information from the oral cavity region of a speaker&#8217;s face can enhance the robustness of classical acoustic automatic speech recognition systems. We describe experimental data and research to determine the less correlated, but discriminating, features of the oral cavity region of a speaker for optical automatic speech recognition. We reduced our feature space from 35 to 13 features using a correlation matrix, principal component analysis, and heuristics. We include a description of the database and describe previous research that helped us to determine our initial features. This investigation demonstrates the importance of the dynamic aspects of the optical perception of certain speech facial articulation features for speech recognition by humans and machines. These results should be of significant value for the design of more robust speech recognizers that utilize both optical and acoustic information, and for the teaching of lipreading to the hearing impaired.
<p>Index Terms &#8212; feature extraction, feature analysis, lipreading, facial expression, speech recognition, optical speech recognition, multimodal speech recognition.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/goldschen.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Watanabe and Kohda</title>
		<link>http://mambo.ucsc.edu/watanabe-and-kohda.html</link>
		<comments>http://mambo.ucsc.edu/watanabe-and-kohda.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:43:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=303</guid>
		<description><![CDATA[Watanabe &#38; Kohda Watanabe, T. &#38; Kohda, M. (1990) Lip-reading of Japanese vowels using neural networks.]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/watko.jpg" alt="" /> Watanabe &amp; Kohda</h1>
<p>Watanabe, T. &amp; Kohda, M. (1990) Lip-reading of Japanese vowels using neural networks.</p>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/watanabe-and-kohda.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tamura</title>
		<link>http://mambo.ucsc.edu/tamura.html</link>
		<comments>http://mambo.ucsc.edu/tamura.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:41:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=299</guid>
		<description><![CDATA[Tamura Shinichi Tamura is currently working at Morooka Orthopedic Hospital at Seimeikai Medical Corporation. He attended Kyushu Electric Technology vocational college from 1989 &#8211; 1991, with an interest in focusing on AV equipment. Shinichi matriculated from the College of Electrical [...]]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/tamura2.jpg" alt="" /> <a href="../psl/tamuraa.html"> Tamura</a></h1>
<p>Shinichi Tamura is currently working at Morooka Orthopedic Hospital at Seimeikai Medical Corporation. He attended Kyushu Electric Technology vocational college from 1989 &#8211; 1991, with an interest in focusing on AV equipment. Shinichi matriculated from the College of Electrical Engineering. After graduating, Shinichi went to work for Victor Service &amp; Engineering at JVC KENWOOD Holdings, Inc. There his primary duties included IT/network systems design, field management of construction sites, AV equipment and facilities design, and JVC product repair. After a few months, Shinichi began working as the Secretary General and Systems Engineer for Morooka Orthopedic Hospital.<a href="tamura.html#bio">*</a></p>
<ul>
<li> Tamura, S., Kawai, H., and Mitsumoto, H. (1993) Sex identification from 8 x 8 low resolution face images by neural network. <em>Med. Imag. Tech.</em> vol.11, (no.3):295-6.</li>
<li> Tamura, S., Taketani, H., Mitsumoto, H., Okazaki, H., and others (1992) Detection and normalization of pattern position by neural network. in <em>RNNS/IEEE Symposium on Neuroinformatics and Neurocomputers</em> (Cat. NO.92TH0483-8). New York, NY, USA: IEEE, p. 959-77 vol.2 of 2 vol. xxi+1270 pp. The authors propose a method of detecting and normalizing pattern position by the backpropagation network (BPN) with three layers. First, they clarify the troublesome situation of restoring the shifted position by BPN experimentally, using a one-dimensional signal. Next, it is shown that by cascading two networks, one can detect and normalize the signal position and demonstrate its usefulness. This method is applied to a two-dimensional image, and its validity is confirmed.</li>
<li> Kim, E.-K., Wu, J.-T., Tamura, S., Sato, Y., Close, R., Taketani, H., Kawai, H., Inoue, M., and Ono, K. (1993) Comparison of neural network and k-NN classification methods in vowel and patellar subluxation image recognitions. <em>International Journal of Pattern Recognition and Artificial Intelligence</em>, vol.7, (no.4):775-82. We make a comparison of classification ability between BPN (BackPropagation Neural Network) and k-NN (k-Nearest Neighbor) classification methods. Voice data and patellar subluxation images are used. The result was that the average recognition rate of BPN was 9.2 percent higher than that of the k-NN classification method. Although k-NN classification is simple in theory, classification time was fairly long. Therefore, it seems that real time recognition is difficult. On the other hand, the BPN method has a long learning time but a very short recognition time. Especially if the number of dimensions of the samples is large, it can be said that BPN is better than k-NN in classification ability.</li>
<li> Taketani, H., Mitsumoto, H., Tamura, S., Okazaki, K., and others (1992) Detection and normalization of pattern location by neural network. <em>Transactions of the Institute of Electronics, Information and Communication Engineers D-II</em>, vol.J75D-II, (no.7):1260-70 (Japanese). The authors describe a method of detecting and normalizing pattern location by back propagation network (BPN) with three layers. First, the authors clarify the troublesome situation of compensating the shifted position by BPN experimentally, using one dimensional signals. Next, they propose a new method of solving this problem by cascading two networks where the signal location is detected and normalized, and they demonstrate its usefulness. They show the weight distribution has a characteristic of Fourier-series expansion. Finally they apply the method to a 2-dimensional image. They compare three methods. As a result, the 2-dimensional location signal method that expands pattern and location signals into 2-dimensional ones is best from the viewpoint of convergence in the learning phase. The method of cascading two one-dimensional networks which normalize pattern first in horizontal direction by simply binding one-dimensional networks and then in vertical direction by the same method is considered. Although the correct answer rate of position normalization is not as good as the others, it does not need learning in 2-dimensional space and its normalization processing is fastest.</li>
<li> Kawai, H. and Tamura, S. (1992) Man-and-woman and individual classifications by mosaic facial images with different resolutions using neural network. <em>Journal of the Institute of Television Engineers of Japan</em>, vol.46, (no.1):93-6 (Japanese). Intensive studies have been made on individual classifications by facial images and analysis of facial expressions, etc. using neural networks from the viewpoint of learning. The authors carried out experimental classifications by 8*8, 16*16 and 32*32 mosaic facial images. As a result, man-and-woman classifications can be made with probability of 87% from 8*8 unknown images which are very difficult to analyze.</li>
<li> Furuya, T., Soeda, M., Kurosu, K., and Tamura, S. (1991) Speech recognition with lip movement data using an X-Y tracker. <em>Transactions of the Society of Instrument and Control Engineers</em>, vol.27, (no.8):958-65 (Japanese). Automatic syllable recognition is adversely affected by noises and similar syllables. Lip-reading may improve recognition. The authors propose such a system, for a finite number of words, using a microphone and an X-Y tracker. The tracker performs initial picture processing and yields co-ordinate pairs. These are processed together with sounds. The weight coefficients to change the importance of visual or voice data are introduced and selected to get the best performance for some registered words. The experiments were carried out to prove that the proposed system can improve the recognition rate in the presence of continuous noises. The results show improvements of about 10% and about 20% for SNRs of 35 dB and 26 dB respectively.</li>
<li> Wu, J.-T., Tamura, S., Mitsumoto, H., Kawai, H., Kurosu, K., and Okazaki, K. (1991) Neural network vowel-recognition jointly using voice features and mouth shape image. <em>Pattern Recognition</em>, vol.24, (no.10):921-7. Describes a neural approach intended to improve the performance of an automatic speech recognition system for unrestricted speakers by using not only voice sound features but also image features of the mouth shape. In particular, the authors used the natural sample voice signals and mouth shape images that were acquired in the general environment, neither in the sound isolation room nor under specific lighting conditions. The FFT power spectrum of acoustic speech was used as the voice feature. In addition, the gray level image, binary image and geometrical shape features of the mouth were used as the compensatory information, and compared to find which kinds of image features were effective. This method can be applied not only to the improvement of voice recognition, but also to aid the communication of hearing-impaired people.</li>
<li> Wu, J.-T., Tamura, S., Mitsumoto, H., Kawai, H., and others (1991) Speaker-independent vowel recognition combining voice features and mouth shape image with neural network. <em>Systems and Computers in Japan</em>, vol.22, (no.4):100-9. The paper describes a neural approach intended to improve the performance of a voice recognition system for unrestricted speakers using not only voice sound features but also image features of the mouth shape. The FFT power spectrum of acoustic speech was used as the voice feature. In addition, the gray-level image, binary image, and geometrical shape features of the mouth were used as the compensatory information and a comparison made of which kinds of image features are effective for voice recognition by a neural network. For unrestricted speakers, a vowel recognition rate of about 80 percent was obtained using only voice features. However, this increased to some 92 percent when voice features plus binary images were used. This method can be applied not only to the improvement of voice recognition, but also to aid the communication of hearing-impaired people.</li>
<li> Wu, J.-T., Tamura, S., Mitsumoto, H., Kawai, H., Kurosu, K., and Okazaki, K. (1990) Neural network vowel-recognition jointly using voice features and mouth shape image. <em>Transactions of the Institute of Electronics, Information and Communication Engineers D-II</em>, vol.J73D-II, (no.8):1309-14 (Japanese). This paper describes a neural approach intended to improve the performance of a voice recognition device by using not only voice sound features but also image features of the mouth shape. The FFT power spectrum was used as the voice feature. In addition, the gray level image, binary image, and geometrical shape features of the mouth were tested for comparison to check which kinds of features are effective for voice recognition by a neural network. For unrestricted speakers, a vowel recognition rate of about 80% was obtained using voice only features, but this increased to some 92% when voice features plus binary images were used. This method can be applied not only to the improvement of voice recognition, but also to aid the communication of hearing-impaired people.</li>
<li> Mitsumoto, H., Okazaki, K., Okazaki, K., Kajimi, N., Tamura, S., Kawai, H., and Fukui, Y. (1990) Lip contour extraction, complement, and tracing by using energy function and optical flow. <em>Transactions of the Information Processing Society of Japan</em>, vol.31, (no.3):444-53 (Japanese). The article mentions: lip contour extraction by zero-crossing method, lip contour tracking.</li>
<li> Tamura, S., Kawasaki, S. (1988) Recognition of sign language motion images. <em>Pattern Recognition</em> vol.21, (no.4): 343-353.</li>
</ul>
<p><a name="bio">*</a> Biographical and publication information gathered by Stephanie.</p>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/tamura.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bregler</title>
		<link>http://mambo.ucsc.edu/bregler.html</link>
		<comments>http://mambo.ucsc.edu/bregler.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:39:34 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=296</guid>
		<description><![CDATA[Bregler Bregler, C., Hild, H., Manke, S., &#38; Waibel, A. (1993) Improving connected letter recognition by lipreading. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (IEEE-ICASSP), Minneapolis, MN. In this paper we show how recognition performance in automated [...]]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/bregler.jpg" alt="" /> <a href="http://www.cs.berkeley.edu/%7Ebregler"> Bregler </a></h1>
<ul>
<li> <a href="../psl/Bregler/icassp93.bregler.hild.manke.waibel.ps"> Bregler, C., Hild, H., Manke, S., &amp; Waibel, A. (1993) Improving connected letter recognition by lipreading. <em>Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (IEEE-ICASSP)</em>, Minneapolis, MN.</a> In this paper we show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called &#8220;speech-reading&#8221;. We show this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the Dynamic Time Warping algorithm. This is shown on a connected word recognition problem, the notoriously difficult letter spelling task. With speech-reading we could reduce the error rate to as little as half that of purely acoustic recognition.</li>
<li> <a href="../psl/Bregler/nips93.final.ps"> Bregler, C. &amp; Omohundro, S. (1994) Surface Learning with Applications to Lip-Reading. In J.D. Cowan, G. Tesauro, &amp; J. Alspector (Eds), <em>Advances in Neural Information Processing Systems 6.</em> San Francisco, CA: Morgan Kaufmann Publishers.</a> Most connectionist research has focused on learning mappings from one space to another (e.g., classification and regression). This paper introduces the more general task of learning constraint surfaces. It describes a simple but powerful architecture for learning and manipulating nonlinear surfaces from data. We demonstrate the technique on low dimensional synthetic surfaces and compare it to nearest neighbor approaches. We then show its utility in learning the space of lip images in a system for improving speech recognition by lip reading. This learned surface is used to improve the visual tracking performance during recognition.</li>
<li> <a href="../psl/Bregler/icassp94.final.ps"> Bregler, C. &amp; Konig, Y. (1993) &#8220;Eigenlips&#8221; for Robust Speech Recognition. In <em>Proceedings of the Int. Conf. on Acoustics Speech and Signal Processing (IEEE-ICASSP), 1994</em>, Adelaide, Australia.</a> In this study we improve the performance of a hybrid connectionist speech recognition system by incorporating visual information about the corresponding lip movements. Specifically, we investigate the benefits of adding visual features in the presence of additive noise and crosstalk (cocktail party effect). Our study extends previous experiments by using a new visual front end, and an alternative architecture for combining the visual and acoustic information. Furthermore, we have extended our recognizer to a multi-speaker, connected letters recognizer. Our results show a significant improvement for the combined architecture (acoustic and visual information) over just the acoustic system in the presence of additive noise and crosstalk.</li>
<li> <img src="../psl/icsi.jpg" alt="" align="top" /></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/bregler.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NTT</title>
		<link>http://mambo.ucsc.edu/ntt.html</link>
		<comments>http://mambo.ucsc.edu/ntt.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:37:51 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=293</guid>
		<description><![CDATA[NTT Akimoto, T., Suenaga, Y., &#38; Wallace, R.S. (1993) Automatic creation of 3D facial models. IEEE Computer Graphics &#38; Applications, 13, 5, 16-22. Mase, K. (1990). An application of optical flow &#8211; extraction of facial expression. IAPR Workshop on Machine [...]]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/ntt1.jpg" alt="" /> NTT</h1>
<ul>
<li> Akimoto, T., Suenaga, Y., &amp; Wallace, R.S. (1993) Automatic creation of 3D facial models. <em>IEEE Computer Graphics &amp; Applications, 13</em>, 5, 16-22.</li>
<li> Mase, K. (1990). An application of optical flow &#8211; extraction of facial expression. <em>IAPR Workshop on Machine Vision and Applications</em>, 195-198.</li>
<li> Mase, K. (1991). Recognition of facial expression from optical flow. <em>IEICE Transactions, E 74</em>, 10, 3474-3483.</li>
<li> Mase, K. &amp; <a href="../psl/pentland.html">Pentland, A.</a> (1990). Lip reading by optical flow, <em>IEICE of Japan, J73-D-II, 6</em>, 796-803.</li>
<li> Mase, K. &amp; <a href="../psl/pentland.html">Pentland, A.</a> (1990). Automatic lipreading by computer. <em>Trans. Inst. Elec. Info. and Comm. Eng., J73-D-II(6), 796-803</em>.</li>
<li> Mase, K. &amp; <a href="../psl/pentland.html">Pentland, A.</a> (1991), Automatic Lipreading by Optical &#8211; Flow Analysis, <em>Systems and Computers in Japan, 22</em>, N06.</li>
<li> Mase, K., Watanabe, Y., &amp; Suenaga, Y. (1990). A real time head motion detection system. <em>Proceedings SPIE, 1260</em>, 262-269.</li>
<li> <a href="../psl/pentland.html">Pentland, A.</a> &amp; Mase, K. (1989), Lipreading: Automatic visual recognition of spoken words. <em>Proc. Image Understanding and Machine Vision</em>, Optical Society of America, June 12-14.</li>
<li> <a href="http://www.ntt.co.jp/"> <img src="../psl/mosaic.jpg" alt="" /> NTT Home Page </a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/ntt.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stork</title>
		<link>http://mambo.ucsc.edu/stork.html</link>
		<comments>http://mambo.ucsc.edu/stork.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:35:50 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=290</guid>
		<description><![CDATA[Stork Dr. Daniel Stork obtained a B.S. degree in Physics from the Massachusetts Institute of Technology (MIT) and an M.S. in Physics from the University of Maryland, College Park. He graduated from the University of Maryland, College Park in 1984 with [...]]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/ricoh.jpg" alt="" /> <a href="http://www.cs.indiana.edu/finger/gateway?stork@crc.ricoh.com"> </a>Stork</h1>
<p>Dr. Daniel Stork obtained a B.S. degree in Physics from the Massachusetts Institute of Technology (MIT) and an M.S. in Physics from the University of Maryland, College Park. He graduated from the University of Maryland, College Park in 1984 with a Ph.D. in physics. His doctoral thesis, &#8220;Determination of symmetry and phase in human visual response functions: Theory and experiments,&#8221; was written under the supervision of Prof. David Falk. He holds 40 US patents on topics ranging from visual physics and speech recognition systems to speech extraction and color encoding systems, and holds the honor of Ricoh Junior Patent Master (2008). Dr. Stork has authored over 120 publications, and he has written 10 complete books, including Seeing the Light: Optics in nature, photography, color vision and holography (Wiley), a leading textbook on optics for the arts, and Computer image analysis in the study of art (SPIE) and Computer vision and image analysis of art (SPIE), which are both considered the first books in their discipline. He has been selected as Fellow of the International Association for Pattern Recognition (2008), and is currently a Senior Member of both the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM). He has served on several international journal editorial boards and delivered nearly 300 seminars, lectures, and conference presentations. He was named Distinguished Lecturer by the Association for Computing Machinery (1998-1999). Dr. Stork has held academic appointments or taught at Stanford University in more than five departments. In addition, he has held faculty positions at Wellesley College, Swarthmore College, Clark University, and Boston University in various science and mathematics departments. Dr. Stork has also devoted time to the arts, studying art history at Wellesley College and subsequently becoming Artist-in-Residence through the New York State Council on the Arts. In addition to his faculty positions, Dr. Stork has held corporate positions with Neural Ware (Chief Scientist), Neural Applications Corporation (Scientific Advisory Board), and Ricoh Innovations (Senior Research Scientist). He is currently employed by Ricoh Innovations as Chief Scientist.*</p>
<p>Selected Publications by Stork</p>
<ul>
<li> <a href="../psl/tr93-26.ps"> Prasad, K.V., Stork, D.G., &amp; Wolff, G. (1993) Preprocessing video images for neural learning of lipreading. <em>Ricoh California Research Center, Technical Report CRC-TR-93-26</em>.</a></li>
<li> Stork, D.G., Wolff, G., &amp; Levine, E. (1992) Neural network lipreading system for improved speech recognition. <em>Ricoh California Research Center, Technical Report CRC-TR-92-01</em>.</li>
<li> Stork, D. G., Wolff, G., &amp; Levine, E. (1992) Neural network lipreading system for improved speech recognition. <em>Proceedings of the 1992 International Joint Conference on Neural Networks</em>, Baltimore, MD.</li>
</ul>
<p>* Biographical information contributed by Dr. Matthew J. Memmott using reference material from the Ricoh Innovations website (rii.ricoh.com/~stork/).</p>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/stork.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Petajan</title>
		<link>http://mambo.ucsc.edu/petajan.html</link>
		<comments>http://mambo.ucsc.edu/petajan.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:34:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=287</guid>
		<description><![CDATA[Petajan Brooke, N.M. &#38; Petajan E.D. (1986), Seeing Speech : Investigations into the Synthesis and Recognition of Visible Speech Movements Using Automatic Image Processing and Computer Graphics, Proceedings of the International Conference on Speech Input and Output : Techniques and [...]]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/lucent.jpg" alt="" /> <a href="http://www.lucent.com/work/family/docs/petajan.html"> Petajan</a></h1>
<ul>
<li> <a href="../psl/brooke.html">Brooke, N.M.</a> &amp; Petajan, E.D. (1986), Seeing Speech: Investigations into the Synthesis and Recognition of Visible Speech Movements Using Automatic Image Processing and Computer Graphics, <em>Proceedings of the International Conference on Speech Input and Output: Techniques and Applications</em>, 24-26.</li>
<li> Garcia, O., <a href="../psl/goldschen.html">Goldschen, A.J.</a>, &amp; Petajan, E.D. (1992) Feature extraction for optical automatic speech recognition or automatic lipreading. <em>George Washington University: IIST-92-32</em>, November.</li>
<li> Petajan, E.D. (1984) <em>Automatic lipreading to enhance speech recognition</em>, Ph.D. Dissertation, University of Illinois at Urbana-Champaign.</li>
<li> Petajan, E.D. (1984) Automatic lipreading to enhance speech recognition, <em>Proceedings of the IEEE Communication Society Global Telecommunications Conference, November 26-29</em>, Atlanta, Georgia.</li>
<li> Petajan, E.D. (1985) Automatic lipreading to enhance speech recognition. <em>IEEE Computer society conference on computer vision and pattern recognition. June 19-23</em>. 40-47.</li>
<li> Petajan, E.D., Bischoff, B. &amp; Bodoff, D. (1988), An improved automatic lipreading system to enhance speech recognition, <em>ACM SIGCHI-88</em>, 19-25.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/petajan.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finn</title>
		<link>http://mambo.ucsc.edu/finn.html</link>
		<comments>http://mambo.ucsc.edu/finn.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:32:54 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=284</guid>
		<description><![CDATA[Finn Dr. Kathleen E. Finn is an expert in video media and optically based speech recognition theory. She graduated with a Ph.D. from Georgetown University in 1986, with a thesis titled An Investigation of Visible Lip Information to be Used [...]]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/finn.gif" alt="" /> Finn</h1>
<p>Dr. Kathleen E. Finn is an expert in video media and optically based speech recognition theory. She graduated with a Ph.D. from Georgetown University in 1986, with a thesis titled An Investigation of Visible Lip Information to be Used in Automated Speech Recognition. She is also the primary editor of the highly esteemed book “Video Mediated Communication (Computers, Cognition, and Work)”, CRC Press; 1st edition (April 1, 1997), ISBN: 0805822887. This is a well-recognized textbook used in teaching about the potential functions, advantages, and challenges of video-mediated communication. The publisher’s note, included below, summarizes the focus of this book:</p>
<p>“Decades after their introduction, video communication systems are beginning to realize their potential in supporting working from home, teaching and learning at a distance, conferencing, and interpersonal communication. In the face of an upsurge in interest, important questions are being asked: What function does video really serve, and what advantages over the telephone does it provide? How and why is video-mediated interaction different from face-to-face interaction? How can we best configure video technology to support different kinds of work at a distance? What is the role of video technology in the future?</p>
<p>People from a variety of disciplines have now produced a substantial body of research addressing these issues from a wide range of analytic perspectives. Their results and conclusions are scattered through journals, conference proceedings, and corporate technical papers. Drawing together the ideas and findings of the major researchers in the field, this volume offers the first comprehensive overview of what is currently known about video-mediated communication.</p>
<p>Written by psychologists, sociologists, anthropologists, engineers, and computer scientists, this book is an essential resource for all those who design and study systems for teaching, learning, and working. It is divided into four sections as follows:</p>
<p>* Foundations surveys the literature, constructs a foundational framework, introduces common vocabulary, and helps explain technical aspects and terms.<br />
* Findings presents empirical work of types ranging from psychological laboratory-based studies to ethnographic field studies.<br />
* Design explores various aspects of the design and evaluation of new kinds of video systems.<br />
* The Future comments on new and innovative applications of video technology and points out the ways in which its use may be tied to broader technological trends.” *</p>
<ul>
<li> Finn, K. E. (1986) <em>An Investigation of Visible Lip Information to be Used in Automated Speech Recognition</em> Ph.D. thesis, Georgetown University.</li>
<li> Finn, K.E. &amp; <a href="../psl/montgomery.html">Montgomery A.A.</a> (1988) Automatic optically based recognition of speech, <em>Pattern Recognition Letters, 8</em>, 3, 159 &#8211; 164.</li>
</ul>
<p> * Contributed by Dr. Matthew J Memmott, including the publisher’s note from “Video Mediated Communication (Computers, Cognition, and Work)”. Dr. Memmott received his PhD from MIT and often writes articles for educational websites such as <a href="http://findonlinecolleges.net">Online colleges</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/finn.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vroomen</title>
		<link>http://mambo.ucsc.edu/vroomen.html</link>
		<comments>http://mambo.ucsc.edu/vroomen.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:31:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=281</guid>
		<description><![CDATA[Vroomen Vroomen, J.M.H. (1992) Hearing Voices and Seeing Lips Ph.D. Dissertation, Katholieke Universiteit Brabant.]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/vroomen.jpg" alt="" /> Vroomen</h1>
<p>Vroomen, J.M.H. (1992) <em>Hearing Voices and Seeing Lips</em> Ph.D. Dissertation, Katholieke Universiteit Brabant.</p>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/vroomen.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Braida</title>
		<link>http://mambo.ucsc.edu/braida.html</link>
		<comments>http://mambo.ucsc.edu/braida.html#comments</comments>
		<pubDate>Wed, 18 May 2011 02:29:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Speechreading (Lipreading)]]></category>

		<guid isPermaLink="false">http://mambo.ucsc.edu/?p=278</guid>
		<description><![CDATA[Braida Braida L.D. (1991) Crossmodal integration in the identification of consonant segments. Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology, 43(3), 647-77. Although speechreading can be facilitated by auditory or tactile supplements, the process that integrates cues across modalities is [...]]]></description>
			<content:encoded><![CDATA[<h1><img src="../psl/braida.jpg" alt="" /> <a href="http://www.cs.indiana.edu/finger/gateway?braida@mit.edu"> Braida </a></h1>
<ul>
<li> Braida L.D. (1991) Crossmodal integration in the identification of consonant segments. <em>Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology, 43(3),</em> 647-77. Although speechreading can be facilitated by auditory or tactile supplements, the process that integrates cues across modalities is not well understood. This paper describes two &#8220;optimal processing&#8221; models for the types of integration that can be used in speechreading consonant segments and compares their predictions with those of the Fuzzy Logical Model of Perception (FLMP, <a href="../psl/dwm.html">Massaro</a>, 1987). In &#8220;pre-labelling&#8221; integration, continuous sensory data is combined across modalities before response labels are assigned. In &#8220;post-labelling&#8221; integration, the responses that would be made under unimodal conditions are combined, and a joint response is derived from the pair. To describe pre-labelling integration, confusion matrices are characterized by a multidimensional decision model that allows performance to be described by a subject&#8217;s sensitivity and bias in using continuous-valued cues. The cue space is characterized by the locations of stimulus and response centres. The distance between a pair of stimulus centres determines how well two stimuli can be distinguished in a given experiment. In the multimodal case, the cue space is assumed to be the product space of the cue spaces corresponding to the stimulation modes. Measurements of multimodal accuracy in five modern studies of consonant identification are more consistent with the predictions of the pre-labelling integration model than the FLMP or the post-labelling model.</li>
<li> Durlach N.I., Tan H.Z., Macmillan N.A., Rabinowitz W.M., Braida L.D. (1989) Resolution in one dimension with random variations in background dimensions. <em>Perception and Psychophysics, 46(3),</em> 293-6.</li>
<li> Grant K.W. &amp; Braida L.D. (1991) Evaluating the articulation index for auditory-visual input [published erratum appears in <em>J Acoust Soc Am 1991, 90(4 Pt 1),</em> 2202]. <em>Journal of the Acoustical Society of America, 1991 Jun, 89(6),</em> 2952-60. An investigation of the auditory-visual (AV) articulation index (AI) correction procedure outlined in the ANSI standard [ANSI S3.5-1969 (R1986)] was made by evaluating auditory (A), visual (V), and auditory-visual sentence identification for both wideband speech degraded by additive noise and a variety of bandpass-filtered speech conditions presented in quiet and in noise. When the data for each of the different listening conditions were averaged across talkers and subjects, the procedure outlined in the standard was fairly well supported, although deviations from the predicted AV score were noted for individual subjects as well as individual talkers. For filtered speech signals with AI<sub>A</sub> less than 0.25, there was a tendency for the standard to underpredict AV scores. Conversely, for signals with AI<sub>A</sub> greater than 0.25, the standard consistently overpredicted AV scores. Additionally, synergistic effects, where the AI<sub>A</sub> obtained from the combination of different bandpass-filtered conditions was greater than the sum of the individual AI<sub>A</sub>&#8217;s, were observed for all nonadjacent filter-band combinations (e.g., the addition of a low-pass band with a 630-Hz cutoff and a high-pass band with a 3150-Hz cutoff). These latter deviations from the standard violate the basic assumption of additivity stated by Articulation Theory, but are consistent with earlier reports by Pollack [I. Pollack, J. Acoust. Soc. Am. 20, 259-266 (1948)], Licklider [J. C. R. Licklider, Psychology: A Study of a Science, Vol. 1, edited by S. Koch (McGraw-Hill, New York, 1959), pp. 41-144], and Kryter [K. D. Kryter, J. Acoust. Soc. Am. 32, 547-556 (1960)].</li>
<li> Grant K.W., Braida L.D., &amp; Renn R.J. (1991) Single band amplitude envelope cues as an aid to speechreading. <em>Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology, 43(3),</em> 621-45. Amplitude envelopes derived from speech have been shown to facilitate speechreading to varying degrees, depending on how the envelope signals were extracted and presented and on the amount of training given to the subjects. In this study, three parameters related to envelope extraction and presentation were examined using both easy and difficult sentence materials: (1) the bandwidth and centre frequency of the filtered speech signal used to obtain the envelope; (2) the bandwidth of the envelope signal determined by the lowpass filter cutoff frequency used to &#8220;smooth&#8221; the envelope fluctuations; and (3) the carrier signal used to convey the envelope cues. Results for normal hearing subjects following a brief visual and auditory-visual familiarization/training period showed that (1) the envelope derived from wideband speech does not provide the greatest benefit to speechreading when compared to envelopes derived from selected octave bands of speech; (2) as the bandwidth centred around the carrier frequency increased from 12.5 to 1600 Hz, auditory-visual (AV) performance obtained with difficult sentence materials improved, especially for envelopes derived from high-frequency speech energy; (3) envelope bandwidths below 25 Hz resulted in AV scores that were sometimes equal to or worse than speechreading alone; (4) for each filtering condition tested, there was at least one bandwidth and carrier condition that produced AV scores that were significantly greater than speechreading alone; (5) low-frequency carriers were better than high-frequency or wideband carriers for envelopes derived from an octave band of speech centred at 500 Hz; and (6) low-frequency carriers were worse than high-frequency or wideband carriers for envelopes derived from
an octave band centred at 3150 Hz. These results suggest that amplitude envelope cues can provide a substantial benefit to speechreading for both easy and difficult sentence materials, but that frequency transposition of these signals to regions remote from their &#8220;natural&#8221; spectral locations may result in reduced performance.</li>
<li> Picheny M.A., Durlach N.I., Braida L.D. (1989) Speaking clearly for the hard of hearing. III: An attempt to determine the contribution of speaking rate to differences in intelligibility between clear and conversational speech. <em>Journal of Speech and Hearing Research, 1989 Sep, 32(3),</em> 600-3. Previous studies (Picheny, Durlach, &amp; Braida, 1985, 1986) have demonstrated that substantial intelligibility differences exist for hearing-impaired listeners for speech spoken clearly compared to speech spoken conversationally. This paper presents the results of a probe experiment intended to determine the contribution of speaking rate to the intelligibility differences. Clear sentences were processed to have the durational properties of conversational speech, and conversational sentences were processed to have the durational properties of clear speech. Intelligibility testing with hearing-impaired listeners revealed both sets of materials to be degraded after processing. However, the degradation could not be attributed to processing artifacts because reprocessing the materials to restore their original durations produced intelligibility scores close to those observed for the unprocessed materials. We conclude that the simple processing to alter the relative durations of the speech materials was not adequate to assess the contribution of speaking rate to the intelligibility differences; further studies are proposed to address this question.</li>
<li> Reed C.M., Durlach N.I., Braida L.D., &amp; Schultz M.C. (1989) Analytic study of the Tadoma method: effects of hand position on segmental speech perception. <em>Journal of Speech and Hearing Research, 32(4),</em> 921-9. In the Tadoma method of communication, deaf-blind individuals receive speech by placing a hand on the face and neck of the talker and monitoring actions associated with speech production. Previous research has documented the speech perception, speech production, and linguistic abilities of highly experienced users of the Tadoma method. The current study was performed to gain further insight into the cues involved in the perception of speech segments through Tadoma. Small-set segmental identification experiments were conducted in which the subjects&#8217; access to various types of articulatory information was systematically varied by imposing limitations on the contact of the hand with the face. Results obtained on 3 deaf-blind, highly experienced users of Tadoma were examined in terms of percent-correct scores, information transfer, and reception of speech features for each of sixteen experimental conditions. The results were generally consistent with expectations based on the speech cues assumed to be available in the various hand positions.</li>
<li> Reed C.M., Power M.H., Durlach N.I., Braida L.D., Foss K.K., Reid J.A., &amp; Dubois S.R. (1991) Development and testing of artificial low-frequency speech codes. <em>Journal of Rehabilitation Research and Development, 28(3),</em> 67-82. In a new approach to the frequency-lowering of speech, artificial codes were developed for 24 consonants (C) and 15 vowels (V) for two values of lowpass cutoff frequency F (300 and 500 Hz). Each individual phoneme was coded by a unique, nonvarying acoustic signal confined to frequencies less than or equal to F. Stimuli were created through variations in spectral content, amplitude, and duration of tonal complexes or bandpass noise. For example, plosive and fricative sounds were constructed by specifying the duration and relative amplitude of bandpass noise with various center frequencies and bandwidths, while vowels were generated through variations in the spectral shape and duration of a ten-tone harmonic complex. The ability of normal-hearing listeners to identify coded Cs and Vs in fixed-context syllables was compared to their performance on single-token sets of natural speech utterances lowpass filtered to equivalent values of F. For a set of 24 consonants in C-/a/ context, asymptotic performance on coded sounds averaged 90 percent correct for F = 500 Hz and 65 percent for F = 300 Hz, compared to 75 percent and 40 percent for lowpass filtered speech. For a set of 15 vowels in /b/-V-/t/ context, asymptotic performance on coded sounds averaged 85 percent correct for F = 500 Hz and 65 percent for F = 300 Hz, compared to 85 percent and 50 percent for lowpass filtered speech. Identification of coded signals for F = 500 Hz was also examined in CV syllables where C was selected at random from the set of 24 Cs and V was selected at random from the set of 15 Vs. Asymptotic performance of roughly 67 percent correct and 71 percent correct was obtained for C and V identification, respectively.
These scores are somewhat lower than those obtained in the fixed-context experiments. Finally, results were obtained concerning the effect of token variability on the identification of lowpass filtered speech. These results indicate a systematic decrease in percent-correct score as the number of tokens representing each phoneme in the identification tests increased from one to nine.</li>
<li> Reed C.M., Rabinowitz W.M., Durlach N.I., Delhorne L.A., Braida L.D., Pemberton J.C., Mulcahey B.D., &amp; Washington D.L. (1992) Analytic study of the Tadoma method: improving performance through the use of supplementary tactual displays. <em>Journal of Speech and Hearing Research, 35(2)</em>, 450-65. Although results obtained with the Tadoma method of speechreading have set a new standard for tactual speech communication, they are nevertheless inferior to those obtained in the normal auditory domain. Speech reception through Tadoma is comparable to that of normal-hearing subjects listening to speech under adverse conditions corresponding to a speech-to-noise ratio of roughly 0 dB. The goal of the current study was to demonstrate improvements to speech reception through Tadoma through the use of supplementary tactual information, thus leading to a new standard of performance in the tactual domain. Three supplementary tactual displays were investigated: (a) an articulatory-based display of tongue contact with the hard palate; (b) a multichannel display of the short-term speech spectrum; and (c) tactual reception of Cued Speech. The ability of laboratory-trained subjects to discriminate pairs of speech segments that are highly confused through Tadoma was studied for each of these augmental displays. Generally, discrimination tests were conducted for Tadoma alone, the supplementary display alone, and Tadoma combined with the supplementary tactual display. The results indicated that the tongue-palate contact display was an effective supplement to Tadoma for improving discrimination of consonants, but that neither the tongue-palate contact display nor the short-term spectral display was highly effective in improving vowel discriminability. For both vowel and consonant stimulus pairs, discriminability was nearly perfect for the tactual reception of the manual cues associated with Cued Speech.
Further experiments on the identification of speech segments were conducted for Tadoma combined with Cued Speech. The observed data for both discrimination and identification experiments are compared with the predictions of models of integration of information from separate sources.</li>
</ul>
<p>Bibliography information provided by Andy, who uses charity bingo to educate students through <a href="http://www.topbingoonline.co.uk">bingo sites</a> about the math of the bingo game matrix and the psychology of such games.</p>
]]></content:encoded>
			<wfw:commentRss>http://mambo.ucsc.edu/braida.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

