Goldschen

Goldschen, Alan J. (1993) Continuous Automatic Speech Recognition by Lipreading, Ph.D. Dissertation, George Washington University, Washington, D.C., September 1993. This study describes the design and implementation of a novel continuous speech recognizer that uses optical information from the oral-cavity shadow [...]

Watanabe and Kohda

Watanabe, T. & Kohda, M. (1990) Lip-reading of Japanese vowels using neural networks.

Tamura

Shinichi Tamura currently works at Morooka Orthopedic Hospital of Seimeikai Medical Corporation. He attended Kyushu Electric Technology vocational college from 1989 to 1991, focusing on AV equipment. Shinichi graduated from the College of Electrical [...]

Bregler

Bregler, C., Hild, H., Manke, S., & Waibel, A. (1993) Improving connected letter recognition by lipreading. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (IEEE-ICASSP), Minneapolis, MN. In this paper we show how recognition performance in automated [...]

NTT

Akimoto, T., Suenaga, Y., & Wallace, R.S. (1993) Automatic creation of 3D facial models. IEEE Computer Graphics & Applications, 13, 5, 16-22. Mase, K. (1990). An application of optical flow – extraction of facial expression. IAPR Workshop on Machine [...]

Stork

Dr. David Stork obtained a B.S. degree in Physics from Massachusetts Institute of Technology (MIT) and an M.S. in Physics from the University of Maryland, College Park. He graduated from the University of Maryland, College Park in 1984 with [...]

Petajan

Brooke, N.M. & Petajan, E.D. (1986) Seeing Speech: Investigations into the Synthesis and Recognition of Visible Speech Movements Using Automatic Image Processing and Computer Graphics, Proceedings of the International Conference on Speech Input and Output: Techniques and [...]

Finn

Dr. Kathleen E. Finn is an expert in video media and optically based speech recognition theory. She received her Ph.D. from Georgetown University in 1986 with a thesis titled An Investigation of Visible Lip Information to be Used [...]

Vroomen

Vroomen, J.M.H. (1992) Hearing Voices and Seeing Lips. Ph.D. Dissertation, Katholieke Universiteit Brabant.

Braida

Braida, L.D. (1991) Crossmodal integration in the identification of consonant segments. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 43(3), 647-77. Although speechreading can be facilitated by auditory or tactile supplements, the process that integrates cues across modalities is [...]

R Campbell

Campbell, R. (1986) The lateralisation of lipread sounds: A first look. Brain and Cognition, 5, 1-21. Campbell, R. (1987) The cerebral lateralization of lip-reading. In B. Dodd and R. Campbell (Eds.) Hearing by eye: The psychology of lip-reading. [...]

HW Campbell

Campbell, H. W. (1970) Hierarchical ordering of phonetic features as a function of input modality. In G.B. Flores d’Arcais and W. J. M. Levelt (Eds.) Advances in Psycholinguistics, Amsterdam: North-Holland. Campbell, H. W. (1974) Phoneme Recognition by Ear [...]

LE Bernstein

Auer, E. T., Jr., & Bernstein, L. E. (1995). Lexical distinctiveness in lipreading: Effects of phoneme equivalence classes on the structure of the lexicon. Submitted for presentation at the Spring Meeting, Acoustical Society of America. Auer, E. T., [...]

Brooke

Brooke, N. M. (1989) Visible speech signals: Investigating their analysis, synthesis, and perception. In M. M. Taylor, F. Neel, & D. G. Bouwhuis (Eds.), The Structure of Multimodal Dialogue. Holland: Elsevier Science Publishers. Brooke, N. M. (1992) Computer graphics [...]

Summerfield

Brooke, N. M. & Summerfield, A. Q. (1983) Analysis, synthesis, and perception of visible articulatory movements. Journal of Phonetics, 11, 63-76. MacLeod, A. & Summerfield, A. Q. (1990) A procedure for measuring auditory and audio-visual speech-reception thresholds for sentences in [...]

McGurk and MacDonald

MacDonald, J. & McGurk, H. (1978) Visual influences on speech perception processes. Perception and Psychophysics, 24, 253-257. McGurk, H. (1981). Listening with eye and ear (paper discussion). In T. Myers, J. Laver, & J. Anderson (Eds.) The [...]

Walden

Walden, B.E., Busacco, D.A., & Montgomery, A.A. (1993) Benefit from visual cues in auditory-visual speech recognition by middle-aged and elderly persons. Journal of Speech and Hearing Research, 36(2), 431-6. The benefit derived from visual cues in auditory-visual speech recognition and [...]

Montgomery

Finn, K.E. & Montgomery, A.A. (1988) Automatic optically based recognition of speech, Pattern Recognition Letters, 8, 3, 159-164. Montgomery, A. A. (1980) Development of a model for generating synthetic animated lip shapes. Journal of the Acoustical Society [...]

Erber

Erber, N. P. (1969) Interaction of audition and vision in the recognition of oral speech stimuli. Journal of Speech and Hearing Research, 12, 423-425. Erber, N. P. (1972) Auditory, visual and auditory-visual recognition of consonants by children with normal [...]

John Bulwer

Bulwer, J. (1648) Philocophus, or the Deaf and Dumbe Mans Friend, London: Humphrey Moseley. “Exhibiting the philosophical verity of that subtle art, which may enable one with an observant eye, to hear what any man speaks by [...]

Welcome to the Perceptual Science Laboratory

You can learn about our research and technology by surfing the links on this page.

A few highlights include the following. Gregg Oden and I collaborated to formulate a fuzzy logical model of perception, which has served as a framework for our research to this day. The success of this approach is perhaps best summed up by the title of a recent article by Movellan and McClelland, "The Morton-Massaro Law of Information Integration: Implications for Models of Perception" (Movellan, J., & McClelland, J. L. (2001). Psychological Review, 108, 113-148).

To create synthetic visible speech, Michael Cohen and I combined computer animation with psychological testing. We have the most accurate synthetic talking head in the world, and this technology has been central to a broad range of studies of speech perception and emotion perception, published in Perceiving Talking Faces: From Speech Perception to a Behavioral Principle (Cambridge, Massachusetts: MIT Press, 1998).

It soon became apparent that our embodied conversational agent, Baldi, had practical value well beyond presenting speech in experimental inquiry. We have shown that our principles of speech perception, together with the technology and pedagogy of Baldi, are effective in teaching vocabulary and grammar to children with language challenges due to hearing loss or autism (Symbiotic Value of an Embodied Agent in Language Learning. In Sprague, R.H., Jr. (Ed.), IEEE Proceedings of the 37th Annual Hawaii International Conference on System Sciences (CD-ROM), Computer Society Press, 10 pages. Best paper in Emerging Technologies).

Another project aims to help persons with hearing loss by adding a channel of speech information on eyeglasses. This unobtrusive device will perform continuous, real-time acoustic analysis of the speech of the wearer's conversational partner and transform several continuous acoustic features of that speech into continuous visual cues displayed on the eyeglasses (http://www.speechspecs.org/).

Finally, Baldi is now on the iPhone in a variety of applications (http://itunes.apple.com/us/app/ibaldi/id365360515?mt=8/).