
Research Overview of The Perceptual Science Lab

Embodied Conversational Agents And Speech Science:

Speech and language science and technology evolved under the assumption that speech was a solely auditory event. However, a burgeoning body of research findings reveals that our perception and understanding of speech are influenced not only by its actual sound but also by the speaker's face and accompanying gestures. Perceivers expertly use these multiple sources of information to identify and interpret the language input. An overview of multimodal language processing is given in a Handbook chapter.


Given the value of face-to-face interaction, our persistent goal has been to develop, evaluate, and apply animated agents that produce realistic and accurate speech. Baldi is an accurate three-dimensional animated talking head whose visible speech is aligned with either synthesized or natural speech. Baldi has a realistic tongue and palate, which can be displayed by making his skin transparent. A detailed description, evaluation, and potential applications of Baldi can be found in our book, Perceiving Talking Faces.

Client/Server Architecture:

To implement multilingual agents, we have developed a client/server architecture. The client is the application controlling Baldi. It sends text in any of a variety of languages, including Arabic, Mandarin, many European languages, and English, to a general speech synthesis server. The server converts the text into the phonemes of the appropriate language and sends back to the client both the acoustic speech waveform and all the information the client needs (phonemes, durations, pitches, word boundaries, etc.). Using this information, the client generates the appropriate language-specific visible phonemes, synchronized with the synthesized speech. This system is described in our HICSS papers, and demonstrations are available online.
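To make the data flow concrete, here is a minimal Python sketch of the client side. It assumes a reply format of our own invention: the names PhonemeInfo, SynthesisReply, VISEMES, and viseme_schedule, their fields, and the tiny phoneme-to-viseme table are all hypothetical and do not describe the lab's actual protocol. The sketch only shows how per-phoneme durations returned by the server let the client place each visible phoneme on the same timeline as the audio.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PhonemeInfo:
    """Hypothetical per-phoneme record returned by the synthesis server."""
    phoneme: str        # phoneme symbol, e.g. "b"
    duration_ms: float  # duration of the phoneme in the synthesized audio
    pitch_hz: float     # pitch target for the phoneme
    word_start: bool    # True if this phoneme begins a new word


@dataclass
class SynthesisReply:
    """Hypothetical server reply: phoneme stream plus the audio waveform."""
    language: str
    phonemes: List[PhonemeInfo]
    waveform: bytes


# Hypothetical, highly simplified phoneme-to-viseme table.
# A real table would be language-specific and far larger.
VISEMES: Dict[str, str] = {
    "b": "lips_closed", "p": "lips_closed", "m": "lips_closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "a": "open", "i": "spread", "u": "rounded",
}


def viseme_schedule(reply: SynthesisReply) -> List[Tuple[float, str]]:
    """Return (onset_ms, viseme) pairs aligned to the audio timeline.

    Each onset is the sum of the durations of all preceding phonemes,
    so the face and the waveform share one clock.
    """
    schedule: List[Tuple[float, str]] = []
    onset = 0.0
    for info in reply.phonemes:
        # Unknown phonemes fall back to a neutral mouth shape.
        schedule.append((onset, VISEMES.get(info.phoneme, "neutral")))
        onset += info.duration_ms
    return schedule


if __name__ == "__main__":
    reply = SynthesisReply(
        language="en",
        phonemes=[PhonemeInfo("b", 80.0, 120.0, True),
                  PhonemeInfo("a", 150.0, 118.0, False)],
        waveform=b"",
    )
    print(viseme_schedule(reply))  # [(0.0, 'lips_closed'), (80.0, 'open')]
```

Deriving onsets from the server's durations, rather than timing the animation independently, is what keeps the visible and audible speech in step. A real client would also blend neighboring mouth targets to capture coarticulation, which this sketch ignores.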

The Fuzzy Logical Model of Perception (FLMP):

According to integration models, multiple sensory influences are combined before the perceptual experience. The Fuzzy Logical Model of Perception (FLMP) is a formalization of an integration model. Consider the case in which the perceiver is watching the face and listening to the speaker. Both the visible and the audible speech signals are processed, and each source is evaluated independently of the other. The evaluation process determines how much that source supports each of the various alternatives. The integration process combines these sources and outputs how much their combination supports each alternative. The perceptual outcome is then a function of the relative degree of support among the competing alternatives.

Recent tests of the FLMP include cross-linguistic experiments with Mandarin speakers; studies of hearing-impaired adults and children (Massaro, D.W. & Cohen, M.M. (1999). Speech perception in hearing-impaired perceivers: Synergy of multiple modalities. Journal of Speech, Language, and Hearing Research, 42, 21-41); studies of prosody; and studies of face recognition and of emotion (Massaro, D.W. (2000). Multimodal emotion perception: Analogous to speech processes. In R. Cowie, E. Douglas-Cowie & M. Schroder (Eds.), Proceedings of the ISCA Workshop on Speech and Emotion, Newcastle, Northern Ireland, September 5-7, 114-121).
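The evaluation, integration, and decision stages can be illustrated with a short Python sketch. It assumes the multiplicative integration and relative goodness decision rule commonly used to formalize the FLMP: each source assigns every alternative a fuzzy truth value between 0 and 1, the values for each alternative are multiplied across sources, and the predicted probability of each response is its product divided by the sum of the products over all alternatives. The function name flmp and the example truth values are illustrative only; in actual model tests these values are free parameters estimated from the data.

```python
from typing import Dict, List


def flmp(sources: List[Dict[str, float]]) -> Dict[str, float]:
    """Predict response probabilities under the FLMP.

    sources: one dict per information source (e.g. auditory, visual).
    Each maps every response alternative to a fuzzy truth value in
    [0, 1], i.e. the output of the evaluation stage. All sources must
    share the same set of alternatives.
    """
    alternatives = list(sources[0])
    # Integration: multiply the support each source gives an alternative.
    support: Dict[str, float] = {}
    for alt in alternatives:
        product = 1.0
        for source in sources:
            product *= source[alt]
        support[alt] = product
    # Decision: relative goodness rule, i.e. each alternative's support
    # divided by the total support for all alternatives.
    total = sum(support.values())
    if total == 0:
        raise ValueError("no alternative receives any support")
    return {alt: s / total for alt, s in support.items()}


if __name__ == "__main__":
    # Hypothetical evaluation outputs for a /ba/ vs. /da/ judgment.
    auditory = {"ba": 0.3, "da": 0.7}
    visual = {"ba": 0.2, "da": 0.8}
    print(flmp([auditory, visual]))
```

With auditory support of 0.7 and visual support of 0.8 for /da/, the sketch predicts a /da/ probability of about 0.90, higher than either source alone, which illustrates the gain the model predicts when the two sources agree. A source that supports both alternatives equally (0.5 and 0.5) leaves the other source's relative support unchanged.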

Computer-Assisted Speech And Language Tutors:

Based on this research and technology, we have implemented computer-assisted speech and language tutors for children with language challenges and for persons learning a second language. Our language-training program uses Baldi as the conversational agent, who guides students through a variety of exercises designed to teach vocabulary and grammar, to improve speech articulation, and to develop linguistic and phonological awareness. This technology and pedagogy have proven successful with hard-of-hearing children and with autistic children. Advantages of the Baldi pedagogy and technology include the popularity and effectiveness of computers and embodied conversational agents, the perpetual availability of the program, and individualized instruction. The science and technology of Baldi hold great promise for language learning, dialog, human-machine interaction, education, and edutainment.