From Wednesday May 6, 1998

Did you see what I said?
People hear best when they both listen and look. Max Glaskin reports on a study of visual speech.

When George Bush uttered his most memorable instruction, "Read my lips", he instinctively knew that people hear best when they both listen and look. But which of the two senses dominates our understanding of speech?
Psychologists are using sophisticated computer graphics and dubbing techniques to find the answer. Their work could result in animated "talking books" for the hard of hearing and a new form of electronic video mail.
Little is known about how people comprehend speech. Auditory cues clearly play a role, with variables such as pitch, duration, and loudness contributing to the overall understanding of verbal messages.
But a team of researchers at the University of California, Santa Cruz, is exploring another aspect - visual speech. Led by a psychology professor, Dominic Massaro, the team has investigated the critical role played by what we see, as well as what we hear, when we listen.
Lip reading, which the hearing impaired use to augment what they hear, is an everyday demonstration of how important visible speech is to those who have suffered hearing loss.
"You've probably heard elderly friends say that they hear the television with their glasses on," Professor Massaro says. "That's because visual cues are important to our ability to understand speech. We're constantly processing signals we pick up visually, as well as those we pick up aurally."
The research reveals that listeners who rely solely on lip reading have a comprehension rate of 25 percent; those who receive only audio signals in a noisy environment such as a cocktail party have a similar rate of comprehension. However, when the same listeners lip read and receive audio messages, the rate of comprehension jumps to about 80 percent.
"We take it for granted, but speech comprehension is an amazing accomplishment," says Professor Massaro, who began exploring the link between auditory and visible speech 13 years ago. "No computer has been programmed to understand speech as well as a three-year-old child."
His team has developed software to study how people perceive and recognize speech by eye and how they combine these perceptions with what they hear. Along with research associate Michael Cohen, he has created a computerized "talking head" that reproduces synthetic speech, enabling researchers to isolate visual and auditory cues received by listeners.
The three-dimensional image resembles a mannequin, with moving eyes, brows, and a mouth. In full color, the face is shaded to look more realistic and the features move in real time. The underlying grid allows researchers to control about 60 parameters to animate the face and create the movements of speech.
Using the computer to reproduce auditory synthetic speech gives the researchers control that is not always possible with natural speech. The animated face allows them to move precise elements of the face - including the jaw, lips, and tongue - that make up the visible components of speech.
Synthetic speech also allows researchers to produce novel sounds or ambiguous syllables - precisely halfway between "ba" and "da", for example - which can aid them in their investigations.
For instance, researchers can program the talking head to say "doll" and dub it with an auditory recording of the word "ball". The result? Most people watching the talking head on a television monitor will hear "wall". Similarly, if a researcher makes an auditory recording of the nonsense sentence "My bab pop me poo brive" and dubs it on to a video of the talking head saying "My gag kok me koo grive", viewers will hear "My dad taught me to drive".
The system allows researchers to type in English text, which the computer produces as spoken language complete with corresponding facial movements. Professor Massaro envisions his "talking head" being used to provide the hearing impaired with the same visual cues that are produced during natural speech.
He sees the potential for translating printed books automatically into visual speech for the hearing impaired, just as the books are translated by auditory speech synthesizers for the visually impaired.
Ultimately, the technology may also be used in the learning of second languages, in speech therapy for brain-injury patients, and, in the next generation of computer communication, as a "face-to-face" form of electronic mail.
"We can make the head transparent so students and patients can see through the cheeks to see the precise position of the tongue during the formation of sounds they don't have in their native language or that they've lost through injury," says Professor Massaro. "I could see computerized talking heads capable of expressing emotion becoming commonplace at home and work."