When George Bush uttered his most memorable
instruction, "Read my lips", he instinctively knew that people
hear best when they both listen and look. But which of the two senses
dominates our understanding of speech?
Psychologists are using sophisticated
computer graphics and dubbing techniques to find the answer. Their work
could result in animated "talking books" for the hard of
hearing and a new form of electronic video mail.
Little is known about how people comprehend
speech. Auditory cues clearly play a role, with variables such as pitch,
duration, and loudness contributing to the overall understanding of
verbal messages.
But a team of researchers at the University
of California, Santa Cruz, is exploring another aspect - visual speech.
Led by a psychology professor, Dominic Massaro, the team has
investigated the critical role played by what we see when we listen, as
well as what we hear.
The ability of the hearing impaired to
augment their hearing with lip reading is an everyday indication of
how important visible speech is to those who have suffered hearing
loss.
"You've probably heard elderly friends
say that they hear the television better with their glasses on," Professor
Massaro says. "That's because visual cues are important to our
ability to understand speech. We're constantly processing signals we
pick up visually, as well as those we pick up aurally."
The research reveals that listeners who rely
solely on lip reading have a comprehension rate of 25 percent;
those who receive only audio signals in a
noisy environment such as a cocktail party have a similar rate of
comprehension. However, when the same listeners lip read and receive
audio messages, the rate of comprehension jumps to about 80 percent.
"We take it for granted, but
speech comprehension is an amazing accomplishment," says Professor
Massaro, who began exploring the link between auditory and visible
speech 13 years ago.
"No computer has been programmed to
understand speech as well as a three-year-old child." His team has
developed software to study how people perceive and recognize speech by
eye and how they combine these perceptions with what they hear. Along
with research associate Michael Cohen, he has created a computerized
"talking head" that reproduces synthetic speech, enabling
researchers to isolate visual and auditory cues received by listeners.
The three-dimensional image resembles a
mannequin, with moving eyes, brows, and a mouth. In full color, the face
is shaded to look more realistic and the features move in real time. The
underlying grid allows researchers to control about 60 parameters to
animate the face and create the movements of speech.
Using the computer to reproduce auditory
synthetic speech gives the researchers control that is not always
possible with natural speech. The animated face allows them to move
precise elements of the face - including the jaw, lips, and tongue -
that make up the visible components of speech.
Synthetic speech also allows researchers to
produce novel sounds or ambiguous syllables
- precisely halfway between "ba"
and "da", for example - which can aid them in their
investigations.
For example, researchers can program the
talking head to say "doll" and dub it with an auditory
recording of the word "ball". The result? Most people watching
the talking head on a television monitor will hear "wall".
Similarly, if a researcher makes an auditory recording of the nonsense
sentence "My bab pop me poo brive" and dubs it on to a video
of the talking head saying "My gag kok me koo grive", viewers
will hear "My dad taught me to drive".
The system allows researchers to type in
English text, which the computer produces as spoken language complete
with corresponding facial movements. Professor Massaro envisions his
"talking head" being used to provide the hearing impaired with
the same visual cues that are produced during natural speech.
He sees the potential for translating
printed books automatically into visual speech for the hearing impaired,
just as the books are translated by auditory speech synthesizers for the
visually impaired.
Ultimately, the technology may also be used
in the learning of second languages, in speech therapy for brain-injury
patients, and in the next generation of computer communication as a
"face-to-face" form of electronic mail, for example.
"We can make the head transparent so
students and patients can see through the cheeks to see the precise
position of the tongue during the formation of sounds they don't have
in their native language or that they've lost through injury," says
Professor Massaro. "I could see computerized talking heads capable
of expressing emotion becoming commonplace at home and work."