A research tool was all that psychologist
Dominic Massaro and his team at the University of California - Santa
Cruz had wanted. They needed something to help them do basic perceptual
laboratory work on how people perceive and recognize speech by eye and
how, as listeners, they combine their visual perceptions of speakers
with what they hear.
But now the research tools they developed,
computerized "talking heads," are achieving national
attention. These talking heads appear to have many more potential
applications in multimedia communications. And in the nearer term, maybe
as near as five years, they could deliver important help to people
who have hearing impairments and to those learning a second language.
A License to Call
New Jersey's AT&T Bell Laboratories
have perked up their sensitive antennas to the work Massaro and his
research associate, Michael Cohen, have been doing. A five-person
AT&T team visited the Santa Cruz lab late last year. One outcome has
been a licensing agreement that provides Massaro's group with
AT&T software for speech synthesizers. AT&T is also modestly
beefing up the program's overall funding, which has come almost entirely
from the National Institute on Deafness and Other Communication
Disorders (NIDCD) since 1990. And now NIDCD has extended its support to
cover the project for four more years.
Massaro says all this goes well beyond anything
he and Cohen had in mind in the mid-1980s when they first conceived of
their talking heads. At that time, Massaro and Cohen were
using video clips of natural faces in their efforts to isolate
visual speech - the visual perception of speech - from
auditory speech.
Natural Isn't Everything
But normal faces couldn't give all the
information that perceivers need, Massaro says. Moreover, he and his
team wanted stimuli that could be controlled more rigorously than
natural faces. This would allow the researchers greater ability to
precisely manipulate facial and lip movement as well as movement of the
tongue and jaw. (The current underlying computer-controllable grid
allows control of as many as 60 parameters.) They could also
present complicated sounds - even sounds that contradict facial stimuli
- while measuring subjects' perceptions. Their goal was to develop a tool
that would do as much for research on visible speech as synthetic speech
was already doing for investigators into auditory speech perception.
"That's how the talking heads
developed," Massaro says, "as a tool to do perceptual work.
But then, thanks to the serendipity in science, we soon saw there was a
lot of value in talking heads, not just as an experimental tool but as a
device that could help the hearing impaired and people learning a second
language, and also in multimedia, human-machine interaction and many
other applications in education and entertainment." "The
possibilities are practically endless," he says.
Everyday Importance
One everyday indication of the value of visible speech
is the fact that the hearing impaired can significantly augment their
speech comprehension through lip reading, something that is also
important to people with normal hearing. "I'm sure you've probably
heard elderly friends or relatives say that they hear the television
better with their glasses on," Massaro says.
But in the 1980s Massaro looked long and hard to
find research funding programs willing to support his development of
an animated head that talks.
"We applied for money from the National
Science Foundation and several other funding agencies in 1985 and in
1986," Massaro said. Despite receiving good reviews, they were
unsuccessful there and elsewhere.
They finally received support in 1990 from NIDCD
and for the next four years from the same institute, "so we will
be able to continue this work, refine it, and go in some slightly broader
directions," Massaro said.
Hearing the Picture
With NIDCD's past support, today's
state-of-the-art talking head is a computerized image that resembles a
highly expressive mannequin. An underlying grid allows researchers to
control about 60 parameters to animate the face and create other
movements in speech. Researchers can manipulate the jaw, mouth, lips,
and tongue to mimic the visible component of speech. (Massaro emphasizes
that the basic design came from the 1970s doctoral dissertation of Fred
Parke, a computer scientist; but Massaro didn't have the computers
required to start the project until the mid-1980s.)
To start a session, researchers can type
English text of almost any length into the computer. It then produces
the text as spoken language, complete with corresponding facial
movements, pausing for a second or two between sentences.
But investigators can also program novel or
ambiguous sounds, halfway between "ba" and "da," for
example. They can also program the talking head to say "doll"
visually while the word "ball" is sounded
audibly. The result in this case is that most people watching the
talking head hear "wall." Similarly, if a researcher makes an
audible recording of the nonsense sentence "My bab pop me poo
brive" and dubs it into a video of the head saying "My gag
kock me koo grive," most viewers will report having heard "My
dad taught me to drive."
Massaro sees this as evidence that
"people are always trying to impose meaning [on stimuli] at the
highest level, even when they're given conflicting information.
Although you might expect people to ignore
either the sound or the visible speech, in fact they use all the
evidence and come up with the best solution. When there is inconsistent
or ambiguous information, people will try to put all the pieces together
in the way that makes the most sense."
Research seems to support this contention, with
some studies showing that listeners who rely solely on lip reading have
a comprehension rate of about 25 percent. Those who receive only audio
signals in an environment like a noisy cocktail party have a similar
rate of comprehension. However, when the same listeners both lip read
and receive audio messages, the rate of comprehension jumps to about 80
percent.
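The idea that perceivers combine both sources, rather than ignoring
one, can be sketched numerically. The article does not say how the team
models this combination; the Python sketch below simply assumes a
multiply-and-normalize rule. The function name integrate and all
support values are hypothetical illustrations, not measured data.

```python
def integrate(audio_support, visual_support):
    """Combine two sources of evidence for the same set of alternatives.

    Assumed rule (not taken from the article): multiply the support each
    source gives an alternative, then normalize so the results sum to 1.
    """
    combined = {
        alt: audio_support[alt] * visual_support[alt]
        for alt in audio_support
    }
    total = sum(combined.values())
    if total == 0:
        raise ValueError("no alternative is supported by both sources")
    return {alt: value / total for alt, value in combined.items()}


# Hypothetical support values: each source alone only mildly favors "ba".
audio = {"ba": 0.6, "da": 0.4}
visual = {"ba": 0.7, "da": 0.3}

result = integrate(audio, visual)
print(result)
```

Under this assumed rule, two sources that each lean only mildly toward
"ba" produce a combined preference stronger than either source gives on
its own, which is the qualitative pattern the comprehension figures
above describe.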
Prosopagnosia
Massaro's talking heads have now started to
appear in psychology laboratories in a few other parts of the world, for
example in London with Ruth Campbell and at the University of Western
Ontario with Mel Goodale. Both are using the tapes with prosopagnosic
subjects, persons who have difficulty recognizing faces, even those of
close relatives.
As to their interest in talking heads, AT&T
laboratory heads haven't been talking much. One of the representatives
who visited Massaro's laboratory told the press, "I think this
is an important technology for the future, but I don't think the future
is quite here yet."
Massaro himself is more open and sanguine. He
sees talking heads being useful in computing. If, for example,
you are using Microsoft Windows, a talking head could give you
instructions. When you click on a menu, a talking head could read you
the menu, or it could serve as an alerting device.
In the Future
With four more years of funding now assured by
NIDCD, Massaro says a priority is further work on a talking head that
will give the hearing-impaired more information than they can get from
normal heads in visible speech.
Another goal is to bring affect and
emotional expression to the talking heads, manipulating the eyes,
eyebrows, and corners of the mouth - in part to determine if people can
discriminate between emotions on the basis of cues in the face. Graduate
student John Ellison is involved in most of the work on emotion.
Basically, Massaro has been working in speech
perception for 20 years, striving to uncover fundamental rules about the
way the mind works with language. His general approach is to identify
how people perceive and recognize patterns. One of the themes of this
approach is how people use many different sources of information to
perceive and recognize patterns. The sources may be ambiguous, but a
perceiver pieces them together to interpret what the situation actually
is, Massaro notes.
This general theoretical framework for
describing the process of perception and pattern recognition also works
in other language domains, Massaro says - in reading and
sentence interpretation, for example. But pattern recognition also
functions in situations like natural object recognition, cues to depth
perception, and memory. The memory research is done by putting several
cues together - like doing a crossword puzzle in which you work with a
definition plus some letters from other words already written in.