Submerged in an acoustically padded,
subterranean lab - appropriately dubbed the Perceptual Science
Laboratory - Dominic W. Massaro intently taps the keys of a Silicon
Graphics system and invokes his computer-generated brainchild. Within
seconds, a 3-D head appears on the screen, its mouth carefully
enunciating the phrase "Peter Piper picked a peck of pickled
peppers."
The UC Santa Cruz
psychology professor is visibly pleased with the effect, sitting back in
his swivel chair to allow his one-person audience a glimpse of
next-generation technology. The on-screen dark-haired, bearded image - a
cycling buddy's photograph scanned onto the computer and overlaid onto a
skeletal polygon framework - is nearly lifelike as it speaks.
Though not as charming as
the fictional Max Headroom - a computer-generated wannabe trapped in
1980s special-video-effects technology - Massaro's talking head is far
more intriguing than its predecessor. The professor's compu-guy
technically outstrips its bald-pated archetype created 20 years ago by
computer graphics pioneer Fred Parke.
Massaro's innovative new
noggin may regularly opt for a facelift, so to speak. Bearing the image
of his best buddy today, the image peering at Massaro from the screen
may be his best gal tomorrow. Photographs and other images may be
wrapped like cellophane around the head's framework.
Cut to Jan. 7, 2101. The
location: Telecom channel 46. "Good audience, why partake in this
antiquated pastime of scientific inquiry?" asks the talking head in
Massaro's synthetic future vision. "Existence opened in mystery and
will close in mystery," the head continues. "Our telecom
channels have been designated to please, not puzzle. For those into
delectation, the other telecom channels offer instantaneous desserts.
Virtual Reality 3 presents Marilyn Monroe's rendezvous with Madonna III.
If this tryst is too boring, there is the multi-stimulation of Bach's
Brandenburg concerti guaranteed to bombard all sensory stations -
sensory overload at its finest. Julia Child's gastronomic channel is
serving up Stegosaurus, as reconstructed from simulations of the fossil
record."
So begins Massaro's
neurotic chapter in The Science of the Mind: 2001 and Beyond,
in which Massaro and Bert L. Solso compile
and co-edit the futuristic imaginings of the country's leading
psychologists. The two commit to print what
they believe may one day result from ongoing research into visible
speech - or what we see when we listen.
The harbinger of such 22nd-century
techno-experience may be in the offing. Massaro thinks that in
as few as five years the computer-using populace could deliver commands
to computer-generated talking heads that respond with appropriate
conversation. Such three-dimensional computerized heads, their mouths
enunciating to perfection, could guide youngsters and the hearing
impaired in precise linguistic pronunciation. Their synthetic potential
is nearly unlimited, Massaro extols, conjuring images of talking heads
in everything from education to showbiz.
Researchers speculate that
such technology could easily be applied in the entertainment
industry, giving new life to long-dead personalities and celebrities.
Maybe Marilyn will tryst with Madonna in another 100 years. "It's
easy to synthetically create something that never happened," says
Santa Cruz computer graphics specialist Gregory MacNicol, adding that
such technology will no doubt prompt a spate of ethical issues.
Imposing Meaning at the Highest Level
The new head on the block, however, isn't
without its quirks. Despite its cellophane-like realism, the talking
head's eyes, lips, teeth and tongue remain largely caricature. Until
recently, Massaro and research associate Michael Cohen limited their
research on paralinguistic cues, such as eye movements and other facial
expressions, in favor of research combining
synthetic speech with corresponding mouth movements. "That's the
weakest part of the synthesis," Massaro readily admits.
Massaro's talking head now
boasts increased facial controls and a tongue, allowing it to better
mimic speech. In the future, language students may better understand
their linguistic endeavor with an inside view of syllable articulation -
the facial "skin" may be removed to reveal the tongue
articulating within the head's framework.
And through Massaro's
text-to-speech system, the talking head may be programmed to
speak. Researchers type English text into the computer, which in
turn produces the text as spoken
language complete with corresponding facial
movements. The innovative technology earned the recognition of the
National Institute on Deafness and Other Communication Disorders, which
granted Massaro's project four years of funding in 1990.
A high-tech computer
program, created by Massaro and research associate Michael Cohen,
produces the head's synthetic speech and allows the face to be
manipulated with corresponding movements. The duo uses the sophisticated
program to study how people perceive and recognize speech through sight
and how they combine such perceptions with what they hear.
For example, the talking
head can be programmed to mouth the word "doll" with an
auditory recording of the word "ball" dubbed in. The result?
Most people will hear the word "wall." The same is true when
the nonsensical phrase "My bab pop me poo brive" is dubbed
onto a video of the talking head mouthing, "My gag kok me koo
grive." Viewers will hear "My dad taught me to drive."
"People are always
trying to impose meaning at the highest possible level, even when
they're given conflicting information," Massaro explains. "We
take it for granted, but speech comprehension is an amazing
accomplishment," adds Massaro, who first explored the link between
visible and auditory speech 13 years ago. "No computer has been
programmed to understand speech as well as a 3-year-old child."
Massaro and Cohen's
program may prove a proverbial step in the right direction as they
continue to isolate the visual and auditory cues that listeners receive.
Their research has already
attracted national attention. Just weeks ago, five researchers from New
Jersey's AT&T Bell Laboratories landed in the Golden State to
investigate the technology underway at UCSC. Already, both sides are
discussing a possible collaborative effort, though they readily admit
the applications of such technology remain undetermined. "We are
definitely interested," says Steve Levinson, head of AT&T's
linguistics research development. "But it was just an exploratory
visit."
AT&T's primary
interest in the technology is its potential to create synthetic speech
that sounds more human, he says. But, Levinson cautions, "I think
it's important technology of the future but I don't think the future is
quite here yet."
KIM MALANCZUK