From September 29-October 5, 1994

Screen Presence
UCSC researchers give a facelift to pioneering talking head with a future in showbiz and higher education.
Submerged in an acoustically padded, subterranean lab - appropriately dubbed the Perceptual Science Laboratory - Dominic W. Massaro intently taps the keys of a Silicon Graphics system and invokes his computer-generated brainchild. Within seconds, a 3-D head appears on the screen, its mouth carefully enunciating the phrase "Peter Piper picked a peck of pickled peppers."
   The UC Santa Cruz psychology professor is visibly pleased with the effect, sitting back in his swivel chair to allow his one-person audience a glimpse of next-generation technology. The on-screen dark-haired, bearded image - a cycling buddy's photograph scanned onto the computer and overlaid onto a skeletal polygon framework - is nearly lifelike as it speaks.
   Though not as charming as the fictional Max Headroom - a computer-generated wannabe trapped in 1980s special-video-effects technology - Massaro's talking head is far more intriguing than its predecessor. The professor's compu-guy technically outstrips its bald-pated archetype created 20 years ago by computer graphics pioneer Fred Parke.
   Massaro's innovative new noggin may regularly opt for a facelift, so to speak. Bearing the likeness of his best buddy today, the face peering at Massaro from the screen may be his best gal tomorrow. Photographs and other images may be wrapped like cellophane around the head's framework.
   Cut to Jan. 7, 2101. The location: Telecom channel 46. "Good audience, why partake in this antiquated pastime of scientific inquiry?" asks the talking head in Massaro's synthetic future vision. "Existence opened in mystery and will close in mystery," the head continues. "Our telecom channels have been designated to please, not puzzle. For those into delectation, the other telecom channels offer instantaneous desserts. Virtual Reality 3 presents Marilyn Monroe's rendezvous with Madonna III. If this tryst is too boring, there is the multi-stimulation of Bach's Brandenburg concerti guaranteed to bombard all sensory stations - sensory overload at its finest. Julia Child's gastronomic channel is serving up Stegosaurus, as reconstructed from simulations of the fossil record."
   So begins Massaro's neurotic chapter in The Science of the Mind: 2001 and Beyond, in which Massaro and Robert L. Solso compile and co-edit the futuristic imaginings of the country's leading psychologists. The two commit to print what they believe may one day result from ongoing research into visible speech - or what we see when we listen.
   The harbinger of such 22nd-century techno-experience may be in the offing. Massaro thinks that in as few as five years the computer-using populace could deliver commands to computer-generated talking heads that respond with appropriate conversation. Such three-dimensional computerized heads, their mouths enunciating to perfection, could guide youngsters and the hearing impaired in precise linguistic pronunciation. Their synthetic potential is nearly unlimited, Massaro extols, conjuring images of talking heads in everything from education to showbiz.
   Researchers speculate that such technology could easily be applied in the entertainment industry, giving new life to long-dead personalities and celebrities. Maybe Marilyn will tryst with Madonna in another 100 years. "It's easy to synthetically create something that never happened," says Santa Cruz computer graphics specialist Gregory MacNicol, adding that such technology will no doubt prompt a spate of ethical issues.

Imposing Meaning at the Highest Level

The new head on the block, however, isn't without its quirks. Despite its cellophanelike realism, the talking head's eyes, lips, teeth and tongue remain largely caricature. Until recently, Massaro's team limited its research on paralinguistic cues, such as eye movements and other facial expressions, in favor of research combining synthetic speech with corresponding mouth movements. "That's the weakest part of the synthesis," Massaro readily admits.
   Massaro's talking head now boasts increased facial controls and a tongue, allowing it to better mimic speech. In the future, language students may better understand their linguistic endeavor with an inside view of syllable articulation - the facial "skin" may be removed to reveal the tongue articulating within the head's framework.
   And through Massaro's text-to-speech system, the talking head may be programmed to discourse. Researchers type English text into the computer which, in turn, produces the text as spoken language complete with corresponding facial movements. The innovative technology earned the recognition of the National Institute on Deafness and Other Communication Disorders, which granted Massaro's project four years of funding in 1990.
   A high-tech computer program, created by Massaro and research associate Michael Cohen, produces the head's synthetic speech and allows the face to be manipulated with corresponding movements. The duo uses the sophisticated program to study how people perceive and recognize speech through sight and how they combine such perceptions with what they hear.
   For example, the talking head can be programmed to mouth the word "doll" with an auditory recording of the word "ball" dubbed in. The result? Most people will hear the word "wall." The same is true when the nonsensical phrase "My bab pop me poo brive" is dubbed onto a video of the talking head mouthing, "My gag kok me koo grive." Viewers will hear "My dad taught me to drive."
   "People are always trying to impose meaning at the highest possible level, even when they're given conflicting information," Massaro explains. "We take it for granted, but speech comprehension is an amazing accomplishment," adds Massaro, who first explored the link between visible and auditory speech 13 years ago. "No computer has been programmed to understand speech as well as a 3-year-old child."
   Massaro and Cohen's program may prove a proverbial step in the right direction as they continue to isolate the visual and auditory cues that listeners receive.
   Their research has already attracted national attention. Just weeks ago, five researchers from New Jersey's AT&T Bell Laboratories landed in the Golden State to investigate the technology under development at UCSC. Already, both sides are discussing a possible collaborative effort, though they readily admit the applications of such technology remain undetermined. "We are definitely interested," says Steve Levinson, head of AT&T's linguistics research development. "But it was just an exploratory visit."
   AT&T's primary interest in the technology is its potential to create synthetic speech that sounds more human, he says. But, Levinson cautions, "I think it's important technology of the future, but I don't think the future is quite here yet."
                       KIM MALANCZUK