By Oz Hopkins Koglin
The new teaching assistant in George
Fortier's classroom is a talking head named Baldi.
This guy teaches for free, never tires or
takes breaks, and never goes home. Students love him; he's patient and
is a good listener.
If Baldi seems a bit too dedicated, it's
because he's, well, a tool: a marvel of spoken language
technology.
Baldi is a listening and speaking
three-dimensional, computerized talking head. When he speaks, his jaw, lips,
tongue, and facial movements are manipulated to mimic human speech.
But don't mistake Baldi for just
another animated computer image. Behind his self-assured exterior is a
tool kit that meshes voice recognition and text-to-speech synthesis
programs with his talking head.
In research circles, Baldi is known as a
conversational agent. His teaching assignment at Portland's Tucker-Maxon
Oral School is to help 8- to 12-year-old deaf children speed their use
and understanding of the auditory and visual mechanisms associated with
speech. Baldi is the key figure in a study by the Oregon Graduate
Institute of Science and Technology's Center for Spoken Language
Understanding in Hillsboro. The study is financed by a $1.8 million
grant from the National Science Foundation.
Ronald A. Cole, professor and director of the
Center for Spoken Language Understanding and principal investigator
in the study, said his vision is to provide the average person with
language technologies that allow people to talk to computers.
"If our students from grade school to
graduate school are going to be able to play with and understand and
become developers of tomorrow's technology, then we have to put it in
their hands and make it widely available," Cole said.
Tucker-Maxon, founded in 1947 by four
parents who wanted their deaf children to grow up speaking, now has
55 students who wear powerful hearing aids, cochlear implants or both.
Cochlear implants allow deaf students to be aware of sounds by sending
electrical signals to the auditory nerve. Tucker-Maxon offers a standard
elementary school curriculum, and its goal is to help students
transfer to schools for hearing children as soon as they are ready.
"We think all children can learn to
talk," said Patrick S. Stone, executive director of Tucker-Maxon.
"We don't use sign language."
The school is recruiting four hearing
fifth-graders next fall to participate in the classroom with Baldi.
"It will give deaf children an
opportunity to be in a classroom with hearing children, and hearing
children will have the experience of being in a small classroom of 10,
and they will have access to state-of-the-art technology," Stone
said.
Last fall, researchers trained Tucker-Maxon
teachers to use the CSLU Toolkit that operates Baldi. Intel Corp.
donated five top-of-the-line Pentium II computer platforms to the
project.
To create conversation with Baldi, all the
teacher has to do is use programs from the CSLU Toolkit. Teachers can
type in the words they want Baldi to say and the words they want Baldi
to recognize in response. Baldi's speech comes from the
text-to-speech Festival system, developed by a team at the University of
Edinburgh in Scotland. It turns any English or Spanish text into
intelligible speech. A facial animation program developed at the
University of California, Santa Cruz, takes the speech segments, called
phonemes, that the Festival system produces and uses them to move Baldi's
lips, tongue and jaw, synchronizing the movement to the speech.
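The two-stage pipeline described above (text to phonemes, then phonemes to timed mouth movements) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny word list, the mouth-shape names, the fixed 0.1-second phoneme duration and the function names are hypothetical, and none of it reflects the actual Festival or UC Santa Cruz software.

```python
# Hypothetical toy data: NOT the Festival lexicon or the UCSC face model.
TOY_LEXICON = {
    "map": ["m", "ae", "p"],
    "boat": ["b", "ow", "t"],
}

# Hypothetical mouth shapes, one per phoneme.
PHONEME_TO_MOUTH = {
    "m": "lips_closed",
    "b": "lips_closed",
    "p": "lips_closed",
    "ae": "jaw_open",
    "ow": "lips_rounded",
    "t": "tongue_at_ridge",
}


def text_to_phonemes(text):
    """Stage 1: look up each word's phonemes in the toy lexicon."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(TOY_LEXICON[word])
    return phonemes


def phonemes_to_keyframes(phonemes, seconds_per_phoneme=0.1):
    """Stage 2: give each phoneme a start time and a mouth shape,
    so that face movement stays in step with the audio."""
    keyframes = []
    for index, phoneme in enumerate(phonemes):
        start = round(index * seconds_per_phoneme, 3)
        keyframes.append((start, phoneme, PHONEME_TO_MOUTH[phoneme]))
    return keyframes


if __name__ == "__main__":
    for frame in phonemes_to_keyframes(text_to_phonemes("map boat")):
        print(frame)
```

A real system would assign each phoneme its own duration and blend neighboring mouth shapes; the fixed timing here only shows how one phoneme sequence can drive both the sound and the face.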
"Nobody has ever had an animated face
like this, that accurately produces words that can be lip-read,"
said Fortier.
At Tucker-Maxon, Baldi asks questions as
part of a word game, listens to the children's answers and tells them
whether they are correct. When the answer is wrong, Baldi takes them
through a training exercise until they arrive at the right answer. The
topics can include whatever the students are studying, such as
geography, science, or history.
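The question, answer and retraining cycle described above can be sketched as a short loop. This is only an illustration under stated assumptions: the function name, the sample question and the hint text are hypothetical, and a scripted list of strings stands in for what a speech recognizer would report hearing. It is not the CSLU Toolkit's own interface.

```python
def run_question(prompt, accepted_answers, hint, heard_answers):
    """Ask one question and keep coaching until an accepted answer is heard.

    heard_answers is a scripted stand-in for speech-recognizer output.
    Returns (transcript, answered_correctly).
    """
    transcript = [("baldi", prompt)]
    accepted = {answer.lower() for answer in accepted_answers}
    for heard in heard_answers:
        transcript.append(("student", heard))
        if heard.strip().lower() in accepted:
            transcript.append(("baldi", "Correct!"))
            return transcript, True
        # Wrong answer: give the training hint, then listen again.
        transcript.append(("baldi", hint))
    return transcript, False


if __name__ == "__main__":
    # Hypothetical geography item a teacher might type in.
    lines, ok = run_question(
        prompt="What is the capital of Oregon?",
        accepted_answers=["Salem"],
        hint="Listen and watch my mouth: Salem. Try again.",
        heard_answers=["Portland", "salem"],
    )
    for speaker, text in lines:
        print(f"{speaker}: {text}")
```

The teacher supplies only the prompt, the accepted words and the hint, which mirrors the article's point that building a lesson requires typing in what Baldi should say and what he should recognize.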
In the past, students have used computer
programs that presented information with pictures and words, typed
questions and answers, but they weren't always able to use what they
learned in conversation, Fortier said.
"What I see now is that when we sit
down and discuss concepts, the practice Baldi gives children enables
them to understand others when they are talking about these
concepts," Fortier said. "The people recognize the words
better and people understand their speech better."
Baldi started out as a wire-frame head model
that Dominic W. Massaro, a professor and chairman of psychology at the
University of California, Santa Cruz, and Michael M. Cohen, research
associate, refined and used to measure how people put together
information from a face, independent of voice.
"We can think of Baldi now as a puppet
on about 60 strings, and we control those strings over time so that Baldi
says appropriate things, makes the appropriate mouth movements,"
Massaro said.
Using texture mapping, Massaro and Cohen can
wrap any still video picture over the framework to produce a more
natural or familiar image. So, in the future, students might see their
own faces on the screen, for instance.
In feedback to the Oregon Graduate
Institute's researchers, Fortier has passed along his students'
desire to have a pause button so they can stop Baldi and return to where
they were without starting him again. And they would like to have real
speech, rather than synthesized speech, which is something the
researchers are working on.
Massaro, a cognitive psychologist who
studies speech perception and comprehension, heads one of the few
laboratories in the world using facial animation in the quest for
understanding. He is the author of a new book, "Perceiving Talking
Faces: From Speech Perception to a Behavioral Principle."
The value of visible speech and talking
heads extends well beyond therapy for the deaf, Massaro said. Auditory
cues play a large role in comprehension, but we also rely on what we see
when we hear. For example, some people don't like to talk on the
telephone because they don't get the visual cues from people on the
other end. And many elderly people say they hear the
television better with their glasses on.
In laboratory studies, people with normal
hearing are able to comprehend about 25 percent of a message when they
rely solely on lip-reading. Those who receive only audio signals in a
noisy environment, such as a cocktail party, do as poorly. But when the
same research subjects lip-read and receive audio messages, the rate of
comprehension jumps to about 80 percent.
"Traditionally, people thought about
spoken language as simply being auditory, and what our research along
with others has revealed is that people are very good at putting
together many sources of information to make sense of a situation,"
Massaro said.