From: malcolm@interval.com (Malcolm Slaney)
Date: Tue, 16 May 1995 05:21:23 +0000
Subject: Binaural Separation at CCRMA
Message-Id: <abdddcb90102100414fe@[192.203.7.230]>


Don't forget.... Unto Laine will be speaking on auditory representations at
a special Hearing Seminar/DSP Seminar meeting Wednesday (tomorrow) at
CCRMA.  His talk will be at 4:15 in the Ballroom.

We're also blessed with a second seminar on Friday at 11AM.  (There will be
no seminar on Thursday this week.)

Nakatani-san will be describing work that he and his colleagues have done
to model sounds with agents and segregate them.  The work that will be
described on Friday is based on binaural input.

        Who:    Tomohiro Nakatani (NTT)
        What:   Multi-Agent Based Binaural Sound Stream Segregation
        When:   Friday (!!!) May 19th at 11AM
        Where:  CCRMA Library (Top Floor of the Knoll)

Modeling sound separation remains one of the hardest auditory problems.
Come see the latest work in this area.

-- Malcolm


---------------------
Multi-Agent Based Binaural Sound Stream Segregation

Tomohiro Nakatani,
with Masataka Goto*, Takatoshi Ito**, Hiroshi G. Okuno

(NTT Basic Research Laboratories, *Waseda University,
 **Toyohashi University of Technology)

I will present a multi-agent based sound stream segregation
system for binaural input.  Sound stream segregation is a
technology to extract individual sounds from a sound mixture,
and it is considered a primary processing step for
computationally understanding sounds (Computational Auditory
Scene Analysis) in the real world.  I will first propose the
multi-agent based system, called the Residue-Driven
Architecture, as a computational model for sound stream
segregation.  The features of this system are a bottom-up
approach, dynamic and incremental segregation, and an open-ended
design.  Then, I will discuss the design and implementation
of two subsystems, called agencies, the Bi-HBSS agency and
Bi-Grouping agency, based on this architecture.  The Bi-HBSS
agency segregates fragments of streams, while the Bi-Grouping
agency sequentially groups the fragments to extract streams.
The two agencies use harmonic structure and direction information
as segregation clues.  They first extract harmonic information
in both channels, and then extract the direction of the harmonics.
Once the direction information is extracted, the system
utilizes it as a segregation clue for the successive harmonic
sound stream.  I will also show segregation results of two
voices whose fundamental frequencies cross each other.

In this talk, I will cover recent work developed since
my colleague, Okuno, spoke at the Hearing Seminar last
summer.

Bio

Tomohiro Nakatani is a member of NTT Basic Research
Laboratories.  He studied parameter adaptation methods
for learning control and for composite neural networks, and
received an M.S. from the Division of Applied System Science,
University of Kyoto, in 1991.  Then, he joined NTT Basic
Research Labs.  He began research on multi-agent systems
and auditory scene analysis in 1992 with Hiroshi G. Okuno
and Takeshi Kawabata.  At present, he is constructing a
multi-agent based speech stream segregation system.