From: malcolm@interval.com (Malcolm Slaney)
Date: Fri, 3 Nov 1995 07:26:33 -0800
Subject: Psychophysics of Auditory Scene Analysis
Message-Id: <199511031526.HAA08087@interval.interval.com>


The CCRMA Hearing Seminar continues with a series of three talks on attention
and the cocktail party effect.  One of our field's holy grails is understanding
how we perceive a cacophony of sounds.  Our next three seminars talk about
the problem at three different levels.  Unlike the CASA workshop and the review
seminar I gave earlier this quarter, all three of these talks address the
performance of human listeners on tasks related to the cocktail party.

Next Thursday we start with the psychophysics of sound separation.  At the 
very lowest levels of perception, what are the cues that we use to separate
two different sounds?   Pierre Divenyi of the VA Medical Center at Martinez
will describe the three most important factors and his experiments in the
area.

	Who:	Pierre Divenyi (VA Martinez)
	What:	Psychophysics of Auditory Scene Analysis
	When:	Thursday November 9 at 11AM
	Where:	CCRMA Library (Top Floor of the Knoll at Stanford)

In the following weeks, Erv Hafter will be talking about the role of attention
in auditory listening experiments (11/16) and Diane Schiano will be summarizing
what cognitive scientists know about the cocktail party (11/30).

Be sure to attend these next three seminars to find out more about how
attention and primitive sound grouping processes allow us to understand our
noisy acoustic environment.

-- Malcolm



Psychophysical bases of auditory scene analysis
--Pierre L. Divenyi
  Speech and Hearing Research
  V.A. Medical Center
  Martinez, CA

Abstract

Human auditory scene analysis (HASA) has been defined (by Albert Bregman) as
the perceptual function that segregates the components of an ensemble of N
simultaneous sounds, such that each perceived component sound is recognized
as the output of the one source, among the N, that generated it.  The
segregation process is accomplished by attributing to a
separate stream spectral components having similar temporal envelope
characteristics, temporal events having similar spectral characteristics, as
well as spectrally and temporally similar components consistent with
a specific spatial location.

Due to the inherent acoustic complexity of ensembles of simultaneous
sounds emitted by different sources, it is quite difficult to come up with
any fully satisfactory psychophysical definition and measurement procedure
for HASA.  Following a reductionist approach, the multiple-sound situation
may be reduced to the analysis of sound pairs.  When each of the two
sounds is given a description in terms of a set of specific values
with respect to a number of physical dimensions, then HASA may be defined
as the consequence of a listener (1) resolving the difference between the
two sounds with respect to the specific values along all dimensions, and
(2) choosing one resolved value on each dimension and correctly assigning
the set of these values to one of the sound components.

As a further simplification of the paradigm, three vectors representing three
"cardinal" dimensions of audition have been defined: one representing
pitch (both spectral and residue), one temporal structure (envelope
characteristics), and one spatial position.  Using experienced listeners,
experiments have been conducted on the resolution of (quasi-) steady-state
sound pairs differing along two or three of the three dimensions.  When
resolution is expressed as just-noticeable difference, HASA may be evaluated
in terms of the relative contribution of the three vectors to perceptual
separability of simultaneous sounds.

After describing the psychophysical theory, discussing its limitations, and
presenting right-off-the-press data, HASA will be briefly compared to
C(omputational) ASA.