From: malcolm@interval.com (Malcolm Slaney)
Date: Fri, 3 Nov 1995 07:26:33 -0800
Subject: Psychophysics of Auditory Scene Analysis
Message-Id: <199511031526.HAA08087@interval.interval.com>

The CCRMA Hearing Seminar continues with a series of three talks on attention and the cocktail party effect. One of our field's holy grails is understanding how we perceive a cacophony of sounds. Our next three seminars address the problem at three different levels. Unlike the CASA workshop and the review seminar I gave earlier this quarter, all three of these talks address the performance of human listeners on tasks related to the cocktail party.

Next Thursday we start with the psychophysics of sound separation. At the very lowest levels of perception, what are the cues that we use to separate two different sounds? Pierre Divenyi of the VA Medical Center at Martinez will describe the three most important factors and his experiments in the area.

    Who:   Pierre Divenyi (VA Martinez)
    What:  Psychophysics of Auditory Scene Analysis
    When:  Thursday November 9 at 11 AM
    Where: CCRMA Library (Top Floor of the Knoll at Stanford)

In the following weeks, Erv Hafter will be talking about the role of attention in auditory listening experiments (11/16) and Diane Schiano will be summarizing what cognitive scientists know about the cocktail party (11/30).

Be sure to attend these next three seminars to find out more about how attention and primitive sound-grouping processes allow us to understand our noisy acoustic environment.

-- Malcolm

Psychophysical bases of auditory scene analysis
--Pierre L. Divenyi
Speech and Hearing Research
V.A. Medical Center
Martinez, CA

Abstract

Human auditory scene analysis (HASA) has been defined (by Albert Bregman) as the perceptual function that segregates the components of an ensemble of N simultaneous sounds, such that each of the perceived component sounds will be recognized as the output of the one of the N sources that generated it.
The segregation process is accomplished by attributing to a separate stream spectral components having similar temporal envelope characteristics, temporal events having similar spectral characteristics, as well as spectrally and temporally similar components consistent with a specific spatial location.

Due to the inherent acoustic complexity of ensembles of simultaneous sounds emitted by different sources, it is quite difficult to come up with any fully satisfactory psychophysical definition and measurement procedure for HASA. Following a reductionist approach, the multiple-sound situation may be reduced to the analysis of sound pairs. When each of the two sounds is given a description in terms of a set of specific values with respect to a number of physical dimensions, then HASA may be defined as the consequence of a listener (1) resolving the difference between the two sounds with respect to the specific values along all dimensions, and (2) choosing one resolved value on each dimension and correctly assigning the set of these values to one of the sound components.

As a further simplification of the paradigm, three vectors representing three "cardinal" dimensions of audition have been defined: one representing pitch (both spectral and residue), one temporal structure (envelope characteristics), and one spatial position. Using experienced listeners, experiments have been conducted on the resolution of (quasi-) steady-state sound pairs differing along two or three of the three dimensions. When resolution is expressed as just-noticeable difference, HASA may be evaluated in terms of the relative contribution of the three vectors to perceptual separability of simultaneous sounds.

After describing the psychophysical theory, discussing its limitations, and presenting right-off-the-press data, HASA will be briefly compared to C(omputational) ASA.
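
To make the bookkeeping in the abstract concrete, here is a minimal Python sketch of one way the difference between two sounds could be expressed in just-noticeable-difference (JND) units along three dimensions and then split into relative contributions. Everything in it is an assumption for illustration only: the dimension names, the JND values, the example sounds, and the Euclidean combination rule are hypothetical and are not taken from Divenyi's experiments or theory.

```python
import math

# Hypothetical JNDs (smallest detectable difference) on each dimension.
# Placeholder numbers for illustration, not data from the talk.
JND = {"pitch_hz": 2.0, "envelope_rate_hz": 4.0, "azimuth_deg": 5.0}


def jnd_units(sound_a, sound_b, jnd=JND):
    """Difference between two sounds on each dimension, in JND units."""
    return {dim: abs(sound_a[dim] - sound_b[dim]) / jnd[dim] for dim in jnd}


def separability(diffs):
    """Combine per-dimension differences with a Euclidean rule (assumed)."""
    return math.sqrt(sum(d * d for d in diffs.values()))


def relative_contribution(diffs):
    """Share of the squared combined distance due to each dimension."""
    total = sum(d * d for d in diffs.values())
    if total == 0:
        return {dim: 0.0 for dim in diffs}
    return {dim: d * d / total for dim, d in diffs.items()}


# Two hypothetical steady-state sounds differing in pitch and position.
sound_a = {"pitch_hz": 200.0, "envelope_rate_hz": 8.0, "azimuth_deg": 0.0}
sound_b = {"pitch_hz": 206.0, "envelope_rate_hz": 8.0, "azimuth_deg": 20.0}

diffs = jnd_units(sound_a, sound_b)
combined = separability(diffs)
shares = relative_contribution(diffs)
print(diffs)
print(combined)
print(shares)
```

In this made-up pair the sounds are 3 JNDs apart in pitch and 4 JNDs apart in azimuth, so under the assumed Euclidean rule spatial position accounts for the larger share of the combined distance. A different combination rule would give different shares.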