From: Malcolm Slaney <>
Date: Mon, 25 May 1998 12:17:33 -0700
Subject: Production models and Audio Perception
Message-Id: <v04003a00b18f6db4ffb4@[]>

Two great talks this week for those of you interested in audio perception.
I've already announced Kristin Precoda's talk at CCRMA on Thursday about how
we perceive audio coders.  I'll come back to that in a minute.

On Wednesday as part of Interval's Signal Computation Lecture series, Sam
Roweis from Hopfield's group at Princeton and Caltech will be addressing
the issue of whether we can hear the shape of the mouth. There is a
long-running debate in the world of speech about whether we organize our
perception by somehow mimicking the muscle movements that produce a speech
sound. This is the production model of speech perception and is championed
by Haskins Laboratories and others.  The alternative approach, which has
been well described at many Hearing Seminars, is based on physical
properties of the auditory system: critical bands and all that wonderful
theory.  This will be a chance to hear the other side.

	Who:	Sam Roweis (Princeton and Caltech)
	When:	Wednesday, May 27th at 11AM
	What:	Speech Production Models for Speech Perception
	Where:	Interval Research Corporation
		1801 Page Mill Road; Building C
		Palo Alto, CA 94304 ( for a map)

Please RSVP to if you're coming to Interval for this
talk.  The abstract is attached.

On Thursday at the CCRMA Hearing Seminar, Kristin Precoda will be talking
about how people make judgements about audio coders.  These tests are very
important.  Major decisions about the way our audio is processed are based
on listening tests comparing different ways of compressing audio.  The
winning approach is built into audio equipment we'll all be using in the
future.  But how do people rate audio coders?  Are some people sensitive to
some factors and not to others?  How reliable are these judgements?  What
can we tell about a coder from the way that people hear it?  Come to CCRMA
to find out more on Thursday.

	Who:	Kristin Precoda (Stanford)
	When:	Thursday, May 28th at 11AM
	What:	Model for Perceptual Evaluation of Audio Codecs
	Where:	CCRMA Ballroom ******  note larger room   *****

Both abstracts are included with this message.  Both Kristin and Sam are
very bright researchers with a great message.  I think you'll enjoy them.

-- Malcolm
P.S.  The new mailing list software is working great: Already stopped one
SPAM last week.  Hurray!  Send all requests to subscribe and unsubscribe to

********  WEDNESDAY ***********  WEDNESDAY ***********  WEDNESDAY ***

              Can you hear the shape of the mouth?
      Using speech production models in speech processing
		Wednesday May 27th at Interval

                        Sam Roweis
     Hopfield Group, Princeton University and Caltech

Traditional speech recognition systems generally work as follows: extract
spectral features from the incoming speech signal and build classifiers
which work directly on these observation sequences. However, we know a lot
about the production of speech; in particular we know that it is produced
by a physical system with few degrees of freedom whose components move
slowly and smoothly. If we have a simple model of this production process,
we can use the observation sequences to do inference on the underlying
states of such a model.  Inferred state trajectories can then provide
additional information to the classifiers.

This inference-through-a-model step is present in many other time-series
analysis fields; the celebrated Kalman filter does such inference for
linear dynamical systems. I will describe a simple model which tries to
infer the movements of the mouth from an acoustic signal and discuss how
such a model might be used to extend current speech recognition systems.
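
For readers who have not met the Kalman filter mentioned above, here is a
minimal sketch of the predict/update recursion for a one-dimensional linear
dynamical system.  It is a generic textbook illustration, not the model that
will be presented in the talk.  The function name and the parameters a, c, q,
r, x0 and p0 are assumptions chosen for the example.

```python
import random


def kalman_filter_1d(observations, a=1.0, c=1.0, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter (illustrative sketch, hypothetical parameter names).

    Assumed model:
        x[t] = a * x[t-1] + w,   w ~ N(0, q)   (hidden state, moves slowly)
        y[t] = c * x[t]   + v,   v ~ N(0, r)   (noisy observation)

    Returns the filtered estimate of the hidden state after each observation.
    """
    x, p = x0, p0          # current state estimate and its variance
    estimates = []
    for y in observations:
        # Predict: push the estimate through the dynamics.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update: correct the prediction using the new observation.
        gain = p_pred * c / (c * p_pred * c + r)
        x = x_pred + gain * (y - c * x_pred)
        p = (1.0 - gain * c) * p_pred
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulate a slowly drifting hidden state observed through heavy noise.
    state, truth = 0.0, []
    for _ in range(200):
        state += rng.gauss(0.0, 0.1)
        truth.append(state)
    noisy = [s + rng.gauss(0.0, 1.0) for s in truth]
    filtered = kalman_filter_1d(noisy, q=0.01, r=1.0)
    print(len(filtered), "filtered state estimates")
```

In this sketch, a small q relative to r encodes the assumption that the hidden
state moves slowly and smoothly, so the filter relies more on its prediction
than on any single noisy observation.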

********  THURSDAY ***********  THURSDAY ***********  THURSDAY ***

                   A Multidimensional Model for
                Perceptual Evaluation of Audio Codecs
		     Thursday May 28th at CCRMA
		Kristin Precoda <precoda@leland.Stanford.EDU>

In subjective perceptual evaluations of audio codec quality, lack of
agreement between listeners or groups of listeners is a common problem
arising even in the most carefully conducted tests, under tightly
controlled conditions. Examples can be found in listening tests on the
MPEG-2 Non-Backwards Compatible algorithm (1996), tests of MPEG-2
algorithms (1994), and tests performed for the FCC Advisory Committee on
Advanced Television Service (1993). In particular, in MPEG testing,
listener groups at different test sites have repeatedly been found to
generate sufficiently statistically different ratings that combining their
results would obscure the effects of the codecs and musical excerpts being
judged. Because listening conditions at the test sites were fairly
strictly controlled, rating differences were likely caused by differences
among the listeners in their sensitivity and amount of attention paid to
various kinds of artifacts. Therefore, to examine listener differences, we
designed an evaluation procedure which would yield information both about
codec quality and about the strategies used by each listener in making
judgements. The analysis used a multivariate statistical technique to
build listener-specific models which generate (a generalization of)
perceptual quality ratings monotonically related to the ratings given by a
listener. Reliable differences between listeners are thus captured as
differences in the listener models. A model of the perceptions of an
"average listener" or an "expert's expert" can then be created and used to
evaluate codec outputs. The models can also be interpreted to give
information about the acoustic attributes upon which listeners base their
judgements, and hence can guide further development of an algorithm. In
addition, application of this evaluation procedure in an evaluation
experiment produced results concerning within-listener stability,
cross-listener agreement or divergence, and the influence of the musical
excerpt under test; these data will also be discussed.