Braida

  • Braida L.D. (1991) Crossmodal integration in the identification of consonant segments. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 43(3), 647-677. Although speechreading can be facilitated by auditory or tactile supplements, the process that integrates cues across modalities is not well understood. This paper describes two “optimal processing” models for the types of integration that can be used in speechreading consonant segments and compares their predictions with those of the Fuzzy Logical Model of Perception (FLMP; Massaro, 1987). In “pre-labelling” integration, continuous sensory data are combined across modalities before response labels are assigned. In “post-labelling” integration, the responses that would be made under unimodal conditions are combined, and a joint response is derived from the pair. To describe pre-labelling integration, confusion matrices are characterized by a multidimensional decision model that allows performance to be described by a subject’s sensitivity and bias in using continuous-valued cues. The cue space is characterized by the locations of stimulus and response centres. The distance between a pair of stimulus centres determines how well two stimuli can be distinguished in a given experiment. In the multimodal case, the cue space is assumed to be the product space of the cue spaces corresponding to the stimulation modes. Measurements of multimodal accuracy in five modern studies of consonant identification are more consistent with the predictions of the pre-labelling integration model than with those of the FLMP or the post-labelling model.
  • Durlach N.I., Tan H.Z., Macmillan N.A., Rabinowitz W.M., & Braida L.D. (1989) Resolution in one dimension with random variations in background dimensions. Perception & Psychophysics, 46(3), 293-296.
  • Grant K.W. & Braida L.D. (1991) Evaluating the articulation index for auditory-visual input [published erratum appears in J. Acoust. Soc. Am. 90(4 Pt 1), 2202, 1991]. Journal of the Acoustical Society of America, 89(6), 2952-2960. An investigation of the auditory-visual (AV) articulation index (AI) correction procedure outlined in the ANSI standard [ANSI S3.5-1969 (R1986)] was made by evaluating auditory (A), visual (V), and auditory-visual sentence identification for both wideband speech degraded by additive noise and a variety of bandpass-filtered speech conditions presented in quiet and in noise. When the data for each of the different listening conditions were averaged across talkers and subjects, the procedure outlined in the standard was fairly well supported, although deviations from the predicted AV score were noted for individual subjects as well as individual talkers. For filtered speech signals with AI_A less than 0.25, there was a tendency for the standard to underpredict AV scores. Conversely, for signals with AI_A greater than 0.25, the standard consistently overpredicted AV scores. Additionally, synergistic effects, where the AI_A obtained from the combination of different bandpass-filtered conditions was greater than the sum of the individual AI_A values, were observed for all nonadjacent filter-band combinations (e.g., the addition of a low-pass band with a 630-Hz cutoff and a high-pass band with a 3150-Hz cutoff). These latter deviations from the standard violate the basic assumption of additivity stated by Articulation Theory, but are consistent with earlier reports by Pollack [I. Pollack, J. Acoust. Soc. Am. 20, 259-266 (1948)], Licklider [J. C. R. Licklider, Psychology: A Study of a Science, Vol. 1, edited by S. Koch (McGraw-Hill, New York, 1959), pp. 41-144], and Kryter [K. D. Kryter, J. Acoust. Soc. Am. 32, 547-556 (1960)].
  • Grant K.W., Braida L.D., & Renn R.J. (1991) Single band amplitude envelope cues as an aid to speechreading. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 43(3), 621-645. Amplitude envelopes derived from speech have been shown to facilitate speechreading to varying degrees, depending on how the envelope signals were extracted and presented and on the amount of training given to the subjects. In this study, three parameters related to envelope extraction and presentation were examined using both easy and difficult sentence materials: (1) the bandwidth and centre frequency of the filtered speech signal used to obtain the envelope; (2) the bandwidth of the envelope signal determined by the lowpass filter cutoff frequency used to “smooth” the envelope fluctuations; and (3) the carrier signal used to convey the envelope cues. Results for normal-hearing subjects following a brief visual and auditory-visual familiarization/training period showed that (1) the envelope derived from wideband speech does not provide the greatest benefit to speechreading when compared to envelopes derived from selected octave bands of speech; (2) as the bandwidth centred around the carrier frequency increased from 12.5 to 1600 Hz, auditory-visual (AV) performance obtained with difficult sentence materials improved, especially for envelopes derived from high-frequency speech energy; (3) envelope bandwidths below 25 Hz resulted in AV scores that were sometimes equal to or worse than speechreading alone; (4) for each filtering condition tested, there was at least one bandwidth and carrier condition that produced AV scores that were significantly greater than speechreading alone; (5) low-frequency carriers were better than high-frequency or wideband carriers for envelopes derived from an octave band of speech centred at 500 Hz; and (6) low-frequency carriers were worse than high-frequency or wideband carriers for envelopes derived from an octave band centred at 3150 Hz. These results suggest that amplitude envelope cues can provide a substantial benefit to speechreading for both easy and difficult sentence materials, but that frequency transposition of these signals to regions remote from their “natural” spectral locations may result in reduced performance.
  • Picheny M.A., Durlach N.I., & Braida L.D. (1989) Speaking clearly for the hard of hearing. III: An attempt to determine the contribution of speaking rate to differences in intelligibility between clear and conversational speech. Journal of Speech and Hearing Research, 32(3), 600-603. Previous studies (Picheny, Durlach, & Braida, 1985, 1986) have demonstrated that substantial intelligibility differences exist for hearing-impaired listeners for speech spoken clearly compared to speech spoken conversationally. This paper presents the results of a probe experiment intended to determine the contribution of speaking rate to the intelligibility differences. Clear sentences were processed to have the durational properties of conversational speech, and conversational sentences were processed to have the durational properties of clear speech. Intelligibility testing with hearing-impaired listeners revealed both sets of materials to be degraded after processing. However, the degradation could not be attributed to processing artifacts, because reprocessing the materials to restore their original durations produced intelligibility scores close to those observed for the unprocessed materials. We conclude that the simple processing to alter the relative durations of the speech materials was not adequate to assess the contribution of speaking rate to the intelligibility differences; further studies are proposed to address this question.
  • Reed C.M., Durlach N.I., Braida L.D., & Schultz M.C. (1989) Analytic study of the Tadoma method: effects of hand position on segmental speech perception. Journal of Speech and Hearing Research, 32(4), 921-929. In the Tadoma method of communication, deaf-blind individuals receive speech by placing a hand on the face and neck of the talker and monitoring actions associated with speech production. Previous research has documented the speech perception, speech production, and linguistic abilities of highly experienced users of the Tadoma method. The current study was performed to gain further insight into the cues involved in the perception of speech segments through Tadoma. Small-set segmental identification experiments were conducted in which the subjects’ access to various types of articulatory information was systematically varied by imposing limitations on the contact of the hand with the face. Results obtained on three deaf-blind, highly experienced users of Tadoma were examined in terms of percent-correct scores, information transfer, and reception of speech features for each of sixteen experimental conditions. The results were generally consistent with expectations based on the speech cues assumed to be available in the various hand positions.
  • Reed C.M., Power M.H., Durlach N.I., Braida L.D., Foss K.K., Reid J.A., & Dubois S.R. (1991) Development and testing of artificial low-frequency speech codes. Journal of Rehabilitation Research and Development, 28(3), 67-82. In a new approach to the frequency-lowering of speech, artificial codes were developed for 24 consonants (C) and 15 vowels (V) for two values of lowpass cutoff frequency F (300 and 500 Hz). Each individual phoneme was coded by a unique, nonvarying acoustic signal confined to frequencies less than or equal to F. Stimuli were created through variations in spectral content, amplitude, and duration of tonal complexes or bandpass noise. For example, plosive and fricative sounds were constructed by specifying the duration and relative amplitude of bandpass noise with various center frequencies and bandwidths, while vowels were generated through variations in the spectral shape and duration of a ten-tone harmonic complex. The ability of normal-hearing listeners to identify coded Cs and Vs in fixed-context syllables was compared to their performance on single-token sets of natural speech utterances lowpass filtered to equivalent values of F. For a set of 24 consonants in C-/a/ context, asymptotic performance on coded sounds averaged 90 percent correct for F = 500 Hz and 65 percent for F = 300 Hz, compared to 75 percent and 40 percent for lowpass filtered speech. For a set of 15 vowels in /b/-V-/t/ context, asymptotic performance on coded sounds averaged 85 percent correct for F = 500 Hz and 65 percent for F = 300 Hz, compared to 85 percent and 50 percent for lowpass filtered speech. Identification of coded signals for F = 500 Hz was also examined in CV syllables where C was selected at random from the set of 24 Cs and V was selected at random from the set of 15 Vs. Asymptotic performance of roughly 67 percent correct and 71 percent correct was obtained for C and V identification, respectively. These scores are somewhat lower than those obtained in the fixed-context experiments. Finally, results were obtained concerning the effect of token variability on the identification of lowpass filtered speech. These results indicate a systematic decrease in percent-correct score as the number of tokens representing each phoneme in the identification tests increased from one to nine.
  • Reed C.M., Rabinowitz W.M., Durlach N.I., Delhorne L.A., Braida L.D., Pemberton J.C., Mulcahey B.D., & Washington D.L. (1992) Analytic study of the Tadoma method: improving performance through the use of supplementary tactual displays. Journal of Speech and Hearing Research, 35(2), 450-465. Although results obtained with the Tadoma method of speechreading have set a new standard for tactual speech communication, they are nevertheless inferior to those obtained in the normal auditory domain. Speech reception through Tadoma is comparable to that of normal-hearing subjects listening to speech under adverse conditions corresponding to a speech-to-noise ratio of roughly 0 dB. The goal of the current study was to demonstrate improvements to speech reception through Tadoma through the use of supplementary tactual information, thus leading to a new standard of performance in the tactual domain. Three supplementary tactual displays were investigated: (a) an articulatory-based display of tongue contact with the hard palate; (b) a multichannel display of the short-term speech spectrum; and (c) tactual reception of Cued Speech. The ability of laboratory-trained subjects to discriminate pairs of speech segments that are highly confused through Tadoma was studied for each of these supplementary displays. Generally, discrimination tests were conducted for Tadoma alone, the supplementary display alone, and Tadoma combined with the supplementary tactual display. The results indicated that the tongue-palate contact display was an effective supplement to Tadoma for improving discrimination of consonants, but that neither the tongue-palate contact display nor the short-term spectral display was highly effective in improving vowel discriminability. For both vowel and consonant stimulus pairs, discriminability was nearly perfect for the tactual reception of the manual cues associated with Cued Speech. Further experiments on the identification of speech segments were conducted for Tadoma combined with Cued Speech. The observed data for both discrimination and identification experiments are compared with the predictions of models of integration of information from separate sources.
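The product-space assumption in the pre-labelling model of Braida (1991) has a simple quantitative consequence: if the two unimodal cue spaces are combined as a product space, the distance between two stimulus centres bimodally is the Euclidean combination of the unimodal distances. A minimal sketch of that prediction, expressed in terms of the signal-detection sensitivity d' (the function name and values here are illustrative, not from the paper):

```python
import math

def bimodal_d_prime(d_a: float, d_v: float) -> float:
    """Predicted auditory-visual sensitivity under pre-labelling
    integration: in the product cue space, the distance between a pair
    of stimulus centres is the Euclidean combination of the unimodal
    distances (d' values)."""
    return math.sqrt(d_a ** 2 + d_v ** 2)

# Hypothetical unimodal sensitivities for one consonant contrast:
d_av = bimodal_d_prime(1.5, 2.0)  # → 2.5
```

Post-labelling integration, by contrast, combines unimodal response labels rather than continuous cues, so it cannot exceed this pre-labelling bound.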
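The additivity assumption that Grant & Braida (1991) test is also easy to state in code: Articulation Theory predicts that the AI of a combination of non-overlapping bands is the sum of the per-band AIs. A minimal sketch, with hypothetical per-band values (the study reports the real measurements):

```python
def predicted_combined_ai(band_ais):
    """Articulation Theory's additivity assumption: the AI of a
    combination of non-overlapping frequency bands is the sum of the
    per-band AIs."""
    return sum(band_ais)

# Hypothetical auditory AIs for a low-pass band (630-Hz cutoff) and a
# high-pass band (3150-Hz cutoff):
ai_low, ai_high = 0.20, 0.15
ai_additive = predicted_combined_ai([ai_low, ai_high])
# A measured combined AI_A exceeding this sum is the kind of synergistic
# deviation the study reports for nonadjacent band combinations.
```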
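The envelope-cue scheme of Grant, Braida & Renn (1991) — extract the amplitude envelope of a band of speech, smooth it with a lowpass filter, and use it to modulate a tonal carrier — can be sketched crudely as follows. This is an assumption-laden illustration (rectification plus a moving-average smoother standing in for the paper's lowpass filter), not the authors' signal-processing chain:

```python
import numpy as np

fs = 16000  # sample rate in Hz (assumed)

def envelope_cue(band_speech, smooth_hz, carrier_hz):
    """Sketch of a single-band envelope cue: rectify the (already
    band-filtered) speech, smooth with a moving average whose length
    approximates a lowpass cutoff of smooth_hz, then amplitude-modulate
    a tonal carrier at carrier_hz with the result."""
    rectified = np.abs(band_speech)
    win = max(1, int(fs / smooth_hz))  # crude lowpass smoothing window
    env = np.convolve(rectified, np.ones(win) / win, mode="same")
    t = np.arange(len(band_speech)) / fs
    return env * np.sin(2 * np.pi * carrier_hz * t)
```

The bandwidth/carrier trade-offs in the abstract correspond to varying `smooth_hz` (envelope bandwidth) and `carrier_hz` (transposition target) here.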
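Several of the Tadoma studies above score confusion matrices by "information transfer". For readers unfamiliar with the metric, a minimal sketch of the standard Miller–Nicely-style computation (mutual information between stimulus and response, normalized by stimulus entropy; the example matrix is made up):

```python
import math

def information_transfer(confusions):
    """Relative information transfer from a confusion matrix of counts:
    mutual information between stimulus (rows) and response (columns),
    divided by the stimulus entropy."""
    n = sum(sum(row) for row in confusions)
    row_tot = [sum(row) for row in confusions]
    col_tot = [sum(col) for col in zip(*confusions)]
    mi = 0.0
    for i, row in enumerate(confusions):
        for j, c in enumerate(row):
            if c:
                mi += (c / n) * math.log2(c * n / (row_tot[i] * col_tot[j]))
    h_stim = -sum((r / n) * math.log2(r / n) for r in row_tot if r)
    return mi / h_stim

# Perfect identification transfers all of the stimulus information:
information_transfer([[10, 0], [0, 10]])  # → 1.0
```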
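The coded vowels of Reed et al. (1991) were built from ten-tone harmonic complexes confined below the cutoff F. A minimal sketch of such a stimulus, with made-up parameter values (the actual spectral shapes and durations that encode each phoneme are defined in the paper):

```python
import numpy as np

def harmonic_complex(f0, amps, dur, fs=8000):
    """Sketch of a coded-vowel stimulus: a harmonic complex whose
    per-harmonic amplitudes (spectral shape) and duration carry the
    phoneme identity. f0 and amps here are hypothetical."""
    t = np.arange(int(dur * fs)) / fs
    tones = [a * np.sin(2 * np.pi * f0 * (k + 1) * t)
             for k, a in enumerate(amps)]
    return np.sum(tones, axis=0)

# Ten harmonics of a 50-Hz fundamental top out at 500 Hz, i.e. within
# the F = 500 Hz limit used in the study.
sig = harmonic_complex(50, [1.0] * 10, dur=0.2)
```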
