Michael M. Cohen

Michael M. Cohen is a research associate in the Program in Experimental Psychology at the University of California - Santa Cruz. His research interests include speech perception and production, speechreading, information integration, learning, and computer facial animation. He received a BS in Computer Science and Psychology (1975) and an MS in Psychology (1979) from UW-Madison, and a PhD in Experimental Psychology (1984) from UC-Santa Cruz.

Consulting referee for Behavioral Research Methods and Instrumentation, Cognitive Science, Free Speech Journal, Journal of the Acoustical Society of America, Journal of Phonetics, Journal of Experimental Psychology, Perception & Psychophysics, SIGGRAPH, Journal of Visualization and Computer Animation, IEEE Transactions on Visualization and Computer Graphics, and NSF. NSF review panelist.



Email: mmcohen at ranx.ucsc.edu
Perceptual Science Laboratory
429G Social Sciences II, UC-Santa Cruz, Santa Cruz CA 95064, USA.
(831) 459-2655, (831) 459-3519 FAX

Demos:

Internationalization of Synthetic Visual Speech

News:

PSL on ABC Primetime

Michael Cohen and Dominic Massaro, from the University of California-Santa Cruz, and Ron Cole, from the University of Colorado, received the first InSTIL Prize for "Outstanding Lifetime Contribution to the Integration of Speech Technology in Language Learning".


Selected Publications:

Massaro, D. W. and Cohen, M. M. (1983) Integration of visual and auditory information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 9, 753-771.

Massaro, D. W. and Cohen, M. M. (1983) Phonological constraints in speech perception. Perception and Psychophysics, 34, 338-348.

Massaro, D. W. and Cohen, M. M. (1983) Consonant/vowel ratio: An improbable cue in speech. Perception and Psychophysics, 33, 501-505.

Massaro, D. W. and Cohen, M. M. (1983) Categorical or continuous speech perception: A new test. Speech Communication, 2, 15-35.

Tseng, C.-Y., Massaro, D. W. and Cohen, M. M. (1985) Lexical tone perception: Evaluation and integration of acoustic features in Mandarin Chinese. Journal of Chinese Linguistics, 13, 267-289.

Massaro, D. W. and Cohen, M. M. (1987) Process and connectionist models of pattern recognition. Proceedings of the ninth annual conference of the Cognitive Science Society (pp. 258-264). Hillsdale, N. J.: Erlbaum.

Cohen, M. M. & Massaro, D. W. (1990) Synthesis of visible speech. Behavioral Research Methods and Instrumentation, 22, 260-263.

Massaro, D. W. and Cohen, M. M. (1990) Perception of synthesized audible and visible speech. Psychological Science, 1, 55-63.

Massaro, D. W., & Cohen, M. M. (1991) Integration versus interactive activation: The joint influence of stimulus and context in perception. Cognitive Psychology, 23.

Gesi, A. T., Massaro, D. W. and Cohen, M. M. (1992) Discovery and expository methods in teaching visual consonant and word identification. Journal of Speech and Hearing Research, 35, 1180-1188.

Cohen, M. M. and Massaro, D. W. (1992) On the similarity of categorization models. In F. G. Ashby (Ed.), Probabilistic Multidimensional Models of Perception and Cognition. Hillsdale, NJ: Lawrence Erlbaum Associates.

Cohen, M. M., & Massaro, D. W. (1993) Modeling coarticulation in synthetic visual speech. In N. M. Thalmann & D. Thalmann (Eds.) Models and Techniques in Computer Animation. Tokyo: Springer-Verlag, 139-156.

Massaro, D. W., & Cohen, M. M. (1993). The paradigm and the fuzzy logical model of perception are alive and well. Journal of Experimental Psychology: General, 122, 115-124.

Massaro, D. W., Tsuzaki, M., Cohen, M. M., Gesi, A., & Heredia, R. (1993). Bimodal speech perception: An examination across languages. Journal of Phonetics, 445-478.

Benoît, C., Massaro, D. W., & Cohen, M. M. (1994) Modality Integration: Facial Movement & Speech Synthesis. In Survey of the State of the Art in Human Language Technology (R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen, V. Zue, Eds), NSF/European Commission/CSLU.

Cohen, M. M., & Massaro, D. W. (1994) Development and Experimentation with Synthetic Visible Speech. Behavioral Research Methods and Instrumentation, 26, 260-265.

Angola, O., Le Goff, B., Guiard-Marigny, T., Adjoudani, A., Benoît, C., & Cohen, M.M. (1994), Analyse-Synthèse de visages parlants, Actes des 20èmes Journées d'Étude sur la Parole, Tregastel, France, Juin 1994, France, 147-150.

Cohen, M. M. (1994) Validation. In Final Report to NSF of the Standards for Facial Animation Workshop (C. Pelachaud, N. I. Badler, M-L. Viaud, Eds). October, 1994.

Le Goff, B., Guiard-Marigny, T., Cohen, M.M., & Benoît, C. (1994) Real-time analysis-synthesis and intelligibility of talking faces. Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis, New Paltz, New York, U.S.A., Sept. 1994.

Cohen, M. M., & Massaro, D. W. (1994) What can visual speech synthesis tell visual speech recognition? 28th Annual Asilomar Conference on Signals, Systems, and Computers, (Oct 31-Nov 2, 1994, Pacific Grove, CA).

Massaro, D. W. & Cohen, M. M. (1994) Auditory/visual speech in multimodal human interfaces. Paper for special session on "Science and technology for multimodal human interfaces" at International Conference of Spoken Language Processing (September 18-22, 1994, Yokohama, Japan).

Massaro, D. W., & Cohen, M. M. (1994). Visual, orthographic, phonological, and lexical influences in reading. Journal of Experimental Psychology: Human Perception and Performance, 20, 1107-1128.

Benoît, C., Beskow, J., Cohen, M.M., Granstrom, B., Le Goff, B. & Massaro, D.W. (1995) Text-to-audio-visual speech synthesis over the world. Advanced topics for Speech Mapping Speech Maps Workshop, Grenoble.

Cohen, M. M., & Massaro, D. W. (1995) Discover Magazine Awards Finalists: Computer Software

Cohen, M. M., & Massaro, D. W. (1995) Auditory/visual speech synthesis for lifelike computer characters. Lifelike Computer Characters '95. Snowbird, Utah, Sept 26-29, 1995.

Cohen, M. M., & Massaro, D. W. (1995) Perceiving visual and auditory information in consonant-vowel and vowel syllables. In C. Sorin, J. Mariani, H. Meloni, & J. Schoentgen (Eds.) Tribute to Max Wajskop: Levels in speech communication: Relations and interactions. Amsterdam: Elsevier.

Cohen, M. M., Walker, R. L., & Massaro, D. W. (1995) Perception of Synthetic Visual Speech, Speechreading by Man and Machine: Models, Systems and Applications, NATO Advanced Study Institute 940584, (Aug 28-Sep 8, 1995, Chateau de Bonas, France).

Friedman, D., Massaro, D. W., Kitzis, S.N. & Cohen, M. M. (1995). A comparison of learning models. Journal of Mathematical Psychology, 164-178.

Massaro, D. W., & Cohen, M. M. (1995). Perceiving talking faces. Current Directions in Psychological Science, 4, 104-109.

Massaro, D. W., & Cohen, M. M. (1995). Continuous versus discrete information processing in pattern recognition. Acta Psychologica, 90, 193-209.

Massaro, D. W., & Cohen, M. M. (1995). Modeling the perception of bimodal speech. ICPhS '95, Stockholm.

Massaro, D.W., Cohen, M.M., & Smeele, P.M.T. (1995) Cross-linguistic comparisons in the integration of visual and auditory speech. Memory & Cognition, 23, 113-131.

Massaro, D. W., & Cohen, M. M. (1996). Perceiving speech from inverted faces. Perception & Psychophysics.

Massaro, D. W., Cohen, M. M., & Smeele, P.M.T. (1996). Perception of asynchronous and conflicting visible and auditory speech. Journal of the Acoustical Society of America, 100, 1777-1786.

Campbell, R., Whittingham, A., Frith, U., Massaro, D.W. and Cohen, M.M. (1997) Audiovisual speech perception in dyslexics: Impaired unimodal perception but no audiovisual integration deficit. AVSP '97, Rhodes, Greece, 26-27 Sept 1997, 85-88.

Campbell, R., Zihl, J., Massaro, D.W., Munhall, K., and Cohen, M.M., (1997) Speechreading in the akinetopsic patient (LM), Brain, 120, 1793-1803.

Cohen, M.M. (1997) Modeling and Evaluation of Synthetic Visual Speech. Panel on Facial Animation, Past, Present, and Future. SIGGRAPH '97, Los Angeles, August 3-8.

Cohen, M.M. & Massaro, D.W. (1997) Speech for Virtual Humans. Virtual Humans 2 Conference, Hollywood, California, June 17-19.

Smeele, P.M.T., Massaro, D.W., Cohen, M.M., and Sittig, A.C. (1998). Laterality in visual speech perception. Journal of Experimental Psychology: Human Perception and Performance, 24, 1232-1242.

Massaro, D.W. and Cohen, M.M. (1998) Visible speech and its potential for speech training for hearing impaired perceivers. STiLL - ESCA Workshop on Speech Technology in Language Learning, Stockholm, Sweden, May 25-27, 1998.

Cole, R.A., Carmell, T., Conners, P., Macon, M., Wouters, J., de Villiers, J., Tarachow, A., Massaro, D.W., Cohen, M.M., Beskow, J., Yang, J., Meier, U., Waibel, A., Stone, P., Fortier, G., Davis, A., Soland, C. (1998) Intelligent Animated Agents for Interactive Language Training. STiLL - ESCA Workshop on Speech Technology in Language Learning, Stockholm, Sweden, May 25-27, 1998.

Cohen, M.M., Beskow, J., and Massaro, D.W. (1998). Recent developments in facial animation: An inside view. AVSP '98, Sydney, Australia, Dec 4-6, 1998

Massaro, D.W., Cohen, M.M., Beskow, J., Daniel, S. and Cole, R.A. (1998). Developing and Evaluating Conversational Agents, In Proceedings of Workshop on Embodied Conversation Characters (WECC), Lake Tahoe, 1998.

Sutton, S., Cole, R.A., de Villiers, J., Schalkwyk, J., Vermeulen, P.M., Yan, Y., Kaiser, E., Rundle, B., Shobaki, K., Hosom, P., Kain, A., Wouters, J., Massaro, D.W. and Cohen, M.M. (1998) Universal Speech Tools: the CSLU Toolkit. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp. 3221-3224, Sydney, Australia, Nov, 1998.

Massaro, D.W., Beskow, J., Cohen, M. M., Fry, C. L., & Rodriguez, T. (1999). Picture my voice: Audio to visual speech synthesis using artificial neural networks. In D. W. Massaro (Ed.) Proceedings of AVSP '99, International Conference on Auditory-Visual Speech Processing, 133-138, Santa Cruz, CA, August, 1999.

Massaro, D.W. and Cohen, M.M. (1999). Speech Perception in Perceivers with hearing loss: Synergy of multiple modalities. Journal of Speech and Hearing Research, 42, 21-41.

Massaro, D.W., Cohen, M.M., Daniel, S. and Cole, R.A. (1999). Developing and evaluating conversational agents. In P.A. Hancock (Ed.) Human Performance and Ergonomics. Handbook of Perception and Cognition, 17, San Diego: Academic Press.

Cole, R.A., Massaro, D.W., de Villiers, J., Rundle, B., Shobaki, K., Wouters, J., Cohen, M.M., Beskow, J.E., Stone, P., Connors, P., Tarachow, A., and Solcher, D. (1999) New tools for interactive speech and language training: Using animated conversational agents in the classrooms of profoundly deaf children. In Proceedings of ESCA/SOCRATES Workshop on Method and Tool Innovations for Speech Science Education, London, UK, Apr 1999.

Massaro, D.W., Cohen, M.M., Beskow, J.E. (1999). From Theory to Practice: Rewards and Challenges. In Proceedings of the International Conference of Phonetic Sciences, San Francisco, CA, Aug 1999.

Campbell, C.S., Cohen, M.M., Rodriguez, T. and Massaro, D.W. (1999) Model selection and Occam's razor: A two-edged sword. 32nd Annual Meeting of the Society for Mathematical Psychology, Santa Cruz, CA: July 1999.

Massaro, D.W. and Cohen, M. M. (2000). Tests of auditory-visual integration efficiency within the framework of the fuzzy logical model of perception. Journal of the Acoustical Society of America, 108, 784-789.

Massaro, D.W., Cohen, M. M., Beskow, J., & Cole, R. A. (2000). Developing and evaluating conversational agents. In J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.) Embodied conversational agents. Cambridge, MA: MIT Press.

Massaro, D.W. and Cohen, M.M. (2000). Fuzzy logical model of bimodal emotion perception: Comment on "The perception of emotions by ear and by eye" by de Gelder and Vroomen. Cognition and Emotion, 14 (3), 313-320.

Cohen, M.M., Clark, R., & Massaro, D.W. (2001). Animated Speech: Research progress and applications. AVSP 2001, Scheelsminde, Denmark, Sep 7-9, 2001.

Massaro, D.W., Cohen, M. M., Campbell, C.S., and Rodriguez, T. (2001). Bayesian method of model selection validates FLMP. Psychonomic Bulletin & Review, 8(1), 1-17.

Cosi, P., Cohen, M.M. and Massaro, D.W. (2002) Baldini: Baldi speaks Italian. ICSLP 2002, 7th International Conference on Spoken Language Processing. September 16-20, Denver, Colorado.

Cohen, M.M., Massaro, D.W., and Clark, R. (2002) Training a talking head. ICMI'02, IEEE Fourth International Conference on Multimodal Interfaces. October 14-16, Pittsburgh, Pennsylvania.

Ouni, S., Massaro, D.W., Cohen, M.M., Young, K. & Jesse, A. (2003) Internationalization of a Talking Head. 15th International Congress of Phonetic Sciences, August 3-9, Barcelona.

Ouni, S., Cohen, M. M., & Massaro, D. W. (2005). Training Baldi to be multilingual: A case study for an Arabic Badr. Speech Communication, 45(2), 115-137.

Massaro, D.W., Ouni, S., Cohen, M.M., & Clark, R. (2005). A Multilingual Embodied Conversational Agent. In R.H. Sprague (Ed.), Proceedings of 38th Annual Hawaii International Conference on System Sciences (HICSS'05) (CD-ROM, 10 pages). Los Alamitos, CA: IEEE Computer Society Press.

Ouni, S., Cohen, M.M., Ishak, H., & Massaro, D.W. (2007). Visual Contribution to Speech Perception: Measuring the Intelligibility of Animated Talking Heads. EURASIP Journal on Audio, Speech, and Music Processing, Volume 2007, Article ID 47891.

Massaro, D.W., Cohen, M.M., Tabain, M., Beskow, J., & Clark, R. (in press). Animated speech: Research progress and applications. In E. Vatikiotis-Bateson, G. Bailly, & P. Perrier (Eds.), Audiovisual Speech Processing. Cambridge, MA: MIT Press.

Massaro, D.W. & Cohen, M.M. (2007). US Patent 7,225,129: Visual display methods for use in computer-animated speech production models.