Mark A. Hasegawa-Johnson

Address

  • 2011 Beckman Institute
  • 405 North Mathews Avenue
  • Urbana, Illinois 61801

Biography

Mark Hasegawa-Johnson received his Ph.D. from MIT in 1996. He is a professor in the Department of Electrical and Computer Engineering at the University of Illinois and a full-time faculty member in the Artificial Intelligence group at the Beckman Institute. His research interests center on speech production and recognition by humans and computers, including landmark-based speech recognition, integration of prosody into speech recognition and understanding, audiovisual speech recognition, computational auditory scene analysis, and biomedical imaging of the muscular and neurological correlates of speech production and perception.

Honors

Member, Articulograph International Steering Committee; CLSP Workshop Leader, "Landmark-Based Speech Recognition" (2004); Invited Paper, NAACL Workshop on Linguistic and Higher-Level Knowledge Sources in Speech Recognition and Understanding (2004); List of Faculty Rated as Excellent by Their Students (2003); NSF CAREER Award (2002); NIH National Research Service Award (1998).

Research

Human speech perception brings together abilities, evolved biologically or culturally over very long time periods, that simultaneously extract semantic, phonemic, and paralinguistic information from a robust but complicated time-frequency code. Machine learning techniques excel at finding optimal parameter settings for a pre-specified speech recognition model structure, but they are far less effective at choosing the right model structure in the first place. Dr. Hasegawa-Johnson's research applies higher-level knowledge from linguistics and psychology to specify the structure of machine learning models for automatic speech recognition.

For example, machine learning models can learn the class-conditional distributions of acoustic parameter vectors, but in speech recognition it is not always clear how the "class" should be defined. The landmark-based speech recognition theory of Ken Stevens, built on several decades of linguistics research, suggests that phoneme boundaries form more acoustically invariant classes than phoneme segments do. Based on Stevens' theory, one of Dr. Hasegawa-Johnson's current research programs seeks to develop large-vocabulary speech recognition algorithms that use phoneme boundaries, rather than phoneme segments, as the fundamental phonological class.

Likewise, centuries of linguistic research clearly demonstrate that prosody (the melody and rhythm of natural language) influences the acoustic implementation of speech, but using prosody in automatic speech recognition has been difficult because of the vast number of variables proposed to carry salient information. In collaboration with University of Illinois linguists Jennifer Cole and Chilin Shih, Dr. Hasegawa-Johnson has selected two binary prosodic distinctions that linguists consider to have the most dramatic acoustic and syntactic impact, and has shown that explicitly encoding these distinctions in an automatic speech recognizer reduces word error rate.
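The structural choice described above can be illustrated with a minimal sketch. The Gaussian scoring machinery below is generic; what changes between the segment-based and landmark-based approaches is only which set of classes is defined and trained. All feature values and class names here are hypothetical, chosen for illustration only, and do not come from any real corpus or from Dr. Hasegawa-Johnson's systems.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of a scalar acoustic feature x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(x, class_models):
    """Pick the class whose Gaussian assigns x the highest log-likelihood."""
    return max(class_models, key=lambda c: gaussian_loglik(x, *class_models[c]))

# Segment-based class inventory: one (mean, variance) model per phone.
segment_models = {"aa": (700.0, 90.0 ** 2), "iy": (300.0, 60.0 ** 2)}

# Landmark-based class inventory: one model per phone *boundary* type,
# following the idea that boundaries form more invariant classes.
landmark_models = {
    "aa-iy": (500.0, 80.0 ** 2),
    "iy-aa": (480.0, 80.0 ** 2),
    "none": (0.0, 200.0 ** 2),
}

print(classify(680.0, segment_models))   # scores against segment classes
print(classify(510.0, landmark_models))  # scores against boundary classes
```

The point of the sketch is that the learning algorithm is identical in both cases; the linguistically informed decision is which dictionary of classes to build.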

Publications

  • 2016
    • Chen, W.; Hasegawa-Johnson, M.; Chen, N. F., Mismatched Crowdsourcing Based Language Perception for Under-Resourced Languages. Procedia Computer Science 2016, 81, 23-29, DOI: 10.1016/j.procs.2016.04.025.
    • Kong, X.; Jyothi, P.; Hasegawa-Johnson, M., Performance Improvement of Probabilistic Transcriptions with Language-Specific Constraints. Procedia Computer Science 2016, 81, 30-36, DOI:10.1016/j.procs.2016.04.026.
    • Livescu, K.; Rudzicz, F.; Fosler-Lussier, E.; Hasegawa-Johnson, M.; Bilmes, J., Speech Production in Speech Technologies: Introduction to the CSL Special Issue. Computer Speech and Language 2016, 36, 165-172.
  • 2015
    • Hasegawa-Johnson, M.; Cole, J.; Jyothi, P.; Varshney, L. R., Models of Dataset Size, Question Design, and Cross-Language Speech Perception for Speech Crowdsourcing Applications. Laboratory Phonology 2015, 6, (3-4), 381-432.
    • Huang, P. S.; Kim, M.; Hasegawa-Johnson, M.; Smaragdis, P., Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation. IEEE-ACM Transactions on Audio Speech and Language Processing 2015, 23, (12), 2136-2147.
  • 2014
    • Kim, K.; Lin, K. H.; Walther, D. B.; Hasegawa-Johnson, M. A.; Huang, T. S., Automatic Detection of Auditory Salience with Optimized Linear Filters Derived from Human Annotation. Pattern Recognition Letters 2014, 38, 78-85, DOI: 10.1016/j.patrec.2013.11.010.

    • Khasanova, A.; Cole, J.; Hasegawa-Johnson, M., Detecting Articulatory Compensation in Acoustic Data through Linear Regression Modeling, Proceedings of Interspeech 2014, Singapore.

    • Jyothi, P.; Cole, J.; Hasegawa-Johnson, M.; Puri, V., An Investigation of Prosody in Hindi Narrative Speech, Proceedings of Speech Prosody 2014, Volume 7. Dublin, Ireland.

    • Huang, P.-S.; Kim, M.; Hasegawa-Johnson, M.; Smaragdis, P., Singing-Voice Separation from Monaural Recordings Using Deep Recurrent Neural Networks, Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2014, Taipei, Taiwan.

    • Huang, P. S.; Kim, M.; Hasegawa-Johnson, M.; Smaragdis, P., Deep Learning for Monaural Speech Separation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2014, Florence, Italy.

    • Chen, A.; Hasegawa-Johnson, M. A., Mixed Stereo Audio Classification Using a Stereo-Input Mixed-to-Panned Level Feature. IEEE-ACM Transactions on Audio Speech and Language Processing 2014, 22, (12), 2025-2033, DOI: 10.1109/TASLP.2014.2359628.

  • 2013
    • Bharadwaj, S.; Hasegawa-Johnson, M.; Ajmera, J.; Deshmukh, O.; Verma, A., Sparse Hidden Markov Models for Purer Clusters, In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, New York, 2013, 3098-3102.

    • Huang, P. S.; Deng, L.; Hasegawa-Johnson, M.; He, X. D., Random Features for Kernel Deep Convex Network, In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, New York, 2013, 3143-3147.

    • King, S.; Hasegawa-Johnson, M., Accurate Speech Segmentation by Mimicking Human Auditory Processing, In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, New York, 2013, 8096-8100.

    • Lin, K. H.; Zhuang, X. D.; Goudeseune, C.; King, S.; Hasegawa-Johnson, M.; Huang, T. S., Saliency-Maximized Audio Visualization and Efficient Audio-Visual Browsing for Faster-Than-Real-Time Human Acoustic Event Detection. ACM Transactions on Applied Perception 2013, 10, (4), DOI: 10.1145/2536764.2536773.

    • Mertens, R.; Huang, P.-S.; Gottlieb, L.; Friedland, G.; Divakaran, A.; Hasegawa-Johnson, M., On the Application of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks. International Journal of Multimedia Data Engineering and Management 2013, 3, (3), 1-19.

    • Sharma, H. V.; Hasegawa-Johnson, M., Acoustic Model Adaptation Using in-Domain Background Models for Dysarthric Speech Recognition. Computer Speech and Language 2013, 27, (6), 1147-1162, DOI: 10.1016/j.csl.2012.10.002.

  • 2012
    • Ozbek, I. Y.; Hasegawa-Johnson, M.; Demirekler, M., On Improving Dynamic State Space Approaches to Articulatory Inversion with Map-Based Parameter Estimation. IEEE Transactions on Audio Speech and Language Processing 2012, 20, (1), 67-81.

    • Tang, H.; Chu, S. M.; Hasegawa-Johnson, M.; Huang, T. S., Partially Supervised Speaker Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 2012, 34, (5), 959-971.

    • Mahrt, T.; Cole, J.; Fleck, M.; Hasegawa-Johnson, M., F0 and the Perception of Prominence, Proceedings of Interspeech 2012, Portland, Oregon, 2012.

    • Mahrt, T.; Cole, J.; Fleck, M.; Hasegawa-Johnson, M., Modeling Speaker Variation in Cues to Prominence Using the Bayesian Information Criterion, Proceedings of Speech Prosody 2012, Shanghai, 2012.

    • Mathur, S.; Poole, M. S.; Pena-Mora, F.; Hasegawa-Johnson, M.; Contractor, N., Detecting Interaction Links in a Collaborating Group Using Manually Annotated Data. Social Networks 2012, 34, (4), 515-526.

    • Nam, H.; Mitra, V.; Tiede, M.; Hasegawa-Johnson, M.; Espy-Wilson, C.; Saltzman, E.; Goldstein, L., A Procedure for Estimating Gestural Scores from Speech Acoustics. Journal of the Acoustical Society of America 2012, 132, (6), 3980-3989.

    • Rong, P. Y.; Loucks, T.; Kim, H.; Hasegawa-Johnson, M., Relationship between Kinematics, F2 Slope and Speech Intelligibility in Dysarthria Due to Cerebral Palsy. Clinical Linguistics & Phonetics 2012, 26, (9), 806-822.

  • 2011
    • Zhuang, X. D.; Zhou, X.; Hasegawa-Johnson, M. A.; Huang, T. S., Efficient Object Localization with Variation-Normalized Gaussianized Vectors, In Intelligent Video Event Analysis and Understanding; Zhang, J., Shao, L., Zhang, L., Jones, G. A., Eds. 2011; Vol. 332, 93-109.

    • Ozbek, I. Y.; Hasegawa-Johnson, M.; Demirekler, M., Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) with Audio-Visual Information Fusion and Dynamic Kalman Smoothing. IEEE Transactions on Audio Speech and Language Processing 2011, 19, (5), 1180-1195.

    • Lobdell, B. E.; Allen, J. B.; Hasegawa-Johnson, M. A., Intelligibility predictors and neural representation of speech. Speech Communication 2011, 53, (2), 185-194.

    • Kim, H.; Hasegawa-Johnson, M.; Perlman, A., Vowel Contrast and Speech Intelligibility in Dysarthria. Folia Phoniatrica Et Logopaedica 2011, 63, (4), 187-194.

  • 2010
    • Kim, H.; Martin, K.; Hasegawa-Johnson, M.; Perlman, A., Frequency of Consonant Articulation Errors in Dysarthric Speech. Clinical Linguistics & Phonetics 2010, 24, (10), 759-770.

    • Tang, H.; Hasegawa-Johnson, M.; Huang, T., A Novel Vector Representation of Stochastic Signals Based on Adapted Ergodic HMMs. IEEE Signal Processing Letters 2010, 17, (8), 715-718.

    • Tang, H.; Hasegawa-Johnson, M.; Huang, T. S., Non-frontal View Facial Expression Recognition Based on Ergodic Hidden Markov Model Supervectors, IEEE International Conference on Multimedia & Expo, Singapore, 2010.

    • Zhuang, X. D.; Zhou, X.; Hasegawa-Johnson, M. A.; Huang, T. S., Real-World Acoustic Event Detection. Pattern Recognition Letters 2010, 31, (12), 1543-1551.

    • Zu, Y. H.; Hasegawa-Johnson, M.; Perlman, A.; Yang, Z., A Mathematical Model of Swallowing. Dysphagia 2010, 25, (4), 397-398.

  • 2009
    • Yoon, P.; Huensch, A.; Juul, E.; Perkins, S.; Sproat, R.; Hasegawa-Johnson, M., Construction of a rated speech corpus of L2 learners' speech. CALICO Journal 2009, 26, (3), 662-673.

    • Huang, T. S.; Hasegawa-Johnson, M. A.; Chu, S. M.; Zeng, Z.; Tang, H., Sensitive Talking Heads. IEEE Signal Processing Magazine 2009, 26, (4), 67-72.

  • 2008
    • Tang, H.; Fu, Y.; Tu, J. L.; Hasegawa-Johnson, M.; Huang, T. S., Humanoid Audio-Visual Avatar With Emotive Text-to-Speech Synthesis. IEEE Transactions on Multimedia 2008, 10, (6), 969-981.

    • Yoon, T.; Cole, J.; Hasegawa-Johnson, M., Detecting Non-Modal Phonation in Telephone Speech, In Proceedings of Speech Prosody 2008, Campinas, Brazil, 2008.

    • Chang, S. E.; Erickson, K. I.; Ambrose, N. G.; Hasegawa-Johnson, M. A.; Ludlow, C. L., Brain anatomy differences in childhood stuttering. Neuroimage 2008, 39, (3), 1333-1344.

    • Kim, L. H.; Hasegawa-Johnson, M.; Lim, J. S.; Sung, K. M., Acoustic model for robustness analysis of optimal multipoint room equalization. Journal of the Acoustical Society of America 2008, 123, (4), 2043-2053.

  • 2007
    • Yoon, T.; Cole, J.; Hasegawa-Johnson, M., On the Edge: Acoustic Cues to Layered Prosodic Domains, In Proceedings of the International Congress of Phonetic Sciences, Saarbrücken, Germany, 2007.

    • Cole, J.; Kim, H.; Choi, H.; Hasegawa-Johnson, M., Prosodic effects on acoustic cues to stop voicing and place of articulation: Evidence from Radio News speech. Journal of Phonetics 2007, 35, (2), 180-209.

    • Chen, K.; Hasegawa-Johnson, M.; Cole, J., A Factored Language Model for Prosody-Dependent Speech Recognition. In Speech Synthesis and Recognition, Kordic, V., Ed. Advanced Robotic Systems: 2007.

  • 2006
    • Zhang, T.; Hasegawa-Johnson, M.; Levinson, S. E., Cognitive state classification in a spoken tutorial dialogue system. Speech Communication 2006, 48, (6), 616-632.

    • Zhang, T.; Hasegawa-Johnson, M.; Levinson, S. E., Extraction of pragmatic and semantic salience from spontaneous spoken English. Speech Communication 2006, 48, (3-4), 437-462.
