Speech processing is one of the most important enabling technologies that underlie the development of fluent human-machine interfaces. Carnegie Mellon University (CMU) has been a long-term leader in the development of unlimited-vocabulary speech recognition and in spoken language systems, and the CMU speech group has developed a broad range of speech processing technologies, including robust speech recognition, text-to-speech synthesis, and speaker identification and verification, as well as many applications based on speech processing. While the capabilities of automatic speech recognition systems have improved dramatically over the past decade, commercially successful speech-based applications that exploit these technologies are still appearing relatively slowly. Some of the factors that have limited the deployment of speech-based applications include lack of acoustical robustness, difficulty in handling out-of-vocabulary words and concepts, and difficulty in automating the process of semantic interpretation for new domains. This talk will provide a review and discussion of the current state of the art of speech recognition technology, including examples of recent work by the CMU speech group in a variety of application areas. We describe and demonstrate these applications, and we discuss some of the significant new problems that have been encountered in transferring core speech technology to practical applications.
Biodata: Richard M. Stern received the S.B. degree from the Massachusetts Institute of Technology in 1970, the M.S. from the University of California, Berkeley, in 1972, and the Ph.D. from MIT in 1977, all in electrical engineering. He has been on the faculty of Carnegie Mellon University since 1977, where he is currently a Professor in the Electrical and Computer Engineering, Computer Science, and Biomedical Engineering Departments, the Language Technologies Institute, and the School of Music. Much of Dr. Stern's current research is in spoken language systems, where he is particularly concerned with developing techniques that make automatic speech recognition more robust to changes in environment and acoustical ambience. He has also developed sentence parsing and speaker adaptation algorithms for earlier CMU speech systems. In addition to his work in speech recognition, Dr. Stern has worked extensively in psychoacoustics, where he is best known for theoretical work in binaural perception. Dr. Stern is a Fellow of the Acoustical Society of America, the 2008-2009 Distinguished Lecturer of the International Speech Communication Association, and a recipient of the Allen Newell Award for Research Excellence in 1992. He served as General Chair of Interspeech 2006 and is also a member of the IEEE and the Audio Engineering Society.