Computer speech technology has seen great advances in the last decade. Amazon has a wireless reading device called Kindle that can read books out loud, while almost every business has an automated speech program for incoming calls, and many new cars come with a GPS device that talks to the driver.
Even with the advances, however, one obstacle has proven difficult for researchers in this area to overcome: recreating the human elements of speech. Researchers have found that the “umms” and the “ahhs”, the pauses, or the voice inflections and tones in our speech often convey as much meaning as the words themselves. Linguists use the word prosody to describe speech elements like intonation and the vocal rhythms produced when we speak; pauses and uses of umms and ahhs are called disfluencies.
These are the aspects of speech that are missing from the software programs, and it is elements such as prosody and disfluency that are the focus of Beckman Institute faculty member Jennifer Cole’s research. Cole is a Professor in the Department of Linguistics and a member of Beckman’s Cognitive Science Group who studies how the brain acquires and processes the spoken word, and how people communicate using elements of speech like prosody.
“In written language that can be communicated through punctuation, or the use of fonts or capital letters; in speech we don’t have those mechanisms but we have many other things that we can do to modulate our speech,” Cole said. “I’m interested in what speakers do to provide the punctuation and emphasis in speech and, to what extent listeners pay attention to that kind of information that speakers are communicating; what kind of meaning do listeners extract from the prosody of speech.”
One of her current main research projects – in collaboration with Beckman colleague Mark Hasegawa-Johnson – looks at prosody as a way of signaling information about the form and meaning of an utterance. Cole said finding the answers to those questions will not only advance linguistic science, it will help in the development of speech programs for everyday applications like GPS devices, and also for technologies benefitting those with speech disorders.
“There are many people who are limited in their use of language,” Cole said. “Maybe they’ve suffered a stroke, or they have a learning disability, or they have a physical disability that impedes smooth articulation so they can’t use spoken language as fully as most people.”
Cole said that in order to develop any kind of speech technology that incorporates prosody, researchers first have to understand speech prosody as it occurs in everyday language use, in the absence of disability, and gaining that understanding is at the heart of her collaboration with Hasegawa-Johnson. Since the research is aimed at developing technologies with more natural speech patterns, they explore ordinary conversational speech to see how speakers communicate meaning through prosody, in a way that hasn’t been done before.
“Our goal is to characterize prosody as it is used in everyday speech communication, and to build statistical and computational models of how prosody is processed in speech communication,” Cole said. “These models can then serve as the basis for computer systems that can speak and be understood the way humans can and also computer systems that can understand speech, and also the meaning that goes beyond the words.”
Cole said that while advances have been made, much of computer-generated speech still has a robotic sound, and that is one reason why understanding prosody is important.
“It’s gotten a lot better than it used to be but it’s still a far cry from human quality speech,” she said. “A lot of what’s missing from computer-synthesized speech is prosody. Also, the phonetic detail, the richness of the speech isn’t there. Even though sometimes this information may not be so critical for the listener, it is part of what makes the speech sound human and it’s missing. What we’re doing is basic science that will support these technologies and then engineers and computer scientists, including my collaborators, are actually building the technology with this knowledge.”
Such interdisciplinary collaborations, especially with an engineer like Hasegawa-Johnson, were rare when Cole earned her Ph.D. in linguistics from MIT in 1987. Technology as an integral part of the discipline is also a fairly recent development.
Cole’s career in linguistics has often mirrored the path of this young academic discipline. She studied at MIT under two founding fathers of the field, Noam Chomsky and Morris Halle, and has been an active participant as linguistics has undergone a revolution in both perspective and technology in the last few years.
“Linguistics is a fairly young academic discipline in the United States – the University of Chicago opened the first Linguistics Department in the 1930’s, and the field underwent a radical change shortly after, highlighted by Noam Chomsky’s influential theory,” Cole said. “My teachers were among the first linguists to develop the new paradigm of linguistic science, and the University of Illinois’ Department of Linguistics, founded in 1965, was home to some of the key figures in the development of linguistics as a cognitive science. And while in the early decades modern linguistics was focused inward, defining the fundamental research questions, in just the last 20 years or so it has really opened up to other disciplines.”
Cole said linguistics is increasingly an interdisciplinary science that includes collaborations with neuroscientists, computer engineers and scientists, psychologists, and speech scientists, among others.
“Linguistics is a highly interdisciplinary field because of the complexity of language,” she said. “It’s not enough to know about the languages of the world. A linguist also needs to draw on knowledge from a lot of different fields to understand how language is processed in the brain and through the mechanisms of the vocal tract and auditory system, and how it affects and is affected by human interaction at the scale of the individual, community, and nation.”
Cole added that linguistics used to be very insular and said linguists haven’t always been good at talking to people outside of their discipline about their discoveries involving language.
“When I was a graduate student in the 1980’s, it was the rare breed of linguist who was looking beyond the boundaries of the discipline,” Cole said. “Now it’s quite commonplace; the walls have come down and there is more cross-fertilization of ideas.”
That is why she works with engineers like Hasegawa-Johnson and psychologists like Beckman colleagues Gary Dell and Cynthia Fisher. One of Cole’s important current research projects is a collaboration with Dell and Fisher that studies how people learn the sound patterns of language from their experiences as speakers and listeners. Cole said she approaches this question as a phonologist, a linguist who studies the sound patterns of languages, and that her role and that of her students in the project is to test theories about speech behavior derived from phonological theory.
“For a long time linguists have thought that you acquired the basic grammatical knowledge of your language when you were a child and then by the time you hit puberty that knowledge is pretty much complete,” Cole said. “You may add new words but the structures are finished and then you just use them for the rest of your life.
“In recent years there has been increasing evidence that in fact we continue to update our grammatical knowledge. So our competence as speakers of a language, the knowledge that we have that allows us to communicate with language, is continuously changing, and reflecting our experience as we use the language – as we speak and as we listen.”
Advances in technology have played a big role in research methods for linguists like Cole, making it much easier to process statistical analyses of large databases. She and Hasegawa-Johnson are working to automate even more linguistic inquiries through software development. But even with the advances in computing, trying to study and classify complex datasets like those found with recorded conversational speech is a daunting challenge.
“Speech ‘in the wild’ turns out to be a hard thing to measure,” Cole said. “For instance, we want to know if speakers change the way they pronounce words to make their speech clearer for the listener, to help listeners latch on to those words that are really contributing important information.”
Cole said there is mixed evidence for this in ordinary, conversational speech.
“To a small measure, speakers adapt their pronunciation in ways that should help cue the structure and meaning of the sentence for the listener – important words get lengthened and are hyper-articulated, but the effect is small and variable across speakers,” she said. “We’ve also done tests with listeners, to find out how listeners perceive the structure and information content of an utterance, and we find variability there too. To some degree these judgments are subjective, and listeners vary in their sensitivity to the cues that speakers are sending.
“Once you realize the complexity of the message and the signaling system, it’s pretty amazing that successful communication takes place at all!”
Cole said that variability in pronunciation is vast, but at the same time, “listeners are exquisitely sensitive to this variability, and fine-tune or ‘normalize’ their perception of the speech signal to accommodate the voice inflections of an individual speaker, or the communication context.”
Cole said the concept of variation “is the big umbrella over all of my work.” As an example, Cole offered, if she were to follow someone around with a microphone and a tape recorder and record everything they said for months, no two pronunciations of any word would be exactly the same, even if the word had been said 500 times.
“There would be no two instances from one speaker’s voice that would be identical at a level of detail that we can easily measure,” she said. “Everybody who studies speech knows that. This is what makes computer speech recognition so insanely difficult.
“We’re interested, first of all, in this massive variation, trying to map it out, and then to understand how is it that humans can process this information and seemingly ignore the variation that is not meaningful and latch onto those patterns that are meaningful.”
Cole said the driving force behind her work is a love for languages.
“Often times you will find linguists are people who study a lot of languages, so they have a passion for languages,” she said. “Once you are on your fifth or sixth foreign language, you begin to notice things, ways in which languages are similar to one another, even if they are unrelated. Through linguistic study you can develop that understanding more precisely to see the underlying structures of language, the structures that the brain operates with, regardless of whether you are speaking Swahili or Thai or English.”
Cole’s academic interest in languages is complemented by a project she undertook in her off hours. For years she has been studying a language called Sindhi that has 40 million speakers, the majority of them in Pakistan. She is the only linguist outside of South Asia who studies Sindhi, making her a go-to expert for those seeking information on the language.
“I get e-mail from government people who need translation and all sorts of different people when they want to know about that language,” Cole said.
Cole has been working with a computer scientist in Pakistan to make digital tools that will enable computer processing of Sindhi. They have created an electronic dictionary and made it freely available online for a worldwide audience, an important feat since dictionaries of Sindhi aren’t readily found outside of Pakistan. Cole said she has also developed web-based materials for learning Sindhi. Many people, including those who were separated from their homeland and family members when Pakistan was created through partition from India in 1947, have accessed these sites.
“Now, two generations after partition, there are a lot of young people who feel a link to a place that they have never seen, a place that their parents or grandparents talk about with great passion but they don’t know the language, have never been to Sindh and can’t find any information about it,” Cole said. “I’ve had a lot of people say that gaining information about the language of their grandparents was like unlocking a huge mystery for them.
“This is a linguist’s version of public service. I get e-mail, actually fan mail, from people all over the world who have Sindhi heritage, and more serious inquiries from South Asia scholars. Beyond the satisfaction of doing basic linguistic work for the public good, I recommend this kind of work as an excellent and pleasurable way to gain international public recognition.”