To address this, Beckman Institute researchers have developed a computational solution that lets analysts “see” audio files in a visual form.
Mark Hasegawa-Johnson and Thomas Huang of the Human-Computer Intelligent Interaction research theme led a collaboration that developed new computational methods for creating graphical visualizations of large audio files. The visualizations allow users to scan an audio recording at 200 times real-time speed, enabling them to discover unexpected, or anomalous, events.
The researchers borrowed the term “Easter eggs,” used for items that game developers sometimes hide inside video games, to refer to these unexpected events. Hasegawa-Johnson said the software is designed to free up the analyst by having the computer perform certain tasks and render the data visually, such as with a spectrogram. The technology can, for example, analyze thousands of sound sources in an urban environment.
“The idea is to let the computer do what computers are good at and have the humans do what humans are good at,” Hasegawa-Johnson said. “So humans are good at inference, big picture, and anomaly detection. Computers are really good at processing hundreds of hours of data all at once and then compressing it into some format, into some image.”
To turn sound into an image, the researchers developed an efficient algorithm that computes Fast Fourier Transforms (FFTs), a standard signal-processing operation, simultaneously at multiple “window” sizes. Each window size gives a different time-frequency snapshot of the input signal: short windows pinpoint events in time, while long windows resolve fine differences in frequency. To test the method, they applied the technology to an audio book.
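As a rough illustration of the multiscale idea, the sketch below computes spectrograms of the same signal at several window sizes using off-the-shelf tools (NumPy and SciPy). It is not the researchers’ published algorithm; the window sizes and the test signal are arbitrary choices for demonstration.

```python
import numpy as np
from scipy.signal import spectrogram

def multiscale_spectrograms(signal, sample_rate, window_sizes=(256, 1024, 4096)):
    """Compute spectrograms of one signal at several window sizes.

    Short windows give fine time resolution; long windows give fine
    frequency resolution. Illustrative sketch only -- not the
    published milliphone algorithm.
    """
    results = {}
    for nperseg in window_sizes:
        freqs, times, power = spectrogram(
            signal, fs=sample_rate, nperseg=nperseg,
            noverlap=nperseg // 2, scaling="spectrum")
        # Store in decibels so quiet and loud events share one scale.
        results[nperseg] = (freqs, times, 10 * np.log10(power + 1e-12))
    return results

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 3.0, 1.0 / fs)
    # Test tone: a slow chirp plus one brief click (the "anomaly").
    x = np.sin(2 * np.pi * (200 + 300 * t) * t)
    x[int(1.5 * fs)] += 5.0
    for n, (f, frames, s) in multiscale_spectrograms(x, fs).items():
        print(f"window {n}: {s.shape[0]} frequency bins x {s.shape[1]} frames")
```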
“If you try to skim an audio book, if you try to speed it up by four times, you really can’t understand what it’s saying most times,” Hasegawa-Johnson said. “But if you take the entire thing and plot it as a spectrogram you can actually plot it as some kind of signal summary of the entire three hours and get some information from one screen of data. From that one screen of data you can figure out what in the three hours you want to zoom into.”
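One way to fit hours of audio onto a single screen, sketched below, is to pool the spectrogram over time so that each pixel column summarizes many frames; taking the maximum rather than the mean keeps brief, loud anomalies visible after heavy compression. This is an illustrative reduction under that assumption, not necessarily the summary the Beckman tool computes.

```python
import numpy as np

def compress_spectrogram(spec, screen_cols=1920):
    """Reduce a (freq_bins x frames) spectrogram to at most
    `screen_cols` columns by max-pooling over time.

    Max-pooling preserves short, loud events; mean-pooling would
    smear them away. Illustrative sketch only.
    """
    n_bins, n_frames = spec.shape
    if n_frames <= screen_cols:
        return spec
    hop = int(np.ceil(n_frames / screen_cols))
    # Pad so the frame count divides evenly, then pool each group.
    pad = hop * screen_cols - n_frames
    padded = np.pad(spec, ((0, 0), (0, pad)), constant_values=spec.min())
    return padded.reshape(n_bins, screen_cols, hop).max(axis=2)
```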
The audio visualization research is part of a project funded by the National Science Foundation and the Department of Homeland Security called FODAVA (Foundations of Data and Visual Analytics). Hasegawa-Johnson and Huang worked with researcher Camille Goudeseune at Beckman’s Illinois Simulator Laboratory to develop the technology. The Illinois researchers have dubbed it “milliphone” because it turns a thousand sources of audio into a single visualization.
The work has been reported in Pattern Recognition Letters.