Beyond Speech: Towards an Interdisciplinary Study of Sound

Strategic Research Initiatives

Paris Smaragdis, Rob A. Rutenbar: Computer Science

Mark Hasegawa-Johnson: Electrical and Computer Engineering

Stephen Downie: Graduate School of Library and Information Science

Heinrich K. Taube: School of Music

Addressing the Problem

The University of Illinois has a long history of studying sound, having pioneered sound-on-film, computer music, bioacoustics, hearing research, and speech studies within The Grainger College of Engineering and across campus. Because sound is a multifaceted science that no single discipline yet fully understands, its academic study and exploitation are extremely broad, as are its emerging applications.

Industrial giants such as Bell, Apple, Sony, and Motorola can each trace their biggest successes to audio products. Similarly, the future holds significant promise in areas such as audio recognition for computers and robotics, universal language applications, signal processing, and new music. Recordings and audio databases also hold promise for applications such as sound and music retrieval, biomedical diagnosis, ocean monitoring, and the monitoring of geophysical activity and mechanical operations.

Research Goals

This research initiative provides a common home for sound-related research at Illinois, building on the university's existing expertise and interdisciplinary capabilities in the field.

Current Activities

Since the project's inception in spring 2013, the research team has attracted $500,000 in funding from the National Science Foundation (NSF) to work on big-data audio problems, including ways to consolidate audio streams from potentially thousands of recordings. The team is currently developing novel, efficient algorithms to address these problems.

The researchers are also developing a computational framework to support the deployment of audio analytics systems. The interface and machine learning core are complete, and the group is now adding further features. Development of this software forms the basis of a separate NSF proposal.

Additionally, the group has designed a joint course of study between Music and Computer Science, which it plans to use as a source of new interdisciplinary research problems.

Papers Published as a Result of This Work

Paris Smaragdis and Minje Kim (2013), "Non-Negative Matrix Factorization for Irregularly-Spaced Transforms," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, Oct. 20-23, 2013.
Minje Kim and Paris Smaragdis (2013), "Manifold Preserving Hierarchical Topic Models for Quantization and Approximation," in Proceedings of the International Conference on Machine Learning (ICML), Atlanta, GA, Jun. 16-21, 2013.
Minje Kim and Paris Smaragdis (2013), "Collaborative Audio Enhancement Using Probabilistic Latent Component Sharing," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, BC, Canada, May 26-31, 2013. Recipient of a Google ICASSP Student Travel Grant; Best Student Paper Award finalist.
Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, and Paris Smaragdis (2014), "Deep Learning for Monaural Speech Separation," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy.
Johannes Traa, Minje Kim, and Paris Smaragdis (2014), "Phase and Level Difference Fusion for Robust Multichannel Source Separation," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy.
Johannes Traa and Paris Smaragdis (2014), "A Wrapped Kalman Filter for Azimuthal Speaker Tracking," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy.
Paris Smaragdis, Cédric Févotte, Gautham Mysore, Nasser Mohammadiha, and Matthew Hoffman (2014), "A Unified View of Static and Dynamic Source Separation Using Non-Negative Factorizations," in IEEE Signal Processing Magazine, to appear.
Tuomas Virtanen, Jort Gemmeke, Bhiksha Raj, and Paris Smaragdis (2014), "Compositional Models for Audio Processing," in IEEE Signal Processing Magazine, to appear.
