Center for Computational Audio and Acoustics
Paris Smaragdis, Andrew Singer, Mark Hasegawa-Johnson: Computer Science
Ivan Dokmanic: Electrical and Computer Engineering
Addressing the problem
Acoustics research has had a significant impact on our lives in recent years. Devices and software such as smart speakers, headphones, microphones, teleconferencing software, home theaters, and cell phones rely on sophisticated algorithms that are tuned to remove noise and room echo, recognize where speakers are and who they are, detect telltale sounds (e.g. breaking glass), and more. Driving all this innovation are acoustics data sets that are used to train and fine-tune new algorithms. Unfortunately, creating realistic data for acoustics purposes is not cheap. It requires expensive recording studios and equipment, which are not easy to come by, especially in academia. Although synthetic data sets can be useful at early stages of research, they are often not realistic enough to be helpful later on. This creates a significant imbalance in the ability to perform acoustics research, one that mostly disadvantages academics.
Our solution to this problem is the design of a robotically controlled recording room that anyone can use remotely to script specific recording settings. For example, a student in a developing country might need to create training recordings of native speakers in a meeting room in order to develop a meeting diarization algorithm. That would be hard to do without access to such a room and the recording equipment. Instead, this student can upload recordings of speech to the proposed system and use a script to describe how the virtual speakers in this room would move, what the acoustic characteristics of the room should be, and how many microphones will be used and where they would be located. This information will then be used to robotically move loudspeakers and microphones in a room according to the specified script. Additionally, the acoustical properties of the room will be adjusted to reflect the desired characteristics. In this way, researchers can obtain real recordings from high-quality equipment without having to invest in a real recording studio. In a similar manner, researchers can recreate existing data sets while changing certain aspects to suit their research needs, or extend existing data sets by re-rendering them in new languages or room types.
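To make the idea of a recording script more concrete, the following is a minimal sketch of how such a description might be expressed and checked before it is sent to the room. Everything in it is an assumption made for illustration: the class names (Session, Source, Microphone, Waypoint), the field names, the use of meters and seconds, and the choice of a target reverberation time as the single room-acoustics parameter are all hypothetical and do not describe an existing interface.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple

# Hypothetical convention: (x, y, z) coordinates in meters from one room corner.
Point = Tuple[float, float, float]


@dataclass
class Waypoint:
    time_s: float      # time since the start of the recording, in seconds
    position: Point    # where the loudspeaker should be at that time


@dataclass
class Source:
    audio_file: str        # uploaded speech recording to play back
    path: List[Waypoint]   # trajectory of the robotically moved loudspeaker


@dataclass
class Microphone:
    position: Point


@dataclass
class Session:
    room_size: Point       # usable room dimensions (assumed rectangular)
    reverb_time_s: float   # desired reverberation time (assumed parameter)
    sources: List[Source] = field(default_factory=list)
    microphones: List[Microphone] = field(default_factory=list)

    def _inside(self, p: Point) -> bool:
        # A point is valid if every coordinate lies within the room bounds.
        return all(0.0 <= c <= limit for c, limit in zip(p, self.room_size))

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty if the script is usable."""
        problems = []
        if self.reverb_time_s <= 0:
            problems.append("reverberation time must be positive")
        for i, mic in enumerate(self.microphones):
            if not self._inside(mic.position):
                problems.append(f"microphone {i} is outside the room")
        for i, src in enumerate(self.sources):
            times = [w.time_s for w in src.path]
            if times != sorted(times):
                problems.append(f"source {i} has waypoints out of time order")
            for w in src.path:
                if not self._inside(w.position):
                    problems.append(f"source {i} leaves the room at t={w.time_s}s")
        return problems

    def to_json(self) -> str:
        # Serialized form that a remote user could upload alongside the audio.
        return json.dumps(asdict(self), indent=2)


# One talker walking across a 5 m x 4 m x 3 m room, recorded by one microphone.
session = Session(
    room_size=(5.0, 4.0, 3.0),
    reverb_time_s=0.4,
    sources=[
        Source(
            audio_file="speaker_a.wav",
            path=[
                Waypoint(0.0, (1.0, 1.0, 1.5)),
                Waypoint(10.0, (3.0, 2.0, 1.5)),
            ],
        )
    ],
    microphones=[Microphone((2.5, 2.0, 1.0))],
)

print(session.validate())  # prints [] because every position is inside the room
```

Validating a script before it reaches the hardware is one plausible design choice, since a rejected script costs nothing while a failed robotic move could waste recording time. A real system would likely need further checks (for example, collisions between moving elements and limits on movement speed) that this sketch does not attempt.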
Our goal is to use such a service to democratize the creation of acoustics data sets, to facilitate reproducibility, and to enable new benchmarks and challenges. At the same time, this tool can be used to verify and evaluate existing acoustics algorithms.
Research goals
Because a facility of this kind has not been built before, there are multiple research aspects that need to be addressed. Designing such an automated recording studio will require the development of new tools on both the hardware and the software side. There are open design questions about how one can simultaneously (and silently) move multiple microphones and loudspeakers inside an enclosed space, and what the potential limitations of such a system would be. Likewise, variable-acoustics rooms have been around for a long time, but they have always been manually reconfigured. Automating this process efficiently will require the design of novel acoustical elements that can be reconfigured without manual intervention.
Our goal at the moment is to design such a recording room at a small scale and to produce a proof of concept that solves the basic mechanical, movement, and software front-end problems that this endeavor poses. Once such a prototype is complete, we will move on to the construction of a larger and higher-quality implementation of a robotic recording room.