Resynthesizing volumetric soundscapes: low-rank subspace methods for soundfield estimation and reconstruction

Sound and space are fundamentally intertwined, at both a physical and perceptual level. Sound radiates from vibrating materials, filling space and creating a continuous field through which a listener moves. Despite a long history of research in spatial audio, the technology to capture these sounds in space is currently limited. Egocentric (binaural or ambisonic) recording can capture sound from all directions, but only from a single, limited perspective. Recording individual sources and ambience separately is labor-intensive and requires manual intervention and explicit localization.

In this work I propose and implement a new approach, in which a distributed collection of microphones captures sound and space together and resynthesizes them for a (now-virtual) listener as a rich volumetric soundscape. This approach offers great flexibility to design new auditory experiences, and provides a much more semantically meaningful description of the space. The research is situated at the Tidmarsh Wildlife Sanctuary, a 600-acre former cranberry farm that underwent the largest-ever freshwater restoration in the Northeast. It has been instrumented with a large-scale (300 m by 300 m) distributed array of 10 to 18 microphones, which has been operating (almost) continuously for several years.

This dissertation details methods for characterizing acoustic propagation in a challenging high-noise environment, and introduces a new method for correcting clock skew between unsynchronized transmitters and receivers. It also describes a localization method capable of locating sound-producing wildlife within the monitored area, with experiments validating its accuracy to within 5 m.
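
To make the clock-skew problem concrete, the following is a minimal sketch of a textbook linear clock model, not the new method introduced in the dissertation. It assumes the receiver timestamps relate to the transmitter timestamps as t_rx = (1 + skew) * t_tx + offset, and fits both parameters by ordinary least squares. The function names fit_clock_skew and to_transmitter_time are hypothetical and chosen only for illustration.

```python
def fit_clock_skew(t_tx, t_rx):
    """Least-squares fit of t_rx ~ (1 + skew) * t_tx + offset.

    t_tx, t_rx: equal-length sequences of timestamps (seconds) for the
    same events, as seen by the transmitter and the receiver clocks.
    The linear model is an assumption made for this illustration.
    """
    n = len(t_tx)
    mean_tx = sum(t_tx) / n
    mean_rx = sum(t_rx) / n
    # Closed-form simple linear regression.
    sxx = sum((a - mean_tx) ** 2 for a in t_tx)
    sxy = sum((a - mean_tx) * (b - mean_rx) for a, b in zip(t_tx, t_rx))
    slope = sxy / sxx
    offset = mean_rx - slope * mean_tx
    return slope - 1.0, offset


def to_transmitter_time(t_rx, skew, offset):
    """Map receiver timestamps back onto the transmitter's timeline."""
    return [(t - offset) / (1.0 + skew) for t in t_rx]


# Usage: a receiver clock running 50 ppm fast with a 0.25 s offset.
t_tx = [10.0 * k for k in range(10)]
t_rx = [(1.0 + 50e-6) * t + 0.25 for t in t_tx]
skew, offset = fit_clock_skew(t_tx, t_rx)
corrected = to_transmitter_time(t_rx, skew, offset)
```

In practice the paired timestamps would themselves have to be estimated from noisy acoustic measurements, which is the harder part of the problem and is not addressed by this sketch.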

The scale of the array provides an opportunity to investigate classical array processing techniques in a new context, with nonstationary signals and long interchannel delays. We propose and validate a method for location-informed signal enhancement using a rank-1 spatial covariance matrix approximation, achieving signal-to-distortion ratio (SDR) improvements of 11 dB with no source signal modeling.
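
As an illustration of what a rank-1 spatial covariance approximation looks like, the sketch below forms a sample spatial covariance matrix for one frequency bin and keeps only its principal eigencomponent. This is the generic best rank-1 approximation from linear algebra, shown under assumed inputs. The dissertation's method is location-informed and may construct the rank-1 matrix differently. The function name rank1_spatial_covariance and the synthetic steering vector are hypothetical.

```python
import numpy as np


def rank1_spatial_covariance(X):
    """Best rank-1 approximation of the sample spatial covariance.

    X: complex array of shape (channels, frames), e.g. STFT coefficients
    of every microphone at a single frequency bin (assumed layout).
    Returns (R, R1, v, lam): the sample covariance, its rank-1
    approximation, the principal eigenvector, and its eigenvalue.
    """
    X = np.asarray(X, dtype=complex)
    # Sample spatial covariance; Hermitian by construction.
    R = X @ X.conj().T / X.shape[1]
    # eigh returns eigenvalues in ascending order for Hermitian input.
    eigvals, eigvecs = np.linalg.eigh(R)
    lam = eigvals[-1]
    v = eigvecs[:, -1]
    R1 = lam * np.outer(v, v.conj())
    return R, R1, v, lam


# Usage: one noise-free point source observed by 4 microphones.
rng = np.random.default_rng(0)
a = np.exp(1j * rng.uniform(0, 2 * np.pi, size=4))    # synthetic steering vector
s = rng.standard_normal(256) + 1j * rng.standard_normal(256)
R, R1, v, lam = rank1_spatial_covariance(np.outer(a, s))
alignment = abs(np.vdot(v, a)) / np.linalg.norm(a)    # 1.0 means v is parallel to a
```

For a single noise-free source the covariance is already rank 1, so the approximation reproduces it and the principal eigenvector is parallel to the steering vector. With noise or multiple sources, the rank-1 term keeps only the dominant spatial component.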

These components are brought together in an end-to-end demonstration system that resynthesizes a virtual soundscape from multichannel signals recorded in situ, allowing users to explore the space virtually. Positive feedback is reported in a user survey.

Citation

© 2020 Massachusetts Institute of Technology. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Authors

Spencer Russell

Institutions

MIT Media Lab