The Boombox: Visual Reconstruction from Acoustic Vibrations

We introduce The Boombox, a container that uses acoustic vibrations to reconstruct an image of its contents. When objects interact with the container, they produce small acoustic vibrations. The exact vibration characteristics depend on the physical properties of the box and the object. We demonstrate how to use this incidental signal to predict visual structure. After learning, our approach remains effective even when a camera cannot view inside the box. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multi-modal data enables us to transform cheap acoustic sensors into rich visual sensors. Due to the ubiquity of containers, we believe integrating perception capabilities into them will enable new applications in human-computer interaction and robotics.
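As a rough illustration of why vibrations picked up by several contact microphones can carry spatial information, the sketch below estimates the arrival-time difference of one impact at two microphones using plain cross-correlation. This is not the method described above, which learns the mapping from multi-modal data; the function names, the synthetic decaying-sinusoid impact, and all parameter values are hypothetical and chosen only for the example.

```python
import math
import random


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive result means the vibration reached `other` later than `ref`.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def synthetic_impact(n, onset, freq=0.05, decay=0.02):
    """Hypothetical impact: a decaying sinusoid that starts at sample `onset`."""
    signal = []
    for t in range(n):
        if t < onset:
            signal.append(0.0)
        else:
            dt = t - onset
            signal.append(math.exp(-decay * dt) * math.sin(2 * math.pi * freq * dt))
    return signal


# Two simulated microphones hear the same impact 12 samples apart (assumed value),
# each with a small amount of independent sensor noise.
random.seed(0)
mic_a = [s + random.gauss(0, 0.01) for s in synthetic_impact(400, onset=100)]
mic_b = [s + random.gauss(0, 0.01) for s in synthetic_impact(400, onset=112)]

delay = estimate_delay(mic_a, mic_b, max_lag=50)
print("estimated delay in samples:", delay)
```

A hand-written cue like this only hints at where an impact happened. Predicting visual structure, as the approach above does, requires learning from paired recordings and images.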




Latest version: arXiv:2105.08052 [cs.CV] or here

Code and Dataset

We release the code at Boombox. Follow the instructions on our GitHub page to use the code and the dataset. If you only want the dataset, you can download it here (~2 GB).


 Columbia University


