Plug-and-play activity recognition in an ecosystem of microphones

PI: Mayank Goel, Professor, Human-Computer Interaction Institute

Co-PI: Chris Harrison, Assistant Professor, Human-Computer Interaction Institute

We have received funding from the Carnegie Bosch Institute for Plug-and-Play Activity Recognition in an Ecosystem of Microphones. Future smart homes, offices, stores and many other environments will increasingly be monitored by sensors that support rich, context-sensitive applications. Today, the most prevalent way to achieve such high-fidelity sensing is to buy “smart” devices or to retrofit existing objects of interest with sensor tags, but this carries significant social, aesthetic, maintenance and financial costs. We propose to investigate whether a similar level of sensing richness can be unlocked simply by listening to environments with commodity microphones. We plan to develop a real-time system that uses data from the various microphones in a user’s environment to interpret actions and activities. To do so, we plan to leverage recently released sound datasets (such as Google AudioSet) to create a deep learning classifier capable of real-time recognition on low-cost computing platforms. Importantly, users and developers need not think of these “virtualized” sensor feeds (e.g., “door opened” or “blender running”) any differently than their electromechanical counterparts: the APIs can be identical. Following development, we plan to validate our acoustic approach and characterize its strengths and weaknesses through several real-world deployment studies.
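To make the idea of identical APIs concrete, the following is a minimal illustrative sketch, not the project's actual design. All class and method names (SensorEvent, SensorFeed, ContactSwitchFeed, AcousticFeed), the 0.8 score threshold, and the example scores are hypothetical assumptions introduced here. It shows how an application could subscribe to a “door opened” feed without knowing whether the event came from a physical switch or from an audio classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SensorEvent:
    name: str          # e.g. "door opened"
    timestamp: float   # seconds
    confidence: float  # 1.0 for a hardware switch, classifier score for audio


Listener = Callable[[SensorEvent], None]


class SensorFeed:
    """Shared interface (hypothetical): applications only use subscribe()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _emit(self, timestamp: float, confidence: float) -> None:
        event = SensorEvent(self.name, timestamp, confidence)
        for listener in self._listeners:
            listener(event)


class ContactSwitchFeed(SensorFeed):
    """Hypothetical electromechanical sensor, e.g. a door contact switch."""

    def on_switch_closed(self, timestamp: float) -> None:
        self._emit(timestamp, 1.0)


class AcousticFeed(SensorFeed):
    """Hypothetical virtual sensor driven by per-label classifier scores."""

    def __init__(self, name: str, threshold: float = 0.8) -> None:
        super().__init__(name)
        self.threshold = threshold  # arbitrary value chosen for illustration

    def on_classifier_scores(self, timestamp: float, scores: Dict[str, float]) -> None:
        score = scores.get(self.name, 0.0)
        if score >= self.threshold:
            self._emit(timestamp, score)


# The application code is the same for every feed type.
log: List[SensorEvent] = []
door_hw = ContactSwitchFeed("door opened")
door_audio = AcousticFeed("door opened")
blender = AcousticFeed("blender running")
for feed in (door_hw, door_audio, blender):
    feed.subscribe(log.append)

# Made-up inputs: one switch closure, then one frame of classifier scores.
door_hw.on_switch_closed(timestamp=1.0)
frame_scores = {"door opened": 0.93, "blender running": 0.02}
door_audio.on_classifier_scores(2.0, frame_scores)
blender.on_classifier_scores(2.0, frame_scores)
```

In this sketch the only visible difference between the two kinds of feed is the confidence value attached to each event, which a consuming application is free to ignore.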

The theoretical basis of this work is the observation that most actions performed by users generate sound as a byproduct (e.g., a faucet running, a knock at the door, a microwave heating). These sounds are easily picked up by commodity microphones. Even more importantly, operational sounds are highly characteristic of the device being used: a blender sounds like a blender and nothing like a dishwasher or a microwave, each of which has its own distinctive sound.