Researchers at the University of Washington say they can now silence different parts of a noisy room, or isolate a single conversation in a cluttered soundscape, thanks to a swarm of small audio robots that can autonomously locate and track multiple moving sound sources.
We humans can locate sound sources with our eyes closed, thanks to the slightly separated two-microphone array and acoustic shielding our ears provide. But when the audio environment gets complex, things can get very confusing - which is unfortunate, given our peculiar fondness for seeking out noisy, crowded, high-energy spaces (like a Sunday morning coffee shop) and then trying to hold conversations in them.
In these cluttered audio spaces, the way to isolate individual sound sources and mute the rest is to deploy larger microphone arrays and process all the audio streams together, building a map of the space that triangulates each source from the tiny differences in the time its sound takes to travel through the air and reach each microphone. Deep-learning algorithms can then reprocess the streams, producing an independent audio track for each source with the noise from every other source removed.
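To make that concrete, here's a minimal Python sketch of the core measurement: estimating the time difference of arrival (TDOA) between two microphones by cross-correlation. This is the generic textbook technique rather than the UW team's actual pipeline, and the function name and toy click signal are purely illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def estimate_tdoa(sig_a, sig_b, sample_rate):
    """Estimate the time difference of arrival between two mic
    recordings via cross-correlation. A positive result means the
    sound reached microphone A first."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = np.argmax(corr) - (len(sig_a) - 1)  # peak offset in samples
    return lag / sample_rate                  # delay in seconds

# Toy demo: the same click arrives 25 samples later at mic B,
# i.e. the source sits ~0.18 m closer to mic A at 48 kHz.
fs = 48_000
click = np.zeros(1024)
click[100] = 1.0
tdoa = estimate_tdoa(click, np.roll(click, 25), fs)
print(f"TDOA: {tdoa * 1e3:.3f} ms, path difference: {tdoa * SPEED_OF_SOUND:.3f} m")
```

With enough microphone pairs, intersecting these delay measurements pins each source down to a point in the room.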
The idea itself is not new, but researchers at the University of Washington have now put a new spin on the concept, using a swarm of seven small wheeled microphone robots, each about the size of a chocolate truffle, that deploy autonomously from a charging station and create a self-optimizing array within the available space.
The robots use built-in microphones and speakers to navigate the table surface by sonar, avoiding obstacles and spreading out as widely as possible to maximize the time differences between microphones. Unfortunately, this does mean they have to move one at a time, but once in place they perform impressively, as you can see in the video below.
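Why spreading out matters: for a distant source, a two-microphone pair resolves direction through sin θ = cτ/d, where τ is the measured delay and d is the mic spacing, so the same timing uncertainty costs far more angular accuracy when the mics huddle together. A back-of-the-envelope Python sketch (my numbers, not the paper's):

```python
import numpy as np

C = 343.0    # speed of sound, m/s
FS = 48_000  # sample rate, Hz

def bearing(tdoa, spacing):
    """Far-field bearing from broadside for a two-mic pair."""
    return np.arcsin(np.clip(C * tdoa / spacing, -1.0, 1.0))

# How much bearing error does one sample of timing error cause?
tdoa_step = 1.0 / FS  # smallest resolvable delay, ~21 microseconds
for spacing in (0.02, 0.10, 0.50):  # mics 2 cm, 10 cm, 50 cm apart
    err = np.degrees(bearing(tdoa_step, spacing))
    print(f"{spacing * 100:>4.0f} cm spacing: ~{err:.2f} deg per sample of delay")
```

At 2 cm spacing, a single sample of timing error swings the bearing estimate by roughly 21 degrees; at 50 cm, it's under one degree.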
So what's the ultimate goal? The research team believes robotic arrays like this could serve as portable, self-deploying, sound-isolating microphone arrays for conference-room streaming and the like, theoretically positioning themselves better than a human could.
The team says it won't be much use for two-way video calls just yet: the system currently takes about 1.82 seconds to process each three-second block of sound, so cleaned-up audio always arrives several seconds behind the conversation. That latency also rules out streaming a conversation partner's isolated voice to your headphones in a noisy cafe for now - although both applications should become possible as computing power and speed improve.
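It's worth noting that 1.82 seconds per three-second block is actually faster than real time; the obstacle is latency, not throughput. A quick sanity check, assuming a simple chunked pipeline (my model, built on the figures above):

```python
# Figures from the article: 3 s audio chunks, ~1.82 s of processing each.
CHUNK_S = 3.0
PROC_S = 1.82

rtf = PROC_S / CHUNK_S    # real-time factor: < 1 means it keeps up
delay = CHUNK_S + PROC_S  # worst case: buffer a full chunk, then process it
print(f"real-time factor: {rtf:.2f}")     # ~0.61, so throughput is fine
print(f"end-to-end delay: {delay:.2f} s") # ~4.8 s, too slow for conversation
```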
Of course, a system like this could also make a very effective surveillance tool, cutting through the masking effect of crowd noise to record private conversations. Interestingly, the University of Washington team says it could do exactly the opposite.
"It has the potential to really benefit privacy beyond what current smart speakers allow," said doctoral student Malek Itani, co-first author of the study. "I could say 'Don't record anything around my desk,' and our system would record everything around me." ft. (0.9 meters). Anything in this bubble will not be recorded. Or, if there are two groups of people talking nearby, one group is having a private conversation while the other group is recording, the conversation of one group can be placed in a mute zone and remain private.
More practically, static distributed microphone arrays could start appearing in smart room and smart home designs, where they'd make it easy to tie voice commands to specific areas - letting the TV respond only to commands spoken from the couch, for example, or picking out drink orders from the person standing at the bar in a noisy venue.
The paper was published in the journal Nature Communications.