Sensor recognizes cry for help among the noise

“Hey Google, turn on the table lamp.” “TURN…THE…TABLE…LAMP…AAA!” A smart speaker that responds to speech is useful, but in practice it often fails to recognize your voice, because you are standing a little too far away, the dishwasher is humming, or your children are making noise.

Microphone technology has improved considerably in recent years, but recognizing and understanding voices against background noise remains difficult. Even the modern sound sensors that distinguish voices during a conference call are often confused by background rumbles or murmurs.

The problems arise when different sounds reach the microphone at about the same time, so that their signals get mixed up. Separating them again with digital signal processing is difficult.

A Chinese research group has now developed a sensor that improves on this, using a cleverly designed so-called metamaterial. In a paper recently published in Science Advances, they show that this sensor can recognize speech with 96 percent accuracy. The device can also distinguish the voices of two people sitting next to each other and talking. And it can track the position of someone walking around and calling for help in a simulated factory with noisy pumps and blaring machine alarms.

Wobbly posts

The sensor they use for this is the size of a melon. Its surface consists of eleven pentagonal faces of metamaterial and one bare face with which it can be mounted. The metamaterial on each face consists of an aluminum plate carrying thirty wobbly posts of stainless steel on silicone rubber, each 10 millimeters high. In the center of each pentagon is a small cavity with a device that converts the incoming sound into electrical signals. The metamaterial concentrates the incoming sound in the cavities. By checking on which face the signal is strongest, the direction of the sound can be determined.
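
That last step, picking the face with the strongest signal, can be sketched in a few lines of Python. This is an illustration under assumptions, not the researchers' method. It assumes that the twelve faces form a regular dodecahedron (the article only mentions eleven pentagons plus one bare face), and the face numbering, the index of the mounting face and the signal levels are all made up for the example.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def dodecahedron_face_normals():
    """Unit normals of the 12 faces of a regular dodecahedron.

    They coincide with the vertex directions of an icosahedron:
    (0, +-1, +-PHI) and its cyclic permutations.
    """
    raw = []
    for a in (-1.0, 1.0):
        for b in (-PHI, PHI):
            raw += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    norm = math.sqrt(1 + PHI * PHI)
    return [(x / norm, y / norm, z / norm) for x, y, z in raw]


def estimate_direction(levels, normals):
    """Return (face index, outward normal) of the strongest face.

    `levels` maps a face index to a measured signal level. The bare
    mounting face has no sensor, so it is absent from the mapping.
    """
    strongest = max(levels, key=levels.get)
    return strongest, normals[strongest]


normals = dodecahedron_face_normals()
MOUNT_FACE = 0  # hypothetical: which face is the bare mounting face

# Made-up levels: faint noise on every face, a loud voice on face 7.
levels = {i: 0.1 for i in range(12) if i != MOUNT_FACE}
levels[7] = 0.9

face, direction = estimate_direction(levels, normals)
print(face, direction)
```

With only eleven listening faces, such an estimate is coarse: it can say no more than which face the sound hit hardest. How the actual sensor refines the direction is not described here.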

The metamaterial is designed to respond most strongly to sound at the frequencies of the human voice, Lei Shao of China’s Shanghai Jiao Tong University writes in an email. “As a result, the sensor automatically filters out the lower-frequency background noise.”

The sensor can also distinguish different voices. When people take turns talking and sit far apart, the sensor can simply register when and from where each voice arrives. That doesn’t work if people talk at the same time and sit close to each other. For that case, the researchers have developed a computer system that learns to tell voices apart by characteristics such as pitch and timbre.

“We hope that our sensor can be used in new smart speakers,” says Shao. “Then while washing dishes with running water in the kitchen, you could yell at the smart speaker in the dining room for some children’s music to calm your crying baby. Our experience is that this is not yet possible with the current smart speakers.”
