Doctoral dissertation

Can machines learn to listen like humans?

The human auditory system lets people easily detect the different sound events around them and locate their respective positions in space. People can continue to do this even in a complex sound scene. This detection and localization of sound events enables humans to interact with their surroundings. For example, when walking across a street, humans can detect approaching vehicles from the audio alone and seamlessly take the necessary action. Teaching machines to listen in a similar fashion will bring them one step closer to human capabilities.

In this regard, as part of his dissertation, Sharath Adavanne developed a novel method to teach machines to listen. The proposed method is generic and can be trained to learn different sound events of interest. Additionally, the method can detect the position in space where each sound event is active and track that position over time. To do this, the proposed method requires the machine to have more than one microphone, with each microphone providing a separate audio signal.

The applications of such machine listening are numerous. One key application is making smart devices acoustically context-aware, i.e., able to automatically recognize and understand the sound events happening around them without human input, which lets them interact with the world more naturally. For instance, humanoid robots can recognize sound events of interest and navigate toward them. Smart teleconferencing hardware can recognize and localize the active speaker and track their movement around the room over time.

According to the World Health Organization (WHO), 5% of the world's population has a hearing disability. With the help of the proposed method, existing digital personal assistants, such as smart glasses, can be enabled to visualize sounds, helping hearing-impaired people interact with the world more naturally.

Sharath Adavanne was born and brought up in Mysore, India. He currently works at ZAPR Media Labs in Bangalore, India, as a Senior Research Scientist developing machine listening methods.

The doctoral dissertation of MSc (Tech) Sharath Adavanne in the field of Signal Processing, titled Sound Event Localization, Detection, and Tracking by Neural Networks, will be publicly examined in the Faculty of Information Technology and Communication Sciences at Tampere University at 12 noon on Wednesday 4 March 2020, on the Hervanta campus in Auditorium TB214 of the Tietotalo building (Korkeakoulunkatu 1, Tampere). The opponents will be Professor Emanuël Habets from International Audio Laboratories Erlangen, Germany, and Doctor of Science (Tech) Toni Hirvonen from Yousician, Finland. The Custos will be Professor Tuomas Virtanen, Tampere University.

The dissertation is available online at http://urn.fi/URN:ISBN:978-952-03-1462-0