Toni Heittola: Making sense of everyday environments with computational audio content analysis
Automatic sound event detection means recognizing what is happening and when it is happening, in terms of sounds produced by different objects or beings. We humans understand a lot about the world around us based on the sounds we hear and recognize, but for machines, such ability to understand their acoustic surroundings is still very limited. The last decade has seen accelerated development in the field of computational analysis of sound scenes and events, taking it all the way to real-life applications.
The thesis gives a wide perspective on the development of the research field by covering research work spanning over a decade. It largely covers the beginnings of sound event detection as a research topic, before powerful computational models and machine learning methods such as deep neural networks were available. The core contributions of this work concern sound event detection using a large set of sound classes in real-life environments, where multiple sounds occur simultaneously.
“This was the first work to extend the methods available at the time with approaches capable of recognizing multiple events at the same time. The ability to recognize a large number of different sounds in a wide variety of environments, from a quiet office to a noisy street, is very important for well-performing real applications,” says Heittola.
Another key contribution of this thesis is its determined support for open research. Along with the development of methods, an important part of research is comparison with other methods produced by the research community. For a fair comparison, methods need to be tested on the same data and measured using the same performance indicators. Towards this goal, the thesis contributed open datasets and open-source tools, standardization of evaluation protocols and metrics, and benchmark systems for sound event detection. The open datasets released within the research work have been the basis for over 150 international research papers in the last five years, and the majority of these works have also used the uniform performance measurement protocol proposed in this thesis. The drive for open science and reproducible research has been supported to a great extent by the evaluation campaigns on Detection and Classification of Acoustic Scenes and Events (DCASE), to which the work presented in this thesis has contributed directly.
“When I started expanding the sound event detection research to real-life environments, I had a very complex problem on my hands. There was no research community that could collectively work on this, only isolated research attempts, because data and source code were proprietary. One researcher cannot achieve much alone. I think the greatest contribution of my thesis work was, in the end, to help organize the annual DCASE campaigns, because they have changed all this. We now have a thriving research community based on open science principles, and this is pushing research forward in tremendous leaps,” Toni Heittola says.
The doctoral dissertation of M.Sc. (Tech) Toni Heittola in the field of audio signal processing, titled Computational Audio Content Analysis in Everyday Environments, will be publicly examined in the Faculty of Information Technology and Communication Sciences of Tampere University at 12 o’clock on Friday, 18 June 2021. The venue is TB109, Tietotalo, Korkeakoulunkatu 1. Assoc. Prof. Romain Serizel from Université de Lorraine and Assoc. Prof. Dan Stowell from Tilburg University will be the Opponents. The Custos will be Prof. Tuomas Virtanen from the Faculty of Information Technology and Communication Sciences of Tampere University.
The event can be followed via Zoom remote connection.
The dissertation is available online at http://urn.fi/URN:ISBN:978-952-03-2006-5