
While binaural audio offers a rich, human-like perspective and a flexible recording setup, it suffers from inherent limitations such as front-back ambiguities and the "cone of confusion" effect, which can hinder accurate sound source localization.
Conventional approaches often struggle with these challenges, particularly in static scenarios where the listener is not moving. Furthermore, complex auditory tasks like identifying a sound event, classifying the environment, and determining the sound's direction and distance are often handled as separate problems.
MSc Daniel Aleksander Krause’s research introduces a multi-task approach to acoustic scene analysis, using deep neural networks to jointly perform sound event detection, acoustic scene classification, direction-of-arrival estimation, and sound distance estimation.
A key innovation of this work is the integration of listener motion cues to overcome the physical limitations of binaural audio.
“By incorporating data from head rotations and the listener's movement through space, the proposed systems mimic human strategies for resolving auditory ambiguities. This approach leads to significant performance gains, reducing localization errors multiple fold compared to static systems,” Krause explains.
The dissertation presents novel methods for 3D sound event localization and detection (SELD) that, for the first time, incorporate distance estimation into a unified framework.
“The results underscore the potential of dynamic, movement-aware systems to create more robust and accurate machine listening, paving the way for advanced applications in augmented reality, robotics, modern hearing aids, and other assistive technologies,” Krause says.
To support reproducibility and foster future work, several new datasets created for this research have been made publicly available.
Public defence on 24 October 2025
The doctoral dissertation of MSc Daniel Aleksander Krause, titled “Binaural audio for multi-task acoustic scene analysis”, will be publicly examined at the Faculty of Information Technology and Communication Sciences, Tampere University, on 24 October 2025 at 12:00 in auditorium TB109, Tietotalo, Hervanta campus (Korkeakoulunkatu 1, Tampere). The opponents will be Professor Nilesh Madhu from Ghent University, Belgium, and Professor Jung-woo Choi from the Korea Advanced Institute of Science & Technology (KAIST), South Korea. The custos will be Associate Professor Annamaria Mesaros from the Faculty of Information Technology and Communication Sciences, Tampere University.
The doctoral dissertation is available online.
The public defence can be followed via remote connection.
