Neural network technology improves the intelligibility of hearing-aid-processed speech
Sound source separation is one of the applications of machine learning and is often achieved using deep neural networks (DNNs), especially recurrent neural networks (RNNs). In his study, Pyry Pyykkönen isolated vocals from music using a depthwise separable convolutional neural network (DWSCNN).
“The convolutions of the neural network we studied are a faster, lightweight variant of typical convolutions and therefore require only a fraction of the parameters compared to networks that employ ordinary convolutions,” Pyykkönen notes.
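To illustrate where the parameter savings come from, the sketch below counts the weights of one ordinary 2D convolution layer and of its depthwise separable counterpart: a depthwise step that applies one small kernel per input channel, followed by a 1×1 (pointwise) convolution that mixes channels. The layer size (3×3 kernels, 64 input channels, 128 output channels) and the function names are hypothetical, chosen only for illustration; they are not the configuration used in the study.

```python
def standard_conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Ordinary convolution: one k_h x k_w x c_in kernel per output channel."""
    weights = k_h * k_w * c_in * c_out
    return weights + (c_out if bias else 0)


def depthwise_separable_params(k_h, k_w, c_in, c_out, bias=True):
    """Depthwise step: one k_h x k_w kernel per input channel.
    Pointwise step: a 1x1 convolution mapping c_in channels to c_out."""
    depthwise = k_h * k_w * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, for illustration only.
    k, c_in, c_out = 3, 64, 128
    std = standard_conv_params(k, k, c_in, c_out, bias=False)
    dws = depthwise_separable_params(k, k, c_in, c_out, bias=False)
    print(std, dws, round(dws / std, 3))  # → 73728 8768 0.119
```

Without bias terms the ratio is 1/c_out + 1/(k·k), so for 3×3 kernels a depthwise separable layer needs roughly a ninth of the weights of an ordinary one, and fewer weights also means fewer multiplications per output value.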
The new solution has a broad range of potential applications, according to Pyry Pyykkönen. For example, the technology can improve sound quality on mobile phones and hearing aids. Conventional hearing aids use directional microphones to reduce unwanted background noise, but this approach does not work when multiple sound sources are close to each other.
“Neural network technology facilitates the separation of background noise from speech and thereby improves sound quality and the intelligibility of hearing-aid-processed speech. The method I studied works on ordinary microphones and does not depend on the direction of incoming sounds,” Pyykkönen says.
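The article does not say how the network's output is turned into separated speech. A common approach in single-microphone (monaural) separation, assumed here and not confirmed as the method used in the study, is to take a magnitude spectrogram of the mixture and multiply every time-frequency bin by a mask value between 0 and 1. Because the mask works on a single channel, no information about the direction of incoming sound is needed. In this toy sketch the mask is computed from the known clean sources (an "oracle" mask); in a real system a neural network would have to estimate it from the mixture alone. The spectrogram values, the function names and the additive-magnitude mixture are all simplifying assumptions.

```python
# Toy magnitude spectrograms: rows are time frames, columns are frequency bins.
# The numbers are made up; a real system would compute them with a short-time
# Fourier transform (STFT) of the microphone signal.

def ratio_mask(target_mag, noise_mag, eps=1e-8):
    """Oracle mask |S| / (|S| + |N|) per bin, with values in [0, 1].
    It needs the clean sources, so it only shows what a network would estimate."""
    return [[s / (s + n + eps) for s, n in zip(frame_s, frame_n)]
            for frame_s, frame_n in zip(target_mag, noise_mag)]


def apply_mask(mixture_mag, mask):
    """Scale each bin of the single-channel mixture by its mask value."""
    return [[x * g for x, g in zip(frame_x, frame_g)]
            for frame_x, frame_g in zip(mixture_mag, mask)]


speech = [[1.0, 0.0], [2.0, 1.0]]
noise = [[0.0, 1.0], [2.0, 3.0]]
# Simplification: treat the mixture magnitude as the sum of source magnitudes (phase ignored).
mixture = [[s + n for s, n in zip(fs, fn)] for fs, fn in zip(speech, noise)]

mask = ratio_mask(speech, noise)
estimate = apply_mask(mixture, mask)
print([[round(v, 3) for v in frame] for frame in estimate])  # → [[1.0, 0.0], [2.0, 1.0]]
```

Bins dominated by speech keep a mask value near 1 and bins dominated by noise are pushed towards 0, which is how a single ordinary microphone signal can be cleaned up without any directional cues.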
“Lightweight machine learning solutions are also in great demand in the smartphone industry as mobile devices have less computing power than desktop computers. With the new technology, you could, for example, easily remove the vocals from your favourite song to create an instrumental karaoke version,” Pyykkönen adds.
Pyry Pyykkönen, who is working towards a master’s degree in electrical engineering, also studied neural networks in his bachelor’s thesis, which explored the denoising of motion capture data.
Research paper nominated for Best Paper Award at MMSP 2020
Pyykkönen’s research paper was nominated for a Best Paper Award at the 2020 IEEE International Workshop on Multimedia Signal Processing (MMSP 2020), a leading international event on multimedia signal processing that took place in September. His co-authors on the paper, titled “Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation”, are Stylianos Mimilakis from the Fraunhofer Institute for Digital Media Technology, Germany, and Konstantinos Drossos and Tuomas Virtanen from Tampere University.
The virtual MMSP 2020 workshop brought together around 600 world-leading experts in signal processing and multimedia from academia and industry. The event focused on technologies and applications relating to immersive audio-visual experiences with both industrial and commercial potential.
MMSP 2020 was organised by IEEE (Institute of Electrical and Electronics Engineers), the world’s largest professional association for electronic and electrical engineering, together with the IEEE Signal Processing Society, Tampere University and the Centre for Immersive Visual Technologies (CIVIT) at Tampere University. The event was sponsored by Huawei, YouTube, Nokia and Xiaomi.
Tel. +358 44 278 4858
pyry.pyykkonen [at] tuni.fi
Text: Anna Aatinen
Photo: Sari Laapotti
What is a convolutional neural network?
A convolutional neural network (CNN) is a deep neural network that was originally designed for image analysis. Besides detecting and identifying visual objects, CNNs can also process non-visual data, such as natural language and audio. A neural network is a computational system loosely inspired by the way neurons in the human brain are connected.
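The core operation of a CNN is the convolution: a small kernel slides over the input and, at each position, the overlapping values are multiplied and summed. The sketch below shows a single 2D convolution with no padding and a stride of 1. The function name `conv2d_valid` and the hand-picked edge-detecting kernel are illustrative assumptions; in a real CNN the kernel values are learned during training, and many such filters are stacked with nonlinearities in between. For audio, the same operation can be applied to a spectrogram, where one axis is time and the other is frequency.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the
    element-wise products. Like most deep-learning libraries, the kernel is
    not flipped, so strictly speaking this is a cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# A 4x4 "image" with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
# A 2x2 kernel that responds to left-to-right increases in intensity.
kernel = [[-1, 1], [-1, 1]]
print(conv2d_valid(image, kernel))
# → [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The output is large only where the edge is, which is the kind of local pattern a trained CNN layer learns to detect.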