A computer can recognise burglars by their sounds
The goal of the Everysound study led by Professor Tuomas Virtanen at Tampere University is simple: to teach a computer to understand sounds. Although this goal might seem easy to achieve, reaching it requires arduous work and expertise from several fields of technology.
“So far, such studies on everyday soundscapes are rare. Our initial aim was to teach a computer to identify all everyday sounds, which is a highly ambitious goal,” Virtanen says.
Virtanen received a €1.5 million ERC Starting Grant from the European Research Council for the five-year period from 2015 to 2020.
Artificial intelligence processes an enormous amount of data
The study applies computational analysis methods. The computer is fed a huge number of audio recordings, from which its algorithms learn to recognise different sounds. However, computers cannot learn the sounds without help from people.
“Our researchers must annotate all the material, which means that they produce metadata describing the sounds in each recording. This requires a lot of slow and painstaking work,” Virtanen explains.
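To illustrate why annotation matters, here is a minimal Python sketch of learning from labelled recordings. It is not the Everysound project's method: the two-number "feature vectors", the labels and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity. Real systems typically extract spectral features from the audio and train neural networks, but the principle is the same: the labels supplied by human annotators are what the computer learns from.

```python
from collections import defaultdict
from math import dist

# Hypothetical annotated data: each clip has been reduced to a tiny feature
# vector and paired with the label a human annotator gave it.
annotated_clips = [
    ((0.9, 0.2), "dog_bark"),
    ((0.8, 0.3), "dog_bark"),
    ((0.3, 0.9), "glass_break"),
    ((0.2, 0.8), "glass_break"),
    ((0.5, 0.1), "footsteps"),
    ((0.4, 0.2), "footsteps"),
]


def train_centroids(clips):
    """Average the feature vectors of each label (a nearest-centroid model)."""
    grouped = defaultdict(list)
    for features, label in clips:
        grouped[label].append(features)
    return {
        label: tuple(sum(values) / len(values) for values in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid lies closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train_centroids(annotated_clips)
print(classify(centroids, (0.25, 0.85)))  # glass_break
```

Without the labels in `annotated_clips`, the model would have nothing to average, which is why the annotation step cannot be skipped.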
Compared to computers, people are naturally good at distinguishing sounds. For example, we can identify the voice of a family member in a crowd with relative ease. We can also shift our attention between different sounds very quickly.
“We are good at focusing on just one sound even when we are surrounded by several sources of sound. Computers must be taught this skill,” Virtanen says.
Faster development enabled by more extensive sound databases
The challenge of sound recognition lies in the simultaneous presence of different sounds in the normal soundscape: people speak and walk, dogs bark, the air conditioner whirs, cars accelerate and brake, and the wind blows through the trees.
Algorithms need a large number of sound recordings before they learn to distinguish different sounds. When Virtanen’s research project started, suitable data barely existed.
“When we started, there were hardly any proper databases. However, the situation has continuously improved. For example, Google recently published an extensive audio database. International research cooperation is also increasing in the field,” Virtanen says.
Quality assurance requires many recordings in different places
A computer can only learn the sounds that are played to it. To ensure reliable recognition, the data must be sufficiently diverse.
“We have recorded soundscapes across Europe, which we then assign to different categories,” Virtanen says.
The categories include, for example, train stations as well as traffic and animal sounds. Privacy must also be kept in mind when making the recordings.
“We diligently go through all the material to ensure that there are no privacy violations. Fortunately, our multilingual work community helps with language issues, but sometimes we have also needed help from language experts outside the University,” Virtanen says.
Sound recognition may help catch a burglar
Reliable automatic sound recognition enables a wide range of applications. One example is acoustic surveillance.
“A burglar breaking and entering a building inevitably makes noise that computers can distinguish from the usual soundscape. Compared to cameras, the benefit of sound recognition is that sound carries: we can hear a window breaking around the corner, but a camera cannot see there,” Virtanen explains.
Another interesting application is context-aware devices that can change their behaviour when their environment changes. For example, a self-driving vehicle that listens to the surrounding soundscape could slow down when it hears children talking nearby.
Other potential applications include multimedia searches, noise control and noise reduction, and better hearing aids.
Better algorithms know how to ask people for advice
At present, the systems developed at Tampere University can recognise prominent sounds both indoors and outdoors. The goal, however, is more detailed sound recognition.
“We can currently recognise the sound of a car, for example. In the future, we want to be able to distinguish different cars from each other. We can also recognise footsteps in a corridor, but we want to be able to identify people by the sound of their steps,” Virtanen says.
Better algorithms can also reduce the amount of human work. Newer algorithms already know how to ask people for help when they find something interesting in large volumes of data.
“Today, newer methods are able to ask people for help when they find something unknown to them. When it doesn't understand something, the computer asks what this sound is. This greatly reduces the amount of work done by people,” Virtanen says.
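A model that asks people for help is commonly described as active learning. The article does not say which strategy the Tampere researchers use, so the following Python sketch shows one common variant, uncertainty sampling, under hypothetical assumptions: the clip names, class probabilities and the `threshold` and `budget` parameters are invented for illustration. The model scores each unlabelled clip, and only clips where even its best guess is unconvincing are passed to a human annotator.

```python
# Hypothetical model outputs: for each unlabelled clip, the predicted
# probability of each sound class.
predictions = {
    "clip_01": {"car": 0.95, "dog_bark": 0.03, "footsteps": 0.02},
    "clip_02": {"car": 0.40, "dog_bark": 0.35, "footsteps": 0.25},
    "clip_03": {"car": 0.10, "dog_bark": 0.85, "footsteps": 0.05},
    "clip_04": {"car": 0.34, "dog_bark": 0.33, "footsteps": 0.33},
}


def clips_to_ask_about(predictions, threshold=0.6, budget=None):
    """Return IDs of clips whose top class probability is below `threshold`,
    least confident first, optionally capped at `budget` questions."""
    uncertain = [
        (max(probs.values()), clip_id)
        for clip_id, probs in predictions.items()
        if max(probs.values()) < threshold
    ]
    uncertain.sort()  # lowest confidence first
    ids = [clip_id for _, clip_id in uncertain]
    return ids if budget is None else ids[:budget]


print(clips_to_ask_about(predictions))  # ['clip_04', 'clip_02']
```

Only two of the four clips would be sent to a person; the confident predictions for `clip_01` and `clip_03` are accepted as they are, which is how this kind of approach saves annotation work.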
Text: Jaakko Kinnunen
Picture: Jonne Renvall