Skip to main content
Research

Olli Kuparinen develops technology that converts even the strongest accent into text

Published on 20.1.2025
Tampere University
Olli Kuparinen, Suomen Akatemia tutkija
Photo: Eelis Berglund
Academy Research Fellow Olli Kuparinen from Tampere University has received funding from the Research Council of Finland for a study on dialects and temporal language variation and change. He is working to develop a model for the phonetic transcription of speech containing non-standard features.

While speech recognition technology has rapidly advanced in recent years, phonetic transcription – especially when it needs to accurately capture the sounds of spoken language – still requires significant time and effort. 

In his study titled “Speech as speech – acoustic modelling in variational linguistics”, 33-year-old Olli Kuparinen examines the conversion of speech to text while preserving dialectal variation. He received a €387,097 grant for his study from the Research Council of Finland in 2024. 

When studying spoken language, speech is typically transcribed on a phonetic level to provide an accurate visual representation of the sounds of speech. However, the process is extremely time-consuming for researchers. 

– Recording a large amount of people’s speech is the easy part; it is the manual transcription into written text that takes time. Transcribing one hour of audio recording can easily take all day,” Kuparinen says. 

In his current study, Kuparinen aims to use automatic speech recognition to output phonetic transcriptions, with each sound represented by a specific symbol. 

In addition, his study will cover two further topics: the second examines what speech recognition models can learn about language variation and change, while the third delves into the phonetic transcription of Finno-Ugric languages and their conversion into the International Phonetic Alphabet (IPA).  

Olli Kuparinen, Suomen Akatemia tutkijaPhoto: Eelis Berglund

Kuparinen hails from Lempäälä, Finland, and began his academic journey by studying Finnish at Tampere University. 

After working for a while as a teacher of Finnish and literature, he started writing his PhD thesis at Tampere University in a project funded by Kone Foundation. His PhD thesis explored linguistic changes in the Finnish spoken in Helsinki from the 1970s to the 2010s. 

At the time of publication, his longitudinal research data collected at three time points was unique, even on a global scale. This data is also included in Kuparinen’s current study. 

After completing his PhD, Kuparinen dedicated a couple of years to studying machine translation models at the University of Helsinki, an area of interest that he will also explore in his ongoing research.  

People often begin to adopt a more standardised version of language once they enter the professional world. The speech recognition models developed by Kuparinen may yield new insights, as they facilitate the study of language change in individuals through speech signals.  

Kuparinen’s enthusiasm for studying spoken language was sparked while working towards his degree. As he attended dialect courses, he noticed that the evolution of language is much more evident in spoken than written language.  

– Written language has been standardised at some point, whereas spoken language reflects the history of the people who speak it. Some dialects retain intriguing vestiges of earlier language varieties. In addition, the future of a language is reflected in speech long before it appears in writing.” 

Language variation and change lie at the heart of Kuparinen’s research. Although technology is continuously evolving and models become obsolete quickly, the development of speech recognition models can significantly advance future research in this field.  

In recent years, generative AI models that crate new content – such as ChatGPT – have moved into the mainstream and may be the first that come to mind when talking about the combination of language and technology. 

Even though generative AI deals with written language, linguists are not typically heavily involved in its development.  

– The linguistic capabilities of generative AI models largely depend on feeding them with as much data as possible that is available on different languages, Kuparinen explains. 

Automatic speech recognition models have also evolved rapidly. Kuparinen hopes to see linguistic expertise increasingly utilised in the development of AI language models. 

In the case of major languages, AI models can be effectively trained by exposing them to all available data. However, the limited data available on smaller languages and dialects necessitates the involvement of linguists to obtain and analyse this data. Thus, the role of science is to concentrate on studying small and under-represented languages and dialects. 

Kuparinen is developing a model that promotes equality, as current speech recognition models focus on the standard forms of spoken and written language. AI can understand the colloquial language spoken in the south of Finland but struggles with the strongly regional dialect spoken in the province of Savo, for example.  

– Of course, this also applies to other languages where the spoken version differs greatly from the written form, Kuparinen notes.  

Kuparinen’s research data includes both Finnish and Norwegian dialects. The Norwegian data was collected during his earlier research. A further reason for studying dialects in Norway is the country’s rich dialectal diversity. 

– Norway is a mountainous country with numerous geographical barriers. People speak differently in each valley. 

Kuparinen has selected his Finnish-language research data to ensure that old regional dialects are well-represented. This data, recorded in the 1960s, can now be freely used and shared. In addition, the differences between dialects are more pronounced in this older data compared to more recent recordings. 

Kuparinen is not concerned about the status of dialects in modern-day Finland. 

– The differences may not be as distinct anymore, but dialects still very much exist, with new variations emerging.