Digital Language Typology: Mining from the Surface to the Core 2016-2019

Digital language typology (DLT) is a multidisciplinary project that aims to produce a computer-based platform capable of assessing the structurally manifested family relationships within any set of languages for which sufficiently large digital text and speech material is available. To this end, we have assembled a team of specialists in phonetics, linguistics, and computer science.  

The project compares several Uralic languages with the Indo-European languages most relevant to their evolution in terms of geographical distribution and history of language contact. In addition to the relatively well-studied Finnish, Hungarian, and Estonian, we will include three Samoyedic languages with distinct phylogenetic histories: Tundra Nenets, Forest Nenets, and Nganasan.


The project will provide the research community with new tools, shed new light on the linguistic history of mankind, and advance the emerging field of Digital Humanities (DIGIHUM).  

The methods and technologies include machine translation tools, speech synthesizers and recognizers, and voice-controlled security systems. Substantial data sets of textual and spoken language material have been collected.  

In this collaborative research project, we propose to use new technological resources to address the long-standing question of how languages are related, in particular with respect to their structural characteristics, in a novel way: digital technologies and state-of-the-art computational methods are used to collect, manage, and analyse data in humanities and social sciences research.

Funding source

Academy of Finland

Coordinating organisation(s)

TAUCHI, The University of Helsinki

Contact persons

Markku Turunen


+358 40 533 9689