Data Science Research Centre | Tampere University

The research activities of the Data Science Research Centre are structured around four core themes.

Fundamental Machine Learning:

Fundamental Machine Learning focuses on foundational questions in modern machine learning. We study both theoretical and experimental approaches that advance our understanding of how and why machines (can) learn, and propose methodological improvements leading to state-of-the-art solutions in applications of Artificial Intelligence. Our research areas include deep neural network design and training, generative modeling, neural architecture search, statistical (probabilistic) machine learning and inference, just-in-time and distributed inference, learning from multiple data sources, and nonlinear dimensionality reduction. Recent application areas of our research are machine and robot perception, financial data analysis, biomedical signal analysis, industrial data analysis, information retrieval, video coding and IoT.

Keywords: Deep Learning, Statistical Machine Learning, Generative Modeling, Multimodal Learning, Artificial Intelligence

Team: Alexandros Iosifidis, Moncef Gabbouj, Tarmo Lipping, Jaakko Peltonen, Henri Pesonen, Ari Visa

Explainable & Fair Artificial Intelligence:

Explainable and Fair Artificial Intelligence addresses multiple perspectives of equitable and trustworthy artificial intelligence. One research line investigates the mathematical properties of neural models to enhance their interpretability. Another explores logical models that replicate the behavior of machine learning systems, providing rule-based representations of complex model reasoning. A further direction examines how rules can be learned directly from data and applied in real AI applications. In the area of Fair AI, the research focuses on developing fair recommender systems and retrieval-augmented generation (RAG) approaches that ensure equitable treatment of consumers and providers, as well as methods for fair visualization of complex data. Moreover, we tackle the challenge of fairness-aware entity resolution in dynamic environments while advancing explainability in these contexts.

Keywords: Explainable AI, Fair AI, Responsible AI, Transparency, Algorithmic Fairness, Fair Machine Learning

Team: Moncef Gabbouj, Gerardo Iniguez Gonzalez, Jyrki Nummenmaa, Jaakko Peltonen, Kostas Stefanidis, Ari Visa

Complex Systems & Network Science:

Complex systems are large collections of components – locally interacting with each other and their environment at small scales – that self-organise into global structures and behaviours at larger scales, even without central control. Components in complex systems form networks of interactions (neurons in the brain, computers in the Internet, humans in relationships) that make it difficult to study them in isolation or to accurately predict their future, due to nonlinear or chaotic dynamics. These interactions can be heterogeneous, dynamical, multilayered or interdependent, and lead to emergent behaviour at the system level. Complex networked systems also adapt to their environment at multiple scales, via cognitive, social, and evolutionary mechanisms, leading to robustness (or fragility) to external perturbations. The field requires a cross-disciplinary approach mixing physics, biology, ecology, social sciences, finance, business, management, politics, psychology, anthropology, medicine, engineering, information technology, and more, since complex systems in different domains often show universal features captured by the same mathematical and computational models.

Keywords: Networks of Interactions, Nonlinear Dynamics, Self-organization and Emergent Behaviour, Mathematical and Computational Modelling

Team: Gerardo Iniguez Gonzalez, Henri Hansen, Juho Kanniainen

Applied Machine Learning & Statistical Modelling:

Applied Machine Learning and Statistical Modelling focuses on developing and applying intelligent data-driven methods that bridge modern Artificial Intelligence and statistical reasoning to solve real-world problems across science, engineering, and industry. Methodologically, our research centers on deep learning, generative modelling, probabilistic machine learning, Bayesian methods, likelihood-free inference, nonlinear dimensionality reduction, and visualization of high-dimensional data. We study and advance methods in predictive modelling, time series analysis, graph-based learning, recommender systems, natural language processing and text analysis, exploratory data analysis, and data-driven decision making. Application areas of our research include healthcare and biomedical data analysis, financial modelling, industrial process optimization, social and behavioral data analysis, and smart systems.

Keywords: Time Series Analysis, Graphs, Natural Language Processing, Text Analysis, Likelihood-free Inference, Recommender Systems, Exploratory Data Analysis

Team: Moncef Gabbouj, Gerardo Iniguez Gonzalez, Henri Hansen, Alexandros Iosifidis, Juho Kanniainen, Tarmo Lipping, Jaakko Peltonen, Henri Pesonen, Ari Visa, Arto Luoma, Jarkko Isotalo, Hyon-Jung Kim-Ollila

Contact persons

Tampere University

Konstantinos Stefanidis

Professor. Data science. Faculty of Information Technology and Communication Sciences.

+358504174121

konstantinos.stefanidis [at] tuni.fi

Computational Intelligence group (CoIn)

Professor Alexandros Iosifidis leads the Computational Intelligence Group. Its research focuses on designing, analyzing, understanding, and applying Machine Learning approaches to problems in Computer/Robot Vision and Perception, Finance, and graph analysis.

Computational Intelligence group (CoIn)

Prof. Alexandros Iosifidis

Signal Analysis and Machine Intelligence

In the field of Machine Learning, Prof. Gabbouj’s Signal Analysis and Machine Intelligence Group introduced a paradigm shift in artificial neural networks (ANNs) by extending the linear operation of the perceptron to an arbitrary nonlinear function. In this way, Multilayer Perceptrons (MLP) were upgraded to Generalized Operational Perceptrons, and Convolutional Neural Networks (CNN) were extended to Operational Neural Networks (ONN).

Signal Analysis and Machine Intelligence Group

Prof. Moncef Gabbouj
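The idea of replacing the perceptron's linear operation with an arbitrary nonlinear one can be illustrated with a minimal sketch. It assumes the neuron is decomposed into a per-input nodal operator, a pooling operator, and an activation function; the function names and the sinusoidal operator are illustrative assumptions, not the group's implementation.

```python
import math
from typing import Callable, Sequence


def operational_neuron(
    x: Sequence[float],
    w: Sequence[float],
    bias: float,
    nodal: Callable[[float, float], float],
    pool: Callable[[Sequence[float]], float],
    activation: Callable[[float], float],
) -> float:
    """Apply a nodal operator to each (weight, input) pair, pool the results, add the bias, then activate."""
    nodal_outputs = [nodal(wi, xi) for wi, xi in zip(w, x)]
    return activation(pool(nodal_outputs) + bias)


def linear_nodal(wi: float, xi: float) -> float:
    """Multiplication: together with summation this recovers the classic perceptron."""
    return wi * xi


def sinusoid_nodal(wi: float, xi: float) -> float:
    """A nonlinear nodal operator (illustrative choice only)."""
    return math.sin(wi * xi)


x = [0.5, -1.0, 2.0]
w = [1.0, 0.5, -0.25]

# Same inputs and weights, two different operator sets.
classic = operational_neuron(x, w, 0.1, linear_nodal, sum, math.tanh)
operational = operational_neuron(x, w, 0.1, sinusoid_nodal, sum, math.tanh)
```

With multiplication as the nodal operator and summation as the pool, the function reduces to an ordinary perceptron; swapping either operator changes the neuron's behaviour without changing the surrounding code.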

Financial Data Science

Prof. Kanniainen leads the Financial Data Science group. His group, in collaboration with Prof. Iosifidis, develops machine learning models for limit order book forecasting; this work has been published in leading AI journals, is widely recognized, and is among the most cited in the field. Examples are the TABL model (cited >300 times) and the recently developed LOBERT model (NeurIPS workshop best poster award). Moreover, the group has developed network methods to model information cascades with partial observations of individuals’ states, which are applied to stock markets. These methods can be used to study how social relations drive investors' decision making and to identify abuse of information in stock markets.

Financial Data Science

Prof. Juho Kanniainen

Natural Language Processing

Our Natural Language Processing group, led by Prof. Nummenmaa, has developed novel rule-based and machine learning approaches for question answering, which retrieve answers to natural language queries from big data and knowledge bases. The group has also developed methods for managing and analyzing grammatically parsed data and has worked on different text mining tasks, such as frequent pattern mining and distinguishing pattern mining, including sequence mining for textual representations, which is suitable for biological data represented as text.

Natural Language Processing Group

Prof. Jyrki Nummenmaa

Recommender Systems

Recommender Systems aim to anticipate user needs by automatically suggesting the information that is most appropriate to the users and their current context. Prof. Stefanidis' Recommender Systems Group focuses on algorithmic approaches for traditional and more sophisticated scenarios, like group and sequential recommendations, developing and applying machine learning solutions, and building on both numerical ratings and textual reviews. Moreover, the group studies the big data integration and entity resolution problem for highly heterogeneous data, with a recent focus on progressive solutions for entity matching.

Recommender Systems Group

Prof. Konstantinos Stefanidis

Statistical Machine Learning and Exploratory Data Analysis

Prof. Peltonen’s Statistical Machine Learning and Exploratory Data Analysis Group focuses on designing and developing Statistical Machine Learning solutions for modeling and exploring data. This includes novel methods for modeling text and matrix data with topic modeling approaches, vectorial embedding approaches generalizing word embeddings, and novel matrix factorization solutions. His group also works on methods for information retrieval from large databases, including modeling and elicitation of user intent by Bayesian regression, probabilistic retrieval, and visualization of user intent.

Statistical Machine Learning and Exploratory Data Analysis Group

Prof. Jaakko Peltonen

Data Analytics and Optimization

Prof. Lipping’s Data Analytics and Optimization group, located at the Pori Campus, develops deep learning and AI solutions for agriculture, health, and industry.

Data Analytics and Optimization Group

Prof. Tarmo Lipping

Tampere Complexity Lab

The Tampere Complexity Lab, Prof. Iñiguez’s research group in network science and computational social science, develops computational tools and mathematical theories to understand collective human behaviour by analyzing data and making models of social digital interactions available online. TaCoLAB uses an interdisciplinary, data- and mechanism-driven perspective to study group segregation in social networks, attitudinal polarization online, information diffusion, and the dynamics of ranked and hierarchical complex systems.

Tampere Complexity Lab

Associate Prof. Gerardo Iñiguez

Responsible Data Management and Ethical Artificial Intelligence

Profs. Nummenmaa, Peltonen, Elomaa, Stefanidis and Juhola also focus on Responsible Data Management and Ethical Artificial Intelligence, where a rising concern is how to perform statistical data analysis and machine learning in an ethical, fair, transparent, and explainable manner. In this line of work, we also focus on enabling different stakeholders to query, understand and fix sources of bias in data science solutions, in an accessible and transparent manner. We develop methods for providing explanations that aim to identify the causes of unfairness and that examine how well systems capture user intent, which typically changes across sessions.

Prof. Martti Juhola

Prof. Jyrki Nummenmaa

Prof. Tapio Elomaa

Prof. Konstantinos Stefanidis

Multimedia and Data Mining

Prof. Visa's Multimedia and Data Mining Group works on explainable machine learning and artificial intelligence. The main application fields for these techniques are time series of hyperspectral signals or images.

Prof. Ari Visa

Urban Physics Research Group

The way our urban areas are designed influences the amount of energy we consume, our exposure to environmental hazards such as pollution and climate change, and our health. The Urban Physics Research Group uses data science and physics-based models to understand how to design urban environments that are healthy and energy efficient, now and in the future.

Urban Physics Research Group

Associate Prof. Jonathon Taylor

Decision Support for Health

The group, led by Prof. Mark van Gils, develops data-driven analysis methods to help healthcare professionals and patients get actionable information out of complex health-related data. The group's methods are typically based on combinations of biomedical signal processing, (explainable) AI and ML, and statistical analysis. Our methods are designed to work with real-life data of suboptimal quality coming from different modalities. As specific domain examples, we have several decades of expertise in critical care decision making (intelligent patient monitoring, intervention planning) and chronic diseases (risk assessment, recommendation and motivation). Furthermore, we actively contribute to health ICT standardization initiatives.

Decision Support for Health

Prof. Mark van Gils

Applied Statistical Data Analysis

Our statistics research is strongly connected with data science, but it has both distinct aspects within data science and core research separate from it. In particular, tasks are solved through statistical modeling, analysis of time-dependent data (time series, longitudinal data), planning of data gathering, treatment of distributional assumptions, representation and management of uncertainty, probabilistic estimation and inference, prediction, and hypothesis testing; the research and theory of these core methodologies are unique to statistics. The Group of Applied Statistical Data Analysis conducts applied statistical research, in which statistical methods are used and modified to solve research problems in different disciplines, for example health, medicine, social sciences, and technology.

Centre for Applied Statistics and Data Analysis

AI Hub Tampere

The principle of AI Hub Tampere is to make AI easy to reach and affordable, and thus all our services are free of charge, neutral and equal for all companies active in Pirkanmaa. The AI Hub is part of a fast-developing nationwide network of AI centres. Our aim is to assist local companies in boosting their competitive edge. Our focus has been on sustainable AI, health technology, and energy efficiency. The methods and devices we often work with include collaborative robots and self-driving vehicles.

AI Hub Tampere

Director

Konstantinos Stefanidis

Tampere University

Professor

Data science

Faculty of Information Technology and Communication Sciences

+358504174121

konstantinos.stefanidis [at] tuni.fi

Co-Directors

Jyrki Nummenmaa

Tampere University

Professor

Computer science, field: software, especially distributed systems or software development

Faculty of Information Technology and Communication Sciences

+358405277999

jyrki.nummenmaa [at] tuni.fi

Jaakko Peltonen

Tampere University

Professor

Statistics, especially data analysis

Faculty of Information Technology and Communication Sciences

+358503187116

jaakko.peltonen [at] tuni.fi

Gerardo Iniguez Gonzalez

Tampere University

Associate Professor

complex systems

Faculty of Information Technology and Communication Sciences

+358409344233

gerardo.iniguez [at] tuni.fi

Professors

Juho Kanniainen

Tampere University

Contact person of Data Science area.

Professor

Computing Sciences

Faculty of Information Technology and Communication Sciences

Alexandros Iosifidis

Tampere University

Professor

Machine learning

Faculty of Information Technology and Communication Sciences

+358504479074

alexandros.iosifidis [at] tuni.fi

Moncef Gabbouj

Tampere University

Professor

Information Technology

Faculty of Information Technology and Communication Sciences

+358400736613

moncef.gabbouj [at] tuni.fi

Tarmo Lipping

Tampere University

Professor

Signal processing

Faculty of Information Technology and Communication Sciences

+358408262860

tarmo.lipping [at] tuni.fi

Ari Visa

Tampere University

Professor

Signal processing

Faculty of Information Technology and Communication Sciences

+358407287969

ari.visa [at] tuni.fi

Henri Pesonen

Tampere University

Assistant Professor

Applied statistics

Faculty of Information Technology and Communication Sciences

+358503498322

henri.pesonen [at] tuni.fi

Researchers & Lecturers

Henri Hansen

Tampere University

University Lecturer

Faculty of Information Technology and Communication Sciences

+358504478616

henri.hansen [at] tuni.fi

Jarkko Isotalo

Tampere University

University Lecturer

Statistics

Faculty of Information Technology and Communication Sciences

+358504377565

jarkko.isotalo [at] tuni.fi

Jari Turunen

Tampere University

University Lecturer

Faculty of Information Technology and Communication Sciences

+358408262748

jari.turunen [at] tuni.fi

Hyon-Jung Kim-Ollila

Tampere University

University Lecturer

Statistics, data analysis

Faculty of Information Technology and Communication Sciences

+358503187468

hyon-jung.kim [at] tuni.fi

Kati Iltanen

Tampere University

University Lecturer

Computer science, field: data management

Faculty of Information Technology and Communication Sciences

+358503185857

kati.iltanen [at] tuni.fi

Arto Luoma

Tampere University

University Lecturer

Statistics

Faculty of Information Technology and Communication Sciences

arto.luoma [at] tuni.fi

Giulia de Meijere

Tampere University

Postdoctoral Research Fellow

Faculty of Information Technology and Communication Sciences

giulia.demeijere [at] tuni.fi

Mehmet Yamac

Tampere University

Postdoctoral Research Fellow

Faculty of Information Technology and Communication Sciences

+358503458755

mehmet.yamac [at] tuni.fi

Fahad Sohrab

Tampere University

Postdoctoral Research Fellow

Faculty of Information Technology and Communication Sciences

+358504731085

fahad.sohrab [at] tuni.fi

Anubha Goel

Tampere University

External Expert

Faculty of Information Technology and Communication Sciences

+358503201908

anubha.goel [at] tuni.fi

Javier Ureña Carrion

Tampere University

Postdoctoral Research Fellow

Faculty of Information Technology and Communication Sciences

javier.urenacarrion [at] tuni.fi

Reza Shafiloo

Tampere University

Doctoral Researcher

Faculty of Information Technology and Communication Sciences

+358505951439

reza.shafiloo [at] tuni.fi

Tanvi Sharma

Tampere University

Visitor, Research

Faculty of Information Technology and Communication Sciences

tanvi.sharma [at] tuni.fi

Lei Xu

Tampere University

Postdoctoral Research Fellow

Faculty of Information Technology and Communication Sciences

lei.xu [at] tuni.fi

Seyedeh Ebrahimi

Tampere University

Doctoral Researcher

Faculty of Information Technology and Communication Sciences

+358503377230

seyedeh.ebrahimi [at] tuni.fi

Anton Muravev

Tampere University

Student, Doctoral Research

Faculty of Information Technology and Communication Sciences

0401981309

anton.muravev [at] tuni.fi

Mete Ahishali

Tampere University

Visitor, Research

Faculty of Information Technology and Communication Sciences

mete.ahishali [at] tuni.fi

Petri Linna

Tampere University

Project Manager

Faculty of Information Technology and Communication Sciences

petri.linna [at] tuni.fi

Antti Halla

Tampere University

Researcher

Faculty of Information Technology and Communication Sciences

+358504073990

antti.halla [at] tuni.fi

Wenji Bai

Tampere University

Doctoral Researcher

Faculty of Information Technology and Communication Sciences

+358504270052

wenji.bai [at] tuni.fi

Matin Beiram Vand

Tampere University

Student, Doctoral Research

Faculty of Information Technology and Communication Sciences

matin.beiramvand [at] tuni.fi

Ilke Adalioglu

Tampere University

Doctoral Researcher

Faculty of Information Technology and Communication Sciences

ilke.adalioglu [at] tuni.fi

Kirsi Sandberg

Tampere University

Visiting Researcher

Faculty of Social Sciences

kirsi.sandberg [at] tuni.fi

Juho Karvinen

Tampere University

Doctoral Researcher

Faculty of Social Sciences

juho.karvinen [at] tuni.fi

Elina Siren

Tampere University

Doctoral Researcher

Faculty of Built Environment

+358503516622

elina.siren [at] tuni.fi

Affiliated Members

Mark van Gils

Tampere University

Professor

Digital Healthcare

Faculty of Medicine and Health Technology

+358504066610

mark.vangils [at] tuni.fi

Pekka Abrahamsson

Tampere University

Professor

Software engineering

Faculty of Information Technology and Communication Sciences

+358405415929

pekka.abrahamsson [at] tuni.fi

Jonathon Taylor

Tampere University

Professor

Urban Physics

Faculty of Built Environment

+358505914794

jonathon.taylor [at] tuni.fi

Topics on network science, computational social science, complex systems

Contact: Gerardo Iñiguez, gerardo.iniguez [at] tuni.fi

Universal behaviour in rank dynamics and other complex phenomena

Many complex systems develop rankings of their elements that emerge from networked interactions. These rankings evolve according to system-dependent mechanisms of interaction and reflect the relevance of elements in performing a function in the system. By analysing ranking data on many social, biological, and economic systems, as well as simple models of rank dynamics, we will explore generic features of rank stability that allow us to model and predict patterns of ranking behaviour across a variety of complex systems.

G. Iñiguez et al. Dynamics of ranking. Nature Communications 13, 1646 (2022). DOI: https://doi.org/10.1038/s41467-022-29256-x. arXiv: https://arxiv.org/abs/2104.13439

J. A. Morales et al. Rank dynamics of word usage at multiple scales. Frontiers in Physics 6, 45 (2018). DOI: https://doi.org/10.3389/fphy.2018.00045. arXiv: https://arxiv.org/abs/1802.07258

J. A. Morales et al. Generic temporal features of performance rankings in sports and games. EPJ Data Science 5, 33 (2016). DOI: https://doi.org/10.1140/epjds/s13688-016-0096-y. arXiv: https://arxiv.org/abs/1606.04153

Underlying mechanisms of collective behaviour in social systems

With dynamical systems analysis of simple models of social network evolution, we will estimate how much of the structure and dynamical patterns in friendship and communication networks is a result of underlying mechanisms of social interaction, explaining the rise of segregated groups in society, among other collective social phenomena. This will also allow us to infer attitudinal space embeddings for political social networks.

A. F. Peralta et al. Multidimensional political polarization in online social networks. Physical Review Research 6, 013170 (2024). DOI: https://doi.org/10.1103/PhysRevResearch.6.013170. arXiv: https://arxiv.org/abs/2305.02941

G. Iñiguez et al. Universal patterns in egocentric communication networks. Nature Communications 14, 5217 (2023). DOI: https://doi.org/10.1038/s41467-023-40888-5. arXiv: https://arxiv.org/abs/2302.13972

A. Asikainen et al. Cumulative effects of triadic closure and homophily in social networks. Science Advances 6, eaax7310 (2020). DOI: https://doi.org/10.1126/sciadv.aax7310. arXiv: https://arxiv.org/abs/1809.06057

Uncovering the temporal evolution on/of techno-social networks

We will use large-scale data from online social platforms to study and predict the temporal evolution of social contagion processes and cascading behaviour related to the use of innovations and new digital markets. By considering data features like weighted or multiplex social interactions, product competition, and temporal social networks, we aim to increase the explanatory and predictive power of models of temporal networks, information diffusion, and social contagion.

S. Unicomb et al. Dynamics of cascades on burstiness-controlled temporal networks. Nature Communications 12, 133 (2021). DOI: https://doi.org/10.1038/s41467-020-20398-4. arXiv: https://arxiv.org/abs/2007.06223

S. Unicomb et al. Threshold driven contagion on weighted networks. Scientific Reports 8, 3094 (2018). DOI: https://doi.org/10.1038/s41598-018-21261-9. arXiv: https://arxiv.org/abs/1707.02185

Z. Ruan et al. Kinetics of social contagion. Physical Review Letters 115, 218702 (2015). DOI: http://dx.doi.org/10.1103/PhysRevLett.115.218702. arXiv: https://arxiv.org/abs/1506.00251
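As a rough illustration of the kind of social contagion model referred to in this topic, the sketch below assumes a simple Watts-style threshold rule on a static, unweighted network: a node adopts once the fraction of its adopting neighbours reaches a threshold. The toy network, function name, and parameter values are hypothetical and are not taken from the cited papers.

```python
from typing import Dict, Iterable, List, Set


def threshold_cascade(
    neighbors: Dict[int, List[int]],
    seeds: Iterable[int],
    threshold: float,
) -> Set[int]:
    """Return the final set of adopters under a fractional threshold rule."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in adopted or not nbrs:
                continue
            # Fraction of this node's neighbours that have already adopted.
            fraction = sum(n in adopted for n in nbrs) / len(nbrs)
            if fraction >= threshold:
                adopted.add(node)
                changed = True
    return adopted


# Toy network: a chain 0-1-2-3 with two extra leaves (4, 5) attached to node 3.
toy_network = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}

partial = threshold_cascade(toy_network, seeds=[0], threshold=0.5)
full = threshold_cascade(toy_network, seeds=[0], threshold=0.3)
```

In this toy example the cascade stalls at node 3 when the threshold is 0.5, because only one of its three neighbours has adopted, while a threshold of 0.3 lets it spread to the whole network. Weighted links, bursty temporal activity, or competing products would require extending the rule.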

Opinion formation, algorithmic bias, and deception on coevolving social networks

Opinions synthesise our perceptions and the knowledge we have of the external world, other people, and ourselves. To better understand the impact opinions have on social interactions and decision-making, we will explore data from small controlled experiments and analyse idealised models of opinion formation, deception, and network change, potentially under the effect of algorithmic bias, letting us gauge how opinions influence the structure and dynamics of off- and online society.

A. F. Peralta et al. The effect of algorithmic bias and network structure on coexistence, consensus, and polarization of opinions. Physical Review E 104, 044312 (2021). DOI: https://doi.org/10.1103/PhysRevE.104.044312. arXiv: https://arxiv.org/abs/2105.07703

G. Iñiguez et al. Effects of deception in social networks. Proceedings of the Royal Society B 281, 20141195 (2014). DOI: http://dx.doi.org/10.1098/rspb.2014.1195. arXiv: https://arxiv.org/abs/1406.0673

G. Iñiguez et al. Opinion and community formation in coevolving networks. Physical Review E 80, 066119 (2009). DOI: http://link.aps.org/doi/10.1103/PhysRevE.80.066119. arXiv: https://arxiv.org/abs/0908.1068v2

Topics on Recommendation Systems

Contact: Kostas Stefanidis, konstantinos.stefanidis [at] tuni.fi

Explainability in Retrieval-Augmented Generation (RAG) Systems

Retrieval-Augmented Generation (RAG) architectures combine Large Language Models (LLMs) with external knowledge sources to improve factual accuracy and scalability of generative AI systems. While RAG has become a dominant paradigm in industrial and scientific applications, the interaction between retrieval and generation remains largely opaque. In particular, it is unclear how individual retrieved documents influence generated outputs, how errors or biases introduced during retrieval propagate to generation, and whether RAG pipelines amplify existing biases present in data and models. The goal of this thesis is to develop explainable RAG frameworks that make the retrieval–generation process transparent, auditable, and controllable. The primary research direction is post-hoc explainability: developing task-agnostic, modular explanation techniques that operate independently of RAG internals, enabling broad applicability across domains and architectures.

The work will primarily rely on publicly available textual datasets commonly used in information retrieval and natural language processing research, such as web corpora, news collections, and benchmark QA datasets. Pre-trained open-source LLMs and retrieval models will be employed and adapted as experimental platforms. The methodological approach includes post-hoc explainability: Developing model-agnostic explanation techniques that analyze input–output behavior of RAG systems without requiring access to internal model parameters.
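One simple way to picture a model-agnostic, input–output analysis of this kind is leave-one-out ablation: remove one retrieved document at a time and measure how much the generated output changes. The sketch below is a minimal illustration under stated assumptions; `toy_generate` is a hypothetical stand-in for a real RAG pipeline, and token-level Jaccard distance is just one possible way to compare outputs.

```python
from typing import Callable, List


def jaccard_distance(a: str, b: str) -> float:
    """Token-set distance between two texts (0 = same tokens, 1 = disjoint)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def leave_one_out_attribution(
    query: str,
    docs: List[str],
    generate: Callable[[str, List[str]], str],
) -> List[float]:
    """Score each document by how much the output changes when it is removed."""
    baseline = generate(query, docs)
    scores = []
    for i in range(len(docs)):
        ablated_docs = docs[:i] + docs[i + 1:]
        scores.append(jaccard_distance(baseline, generate(query, ablated_docs)))
    return scores


def toy_generate(query: str, docs: List[str]) -> str:
    """Hypothetical black box: echoes the document sharing most words with the query."""
    if not docs:
        return ""
    query_tokens = set(query.lower().split())
    return max(docs, key=lambda d: len(query_tokens & set(d.lower().split())))


docs = [
    "Tampere is a city in Finland",
    "Helsinki is the capital of Finland",
    "Oslo is the capital of Norway",
]
scores = leave_one_out_attribution("what is the capital of finland", docs, toy_generate)
```

Only the second document receives a non-zero score here, since removing it is the only ablation that changes the toy output. The attribution function only calls `generate` as a black box, so it needs no access to model parameters.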

Explanations for Different Stakeholders

Unexpected (either existing or missing) and/or unjustified recommendations may frustrate the users and can be detrimental to their trust and loyalty to the recommender system (RS). Explainable RS share a piece of information from the system with users, so that they understand how/why the recommendation was computed. This information is commonly called an explanation and can either be 1) system-agnostic and data-specific information (e.g., user/item characteristics, previous actions), or 2) system-specific information (e.g., a set of rules in a rule-based decision system). At the same time, different stakeholders (e.g., consumers, producers, or system administrators) require different levels of detail in the explanations. For example, consumers can be satisfied with more general explanations, while system administrators require detailed model-specific ones.

Tasks: Design an Explainable RS that is able to provide explanations to different stakeholders based on their roles. Explanations for different stakeholders may include, but are not limited to, the following:

Consumer: Model-agnostic - Counterfactual explanations that identify which previous actions of the consumer caused the recommendation [3,4]. Model-specific (user-based Collaborative Filtering) - Analysis of what users similar to the consumer preferred and why the recommendation was generated [2].

Producer: Explanations for producers may include 1) which users prefer their products, 2) the product’s exposure to the consumers, 3) the differences between the product’s interested consumers and a specific user, and 4) statistics about their products (ratings received, positive-negative feedback, exposure).

System Administrator: Detailed explanations about the specific model variables that were the cause for the recommendation in question [1].

[1] M. Stratigi et al. Why-not questions & explanations for collaborative filtering. In Web Information Systems Engineering – WISE 2020.

[2] Y. Zhang and X. Chen. Explainable Recommendation: A Survey and New Perspectives. Found. Trends Inf. Retr. 14(1): 1-101 (2020).

[3] V. Kaffes et al. Model-agnostic counterfactual explanations of recommendations. In Proceedings of the 29th ACM conference on user modeling, adaptation and personalization, 2021.

[4] J. Tan et al. Counterfactual explainable recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management – CIKM 2021.
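The consumer-facing counterfactual explanation described above can be sketched for a toy user-based collaborative filter. The sketch assumes implicit feedback (sets of liked items), Jaccard similarity between users, and single-item counterfactuals only; all names and data are hypothetical, and this is not the method of the cited works.

```python
from typing import AbstractSet, Dict, List, Optional, Set


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(
    user_items: AbstractSet[str],
    others: Dict[str, Set[str]],
    exclude: AbstractSet[str] = frozenset(),
) -> Optional[str]:
    """Recommend the unseen item with the highest similarity-weighted vote."""
    scores: Dict[str, float] = {}
    for items in others.values():
        similarity = jaccard(user_items, items)
        for item in items - user_items - exclude:
            scores[item] = scores.get(item, 0.0) + similarity
    if not scores:
        return None
    # Highest score first; ties broken alphabetically for determinism.
    return min(scores, key=lambda item: (-scores[item], item))


def counterfactual_actions(
    user_items: Set[str], others: Dict[str, Set[str]]
) -> List[str]:
    """Past interactions whose individual removal changes the recommendation."""
    original = recommend(user_items, others)
    return sorted(
        item
        for item in user_items
        if recommend(user_items - {item}, others, exclude={item}) != original
    )


consumer = {"A", "B", "C"}
other_users = {"u1": {"A", "B", "X"}, "u2": {"C", "Y"}, "u3": {"C", "Y", "Z"}}

top_item = recommend(consumer, other_users)
explanation = counterfactual_actions(consumer, other_users)
```

In this toy data the consumer is recommended item X, and removing either A or B from their history switches the recommendation to Y, so those two actions form the explanation. Searching over sets of actions rather than single items would be a natural extension.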

Identify and Explain Bias in Recommendation Systems

Recommendation systems (RS) can strongly influence the information we see online, e.g., on social media, and thus impact our beliefs, decisions, and actions. At the same time, these systems can create substantial business value for different stakeholders. Given the growing potential impact of such systems on individuals, organizations, and society, questions of fairness have gained increased attention in recent years.

Tasks: Employ a state-of-the-art recommendation system, identify the bias present in it, and provide a consumer-friendly explanation of its origins.

Identify bias: Biases in RS are typically divided into data bias and model bias.

Data bias can come from data generation, collection, storage, etc.
Model bias can arise from biases in model design, training, and evaluation, such as a biased model architecture, improper use of specific optimization methods or estimators, and inappropriate benchmarks.

Provide explanations: After identifying the bias, provide a user-friendly explanation about its origin.

E. Pitoura, K. Stefanidis and G. Koutrika. Fairness in Rankings and Recommendations: An Overview. The VLDB Journal, 2022.

Yongfeng Zhang and Xu Chen. Explainable Recommendation: A Survey and New Perspectives. Found. Trends Inf. Retr. 14(1): 1-101 (2020)

Li, Yunqi, et al. Fairness in recommendation: Foundations, methods, and applications. ACM Transactions on Intelligent Systems and Technology 14.5 (2023).

Harini Suresh and John V Guttag. 2019. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002.

Alessandro Castelnovo, Riccardo Crupi, Greta Greco, Daniele Regoli, Ilaria Giuseppina Penco, and Andrea Claudio Cosentini. 2022. A clarification of the nuances in the fairness metrics landscape. Scientific Reports 12, 1.

Personalized summaries using LLMs and RAGs

There have been several works focusing on summarizing knowledge graphs. However, the development of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks leaves unexplored how these novel AI techniques could be combined with existing work to increase the quality of the generated graph summary and potentially make it personalized. Further, the interrelation of a textual summary with a summary graph is currently unexplored and could be investigated.

Giannis Vassiliou, Nikolaos Papadakis, Haridimos Kondylakis: SummaryGPT: Leveraging ChatGPT for Summarizing Knowledge Graphs. ESWC (Satellite Events) 2023: 164-168

Sejla Cebiric et al. Summarizing semantic graphs: A Survey. VLDB J. 28(3): 295-327 (2019)

Topics on Data Analytics Applications in Agriculture and Health Technology

Contact: Tarmo Lipping, tarmo.lipping [at] tuni.fi

Recording and analysis of physiological data in real-life environment

1. Brain Computer Interfaces using consumer-oriented EEG devices

The thesis project will involve developing an environment for real-time analysis of EEG signals. The signal will be obtained using a consumer-oriented device such as the MUSE headband. The analysis results will be used to control an external environment such as a computer game, robotic arm or music generation software. An interface between brain function and generative AI models can also be considered. The specific setup for the experiments will be determined at the beginning of the project. The work is related to the EHEÄ (Well-being, vitality and smart services through experience production and technology), LuovAIn! (http://luovain.ai) and MindFlow (https://www.tuni.fi/en/research/mindflow-measuring-flow-state) projects of the DAO research group.

2. Assessing inter-subject brainwave synchronization during joint tasks

There is a lot of evidence that when two or more subjects are engaged in a joint activity, their brainwaves become synchronized. Recordings of this kind are called hyperscanning. We have previously collected some hyperscanning data using the MUSE and g.tec Unicorn devices. The thesis project will involve carrying out a more systematic data collection in a hyperscanning setting. The specific tasks and research questions will be determined at the beginning of the thesis project. The project will also deal with testing and evaluating methods to detect synchronization in time series.
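To make "detecting synchronization in time series" concrete, the following minimal sketch searches for the time shift at which two equally long signals are most strongly correlated. Lagged Pearson correlation is used here only as a simple stand-in; hyperscanning studies often use phase-based or coherence-based measures instead. The function names and the synthetic sine signals are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag_correlation(x, y, max_lag):
    """Return (lag, r) with the largest |r|; positive lag means y trails x."""
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]   # compare x[t] with y[t + lag]
        else:
            a, b = x[-lag:], y[:len(y) + lag]  # compare x[t - lag] with y[t]
        r = pearson(a, b)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Synthetic example (assumed): y is x delayed by three samples.
x = [math.sin(0.3 * t) for t in range(200)]
y = [math.sin(0.3 * (t - 3)) for t in range(200)]
lag, r = best_lag_correlation(x, y, max_lag=5)
```

In practice such a measure would be computed in sliding windows and compared against surrogate data (for example, signals from subjects who were not interacting) before any claim of synchronization is made.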

3. Assessment of the physiological effects of cultural and recreational interventions

It is commonly accepted that cultural experiences have a positive impact on mental well-being and recovery from stress. However, evidence to support this claim is still lacking. The thesis project will involve designing and implementing a body sensor system tailored for the assessment of the physiological effects of cultural experiences. In addition, the project may deal with using physiological data to augment cultural experiences by, e.g., using performers' physiological data to control the visual or auditory environment during a cultural performance. A specific experimental setup, determined at the beginning of the thesis project, will be used. The thesis project will involve data recording and preliminary analysis.

Related references:

Beiramvand, M, Shahbakhti, M, Karttunen, N, Koivula, R, Turunen, J & Lipping, T 2024, 'Assessment of Mental Workload Using a Transformer Network and Two Prefrontal EEG Channels: An Unparameterized Approach', IEEE Transactions on Instrumentation and Measurement. https://doi.org/10.1109/TIM.2024.3395312

T. Lipping and M. Beiramvand, "Assessment of Mental Workload in Real-Life Setup using EEG Synchronization Measures," 2024 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0 & IoT), Firenze, Italy, 2024, pp. 412-416, doi: 10.1109/MetroInd4.0IoT61288.2024.10584156.

Beiram Vand, M, Shahbakhti, M & Lipping, T 2023, Cross-Entropy-Based Assessment of Mental Workloads Using Two Prefrontal EEG Channels. In 2023 IEEE EMBS Special Topic Conference on Data Science and Engineering in Healthcare, Medicine and Biology. IEEE, pp. 49-50, International Conference on Data Science and Engineering in Healthcare, Medicine & Biology, St. Julians, Malta, 7/12/23. https://doi.org/10.1109/IEEECONF58974.2023.10404709

Hakim, U. et al. 2023, 'Quantification of inter-brain coupling: A review of current methods used in haemodynamic and electrophysiological hyperscanning studies', NeuroImage, vol. 280, 120354. https://doi.org/10.1016/j.neuroimage.2023.120354

Applied machine learning, computational models and statistics in agriculture

1. Remote sensing data in crop field characterization and zoning

Crop fields are monitored using remote sensing observations, often from satellites and drones. These result in multiband images with varying spatial and spectral resolutions. Using these images we can follow the progression of the growing season, identify differences within and between crop fields and identify patterns and anomalies. Find ways to divide fields into meaningful zones based on remote sensing images and other relevant data sources, to support field activities for precision field management and targeted information gathering.
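As a rough sketch of what zoning from multiband imagery could look like, the example below computes the normalized difference vegetation index (NDVI) per pixel from near-infrared and red reflectance and groups the values with a one-dimensional k-means. The choice of NDVI and k-means, the function names and the toy reflectance values are assumptions for illustration; a real zoning method would also need to account for spatial contiguity and further data sources.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def kmeans_1d(values, k, iterations=50):
    """Cluster scalar values into k groups; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: centers placed at evenly spaced quantiles.
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Keep the old center if a group happens to be empty.
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return centers, labels


# Toy pixels (assumed) as (NIR, red) reflectance pairs.
pixels = [(0.80, 0.10), (0.75, 0.10), (0.30, 0.25), (0.32, 0.25)]
index_values = [ndvi(nir, red) for nir, red in pixels]
centers, zones = kmeans_1d(index_values, k=2)
```

Here zone 0 collects the low-NDVI pixels and zone 1 the high-NDVI pixels, because the initial centers are ordered from low to high.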

2. Topological Data Analysis in Agriculture

Topological Data Analysis (TDA) is an emerging set of tools for analysing complex, multi-dimensional data. TDA has solid mathematical foundations and a set of mature software tools, but it has not yet been widely used with agricultural data. Explore the capabilities of TDA for extracting meaningful features with predictive power from agricultural datasets such as remote sensing imagery. Demonstrate how to use TDA as a standalone data analysis tool or how to combine it with existing machine learning methods in solving practical problems.
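For readers new to TDA, the sketch below computes the simplest persistent-homology summary of a point cloud: the scales at which connected components (dimension-0 features) merge in a Vietoris-Rips filtration. It is a minimal illustration in plain Python rather than one of the mature TDA libraries; the function name h0_persistence and the toy points are hypothetical, and the scale is taken to be the pairwise distance (some tools report half of it).

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0. Components die when they merge with
    another one; the single component that survives forever is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise distances in increasing order (Kruskal-style).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for distance, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j   # two components merge: one of them dies
            deaths.append(distance)
    return deaths


# Toy point cloud (assumed): two tight pairs that are far apart.
deaths = h0_persistence([(0, 0), (1, 0), (10, 0), (11, 0)])
```

The two short-lived features correspond to points merging within each pair, while the long-lived one reflects the gap between the two groups; such lifetimes can serve as input features for a downstream machine learning model.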

3. Simulation models for crop growth

Dynamic crop models represent the growth process of a plant from seeding to maturity. Models of this kind aim to encode scientific knowledge about the relevant natural processes, allowing us to explain and predict growth behavior given the environmental conditions during the season. Process models typically need to be calibrated to local conditions and their behavior analysed statistically, due to the many uncertainties involved. Analyze and calibrate the behavior and results of crop growth models using statistical methods such as Monte Carlo simulations and sensitivity analysis, to support tasks such as biomass estimation and irrigation planning.
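The idea of Monte Carlo simulation combined with sensitivity analysis can be sketched on a deliberately simplified stand-in for a crop model. The logistic biomass model, its parameter ranges and all names below are assumptions made for illustration only; they do not represent any specific crop model used by the group.

```python
import math
import random


def toy_biomass(growth_rate, capacity, days=120, initial=0.05):
    """Toy logistic growth: biomass after `days` daily steps (hypothetical model)."""
    biomass = initial
    for _ in range(days):
        biomass += growth_rate * biomass * (1 - biomass / capacity)
    return biomass


def correlation(x, y):
    """Pearson correlation, used here as a crude sensitivity index."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def monte_carlo_sensitivity(n_samples=2000, seed=0):
    """Sample uncertain parameters, run the model, and summarize the outputs."""
    rng = random.Random(seed)
    # Assumed uncertainty ranges for the two parameters.
    rates = [rng.uniform(0.05, 0.10) for _ in range(n_samples)]
    capacities = [rng.uniform(8.0, 12.0) for _ in range(n_samples)]
    outputs = [toy_biomass(r, k) for r, k in zip(rates, capacities)]
    return {
        "mean_biomass": sum(outputs) / n_samples,
        "max_biomass": max(outputs),
        "corr_growth_rate": correlation(rates, outputs),
        "corr_capacity": correlation(capacities, outputs),
    }


results = monte_carlo_sensitivity()
```

The correlations give a first impression of which parameter drives the output uncertainty. Variance-based indices (for example Sobol indices) would be a more rigorous choice when parameters interact or the response is strongly nonlinear.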

Topics on Urban Physics

Contact: Jonathon Taylor, jonathon.taylor [at] tuni.fi

Image processing and analysis for building energy and climate adaptation modelling.

Building physics models can be used to simulate the energy performance and climate resilience of a building, which is essential for understanding how we can adapt our buildings to be more energy efficient and more resilient to the future climate. However, most studies in Finland rely on case study buildings or example buildings that do not represent the true range of the building stock. This proposal seeks to scrape floorplan data from real estate websites (for example Etuovi), vectorise the floorplans, and classify rooms according to their type. It will then explore intelligent/AI ways to aggregate floorplans into representative ‘average’ buildings that we can use for simulations.

Using distributed networks of low cost citizen science sensors to identify exposures to environmental hazards.

The growth of IoT-connected sensor devices (e.g. Netatmo personal weather stations or low-cost air pollution sensors) means there are now vast amounts of citizen-acquired data on environmental conditions in cities, from a denser network than the official stations we have normally relied on. This big data can reveal information on how we can design our urban environments to be healthier and more climate-resilient, detect sudden events like fires, and understand environmental risks in cities (e.g. Taylor et al, 2024). But the data is vast and can be very messy and unreliable. This project would clean this big data and help identify signals in it that can help us answer important questions about our cities.

Taylor, J., et al. (2024). The potential of urban trees to reduce heat-related mortality in London. Environmental Research Letters, 19(5), 054004.
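One common first step in cleaning messy sensor data is to flag implausible readings with a robust outlier rule. The sketch below uses the modified z-score based on the median absolute deviation (MAD), with the conventional cut-off of 3.5. The function name, the threshold default and the toy temperature readings are assumptions for illustration, not the method the project would necessarily adopt.

```python
import statistics


def mad_outlier_flags(readings, threshold=3.5):
    """Flag readings whose modified z-score exceeds `threshold`.

    The median and MAD are used instead of mean and standard deviation
    because they are barely affected by the outliers being searched for.
    """
    median = statistics.median(readings)
    deviations = [abs(x - median) for x in readings]
    mad = statistics.median(deviations)
    if mad == 0:
        # More than half of the readings are identical; this rule cannot rank the rest.
        return [False] * len(readings)
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [0.6745 * d / mad > threshold for d in deviations]


# Toy readings (assumed): indoor temperatures in degrees Celsius with one glitch.
temperatures = [21.0, 21.5, 20.8, 21.2, 55.0, 21.1]
flags = mad_outlier_flags(temperatures)
```

A single-sensor rule like this cannot distinguish a faulty device from a genuine local event such as a fire, so in a dense network it would normally be combined with comparisons against neighbouring sensors.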

Household air pollution data aggregation and Bayesian hierarchical modelling

Air pollution from the household burning of polluting fuels (such as solid wood, coal, etc.) is estimated to lead to over 3 million premature deaths worldwide every year. However, these estimates, for example from the World Health Organisation, are limited because they only consider exposures of those burning the fuels, not the wider population: for example, those who do not burn these fuels but breathe the polluted air from them (e.g. Mohajeri et al, 2023). This topic will use a database of aggregated research studies, link them to the nearest measurement station, and develop a Bayesian hierarchical model for household air pollution exposure. Alternative calculations will examine how much this exposure could be reduced if populations switched to cleaner fuels.

Mohajeri, N., et al. (2023). Urban–rural disparity in global estimation of PM2.5 household air pollution and its attributable health burden. The Lancet Planetary Health, 7(8), e660-e672.

Urban greenery in Finnish Cities

Greenspaces in cities have proven benefits for the environment, such as storing and sequestering carbon, potentially reducing air pollution, and lowering heat exposures. They also provide mental health benefits for residents. Studies of urban greenspace often rely on satellite imagery or city surveys of public property. But what about what we actually see from the ground? What is the greenest city in Finland? What is the greyest? This project would assess greenery in Finnish cities using streetview imagery (e.g. Lu et al, 2023), helping to understand the availability and different types of greenspace.

Lu, Y., et al. (2023). Assessing urban greenery by harvesting street view data: A review. Urban Forestry & Urban Greening, 83, 127917.
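A very simple way to quantify visible greenery in a street-level image is the share of pixels that look green. The sketch below is a crude colour-threshold proxy under stated assumptions: pixels are given as (red, green, blue) tuples in the range 0 to 255, and the margin value and function name are hypothetical. Published work often relies on semantic segmentation models rather than a colour rule.

```python
def green_view_index(pixels, margin=20):
    """Fraction of pixels whose green channel clearly dominates red and blue."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    green = sum(
        1 for red, grn, blue in pixels
        if grn > red + margin and grn > blue + margin
    )
    return green / len(pixels)


# Toy image (assumed): two vegetation-like pixels, one grey, one near-neutral.
sample = [(30, 120, 40), (200, 200, 200), (90, 100, 95), (20, 180, 60)]
share = green_view_index(sample)
```

Averaging this share over many images sampled along a city's street network would give one rough, comparable greenery score per city, subject to season, lighting and camera differences.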

ReDS Day 2026: Data Science PhD & Postdoc Research Day

Date: May 15, 2026

Location: TAU Konetalo K1703

ReDS Day 2026 brings together PhD students and postdoctoral researchers in Data Science to share ideas, discuss ambitious research directions, and strengthen our research community. The event is open to all PhD students and postdoctoral researchers working in Data Science at Tampere University.

Fresh Thinking Talks [9.00-11.00]

The Data Science PhD & Postdoc Research Day will feature a Fresh Thinking Talks track, with 10-minute presentations delivered without interruptions for questions. Each talk highlights a core research idea, whether a technique, methodology or technology, that a young researcher (PhD student or researcher up to four years after the PhD) wishes the wider community knew about.

We invite abstracts that put forward visionary ideas, describe challenging problems, or share recent results and lessons learned. Findings may be preliminary, but they should be clearly articulated and sufficiently substantiated.

To propose a talk, please send a title and a one-paragraph abstract by email to konstantinos.stefanidis [at] tuni.fi
Abstract submission deadline: April 10, 2026

Introductory Workshop on Designing High-Ambition Research Ideas [11.30-13.30]

With the support of Anne Vilenius, coordinator of the ITC Research Incubator

Securing research funding is an essential, and often challenging, part of an academic career. Have you already thought of some funding schemes, or even already applied? Or is this all quite new to you? Either way, there will be something useful in the session for you. There are so many different funding schemes available, so why not use the most ambitious as examples: the European Research Council’s (ERC) individual grants and, within the Finnish landscape, the Research Council of Finland’s Academy Research Fellowship grant. The session will provide a very high-level introduction to both, with key points relevant for your career stage. We will try selected ideation tools, as a whole group and in smaller discussion rounds, finishing with notes on the relevant support groups at Tampere University and related support services and courses. While you might well not apply to these specific funding schemes in the very near future, the way of thinking they require is similar to that of other funding schemes, and thus the exercises we will do are easily transferable. Welcome!

Panel Discussion [14.30-16.00]

From Good Research to Research Leadership: How to transition from producing papers to shaping research agendas?

This panel explores the transition from conducting high-quality research to taking on a leadership role in shaping research agendas. Bringing together perspectives from academia, industry, and research institutions, the discussion will highlight how researchers can expand their impact beyond publications, influence directions in their fields, and contribute to the development of future research and learning environments.

Moderator: Fahad Sohrab, Postdoctoral Research Fellow at Tampere University and the University of Eastern Finland, and Vice-Chair of IEEE Finland. His research focuses on machine learning, anomaly detection, and pattern recognition.

Panellists:

Henri Hansen (Tampere University): University Lecturer in Data Science at Tampere University's Computing Sciences unit and a member of the Data Science Research Centre. His research spans graph algorithms, optimization, information networks, and financial data science.

Aysen Degerli (VTT Technical Research Centre of Finland): Research Scientist at VTT, specializing in machine learning and medical imaging. Her doctoral work developed novel machine learning methods for diagnosing myocardial infarction from echocardiography and COVID-19 from chest X-ray images, earning her the Finnish AI Dissertation Award 2024.

Iftikhar Ahmad (Tieto Finland Oy): Head of R&D Central Function at Tieto Finland Oy. His expertise spans deep learning, object detection, multimedia indexing, and pattern recognition, with applications in IoT and 5G/6G connectivity on big data. He leads Tieto's research collaboration on trustworthy AI and digital society platforms.

Farhad Pakdaman (Nokia): Technical Analysis and Patent Manager in Multimedia SEP at Nokia, based in Tampere. A former MSCA postdoctoral fellow at Tampere University, his research background covers energy-efficient video compression algorithms and power-constrained multimedia systems.

Tentative Program

9.00-11.00 Fresh Thinking Talks

Kirsi Sandberg: Information and Information Structures in Parliamentary Records
Nadeesha Perera: Towards Reliable and Explainable Medical AI with Domain-adaptive RAG
Maria Stratigi: What If? Counterfactual Explanations for Understanding Recommendations
Ali Jedari Heidarzadeh: Explainability in Retrieval-Augmented Generation
Vidhi Agrawal: Explainable AI for Autonomous Vehicles
Reza Shafiloo: Multisided Fairness in Recommender Systems
Mehmet Yamac: Axiomatizing Neural Networks: Toward a Geometric Foundation of AI
Sanaz Nami: Just Noticeable Temporal Difference for Perceptual Video Coding
Özer Devecioglu: Blind Underwater Image Restoration using Co-Operational Regressor Networks
Harish Kaushik: Hierarchical Bayesian Modelling for Predicting Component Failure
Atte Pietarinen: Calibrating Approximate Bayesian Computation Credible Intervals
Tommaso Ruga & Maira Aracne: Beyond Accuracy: Fair, Explainable, and Sustainable AI in Human-Centered Domains

11.00-11.30 Coffee Break

11.30-13.30 Introductory Workshop on Designing High-Ambition Research Ideas

13.30-14.30 Light Lunch Break

14.30-16.00 Panel Discussion: From Good Research to Research Leadership: How to transition from producing papers to shaping research agendas?

Master’s Thesis Supervision in Data Science

If you are interested in completing your master’s thesis on a topic aligned with the research interests of the Data Science Research Centre, please first review our research activities, which are organized around four core themes: (i) Fundamental Machine Learning, (ii) Explainable & Fair Artificial Intelligence, (iii) Complex Systems & Network Science, (iv) Applied Machine Learning & Statistical Modelling.

https://www.tuni.fi/en/research/data-science-research-centre

After that, please send us an email at henri.hansen [at] tuni.fi including:

The theme that best matches your research interests, and
The names of three potential supervisors from the team associated with that theme.

This information will greatly help us in identifying the most suitable supervisor for you.