Doctoral dissertation

Dissertation: Regularized machine learning methods can help analyze complex biological data

The advantages of regularization have long been recognized in machine learning. In her doctoral thesis, Sakira Hassan applied regularization to biological data and gained useful new insights into a number of open biological questions.

Machine learning approaches play a major role in today’s data-intensive science. Machine learning is used to transform data into knowledge, to provide an overview of the data generation process, and to predict future values by analyzing past data. These benefits also extend to biological discovery and biomedical science.

In her doctoral thesis, Sakira Hassan studied the integration of regularization in simple machine learning approaches, which are highly relevant for heterogeneous applications in biology.

“Much of biological data is heterogeneous and unstructured, which poses challenges in extracting salient features and statistical relationships. Regularization is a strategy that applies additional constraints, such as a restricted number of parameters, to a machine learning model,” Hassan explains.

Hassan studied regularization approaches from the viewpoint of feature selection capability. She analyzed the performance and robustness of supervised machine learning approaches in cases where too few observations are available to properly estimate the underlying covariance structure.

She observed that the traditional resampling procedure is unstable due to randomness and that the retraining process can be quite time-consuming. In addition, measurements such as those in flow cytometry analysis for cancer research are expensive to collect.

“There is no cover-all machine learning method to be found and applied to all kinds of biological problems. A regularization strategy can fit a sparse model, however, which eventually reduces the dimensionality of high-dimensional data. These findings highlight the need to find an alternative approach that is computationally faster as well as more stable and robust in small sample settings,” Hassan notes.
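As a rough illustration of how a regularization penalty can yield a sparse model, the sketch below fits an L1-penalized (lasso) linear regression by coordinate descent on synthetic data. It is not taken from the dissertation: the choice of lasso, the function names, the penalty value and the generated data are all assumptions made for the example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within +/- threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimize (1/2n) * ||y - Xw||^2 + penalty * ||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    residual = list(y)  # equals y - Xw, because w starts at zero
    col_scale = [sum(row[j] ** 2 for row in X) / n_samples for j in range(n_features)]
    for _ in range(n_sweeps):
        for j in range(n_features):
            if col_scale[j] == 0.0:
                continue
            # Correlation of feature j with the residual computed without feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * weights[j]) for i in range(n_samples)
            ) / n_samples
            new_weight = soft_threshold(rho, penalty) / col_scale[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated weight.
                for i in range(n_samples):
                    residual[i] -= X[i][j] * delta
                weights[j] = new_weight
    return weights


# Hypothetical small-sample data: 20 observations, 50 features,
# of which only features 0 and 1 influence the response.
random.seed(0)
n_samples, n_features = 20, 50
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 3.0 * row[1] + random.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("features kept:", selected)
```

The soft-thresholding step sets weakly correlated coefficients to exactly zero, so the fitted model uses only a subset of the 50 features even though there are fewer observations than features. A larger penalty would keep fewer features, and a smaller one would keep more.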

In her dissertation, Hassan further proposes a novel accuracy metric and derives a closed-form solution of that metric to assess the quality of a model by training the model once.

The doctoral dissertation of MSc (Tech) Syeda Sakira Hassan in the field of signal processing titled Regularization in Machine Learning with Applications in Biology will be publicly examined in the Faculty of Information Technology and Communication Sciences at Tampere University at 12 noon on Friday, 17 May 2019 in auditorium TB109 of the Tietotalo building (address: Korkeakoulunkatu 1, Tampere, Finland). The Opponents will be Professor Pekka Neittaanmäki (University of Jyväskylä, Finland) and Associate Professor Ulisses M. Braga-Neto (Texas A&M University, USA). The Custos will be Associate Professor Heikki Huttunen from the Faculty of Information Technology and Communication Sciences.

Sakira Hassan comes from Bangladesh and currently works as a data scientist at Basware Oy.

The dissertation is available online at http://urn.fi/URN:ISBN:978-952-03-1085-1