Doctoral dissertation

Tommi Rantapero: Developing a Framework for Analysis of Next-Generation Sequencing Data in Cancer Genetics and Epigenetics

The development of next-generation sequencing (NGS) technology has opened up new possibilities in the field of biomedical research. This novel technology has been widely applied in cancer research to study various aspects of this complex disease. However, efficient algorithms, statistical methods and various databases are needed to harness the massive amounts of data produced by this technology. Such computational methods are implemented in bioinformatic tools, which in turn are integrated into analysis frameworks that can be used to answer various biological questions.

The first aim of the study of M.Sc. Tommi Rantapero was to develop a bioinformatics framework for analysis of Next-generation sequencing data in order to discover and characterise germline variants associated with hereditary cancer. This framework was applied and developed further in three studies.

In the first study, two loci, 2q37 and 17q11.2-q22, which have previously been associated with prostate cancer, were sequenced, and the variants were characterised by conducting an association study in a larger set of individuals.

In the second study, individuals with breast cancer and/or ovarian cancer who are not known to carry BRCA1/2 germline variants were sequenced using whole exome sequencing in order to discover candidate genes associated with cancer susceptibility.

In the third study, Finnish and Swedish individuals with lethal prostate cancer were sequenced using whole exome sequencing. They were compared against population controls and against cases that were not deemed lethal based on the aggressiveness of the disease, in order to uncover genes associated with the extremely aggressive form of the disease.

The second aim was to extend the established framework for integrating data from several NGS applications to uncover the role of both genetics and epigenetics in cancer development. This extended framework was applied in two studies.

Firstly, the framework was applied in the first study to characterise the regulatory potential of the non-coding variants located in 2q37 and 17q11.2-q22. Secondly, the framework was applied to analyse and integrate RNA-seq and DNase-seq data in order to study the BMP4 response in two breast cancer cell lines. The overall aim of this study was to gain insight into how epigenetic factors and transcriptional regulators mediate the effects of the BMP4 stimulus.

By utilising the developed framework, low to moderate risk variants significantly associated with prostate cancer were discovered in HDAC4 and ZNF652. Moreover, the individuals with breast and/or ovarian cancer were found to have an enriched number of pathogenic variants in ATM, MYC, PLAU, RAD1 and RRM2B, suggesting that these genes may be associated with cancer susceptibility.

Finally, the framework discovered variants likely to be associated with extremely aggressive prostate cancer. Comparison of the carrier rates of these variants revealed that, among the Finnish and Swedish populations, ATM and CHEK2 seemed to be strongly associated with extremely aggressive prostate cancer.

Interestingly, BRCA2, which has been shown in previous studies to have the strongest association with aggressive prostate cancer, did not harbour likely pathogenic variants among the lethal cases.

The extended framework revealed non-coding variants that are associated with gene expression (eQTL variants). One of these targeted TBKBP1, which was also shown to be differentially expressed between affected individuals and controls. Moreover, this variant has been reported as an eQTL in previous studies.

Another putative eQTL variant was found to be associated with ZNF652, a gene that was also shown to be associated with prostate cancer based on coding variants observed in the same cohort.

Moreover, the use of the extended framework in the integration of epigenetic and transcriptomic data revealed that BMP4 response genes are dependent on the epigenetic profile and that the transcription factors MBD2, CBFB and HIF1A have a role in the regulation of some of these target genes. Furthermore, BMP4 stimulation was shown to cause varied responses in the epigenetic profiles of the different breast cancer cell lines, which is consistent with findings related to the behaviour induced by the stimulation.

In conclusion, the framework developed for analysis of germline variant data identified novel candidate genes as well as variants associated with hereditary prostate, breast and ovarian cancer. The extended framework identified eQTLs which might be associated with the development of prostate cancer. Moreover, epigenetic alterations as well as transcription factors involved in cancer progression were characterised utilising the developed framework.

The doctoral dissertation of M.Sc. Tommi Rantapero in the field of bioinformatics, titled Developing a Framework for Analysis of Next-Generation Sequencing Data in Cancer Genetics and Epigenetics, will be publicly examined in the Faculty of Medicine and Health Technology at Tampere University on Friday 16 October starting at 12 o'clock in the Arvo building, auditorium F115, Arvo Ylpön katu 34. The Opponent will be Docent Esa Pitkänen from the University of Helsinki. The Custos will be Professor Matti Nykter.

Because of the coronavirus pandemic, the event can be followed via remote connection

The dissertation is available online at