Entity resolution is a very common task in Big Data processing, where different entity profiles, usually described under different schemas, are mapped to the same real-world object. Beyond the deduplication and cleaning problems that appear in traditional data integration, such as data warehouses, entity resolution is a prerequisite for many Web applications, posing several challenges due to the volume and variety of the data collections. In general, entity resolution constitutes an inherently quadratic task; given an entity collection, each entity profile must be compared to all others.
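To make the quadratic cost concrete, the following minimal sketch compares every profile with every other one. The toy profiles, the token-based Jaccard similarity, and the 0.3 threshold are illustrative assumptions, not part of the course material.

```python
from itertools import combinations


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two token collections."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_resolution(profiles, threshold=0.3):
    """Compare all pairs of profiles; return (matches, number of comparisons)."""
    comparisons = 0
    matches = []
    for (id_a, text_a), (id_b, text_b) in combinations(profiles.items(), 2):
        comparisons += 1
        similarity = jaccard(text_a.lower().split(), text_b.lower().split())
        if similarity >= threshold:
            matches.append((id_a, id_b))
    return matches, comparisons


# Hypothetical entity profiles, flattened to plain text for simplicity.
profiles = {
    "p1": "Alan Turing London mathematician",
    "p2": "A. Turing mathematician London",
    "p3": "Grace Hopper New York",
    "p4": "Hopper Grace computer scientist",
}

matches, comparisons = naive_resolution(profiles)
print(matches)      # pairs judged to describe the same real-world object
print(comparisons)  # n * (n - 1) / 2 comparisons for n profiles
```

With n profiles the loop performs n(n-1)/2 comparisons, so doubling the collection roughly quadruples the work.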
In this course, we will focus on algorithmic approaches for entity resolution in the Web of data. We will study approaches that aim to reduce the set of possible comparisons to be performed between data collections, like blocking and meta-blocking, and approaches that aim to minimize the number of missed matches via an iterative entity resolution process that exploits any intermediate results of blocking and matching in order to discover new candidate description pairs for resolution. Moreover, we will discuss works on progressive entity resolution, which attempt to discover as many matches as possible given a limited computing budget, by estimating the matching likelihood of yet unresolved descriptions, based on the matches found so far.
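As a rough illustration of how blocking and meta-blocking cut down comparisons, the sketch below uses token blocking (one block per shared token) followed by a simplified pruning step that weights each candidate pair by its number of common blocks and drops pairs below the average weight. The profiles, function names, and the choice of weighting and pruning scheme are assumptions made for this example only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def token_blocking(profiles):
    """Place each profile in one block per token; keep blocks with 2+ profiles."""
    blocks = defaultdict(set)
    for pid, text in profiles.items():
        for token in set(text.lower().split()):
            blocks[token].add(pid)
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def common_block_weights(blocks):
    """Weight each candidate pair by the number of blocks the two profiles share."""
    weights = Counter()
    for ids in blocks.values():
        weights.update(combinations(sorted(ids), 2))
    return weights


def prune_below_average(weights):
    """Simplified meta-blocking step: keep pairs whose weight is at least the mean."""
    if not weights:
        return set()
    average = sum(weights.values()) / len(weights)
    return {pair for pair, weight in weights.items() if weight >= average}


# Hypothetical entity profiles, flattened to plain text for simplicity.
profiles = {
    "p1": "Alan Turing London mathematician",
    "p2": "A. Turing mathematician London",
    "p3": "Grace Hopper New York",
    "p4": "Hopper Grace computer scientist",
    "p5": "Ada Lovelace London",
}

blocks = token_blocking(profiles)
weights = common_block_weights(blocks)
kept = prune_below_average(weights)

total_pairs = len(profiles) * (len(profiles) - 1) // 2
print(total_pairs)    # all possible comparisons without blocking
print(len(weights))   # candidate pairs after token blocking
print(sorted(kept))   # candidate pairs surviving the pruning step
```

In this toy collection, blocking alone still suggests comparing p5 with p1 and p2 because they share the token "london"; the pruning step discards those weakly supported pairs, at the risk of missing true matches that share few tokens.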
Learning outcomes
After completing the course, the student is expected to:
- know the basic concepts and techniques for big data entity resolution, including blocking and meta-blocking techniques, and techniques for iterative and progressive entity resolution,
- be able to handle contemporary research issues and problems on big data entity resolution, and
- solve real-world problems.
Contents
Blocking techniques, meta-blocking techniques, techniques for iterative entity resolution, techniques for progressive entity resolution
Teaching language
English
Modes of study
Option 1
Available for:
Degree Programme Students
Other Students
Open University Students
Doctoral Students
Exchange Students
Participation in course work (in English)
Lectures, exercises, student presentations in class, programming project. Participation in course work.