This thesis topic is no longer available

Semantic Technologies and Identity Resolution

Information about the same entities (people, towns, songs, etc.) is often represented in different databases. These might be within one corporation or spread over the Web. In many cases, it is difficult to link this information, due to the lack of common identifiers across datasets.

In this thesis, you will investigate automated, heuristic methods for linking datasets, and infrastructure to extract information from a multitude of sources linked in this way.

Identity Resolution (IR) is the process of determining which entities mentioned in one or several data sources are actually identical. Examples include

identification of identical media files on one server,
identification of individuals between a customer database and an employee database,
identification of entities between public information sources like MusicBrainz, Wikipedia, etc.
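
As an illustration of what a heuristic linking method can look like, here is a minimal Python sketch for the customer/employee example. It is not a method prescribed by this topic: the `normalize`, `similarity` and `link_records` helpers, the sample records and the 0.85 threshold are all assumptions made for the example. Names are normalized and compared with the standard library's `difflib.SequenceMatcher`; pairs that score above the threshold become link candidates.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the tokens
    so that "Nordmann, Kari" and "Kari Nordmann" compare as equal."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between the two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """Return (left_id, right_id, score) for every pair scoring at least `threshold`.

    The threshold is an illustrative assumption; a real study would tune it.
    """
    links = []
    for lid, lname in left.items():
        for rid, rname in right.items():
            score = similarity(lname, rname)
            if score >= threshold:
                links.append((lid, rid, round(score, 2)))
    return links


# Hypothetical sample data, invented for this sketch.
customers = {"c1": "Kari Nordmann", "c2": "Ola Hansen"}
employees = {"e7": "Nordmann, Kari", "e9": "Per Olsen"}

links = link_records(customers, employees)
print(links)
```

Comparing every pair is quadratic in the number of records, so this only works for small inputs. Practical IR tools usually add a blocking step that restricts comparisons to plausible candidates.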

From a semantic technology standpoint, IR can be used to produce a dataset of “owl:sameAs” triples. However, existing tools (local and federated query engines) currently make no use of such identity information.
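
To make the gap concrete, the following Python sketch shows one possible way a query layer could take “owl:sameAs” triples into account. It is an assumption-laden illustration, not a description of any existing engine: triples are plain tuples, the prefixed names (`mb:`, `wiki:`, `ex:`) are invented, and `IdentityIndex`, `build_index` and `objects_of` are hypothetical helpers. The identity links are merged into equivalence classes with a union-find structure, and a lookup then matches any member of the subject's class.

```python
SAME_AS = "owl:sameAs"


class IdentityIndex:
    """Union-find over identifiers connected by owl:sameAs links."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's equivalence class."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the classes of a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_index(triples):
    """Collect all owl:sameAs triples into an identity index."""
    index = IdentityIndex()
    for s, p, o in triples:
        if p == SAME_AS:
            index.union(s, o)
    return index


def objects_of(triples, index, subject, predicate):
    """All objects of `predicate` for `subject` or anything identified with it."""
    root = index.find(subject)
    return {o for s, p, o in triples if p == predicate and index.find(s) == root}


# Hypothetical data: two sources describe the same composer under different identifiers.
triples = [
    ("mb:artist/42", SAME_AS, "wiki:Edvard_Grieg"),
    ("mb:artist/42", "ex:born", "1843"),
    ("wiki:Edvard_Grieg", "ex:birthPlace", "Bergen"),
]

index = build_index(triples)
print(objects_of(triples, index, "mb:artist/42", "ex:birthPlace"))
```

A lookup that ignores the identity link would find no birthplace for `mb:artist/42`, because that fact is only stated about `wiki:Edvard_Grieg`. The sketch treats identity as a symmetric, transitive relation and leaves open how a federated engine would distribute such lookups.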

This thesis is about

evaluating existing IR technology for establishing identification data sets.
enhancing existing query answering technology to make use of identity information.

Work may be carried out in connection with the SIRIUS centre (www.sirius-labs.no), giving opportunities to interact with various industry partners.

Keywords: semantic technologies, identity resolution

Published 25 Sep. 2013 15:50 - Last modified 27 Sep. 2018 11:40

Supervisor(s)

Martin Giese, University of Oslo
