Data Science related MSc topics

The types of MSc theses that I could offer fall within three categories:

  1. Project specific: the focus here is on the management/analytics of domain-specific data (e.g. company information); the thesis will work with specific datasets and try to obtain insights of various kinds from them
  2. DataGraft related: the focus here will be on developing/validating generic technology, mostly scalable big data solutions for working with large datasets; tasks will be more software design/engineering in nature
  3. Data exchange related: the focus here will be on developing/validating technology specifically for data exchange; tasks will be technologically exploratory and involve some design/engineering work


Project-specific topics can be related to the following projects:

  • euBusinessGraph: a project related to company data. A number of thesis topics could be defined as part of this project, for example:

  1. Matching companies appearing in various sources (the challenge here would be to come up with a mechanism to match fuzzy/ambiguous representations of companies)
  2. Onboarding/population of the graph with data (the challenge here would relate to big data systems and to mapping data to the core ontology)
  3. Search / faceted search / visualization for companies in the graph (the challenge here would be to identify use cases for company data from the graph and to find proper ways to search/browse/visualize the data)
  4. Data analytics on the data in the graph (the challenge here would be to find analytics tasks to perform on the data, e.g. graph algorithms such as centralities, data quality analysis, etc., and implement them on the company data)
  5. Data marketplace for company data (the challenge here would be to find supply/demand for company data and to implement an ecosystem for trading company data)
  • EW-Shopp: a project for data analytics in the eCommerce domain. It is similar to euBusinessGraph, the difference being that it deals with product data (vs company data in euBusinessGraph). Many of the thesis topics mentioned above for euBusinessGraph could apply here to product data as well (though access to data is more restrictive in this case – you can see info about datasets at
  • TheyBuyForYou: a project related to procurement data (also linked to company information). MSc topics are similar to those for euBusinessGraph, with a focus on public contracts with companies
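Topic 1 in the euBusinessGraph list (matching fuzzy/ambiguous representations of companies) could start from a simple string-similarity baseline before moving to anything more sophisticated. The sketch below is only an illustration under assumptions: the legal-suffix list, the 0.85 threshold and the example names are hypothetical, and it uses Python's standard `difflib.SequenceMatcher` rather than any method prescribed by the project.

```python
import difflib
import re
from typing import List, Tuple

# Hypothetical list of legal-form suffixes to ignore when comparing names;
# a real matcher would need a curated, jurisdiction-aware list.
LEGAL_SUFFIXES = {"as", "asa", "ab", "bv", "gmbh", "inc", "limited", "ltd", "sa"}


def normalize(name: str) -> str:
    """Lower-case the name, split it into word tokens (dropping punctuation)
    and remove legal-form suffixes."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1] between two normalized names."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_companies(
    left: List[str], right: List[str], threshold: float = 0.85
) -> List[Tuple[str, str, float]]:
    """For each name in `left`, keep its best candidate in `right`
    if the similarity reaches `threshold`."""
    matches = []
    for a in left:
        scored = [(similarity(a, b), b) for b in right]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            matches.append((a, best, round(score, 2)))
    return matches


if __name__ == "__main__":
    left = ["Acme Software AS", "Nordic Fish ASA"]
    right = ["ACME Software Ltd.", "Nordic Fisheries", "Blue Ocean GmbH"]
    print(match_companies(left, right))
```

With these inputs the legal-form suffixes are ignored, so "Acme Software AS" matches "ACME Software Ltd." with score 1.0, while "Nordic Fish ASA" vs "Nordic Fisheries" scores about 0.81 and falls below the threshold. Whether those two are the same company is exactly the kind of ambiguity that string similarity alone cannot settle, which is where the thesis challenge begins.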


DataGraft-related topics are centered around DataGraft – a tool for data cleaning / transformation / knowledge graph generation which is used in the above-mentioned projects:

A number of thesis topics could be defined as part of this stream; they would involve more generic components (not specific to a domain such as company information), and the focus would be more on (big data) engineering, for example:

  1. Design and implementation of scalable backend for data transformations (big data system for executing transformations on large amounts of data)
  2. Intelligent data cleaning mechanisms (how to make it easier and more intuitive for people to apply transformations to data)
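As a rough illustration of what topic 1 involves, a transformation can be modelled as an ordered list of row-level steps applied to the data chunk by chunk; because each chunk is processed independently, the same structure could in principle be distributed across workers by a big data engine. This is a hypothetical sketch, not DataGraft's actual transformation model: the `Row`/`Step` types, the example cleaning steps and the chunk size are assumptions made for the example, and the driver here runs sequentially.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]
Step = Callable[[Row], Row]


def strip_whitespace(row: Row) -> Row:
    """Example cleaning step: trim surrounding whitespace in every cell."""
    return {key: value.strip() for key, value in row.items()}


def uppercase_country(row: Row) -> Row:
    """Example cleaning step: normalize a (hypothetical) 'country' column."""
    return {**row, "country": row.get("country", "").upper()}


def chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Split a (possibly very long) stream of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def apply_steps(batch: List[Row], steps: List[Step]) -> List[Row]:
    """Run every step, in order, over one chunk. Chunks do not depend on each
    other, which is what would let a distributed engine process them in parallel."""
    out = []
    for row in batch:
        for step in steps:
            row = step(row)
        out.append(row)
    return out


def run_pipeline(rows: Iterable[Row], steps: List[Step], chunk_size: int = 1000) -> Iterator[Row]:
    """Sequential driver: materializes only one chunk of input at a time."""
    for batch in chunks(rows, chunk_size):
        yield from apply_steps(batch, steps)


if __name__ == "__main__":
    data = [{"name": " Acme ", "country": "no"}, {"name": "Beta", "country": " se "}]
    for row in run_pipeline(data, [strip_whitespace, uppercase_country], chunk_size=1):
        print(row)
```

Keeping steps as plain functions over rows is one way to make a transformation both easy to compose in a user-facing tool (topic 2) and easy to ship to remote workers (topic 1); a thesis would need to evaluate this against what DataGraft's transformations actually require.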


Data exchange related topics focus on the concept of trusted data exchange; here I can give you a couple of pointers:

Thesis topics could be related to testing and evaluating technologies for data exchange/transactions, possibly developing new technologies or extending emerging ones, since the area is quite new (a combination of technologies for data storage, cryptography and payment would be needed here).
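To make the cryptography part of trusted data exchange concrete, one building block a thesis might evaluate is letting a consumer check that the data received is exactly what the provider described. The sketch below uses only Python's standard `hashlib`, `hmac` and `json` modules; the manifest fields and the shared secret key are assumptions for illustration. A shared-key HMAC is the simplest stand-in: an exchange between parties who do not share secrets would more likely use public-key signatures, and storage and payment are left out entirely.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    """Serialize the manifest deterministically so both sides hash the same bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_manifest(dataset: bytes, provider: str, key: bytes) -> dict:
    """Provider side: describe the dataset by its SHA-256 digest and
    authenticate that description with an HMAC-SHA256 tag."""
    body = {"provider": provider, "sha256": hashlib.sha256(dataset).hexdigest()}
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "tag": tag}


def verify(dataset: bytes, manifest: dict, key: bytes) -> bool:
    """Consumer side: accept only if the manifest is authentic and
    describes exactly the bytes that were received."""
    body = {k: v for k, v in manifest.items() if k != "tag"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest.get("tag", "")):
        return False
    return hashlib.sha256(dataset).hexdigest() == body.get("sha256")


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"
    data = b"orgnr,name\n123,Acme\n"
    manifest = make_manifest(data, "provider-A", key)
    print(verify(data, manifest, key))                # True
    print(verify(data + b"tampered", manifest, key))  # False
```

Separating the small manifest from the (possibly large) dataset means the manifest alone could be recorded or traded, while the data itself is stored and transferred elsewhere.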



Keywords: data science, data sharing, data exchange, data management, knowledge graphs
