Predictive, Intelligent Cleaning and Transformation of Data
Data analysis is becoming increasingly important in today's organizations, and incorrect or inconsistent data can distort analysis and compromise the benefits of data-driven approaches. In this context, data preparation activities such as data cleaning and transformation can improve data quality. Data cleaning and transformation are most often domain-specific problems that depend on the statistical properties, semantics and structure of the data. Traditional data preparation approaches require the user to manually specify how the data will be cleaned and transformed.
A solution to this time-consuming and costly process could be an intelligent approach that automatically suggests relevant and effective data cleaning and transformation actions to the user. The system predicts the next interaction, and the user judges whether the suggested step is relevant or needs to be modified.
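The suggest-then-confirm loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Suggestion`, `suggest_actions` and `clean_interactively` are hypothetical, and two hand-written heuristics stand in for the learned predictor that the thesis would actually investigate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Column = List[Optional[str]]


@dataclass
class Suggestion:
    """A proposed cleaning step (hypothetical structure, for illustration only)."""
    name: str
    reason: str
    apply: Callable[[Column], Column]


def suggest_actions(column: Column) -> List[Suggestion]:
    """Propose cleaning steps for a column of strings.

    Simple heuristics stand in here for a trained model.
    """
    present = [v for v in column if v is not None]
    suggestions = []
    if any(v != v.strip() for v in present):
        suggestions.append(Suggestion(
            "trim_whitespace",
            "some values have leading or trailing spaces",
            lambda col: [v.strip() if v is not None else None for v in col],
        ))
    # Values that differ only by letter case are likely the same entity.
    if len({v.strip().lower() for v in present}) < len({v.strip() for v in present}):
        suggestions.append(Suggestion(
            "lowercase",
            "some values differ only by letter case",
            lambda col: [v.lower() if v is not None else None for v in col],
        ))
    return suggestions


def clean_interactively(column: Column,
                        accept: Callable[[Suggestion], bool]) -> Column:
    """Repeatedly predict the next step and apply it only if the user accepts it."""
    rejected = set()
    while True:
        candidates = [s for s in suggest_actions(column) if s.name not in rejected]
        if not candidates:
            return column
        next_step = candidates[0]
        if accept(next_step):
            column = next_step.apply(column)
        else:
            rejected.add(next_step.name)


cities = [" Oslo", "oslo ", "Bergen", None]
cleaned = clean_interactively(cities, accept=lambda s: True)
trimmed_only = clean_interactively(cities, accept=lambda s: s.name == "trim_whitespace")
```

In this sketch the `accept` callback plays the role of the user: accepting every suggestion trims and lowercases the values, while rejecting `lowercase` leaves the original casing in place. A real prototype would replace the heuristics with a predictive model and the callback with an interactive interface.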
What is the context of the thesis?
The thesis will explore data cleaning and transformation approaches to identify the applicability of various solutions for intelligent, automated data cleaning and transformation. This work includes exploring state-of-the-art machine learning techniques that can be applied to improve data quality in different domains (e.g. finance, air quality measurements, or real estate).
The thesis will address some or all of the following questions:
- Which approaches are applicable within the context of predictive, intelligent data cleaning and transformation?
- How can predictive, intelligent approaches to data cleaning and transformation be applied in different domains?
- How can predictive, intelligent approaches to data cleaning and transformation be applied with different data sources (e.g. stream or batch data)?
- How can the process of data cleaning and transformation be automated?
What are the practical aspects of the thesis?
The practical side of the thesis will involve the development of a prototype for predictive, intelligent data cleaning and transformation. This will include the development of a web service for a selected subset of data cleaning and transformation functions, domains, and data sources.
Who is the thesis for?
Students who are passionate about intelligent approaches to development, machine learning and web-service technologies, and who want to contribute to the state of the art in data cleaning and transformation.