Using Machine Learning for Data Curation
Data is a core part of every business, so the quality of the data that is gathered, stored, and consumed is crucial. Nevertheless, data is often imperfect, for example because of missing, incorrect, or duplicate information. The main purpose of data curation is to ensure that data is reliably retrievable for future research or reuse. Data curation is therefore important for ensuring data quality; however, manual data curation is tedious work. In this thesis, you will investigate Machine Learning approaches to improve the curation of structured (SQL databases) and/or semi-structured (e.g., JSON and CSV) data sources.