Data integration (or data interoperability) remains a major problem in industry and creates considerable overhead in digitalization projects. Relevant examples of data integration problems include entity matching (i.e., linking records that refer to the same real-world entity) and schema alignment (i.e., aligning types and attributes from multiple sources).
Tailor-made solutions are currently implemented and deployed for particular problems, projects or organisations. These solutions are expensive to develop and maintain, and they do not generalize to a larger range of problems and projects.
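To make the two problems concrete, the following is a minimal sketch that treats both entity matching and schema alignment as nearest-neighbour search over string similarity. It is only an illustration under simplifying assumptions, not the deep learning approach the project targets: the helper names (similarity, best_matches), the 0.6 threshold, and all record and attribute names are hypothetical, and only the Python standard library (difflib) is used.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a string similarity in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_matches(left: list[str], right: list[str], threshold: float = 0.6) -> dict[str, str]:
    """Map each item in `left` to its most similar item in `right`.

    Items whose best similarity falls below `threshold` (an assumed cut-off)
    are left unmatched.
    """
    matches = {}
    for item in left:
        candidate = max(right, key=lambda other: similarity(item, other))
        if similarity(item, candidate) >= threshold:
            matches[item] = candidate
    return matches


# Entity matching: link (made-up) company records from source A to source B.
records_a = ["Acme Energy AS", "Nordic Maritime Ltd"]
records_b = ["nordic maritime ltd ", "ACME ENERGY AS", "Fjord Biotech"]
entity_links = best_matches(records_a, records_b)

# Schema alignment: align attribute names of two hypothetical schemas.
schema_a = ["customer_name", "postal_code"]
schema_b = ["postcode", "customername", "country"]
attribute_links = best_matches(schema_a, schema_b)

print(entity_links)
print(attribute_links)
```

Pure string similarity of this kind breaks down once records use synonyms, abbreviations or different languages, which is one reason learned representations and semantic technologies are of interest for these problems.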
Large companies such as Amazon, Google, Apple and IBM are applying deep learning algorithms and semantic technologies (e.g., knowledge graphs, ontologies and graph databases) to enable a higher degree of automation for data integration problems. However, these techniques are still not known or adopted by many companies and public institutions.
The goal of this project is to explore the applicability of deep learning techniques and semantic technologies to solve data integration problems in real national or European projects aiming to create digital twins for different domains such as Energy, Manufacturing, Maritime and Biology.
Candidates should have a good understanding of deep learning techniques, data engineering, and semantic technologies. Moreover, some experience programming in Python is recommended, with libraries for data processing (e.g., Pandas, SQLAlchemy), data analytics (e.g., NumPy, Scikit-learn, TensorFlow, PyTorch) and data visualisation (e.g., Matplotlib, Seaborn).
Some relevant courses at UiO: TEK5040, IN3060, IN2090, IN5800 and IN3110.
Li Y., et al., 2020. Deep entity matching with pre-trained language models.
Li Y., et al., 2021. Deep entity matching: Challenges and opportunities.
Tan W.C., 2021. Deep data integration.
Weikum G., et al., 2021. Machine knowledge: Creation and curation of comprehensive knowledge bases.