SIRIUS has an important role to play in data science
Successful data science requires a mixture of statistics, computer science and domain knowledge. In DataScience@UiO, we bridge the gap between the first two by bringing statisticians and computer scientists from the Faculty of Mathematics and Natural Sciences together, and the gap to domain knowledge by bringing in industry-heavy research centres such as BigInsight and SIRIUS. In this blog post, I want to explain why I believe SIRIUS has a very important role to play in the current and future data science engagement at the University of Oslo.
The SIRIUS Centre for Scalable Data Access in the Oil and Gas Domain brings together researchers with partners from operating companies, service vendors and the IT industry, who provide the real-world problems that the research aims to tackle. The researchers come from several IT disciplines, including high-performance and cloud computing, database technology, semantic technologies, formal methods and natural language processing. These researchers work together in cross-disciplinary projects to solve problems that are pressing for data scientists:
Scalable data access
Research-based methods and technologies provide new and better ways to access large and diverse amounts of data, spread across different data sources, in a structured manner. This is imperative for doing data science, especially when many data scientists claim to spend up to 80 % of their work day on data acquisition and data wrangling.
Doing data science on big data requires significant computing power. SIRIUS does research on high-performance computing and cloud computing, focusing on how specialized hardware, like processors and switches, can increase the performance of data access and, hence, analytics. Research is also being done on how to automatically optimise resource utilisation in large-scale cloud computing. Both provide a boost for advanced analytics and machine learning on ever-growing data sets.
Natural language processing
A large part of all information is stored as unstructured data in the form of free text in documents. Research on semantic processing of unstructured data, especially documents and free text, provides methods and technology for structured information retrieval and machine learning on top of document-based data.
Knowledge representation provides tools and techniques for deterministic reasoning about data. This complements statistical inference, which provides a basis for probabilistic reasoning about data. Where statistical inference extracts its knowledge about the world from the data alone, knowledge representation uses ontologies, which are formal descriptions of the world, and allows us to infer facts about and from the data from the viewpoint of the ontology. In SIRIUS, there is ongoing research trying to combine the two approaches to gain deeper insights from the data than either approach can achieve on its own.
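To make the idea of deterministic reasoning from an ontology a little more concrete, here is a minimal sketch in Python. It is only a toy illustration, not how SIRIUS or any particular reasoner works: the class names, the individual well_42 and the helper functions are all hypothetical, and the "ontology" is reduced to a simple subclass hierarchy.

```python
# Hypothetical toy ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "ProductionWell": "Well",
    "Well": "Facility",
}


def superclasses(cls):
    """Return every superclass of cls by following the subclass links upwards."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def infer_types(asserted):
    """Extend asserted (individual, class) facts with every class the ontology implies."""
    inferred = set(asserted)
    for individual, cls in asserted:
        for parent in superclasses(cls):
            inferred.add((individual, parent))
    return inferred


# The data only states that well_42 is a ProductionWell (an assumed example fact).
facts = {("well_42", "ProductionWell")}
print(sorted(infer_types(facts)))
```

The data never says that well_42 is a Facility, yet that fact follows with certainty from the ontology. This is the contrast with statistical inference, where a conclusion comes with a probability rather than a proof.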
Execution modeling and formal methods
The level of automation in the industry is expected to increase dramatically over the years to come, in the wake of better and better machine learning and AI. However, both neural networks and statistical learning suffer from a lack of determinism, which makes it challenging to rely on them in critical applications. SIRIUS is conducting research on formal methods and process simulation, offering a layer of safety by proving that no suggestion ever made by the AI will lead to a dangerous situation, and thereby lowering the threshold for utilisation of AI in critical domains.
All of these fields are central building blocks in the data science value chain. Together with statisticians from the Department of Mathematics and BigInsight, they position DataScience@UiO perfectly for doing novel and groundbreaking research within the field of data science.
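As a rough illustration of what such a safety layer means, the sketch below exhaustively explores a tiny, made-up process model and checks that no permitted suggestion can ever reach an unsafe state. Everything in it is an assumption for the sake of the example: the tank, its levels, the guard and the function names are hypothetical, and real formal methods rely on far more capable model checkers and theorem provers than a breadth-first search.

```python
from collections import deque

UNSAFE_LEVEL = 4  # hypothetical: the tank overflows at this level


def permitted_actions(level):
    """Actions the (hypothetical) guard allows the AI to suggest at this level."""
    actions = []
    if level > 0:
        actions.append("drain")
    if level < UNSAFE_LEVEL - 1:
        actions.append("fill")
    return actions


def step(level, action):
    """Apply one action to the tank level."""
    return level + 1 if action == "fill" else level - 1


def reachable_states(initial):
    """Breadth-first exploration of every state reachable under the guard."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        level = queue.popleft()
        for action in permitted_actions(level):
            nxt = step(level, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_safe(initial):
    """True if the unsafe level can never be reached from the initial state."""
    return UNSAFE_LEVEL not in reachable_states(initial)


print(is_safe(0))
```

Because the search covers every reachable state, a positive answer is a guarantee for this small model rather than an estimate. That is the kind of assurance that makes it easier to accept AI suggestions in critical settings.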