Analysing big simulation data on the fly
The overall objective of this master's project is to use parallel processing and machine learning for in situ, real-time analysis of the big data produced by numerical simulations.
Time history of electrical potential propagation within heart tissue. Precisely locating and categorising the wave front is an important data-analysis task for such simulations.
Computer simulation has become an indispensable tool in science. Capturing the intricate mechanisms and details of a research subject often requires very high-resolution simulations, which must be executed on large-scale parallel computers. Consequently, the generated simulation data can be huge. The traditional way of analysing such data is to first store them in files, which are then loaded and processed by a separate post-processing procedure. This "post-processing" approach is, however, ill-suited to the big-data age in science.
Because of the limited capacity and speed of today's file systems, the traditional post-processing approach to simulation data analysis can easily become a serious bottleneck. This master's project investigates analysing simulation data "in situ", that is, within the same computer program, as soon as the data are generated. Such an "on-the-fly" approach avoids storing and loading huge data files. Key research ingredients of the project include parallel programming (at various levels), automatic identification of features of interest, data compression, and the use of secondary memory systems (such as non-volatile memory). Machine learning will also be explored to speed up feature identification. Numerical simulators of the human heart will serve as the research test cases for this project.
The candidate will learn about parallel programming, the use of secondary memory systems, simple techniques of data subsampling and compression (to drastically reduce data storage needs), and basic methods of machine learning. In particular, the candidate will research strategies for efficiently identifying features of interest within huge 3D volumes of time-dependent data. The candidate will also have the chance to become familiar with important scientific simulations in the domain of computational cardiology. Upon successful completion of the master's project, the candidate will possess cutting-edge knowledge and skills for real-time analysis of "big simulation data".
The candidate is expected to be skilful in technical programming (experience with parallel programming is not required, but will allow a quicker start to the project). The candidate is also expected to have some basic knowledge of numerical methods. Above all, the candidate must be hard-working and eager to acquire new knowledge and skills.