AliceVision: How efficient is OpenACC for GPU programming?
Mixed Reality merges virtual elements naturally into people's perception of the real world. We have come to consider it quite normal in movies, where tiny film sets are extended and modified extensively using hundreds of working hours and large compute clusters. Can we provide an open source solution that works on a wide variety of computers, using the OpenACC language extensions to combine good readability with efficient parallelization? And how can OpenACC and CUDA be combined?
Depth mapping stage in the AliceVision Meshroom
A lot of parallel processing work these days relies on large compute clusters to solve data-intensive problems. What we frequently forget is that desktops and laptops are actually quite capable of solving moderately hard problems when we really use their resources.
OpenACC is a language extension that can provide compiler-assisted parallelization on CPUs, GPUs and Xeon Phis, as well as parallelization that uses CPU and GPU at the same time. However, it is meant for ease of use, not for ultimate performance. We want to know the performance price that we pay for this ease of use, using a free and open source 3D reconstruction software as an example.
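To illustrate what compiler-assisted parallelization looks like in practice, here is a minimal sketch of an OpenACC-annotated loop in C. It is a generic scale-and-add example, not code from AliceVision; the function name `saxpy` and its parameters are chosen purely for illustration.

```c
#include <assert.h>

/* Scale-and-add over two arrays: y[i] = a * x[i] + y[i].
   Hypothetical example, not taken from AliceVision. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0);

    /* The directive asks an OpenACC compiler to run the loop iterations
       in parallel. The data clauses request that x is copied to the
       device, and that y is copied to the device and back again. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is ordinary C. A compiler without OpenACC support (or with OpenACC disabled) ignores the directive and runs the loop serially, which is a large part of the readability argument: the same source remains a valid sequential program.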
We have contributed CUDA code to two stages of AliceVision Meshroom to make 3D reconstruction from photos feasible on a single computer. Unfortunately, this ties the very important DepthMapping stage of Meshroom to computers with NVIDIA GPUs.
OpenACC is supported by many researchers and practitioners, and available for various platforms (although no OpenACC compiler supports all parallel programming platforms).
OpenACC appears capable of providing an alternative to our DepthMapping implementation, with an unknown performance cost. An OpenACC version would (at least) lose the ability to use the GPU's texture engine, but other challenges remain to be uncovered.
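As an illustration of the texture engine point, the following sketch shows one way an OpenACC version might replace a hardware-filtered texture fetch with bilinear interpolation in software. It is an assumption about how such a replacement could look, not the DepthMapping implementation: the names `sample_bilinear` and `shift_image`, the row-major `float` image layout, the pixel-unit coordinates, and the clamp-to-edge behaviour are all hypothetical, and the code does not try to reproduce the exact numerical behaviour of the hardware.

```c
#include <assert.h>

/* Software stand-in for a bilinearly filtered texture fetch.
   Hypothetical helper: coordinates are in pixel units and are clamped
   to the image border. "routine seq" lets an OpenACC compiler build a
   device version that each parallel iteration calls sequentially. */
#pragma acc routine seq
static float sample_bilinear(const float *img, int width, int height,
                             float x, float y)
{
    /* Clamp the sample position to the valid pixel range. */
    if (x < 0.0f) x = 0.0f;
    if (y < 0.0f) y = 0.0f;
    if (x > (float)(width - 1))  x = (float)(width - 1);
    if (y > (float)(height - 1)) y = (float)(height - 1);

    /* Integer corners of the surrounding pixel cell. */
    int x0 = (int)x;
    int y0 = (int)y;
    int x1 = (x0 + 1 < width)  ? x0 + 1 : x0;
    int y1 = (y0 + 1 < height) ? y0 + 1 : y0;

    /* Fractional position inside the cell. */
    float fx = x - (float)x0;
    float fy = y - (float)y0;

    /* Interpolate along x on both rows, then along y. */
    float top = img[y0 * width + x0] * (1.0f - fx) + img[y0 * width + x1] * fx;
    float bot = img[y1 * width + x0] * (1.0f - fx) + img[y1 * width + x1] * fx;
    return top * (1.0f - fy) + bot * fy;
}

/* Resample src at positions shifted by (dx, dy) into dst. */
void shift_image(const float *src, float *dst, int width, int height,
                 float dx, float dy)
{
    assert(width > 0 && height > 0);

    #pragma acc parallel loop collapse(2) \
        copyin(src[0:width*height]) copyout(dst[0:width*height])
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            dst[y * width + x] =
                sample_bilinear(src, width, height,
                                (float)x + dx, (float)y + dy);
        }
    }
}
```

Every sample here costs four explicit memory reads plus the interpolation arithmetic, work that the texture unit would otherwise perform in dedicated hardware. Measuring that difference is one part of the performance question raised above.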
The goal of this project is to replace CUDA kernels with an OpenACC formulation throughout the DepthMapping code, and evaluate the performance penalty.