AliceVision: GPU programming for real-time depth estimation

Mixed Reality merges virtual elements naturally into people's perception of the real world. We have come to consider Mixed Reality quite normal in movies, where small film sets are extensively extended and modified using hundreds of working hours and large compute clusters. How can we achieve comparable quality in Mixed Reality at real-time speeds?


All of our work on Mixed Reality (MR) is associated with the AliceVision Association, which you can find on GitHub. We make it affordable to create 3D environments from photos or videos, and to track the position of cameras while they move through a reconstructed environment.

One big challenge in rendering virtual elements into a real-world view to achieve MR is understanding, in real time, the depth of the scene as seen from the user's point of view.

Let's say that the end user is wearing an augmented reality (AR) headset like the HoloLens. The virtual elements should appear at a natural position and at their natural size in the user's field of view. In principle, AR applications on mobile phones have already solved this: they track the device's movement and are fairly good at keeping a virtual object at the same location in real space. They do, however, lack an important capability: they cannot determine whether a real-world object should be in front of or behind the virtual object.

For this, we need very fast segmentation of objects in the real world to determine which pixels of the MR overlay should appear in front of the real-world view and which should be hidden. We use the Insta360 in 3D mode as our image source.
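
Once a depth value is known for every pixel of the real-world view, deciding which overlay pixels stay visible reduces to a per-pixel depth comparison, much like a z-buffer. The minimal sketch below assumes hypothetical inputs that are not part of AliceVision: the camera image and its estimated depth map, plus the rendered virtual layer with its own depth (None where nothing virtual is drawn), all as equally sized nested lists. The function name `composite_with_occlusion` is illustrative only.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # RGB triple


def composite_with_occlusion(
    real_rgb: List[List[Pixel]],
    real_depth: List[List[float]],
    virtual_rgb: List[List[Pixel]],
    virtual_depth: List[List[Optional[float]]],
) -> List[List[Pixel]]:
    """Overlay virtual pixels only where they are closer to the viewer than the real scene.

    Depths are distances from the camera (smaller means closer). A virtual depth
    of None means that no virtual content covers that pixel.
    """
    height = len(real_rgb)
    width = len(real_rgb[0]) if height else 0
    out: List[List[Pixel]] = []
    for y in range(height):
        row: List[Pixel] = []
        for x in range(width):
            vd = virtual_depth[y][x]
            # Show the virtual pixel only if it exists and lies in front of the real surface.
            if vd is not None and vd < real_depth[y][x]:
                row.append(virtual_rgb[y][x])
            else:
                # Otherwise the real object occludes it (or there is nothing virtual here).
                row.append(real_rgb[y][x])
        out.append(row)
    return out
```

Every pixel is decided independently, so on a GPU this maps naturally onto one thread per pixel. One common way to get the same effect in a renderer is to preload the estimated real-world depth into the depth buffer and let the hardware depth test hide the occluded virtual fragments.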

We offer several theses that explore different ideas for achieving such real-time depth detection. This one uses the raw power of GPUs to perform depth estimation by point matching.
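
Point matching yields depth through triangulation. For a rectified stereo pair (both cameras facing the same direction, with corresponding points on the same image row), a point seen at column x_left in the left image and x_right in the right image has disparity d = x_left - x_right, and its depth is Z = f * B / d, where f is the focal length in pixels and B is the baseline between the camera centres. The sketch assumes such a rectified pair; a real camera's lens distortion would first have to be calibrated and corrected, which is not shown. The numbers in the example are made up for illustration.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> float:
    """Triangulate the depth (metres) of a point matched in a rectified stereo pair.

    x_left, x_right: column of the matched point in the left and right image (pixels).
    focal_px: focal length expressed in pixels.
    baseline_m: distance between the two camera centres in metres.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # Zero disparity means a point at infinity; negative disparity means a bad match.
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical values: 700 px focal length, 6.5 cm baseline, 35 px disparity.
print(depth_from_disparity(400.0, 365.0, 700.0, 0.065))  # about 1.3 (metres)
```

Because depth is inversely proportional to disparity, a one-pixel matching error costs far more accuracy for distant points than for nearby ones, which is why sub-pixel precision in the matched positions matters.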

The Scale-Invariant Feature Transform (SIFT) is widely used in computer vision and in a variety of multimedia tasks. First published by David Lowe in 1999 and refined in his 2004 paper, it still compares favourably with many newer algorithms. It finds, with good precision, the pixels in different images that represent the same point in space, which makes it possible to estimate depth across the whole field of view.
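
SIFT describes each keypoint with a descriptor vector (128 values in the standard version), and a common way to pair keypoints between two images is nearest-neighbour search on these descriptors combined with the ratio test from Lowe's 2004 paper: a match is kept only if the closest descriptor is clearly closer than the second-closest. The brute-force sketch below is not AliceVision's matcher; it works on plain Python lists of any descriptor length, the function name is illustrative, and the default threshold of 0.8 is the value Lowe suggested.

```python
import math
from typing import List, Sequence, Tuple


def match_descriptors(desc_a: Sequence[Sequence[float]],
                      desc_b: Sequence[Sequence[float]],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    Returns (index_in_a, index_in_b) pairs for descriptors in A whose nearest
    neighbour in B is clearly closer than the second-nearest one.
    """
    matches: List[Tuple[int, int]] = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, da in enumerate(desc_a):
        # Track the best and second-best Euclidean distances into B.
        best_j, best_d, second_d = -1, math.inf, math.inf
        for j, db in enumerate(desc_b):
            d = math.dist(da, db)
            if d < best_d:
                best_j, best_d, second_d = j, d, best_d
            elif d < second_d:
                second_d = d
        # Keep the match only if it is unambiguous.
        if best_d < ratio * second_d:
            matches.append((i, best_j))
    return matches
```

Brute force costs one distance computation per descriptor pair, but every query descriptor is processed independently of the others, which is exactly the kind of workload that parallelises well on a GPU.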

The drawback of SIFT has always been that it is complex and rather slow. We tackle this with parallel programming on the GPU. This thesis should build on our existing code and add high-speed depth matching of the points found in each of the two cameras' images.
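
For two cameras in a rectified stereo setup, the search for a match does not need to cover the whole second image: a true match lies on (almost) the same row and has positive disparity. The sketch below, assuming a rectified pair and keypoints given as hypothetical (x, y, descriptor) tuples, buckets the right-image keypoints by row, matches each left keypoint only against nearby rows with a ratio test, and triangulates the depth of each accepted match. `stereo_match_depths` and `row_tol` are illustrative names, not part of the existing code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A keypoint: (x, y, descriptor) in pixel coordinates of a rectified image.
Keypoint = Tuple[float, float, Sequence[float]]


def stereo_match_depths(left: List[Keypoint], right: List[Keypoint],
                        focal_px: float, baseline_m: float,
                        row_tol: int = 1, ratio: float = 0.8
                        ) -> List[Tuple[int, int, float]]:
    """Match left keypoints to right keypoints along epipolar rows and triangulate depth.

    Returns (left_index, right_index, depth_m) triples.
    """
    # Bucket right keypoints by rounded row so each query only scans nearby rows.
    rows: Dict[int, List[int]] = defaultdict(list)
    for j, (_, y, _) in enumerate(right):
        rows[round(y)].append(j)

    results: List[Tuple[int, int, float]] = []
    for i, (xl, yl, dl) in enumerate(left):
        best_j, best_d, second_d = -1, math.inf, math.inf
        for r in range(round(yl) - row_tol, round(yl) + row_tol + 1):
            for j in rows.get(r, []):
                xr, _, dr = right[j]
                if xl - xr <= 0:
                    continue  # non-positive disparity cannot be a point in front of the cameras
                d = math.dist(dl, dr)
                if d < best_d:
                    best_j, best_d, second_d = j, d, best_d
                elif d < second_d:
                    second_d = d
        # Ratio test; a lone candidate (second_d is still infinite) is accepted.
        if best_j >= 0 and best_d < ratio * second_d:
            disparity = xl - right[best_j][0]
            results.append((i, best_j, focal_px * baseline_m / disparity))
    return results
```

Each left keypoint is handled independently, so in a GPU port one could, for example, assign one thread per left keypoint and replace the row buckets with a row-sorted array plus per-row offsets. That layout is a suggestion, not a description of the existing AliceVision code.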


Published 14 Sep 2020 10:18 - Last modified 14 Sep 2020 10:18

