High-speed SIFT matching
Mixed Reality merges virtual elements naturally into people's perception of the real world. We have come to consider it quite normal in movies, where small film sets are extensively extended and modified using hundreds of working hours and large compute clusters. How can we achieve Mixed Reality of comparable quality at real-time speeds?
All of our work on Mixed Reality (MR) is associated with the AliceVision project, which you can find on GitHub: https://alicevision.github.io/.
One big challenge in rendering virtual elements into a real-world view to achieve MR is understanding depth from the user's point of view in real time.
Let's say that the end user is wearing an AR headset like the HoloLens. The virtual elements should be placed into the user's field of view at a natural position and at their natural size. In principle, AR applications on mobile phones have already solved this. They can track the device's movement and are fairly good at keeping a virtual object at the same location in real space. They do, however, lack an important capability: they are not able to figure out whether a real-world object should be in front of or behind the virtual object.
For this, we need a very fast segmentation of objects in the real world, to determine which pixels of the MR overlay should appear in front of the real-world view and which should be hidden.
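Once a depth map of the real scene is available, the per-pixel decision reduces to a depth test between the real scene and the virtual content. The following is a minimal sketch, assuming both depths are given per pixel as distances from the viewer (with None where no virtual object covers the pixel) and using a hypothetical function name `occlusion_mask`; it is not code from AliceVision.

```python
def occlusion_mask(real_depth, virtual_depth):
    """Return a 2-D mask: True where the virtual pixel should be drawn.

    real_depth and virtual_depth are 2-D lists of distances from the viewer
    (e.g. metres); virtual_depth holds None where no virtual object is present.
    """
    mask = []
    for real_row, virt_row in zip(real_depth, virtual_depth):
        row = []
        for d_real, d_virt in zip(real_row, virt_row):
            # Show the virtual pixel only if it exists and is closer
            # than the real surface seen through the same pixel.
            row.append(d_virt is not None and d_virt < d_real)
        mask.append(row)
    return mask


real = [[2.0, 2.0, 0.8],
        [2.0, 0.8, 0.8]]
virtual = [[1.5, None, 1.5],
           [1.5, 1.5, None]]
print(occlusion_mask(real, virtual))  # [[True, False, False], [True, False, False]]
```

The hard part is not this comparison but producing `real_depth` fast enough, which is what the thesis topics address.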
We offer several theses to explore a variety of ideas that could achieve such real-time depth detection:
(1) Depth estimation using dual cameras and block matching
This is the simplest algorithm: it looks for regions in the two cameras' images that are very similar, matches them, and estimates their depth based on the edge pixels. This approach is expected to be very fast, but it does not handle the case where surfaces and virtual objects intersect. The first thesis topic is to find ways of extending this algorithm to a depth understanding that is suitable for MR.
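With rectified cameras, block matching reduces to a one-dimensional search along each image row, and the disparity d of a match converts to depth with the pinhole stereo relation Z = f · B / d (focal length f in pixels, baseline B between the cameras). This is a minimal sketch, assuming rectified images, a sum-of-absolute-differences cost and the hypothetical helper names `block_match_row` and `depth_from_disparity`; it compares whole windows rather than modelling the edge-pixel step, and the parameter values are illustrative only.

```python
def block_match_row(left, right, x, half=1, max_disp=4):
    """Return the disparity of pixel x in `left` by SAD block matching.

    Assumes rectified cameras: the match for left[x] lies on the same row
    of the right image, shifted left by a disparity d >= 0.
    """
    if x - half < 0 or x + half >= len(left):
        return None  # window does not fit around x
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break  # window would leave the right image
        # Sum of absolute differences between the two windows.
        cost = sum(abs(left[x + k] - right[x - d + k])
                   for k in range(-half, half + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d; infinite depth for d == 0."""
    return focal_px * baseline_m / d if d else float("inf")


left = [0, 0, 10, 50, 10, 0, 0, 0]
right = [10, 50, 10, 0, 0, 0, 0, 0]  # same row, shifted left by 2 pixels
d = block_match_row(left, right, x=3)
print(d, depth_from_disparity(d, focal_px=800, baseline_m=0.0625))  # 2 25.0
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for near ones.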
The current state of the art combines block matching with machine learning to establish an understanding of depth for a known type of scene.
(2) Depth estimation using point matching
The Scale-Invariant Feature Transform (SIFT) algorithm is frequently applied in computer vision and in a variety of multimedia tasks. Although it has existed since 2004, it still comes out ahead of newer algorithms in most comparisons. It finds pixels representing identical points in space with good precision, which makes it easier to understand depth across the whole field of view.
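To turn SIFT keypoints into depth, descriptors from the left image must be paired with descriptors from the right image. A common approach, sketched here as an assumption rather than as the PopSift implementation, is nearest-neighbour matching with Lowe's ratio test, followed by the disparity-to-depth relation Z = f · B / (x_l − x_r) for rectified cameras. The function names are hypothetical, and the 4-D toy descriptors stand in for real 128-D SIFT descriptors.

```python
import math


def match_ratio_test(desc_left, desc_right, ratio=0.8):
    """Match left descriptors to right descriptors by nearest neighbour.

    A match (i, j) is kept only if the nearest right descriptor j is clearly
    closer than the second nearest one (Lowe's ratio test).
    """
    matches = []
    for i, a in enumerate(desc_left):
        # (distance, index) pairs, nearest first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_right))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches


def depths_from_matches(x_left, x_right, matches, focal_px, baseline_m):
    """Depth Z = f * B / (xl - xr) per left keypoint; assumes rectified cameras."""
    depths = {}
    for i, j in matches:
        d = x_left[i] - x_right[j]
        if d > 0:  # non-positive disparity cannot be a valid stereo match
            depths[i] = focal_px * baseline_m / d
    return depths


# Toy 4-D descriptors stand in for real 128-D SIFT descriptors.
desc_l = [(1, 0, 0, 0), (0, 1, 0, 0)]
desc_r = [(0, 1, 0, 0.1), (1, 0, 0.1, 0), (0.5, 0.5, 0.5, 0.5)]
m = match_ratio_test(desc_l, desc_r)
print(m, depths_from_matches([120.0, 300.0], [290.0, 100.0], m, 800, 0.0625))
# [(0, 1), (1, 0)] {0: 2.5, 1: 5.0}
```

The ratio test rejects ambiguous matches, for example on repetitive textures, which would otherwise produce wrong depths.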
The challenge with SIFT has always been that it is complex and rather slow. We tackle this challenge with parallel programming on the GPU. This thesis should build on our existing code at http://github.com/alicevision/popsift and add high-speed depth matching of the points found in each of the two cameras' images.
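Brute-force matching compares every left descriptor with every right descriptor, which is costly even on a GPU. One possible speed-up for a stereo rig, offered here as an assumption and not as an existing PopSift feature, is to exploit the epipolar constraint: with rectified cameras, a true match lies on nearly the same row and at a bounded positive disparity, so each left keypoint only needs to be compared with a handful of candidates. Each keypoint's search is independent, which maps naturally to one GPU thread per keypoint; this sketch shows the logic in plain Python with a hypothetical function `epipolar_candidates` and illustrative parameters `row_tol` and `max_disp`.

```python
from collections import defaultdict


def epipolar_candidates(kp_left, kp_right, row_tol=1, max_disp=64):
    """For each left keypoint, list indices of right keypoints that could match.

    kp_left and kp_right are lists of (x, y) pixel coordinates from rectified
    cameras, so a true match lies within row_tol rows and has
    0 < xl - xr <= max_disp. Right keypoints are bucketed by integer row, so
    each lookup scans only a few rows instead of the whole image.
    """
    buckets = defaultdict(list)
    for j, (x, y) in enumerate(kp_right):
        buckets[int(round(y))].append(j)

    candidates = []
    for xl, yl in kp_left:
        row = int(round(yl))
        cands = []
        for r in range(row - row_tol, row + row_tol + 1):
            for j in buckets.get(r, []):
                d = xl - kp_right[j][0]
                if 0 < d <= max_disp:
                    cands.append(j)
        candidates.append(sorted(cands))
    return candidates


kp_l = [(120.0, 50.2), (300.0, 200.0)]
kp_r = [(100.0, 50.0), (290.0, 199.6), (110.0, 120.0), (130.0, 51.0)]
print(epipolar_candidates(kp_l, kp_r))  # [[0], [1]]
```

The candidate lists would then be passed to a descriptor comparison such as a ratio test, so the expensive distance computations run only on plausible pairs.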