Implementing SPARQL in Free Software RDBMSes
Good old relational databases haven't been very successful backends for SPARQL engines, for various reasons. However, since they have a huge user base (think of the millions and millions of Android devices), they are worth a fresh look. This thesis is about choosing a database to fix!
Traditionally, SPARQL was implemented on top of relational database systems like MySQL, PostgreSQL or SQLite. This wasn't terribly successful, as it was universally rather slow, and the world has mostly moved on to SPARQL- and RDF-specific databases. Using the RDBMSes still has some merit, however, first and foremost because they are very widespread: each of these systems has a huge user base, and many competent people rely on them every day. SPARQL adoption might be accelerated if people could use it on top of systems they already use, with little loss of performance. In principle, there is little that should cause the kind of detrimental performance impact seen in the past. The main reason has been that RDBMSes make assumptions in their optimization algorithms that are invalid for SPARQL, mainly around self-joins: when all triples are stored in a single table, every additional triple pattern in a query becomes another join of that table against itself. This thesis is about choosing a database to fix!
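To make the self-join issue concrete, here is a minimal sketch of the usual SPARQL-to-SQL translation, using Python's built-in sqlite3 module. The single triples table, its column names (s, p, o) and the sample data are assumptions made for illustration only; real SPARQL-on-RDBMS systems use a variety of schemas.

```python
import sqlite3

# Hypothetical layout: one row per RDF triple (subject, predicate, object).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:name", "Bob"),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:carol", "foaf:name", "Carol"),
    ],
)

# The SPARQL query being translated:
#   SELECT ?name ?friendName WHERE {
#     ?person foaf:name  ?name .
#     ?person foaf:knows ?friend .
#     ?friend foaf:name  ?friendName .
#   }
# Each triple pattern becomes one alias of the same table, and each
# shared variable becomes a join condition between two aliases.
sql = """
SELECT t1.o AS name, t3.o AS friend_name
FROM triples AS t1
JOIN triples AS t2 ON t2.s = t1.s
JOIN triples AS t3 ON t3.s = t2.o
WHERE t1.p = 'foaf:name'
  AND t2.p = 'foaf:knows'
  AND t3.p = 'foaf:name'
ORDER BY name
"""
rows = conn.execute(sql).fetchall()
for name, friend_name in rows:
    print(name, "knows", friend_name)
```

Three triple patterns already give a three-way self-join, and the optimizer has to pick a join order for three references to the same table. How well it does that is the kind of question this thesis would look into.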
SQLite is possibly the most interesting database to work on, mainly because it has no alternative in the "small database" world. It is very small and can be found on small devices. It's been on all Android devices since the dawn of the platform. This is potentially high-impact work!
SQLite version 4 is currently in the works and SQLite 3 is also still being maintained.
Up to now, SPARQL queries have been evaluated by converting them to SQL queries, but rather than doing this, you should consider evaluating them in the SQLite Virtual Machine directly. This is no easy task, but certainly a very interesting master's thesis!
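As a starting point, it helps to look at what the SQLite Virtual Machine is actually given to run. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN prefix to list the bytecode program that the SQL compiler produces for a small self-join. The triples table is a hypothetical schema, and whether a direct SPARQL evaluator would generate similar programs is an open question, not something shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table triple store, as in a naive SPARQL-to-SQL mapping.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")

# Prefixing a statement with EXPLAIN makes SQLite return the virtual
# machine program for it, one row per instruction, instead of running it.
program = conn.execute(
    "EXPLAIN "
    "SELECT t1.s, t2.o "
    "FROM triples AS t1 JOIN triples AS t2 ON t2.s = t1.o"
).fetchall()

# Column 0 is the instruction address, column 1 the opcode name,
# columns 2 to 4 the operands p1, p2 and p3.
opcodes = [row[1] for row in program]
for row in program:
    print(row[0], row[1], row[2], row[3], row[4])
```

The exact opcodes differ between SQLite versions, and the bytecode is not a stable interface, so anything built on it directly would have to track SQLite releases closely.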
With the recent release of HandlerSocket, it is possible to get more direct access to MySQL storage engines, and thus it might be possible to exploit MySQL features more directly for SPARQL evaluation. MySQL is still a simple nested-loop join engine, but very popular and for that reason interesting to look into!
PostgreSQL has gained a lot of flexibility in version 9, and that has made it possible to take advantage of the optimizations in the query engine that scale well for RDBMS applications to evaluate SPARQL queries more efficiently. It may also be possible to look into the query planner itself to see if tweaks there for SPARQL are all that is needed for decent performance.