Collaborative Filtering and Sub/Supertypes for Better Recommendations
OptiqueVQS is a user interface for constructing complex queries over data described by an ontology. One of the challenges when the ontology is large is to present the possible options to the user so that the most "important" or "relevant" choices appear at the top of the list.
We approach this by comparing the query built so far to a log of previously built queries.
Selecting relevant options then becomes similar to recommending movies or books based on users’ known preferences. But we can additionally make use of the graph structure of queries, and the type hierarchy in the ontology.
Data Access. A tremendous amount of data is generated every day, both on the Web and in public and private organisations. In this increasingly data-oriented world, any individual or organisation that possesses the necessary knowledge, skills, and tools to extract value from data at such scales has a considerable advantage in terms of competitiveness and development. In an enterprise setting in particular, the ability to access and use data in business processes such as sense-making and intelligence analysis is key to its value creation potential.
Today, however, data access still stands as a major bottleneck for many organisations. This is mostly due to the sharp distinction between employees who have the technical skills and knowledge to extract data (i.e., database/IT experts, skilled users, etc.) and those who have domain knowledge and know how to interpret and use data (i.e., domain experts, end-users, etc.). The result is a workflow where domain experts either have to use pre-defined queries embedded in applications or communicate their information needs to database experts. In such a workflow, the turn-around time from a user's initial information need to receiving the answer can be in the range of weeks, incurring significant costs.
Visual Query Systems. Approaches that eliminate the man-in-the-middle and allow end-users to engage with data directly and extract it on their own have been of interest to researchers for many years. For end-users, the accessibility of traditional structured query languages such as SQL and XQuery falls far short, since such textual languages require end-users to have a set of technical skills and to recall domain concepts as well as the terminology and syntax of the language being used. For this reason, visual query systems and languages have emerged to alleviate the end-user data access problem. A visual system or language follows the direct manipulation idea, where the domain and query language are represented with a set of visual elements.
Ranking. One of the challenges for a visual query system over a large vocabulary (schema, ontology) is that there may be hundreds or thousands of types, relations, or attributes to choose from. How should these be presented to the user?
One possibility is to "rank" suggestions such that the most likely choices appear at the top of the list. This ranking can be based on the ontology, the data, and, most importantly, previously posed queries. We have performed initial experiments with a method that exploits the graph structure of previous queries and with a simpler method that ignores it, and the results are promising.
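To make the idea of log-based ranking concrete, the following is a minimal sketch in the spirit of the simpler approach that ignores graph structure. It is not the method used in the earlier experiments: each query is reduced to a plain set of vocabulary terms, and the function name `rank_suggestions`, the Jaccard-style weighting, and the toy terms are all assumptions made for illustration.

```python
from collections import Counter


def rank_suggestions(partial_query, query_log, candidates):
    """Rank candidate terms by weighted co-occurrence with the partial query.

    Illustrative assumption: a query is just a set of vocabulary terms
    (classes and properties); its graph structure is ignored.
    """
    current = set(partial_query)
    scores = Counter()
    for past in query_log:
        past_terms = set(past)
        overlap = len(current & past_terms)
        if overlap == 0:
            continue  # this logged query shares nothing with the partial query
        # Jaccard similarity between the partial query and the logged query
        weight = overlap / len(current | past_terms)
        for term in past_terms - current:
            if term in candidates:
                scores[term] += weight
    # Highest score first; ties are broken alphabetically for determinism
    return sorted(set(candidates) - current, key=lambda t: (-scores[t], t))


# Toy query log with hypothetical class and property names
log = [
    {"Dog", "hasOwner", "hasBreed"},
    {"Dog", "hasOwner"},
    {"Cat", "hasOwner"},
    {"Well", "hasDepth"},
]
print(rank_suggestions({"Dog"}, log, {"hasOwner", "hasBreed", "hasDepth"}))
```

In this toy log, `hasOwner` co-occurs with `Dog` in two logged queries and `hasBreed` in one, while `hasDepth` never does, so the candidates are ordered accordingly. Candidates that never co-occur keep a score of zero and fall to the bottom of the list.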
In this thesis, you will extend previous research on ranking to take into account the subtype structure in the ontology. For example, assume that the ontology has a class Mammal with subclasses Dog and Cat, and that the user constructs a query concerning dogs. If there have been very few past queries concerning dogs, the system could look at queries about mammals, since it is natural to assume that properties of interest in queries about mammals might also be of interest in queries about dogs. This goes both ways: ranking for a user query about mammals could look at previous queries about cats and dogs.
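One possible way to realise the Mammal/Dog/Cat example is to let logged queries about superclasses and subclasses contribute to the score with a reduced weight. The sketch below is only an illustration of that idea, not a proposed solution for the thesis: the `SUPERCLASS` map, the `decay` factor of 0.5, the log format of (class, property) pairs, and all property names are hypothetical.

```python
from collections import Counter

# Hypothetical toy ontology: each class maps to its direct superclass.
SUPERCLASS = {"Dog": "Mammal", "Cat": "Mammal"}


def related_classes(cls, superclass, decay=0.5):
    """Return weights for cls, its ancestors, and its descendants.

    Assumes the hierarchy is an acyclic tree. The weight shrinks by
    `decay` with every step away from cls; siblings are not included.
    """
    weights = {cls: 1.0}
    # Walk up to the ancestors
    w, parent = decay, superclass.get(cls)
    while parent is not None:
        weights[parent] = w
        w *= decay
        parent = superclass.get(parent)
    # Walk down to the descendants
    frontier = [(cls, 1.0)]
    while frontier:
        node, node_weight = frontier.pop()
        for child, par in superclass.items():
            if par == node:
                weights[child] = node_weight * decay
                frontier.append((child, node_weight * decay))
    return weights


def rank_properties(cls, log, superclass, decay=0.5):
    """Rank properties for cls using logged (class, property) pairs."""
    weights = related_classes(cls, superclass, decay)
    scores = Counter()
    for logged_cls, prop in log:
        if logged_cls in weights:
            scores[prop] += weights[logged_cls]
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [prop for prop, _ in ordered]


# Toy log: many queries about mammals, few about dogs and cats
log = [
    ("Mammal", "hasHabitat"),
    ("Mammal", "hasHabitat"),
    ("Mammal", "hasWeight"),
    ("Dog", "hasBreed"),
    ("Cat", "hasIndoorStatus"),
]
print(rank_properties("Dog", log, SUPERCLASS))
print(rank_properties("Mammal", log, SUPERCLASS))
```

For a query about `Dog`, the properties logged for `Mammal` are suggested alongside `hasBreed`, while the property logged only for the sibling class `Cat` is not. For a query about `Mammal`, the properties logged for both subclasses appear, ranked below those logged for `Mammal` itself. Whether siblings should contribute, and how the weights should be chosen, are open design questions.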
The work on the thesis will include:
- some research into related work in collaborative filtering
- design of a new algorithm for ranking, possibly several
- implementation of that algorithm
- experimental evaluation of the algorithm