Query Interfaces and Search Technology
OptiqueVQS is a user interface to construct complex queries over data described by an ontology. It is important for a good user experience to adjust the interface based on the available underlying data. But complex queries over large amounts of data are expensive.
We want to approach this problem by using search technology such as Lucene/Solr indices to build fast and scalable backend support for the query interface.
Data Access. A tremendous amount of data is generated every day, both on the Web and in public and private organisations. By all accounts, in this increasingly data-oriented world, any individual or organisation that possesses the necessary knowledge, skills, and tools to extract value from data at such scales has a considerable advantage in terms of competitiveness and development. In an enterprise setting in particular, the ability to access and use data in business processes such as sense-making and intelligence analysis is key to value creation.
Today, however, data access still stands as a major bottleneck for many organisations. This is mostly due to the sharp distinction between employees who have the technical skills and knowledge to extract data (i.e., database/IT experts, skilled users, etc.) and those who have domain knowledge and know how to interpret and use data (i.e., domain experts, end-users, etc.). The result is a workflow in which domain experts either have to use pre-defined queries embedded in applications or communicate their information needs to database experts. In such a workflow, the turn-around time from users’ initial information needs to receiving the answer can be in the range of weeks, incurring significant costs.
Visual Query Systems. Approaches that eliminate the man-in-the-middle and allow end-users to directly engage with data and extract it on their own have been of interest to researchers for many years. For end-users, the accessibility of traditional structured query languages such as SQL and XQuery falls far short, since such textual languages require end-users to have a set of technical skills and to recall domain concepts as well as the terminology and syntax of the language being used. For this very reason, visual query systems and languages have emerged to alleviate the end-user data access problem. A visual query system or language follows the direct manipulation idea, where the domain and query language are represented with a set of visual elements.
Instant Feedback. One of the staples of state-of-the-art search systems ("faceted search") is that the options available to the user are adjusted after every choice made. For example, once the user has decided that she is interested in cars that are less than 5 years old, and the minimum price of cars of that age in the data set is 100 kNOK, the lower bound of the price filter is adjusted to 100 kNOK.
This is very good for usability, because it prevents users from over-constraining their search and ending up with no results, but it is expensive to do on a large data set with complex, structured queries. Our ongoing research suggests that a solution could be to use search technology such as Lucene/Solr, Elasticsearch, or Sphinx.
An additional benefit would be that many of these search engines also support approximate text search, quick completion of text entry fields, etc.
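The instant-feedback behaviour described above can be illustrated with a minimal in-memory sketch. It is not the OptiqueVQS implementation and does not use any search engine API; the `Car` type, the toy data, and the helper names `apply_filters` and `facet_range` are all hypothetical and chosen only to mirror the car example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    # Hypothetical record type used only for this illustration.
    model: str
    age_years: int
    price_knok: int


# Made-up toy data set; the values are assumptions, not real data.
CARS = [
    Car("A", 2, 250),
    Car("B", 4, 100),
    Car("C", 7, 40),
    Car("D", 12, 15),
    Car("E", 3, 180),
]


def apply_filters(cars, filters):
    """Keep only the cars that satisfy every field -> predicate filter."""
    return [
        car
        for car in cars
        if all(pred(getattr(car, field)) for field, pred in filters.items())
    ]


def facet_range(cars, field):
    """Return (min, max) of a numeric field over the given cars, or None if empty."""
    values = [getattr(car, field) for car in cars]
    return (min(values), max(values)) if values else None


# Before any choice is made, the price facet spans the whole data set.
print(facet_range(CARS, "price_knok"))

# After the user restricts the age, the price facet is recomputed
# over the remaining cars only, so its lower bound moves up.
remaining = apply_filters(CARS, {"age_years": lambda age: age < 5})
print(facet_range(remaining, "price_knok"))
```

Every user choice triggers a full scan of the data here, which is the cost that becomes prohibitive on large data sets with complex queries; a search engine index would be expected to answer the same facet question without such a scan.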
In this thesis, you will modify a previous, RDB-based implementation of a faceted search index so that it uses a search engine instead.
The work on the thesis will include:
- some research into the choice of search technology
- implementation of faceted filtering based on search technology
- experimental evaluation of the approach on large data sets
- implementation of text input completion based on search technology