Problem Statement and Choice of Methods
The proposed research takes as its point of departure two fundamental assumptions: (a) semantic parsing of large and diverse volumes of human language content will be of growing importance to the evolution of the Internet of the Future; and (b) scientific progress to date and long-term uptake are hindered by a vague problem definition, lack of general purpose technology, the unsolved problem of domain (and genre) adaptation, and insufficient interface standardization and documentation.
Materials, resources and technology
At the same time, we believe that linguistic theory and computational parsing have matured to a point where large-scale application to Web content is within reach. However, there are substantive scientific and technological challenges that need to be addressed to reach this goal. These pertain to three dimensions of variation, and to the search for an equilibrium among them: robustness, precision, and efficiency. Besides pushing ahead in all three directions, it is important to determine (and document) precisely what state-of-the-art technology can and cannot deliver, for different types of inputs and parser configurations, and what the trade-offs are along the three dimensions. Furthermore, to establish semantic parsing as a general purpose technology, there are methodological shortcomings to overcome. These include techniques for adaptation and reuse across different domains, genres, and tasks, as well as for flexible parametrization and configuration, so that an optimal balance of robustness, precision, and processing cost can be chosen for a specific use scenario.
In summary, the proposed research is technology-driven and empirical in nature. Our main hypothesis here is that an insightful synthesis of existing knowledge and current best practices (across approaches, frameworks and languages) will afford sufficient incremental advances in methodology and technology to enable Web-scale semantic parsing. A subsidiary hypothesis relates to the interaction of theory and technology development. We believe that adaptability, scalability and long-term scientific progress in parsing technology can only be obtained through the combination of leading linguistic and computer science expertise, reflecting the interdisciplinary nature of the problem. The project seeks to (a) validate this proposition through the development of semantic parsing technology for a large and representative selection of user-generated Internet content (UGC); (b) quantify relative success in a multi-dimensional grid of linguistic and computational metrics; and (c) anchor its results in a collaborative, multi-national effort to better define the semantic interface to grammatical analysis. If successful, the proposed research will advance the state of the art in language technology in several important areas, including a sharper definition for the task of semantic parsing, greatly improved parsing and adaptation techniques, detailed knowledge about the effects of relevant choices, and a community process towards a de facto interface standard. As an important by-product, the project will compile novel UGC corpora.
The proposed research is organized in four intertwined tracks as depicted above, with one core effort in general purpose semantic parsing and three smaller support activities: resource creation, interface corroboration and a demonstration application.