Semantic Dependency Parsing

Syntactic dependency parsing has seen great advances in the past eight or so years, in part owing to relatively broad consensus on target representations, and in part reflecting the successful execution of a series of CoNLL shared tasks. From this very active research area, accurate and efficient syntactic parsers have been developed for a wide range of natural languages. However, the predominant target representations in dependency parsing to date are trees, in the formal sense that every node in the dependency graph is reachable from a distinguished root node by exactly one directed path. This assumption is an essential prerequisite for both the parsing algorithms and the machine learning methods in state-of-the-art syntactic dependency parsers. Unfortunately, it also means that these parsers are ill-suited for producing meaning representations, i.e. for moving from the analysis of grammatical structure to sentence semantics.
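To make the tree condition concrete, here is a minimal Python sketch. It assumes CoNLL-style conventions (tokens numbered 1..n, node 0 as an artificial root, arcs as (head, dependent) pairs); the function name is_tree is hypothetical and not taken from any existing parser. It relies on the fact that if the root has no incoming arc, every other node has exactly one, and every node is reachable from the root, then each node is reached by exactly one directed path.

```python
from collections import defaultdict, deque


def is_tree(num_tokens, arcs, root=0):
    """Check whether a dependency graph is a tree rooted at `root`.

    Hypothetical helper: nodes are 0..num_tokens, where 0 is an artificial
    root (CoNLL-style); arcs is a list of (head, dependent) pairs.
    """
    nodes = range(num_tokens + 1)
    indegree = defaultdict(int)
    children = defaultdict(list)
    for head, dep in arcs:
        indegree[dep] += 1
        children[head].append(dep)

    # The root has no incoming arc; every other node has exactly one head.
    if indegree[root] != 0:
        return False
    if any(indegree[v] != 1 for v in nodes if v != root):
        return False

    # Every node must be reachable from the root. Given the in-degree check,
    # this also rules out cycles, so each node has exactly one path from the root.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) == len(nodes)


print(is_tree(3, [(0, 3), (3, 2), (2, 1)]))          # True
print(is_tree(3, [(0, 3), (3, 2), (2, 1), (1, 2)]))  # False: token 2 has two heads
print(is_tree(3, [(0, 3), (3, 2)]))                  # False: token 1 is unattached
```

The two failing calls correspond to exactly the configurations that, as discussed next, semantic dependency graphs need to allow: multiple heads for one node, and nodes left out of the structure.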

Even if syntactic parsing can arguably be limited to tree structures, this is clearly not the case in semantic analysis, where a node will often be the argument of multiple predicates (i.e. have more than one incoming arc), and it will often be desirable to leave some nodes unattached (with no incoming arcs) for semantically vacuous word classes such as particles, complementizers, or relative pronouns. Both properties are present in the semantic dependency graph shown above, where ‘technique’, for example, is the argument of at least the determiner (as the quantificational locus), the modifier ‘similar’, and the predicate ‘apply’. Conversely, the predicative copula, infinitival ‘to’, and the particle marking the deep object of ‘apply’ arguably make no semantic contribution of their own.
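Both properties can be detected mechanically. The Python sketch below encodes only the three arcs into ‘technique’ named in the text, over a simplified, hypothetical token sequence (this is not the actual gold annotation, and arc labels are omitted); graph_properties is an illustrative name. As an assumption, a node counts as unattached here only if it has no arcs at all, so that a predicate such as ‘apply’, which has outgoing but no incoming arcs, is not reported.

```python
from collections import Counter


def graph_properties(tokens, arcs):
    """Return (reentrant, unattached) word forms for a dependency graph.

    Hypothetical helper: tokens maps node id -> word form; arcs is an
    iterable of (head, dependent) pairs.
    """
    arcs = list(arcs)  # allow generators; we iterate over the arcs twice
    indegree = Counter(dep for _, dep in arcs)
    # Assumption: "unattached" means no incoming and no outgoing arcs.
    attached = {h for h, _ in arcs} | {d for _, d in arcs}
    reentrant = [tokens[v] for v in sorted(tokens) if indegree[v] > 1]
    unattached = [tokens[v] for v in sorted(tokens) if v not in attached]
    return reentrant, unattached


# Simplified, hypothetical fragment: only the arcs into 'technique' are encoded.
tokens = {1: "a", 2: "similar", 3: "technique", 4: "is", 5: "to", 6: "apply"}
arcs = [(1, 3), (2, 3), (6, 3)]  # determiner, modifier, predicate -> 'technique'
print(graph_properties(tokens, arcs))  # (['technique'], ['is', 'to'])
```

Neither result is possible under the tree condition: ‘technique’ has three heads, and ‘is’ and ‘to’ belong to no structure at all.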

Besides its relation to syntactic dependency parsing, this project also overlaps with Semantic Role Labeling (SRL). There, however, target representations typically draw on resources like PropBank and NomBank, which are limited to argument identification and labeling for verbal and nominal predicates. A plethora of semantic phenomena, e.g. negation and other scopal embedding, comparatives, possessives, various types of modification, and even conjunction, typically remain unanalyzed in SRL. Thus, SRL target representations are partial to a degree that can prohibit downstream semantic processing, for example inference-based techniques. In the proposed project, we require parsers to identify all semantic dependencies, i.e. to compute a representation that integrates all content words in one structure. For example, the project could focus on the role of semantically vacuous elements in different frameworks and on their reliable identification (i.e. disambiguation): the infinitival ‘to’ in the above example is vacuous, whereas ‘to’ used as a directional preposition is meaning-bearing.

Over the past six or so years, we have seen early research into parsing with graph-structured representations, for example Sagae & Tsujii (2008), Jones et al. (2013), and Chiang et al. (2013). However, some of these studies are purely theoretical, while others are limited to smaller, non-standard data sets. In recent years, we note increasing interest in this line of research, as well as emerging resources that can mature into broadly accepted target representations of semantic dependencies. Through its participation in the development of the DeepBank treebank, as well as through co-organizing two international competitions on semantic dependency parsing for English (as part of SemEval 2014 and SemEval 2015), LTG staff have first-hand access to, and an in-depth understanding of, such emerging resources, i.e. large corpora annotated with gold-standard semantic dependencies. For these reasons, we expect that an MSc project (somewhere) in this space would be a good vehicle to pull together and better understand candidate target annotations, as well as to refine, adapt, or develop afresh algorithms and statistical models for parsing into these more semantic types of representations.

Because it is related to ongoing research by the supervisors, the proposed project can to a large degree be calibrated to foreground different aspects, depending on the student's prior knowledge and expected learning outcomes. For example, theoretical and linguistic aspects could play a larger role if the project focuses on an empirical, large-scale comparison of gold-standard semantic dependency annotations; conversely, implementation and experimentation aspects could be emphasized if the project works predominantly on building and evaluating practical semantic dependency parsers. Please contact the supervisors to discuss possible directions for this project.

Keywords: language technology
Published 1 Dec. 2017 09:43 - Last modified 1 Dec. 2017 09:43