Word clustering for improved statistical parsing
The use of lexical semantic information in syntactic parsing has seen varied success. Recently, however, lexical semantic clusters derived from large corpora have been shown to improve parsing performance. It remains unclear how different properties of these clusters affect results. This project aims to investigate the use of different types of clusters during syntactic parsing.
More precisely, the idea is to use word clusters as a source of features in the statistical disambiguation model of a dependency parser. Generally, the clusters will group together words with similar distributional properties. The exact nature of these similarity relations, however, will vary depending on the types of context features used when performing the clustering. The project therefore amounts to an extrinsic form of cluster evaluation: investigating how different clustering parameters in turn affect the performance of a statistical parser.
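
As a rough illustration of how cluster membership could feed into such a disambiguation model, the following Python sketch builds features for a single candidate head-dependent arc. The cluster labels, the `CLUSTERS` table, and the function names `cluster_of` and `arc_features` are all hypothetical and chosen only for this example; the project does not prescribe a particular feature template or parser.

```python
from typing import Dict, List

# Hypothetical word-to-cluster table. In practice the clusters would be
# induced from a large corpus; these labels are made up for illustration.
CLUSTERS: Dict[str, str] = {
    "dog": "c12",
    "cat": "c12",
    "barks": "c40",
    "meows": "c40",
}

# Assumed fallback label for words that were not seen during clustering.
UNKNOWN_CLUSTER = "c_unk"


def cluster_of(word: str, clusters: Dict[str, str]) -> str:
    """Return the cluster label of a word, or the fallback label."""
    return clusters.get(word.lower(), UNKNOWN_CLUSTER)


def arc_features(head: str, dep: str, clusters: Dict[str, str]) -> List[str]:
    """Build string features for one candidate head -> dependent arc.

    The lexical feature is specific to the word pair, while the cluster
    features are shared by all word pairs drawn from the same clusters.
    """
    hw, dw = head.lower(), dep.lower()
    hc, dc = cluster_of(head, clusters), cluster_of(dep, clusters)
    return [
        f"hw={hw}|dw={dw}",  # purely lexical
        f"hc={hc}|dc={dc}",  # purely cluster-based
        f"hw={hw}|dc={dc}",  # mixed: head word, dependent cluster
        f"hc={hc}|dw={dw}",  # mixed: head cluster, dependent word
    ]


if __name__ == "__main__":
    for head, dep in [("barks", "dog"), ("meows", "cat")]:
        print(head, dep, arc_features(head, dep, CLUSTERS))
```

In this toy table, the arcs `barks -> dog` and `meows -> cat` differ in their lexical feature but share the same cluster-pair feature, which is the kind of generalization cluster features are meant to provide. Changing the context features used for clustering would change the contents of the table, and hence which arcs end up sharing features.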