Joint Dependency Parsing and Multiword Expression Tokenization

Alexis Nasr, Carlos Ramisch, José Deulofeu, André Valli


Abstract

Complex conjunctions and determiners are often treated as pretokenized units in parsing. This is not always realistic, since the same word sequence can be ambiguous between a complex function word and a sequence of independent words. We propose a model for joint dependency parsing and multiword expression identification, in which complex function words are represented as individual tokens linked by morphological dependencies. Our graph-based parser uses standard second-order features together with verbal subcategorization features derived from a syntactic lexicon. We train it on a modified version of the French Treebank enriched with morphological dependencies. It recognizes 81.79% of ADV-que conjunctions with 91.57% precision, and 82.74% of de-DET determiners with 86.70% precision.