KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer

Rudolf Rosa and Zdenek Zabokrtsky


Abstract

We present KLcpos3, a language similarity measure based on Kullback-Leibler divergence of coarse part-of-speech tag trigram distributions in tagged corpora. It has been designed for multilingual delexicalized parsing, both for source treebank selection in single-source parser transfer, and for source treebank weighting in multi-source transfer. In the selection task, KLcpos3 identifies the best source treebank in 8 out of 18 cases. In the weighting task, it brings +4.5% UAS absolute, compared to unweighted parse tree combination.