Feature Selection in Kernel Space: A Case Study on Dependency Parsing

Xian Qian and Yang Liu


Abstract

Given a set of basic binary features, we propose a new L1 norm SVM based feature selection method that explicitly selects the features in their polynomial or tree kernel spaces. The efficiency comes from the anti-monotone property of the subgradients: the subgradient with respect to a combined feature can be bounded by the subgradient with respect to each of its component features, and a feature can be pruned safely without further consideration if its corresponding subgradient is not steep enough. We conduct experiments on the English dependency parsing task. Benefiting from the rich features selected in the tree kernel space, our model achieved the best reported unlabeled attachment score of 93.72 without using any additional resource.