Accurate Linear-Time Chinese Word Segmentation via Embedding Matching

Jianqiang Ma and Erhard Hinrichs


Abstract

This paper proposes an embedding matching approach to Chinese word segmentation, which generalizes the traditional sequence labeling framework and takes advantage of distributed representations. The training and prediction algorithms of the model have linear-time complexity. Based on the proposed model, a greedy segmenter is developed and evaluated on benchmark corpora. Experiments show that our greedy segmenter achieves improved results over previous embedding-based word segmenters, and its performance is competitive with state-of-the-art methods, despite its simple feature set and the absence of external resources for training.