An Empirical Study of Chinese Name Matching and Applications

Nanyun Peng, Mo Yu, Mark Dredze


Abstract

Methods for name matching, an important component supporting downstream tasks such as entity linking and entity clustering, have focused on alphabetic languages, primarily English. In contrast, logogram languages such as Chinese remain untested. We evaluate methods for name matching in Chinese, including both string matching and learning approaches. Our approach, based on a new representation for Chinese, improves both name matching and a downstream entity clustering task.