Extended Topic Model for Word Dependency

Tong Wang, Vish Viswanath, Ping Chen


Abstract

Topic models such as Latent Dirichlet Allocation (LDA) assume that the topic assignments of different words are conditionally independent. In this paper, we propose a new model, the Extended Global Topic Random Field (EGTRF), to model non-linear dependencies between words. Specifically, we parse sentences into dependency trees, represent them as a graph, and assume that the topic assignment of a word is influenced by its adjacent words and by words at distance 2 in that graph. Word similarity information learned from a large corpus is incorporated to improve word topic assignment. Parameters are estimated efficiently by variational inference, and experimental results on two datasets show that EGTRF achieves lower perplexity and higher log predictive probability.
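
To make the neighborhood structure concrete, the following is a minimal sketch of how adjacent and distance-2 words could be collected once a sentence has been parsed. It is an illustration under stated assumptions, not the implementation used in this paper: the function name neighbors_by_distance, the (head, dependent) edge-list input format, and the example parse are all hypothetical, and the dependency tree is treated as an undirected graph.

```python
from collections import defaultdict


def neighbors_by_distance(edges, num_words):
    """Collect, for each word index, its adjacent and distance-2 words.

    edges: (head, dependent) index pairs from a dependency parse
           (assumed input format; the parse is treated as undirected).
    Returns two dicts mapping word index -> set of word indices.
    """
    adj = defaultdict(set)
    for head, dep in edges:
        adj[head].add(dep)
        adj[dep].add(head)

    adjacent = {i: set(adj[i]) for i in range(num_words)}
    distance2 = {}
    for i in range(num_words):
        two_hop = set()
        for j in adj[i]:
            two_hop |= adj[j]
        # Keep only words reached in exactly two steps:
        # drop the word itself and its direct neighbors.
        distance2[i] = two_hop - adj[i] - {i}
    return adjacent, distance2


# Hypothetical parse of "the cat chased mice":
# chased -> cat, cat -> the, chased -> mice
words = ["the", "cat", "chased", "mice"]
edges = [(2, 1), (1, 0), (2, 3)]
adjacent, distance2 = neighbors_by_distance(edges, len(words))

for i, w in enumerate(words):
    print(w, sorted(words[j] for j in adjacent[i]),
          sorted(words[j] for j in distance2[i]))
```

In this example "the" has "cat" as its only adjacent word and "chased" as its only distance-2 word. How these two neighbor sets are weighted in the topic assignment is defined by the model itself and is not reproduced in this sketch.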