Unifying Bayesian Inference and Vector Space Models for Improved Decipherment

Qing Dou, Ashish Vaswani, Kevin Knight, Chris Dyer


Abstract

We introduce into Bayesian decipherment a base distribution derived from similarities of word embeddings. We use Dirichlet multinomial regression (Mimno and McCallum, 2012) to learn a mapping between ciphertext and plaintext word embeddings from non-parallel data. Experimental results show that the base distribution is highly beneficial to decipherment, improving state-of-the-art decipherment accuracy from 45.8% to 67.4% for Spanish/English, and from 5.1% to 11.2% for Malagasy/English.