Language Identification and Modeling in Specialized Hardware

Kenneth Heafield, Rohan Kshirsagar, Santiago Barona


Abstract

We repurpose network security hardware to perform language identification and language modeling tasks. The hardware is a deterministic pushdown transducer, since it executes regular expressions and has a stack. One core is 2.4 times as fast at language identification and 1.8 to 6 times as fast at part-of-speech language modeling.