Training a Natural Language Generator From Unaligned Data

Ondřej Dušek and Filip Jurčíček


Abstract

We present a novel syntax-based natural language generation system that is trainable from unaligned pairs of input meaning representations and output sentences. It is divided into sentence planning, which incrementally builds deep-syntactic dependency trees, and surface realization. The sentence planner is based on A* search with a perceptron ranker that uses novel differing subtree updates and a simple future promise estimation; surface realization uses a rule-based pipeline from the Treex NLP toolkit.

Our first results show that training from unaligned data is feasible; the outputs of our generator are mostly fluent and relevant.
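The sentence-planning search described in the abstract can be illustrated with a minimal, hypothetical sketch. The names `expand`, `score`, and `promise` below are placeholders standing in, respectively, for incremental tree expansion, the perceptron ranker, and the future promise estimate; the toy token-sequence example is purely illustrative and is not the authors' implementation.

```python
import heapq

def astar_plan(initial, expand, score, promise, max_steps=1000):
    """A*-style best-first search over partial plans.

    Candidates are ranked by the (assumed) perceptron score plus a
    future-promise estimate; negated because heapq pops the minimum.
    """
    open_set = [(-(score(initial) + promise(initial)), 0, initial)]
    counter = 1  # tie-breaker so states are never compared directly
    while open_set and counter < max_steps:
        _, _, state = heapq.heappop(open_set)
        children = expand(state)
        if not children:  # no further expansions: a complete plan
            return state
        for child in children:
            heapq.heappush(
                open_set, (-(score(child) + promise(child)), counter, child)
            )
            counter += 1
    return None

# Toy usage: build a two-token sequence matching a target utterance.
TARGET = ("hello", "world")

def expand(state):
    if len(state) == len(TARGET):
        return []
    return [state + (w,) for w in ("hello", "world", "foo")]

def score(state):
    # stand-in for a learned ranker: count of correct prefix tokens
    return sum(1 for a, b in zip(state, TARGET) if a == b)

def promise(state):
    return 0  # trivial future-promise estimate for this toy example
```

Calling `astar_plan((), expand, score, promise)` expands the empty plan one token at a time, always pursuing the highest-scoring partial sequence first.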