The Users Who Say 'Ni': Audience Identification in Chinese-language Restaurant Reviews

Rob Voigt and Dan Jurafsky


Abstract

We give an algorithm for disambiguating generic versus referential uses of second-person pronouns in Chinese-language restaurant reviews. Reviews in this domain use the 'you' pronoun 你 either generically or referentially, whether to address shopkeepers or readers or for self-reference in reported conversation. We first show that linguistic features of the local context, drawn from prior literature, help in disambiguation. We then show that document-level features (n-grams and document-level embeddings), not previously used in the referentiality literature, give the largest gain in performance, and we suggest this is because pronouns in this domain exhibit 'one-sense-per-discourse'. Our work highlights an important case of discourse effects on pronoun use and may have practical implications for audience extraction and other sentiment tasks in online reviews.