Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning

Angeliki Lazaridou, Georgiana Dinu, Marco Baroni


Abstract

Zero-shot methods in language, vision, and other domains rely on a cross-space mapping function that projects vectors from the relevant feature space (e.g., visual-feature-based image representations) to a large semantic word space (induced in an unsupervised way from corpus data), where the entities of interest (e.g., the objects that images depict) are labeled with the words associated with the nearest neighbors of the mapped vectors. Zero-shot cross-space mapping methods hold great promise as a way to scale up annotation tasks well beyond the labels in the training data (e.g., recognizing objects that were never seen in training). However, the current performance of cross-space mapping functions is still quite low, so the strategy is not yet usable in practical applications. In this paper, we explore some general properties, both theoretical and empirical, of the cross-space mapping function, and we build on them to propose better methods to estimate it. In this way, we attain large improvements over the state of the art in both cross-linguistic (word translation) and cross-modal (image labeling) zero-shot experiments.
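To make the pipeline the abstract describes concrete, here is a minimal sketch of zero-shot labeling via a cross-space mapping. It assumes a plain least-squares linear map and toy random data (both are illustrative choices, not the paper's specific estimation method, which the paper itself improves upon): a mapping W is fit on seen training pairs, an unseen item is projected into the word space, and it is labeled by its cosine nearest neighbor among candidate word vectors.

```python
import numpy as np

# Minimal sketch of zero-shot labeling via a cross-space mapping.
# Assumptions (illustrative, not from the paper): a least-squares
# linear map and toy random data standing in for real visual and
# word vectors.

rng = np.random.default_rng(0)

d_src, d_tgt = 50, 30                      # source (e.g., visual) and target (word) dims
X_train = rng.normal(size=(200, d_src))    # feature vectors of seen items
Y_train = rng.normal(size=(200, d_tgt))    # word vectors of their labels

# Estimate the mapping W minimizing ||X W - Y||^2 (least squares).
W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

# Label an unseen item: map its feature vector into the word space,
# then take the nearest word vector (cosine similarity) as the label.
vocab_vecs = rng.normal(size=(1000, d_tgt))  # candidate label word vectors
x_new = rng.normal(size=d_src)               # feature vector of an unseen item

y_hat = x_new @ W
sims = (vocab_vecs @ y_hat) / (
    np.linalg.norm(vocab_vecs, axis=1) * np.linalg.norm(y_hat)
)
print("predicted label index:", int(np.argmax(sims)))
```

Because the label is chosen purely by nearest-neighbor search in the word space, the unseen item's label never needs to appear in the training pairs, which is what makes the approach zero-shot.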