Learning to Mine Query Subtopics from Query Log

Zhenzhong Zhang, Le Sun, Xianpei Han


Abstract

Many queries in web search are ambiguous or multifaceted. Identifying the major senses or facets of queries is very important for web search. In this paper, we represent the major senses or facets of queries as subtopics and refer to identifying them as query subtopic mining, where query subtopics are represented as clusters of queries. The challenges of query subtopic mining are then how to measure the similarity between queries and how to group them semantically. This paper proposes an approach for mining subtopics from query logs, which jointly learns a similarity measure and groups queries by explicitly modeling the structure among them. Compared with previous approaches that use manually defined similarity measures, our approach produces more desirable query subtopics by learning the similarity measure. Experimental results on real queries collected from a search engine log confirm the effectiveness of the proposed approach in mining query subtopics.