Subtopic Ranking Based on Hierarchical Headings

by Tomohiro Manabe, Keishi Tajima


We propose methods for generating diversified rankings of subtopics of keyword queries. Our methods are characterized by their awareness of the hierarchical heading structure of documents. This structure consists of nested logical blocks, each with a heading that concisely describes the topic of its block. Hierarchical headings in a document therefore reflect the hierarchy of topics the document covers. Based on this idea, our methods score subtopic candidates by how well they match the hierarchical headings in documents, giving higher scores to candidates that match headings associated with more content. To diversify the resulting rankings, each time our methods adopt the candidate with the best score, they exclude the blocks matching that candidate and re-score all remaining blocks and candidates. In our evaluation on the NTCIR data set, our methods generated significantly better subtopic rankings than the query completion results of major commercial search engines.
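The greedy adopt-exclude-rescore loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the `Block` type, the word-containment matching rule in `matches`, and the content-length score in `score` are hypothetical stand-ins for the paper's actual matching and scoring functions, and the example data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical block: headings from the document root down to it, plus its content size."""
    heading_path: tuple[str, ...]
    content_len: int


def matches(candidate: str, block: Block) -> bool:
    # Assumed matching rule: every word of the candidate appears
    # in some heading on the path from the root to this block.
    heading_words = {w for h in block.heading_path for w in h.lower().split()}
    return all(w in heading_words for w in candidate.lower().split())


def score(candidate: str, blocks: list[Block]) -> int:
    # Assumed score: total content size of the blocks the candidate matches,
    # so candidates matching headings with more content score higher.
    return sum(b.content_len for b in blocks if matches(candidate, b))


def rank_subtopics(candidates: list[str], blocks: list[Block], k: int) -> list[str]:
    remaining_blocks = list(blocks)
    remaining_cands = list(candidates)
    ranking: list[str] = []
    while remaining_cands and len(ranking) < k:
        # Re-score every remaining candidate against the remaining blocks.
        best = max(remaining_cands, key=lambda c: score(c, remaining_blocks))
        if score(best, remaining_blocks) == 0:
            break  # nothing left that any candidate covers
        ranking.append(best)
        remaining_cands.remove(best)
        # Diversify: exclude the blocks already covered by the adopted candidate.
        remaining_blocks = [b for b in remaining_blocks if not matches(best, b)]
    return ranking


# Invented toy data for the query "python".
example_blocks = [
    Block(("Python", "Installation", "Windows"), 300),
    Block(("Python", "Installation", "macOS"), 200),
    Block(("Python", "Tutorial"), 250),
]
print(rank_subtopics(["python installation", "python windows", "python tutorial"],
                     example_blocks, k=3))
# → ['python installation', 'python tutorial']
```

In this toy example, "python windows" initially outscores "python tutorial" (300 vs. 250), but its only matching block is already covered by "python installation". After re-scoring it drops to zero, so "python tutorial" is adopted instead, which is the diversification effect the abstract describes.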


Keywords: subtopic mining; hierarchical heading structure; document structure; sectional structure; topic structure; web search; search result diversification; search intent
Published in Proc. of WEBIST, pp. 121-130, Rome, Italy, 2016.

tajima@i.kyoto-u.ac.jp / Fax: +81(Japan) 75-753-5978 / Office: Research Bldg. #7, room 404