We propose methods for generating diversified rankings of subtopics of
keyword queries. Our methods are characterized by their awareness of
the hierarchical heading structure of documents. This structure consists
of nested logical blocks, each with a heading that concisely describes
the topic of its block. Hierarchical headings in documents therefore
reflect the hierarchy of topics the documents cover. Based on this idea,
our methods score subtopic candidates by matching them against the
hierarchical headings in documents, giving higher scores to candidates
that match headings associated with more content. To diversify the
resulting rankings, each time our methods adopt the candidate with the
best score, they exclude the blocks matching that candidate and re-score
all remaining blocks and candidates. According to our evaluation on the
NTCIR data set, our methods generated significantly better subtopic
rankings than the query completion results of major commercial search
engines.
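The scoring and greedy diversification loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's actual scoring function, which the abstract does not specify. The `Block` type, the word-subset matching rule in `matches`, the use of `content_len` as the "amount of content" weight, and all function names are hypothetical. A block's hierarchical headings are modeled as the words of its own heading plus those of all ancestor headings.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical logical block: a heading, its own text size, nested sub-blocks."""
    heading: str
    content_len: int  # assumed weight: amount of text directly in this block
    children: list["Block"] = field(default_factory=list)


def flatten(root):
    """Return (heading_path_terms, content_len) for every block in the tree.

    heading_path_terms holds the words of the block's heading and of all its
    ancestors' headings, i.e. the hierarchical headings of the block.
    """
    out = []

    def walk(block, inherited):
        terms = inherited | set(block.heading.lower().split())
        out.append((terms, block.content_len))
        for child in block.children:
            walk(child, terms)

    walk(root, frozenset())
    return out


def matches(candidate, terms):
    # Assumed matching rule: every word of the candidate occurs in the heading path.
    return set(candidate.lower().split()) <= terms


def score(candidate, blocks, alive):
    # Sum the content size of the not-yet-excluded blocks that the candidate matches.
    return sum(size for i, (terms, size) in enumerate(blocks)
               if i in alive and matches(candidate, terms))


def diversified_ranking(candidates, blocks, k):
    alive = set(range(len(blocks)))
    pool = list(candidates)
    ranking = []
    while pool and len(ranking) < k:
        # Re-score every remaining candidate against the remaining blocks.
        best = max(pool, key=lambda c: score(c, blocks, alive))
        if score(best, blocks, alive) == 0:
            break  # no remaining block supports any remaining candidate
        ranking.append(best)
        pool.remove(best)
        # Exclude the blocks matched by the adopted candidate.
        alive -= {i for i in alive if matches(best, blocks[i][0])}
    return ranking


# Usage with two toy documents for the query "python".
doc1 = Block("Python", 10, [
    Block("Install", 50, [Block("Windows", 30), Block("Linux", 20)]),
    Block("Tutorial", 40),
])
doc2 = Block("Python", 0, [Block("Tutorial", 25), Block("Install", 15)])
blocks = flatten(doc1) + flatten(doc2)
candidates = ["python install", "python install windows", "python tutorial"]
print(diversified_ranking(candidates, blocks, k=3))
# ['python install', 'python tutorial']
```

Initially "python install" scores 115, "python tutorial" 65, and "python install windows" 30. Once "python install" is adopted, every block matching "python install windows" has already been excluded, so that candidate drops to 0 and is not ranked. This is how excluding matched blocks keeps near-duplicate subtopics out of the ranking.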
Keywords: subtopic mining; hierarchical heading structure; document
structure; sectional structure; topic structure; web search; search
result diversification; search intent
Published in Proc. of WEBIST, pp.121-130, Rome, Italy, 2016