Proximity of query keyword occurrences is an important source of evidence
for effective query-biased document scoring. If a query keyword occurs
close to another in a document, this suggests that the document is
highly relevant to the query. The simplest way to measure
proximity between keyword occurrences is to use the distance between
them, i.e., the difference between their positions. However, most web
pages have a hierarchical structure composed of nested logical blocks
with their headings, and this structure affects logical proximity. For
example, if a keyword
occurs in a block and another occurs in the heading of the block, we
should not simply measure their proximity by their distance. This is
because a heading describes the topic of its entire block, so term
occurrences in a heading are strongly connected with any term
occurrences in the associated block, largely regardless of the distance
between them. Based on these observations, we developed a
heading-aware proximity measure and applied it to three existing
proximity-aware document scoring methods: MinDist, P6, and Span. We
evaluated both the existing methods and our modified versions on the
TREC Web Track data sets. The results indicate that our heading-aware
proximity measure outperforms the simple distance in all cases, and
that combining it with the Span method achieves the best performance.
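To make the contrast between the simple distance and a heading-aware one concrete, here is a minimal Python sketch. It is not the measure proposed in this work: the block model (a hypothetical `Block` with half-open heading and body position ranges), the hypothetical constant `HEADING_DISTANCE`, and the rule "a heading occurrence is treated as close to any occurrence inside its own block" are all illustrative assumptions. The scoring feature assumes the usual MinDist definition, namely the smallest distance between occurrences of two different query terms.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Sequence

# Hypothetical constant (an assumption, not a value from the paper): the
# distance assigned to a heading occurrence and an occurrence in its block.
HEADING_DISTANCE = 1


@dataclass(frozen=True)
class Block:
    """Hypothetical block model. Heading tokens occupy [heading_start, heading_end);
    body tokens, including nested sub-blocks, occupy [body_start, body_end)."""
    heading_start: int
    heading_end: int
    body_start: int
    body_end: int


def simple_distance(p: int, q: int) -> int:
    """Plain distance: the difference between two token positions."""
    return abs(p - q)


def heading_aware_distance(p: int, q: int, blocks: Sequence[Block]) -> int:
    """Assumed rule: if one occurrence is in a block's heading and the other is
    in that block's body, treat them as close regardless of position;
    otherwise fall back to the plain distance."""
    for b in blocks:
        for h, x in ((p, q), (q, p)):  # try both orderings of the pair
            in_heading = b.heading_start <= h < b.heading_end
            in_body = b.body_start <= x < b.body_end
            if in_heading and in_body:
                # min() ensures the heading rule never lengthens a distance
                return min(HEADING_DISTANCE, abs(p - q))
    return abs(p - q)


def min_dist(occurrences: Dict[str, List[int]],
             dist: Callable[[int, int], int]) -> float:
    """MinDist-style feature: smallest distance between occurrences of two
    different query terms (math.inf if fewer than two terms occur)."""
    present = [t for t, positions in occurrences.items() if positions]
    best = math.inf
    for a, b in combinations(present, 2):
        for p in occurrences[a]:
            for q in occurrences[b]:
                best = min(best, dist(p, q))
    return best


# Usage: "python" appears in a heading (position 0) whose block body spans
# positions 1-99; "install" appears deep inside that block at position 80.
blocks = [Block(heading_start=0, heading_end=1, body_start=1, body_end=100)]
occ = {"python": [0], "install": [80]}
print(min_dist(occ, simple_distance))                                    # 80
print(min_dist(occ, lambda p, q: heading_aware_distance(p, q, blocks)))  # 1
```

Passing the distance function as a parameter leaves the scoring feature itself unchanged, which mirrors the idea of swapping the distance measure inside an existing proximity-aware method rather than designing a new one.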