Cut as a Querying Unit for WWW, Netnews, and E-mail
by Keishi Tajima, Yoshiaki Mizuuchi, Masatsugu Kitagawa, Katsumi Tanaka
In this paper, we propose a query framework for hypertext data, in
particular, WWW pages, Netnews articles, and e-mails. In existing
query tools for such hypertext data, for example search engines for the WWW
or intelligent news/mail readers, the data units of a query are individual
nodes. In actual hypertext data, however, one topic is often
described over a series of connected nodes, and therefore, the logical
data unit should be such a series of nodes corresponding to one topic.
This discrepancy between the data unit of a query and the logical data
unit hinders efficient information discovery from hypertext data.
To solve this problem, our framework divides hypertexts into
connected subgraphs corresponding to individual topics and uses those
subgraphs as the data units of queries.
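
The idea of querying connected subgraphs rather than single nodes can be illustrated with a minimal sketch. The abstract does not say how the division is computed, so everything below is an assumption made for illustration: the word-overlap similarity, the threshold value, the rule of dropping weak links and taking connected components, and all function names (`split_into_subgraphs`, `word_overlap`, `query_subgraphs`) are hypothetical and are not the authors' method.

```python
from collections import defaultdict


def word_overlap(text_a, text_b):
    """Jaccard similarity of the two word sets (an assumed, simplistic measure)."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def split_into_subgraphs(nodes, links, similarity, threshold):
    """Drop links whose endpoints are dissimilar, then return connected components.

    This thresholding rule is a stand-in for whatever division the framework uses.
    """
    adjacency = defaultdict(set)
    for a, b in links:
        if similarity(nodes[a], nodes[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, subgraphs = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            n = stack.pop()
            component.append(n)
            for m in adjacency[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        subgraphs.append(sorted(component))
    return subgraphs


def query_subgraphs(nodes, subgraphs, keywords):
    """Return the subgraphs whose combined text contains every keyword."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for subgraph in subgraphs:
        words = set()
        for n in subgraph:
            words.update(nodes[n].lower().split())
        if wanted <= words:
            hits.append(subgraph)
    return hits


# Hypothetical toy data: three pages joined by two links.
nodes = {
    "p1": "kyoto travel guide temples",
    "p2": "kyoto temples opening hours",
    "p3": "database query optimization",
}
links = [("p1", "p2"), ("p2", "p3")]

subgraphs = split_into_subgraphs(nodes, links, word_overlap, threshold=0.2)
hits = query_subgraphs(nodes, subgraphs, ["guide", "hours"])
```

In this toy data the keywords "guide" and "hours" never occur on the same page, so a query whose unit is the individual node finds nothing, while the query over subgraphs returns the pair of linked pages that together cover the topic.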