Cut as a Querying Unit for WWW, Netnews, and E-mail

by Keishi Tajima, Yoshiaki Mizuuchi, Masatsugu Kitagawa, Katsumi Tanaka


In this paper, we propose a query framework for hypertext data, in particular WWW pages, Netnews articles, and e-mails. In existing query tools for such hypertext data, such as WWW search engines or intelligent news/mail readers, the data units in a query are individual nodes. In actual hypertext data, however, one topic is often described over a series of connected nodes, so the logical data unit should be such a series of nodes corresponding to one topic. This discrepancy between the data unit in a query and the logical data unit hinders efficient information discovery from hypertext data. To solve this problem, our framework divides hypertexts into connected subgraphs corresponding to individual topics and uses those subgraphs as the data units in queries.
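To make the contrast between node-level and subgraph-level querying concrete, here is a minimal Python sketch. It is not the authors' partitioning method: it assumes a toy reply graph, bag-of-words cosine similarity between linked nodes, and a hypothetical similarity threshold of 0.2. Links between dissimilar nodes are cut, and the remaining connected components serve as the query units.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def partition(texts, edges, threshold):
    """Cut edges whose endpoints are less similar than `threshold` (an assumed
    criterion), then return the connected components as sorted node lists."""
    vecs = {n: Counter(t.lower().split()) for n, t in texts.items()}
    parent = {n: n for n in texts}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if cosine(vecs[u], vecs[v]) >= threshold:
            parent[find(u)] = find(v)  # keep the edge: merge the two components

    groups = {}
    for n in texts:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


def query_nodes(texts, keywords):
    """Node-level query: a node matches only if it alone contains every keyword."""
    return sorted(n for n, t in texts.items()
                  if all(k in t.lower().split() for k in keywords))


def query_units(units, texts, keywords):
    """Subgraph-level query: a unit matches if its nodes jointly contain every keyword."""
    hits = []
    for unit in units:
        words = set()
        for n in unit:
            words.update(texts[n].lower().split())
        if all(k in words for k in keywords):
            hits.append(unit)
    return hits


# Hypothetical Netnews-style thread: node id -> article text, plus reply links.
TEXTS = {
    "a": "python decorator question",
    "b": "python decorator answer with functools",
    "c": "functools wraps preserves metadata",
    "d": "offtopic lunch recommendation kyoto",
}
EDGES = [("a", "b"), ("b", "c"), ("c", "d")]

if __name__ == "__main__":
    units = partition(TEXTS, EDGES, threshold=0.2)
    print(units)                                                   # [['a', 'b', 'c'], ['d']]
    print(query_nodes(TEXTS, ["question", "metadata"]))            # []
    print(query_units(units, TEXTS, ["question", "metadata"]))     # [['a', 'b', 'c']]
```

No single article contains both "question" and "metadata", so the node-level query finds nothing, while the topic subgraph {a, b, c} matches because the two keywords appear in different nodes of the same connected unit. The off-topic reply "d" is split off because it shares no words with its parent.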



Keywords: query, structuring, structure discovery, information discovery, graph partitioning, WWW, Web, Netnews, e-mail, hypertext, unit, subgraph
Published in Proc. of ACM Hypertext, pp. 235-244, Jun. 1998, Pittsburgh, PA.

Copyright © 1998 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.