Querying Composite Objects in Semistructured Data

by Keishi Tajima


In this paper, we propose an entity-based style of query for semistructured data. First, we partition semistructured data into subgraphs corresponding to real-world entities, that is, into composite objects. To detect composite objects, we use the exclusiveness of references: if a reference is exclusive, we regard it as a composite link. We then develop a query language for entity-based queries. The language supports path expressions in which edge expressions can be restricted to match only composite links or only non-composite links. By combining these expressions with wild cards, we can specify queries of the form “retrieve all entities including these data items,” which we call entity-based style queries. We show examples demonstrating that this style of query is especially useful when one does not have enough knowledge of the schema in advance.
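The idea can be illustrated with a small sketch. It is not the paper's algorithm or query language: the abstract does not define exclusiveness precisely, so the sketch assumes that a reference is exclusive when its target has exactly one incoming edge. The toy graph, the node names, and the functions composite_links, entities, and entities_containing are all hypothetical. Path expressions and wild cards are not modelled; the query simply checks which entities contain the requested values.

```python
from collections import defaultdict

# Hypothetical toy graph of (source, label, target) edges.
EDGES = [
    ("p1", "title", "t1"),
    ("p1", "author", "a1"),
    ("p2", "title", "t2"),
    ("p2", "author", "a1"),  # a1 is shared, so neither author reference is exclusive
    ("a1", "name", "n1"),
]
# Atomic values stored at leaf nodes (also hypothetical).
VALUES = {"t1": "Composite Objects", "t2": "Path Queries", "n1": "Tajima"}


def composite_links(edges):
    """Assumption: a reference is exclusive if its target has one incoming edge."""
    indegree = defaultdict(int)
    for _, _, dst in edges:
        indegree[dst] += 1
    return {(src, label, dst) for src, label, dst in edges if indegree[dst] == 1}


def entities(edges):
    """Group nodes reachable through composite links into one entity per root."""
    children = defaultdict(list)
    has_composite_parent = set()
    for src, _, dst in composite_links(edges):
        children[src].append(dst)
        has_composite_parent.add(dst)
    nodes = {n for src, _, dst in edges for n in (src, dst)}
    result = {}
    # Entity roots are the nodes that no composite link points to.
    for root in sorted(nodes - has_composite_parent):
        members, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children[node])
        result[root] = members
    return result


def entities_containing(edges, values, wanted):
    """Return roots of entities whose members hold every wanted value."""
    found = []
    for root, members in entities(edges).items():
        present = {values[m] for m in members if m in values}
        if set(wanted) <= present:
            found.append(root)
    return found


print(entities_containing(EDGES, VALUES, ["Tajima"]))
```

In this toy graph the shared author node a1 becomes its own entity, separate from the two papers that reference it, so a query for "Tajima" returns the author entity without the caller knowing which labels lead to the name. Nodes that lie only on a cycle of composite links are not handled by this sketch.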



Keywords: semistructured data, query, entity, composite object, composite link, structure discovery, information discovery, subgraph, structural query, path query, regular expression, wild cards
Published in Proc. of FODO, pp. 57-68, Nov. 1998, Kobe, Japan
tajima@i.kyoto-u.ac.jp / Fax: +81(Japan) 75-753-5978 / Office: Research Bldg. #7, room 404