
Querying Composite Objects in Semistructured Data

by Keishi Tajima

Abstract

In this paper, we propose an entity-based style of query for semistructured data. First, we partition semistructured data into subgraphs corresponding to real-world entities, that is, into composite objects. To detect composite objects, we use the exclusiveness of references: if a reference is exclusive, we regard it as a composite link. We then develop a query language for entity-based queries. The language supports path expressions with edge expressions that match only composite links or only non-composite links. By combining these expressions with wild cards, we can specify queries of the form “retrieve all entities including these data items,” which we call entity-based style queries. We show examples demonstrating how this style of query is useful, especially when one does not have enough knowledge of the schema in advance.
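The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or query language. It assumes that a reference is "exclusive" when its target node has exactly one incoming edge, and that a composite object is a set of nodes connected by composite links. The toy graph, the values, and the function names (composite_links, entities, entities_including) are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy graph as (source, label, target) edges.
# Two papers share one author node, so the "author" references are not exclusive.
EDGES = [
    ("p1", "title", "t1"),
    ("p1", "author", "a1"),
    ("p2", "title", "t2"),
    ("p2", "author", "a1"),
    ("a1", "name", "n1"),
]

# Hypothetical data items stored at leaf nodes.
VALUES = {"t1": "Title A", "t2": "Title B", "n1": "Alice"}


def composite_links(edges):
    """Edges whose target has exactly one incoming edge (assumed meaning of 'exclusive')."""
    indegree = Counter(target for _, _, target in edges)
    return [edge for edge in edges if indegree[edge[2]] == 1]


def entities(edges):
    """Group nodes into components connected by composite links only."""
    nodes = {n for s, _, t in edges for n in (s, t)}
    leader = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while leader[n] != n:
            leader[n] = leader[leader[n]]
            n = leader[n]
        return n

    for s, _, t in composite_links(edges):
        leader[find(t)] = find(s)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(group) for group in groups.values())


def entities_including(edges, values, wanted):
    """Entities whose nodes together hold every data item in `wanted`."""
    return [
        group
        for group in entities(edges)
        if set(wanted) <= {values[n] for n in group if n in values}
    ]


print(entities(EDGES))
# [['a1', 'n1'], ['p1', 't1'], ['p2', 't2']]
print(entities_including(EDGES, VALUES, {"Alice"}))
# [['a1', 'n1']]
```

In this sketch the shared author node becomes its own entity, while each title stays with its paper. The last call retrieves an entity from a data item alone, without naming any edge label, which is the situation the abstract describes for users who do not know the schema in advance.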

Full Text: pdf

Slides: pdf

BibTex entry

Keywords

semistructured data, query, entity, composite object, composite link, structure discovery, information discovery, subgraph, structural query, path query, regular expression, wild cards
Published in Proc. of FODO, pp. 57-68, Nov. 1998, Kobe, Japan
tajima@i.kyoto-u.ac.jp / Fax: +81(Japan) 75-753-5978 / Office: Research Bldg. #7, room 404