Schemaless Semistructured Data Revisited
--Reinventing Peter Buneman's Deterministic Semistructured Data Model--

by Keishi Tajima

Abstract

This paper reviews the design of data models for semistructured data, focusing on their schemaless nature. Uniform treatment of schema information and data, that is, of metadata and data, is important in the design of such data models. The paper discusses what data and metadata are, and argues that attribute names, which are usually regarded as metadata, and key values, which are usually regarded as data, play similar roles when we organize large data sets. It then revises one of the standard semistructured data models in accordance with that argument, and thereby reinvents the deterministic semistructured data model proposed by Peter Buneman and his colleagues. The contribution of this paper is an additional rationale for the design of that data model, one based on the similarity between attribute names and key values.
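
As a rough illustration of the claim that attribute names and key values play similar roles, the following Python sketch treats both as edge labels in a nested map, so that one access path mixes the two freely. It is not the paper's formalism: the employee records, the helper names `to_deterministic` and `follow`, and the restriction to plain string labels are all assumptions made for this example. The only property borrowed from the abstract is that the model is deterministic, which the sketch reads as "no node has two outgoing edges with the same label".

```python
# Hypothetical data: a small relation whose rows are identified by "id".
employees_as_rows = [
    {"id": "e42", "name": "Ada", "dept": "R&D"},
    {"id": "e77", "name": "Lin", "dept": "Sales"},
]


def to_deterministic(rows, key):
    """Re-organize rows so that each key value becomes an edge label.

    Raises ValueError if two rows share a key value, since that would give
    one node two outgoing edges with the same label.
    """
    tree = {}
    for row in rows:
        label = row[key]
        if label in tree:
            raise ValueError(f"duplicate key value: {label!r}")
        # The remaining attributes hang below the key-value label.
        tree[label] = {k: v for k, v in row.items() if k != key}
    return tree


def follow(tree, path):
    """Walk a path of labels; each step selects at most one child."""
    node = tree
    for label in path:
        node = node[label]
    return node


db = {"employees": to_deterministic(employees_as_rows, "id")}

# "employees" and "name" are attribute names; "e42" is a key value.
# All three are used in exactly the same way when navigating the data.
ada_name = follow(db, ["employees", "e42", "name"])
```

In this representation the lookup has no need to distinguish which labels came from the schema and which came from the data, which is the uniformity the abstract argues for.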