Schemaless Semistructured Data Revisited
--Reinventing Peter Buneman's Deterministic Semistructured Data Model--

by Keishi Tajima

Abstract

This paper reviews the design of data models for semistructured data, focusing on their schemaless nature. Uniform treatment of schema information and data, in other words of metadata and data, is important in the design of such data models. The paper discusses what data and metadata are, and argues that attribute names, which are usually regarded as metadata, and key values, which are usually regarded as data, play similar roles when we organize large data sets. It then revises one of the standard semistructured data models in accordance with that argument, and in doing so reinvents the deterministic semistructured data model proposed by Peter Buneman and his colleagues. The contribution of this paper is an additional rationale for the design of that data model, one based on the similarity between attribute names and key values.
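
As an informal illustration of the abstract's central claim (not taken from the paper itself), the sketch below nests the rows of a small table under their key values, so that key values and attribute names both serve as edge labels on the path to a value. The sample rows and the helper names to_tree and lookup are hypothetical. The sketch assumes that "deterministic" means the outgoing edge labels of each node are distinct, which a Python dict enforces by construction.

```python
# Hypothetical rows of a relation keyed by "id" (illustrative data only).
rows = [
    {"id": 1, "name": "Alice", "dept": "CS"},
    {"id": 2, "name": "Bob", "dept": "Math"},
]


def to_tree(rows, key):
    """Nest each row under its key value.

    In the result, key values (first level) and attribute names
    (second level) are both just edge labels of a tree.
    """
    tree = {}
    for row in rows:
        k = row[key]
        if k in tree:
            # Distinct labels per node: a repeated key value would
            # make the path ambiguous, so reject it.
            raise ValueError(f"duplicate key value: {k!r}")
        tree[k] = {attr: val for attr, val in row.items() if attr != key}
    return tree


def lookup(tree, path):
    """Follow a sequence of edge labels from the root to a node."""
    node = tree
    for label in path:
        node = node[label]
    return node


tree = to_tree(rows, "id")
dept_of_2 = lookup(tree, [2, "dept"])   # key value first, then attribute name
name_of_1 = lookup(tree, [1, "name"])
```

Here the path [2, "dept"] mixes a key value and an attribute name without distinguishing them, which is the uniform treatment of data and metadata that the abstract argues for.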

Full Text: pdf

Slides: pdf

BibTeX entry

Keywords

semistructured, schemaless, self-describing, metadata, attribute name, key value, edge label, graph, table, multidimensional table

Published in: In Search of Elegance in the Theory and Practice of Computation - Essays Dedicated to Peter Buneman, pp. 466-482, LNCS, Vol. 8000, 2013


tajima@i.kyoto-u.ac.jp / Fax: +81(Japan) 75-753-5978 / Office: Research Bldg. #7, room 404