Abstract

As XML has become popular as a document standard on the World Wide Web, much research has been done on XML storage systems that store and manage XML documents using existing DBMSs. Most of this research, however, assumes a relational DBMS instead of an object-oriented/object-relational (OO/OR) DBMS, which offers more powerful modeling capabilities. In this paper, we present the design and implementation of an XML storage system designed for an OO/OR DBMS. Specifically, we first analyze the mapping from an XML document structure to an OO/OR database schema. Second, we propose a method for describing the mapping using a standard language called the XML Schema Language. Third, we propose system catalog classes for storing the mapping information specified by users in the database. Fourth, we propose a detailed algorithm for storing XML documents in an OO/OR database, based on the mapping information. We believe the proposed system is practically usable for object-oriented programmers and DBMS implementors.

1 INTRODUCTION

XML is widely accepted as a new standard for documents having structural information on the Web [Sim00]. Typically, the number of Web documents is very large and, therefore, storing and managing them requires an efficient storage manager. There exist several types of XML storage systems, but most of them use relational DBMSs, as do the currently available commercial DBMSs [Che00, Mic00]. Naturally, their focus has been on storing XML documents as relational database records [Flo99, Sha99, Sha01].

Storing XML documents as database records requires a specification
of the mapping from the document structures to database schema. Currently,
most commercial DBMSs provide such specification languages, but the
languages are proprietary and limited to specifying a mapping to relational
databases only. This limitation keeps us from exploiting the powerful
modeling capabilities of object-oriented/object-relational (OO/OR) DBMSs
[Sto99] like the references and the collections. Furthermore, learning
the proprietary languages is a burden to users.

In this paper, we propose an XML storage system geared for an OO/OR DBMS. For this purpose, we first analyze the mapping from XML document structures to an OO/OR schema and propose a specification language based on the standard XML Schema Language. Then, we propose a set of system catalog classes for storing user-specified mapping information in the database and propose a detailed algorithm for storing XML documents in an OO/OR database based on the mapping information.

The rest of the paper is organized as follows. Section 2 provides an overview
of the XML Schema Language, and Section 3 provides an overview of the
XML support available from commercial DBMSs. Section 4 analyzes the
mapping from the structure of XML documents to an OO/OR database schema
and proposes an XML mapping language based on the analysis results.
Section 5 proposes the system catalog classes and the detailed algorithm
for storing XML documents. Section 6 concludes the paper.

2 XML SCHEMA LANGUAGE

The XML Schema Language is a standard from W3C that replaces XML Document Type Definitions (DTDs) for specifying the structure of an XML document [Bir01, Fal01, Tho01]. It is written in XML and offers several important elements including xsd:element, xsd:attribute, xsd:complexType, and xsd:annotation.

The element xsd:element is used for defining an element. It has the attributes name and type that respectively represent the name and type of the given element. Additionally, it has the attributes minOccurs and maxOccurs that respectively represent the minimum and the maximum numbers of occurrences of the element. The element xsd:attribute is used for defining an attribute. It has the attributes name and type that respectively represent the name and type of the given attribute.

The element xsd:complexType is used for defining the type of an element having subelements or attributes. In the XML Schema Language, if an element has subelements or attributes, the type of the element is called a complex type (complexType); otherwise, it is called a simple type.

The element xsd:annotation is used for annotating additional information such as comments on a document and information used by application programs. It has two subelements: xsd:documentation and xsd:appinfo. The former is used for specifying comments, and the latter is used for specifying information for application programs.

Figure 1 shows an example XML Schema. The xsd:element with the optional xsd:attribute defines the elements book, title, author, etc. The element book has the subelements title and author representing the title and authors, respectively. It also has the attribute id representing the unique id of a book. The element author has the subelements name and email representing the name and email address of the author, respectively. The xsd:complexType defines the complex types of the elements book and author.
The xsd:annotation defines annotations such as the copyright information.

Fig. 1: An example of XML schema.
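The four schema elements described above can be inspected with a short Python sketch. The schema text below is a hypothetical reconstruction based only on the prose description of Figure 1: the type names bookType and authorType, the xsd:string types, the occurrence bounds on author, and the placeholder copyright text are all assumptions, not the content of the original figure. Parsing uses the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema matching the description of Figure 1 (names of the
# complex types and all datatypes are assumptions made for illustration).
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>Copyright notice (placeholder)</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="book" type="bookType"/>
  <xsd:complexType name="bookType">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" type="authorType"
                   minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:string"/>
  </xsd:complexType>
  <xsd:complexType name="authorType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="email" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def q(tag):
    """Return the namespace-qualified tag name used by ElementTree."""
    return f"{{{XSD_NS}}}{tag}"


def summarize(schema_text):
    """Collect element, attribute, complex type, and documentation entries."""
    root = ET.fromstring(schema_text)
    elements = [
        # XML Schema treats an omitted minOccurs/maxOccurs as 1.
        (e.get("name"), e.get("type"),
         e.get("minOccurs", "1"), e.get("maxOccurs", "1"))
        for e in root.iter(q("element"))
    ]
    attributes = [(a.get("name"), a.get("type"))
                  for a in root.iter(q("attribute"))]
    complex_types = [c.get("name") for c in root.iter(q("complexType"))]
    notes = [d.text for d in root.iter(q("documentation"))]
    return elements, attributes, complex_types, notes


elements, attributes, complex_types, notes = summarize(SCHEMA)
for name, type_, lo, hi in elements:
    print(f"element {name}: type={type_}, occurs {lo}..{hi}")
```

In this sketch, author is the only element whose upper bound differs from the default, which is how a schema would express that a book may have several authors.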
In this section we give an overview of the XML structure-to-database schema mapping specification languages currently available from two selected commercial DBMSs: IBM DB2 and Microsoft SQL Server. Note that the specification languages of both DBMSs are geared for
a mapping to relational schema only and, therefore, cannot utilize such
constructs as the references and collections available from an OO/OR
schema. Furthermore, both languages are proprietary to the specific
DBMSs and, therefore, make it burdensome for users to switch between
them.

IBM DB2

IBM DB2 offers the XML Extender for storing XML documents. Its XML structure-to-database schema mapping language is the Document Access Definition (DAD). The DAD uses two elements, element_node and attribute_node, for describing the structure of XML documents, and one element, RDB_node, for describing its mapping to a database schema. Element_node specifies the elements in an XML document, and attribute_node specifies the attributes. Both element_node and attribute_node have RDB_node as their subelement. RDB_node in turn has the subelements table, column, and condition, which respectively specify a table, a column, and a primary key-foreign key relationship between tables in a database schema.

In the XML-to-schema mapping using DAD, XML elements are mapped to database tables or columns, XML attributes are mapped to database columns, and relationships between XML elements are mapped to primary key-foreign key relationships between database tables.

The principles for this mapping are as follows. First, specify the structure of XML documents by using element_node and attribute_node. Second, specify the mapping to a database schema using RDB_node. Third, specify all primary key-foreign key relationships between tables in the RDB_node subelement of the root element_node. Fourth, specify the table and column, to which an element or an attribute is mapped, in the RDB_node subelement of each non-root element_node and attribute_node.

Figure 3 shows a DAD specifying the mapping from the XML document structure in Figure 1 to the database schema in Figure 2. The table book, the table author, and the primary key-foreign key relationship between the two tables are specified in the RDB_node subelement of the root element_node named “book.” The columns id and title of the table book are specified in the RDB_node subelements of the attribute_node named “id” and the element_node named “title,” respectively.
Likewise, the columns name and email of the table author are specified in the RDB_node subelements of the element_node named “name” and the element_node named “email.”

Fig. 2: An example relational database schema.

Fig. 3: An example DAD.

Microsoft SQL Server

Microsoft SQL Server offers the XML Bulk Load utility for storing XML documents [Mic00]. Its XML-to-database mapping language is the annotated XML-Data Reduced (XDR) Schema. This language uses four elements (ElementType, AttributeType, element, and attribute) for specifying the structure of an XML document, and two attributes, relation and field, as well as one element, relationship, for specifying the mapping to a database schema. Specifically, ElementType and AttributeType are used respectively to declare XML elements and attributes, whereas element and attribute are used to refer to the declared ElementType and AttributeType. Additionally, the attributes relation and field respectively specify a table and a column, and the element relationship specifies the primary key-foreign key relationship between two tables in the database schema.

Like the DAD, the annotated XDR Schema maps XML elements and attributes to database tables or columns, and relationships between XML elements to primary key-foreign key relationships between database tables.

The principles for specifying the mapping are like those for the DAD. First, specify the structure of XML documents by using the elements ElementType, AttributeType, element, and attribute. Second, specify the table and column, to which an element or an attribute is mapped, by using the attributes relation and field. Third, specify the primary key-foreign key relationship between tables by using the element relationship.

Figure 4 shows an annotated XDR Schema corresponding to the DAD in Figure 3. The element book is mapped to the table book, and the attribute id and the element title are respectively mapped to the columns id and title of the table book.
Likewise, the element author is mapped to the table author, and the elements name and email are respectively mapped to the columns name and email of the table author. In addition, the relationship between the two elements book and author is mapped to the primary key-foreign key relationship between the two tables book and author.

Fig. 4: An example of annotated XDR schema.
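The relational mapping that both products share (elements to tables or columns, attributes to columns, and element nesting to a primary key-foreign key relationship) can be illustrated with a small Python sketch using the standard sqlite3 module. It uses neither DAD nor annotated XDR syntax; the mapping is hard-coded, and the sample document, the foreign key column name book_id, and the column types are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample document following the book/author structure described for Figure 1
# (the values are made up for illustration).
DOC = """\
<book id="b1">
  <title>XML Storage</title>
  <author><name>Alice</name><email>alice@example.org</email></author>
  <author><name>Bob</name><email>bob@example.org</email></author>
</book>
"""


def store_relational(conn, xml_text):
    """Shred one book document into the tables book and author."""
    book = ET.fromstring(xml_text)
    conn.execute("CREATE TABLE book (id TEXT PRIMARY KEY, title TEXT)")
    # book_id is a hypothetical foreign key column carrying the
    # book-author relationship.
    conn.execute(
        "CREATE TABLE author (name TEXT, email TEXT, "
        "book_id TEXT REFERENCES book(id))"
    )
    # Attribute id and element title map to columns of the table book.
    conn.execute(
        "INSERT INTO book VALUES (?, ?)",
        (book.get("id"), book.findtext("title")),
    )
    # Each author element becomes one row that points back to its book.
    for author in book.findall("author"):
        conn.execute(
            "INSERT INTO author VALUES (?, ?, ?)",
            (author.findtext("name"), author.findtext("email"),
             book.get("id")),
        )


conn = sqlite3.connect(":memory:")
store_relational(conn, DOC)
rows = conn.execute(
    "SELECT b.title, a.name FROM book b "
    "JOIN author a ON a.book_id = b.id ORDER BY a.rowid"
).fetchall()
print(rows)
```

Reassembling a book with its authors requires the join in the final query, and the order of the authors is recoverable here only through the insertion order of the rows.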
In this section we analyze the mappings from an XML document structure to a database schema and, based on the analysis results, propose an XML structure-to-database schema mapping language designed for OO/OR databases.

Analysis of Mapping from XML Document Structure to Database Schema

Figure 5 shows possible mappings from an XML document structure to an OO/OR schema. The rectangle on the left side shows the components of an XML document, and the rectangle on the right side shows the components of an OO/OR database schema. The arrows denote possible mappings between the two. As shown by the arrows numbered 1 through 4, each XML element or attribute can be mapped to either a database class or a column. However, if XML attributes are mapped to classes, join operations are required unnecessarily when processing queries and, therefore, we do not allow this kind of mapping in this paper. The arrows for these disallowed mappings are distinguished with broken lines in Figure 5.

Fig. 5: Possible mappings from an XML structure to an object-oriented/object-relational schema.

A relationship between an element and an attribute can be mapped to either a relationship between a class and a column (i.e., arrow 5) or a relationship between a class and another class (i.e., arrow 7). However, only the former option applies because XML attributes are mapped only to database columns. The relationship between an element and another element is mapped to either the relationship between a class and a column (i.e., arrow 6) or the relationship between two classes (i.e., arrow 8). We do not need to explicitly specify the mappings to a relationship between a class and a column (i.e., arrows 5 and 6) because this relationship is already maintained in the database schema.
That is, as long as we know the mappings between XML elements/attributes and database classes/columns (i.e., arrows 1, 3, and 4), we can find a relationship between an XML element and an attribute or between two elements that is mapped to the relationship between a class and a column.

XML Mapping Language for OO/OR Databases

To specify the explicit mappings (i.e., arrows 1, 3, 4, and 8), we need the following kinds of information: 1) information on the XML document structure, 2) information on the database schema, and 3) information on the mappings from the XML document structure to the database schema. We specify the XML document structure in the XML Schema Language, and specify the database schema and the mappings by adding three new subelements of the element appinfo in the XML Schema Language. Thus, we need only the XML Schema Language to specify all necessary mapping information.

The first two of the three new subelements are Class and Column. The element Class has the attribute name to specify the name of a class, and the element Column has the attributes name and type to specify the name and the type of a column, respectively. We use these two elements to specify the classes and columns in the database schema as well as the mappings from elements or attributes to classes or columns (i.e., arrows 1, 3, and 4).

The third new subelement is Relationship. It specifies the mapping from a relationship between two elements to a relationship between two classes (i.e., arrow 8). In an OO/OR database schema, the relationship between two classes can be represented using the reference type and the collection type. The relationship is classified as a one-to-one relationship or a one-to-many relationship based on the cardinality constraint. It is also classified as a uni-directional relationship or a bi-directional relationship depending on whether the relationship exists in only one direction or in both directions.
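As a rough illustration of these relationship kinds, the following Python sketch models a reference-type column as an object reference and a collection-type column as a list of references. The class and field names follow the book/author example but are otherwise assumptions; an actual OO/OR DBMS would store object identifiers (OIDs) rather than Python references.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity-based comparison, which avoids recursive
# comparisons once book and author refer to each other.
@dataclass(eq=False)
class Author:
    name: str
    email: str
    # Reference-type column pointing back to the parent; it stays None
    # when the relationship is uni-directional.
    book: Optional["Book"] = None


@dataclass(eq=False)
class Book:
    id: str
    title: str
    # Collection-type column: an ordered list of references, which is what
    # a one-to-many relationship needs. A one-to-one relationship would
    # use a single reference instead.
    authors: List[Author] = field(default_factory=list)


def link(book: Book, author: Author, bidirectional: bool) -> None:
    """Form the parent-to-child relationship, and optionally the reverse."""
    book.authors.append(author)
    if bidirectional:
        author.book = book


book = Book("b1", "XML Storage")
alice = Author("Alice", "alice@example.org")
bob = Author("Bob", "bob@example.org")
link(book, alice, bidirectional=True)
link(book, bob, bidirectional=False)
print([a.name for a in book.authors])
```

Because the list preserves insertion order, the order of the child elements survives without an extra ordering column.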
To specify the mapping of a relationship as described above, the element
Relationship has four attributes parent, child,
cardinality, and isOrdered. The attribute parent
specifies a reference-type column of the table mapped from a parent
element, and the attribute child specifies a reference-type column
of the table mapped from a child element. The attribute cardinality
specifies the cardinality constraint of the relationship, and the attribute
isOrdered specifies the order-preservation constraint of the
relationship. The direction of a relationship is uni-directional if
the attribute child is omitted, and bi-directional otherwise.

Figure 6 shows an example of a mapping written in the proposed specification
language. It shows part of the mapping from the XML document structure in
Figure 1 to the OO/OR schema in Figure 7. The mapping information is
annotated with subelements of the element appinfo. Specifically,
the database schema is specified with the elements Class and
Column on their own, and the mapping from the elements or attributes
to the database schema is specified with the two elements used as subelements
of the elements or attributes. The relationship between the class book
and the class author is one-to-many and ordered as specified
with the attributes ‘cardinality=“onetoMany”’
and ‘isOrdered=“yes”’ of the Relationship
element. Here, we use the list type, “list(ref(author))”
instead of the set type to preserve the order of the relationship.

Fig. 6: An example mapping from the structure of XML documents to object-oriented/object-relational schema.

Fig. 7: An example OO/OR database schema for Figure 2.

5 STORING XML DOCUMENTS BASED ON THE MAPPING

In this section we describe a set of database system catalog classes for storing user-provided XML structure-to-database schema mapping information, and present an algorithm for storing XML documents in an OO/OR database according to the mapping information.

Catalog Classes for Storing the Mapping Information

As mentioned in Section 4, we need to store the information on the XML document structure, the database schema, and the mapping between them. Since the information on the database schema is already stored in the database system catalog, we only have to store the information on the XML document structure and the mapping. In this paper we store the information on XML document structures and the information on mappings in the same classes to avoid unnecessary joins.

Figure 8 shows the catalog classes used for this purpose. The class xmlSysElements stores each element and its mapping, the class xmlSysAttributes stores each attribute and its mapping, and the class xmlSysRelationships stores each relationship (between two XML elements) and its mapping.

The class xmlSysElements has the columns elementId and
elementName for storing an element. Additionally, it has the
columns flag, classId, and columnNo for storing
the mapping information of the element. Specifically, the column flag
specifies whether the element is mapped to a class or a column, the
column classId specifies the identifier of the class the element
is mapped to, and columnNo specifies the number of the column
the element is mapped to.

The class xmlSysAttributes has the columns elementId,
attributeNo, and attributeName for storing an attribute.
Additionally, it has the column classId for storing the identifier
of the class the attribute is mapped to and the column columnNo
for storing the number of the column the attribute is mapped to.

The class xmlSysRelationships has the columns parentId,
childId, and cardinality for storing a relationship. Additionally,
it has the columns flag, isOrdered, parentClassId,
parentColumnNo, childClassId, and childColumnNo
for storing the mapping information of the relationship. Specifically,
the column flag specifies whether the child element is mapped
to a class or to a column of the class mapped from the parent element,
the columns parentClassId and parentColumnNo specify the
class and the column the parent element is mapped to, and the columns
childClassId and childColumnNo specify those the child
element is mapped to.

Fig. 8: Catalog classes for storing the mapping information.

Algorithm for Storing XML Documents in an OO/OR Database

Figure 9 shows the algorithm StoreXML_ORDB for storing XML documents
in an OO/OR database. The algorithm reads each XML element E
one by one and stores it in the database. In lines 3-15, if the element
E is mapped to a class, the algorithm creates an object O
of the class and forms a relationship with the object Op
that stores the parent element. Specifically, if the relationship is
one-to-one, in line 9 the algorithm stores the OID of O in the
column of Op that is a reference to O. If the
relationship is one-to-many, in line 11 the algorithm stores the OID
of O in the column of Op that is a collection
of references to O. Besides, if the relationship is bi-directional,
in line 13 the algorithm stores the OID of Op in the
column of O that is a reference to Op. In lines
16-22, if the element E is mapped to a column, the value of E
is stored in the column. In lines 23-24, each attribute that belongs
to the element E is stored in the column of O to which
E is mapped.

6 CONCLUSION

In this paper, we proposed an XML storage system for storing and managing XML documents efficiently in an object-oriented/object-relational (OO/OR) database. The system offers an XML-to-database mapping language based on the standard XML Schema Language, and provides methods usable in an OO/OR DBMS as well as a relational DBMS. Specifically, we analyzed the mapping from an XML document structure to an OO/OR database schema, and presented (1) a mapping specification language based on the results of the analysis, (2) database catalog classes for storing user-provided mapping information, and (3) an algorithm for mapping and storing XML documents in an OO/OR database.

Fig. 9: An algorithm for storing XML documents in an OO/OR database.
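The steps that Section 5 describes for StoreXML_ORDB can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm of Figure 9 itself: the database is an in-memory dictionary keyed by OID, the three dictionaries are simplified stand-ins for the catalog classes, the column names authors and book and the cardinality value "onetoOne" are hypothetical, and the root element is assumed to be mapped to a class.

```python
import itertools
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-ins for the catalog classes of Figure 8.
ELEMENTS = {  # element name -> (mapping kind, class or column name)
    "book": ("class", "book"),
    "title": ("column", "title"),
    "author": ("class", "author"),
    "name": ("column", "name"),
    "email": ("column", "email"),
}
ATTRIBUTES = {("book", "id"): "id"}  # (element, attribute) -> column
RELATIONSHIPS = {  # (parent element, child element) -> relationship mapping
    ("book", "author"): {
        "cardinality": "onetoMany",
        "parent_column": "authors",  # collection of references in the parent
        "child_column": "book",      # None would mean uni-directional
    },
}


def store_xml(xml_text):
    """Store one document and return the resulting objects keyed by OID."""
    db = {}
    oids = itertools.count(1)

    def visit(elem, parent_tag, parent_oid):
        kind, target = ELEMENTS[elem.tag]
        if kind == "column":
            # Element mapped to a column: store its value in the object
            # that holds the enclosing class-mapped element.
            db[parent_oid][target] = (elem.text or "").strip()
            return
        # Element mapped to a class: create an object O of that class.
        oid = next(oids)
        obj = db[oid] = {"class": target}
        rel = RELATIONSHIPS.get((parent_tag, elem.tag))
        if rel is not None:
            parent = db[parent_oid]
            if rel["cardinality"] == "onetoOne":
                parent[rel["parent_column"]] = oid
            else:
                parent.setdefault(rel["parent_column"], []).append(oid)
            if rel["child_column"] is not None:
                # Bi-directional: also store the parent's OID in O.
                obj[rel["child_column"]] = parent_oid
        # Attributes of the element are stored in columns of O.
        for attr_name, value in elem.attrib.items():
            obj[ATTRIBUTES[(elem.tag, attr_name)]] = value
        for child in elem:
            visit(child, elem.tag, oid)

    visit(ET.fromstring(xml_text), None, None)
    return db


DOC = """\
<book id="b1">
  <title>XML Storage</title>
  <author><name>Alice</name><email>alice@example.org</email></author>
  <author><name>Bob</name><email>bob@example.org</email></author>
</book>
"""
db = store_xml(DOC)
for oid, obj in db.items():
    print(oid, obj)
```

The sketch keys the mapping tables by element name for brevity, whereas the catalog classes of Section 5 use element identifiers, class identifiers, and column numbers.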
ACKNOWLEDGEMENT

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).

REFERENCES

[Bir01] Biron, P. and Malhotra, A.: XML Schema Part 2: Datatypes, May 2001 (available from http://www.w3.org/TR/xmlschema-2).
[Che00] Cheng, J. and Xu, J.: “XML and DB2,” In Proc. the 16th Int'l Conf. on Data Engineering, pp. 569-573, San Diego, California, USA, 2000.
[Fal01] Fallside, D.: XML Schema Part 0: Primer, May 2001 (available from http://www.w3.org/TR/xmlschema-0).
[Flo99] Florescu, D. and Kossmann, D.: A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database, Technical Report RR-3680, INRIA, May 1999.
[Mic00] Microsoft Corp.: Microsoft SQL Server 2000, 2000 (available from http://www.microsoft.com/sql/default.asp).
[Sha99] Shanmugasundaram, J. et al.: “Relational Databases for Querying XML Documents: Limitations and Opportunities,” In Proc. 25th Int'l Conf. on Very Large Data Bases, pp. 302-314, Edinburgh, Scotland, UK, Sept. 1999.
[Sha01] Shanmugasundaram, J. et al.: “A General Technique for Querying XML Documents Using a Relational Database System,” ACM SIGMOD RECORD, Vol. 30, No. 3, Sept. 2001.
[Sto99] Stonebraker, M. and Moore, D.: Object-Relational DBMSs: The Next Great Wave, Morgan Kaufmann, 1999.
[Sim00] Simon, H.: Strategic Analysis of XML for Web Application Development, Computer Research Corp., 2000.
[Tho01] Thompson, H. et al.: XML Schema Part 1: Structures, May 2001 (available from http://www.w3.org/TR/xmlschema-1).
Cite this article as follows: Woo-Shin Han, Ki-Hoon Lee, Byung Suk Lee: “An XML Storage System for Object-Oriented/Object-Relational DBMSs”, in Journal of Object Technology, vol. 2, no. 3, May-June 2003, pp. 113-126. http://www.jot.fm/issues/issue_2003_05/article2