An efficient multiversion access control
in a Temporal Object Oriented Database
G. Arumugam, Department of Computer Science, Madurai Kamaraj
University, Madurai, India
M. Thangaraj, Department of Computer Science, Madurai Kamaraj
University, Madurai, India
|
 |
ARTICLE

PDF Version |
Abstract
Many data base applications require the storage and manipulation of
different versions of data objects. To satisfy the diverse needs of
these applications, current database systems support versioning [Hulten94]
at a very low level. The proposed model demonstrates that an application
independent versioning can be supported significantly at higher level.
The model uses signature patterns [Norvag99], hash table and B+ trees.
The outcome of the analysis shows that the proposed model is performing
well for all possible operations on versions.
1 INTRODUCTION
In an object oriented database system, the object model is often useful
to store information about different aspects of an entity. Normally,
this information is stored as versions of the corresponding objects.
There are two broad categories of versions such as system level and
application level. The system level versions are created and maintained
by the database system. Such versions are useful for concurrency control
[Bertino94], transaction support and redundancy in distributed databases.
In contrast, application-level versions are created by applications
for specific purposes. Examples of such versions include alternative
designs for the object, previous states of the object, and so on.
In a temporal object database system[TODB] [Nervag01], it is usually
assumed that most of the accesses will be to the current version
of the objects in the database. In order to keep these accesses as
efficient
as possible, and benefit from object clustering, the database is
partitioned. The current version objects are stored in the current
database, and
the historical versions are stored in the historical database. When
an object is updated in a TODB, the current version is first moved
to the historical database, before the new version is stored in-place
in the current database. The OID needs to be updated every time an
object is updated.
The scope of this paper is to study about application level versions.
The users are able to access any subset of versions of an object
or choose a version based on specified properties. In order to
maintain different versions of an object for efficient query operation,
an
effective
indexing structure is needed. In this paper, we present an index-based
model that controls the access of versioned objects in the database.
The remainder of the paper is organized as follows: Section 2 is
devoted to the issues relevant to the version management. In
Section 3, we
describe the architecture for version control. Section 4 shows
our performance evaluation result. Finally, in Section 5 we present
conclusion.
2 RELATED WORK
The notion of versioning can be modeled using pairs of types,
each pair consisting of a generic type and version type.
Each versioned
entity has single associated generic object and zero or more
associated version objects. The generic object contains the
information common
to all of its versions. All the versions of a generic object
have the same scheme, but they differ only in the values
of their attributes.
Versioning model [Sciore93]
EXTRA V is a model, contains key words, which are versioned.
Attributes appearing after this keyword in a type declaration
are versioned
and those appearing before the keyword are unversioned.
A version of a
conceptual object denotes a previous or current state
of the object.
Multi-version schemes [Lomet89]
In a multi version database systems, each write operation
creates a new version whereas read operation selects
one of the versions.
The
concurrency control scheme must ensure that the selection
of the version to be read is done in a manner that
ensures serializability.
Multi-version indexing[MVI] [Thangaraj03][Bober97]
It extends single-version indexing schemes to handle
multi version data. It outlines four major approaches
such as
chaining, data
page version selection, primary index version
selection and all index
version selection. These approaches will work
with the on page catching method
for storing prior versions and the B+ tree is
used as a data structure for indexing.
3 PROPOSED MODEL
The multi version indexing schemes are considered
different in the way they accomplish version
selection. The selection
mechanism
locates
the appropriate version of a tuple from a
collection of existing versions. Earlier approaches use
only B+ tree
[Bober97] to
manage the different
versions. The proposed model, ATVIM, supports
efficient handling of multiple versions of
a single object.
The ATVIM is an integrated model supporting
both inheritance hierarchy and aggregation
hierarchy.
The ATVIM model
in Fig.1 is an improved
version of the model proposed by Thangaraj
al. [Thangaraj99][Thangaraj03].
The index model consists of two indexes
namely primary and auxiliary indexes.
Both indexes
are using the
data structures
with the
combination of signature patterns [Norvag99],
hash tables and B+trees.
The primary index is indexed on the values
of the attribute An. It associates
with a value K of An,
the set of
OIDs of instances
of
all classes that have K as the value
of the attribute
An. The second index,
called auxiliary index, has OIDs as
indexing keys. This auxiliary index associates
with the OID of
an object
O and the list
of OIDs of the
parents of O. The leaf-records in the
primary index contain pointers to the
leaf-node
records in auxiliary
index,
and vice-versa.
Therefore, the primary index is inverted
with respect to the values of attribute
An and it
is used for
retrieval operations. The auxiliary
index is inverted with respect to
OIDs of instances of all classes (except
for the
class, root
of the path, and
for
its
subclasses)
in the scope of a given path. Basically,
the auxiliary index is used to
determine all primary records where
the OIDs of a given instance are
stored in
order to
efficiently perform
delete
and insert
operations.
The primary index consists of a hash
table, signature patterns and B+
trees. Each bucket
in the hash
table has a signature
pattern [Norvag99]
and a pointer to a B+ tree. The
signature is a bit pattern that has a pointer
array of n bits which
holds 1 or 0,
while mapping,
the
bit corresponding to the object
key is set as 1.
During searching, the
object availability can be determined
with one comparison. Any effective
hashing function
can
be used to hash
the given key
into the hash
table and hence into the corresponding
B+ tree.
The format of the non-leaf node
has a structure similar to the
traditional
index structure
on B+ tree. The
leaf node
in the
primary B+ tree
has different structure that
contains the information about classid,
record length, key value, class
directory,
and
offset. The class directory consists
of
a set of
classes that
are having
instances with
the key value in the indexed
attribute. The offset shows the position
of
the OID in the record. The remaining
part of the node structure consists
of the number
of
OIDs in
each class
and their
versions, and the OID
along with the pointer to the
corresponding object in the auxiliary index.

Fig. 1: ATVIM architecture
The auxiliary index also consists of a hash table, signature pattern
and B+ trees. Every entry in the hash table has signature pattern that
indicates the availability of the object and a pointer to a B+ tree.
Any effective hashing technique can be used to hash the OID of the
object into the hash table and hence into the corresponding B+ tree.
The format of a leaf node in the auxilliary B+ tree is as shown in
Fig 1. It consists of the OID of the object, record length (number
of OIDs referring to the particular object called as parents of the
object), pointer to the primary record and list of parents.
Operations
In this subsection, we present an algorithm for retrieval, insert
and delete operations for ATVIM model.


The earlier model, MVI uses the data structure B+ tree for the indexed
access. The performance of the MVI model is fully depends on the height
of the tree. As the height of the B+ tree is increasing with respect
to the volume of objects, which increases the retrieval time of an
object. But in our proposed model, each bucket in the hash table points
to a B+ tree which reduces the height of the tree. As the keys are
uniformly distributed among the trees, the height of the trees is reduced
when compared to the MVI model. The searching for availability of the
particular versions in the database can be easily identified with a
single comparison using the signatures in each bucket.
4 PERFORMANCE ANALYSIS
In this section, we demonstrate the efficiency of ATVIM model with
extensive experimental evaluation. This new model and earlier indexing
model such as MVI are implemented using C++. The set of experiment
explores the effects of data and query parameters on performance.
We use synthetic datasets (100K-1M) that contain objects whose
search
key (ie. OID) are uniformly distributed in the data set. All experiments
reported in this section were conducted on Pentium 4 2.x Ghz with
256MB RAM and 80 GB of secondary storage, running Windows 2000.
In order
to test the model and to find the performance, about 100000 objects
were created and various queries were implemented on a particular
index.
Storage Cost
In all experiments performed, we have obtained that the traditional
index has the lowest storage cost. In particular, the MVI organization
has less cost when compared to the ATVIM organization. However,
the storage costs may not be crucial, since large capacity
storage devices
are today widely available for lower cost. Therefore, it may
be preferable to design models that provide good performance,
even if they have
large storage requirements.
Retrieval cost
Let h is the height of the index, np is the number of pages
that are to be accessed, ap is the probability of accessing
another
page if
it is not available and b is hashing overhead and l is
the version lookup cost. The cost of evaluating the query in
ATVIM model is

Where as is the average size
of a primary record and ps is a page size. The number of pages that
are accessed (np) is a product of average
number of pages required per record(pp) and number of classes
(target of the query). The ap can be computed using (pp –np)/pp.

Fig.2: Availability checking Fig. 2 shows
the amount of comparison needed for checking the availability of the
particular object. The ATVIM model needs just one comparison to know
that availability of the particular object. But, the MVI model has
to scan through the entire index structure that consumes more time.

Fig. 3: Measuring the height
Fig. 3 illustrates the changes in the height of the tree across our
given data set. The MVI model uses single B+ tree in which entire dataset
has to be spread over. As the OIDs are uniformly distributed, the new
model spreads the entire OIDs in various B+ trees in the structure
that drastically reduces the height of the tree.
Fig. 4 compares the query performance of the two approaches. We applied
a query that fetches the object versions from the same page. The
time overheads were measured for retrieving the versions.

Fig. 4: Searching performance
We have presented the results of three experiments comparing the query
performance of the alternative indexing schemes. One goal of these
experiments was to determine the conditions under which placing version
selection information in the table and the leaf node that reduces the
query cost. On the positive side, such information can be used by a
query to directly access an object version from its leaf index entry
without having to go through one or more intermediate pages.
5 CONCLUSION
The model ATVIM proposed in this paper incorporates the aspects of
signature patterns, B+ tree and hashing techniques to make efficient
data retrieval. This model provides a flexible method to store and
retrieve multiple versions. We conducted a simulation study of the
two models and analyzed the results of this study. The result of this
study indicates the importance of maintaining a hash table with signature
patterns on index structures, as this turned out to be a key factor
in determining the relative performance of the multiversion indexing
schemes.
REFERENCES
[Bertino94] E Bertino: “A survey of indexing techniques for
OODB”,
query processing for advanced data base systems,Morgan Kaufman publishers,
inc, 1994.
[Bober97] Paul M.Bober, Michel J.Carey: “Indexing for multiversion
locking: alternatives and Performance evaluation”, IEEE trans
on KDE, vol.9 no.1, Jan’97.
[Hulten94] Anders Bjornerstedt, Christer Hulten: “Version control
in an Object Oriented architecture” Addison Wesley publishing
Co., First ed.1994.
[Lomet89] David Lomet et.al.: “Access methods for multiversion
data” ACM, pg 315-324,1989.
[Norvag01] Kjetil Nervag, “The Vagabond Temporal OID index: an
index structure for OID indexing in TOODB”,The VLDB Conference,2001.
[Norvag99] Kjetil Norvag: “Efficient use of Signatures in OODB” Proc.
Of ADBIS’99,1999.
[Norvag98] Kjetil Norvag, K baratbergsengen: “Optimizing OID
indexing cost in TOODB”, Proc. Of FODO’98,1998.
[Sciore93] Edward Sciore : “Versioning and Configuration management
in an Object Oriented data model“, The VLDB journal, June’93.
[Thangaraj99]M Thangaraj et.al.: “A new integrated indexing
model for OODB”, Proc. Of ROVIPIA, Malaysia, 1999.
[Thangaraj03]M Thangaraj, V R Kavitha: “An efficient multiversion
control in Object Oriented data base systems”, Proc. Of NACAC,
Oct 2003.
About the authors

|
|
G. Arumugam received
his post-graduate degree in Applied mathematics from PSG College
of Technology,
Coimbatore and Ph.D degree from University of Piere and Marie
curie, Paris, France in 1987. He is now the Professor and Head
of Computer Science department at Madurai Kamaraj University,
Madurai, TamilNadu, South India. He is an active researcher in
databases, data mining, Bioinformatics and mobile computing and
has published more than 50 papers in journals and conference
proceedings. Email: gurusamyarumgam@yahoo.co.in
|

|
|
M. Thangaraj received
his Post-graduate degree in Computer Science form Alagappa University,
Karaikudi
and M.Tech degree from
Pondicherry, South India in 1998. He is now the research scholar
cum senior faculty member in the computer science department
at Madurai Kamaraj University, Madurai, Tamil Nadu, South India.
His research area is databases and data mining and has published
more than 20 papers in journals and conference proceedings.
Email: thangarajmku@yahoo.com
|
Cite this column as follows: G. Arumugam and M. Thangaraj: “An
efficient multiversion access control in a Temporal Object Oriented
Database”, in Journal of Object Technology, vol. 5, no. 1, January-February
2006, pp. 105-116, http://www.jot.fm/issues/issue_2006_01/article4
|