Open Types and Bidirectional Relationships
as an Alternative to Classes and Inheritance
Christian Heinlein, Dept. of Computer Science, University of Ulm, Germany |
 |
REFEREED ARTICLE

PDF Version |
Abstract
Open types are presented as a simple yet powerful data model for statically typed procedural
and object-oriented programming languages, that overcomes the limitations
of the traditional record-oriented model. The basic idea is to separate type definitions
from the definitions of their attributes in order to allow incremental definitions of the
latter. Furthermore, bidirectional relationships are introduced as pairs of mutually inverse
attributes whose values will be kept consistent automatically. Finally, anonymous
and automatic attributes are presented as a simple extension of attributes that unifies
the concepts of aggregation, inheritance (including subtype polymorphism), and
user-defined type conversions. Since open types do not possess methods in the objectoriented
sense, global virtual functions are used as an alternative and more flexible
means to define behaviour, while modules are employed to achieve information hiding
and type-safe separate compilation. The basic implementation ideas of open types as
a language extension for C++ are described.
1 INTRODUCTION
Since the advent of procedural programming languages in the 1960s, data structures
appearing in programs are typically modeled as records, and even the object-oriented
notion of a class is basically a record equipped with accompanying procedures or
methods. Even though data modeling based on records, e. g., modeling a person
as a record possessing fields (or members) such as name, date of birth, address
(which might itself be a record), etc., is rather natural and straightforward, a severe
limitation of records is the fact that they are fixed: Once a record type has been
defined, the set of its fields is invariably determined, and all its instances will possess
exactly these fields.
Variant records in procedural languages and subclasses in object-oriented languages
provide some more flexibility in this regard, but a particular type or class
still remains fixed once it has been defined. So, even though it is possible, for instance,
to "extend" a given class Person by defining a subclass PersonWithSsno in
order to model persons with a social security number (ssno), the base class Person actually remains unchanged (so the term "extend" is quite misleading). This is particularly
problematic when an existing program (e. g., a person management system)
shall be extended later in an unanticipated way: If the original source code is not
available or shall not be modified for reasons of modularity, introducing a subclass of Person does not help since the original code still creates instances of the original
class.
To overcome these kinds of problems, aspect-oriented programming languages
provide so-called inter-type member declarations [22] or introductions [32] in order
to actually extend an existing class definition with new data fields (and methods) in
a modular way (i. e., without needing to touch existing source code), without defining
a new class for that purpose. Interestingly, the possibility to truly extend the set of
data fields of a class later actually makes the original possibility to define a class
with data fields superfluous: It is always possible in principle to define a class empty,
i. e., without any fields, and to add them incrementally later.
If a particular class has been extended with additional data fields multiple times,
typically by different "aspects" of a program, it might happen that many instances
of the class do not actually need all the fields introduced that way. For example,
the social security number might be stored only for persons having an employment,
while information about a person’s spouse and children might be needed only for
those receiving child benefit. But even for "conventionally" defined data structures,
where all data fields are defined at once, it might happen that fields of particular
instances remain unused simply because particular information is either unknown
or irrelevant. Depending on the total number of unused data fields in a program, it
might be worthwhile to have an underlying data representation format that stores
only those data fields of an object which are actually used, similar to the way
database systems optimize storage of objects containing null values.
Finally, bidirectional relationships between data types, e. g., between a person
and the cars it owns, are hard to express naturally with record-based data models,
since conceptually they do not belong to either type, but rather constitute entities
in their own right. In order to implement them in procedural or object-oriented
programming languages, it is necessary to model them as dual pairs of data fields
of the participating record types, one for each "direction" of the relationship, which
must be kept consistent explicitly. If, for instance, a car changes its owner, not only
must its owner field be changed accordingly, but also the cars fields of both the old
and the new owner must be modified to correctly reflect the new situation.
To overcome all these limitations of traditional record and class types mentioned
above, open types will be presented in this paper as an alternative data
model for statically typed procedural and object-oriented programming languages.
Beforehand, however, Sec. 2 gives some introductory remarks about the decision to
provide open types as a syntactical extension of an existing language (viz. C++)
instead of developing either a completely new language or a library for an existing
one. Afterwards, Sec. 3 briefly presents null values as a generally useful concept
which is particularly relevant for the design of open types.
After these preparations, Sec. 4 introduces the basic notion of open types, together
with single- and multi-valued attributes, while Sec. 5 describes bidirectional
relationships as pairs of mutually inverse attributes whose values will be kept consistent automatically. Section 6 presents anonymous and automatic attributes as a
simple extension of attributes that unifies the concepts of aggregation, inheritance
(including subtype polymorphism), and user-defined type conversions.
Since open types are only one of two fundamental concepts of so-called advanced procedural programming languages1, the complementary concept of global virtual functions published earlier [14, 16] is recapitulated in Sec. 7, followed by a description
of modules to support encapsulation, information hiding, and type-safe separate
compilation in Sec. 8. To give a coherent example for the application of these concepts
in concert, Sec. 9 presents a solution of the well-known expression problem [35].
The basic ideas to implement open types, both on the linguistic level and in
terms of low-level memory management and garbage collection, are presented in
Sec. 10, before the paper is concluded with a discussion of related work in Sec. 11
and a general conclusion in Sec. 12.
2 DESIGN ALTERNATIVES
Generally speaking, there are at least three different possibilities for providing a new
programming language concept: It might be implemented either as a library for an
existing language, possibly accompanied by a framework of design patterns or coding
rules (e. g., JBoss AOP [17], which implements aspect-oriented concepts as a Java
library), or as a syntactical extension of an extension language (e. g., AspectJ [22],
which extends Java with aspect-oriented language constructs), or in a completely new language.
After careful evaluation of the pros and cons of these alternatives, it has been
decided to implement open types as a precompiler-based language extension for the
following reasons:
- Providing them as a pure library framework for a statically typed procedural
or object-oriented language turned out to be too complex and unwieldy to use.
In fact, the code generated by the precompiler for open types and attributes
is so verbose and complicated that no programmer would volunteer to write
it manually.
- The effort for developing a completely new advanced procedural programming
language based on open types, global virtual functions, and modules, has been
regarded too high for now, even though it is still considered a long-term goal.
- Even though a precompiler-based implementation of a language extension has
some well-known limitations, e. g., with respect to proper diagnosis of and
recovery from syntax errors, it allows to construct a working environment for the new language concept, which can be used to gain valuable practical experience with it, in a comparatively short amount of time.
The decision to actually use C++ as the base language has been influenced, amongst
others, by the following considerations:
- Since open types are designed as a statically type-safe concept, the base language
must at least support the notion of static types and provide a sufficient
amount of static type checking. Therefore, purely dynamically typed languages
such as Smalltalk or Lisp have been ruled out. The fact that static type checking
in C++ can always be circumvented by dirty tricks (such as taking the
address of an object, casting it to a different pointer type, and dereferencing
it) has been accepted as a minor drawback, since accidental violations of the
static type system are detected by the compiler.
- In contrast to (more) pure object-oriented languages such as Eiffel or Java,
C++ is actually a multi-paradigm language [24] that supports traditional procedural
programming as well. Since open types and global virtual functions
are more related to traditional record types and procedures than to classes and
methods, they fit in more naturally with a language providing these concepts
than with a bare object-oriented language. Since the resulting language called
C+++ supports object-oriented programming alongside with traditional and
advanced procedural programming, a programmer can (and should) select a
particular subset of the language which fits his needs and personal preferences.
In particular, when using open types, all stuff related to classes with all its
accompanying complexity becomes dispensable.
- C++ provides a rich set of independently useful concepts (one might also say
that it is conceptually overloaded), such as templates, function and operator
overloading, default arguments, etc., which are not only useful for application
programmers, but also for the code generated by the precompiler, which, for
instance, makes heavy use of templates.
Despite these arguments for C++ (and against pure object-oriented languages), it
should be noted that the basic concept of open types is actually language-independent
and might in fact be incorporated into any statically typed procedural or objectoriented
programming language.
3 NULL VALUES
The concept of null values, i. e., special "values" representing the absence of any
real value, is provided in various forms by some programming languages, e. g., nil in Lisp, None in Python, null in Java, or NaN (not a number) in languages providing
IEEE floating point arithmetics [12]. While the latter can be used with well-defined semantics just like real values (where the value of an expression containing one or
more NaN operands is NaN, too), using any of the former in an operation usually
leads to a run time error. Furthermore, statically typed languages typically do not
provide null values for all types of the language, but mainly for reference types.2
As has been shown in more detail in [15], however, providing null values with
well-defined behaviour as a general concept for all types of a language yields several
benefits and naturally solves some otherwise intractable problems, in particular:
- If a variable is not explicitly initialized, it naturally possesses no value, i. e.,
null, instead of an undefined value or a default value such as zero.
- If a function does not explicitly return a value, because it does not contain or
execute a return statement, it naturally returns no value, i. e., null.
In fact, many dynamically typed languages actually exhibit such a behaviour. On the
other hand, attempts to statically detect these situations at compile time (as, e. g.,
in Java) are necessarily limited to conservative flow analyses due to the theoretical
undecidability of the underlying problems, which frequently lead to annoying "false"
error messages.
To incorporate such a general notion of null values into C++, wrapper types
called integer, character, etc. are provided for the built-in types int, char, etc.
whose parameterless constructor yields a null value which is different from any real
value of the type (including zero). This implies that variables of such a type which
are not explicitly initialized, are implicitly initialized to null. To conveniently check
whether a given value is null, an implicit conversion to a Boolean value is provided
that returns true for all real values (including zero!) and false for a null value. To
achieve that functions that do not explicitly return a value implicitly return null,
a corresponding return statement is added by the precompiler at the end of every
function body.
For the design of open types, only the fact that the parameterless constructor
of a type logically returns nothing, will be essential, while the other aspects of null
values might be considered independently useful.
4 OPEN TYPES AND ATTRIBUTES
Based on the preparations of the previous section, this section introduces the core
concepts of open types and attributes, using the simple conceptual data model depicted
in Fig. 1 as an example, while subsequent sections about bidirectional relationships and anonymous and automatic attributes and relationships will describe
some important extensions of this basic model.

Figure 1: Conceptual data model
Type and Attribute Definitions
An open type is simply defined by declaring its name with the keyword typename,
e. g.:

This corresponds to drawing the rectangles labeled Person and Car in the graphical
data model.
Afterwards, a single-valued attribute such as name, corresponding to a data field
in record notion, can be defined by declaring it as a kind of mapping from Persons
to strings (where the latter type constitutes a predefined atomic type depicted by
an ellipse instead of a rectangle in order to distinguish it from user-defined open
types):

This corresponds to drawing an arrow labeled name from Person to string in the
graphical data model. Simultaneously, the right hand side of the definition (string name;) looks identical to a C++ (member) variable definition.
Similarly, a multi-valued attribute such as gnames (given names), corresponding
to a data field whose type is an array or container type, is defined by using a double
instead of a single arrow to indicate the multi-valuedness:

Again, this corresponds directly to drawing a corresponding arrow in the graphical data model.
Since type and attribute definitions are separated, it is easily possible to add
new attributes of a type on demand, either in the same or in a different translation
unit, without needing to change the original type definition. It is even possible to
dynamically load modules containing additional attribute definitions for a type, even
after objects of this type have been created.
Objects and References
The value of an open type variable or expression is either an internal pointer or reference3 to an object of that type or a null reference, in the same way as the value
of a variable or expression whose type is a Java class is either a reference to an
object of that class or the special value null. As a consequence, copying an open
type value simply means to copy the reference instead of the referenced object, while
comparing two values simply means to compare their references.
In contrast to Java, however, where null constitutes a generic value compatible
with all reference types, there is a separate null value for each open type that can be
obtained by calling its parameterless constructor. To check whether a given value is
null, an implicit conversion to a Boolean value is provided that returns true if the
value actually references an object and false if it constitutes a null reference (cf.
Sec. 3).
Constructors and Mutators
To create, initialize, and modify objects of an open type T, the following constructors and mutators are provided.
As already mentioned, the parameterless constructor T(), which might either be
called explicitly or is called implicitly for variables of type T which are not initialized
explicitly [33], returns the null reference of type T, i. e., a reference to no object. In
contrast, the attribute-initialization constructor T(@attr, val) creates a distinct new object of type T, i. e., an object that is different from null and any other object,
initializes its attribute attr with value val, and returns a reference to it. Since open
types are designed to be statically type-safe, attr must have been declared as an
attribute of type T, and val must be assignment-compatible with its target type.
Similarly, the attribute mutator obj(@attr, val) modifies the object (referenced
by) obj by setting the value of its attribute attr to val (if attr is a singlevalued
attribute) or by appending val to its values of attribute attr (if attr is
a multi-valued attribute). Expressed differently, obj(@attr, val) always adds val to obj’s values of attr, after discarding any previous value if attr is single-valued.4
Again, to ensure static type safety, attr must have been declared as an attribute
of obj’s (static) type, and val must be assignment-compatible with its target type.
As a result of the operation, the object (reference) obj is returned, which allows
straightforward combinations of a constructor call with one or more mutator calls
to create an object with multiple initial attribute values, e. g.:

Here, the constructor call Person(@name, "Hoare") creates a new Person object,
initializes its attribute name with the string "Hoare", and returns (a reference to)
the object. This object (reference) is directly used in the mutator call ...(@gnames,
"Charles") which initializes its attribute gnames with "Charles" and returns the
same object (reference). This is again used in and returned by the subsequent mutator
calls ...(@gnames, "Anthony") and ...(@gnames, "Richard") which in turn
add the strings "Anthony" and "Richard" to the values of attribute gnames. Finally,
the object (reference) returned by the last mutator call is assigned to the Personvariable p.
Of course, it is also possible to perform constructor and mutator calls separately,
e. g.:

To create an empty object, i. e., a distinct object which is different from null and
any other object, but does not possess any attribute values yet, the empty constructor T(@) can be used, which is called with a pseudo-argument @ to distinguish it from
the parameterless constructor T(). Therefore, a call T(@attr, val) to the attributeinitialization
constructor is actually just a shorthand for T(@)(@attr, val), i. e., a
call to the empty constructor followed by an appropriate mutator call.
In addition to these predefined constructors of open types, it is possible to define
arbitrary user-defined constructors, e. g.:

In contrast to normal C++ constructors (and constructors in other object-oriented
programming languages), which must be defined (or at least declared) inside their
class and must not explicitly return anything, but rather initialize the implicitly
available object this, user-defined constructors of open types are much like ordinary
(global) functions whose result type and name coincide (and therefore only one
of them is specified in their definition). In particular, there is no implicitly available
object this, and the constructor must explicitly (create and) return an object, typically
by calling one of the predefined constructors. Furthermore, just like attributes, constructors can be defined incrementally on demand.
Attribute Inspections
To inspect the attribute values of a given object, the attribute inspection operator @ can be used, quite similar to the way the dot operator is used to access class members
in C++ and other languages, e. g., p@name or p@gnames.5
For a single-valued attribute such as name, its current value is returned, i. e., the
value that has been set for this attribute by the most recent mutator or attributeinitialization
constructor call for this object. If none of these operations has been
performed for the object yet, i. e., the attribute does not possess any value, nothing is returned conceptually, i. e., a null value of the attribute’s target type (i. e., string
in the current example) that is obtained by calling its parameterless constructor (cf.
Sec. 3).6
If a multi-valued attribute such as gnames is inspected with the @ operator,
the values added to this attribute by all mutator (and attribute-initialization constructor)
calls performed for this object so far are returned as a (possibly empty) ordered sequence. Even though it is possible to grasp such a sequence as a whole, it
is typically processed element by element using a tailored iteration statement, e. g.:

This prints p’s given names in the order in which they have been added, i. e., Charles
Anthony Richard. Alternatively, it is possible to directly inspect individual values
of such a sequence by applying the standard index operator, e. g., p@gnames[2], to
obtain the second given name of p, i. e., "Anthony". Similarly to inspecting the value
of a non-existent single-valued attribute, inspecting a non-existent value of a multivalued
attribute by using an out-of-range index yields nothing, i. e., a null value
that is obtained in the same way as described above. Therefore, expressions such as p@gnames[0] or p@gnames[4] will return a null string in the current example.
Similar to an attribute mutator operation, an attribute inspection obj@attr is
type-correct, if attr has been declared as an attribute of obj’s static type, and the result type of the operation is equal to the attribute’s target type (for single-valued
attributes) or a sequence of it (for multi-valued attributes), where a sequence is
defined as a type-safe container similar to an array. In either case, the attribute
inspection operator returns an R-value [33], i. e., a value which must not occur on
the left hand side of an assignment operator. Therefore, attribute update operations
must only be performed by mutator calls, not directly by assignments such as:

If such an assignment would be allowed, the attribute inspection operator could
not simply return a null value for a non-existent attribute, but would be forced to
implicitly create the attribute in that case, i. e., to allocate space for it, in order to
be able to return an L-value referring to it. This would lead to many unnecessary
attribute creations in practice.
Inspecting and Modifying Null Objects
Trying to inspect or modify the "object" referenced by a null pointer is illegal in
C++ and many other languages and usually leads to a run time error such as a SIGSEGV (segmentation violation signal) or a NullPointerException. In contrast,
inspecting and modifying attributes of open types is well-defined even for null objects/
references: While inspecting such an attribute is equivalent to inspecting a
non-existent attribute, i. e., returns null, modifying such an attribute simply has no
effect. The main reason for these unusual definitions is convenience, since they allow
to omit many otherwise necessary checks. To test, for example, whether p’s name
is "Hoare", one can simply write if (p@name == "Hoare") – instead of if (p &&
p@name == "Hoare") – even if p might be null; if it actually is, p@name is null,
too, and therefore, as expected, different from "Hoare". Similarly, testing whether
the third given name of p is "Richard", one simply writes if (p@gnames[3] =="Richard"), without needing to check explicitly whether p is not null, whether it
has given names at all, and whether it has a third given name. Simultaneously, programs
tend to become more robust since inadvertently omitted checks will not lead
to run time errors, but usually merely to unsatisfied conditions.
Similarly, the fact that mutator calls on null objects are silently ignored, frequently
reduces the need to explicitly distinguish between real and null objects, and
again, inadvertently omitting such distinctions does not lead to run time errors. Reference
[15] contains a more detailed discussion of the pros and cons of this unusual
definition.
Garbage Collection and Object Deletion
In contrast to normal C++ objects, which must be explicitly deleted by the programmer
to reclaim their storage, objects of open types are automatically garbagecollected when they have become unreachable, quite similar to objects of classes in
Java, Eiffel, Smalltalk, and many other programming languages.
In addition to and independently from this automatic storage reclamation, it
is also possible to explicitly delete objects – even while they are still referenced.
Although this might appear strange at first sight, there are reasonable practical
examples where this is useful. To give an extreme one, if a person has died, it simply
does not exist anymore, no matter whether there are still "references" to it. Similarly,
if a car has been scrapped, it does not exist anymore, even though its (previous)
owner might still reference it. In Sec. 6.7, another important application of explicit
object deletion will be presented.
In contrast to C++, however, where the deletion of an object might lead to
dangerous dangling pointers, deletion of an open type object causes all remaining
references to the object to become null immediately and automatically. By that
means, it is always possible to reliably detect that an object has been deleted.
Furthermore, since null objects can be safely inspected and modified, too, neither
run time errors nor undefined behaviour will occur if deleted objects are used without
care. Since object deletions might be performed unexpectedly, this is another strong
argument for the unusual definitions given in Sec. 4.5.
Enumerations and Variant Records
Open types can be used in a straightforward way to define extensible enumeration
types by defining the enumeration values as constants initialized with (unique)
empty objects, e. g.:7

Furthermore, the fact that attributes of an open type are actually optional, i. e.,
individual objects need not possess values for all attributes, can be exploited to
model variant records without requiring additional language constructs, e. g.:

When used properly, objects of type Expr either contain a value val or an operator op plus either a body body or two operands left and right. This is reflected by
the following user-defined constructors:

Interestingly, Expr objects do not need a separate "tag field" encoding the particular
kind of expression: If an object x possesses a value val, i. e., if x@val is not null, it
is considered a constant expression; otherwise, if it possesses a body, i. e., if x@body is not null, it is considered a unary expression and x@op is expected to contain
its operator; finally, if an object x possesses operands x@left and/or x@right, it is
treated as a binary expression with operator x@op. It should be noted, however, that
it is also possible to model data structures in a more object-oriented way, resembling
type hierarchies with sub- and supertypes, as described in Sec. 6.3.
Since the underlying implementation will not consume space for unused attributes
of an object (cf. Sec.10.1), using attributes "generously" is not a problem.
In particular, it is not necessary to apply tricks such as reusing, e. g., left or right to store the body of a unary expression or to encode the operator of a compound
expression in the attribute val in order to save space.
5 BIDIRECTIONAL RELATIONSHIPS
Basically, a bidirectional relationship between two open types is also a kind of mapping
from one type to the other, with the additional possibility to directly access the inverse mapping. Since both of these mappings might be either single- or multiplevalued,
there are four different kinds of relationships altogether, one to one, one to
many, many to one, and many to many, expressed by corresponding bidirectional
arrow symbols <->, <->>, <<->, and <<->>, respectively. Furthermore, there are two
special kinds, i. e., symmetric one-to-one and many-to-many relationships, where
both mappings coincide.
Relationship Definitions
To continue the example depicted in Fig. 1, the one-to-many relationship between Person and Car called cars resp. owner can be defined as follows:

Reading this from left to right (and omitting the attribute name on the left side)
yields a multi-valued attribute cars of type Person, while reading from right to left
(and omitting the attribute name on the right side) yields a single-valued attribute owner of type Car representing the inverse mapping:

However, only by combining both attribute definitions into a single relationship
definition as shown initially, they are actually treated as mutually inverse mappings,
which means that a call to one of the mutators automatically implies a corresponding
call to the other mutator with reversed roles.
For example, a mutator call such as p(@cars, c), adding c to p’s sequence
of cars, implies the call c(@owner, p), assigning p as c’s owner, and vice versa.
Furthermore, if c already possesses another owner q when either such call is made, c is first removed from the sequence of q’s cars.
According to the rules for unidirectional attributes (cf. Sec. 4.5), the mutator
calls p(@cars, c) and c(@owner, p) exhibit the following behaviour if p and/or c is null:
- If p is null, but c is not, p(@cars, c) has no effect per se, while c(@owner, p) sets the owner of c to null, i. e., removes the current owner q of c, if any,
which implies the removal of c from q’s sequence of cars. However, since either
mutator call implies a call to the other, even p(@cars, c) will cause c to
become owner-less.
- If c is null, but p is not, c(@owner, p) has no effect per se, while p(@cars, c) adds a null car to p’s sequence of cars. Again, since either mutator call implies
a call to the other, even c(@owner, p) will cause p to receive a null car.
- If both p and c are null, both p(@cars, c) and c(@owner, p) have no effect.
Similar rules exist for the other kinds of relationships, where a many-to-one relationship
is equivalent to a one-to-many relationship with reversed roles. Even though
the precise specification of these rules is somewhat lengthy in order to cover all
possible combinations of null and non-null values, their intuitive meaning is simply
expressed by the fact that the two mappings defined by a relationship are inverse to
each other. To preserve this property, it is sometimes necessary to modify additional
objects which are not directly mentioned in an update operation. For example, if a
one-to-one relationship between two objects is established, up to two other objects
might be involved in the operation which are currently associated with the objects
in question.
Symmetric Relationships
Symmetric relationships such as spouse (one to one) and friends (many to many)
are almost completely equivalent to ordinary relationships whose types and names
are the same for both directions. Therefore, the right hand side of the definition is
simply omitted, e. g.:

According to the normal rules for relationships, a mutator call such as p(@spouse, q) or p(@friends, q), declaring q as p’s spouse resp. friend, implies the inverse call q(@spouse, p) resp. q(@friends, p), declaring p as q’s spouse resp. friend, and
vice versa. The only exception to this rule is that the inverse call is suppressed if p and q refer to the same object, i. e., in that case p is declared only once as its own
spouse (which is semantically strange, of course) resp. friend.
6 ANONYMOUS AND AUTOMATIC ATTRIBUTES AND RELATIONSHIPS
Anonymous Attributes and Relationships
Sometimes, the most appropriate name for an attribute is simply the name of its
target type. Given, for example, an open type Address, an attribute of Person
representing a person’s address might simply be called Address, too, instead of
inventing an artificial different name such as address. Viewed differently, such an
attribute does not possess an explicit name (cf. the unnamed arrow between Person and Address in Fig. 1), but rather implicitly receives the name of its target type.
Therefore, such attributes are called anonymous, and their declaration does not
introduce a new name, e. g.:

In order to inspect and modify the values of an anonymous attribute, the name of
its target type is used, e. g.:
Similarly, if one or both names of a relationship definition are omitted, the corresponding
type names can be used instead, e. g.:

Automatic Attributes and Relationships
If the arrow in an attribute declaration is followed by an exclamation mark, the
attribute might be applied automatically on demand to perform an implicit type conversion from its source type (left of the arrow) to its target type (right of the
arrow), e. g.:

This declares an integer attribute ssno for type Person, which can be used just
like any other attribute, with the additional property that an expression of type Person is implicitly convertible to an integer value by automatically applying this
attribute.
Similarly, it is possible to declare automatic relationships by adding an exclamation
mark before or after the bidirectional arrow, depending on which direction of
the relationship should be automatically applicable.
Modeling Type Hierarchies
Automatic one-to-one relationships can be exploited to model object-oriented type
hierarchies with subtype polymorphism, without requiring any additional mechanisms.
For example, a new type Student might be defined as a "subtype" of Person
by declaring a one-to-one relationship between these types that is automatically
applicable from the derived type to the base type:

Typically, but not necessarily, such "subtype" relationships are anonymous. In the
graphical data model (Fig. 1), the automatic direction of a relationship is indicated
by an open instead of a filled arrow head.
If another regular attribute number denoting a student’s matriculation number
as well as a user-defined constructor accepting a person "part" and a matriculation
number is defined, a student named Peter Clark with matriculation number 777
can be created and used as follows (employing the user-defined Person constructor
introduced in Sec. 4.3):

Because the relationship between Student and Person is applied automatically on
demand, the subexpression s@name is actually equivalent to s@Person@name. Furthermore,
if a function expects a Person argument p, it can be called with a Student object s, too, which will be implicitly converted to its associated Person object s@Person, i. e., Student objects can be used polymorphically as Person objects.
If a student object is accidentally created without an associated person object, an
implicit conversion to Person – and, consequently, accessing any person attributes –
would yield null. In practice, this can be avoided by always calling the above userdefined
constructor instead of an attribute-initialization constructor and mutators
to create a student. By declaring attributes in a private or protected section of a
module (cf. Sec. 8.2), it is even possible to completely prevent such undesired calls
outside the module. Of course, it is also possible to define another Student constructor accepting a
given name and last name instead of a pre-built person object in order to hide the
explicit creation of this "subobject:"

By using this constructor, the fact that a student object actually consists of two
associated subobjects (cf. Fig. 2) remains completely hidden. However, for reasons
explained below (Sec. 6.4), it is always recommended to provide a constructor accepting
a pre-built object of its base type, too.

Figure 2: Student object s with associated person subobject p
The fact that the relationship between Student and Person is bidirectional can
be exploited to check whether a given Person object "is" actually a student, i. e.,
to perform a dynamic type test, and to access its student attributes if appropriate,
i. e., to perform a downcast (cf. Fig. 2):

This corresponds roughly to a dynamic_cast in C++ which returns a valid pointer
to an object of a derived class if the cast has been successful and a null pointer
otherwise.
By employing automatic relationships to model object-oriented type hierarchies,
the traditionally distinct or even conflicting concepts of aggregation (expressed by
normal attributes and relationships) and inheritance (expressed by automatic attributes
and relationships) have been merged into a single coherent concept. Furthermore,
the fact that relationships can be defined incrementally, allows "supertypes"
of a type to be declared later on, e. g.:
Despite its practical usefulness, such a possibility is missing in most statically typed
object-oriented programming languages.
Multiple Inheritance
Of course, it is possible to use automatic relationships to model type hierarchies with
multiple inheritance, too. For example, one might define a type EmployedStudent that is derived from both Student and another type Employee:

Since both of these types in turn "inherit" from Person, the typical question arises
whether an EmployedStudent object should possess one or two Person subobjects,
i. e., whether Person is, in C++ terminology, a virtual base type or not. In C++,
the corresponding decision must be taken when the types Student and Employee are defined, even though it does not make any difference for these types. Therefore,it would be much more logical to answer the question when EmployedStudent is
defined, because only for this type (and possible subtypes of it) the distinction is
relevant. However, the concept of automatic relationships does not provide a way to
specify the difference at the level of declarations: The four <->! relationships between
the types Person, Student, Employee, and EmployedStudent merely specify that
there are two ways to convert an EmployedStudent to a Person, either via Student or via Employee, but they do not specify whether these ways lead to the same
destination, i. e., to the same Person object, or not. Even though this appears to
be disadvantageous at first sight, it will turn out to be the most flexible approachbr>
possible.
To actually distinguish between virtual and non-virtual inheritance, one simply
creates either one or two Person "subobjects" when creating an EmployedStudent object in a constructor, e. g.:

Here, a single Person object p is created (by calling the user-defined Person constructor
defined in Sec. 4.3), that is passed to both the Student and Employee constructors to create objects s and e, respectively, which share the subobject p.
Afterwards, an EmployedStudent object es with subobjects s and e is created and
returned (cf. Fig. 3). Therefore, converting an EmployedStudent object created by
this constructor to type Person always yields the same Person subobject, no matter
whether the conversion is done via Student or via Employee. However, since
the compiler cannot know how a particular EmployedStudent object has been created,
its conversion to Person is nevertheless syntactically ambiguous, and Sec. 6.5
discusses possibilities to resolve this.
The above example demonstrates the typical structure of a user-defined constructor,
which calls (typically user-defined) constructors of its base types to construct
the necessary subobjects and finally calls the predefined attribute-initialization constructor
(plus mutators) of its own type to assemble the complete object.
Even though this parallels the structure of class constructors in C++ (and other
languages), where a constructor of a derived type is forced to (explicitly or implicitly)
call constructors of its base types, it is actually more flexible since the rules for calling
these constructors – especially when virtual base classes are involved – are not hardwired
in the language specification, but can be freely defined by the programmer. In particular, it is easily possible to combine virtual and non-virtual inheritance to model types such as F shown in Fig. 4 which inherits from four base types B,
C, D, and E, which in turn inherit from two partially shared "incarnations" of A.

Figure 3: Employed student with shared person subobject
Modelling such a structure in C++ would require artificial intermediate classes A1
and A2 derived non-virtually from A acting as virtual base classes of B and C resp. D and E.

Figure 4: Combination of virtual and non-virtual inheritance
Apart from such rather esoteric examples, the approach to distinguish virtual
from non-virtual inheritance in constructor implementations instead of in declarations appears to be more direct and clear than the C++ approach where the constructor
of a virtual base class is only called from the constructor of the most derived class [33], even though constructors of intermediate classes may contain calls to it,
too, which are simply ignored.
On the other hand, it might be argued that "hiding" the precise semantics of an
inheritance hierarchy in constructor implementations does not contribute to comprehensibility
either. Therefore, it is necessary to clearly document the intended semantics,
e. g., in comments if there should be any possibility of uncertainty. Furthermore,
the way of resolving the syntactically ambiguous conversion from EmployedStudent to Person that will be described in Sec. 6.5 will also help to make the intended
semantics clear, even though it is still not strictly unambiguous.
On the other hand, by postponing the decision between virtual and non-virtual
inheritance to constructor implementations, it would in fact be possible to define
different constructors of the same type with different semantics. For example, one
might define another constructor of EmployedStudent that creates a "schizo" whose
first personality is a student (with given name g1, name n1, and matriculation
number m) while its second personality is an employee (with given name g2, name n2,
and company c), even though this is not expected to be very useful in practice (cf.
Fig. 5):


Figure 5: Employed student with two person subobjects The example constructors of EmployedStudent shown in this subsection have
demonstrated the usefulness of having constructors of base types (Student and Employee in these examples) which in turn accept pre-built objects of their base
type(s) (here Person) as arguments. If, for example, the constructors Student (Person, integer) and Employee (Person, string) would not have been provided,
and the predefined attribute-initialization constructors of Student and Employee can or should not be used due to information hiding concerns (cf. Sec. 8.2),
it would be actually impossible to implement the constructor EmployedStudent (string, string, integer, string) shown earlier. This is because the other
constructors of Student and Employee each create a separate Person object, so
using them to create an EmployedStudent would actually create a "schizo," i. e.,
the "schizo constructor" shown above could also be implemented as follows:

Consequently, since EmployedStudent might also serve as a base type for other
types, it is advisable to provide an additional constructor for it that accepts prebuilt
objects of its (direct) base types, too:

Conflict Resolution
The fact that automatic attributes and relationships are actually just attributes and
relationships implies that, e. g., a Student object does not really inherit anything
from its associated Person object, but rather simply refers to the latter, while the
apparent "inheritance" simply results from the automatic application of the corresponding
attribute. By that means, the full advantages of object-oriented subtype
polymorphism – including multiple and repeated inheritance – are available, while
at the same time avoiding many problems usually associated with these mechanisms
in other languages.
In particular, name conflicts between multiply "inherited" attributes are simply
resolved by qualifying attribute names with the name of their original type, e. g.:

While the inspection of such attributes with the @ operator is straightforward, since es actually possesses (anonymous) attributes named Student and Employee and
the objects es@Student and es@Employee both possess (different) attributes named number, a mutator call such as es(@Student@number, ...) is actually interpreted
as es@Student(@number, ...). The only difference is that the former returns the original object es while the latter would return the intermediate object es@Student.
If the type EmployedStudent itself would possess another attribute number, this
could be directly accessed with its unqualified name, e. g., es@number, since in that
case no implicit conversion of the object es is necessary at all.
Another potential source of conflicts are repeatedly "inherited" types. Rephrased
in open types terminology, such a conflict arises when the graph induced by automatic
attribute declarations contains multiple paths from a source type (such as EmployedStudent) to a target type (such as Person). In such a case, trying to use
an EmployedStudent object es as a Person object is ambiguous since it could mean
either es@Student@Person or es@Employee@Person. As the (of course rather artificial)
example of a schizophrenic employed student demonstrated, both interpretations
might actually lead to different results in practice, even though under "normal"
circumstances they are expected to yield the same person object. While in C++,
these "normal" circumstances can be expressed by declaring Person as a virtual
base type of Student and Employee, this information has not been made explicit
with open types yet. Therefore, trying to assign es to a Person variable or trying
to directly access a person attribute such as es@name actually leads to a compile
time error due to ambiguity, and one has to write, e. g., es@Student@Person@name,
or just es@Student@name since the intermediate student object es@Student is automatically
convertible to its associated person object es@Student@Person.
To avoid this significant inconvenience and to explicitly declare that an Employed-Student shall possess only one Person subobject, an additional direct relationship
between these types can be declared:

At first sight, this does not seem to solve the problem, but rather to aggravate
it, since it introduces a third path from EmployedStudent to Person. However, a direct conversion path between two types is generally preferred over any indirect
path. Therefore, using an employed student object es where a person object is expected
is now unambiguously interpreted as es@Person, not as es@Student@Person or es@Employee@Person. In particular, person attributes such as name might now be
used directly, e. g., es@name (meaning es@Person@name) or es(@name, ...) (meaning
es(@Person@name, ...)).
The final step to make this actually work as expected, however, is to change all EmployedStudent constructors to explicitly initialize the new Person attribute, e. g.
(cf. Fig. 6):



Figure 6: Employed student with direct relationship to person
Since now all three paths from an EmployedStudent object to Person lead to
the same person subobject, this object might also be accessed indirectly via the Student or Employee subobjects.
From a C++ point of view, declaring the direct automatic relationship between EmployedStudent and Person (and appropriately implementing the Employed-Student constructors) achieves the same effect as declaring Person as a virtual
base class of Student and Employee. However, declaring this in the context of EmployedStudent is much more logical, since for Student and Employee themselves
the distinction between virtual and non-virtual inheritance from Person is
completely irrelevant.
On the other hand, the bare declaration of this relationship does not automatically
force programmers to appropriately implement the necessary constructors: In
principle, it would still be possible to implement a "schizo constructor" as shown earlier
which may not even initialize the direct relationship between EmployedStudent and Person – or initialize it arbitrarily to one of the schizo’s personalities or even
to a third personality! Since the programmer has complete freedom, he also has
increased possibilities to make nonsense, of course. Repeated Inheritance
By using one-to-many relationships, it is also possible to model repeated inheritance,
e. g., a student with multiple enrollments:


Dynamic Object Evolution
The fact that an object of a derived type such as Student or EmployedStudent is
actually a network of interconnected subobjects – even though this remains invisible
except when constructing the objects –, can be exploited in a straightforward manner
to implement dynamic object evolution in a completely type-safe manner. For
example, it is almost trivial to transform an object that has been initially created
as a bare person into a student, an employee, or even an employed student later, by
simply creating additional associated subobjects, e. g.:

Instead of directly using the mutator call p(@Student, ...) to transform p to a
student, one might also pass p to the Student constructor accepting a pre-built Person object (and ignore its result):

This is another strong argument for always providing these kinds of constructors.
Conversely, it is also possible to delete subobjects to transform a specialized object to a more general one, e. g.:

Here, it does not matter whether p has been originally created as a Person or a Student (or something else).
By explicitly deleting the student subobject associated with p, all "student references"
to this person (such as s in the example) automatically become null. Otherwise,
if only the association between p and its student object would have been cut,
these references would remain valid, but refer to a degenerate student object that
does not possess an associated person object anymore.8
It is even possible to create "hybrid" objects, such as a person that is both a
student and an employee, even if no common "subtype" of these types (such as EmployedStudent) would exist.
7 GLOBAL VIRTUAL FUNCTIONS
Basic Concept
So far, open types, attributes, and relationships have been described as an approach
to model data structures in programming languages that is much more flexible than
procedural record-based and object-oriented class-based data models. The essential
reason for this increased flexibility is the fact that type definitions are completely
separated from the definition of their "constituents," i. e, their data fields (modeled
by attributes), associations (modeled by relationships), and base types (modeled by
automatic attributes or relationships). Actually, an open type itself is just an "empty
shell" whose "substance" is defined incrementally. (This parallels the philosophical
observation that a concept is meaningless in itself, but gains its meaning only from
its associations to other concepts.)
Based on this principle, it is logically consistent that an open type itself does not
possess any operations (or "methods") either, but that operations on types are defined
incrementally, too. This goes in line with traditional procedural programming
languages where operations are defined as free-standing procedures or functions.
For example, the following global function concatenates all names of a particular
person p:

Since global functions are not bound to any type, it is always possible to add new
functions in a modular way, i. e., without needing to touch any previously written
source code. Even though this fact is almost too trivial to mention, it should be noted
that methods or member functions of classes in statically typed languages usually
do not exhibit this highly desirable property: To add a new method to a class, it
is necessary to change and recompile the class definition as well as to recompile all
code depending on it, which might be a considerable amount.
On the other hand, simple global functions such as the one shown above lack the
also highly desirable property of being overridable or redefinable, that is provided
by methods or virtual member functions. To reconcile these apparently conflicting
properties of global functions and virtual member functions and to combine their advantages
without encountering their disadvantages, global virtual functions (GVFs)
have been proposed earlier [14, 16]. Even though the basic idea is very simple, its
consequences are rather far-reaching: By declaring a global function virtual (which
is not allowed in C++), it is possible to override or redefine it later by simply giving
a new definition. (To use C++ terminology, this means that the one definition rule,
that basically says that no program entity must be defined more than once, does not
hold for global virtual functions.) However, every new definition (which is also called
a branch of the function) is able to call the previous branch on demand by using the
keyword virtual as the name of a pseudo-function referring to this branch.
For example, if the function format shown above would have been defined virtual, it could be redefined as follows:
Here, the redefinition checks whether p actually refers to a real person object before
calling the previous branch to concatenate its names. Otherwise, i. e., if p is null, a
more instructive text is returned than would be produced by the previous branch,
which obviously did not take this case into account. It should be noted that the
previous branch denoted by the keyword virtual is always called without explicit
arguments, since the original arguments of the call are implicitly passed unchanged
(even if the formal parameters would have been modified). Furthermore, if the first
branch calls its previous branch, a predefined "branch zero" with an empty body is
executed, which returns a null value of the function’s result type (cf. Sec. 3).
If a function is redefined multiple times, the order of redefinitions might be
important, even though in many practical cases it is actually irrelevant. If both the
original definition and all redefinitions appear in the same translation unit, their
textual order is decisive: A call to the function will always call the last branch,
which might call the second-last as its previous branch, which in turn might call
the third-last as its previous branch, and so on. If branches are distributed over
multiple translation units, the concept of modules introduced in Sec. 8 determines
their precise order (cf. Sec. 8.3). In any case, however, a simple linear chain of
branches is built up.
Possible Applications
This simple general mechanism, which is hardly more complex than functions in C or procedures in other procedural languages, can be used to solve a broad range of
problems for which many different specialized mechanisms are provided in objectoriented
and aspect-oriented programming languages:
- A redefinition can execute additional prologue and/or epilogue code before/
after calling the previous branch.
This can be used, for example, to implement retroactive synchronization, monitoring,
or profiling, to notify observers of an object, etc. It subsumes the
concept of before/after/around methods in CLOS [21] and similar languages
and that of before/after/around advice in aspect-oriented languages such as
AspectC++ [32].
- A redefinition can execute completely different code, without calling the previous
branch at all.
This can be used, for example, to patch erroneous functions whose source code
is not available, to provide a more efficient implementation for performancecritical
functions, etc. It is also covered by around methods and advice.
- A redefinition can check some condition on its arguments (or any other data)
and call the previous branch only if it is satisfied while signalling an error or
exception otherwise.
This can be used, for example, to retroactively check necessary preconditions
to implement the "design by contract" model, to implement protection, etc.
- A redefinition can check some condition on its arguments (or any other data)
and execute more specialized code if it is satisfied, while delegating the call to
the previous branch otherwise.
This can be used, for example, to treat unanticipated special cases differently,
as in the example of format given above. Furthermore, it can be used to
simulate the well-known dynamic dispatch found in object-oriented languages
by checking whether the function’s first argument is actually an instance of a
subtype of its static type and executing specialized code in that case, while
calling the previous, more general branch otherwise.Since the condition might also check the dynamic types of multiple arguments – or any other predicate over them –, it is equally easy to simulate multiple and even predicate-based dispatch [9].
Since the last possibility – check a particular condition and execute the previous
branch if it is not satisfied – occurs so frequently in practice, some syntactic sugar
is provided to simplify its usage: By specifying a guard consisting of the keyword if and a subsequent condition in the function’s head – after the parameter list and
an optional throw clause, but before the function’s body –, this branch is executed
only if the condition is satisfied, while otherwise the call is delegated to the previous
branch. For example, the redefinition of format shown above might be written more
compactly as follows:

Equivalence of Attributes and Global Virtual Functions
Even though attributes of open types and virtual functions9 have been presented as
separate concepts so far, they are actually interrelated as follows.
A single-valued attribute such as

is actually equivalent to a pair of virtual functions, one for getting and one for setting
the attribute’s value for a particular object:

These functions are called automatically, whenever an attribute inspection x@name or
an attribute mutator x(@name, y) is executed, respectively. While the first branch
of these functions is predefined and cannot be changed directly, it is possible to
define additional branches for them which modify or extend the original behaviour,
for example:

Now, attribute accesses x@name and x(@name, y) will call these redefinitions of the
get resp. set function, which in turn call the original implementations via virtual.
Similarly, a multi-valued attribute is actually equivalent to three virtual functions,
one for getting an object’s sequence of attribute values, one for appending or
inserting a value (in)to it, and one for removing a value from it. Finally, a bidirectional
relationship is equivalent to a pair of attributes, i. e., to four (for 1:1 relationships),
five (1:N and N:1), or six (N:N) virtual functions: a get function for each
direction, plus one or two update functions (i. e., a set function or an insert and a
remove function) for each direction. Since both directions of the relationship shall
be kept consistent automatically, each of the update functions calls the appropriate
function of the other direction with reversed arguments after it has performed
its own update operation. To prevent infinite mutual recursion, however, the other
function is only called if the current function has not itself been called from it.
The fact that all these functions defining attributes and relationships can be redefined,
significantly increases the flexibility of these concepts once more and opens the
door to many advanced applications such as implementations of the Observer Pattern [10] without any preparation or cooperation of the observed objects, transparent
implementation of persistent attributes, implementation of derived attributes, etc.
Due to the equivalence of attributes and virtual functions, the attribute inspection
operator @ as well as attribute initialization constructors and mutators are actually
just syntactic sugar supporting a more object-oriented instead of a functional
or procedural interface to open type objects.
Virtual Constructors
Since user-defined constructors of open types are actually global functions (cf.
Sec. 4.3), it is possible to define them virtual, too, which also increases their flexibility.
Furthermore, the predefined empty constructor of an open type is actually a
virtual constructor, which can be redefined on demand, e. g.:10

Since the empty constructor is the only way to actually create a new object, which
is called directly by each attribute-initialization constructor and either directly or
indirectly by each user-defined constructor, a redefinition of the former is a reliable
way to intercept any object creation and to associate extra code with it.
8 MODULES
Motivation
If types, their attributes and relationships, and the functions operating on them
are all global entities, it is hard if not impossible to write modular software and to
enforce the principle of information hiding [31].
To achieve some basic structuring of a large system into more comprehensible
units and to reduce the danger of accidental name clashes between globally defined
entities, C++ provides the concept of namespaces [33]. If this is combined with the
possibility provided by classes to divide them into public, protected, and private
sections, a simple module concept similar to that of Oberon [37] is achieved. In
particular, information hiding becomes possible if the public part of a namespace –
which is also called a module now – contains only definitions of types, attributes,
and functions which should be visible to other modules (i. e., the externally visible interface of the module), while internal entities are defined in the private part. It
is even possible to reduce functions specified in the public part to bare declarations without a body and to give the full definitions only in the private part.
Protected Definitions
While all entities declared in a private section of a module are completely invisible
to other modules, those defined in a public section are fully accessible outside the
module. In particular, a public open type is provided together with its predefined
constructors (cf. Sec. 4.3), a public attribute or relationship can be both inspected
and updated from within other modules, and a public virtual function can be both
called and redefined elsewhere.
However, when taking the information hiding principle seriously, it should be
possible to separate these issues, i. e., it should be possible to provide open types
without their empty constructors (to force clients to use other constructors since completely empty objects might be semantically meaningless), to provide attributes
and relationships read-only (since arbitrary updates might violate object invariants),
and to provide virtual functions which can only be redefined, but not called from
other modules (which might appear strange at first sight, but will be explained
below). To achieve all these aims, the corresponding entities can be defined in a protected section of a module.
If an open type is provided that way, only its parameterless constructor returning
a null object/reference as well as its attribute-initialization constructor and mutator
can be used outside the defining module. The latter are needed to allow other
modules to define attributes for the type and to initialize resp. modify their values.
However, to enforce the information hiding principle, they can only be called with
attributes whose update function(s) are accessible (cf. below).
If an attribute or relationships is provided in a protected section, only its get function(s) can be used outside the defining module, i. e., it can only be inspected using
the @ operator, but neither initialized nor modified using an attribute-initialization
constructor or mutator.
Finally, if a virtual function is provided in a protected section, it can only be
redefined, but not called from outside the defining module. This can be used, for example,
to provide "hooks" to internal functions of a module at which other modules
can "hang" on extensions (by redefining the functions in a way that always calls the
previous branch). The other combination of "access rights" to a GVF, i. e., that it
can only be called, but not redefined outside its defining module, can be achieved in
principle by declaring it as an ordinary, non-virtual function in the module’s public
part and implementing it as a virtual function in the private part. However, doing so
actually leads the basic idea of GVFs ad absurdum: If functions cannot be redefined
outside their defining module, it would be necessary to modify and recompile this
module whenever (anticipated or unanticipated) changes or extensions to one of its
functions become necessary. Furthermore, inside a single module the possibility to
redefine a function is only of limited usefulness.
Inter-Module Dependencies
In addition to information hiding concerns, another important purpose of modules is
to define a precise initialization order for programs consisting of multiple translation
units.
In standard C++, the order in which global variables of different translation
units are initialized at program startup is completely undefined. In particular, if a
function defined in some translation unit uses the value of a global variable defined
and initialized in the same translation unit, it is not guaranteed that the variable
is properly initialized before the function gets called. Even though it is possible to
circumvent this special problem with a trick (using static variables in functions which are guaranteed to get initialized before they are used), this situation is generally
unsatisfactory. Furthermore, if different branches of the same virtual function are distributed over multiple translation units, their activation order is important to
precisely specify which branch is (currently) the last branch of the function (cf.
Sec. 7.1).
To meet these requirements, a module might be specified to depend on one or
more other modules by listing the names of these "base modules" in a way similar
to the base classes of a class, e. g.:

Here, module D depends on modules C and B which both in turn depend on module A.
Therefore, at program startup time, C and B will be initialized – in this order –
before D, while A will be initialized (once) before B and C, i. e., the overall initialization
order will be A, C, B, D. This implies in turn that the branches of the function A::f11 will be activated in the order indicated by the comments above.
Formally, the module initialization order of a program is obtained by traversing
the directed acyclic graph consisting of modules (nodes) and inter-module dependencies
(edges) in a depth-first, left-to-right manner (where left-to-right corresponds
to the textual order in which base modules of a module are specified), starting at
a designated main module and visiting each module exactly once (i. e., ignoring
already visited ones).
Initializing a single module means to initialize the global variables of the module
by executing their initializer expressions and/or calling appropriate constructors
and to activate the branches of virtual functions defined in the module, where the
order of these steps is given by the textual order of the corresponding definitions.
In particular, variable initializations and branch activations will be performed interleaved
to guarantee on the one hand that all variables defined before a particular
branch are initialized prior to any execution of this branch and on the other hand
that all branches defined before an initializer expression are activated when this
expression is evaluated. Together with the general C++ rule of "declare before use" this guarantees that all variables are properly initialized before their first use.
If, for example, a module defines the following variables and virtual function
branches:

these are initialized resp. activated as follows: First, variable x is initialized to 1.
Then, the first branch of the virtual function f is activated causing it to become the
currently last branch of this function. Next, variable y is initialized by calling this
function, i. e., by executing exactly this branch, resulting in the value 2. Then, the
second branch of f is activated causing it to become the last branch of the function
now. Finally, variable z is initialized by calling this function, too, i. e., by executing
this new branch, resulting in the value 3. The important thing to notice in this
example is that the call to f in the initialization of y does not call the last branch of
the function defined in the whole module – which depends on the value of exactly
this variable –, but rather the last branch activated so far.
To complete these details of module initializations, it should be noted that
there is also a deactivation order of branches which is interleaved with the "deinitialization"
of global variables, i. e., the execution of their destructors. The latter
are executed in exactly the reverse order of the corresponding constructor calls.
Deactivating a branch of a virtual function means to remove it from the linear
chain of branches of this function, causing its previous branch to become the currently
last branch. This is again important if a branch depends on the values of
global variables defined earlier: As soon as a destructor has been executed for such
a variable, its value is undefined, and therefore the branches defined after it should
no longer get executed.
Type-Safe Separate Compilation
Besides promoting information hiding and establishing a precise global initialization
order, modules support the decomposition of a large program into smaller parts
which can be compiled separately without sacrificing static type safety.
If some module B wants to use an entity x defined in another module A, the
latter must be specified as a base module of the former. Afterwards, according to
the normal rules for name lookup in C++, the entity might be accessed either via
its qualified name A::x or via the unqualified name x if it has been "imported" by
a declaration using A::x; To simplify the import of many entities from the same
module, an extended using declaration such as using A { x, y, z }; is provided.
Since each module constitutes a separate C++ namespace, name clashes between
modules are excluded by definition. In particular, if both A and B define an opentype T, no matter whether it is public, protected, or private, this yields two completely
different types A::T and B::T. If both shall be used in a third module C,
at most one of them must be imported by a using declaration to avoid ambiguity.
Similarly, if both A and B define an attribute a for the same type U, this yields
unrelated attributes A::a and B::a, which can be used simultaneously if necessary,
e. g., as U(@A::a, ...)(@B::a, ...). Therefore, it is in fact possible to compile
and completely type-check modules separately, without any danger of encountering
type errors or name clashes at link or run time.
This is in sharp contrast to, e. g., AspectJ [22], where multiple extensions of the
same class might in fact conflict with each other, leading to errors at weave time,
even if all aspects have been compiled successfully in isolation.
Of course, to type-check the usage of an entity defined in a different module,
the (pre)compiler must consult its definition in the public or protected part of this
module. This implies that a module must be recompiled if its own source code
and/or the non-private part of one of its base modules has been modified since
the last compilation. To release the programmer from the burden of tracking these
dependencies manually, the C+++ precompiler checks the timestamps of all base
modules and automatically performs the necessary recompilations similar to the
Unix make facility.
Dynamically Loaded Modules
Since the process of linking modules together, which have been compiled and typechecked
separately, cannot lead to any errors, it is even possible to delay this process
partially or completely until run time, i. e., to load (and unload) modules dynamically,
similar to the way Java classes are loaded dynamically. Here, loading a module
simply means to initialize it according to the rules given in Sec. 8.3, i. e., to initialize
its global variables and to activate the GVF branches defined in the module. To
make sure that everything the module depends on is actually available, its direct
and indirect base modules are automatically loaded/initialized before in the order
described in Sec. 8.3, too. In particular, modules which have already been initialized,
either at program start time or during an earlier dynamic load operation, will not
be initialized again.
Typically, but not necessarily, dynamically loaded modules do not introduce any
new (public) functions into a program, but only provide additional branches for
already existing virtual functions. Since these branches will be activated when the
module containing their definitions gets loaded, they become implicitly accessible
via the GVF they belong to. Of course, such a dynamically loaded branch of a GVF
might internally use other functions and global variables defined in its module,
which will not be directly accessible to the remaining program. Furthermore, a
dynamically loaded module might contain declarations of additional attributes for
open types defined in other modules. Again, these attributes will neither be known
to nor directly accessible by other modules of the program, but the functions and
GVF branches defined in the same module can use them.
9 THE EXPRESSION PROBLEM
To complete the conceptual part of the paper, this section presents a coherent example,
namely a solution to the well-known expression problem [35], that demonstrates
a typical application of open types, global virtual functions, and modules and their
use in concert. In particular, it illustrates their ability to perform non-invasive extensions
and modifications of an existing software system, which is particularly useful
to cope with unanticipated software evolution.
As a starting point, Fig. 7 shows a module expr providing a basic implementation
of arithmetic expressions, consisting of:
- a protected open type Expr with five protected attributes, resembling a variant
record as described in Sec. 4.7;
- three public virtual constructors of this type to construct constant, unary, and
binary expressions, respectively;
- a public virtual function eval to evaluate an expression, i. e., to compute its
value, with three branches corresponding to these different kinds of expressions.
According to Sec. 8.2, declaring Expr and its attributes protected forces client modules
to use the public user-defined constructors to create objects, and these objects
will be immutable since no functions performing attribute update operations are
provided.
Based on this module, Fig. 8 shows an operational extension that provides another
public12 virtual function print to print an expression on the standard output
stream cout. This is a typical example of a dynamically bound function that can be
added to a system in a completely modular way, i. e., without touching or recompiling
any existing source code.
The following module rem retroactively extends the function eval defined in
module expr with a branch that evaluates remainder expressions denoted by the % operator (since the constructor for binary expressions defined in module expr and
the print function defined in module print are already general enough to properly
handle such expressions, they do not need to be extended):


Figure 7: Basic implementation of arithmetic expressions

Figure 8: An operational extension
Again, this extension can be performed in a completely modular way.
Finally, the following module cache provides a redefinition of expr::eval that
calls its previous branch via the pseudo-function virtual exactly once per expression
x and caches its result value in a private attribute val (which is completely
distinct from the attribute of the same name defined in module expr, cf. Sec. 8.4):
Since this attribute is declared private, it is completely inaccessible, i. e., virtually
invisible, to any other module. This is indeed appropriate when taking the information
hiding principle seriously, since the branch of expr::eval defined in this
module is the only function that needs to know about this attribute and requires access to it.13 Since this module does not introduce any new functions, it would
even be possible to load it dynamically on demand to enable caching of expression
values at run time.
To complete the example, the following module main assembles the components
defined by the previous modules and provides a global statement block that replaces
the traditional main function of C++ programs:

10 IMPLEMENTATION
Possible approaches to implement types with null values have been described in [15],
while the implementation of global virtual functions and modules has been described
in [14, 16]. Therefore, the remainder of this section deals with the implementation of
open types and attributes only. Here, the particular challenge is to find a way to store
objects compactly, especially to avoid storage consumption for unused attributes,
without causing attribute access operations to become overly expensive, i. e., to
provide a reasonable balance between run time efficiency and storage consumption.
The possibility to load attributes of a type dynamically, even after objects of the
type have been created, further complicates these matters.
Basic Attribute Management
Figure 9 illustrates the basic approach to implement open type objects possessing
values for different subsets of attributes. Here, an open type value is actually a
pointer to an intermediate cell, which in turn references an associated data area containing the object’s attribute values (in decreasing order of their alignment restrictions
to avoid any padding in between) and a shared object descriptor describing
its contents. For each attribute of the open type (Expr in this example), an object
descriptor contains the information whether and where the corresponding attribute
value is stored in an object’s data area: If the "slot" corresponding to the attribute
contains a non-negative number, it denotes the relative address of the attribute’s
value in the data area, while a negative value or dash indicates the absence of the
attribute.
Figure 9: Basic attribute management
Based on this information, an attribute access operation proceeds as follows: A
unique ordinal number assigned to the attribute (e. g., 1 for op) is used to access the
corresponding slot of the object’s descriptor, and the offset value found there (nonnegative
number) is used to access the attribute’s value in the object’s data area. If
no offset value is found, however (negative number or dash), reading the attribute
simply returns the null value of its target type, while a write access must extend the
object’s data area in order to obtain space for the new attribute. (This might change
the address of the data area, but the address of the intermediate cell stored in open
type values remains stable.) Furthermore, the object’s descriptor pointer must be
redirected to the "successor" descriptor that contains exactly the same attributes
as the current one plus the new attribute.
Since object descriptors are created dynamically on demand, this extended de-scriptor might not yet exist. In that case, it is created and inserted into the type’s
descriptor repository. Furthermore, it is compared with all currently existing descriptors
in order to find its "successors," i. e., those descriptors containing the same
subset of attributes plus one extra attribute, as well as its "predecessors," i. e.,
those descriptors for which the new one is a successor. These predecessor/successor
relationships are remembered by replacing the dash in the predecessor’s slot corresponding
to the extra attribute with a logical reference to the successor (depicted
by an arrow in Fig. 9), i. e., with the negated value of its index in the descriptor
repository. Using this information, an already existing successor descriptor can be
found easily and efficiently via the descriptor repository.
Typically, an object has to be extended a few times during its initialization, while
afterwards the set of its attributes (but, of course, not necessarily their values) is
expected to remain rather stable. Similarly, the descriptor "network" of an open
type is expected to change a number of times during the initialization phase of a
program, until the object descriptors and their predecessor/successor relationships
required for the typical object initialization patterns of the program have been built
up. Afterwards, most object descriptors needed to perform an object extension will
be found immediately via the negated index in the corresponding slot of the object’s
current descriptor.
Since object descriptors are only created on demand, their number usually remains
rather small compared to their theoretically possible exponential number.
For example, Fig. 9 shows 7 out of 32 possible descriptors which are required by the
three user-defined Expr constructors shown in Sec. 9.
Dynamically Loaded Attributes
If an attribute of a particular type is loaded dynamically (e. g., val defined in module
cache, cf. Sec. 9), all object descriptors belonging to this type must be extended
by a new slot corresponding to this attribute. This can be achieved by iterating
through the descriptor repository and reallocating each descriptor. To make sure
that the descriptor pointers of objects remain stable in that case, an additional level
of indirection is used that is not shown in Fig. 9. Since loading new attributes is not
expected to happen very frequently, the effort for these reallocations is acceptable.
In order to reduce it even further, descriptors are always allocated with a certain
number of extra slots in advance (indicated by the dashed boxes in Fig. 9).
Empty and Null Objects, Object Deletion
Initially, the descriptor repository of each open type contains a single entry at position
zero called the empty descriptor, which does not possess any attributes, i. e.,
all its slots initially contain dashes.
An empty object is represented by an intermediate cell referencing this empty
descriptor and a unique empty data area (which will never be accessed). Similarly, a
null object is represented by an intermediate cell referencing the empty descriptor,too, but with a null data pointer. Thus, attribute inspections on empty and null
objects always return the same result (i. e., null values of the attribute’s target type),
without requiring any additional run time checks to detect null objects. Furthermore,
an object can be easily and efficiently deleted (even while it is still referenced) by
simply deallocating its data area, setting its data pointer to null, and replacing
its descriptor pointer with a pointer to the empty descriptor. Finally, since every
empty object possesses a unique empty data area, comparisons for object equality
can simply compare the object’s data pointers for identity.
Precompiler for C++
The definitions of open types, attributes, and relationships are transformed into
suitable C++ type and function definitions by a rather simple precompiler based
on the EDG C++ Front End (cf. www.edg.com). For example, an open type is
transformed to a C++ class, while an attribute is transformed to two or three
global virtual functions, as described in Sec. 7.3 (and GVFs are in turn transformed
to regular functions plus some auxiliary data structures, as described in [14, 16]).
Usage of the attribute inspection operator @ in an expression is transformed to a
call of the corresponding get function, while mutator (and attribute-initialization
constructor) calls are transformed to calls of the corresponding update function.
The implementation of anonymous and automatic attributes, in particular the way
to tell the underlying C++ compiler to automatically apply automatic attributes
transitively, requires some sophisticated tricks whose description is beyond the scope
of this paper.
Automatic Garbage Collection
To perform automatic garbage collection (GC) for open type objects, all global and
local variables of open types are treated as root pointers, whose addresses are added
to resp. removed from a global root table by the constructors resp. destructors of the
class representing the type.14 Furthermore, the values of all attributes whose target
type is an open type, too, are treated as inter-object pointers, whose location in an
object’s data area can be determined from the object’s descriptor.
Based on this information, any tracing GC algorithm can be used to reclaim the
storage of objects which have become unreachable. By overloading the assignment
operator of open types to perform appropriate extra actions in addition to copying
the pointer to the intermediate cell, it is even possible to implement an incremental
GC algorithm. In particular, an incremental Mark&Sweep algorithm based on [8]
has been implemented.
Performance Results
To compare the run time efficiency of the open type implementation (including its
garbage collector) with standard class-based object-oriented systems, a simple test
program creating, inspecting, and manipulating randomly structured object graphs
has been written in standard C++, Java, and C+++ (i. e., C++ with open types).
In C++, objects which are no longer needed, are explicitly deleted, while in C+++
and Java automatic garbage collection is employed for that purpose. By changing the
values of some configuration parameters, the relative frequencies of object creations,
inspections, and manipulations can be modified to obtain quite different application
characteristics with a single program.
The main results of comparing the overall run times of these programs with
different parameter settings can be summarized as follows:
- C+++ is between 1.9 and 2.5 times slower than C++.
Since C/C++ is usually regarded as a performance yardstick, it is not really
surprising that the current precompiler-based C+++ implementation cannot
really compete with it. On the other hand, a slowdown of this magnitude
might still be acceptable – in exchange for the significantly improved flexibility
provided by open types – for many applications where performance is not really
critical.
- C+++ is up to 2.5 times faster than Java running on the Hotspot Virtual
Machine.
When considering the fact that over many years lots of people have spent
considerable effort to improve the performance of Java implementations in
general and garbage collection techniques in particular, this is both a surprising
and a quite encouraging result.
Of course, to obtain more substantial and detailed performance data and to further
improve the C+++ implementation, more fine-grained measurements with different
benchmark and real-world programs are required. For now, however, it has been
sufficient to prove that open types (as well as global virtual functions and types
with null values) can be implemented with quite acceptable run time performance.
11 RELATED WORK
Since open types and bidirectional relationships constitute the genuine contribution
of this paper, while the accompanying issues of null values, global virtual functions,
and modules have already been published and discussed elsewhere [15, 14, 16], the
following discussion of related work is confined to the former. Extensible Data Structures
As already mentioned in Sec. 1, aspect-oriented programming languages such as AspectJ
[22] and AspectC++ [32] provide so-called inter-type member declarations
or introductions to retroactively extend existing data structure definitions without
needing to change the code of the original definitions. Nevertheless, however, some
kind of recompilation or "weaving" is required by all these approaches: While the
AspectC++ compiler needs the source code of the original definition together with
all extension code to produce a new definition that is actually compiled by a C++
compiler, the AspectJ compiler is able to perform the combination on the byte code
level. Frameworks such as JMangler [23] are even able to delay the final composition
until load time, but in either case a class remains fixed once it has been loaded. In
particular, all objects of a class possess the same storage structure and size, and
it is impossible to change this at run time. Open types, on the other hand, allow
attributes to be loaded dynamically, even for types which have already been instantiated.
Actually, aspect-oriented approaches still adhere to the traditional concept
of records as fixed data structures and only make their definition more flexible, while
open types support truly flexible objects whose storage size might vary over time.
Furthermore, as already mentioned in Sec. 8.4, extensions provided by different aspects
might conflict with each other, while attributes defined in different modules
will always be compatible.
Subject-oriented programming [13] and multi-dimensional separation of concerns [34] also allow data structures to be specified and composed very flexibly, supporting
even the merge of independently developed classes into a single new one. Again,
however, all resulting classes remain fixed while a program is being executed.
The Common Lisp Object System (CLOS) [21] and languages such as Dylan
[7] based on similar principles deviate from the approach of other object-oriented
languages that everything belonging to a class must be defined (or at least declared)
in the class, by associating methods with generic functions and allowing them to be
defined separately and incrementally. However, the set of data fields making up a
class must still be defined at once and cannot be extended later, except by redefining
the whole class. In contrast, open types apply the "generic function principle," i. e.,
the possibility to define methods separately and independently, to data fields, too.
MultiJava [6, 27] is a recent extension of Java that also addresses the problem
of retroactively extending existing classes in a strictly modular way, but again only
augmenting methods is supported, while the data fields making up a class remain
fixed.
Dynamically typed languages and systems with integrated interactive programming
environments such as Smalltalk [11] allow the user to extend an existing class
by adding new instance variables and automatically reallocate all currently existing
instances to contain space for the new variables. While the effort for these reallocations
might be acceptable if such extensions are not performed very frequently,
the open type implementation avoids it by reallocating only the descriptors of an open type instead of all its objects when a new attribute is loaded dynamically. Afterwards,
individual objects are only extended on demand, i. e., when they actually
receive a value for the new attribute. Prototype-based languages such as Self [36] offer many of the advantages of open
types over record-oriented data models, e. g., the possibility to incrementally define
attributes (called slots) and inheritance relationships (via parent slots) as well as
support for dynamic object evolution (by dynamically adding or removing slots).
Even the implementation of objects in Self based on maps [4] is similar to that of
open type objects based on descriptors. However, since object reallocations are not
expected to occur frequently, objects are addressed directly instead of through an
intermediate cell, requiring a scan through the entire heap to update all references
to an object if it has to be actually moved. Furthermore, no attempt is made to find
and possibly reuse an existing map if an object is extended by a new slot. Since a
new object is usually created by cloning an existing prototype, and afterwards it
is only extended in rare cases to set up a new "clone family," these drawbacks are
acceptable. Open type objects, however, are always created as empty objects which
are frequently extended, at least in the course of their initialization. Therefore, it is
vital to perform this operation rather efficiently.
Apart from these implementation details, the key difference between Self and
open type objects is the fact that the latter are statically type-safe, i. e., they combine
the flexibility of dynamically typed systems with the safety of statically typed
languages.
Bidirectional Relationships
Even though bidirectional relationships between data types are omnipresent in various
kinds of applications, and modeling languages such as the Entity-Relationship
model [5] and UML [29] support them as an integral part, the possibility to directly
define them in a programming language, whose run time system keeps their
directions consistent automatically, is hard to find.
RelJ [2] is a prototypical language consisting of a "middleweight" fragment of
Java plus support for relationships as first-class citizens. Similar to open types,
relationship definitions are separated from class definitions and therefore can be
added incrementally. The way to establish and remove relationships between objects
is quite similar to mutator calls for open type objects, and the rules to enforce the
restricted multiplicities of 1:1, 1:N, and N:1 relationships are pretty much the same.
In addition to a source and a target type, a relationship might possess its own
data fields and methods, it might be part of a separate inheritance hierarchy, and
it might itself take part in another relationship. Since data fields, methods, and
inheritance are not directly provided for open types, however, it would be logically
inconsistent to provide them for relationships between them. Instead, one might
allow attributes and relationships as source or target types of other attributes and
relationships and possibly as parameter types of functions in order to provide comparable features for open types. While RelJ’s main contribution so far is a sound formal model, no implementation
ideas have been published yet. In fact, to provide an efficient implementation
of the model, a flexible data representation format similar to that of open types
appears to be necessary to allow information about an object’s relationships to be
stored directly within the object.
Description logic systems such as Classic [3] and Loom [25] provide data models
which are very similar in nature to open types and have in fact influenced some of
their ideas. They provide concepts (corresponding to open types) and roles (corresponding
to attributes and relationships), which are defined separately and independently,
and roles might possess inverse roles. Furthermore, a running system can be
extended by new definitions at any time.
However, since description logic systems are actually AI tools, providing powerful
reasoning capabilities such as subsumption checking, automatic instance classification,
and truth maintenance, using them as bare data models of a programming
language would mean to break a fly upon the wheel. Therefore, open types might
be viewed as the result of reducing a description logic system to a simple data
representation system by stripping off all AI functionality.
Inheritance and Aggregation
Supporting multiple and repeated inheritance in a way that combines high expressiveness
and flexibility with comprehensibility, ease of use, and efficient implementability
has always been a great challenge for object-oriented language designers. In particular,
the question whether and under which circumstances a multiply inherited entity
(i. e., a data field, a method, or a whole class) shall be replicated in a derived class or
not, is hard to answer in general. Furthermore, if entities are replicated, the resulting
name clashes have to be resolved in some satisfactory way.
The proposed solutions range from totally rejecting multiple inheritance (e. g.,
in Smalltalk [11]) over restricting it to inheritance of interfaces (e. g., in Java [1])
to supporting it in a rather general way (e. g., in Eiffel [26], C++ [33], and Timor
[19, 20]) with quite different syntactical and semantical forms. In any case, however,
numerous specialized language constructs and partially sophisticated rules are
required to support at least the most frequently occurring cases in practice. With
open types, however, the already present and comparatively simple concept of bidirectional
relationships can be used to model virtually all kinds of multiple and
repeated inheritance in a straightforward manner, without requiring any additional
language constructs and rules for that purpose.
Again, the approach taken in Self [36] is quite similar to that of open types:
Instead of introducing special language constructs and rules, the already present
concept of slots can be employed to distinguish, e. g., between replicated and shared
inherited entities. In addition to significantly simplifying inheritance issues, the concept of automatic
attributes and relationships unifies the concepts of aggregation (expressed by
normal attributes and relationships), inheritance (expressed by automatic attributes
and relationships), and user-defined type conversions (expressible by automatic attributes
whose get function is redefined to perform the appropriate conversion).
Even though related, these concepts are usually clearly separated from each other
in other programming languages. C++, for example, provides two different ways
to specify user-defined type conversions, i. e., constructors and conversion operators
[33], in addition to implicit type conversions based on subtype polymorphism
(where the latter are applied transitively if necessary to convert a derived class to
an indirect base class, while the former are not). The combination of these different
mechanisms leads to rather complicated rules about the applicability of conversions
and the ranking of conversion sequences that is needed for overload resolution. In
contrast, automatic attributes constitute a single coherent concept which subsumes
and unifies these different ones.
Other approaches that aim at reconciling the partially conflicting concepts of
aggregation and inheritance are parts inheritance in Timor [19] and compound references in [30].
The fact that an attribute is actually a set of functions with a predefined implementation
that can be changed by the programmer, corresponds (amongst others)
to the concept of abstract variables also found in Timor [18] and again to the concept
of slots in Self. However, open types go a step further by allowing these functions to
be redefined multiple times, where each new implementation can call the previous
one on demand.
Variable functions proposed by Odersky [28] are quite similar to attributes of
open types by constituting an updatable mapping from a source to a target type
that might be accessed either in an object-oriented (x.y resp. x@y) or in a functional
notation (y(x)). The cited paper also contains some general remarks about
possible implementations, but no ideas comparable to the sophisticated attribute
management of open types have been found.
12 CONCLUSION
Summary
Open types have been presented as a simple yet powerful data model for statically
typed procedural and object-oriented programming languages, that overcomes
the limitations of the traditional record-oriented model, especially the problem of
retroactively extending existing data types with new attributes. Even though the
core concepts of open types and attributes are hardly more complex than simple
record types, their extension with bidirectional relationships and automatic and
anonymous attributes and relationships leads to a remarkable degree of expressiveness
and flexibility. In particular, virtually all kinds of object-oriented inheritancecan be modeled in a rather straightforward way.
By complementing open types with global virtual functions and modules, other
core concerns of object-orientation, i. e., dynamic binding and encapsulation, are
covered, too. In summary, the combination of these concepts leads to advanced procedural
programming languages, which are at the same time conceptually simpler
and more expressive than their object-oriented counterparts. In particular, some
important features provided by aspect-oriented languages, such as inter-type member
declarations and advice, are available for free, i. e., without requiring additional
language constructs.
Current Status and Future Work
The basic ideas of open types and virtual functions have been developed, implemented,
and refined over several years. Recently, they have been successfully employed
in a small student project (2500 lines of C+++ code), a larger master’s
project (4000 lines of C+++ code), and in re-implementing the C+++ precompiler
in C+++ itself (4000 lines of C+++ code providing truly modular extensions to
several hundred thousand lines of existing C code).
In particular, the re-implementation of the precompiler based on the EDG C++
Front End (cf. Sec. 10.4) has proven that the basic idea of non-invasively extending
and modifying an existing software system is actually practicable, even if the
original system has not been specifically designed with that aim. Furthermore, it
has demonstrated that it is easily possible to automatically turn ordinary statically
bound C functions into dynamically bound global virtual functions which can be redefined
on demand, i. e., to actually integrate "legacy" code into C+++ programs.
Even though the current implementation of C+++ already exhibits quite acceptable
performance, the potential for optimizations is surely not exhausted yet.
By fine-tuning critical parts of the attribute management system and the garbage
collection algorithm, additional performance gains are expected.
Footnotes
1 See www.informatik.uni-ulm.de/rs/mitarbeiter/ch/apples.
2 Besides IEEE NaN values for floating point types, C#’s Nullable types constitute a partial
exception; however, instead of actually providing null values for all types of the language, there is
a separate nullable type T? for each non-nullable type T.
3 The term "reference" is used in its general meaning here, which is somewhat different from
the C++ notion of references [33].
4 For multi-valued attributes, there are also variants of the mutator to remove a previously added
value and to insert a new value at a particular position.
5 It would have been possible to reuse the existing dot or -> operator for that purpose, too, just
like existing keywords such as typename have been reused to avoid the introduction of new ones.
Introducing a new operator, however, cannot create any incompatibilities with existing C++ code,
while introducing a new keyword (e. g., type) would cause programs using this as an identifier to
become invalid.
6 If the target type of an attribute is a user-defined or built-in C++ type (such as int) that does
not possess a distinct null value, its parameterless constructor would return a default value such
as zero, which is conceptually quite different from no value. Therefore, such types should ideally
be prohibited as target types of attributes, which could actually be enforced by the precompiler.
For practical reasons, however, in particular to simplify the integration of C++ legacy code into
C+++ programs, any kind of target type is allowed. By means of a compile-time switch, it is
possible to specify whether the default value returned by the parameterless constructor shall be
returned or an exception shall be raised when a non-existent attribute value of such a target type
is accessed. If a type does not possess a public parameterless constructor, the former alternative
will lead to a compile-time error.
7 A C++ declaration such as Color red(...); is a shorthand notation for Color red =
Color(...);
8 According to the rules given in Sec. 4.5, access to this person object and its attributes would
still be well-defined, however.
9 Global virtual functions are simply called virtual functions in the sequel, as there is no possibility
of confusion with virtual member functions.
10 Just like a call to the empty constructor requires a pseudo-argument @ to distinguish it from a
call to the parameterless constructor (cf. Sec. 4.3), its definition also requires a pseudo-parameter
declaration @.
11 Note that to redefine a function of a foreign module it is necessary to qualify its name with
the name of this module, since an unqualified name would denote a different function introduced
in the current module.
12 The first section of a module is implicitly public.
13 Since a redefinition of a global virtual function is never called directly, it does not make any
difference whether it is declared public, protected, or private.
14 If open type values are dynamically created with the C++ new operator, these are treated as
root pointers, too.
REFERENCES
[1] K. Arnold, J. Gosling, D. Holmes: The Java Programming Language (Third Edition). Addison-Wesley, Boston, 2000.
[2] G. Biermann, A. Wren: "First-class Relationships in an Object-oriented Language." In: A. P. Black (ed.): ECOOP 2005 – Object-Oriented Programming (19th European Conference; Glasgow, UK, July 2005; Proceedings). Lecture Notes in Computer Science 3586, Springer-Verlag, Berlin, 2005, 262–286.
[3] R. J. Brachman, D. L. McGuinness, P. F. Patel-Schneider, L. A. Resnick: "Living with CLASSIC: When and How to Use a KL-ONE-Like Language." In: J. F. Sowa (ed.): Principles of Semantic Networks. Explorations in the Representation of Knowledge. Morgan Kaufmann Publishers, San Mateo, CA, 1991, 401–456.
[4] C. Chambers, D. Ungar, E. Lee: "An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes." In: N. K. Meyrowitz (ed.): 4th Ann. Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA) (New Orleans, LA, October 1989). ACM SIGPLAN Notices 24 (10) October 1989, 49–70.
[5] P. P. Chen: "The Entity-Relationship Model – Toward a Unified View of Data." ACM Transactions on Database Systems 1 (1) March 1976, 9–36.
[6] C. Clifton, G. T. Leavens, C. Chambers, T. Millstein: "MultiJava: Modular Open Classes and Symmetric Multiple Dispatch for Java." In: Proc. 2000 ACM SIGPLAN Conf. on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA ’00) (Minneapolis, MN, October 2000). ACM SIGPLAN Notices 35 (10) October 2000, 130–145.
[7] I. D. Craig: Programming in Dylan. Springer-Verlag, London, 1997.
[8] E. W. Dijkstra, L. Lamport, A. J.Martin, C. S. Scholten, E. F. M. Steffens: "Onthe-Fly Garbage Collection: An Exercise in Cooperation." Communications of the ACM 21 (11) November 1978, 966–975.
[9] M. Ernst, C. Kaplan, C. Chambers: "Predicate Dispatching: A Unified Theory of Dispatch." In: E. Jul (ed.): ECOOP’98 – Object-Oriented Programming (12th European Conference; Brussels, Belgium, July 1998; Proceedings). Lecture Notes in Computer Science 1445, Springer-Verlag, Berlin, 1998, 186–211.
[10] E. Gamma, R. Helm, R. Johnson, J. Vlissides: Design Patterns. Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading, MA, 1995.
[11] A. Goldberg, D. Robson: Smalltalk-80. The Language. Addison-Wesley, Reading, MA, 1989.
[12] D. Goldberg: "What Every Computer Scientist Should Know About Floating-Point Arithmetic." ACM Computing Surveys 23 (1) March 1991, 5–48.
[13] W. Harrison, H. Ossher: "Subject-Oriented Programming (A Critique of Pure Objects)." In: 1993 Ann. Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA) (September 1993), 411–428.
[14] C. Heinlein: "Virtual Namespace Functions: An Alternative to Virtual Member Functions in C++ and Advice in AspectC++." In: Proc. 2005 ACM Symposium on Applied Computing (SAC) (Santa Fe, New Mexico, March 2005), 1274–1281.
[15] C. Heinlein: "Null Values in Programming Languages." In: H. R. Arabnia (ed.): Proc. Int. Conf. on Programming Languages and Compilers (PLC’05) (Las Vegas, NV, June 2005), 123–129.
[16] C. Heinlein: "Global and Local Virtual Functions in C++." Journal of Object Technology 4 (10) December 2005, 71–93. http://www.jot.fm/issues/issue_2005_12/article4
[17] JBoss AOP. http://www.jboss.org/products/aop.
[18] J. L. Keedy, G. Menger, C. Heinlein: "Taking Information Hiding Seriously in an Object Oriented Context." In: Net.ObjectDays 2003. Tagungsband (Erfurt, Germany, September 2003). tranSIT GmbH, Ilmenau, 2003, ISBN 3-9808628-2-8, 51–65.
[19] J. L. Keedy, C. Heinlein, G. Menger: "Inheriting Multiple and Repeated Parts in Timor." Journal of Object Technology 3 (10) November/December 2004, 99–120. http://www.jot.fm/issues/issue_2004_11/article1
[20] J. L. Keedy, C. Heinlein, G. Menger, M. Evered: "Diamond Inheritance and Attribute Types in Timor." Journal of Object Technology 3 (10) November/December 2004, 121–142. http://www.jot.fm/issues/issue_2004_11/article2
[21] S. E. Keene: Object-Oriented Programming in Common Lisp: A Programmer’s Guide to CLOS. Addison-Wesley, Reading, MA, 1989.
[22] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, W. G. Griswold: "An Overview of AspectJ." In: J. Lindskov Knudsen (ed.): ECOOP 2001 – Object-Oriented Programming (15th European Conference; Budapest, Hungary, June 2001; Proceedings). Lecture Notes in Computer Science 2072, Springer-Verlag, Berlin, 2001, 327–353.
[23] G. Kniesel, P. Costanza, M. Austermann: "JMangler–A Powerful Back-End for Aspect-Oriented Programming." In: R. E. Filman, T. Elrad, S. Clarke, M. Aksit (eds.): Aspect-Oriented Software Development. Pearson International, 2004, 311–342.
[24] A. Koenig, B. Stroustrup: "Foundations for Native C++ Styles." Software-Practice and Experience 25 (S4) December 1995, 45–86.
[25] R. MacGregor: "The Evolving Technology of Classification-Based Knowledge Representation Systems." In: J. F. Sowa (ed.): Principles of Semantic Networks. Explorations in the Representation of Knowledge. Morgan Kaufmann Publishers, San Mateo, CA, 1991, 385–400.
[26] B. Meyer: Eiffel: The Language. Prentice-Hall, New York, 1994.
[27] T. Millstein, M. Reay, C. Chambers: "Relaxed MultiJava: Balancing Extensibility and Modular Typechecking." In: Proc. 2003 ACM SIGPLAN Conf. on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA’03) (Anaheim, CA, October 2003). ACM SIGPLAN Notices 38 (11) November 2003.
[28] M. Odersky: "Programming with Variable Functions." In: Proc. Int. Conf. on Functional Programming (Baltimore, Sept. 1998).
[29] OMG: OMG Unified Modeling Language Specification (Version 1.5). Object Management Group, March 2003.
[30] K. Ostermann,M. Mezini: "Object-Oriented Composition Untangled." In: Proc. 2001 ACM SIGPLAN Conf. on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA ’01) (Tampa Bay, FL, October 2001). ACM SIGPLAN Notices 36 (11) November 2001, 283–299.
[31] D. L. Parnas: "On the Criteria to Be Used in Decomposing Systems into Modules." Communications of the ACM 15 (12) December 1972, 1053–1058.
[32] O. Spinczyk, A. Gal, W. Schr¨oder-Preikschat: "AspectC++: An Aspect- Oriented Extension to the C++ Programming Language." In: J. Noble, J. Potter (eds.): Proc. 40th Int. Conf. on Technology of Object-Oriented Languages and Systems (TOOLS Pacific) (Sydney, Australia, February 2002), 53–60.
[33] B. Stroustrup: The C++ Programming Language (Special Edition). Addison-Wesley, Reading, MA, 2000.
[34] P. Tarr, H. Ossher, W. Harrison, S. M. Sutton Jr.: "N Degrees of Separation: Multi-Dimensional Separation of Concerns." In: Proc. 21st Int. Conf. on Software Engineering (May 1999), 107–119.
[35] M. Torgersen: "The Expression Problem Revisited. Four New Solutions Using Generics." In: M. Odersky (ed.): ECOOP 2004 – Object-Oriented Programming (18th European Conference; Oslo, Norway, June 2004; Proceedings). Lecture Notes in Computer Science 3086, Springer-Verlag, Berlin, 2004, 123–143.
[36] D. Ungar, R. B. Smith: "Self: The Power of Simplicity." In: 2nd Ann. Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA) (Orlando, FL, October 1987). ACM SIGPLAN Notices 22 (12) December 1987, 227–241.
[37] N. Wirth: "The Programming Language Oberon." Software-Practice and Experience 18 (7) July 1988, 671–690.
About the authors

|
|
Christian Heinlein has been working as a Scientific Assistant
at the University of Ulm, Germany, where he conducted the research
project APPLEs, that aims at developing "Advanced Procedural
Programming Languages," which are both conceptually simpler
and more flexible than standard object-oriented languages.
He can be reached at christian.heinlein@uni-ulm.de. See also www.informatik.uni-ulm.de/rs/mitarbeiter/ch/apples. |
Cite this article as follows: Christian Heinlein: "Open Types and Bidirectional Relationships as
an Alternative to Classes and Inheritance", in Journal of Object Technology, vol. 6 no. 3,
March-April 2007, pages 101-151, http://www.jot.fm/issues/issue_2007_03/article3
|