The Theory of Classification

Part 12: Building the Class Hierarchy

Anthony J H Simons, Department of Computer Science, University of Sheffield, U.K.

COLUMN

1 INTRODUCTION

This is the twelfth article in a regular series on object-oriented type theory for non-specialists. Readers following the series will by now have gained some experience in theoretical models and their uses. Previous articles have gradually built up models of objects [1], types [2] and classes [3] in the -calculus. Inheritance has been shown to extend both type schemes [4] and implementations [5]. The most recent article combined the type and implementation aspects of inheritance [6], introducing the typed notation that will be used in the rest of this series. The theoretical model has already revealed some interesting differences between the notions of type and class. We have shown that one group of languages, exemplified by C++ and Java, uses subtyping [2, 3, 5] as the basis for type compatibility, whereas another group, exemplified by Smalltalk and Eiffel, uses subclassing [3, 4, 5], which is a distinct formal relationship in the Theory of Classification.

In this article, we review the whole model so far, to demonstrate more of its formal modelling power. The theoretical model is able to represent the whole spread of object-oriented concepts, such as objects, types, classes, abstract classes and interfaces. It can handle the notion of object creation, class extension through inheritance, type compatibility and interface matching. We shall build up a simple model of a class hierarchy, demonstrating all of these concepts. First, we shall briefly review the elements of the model.

2 SIMPLE OBJECTS AND TYPES

Objects are modelled as simple records, whose fields are values, representing attributes, or functions, representing methods. Types are modelled as record types, whose fields are the corresponding type signatures of the attributes or methods. The following introduces a simple co-ordinate type and an instance of the type:

We use capitalisation to indicate type names, and lower case for object names. The above coordinate is a specific instance of Coord at the location (2, 3). To construct such an instance, we can define the constructor-function makeCoord:

This accepts a pair of arguments a, b, and returns a record, corresponding to a Coord instance. We can then create coord from first principles by the constructor-call:

We access its fields using the “dot” syntax: coord.x 3. The fields x, y have fixed values once the object is constructed. We can think of these fields either as constant public attributes, or else we can think of them as “unary methods”, functions that accept no argument and access an encapsulated value¹. We typically take the second view, so that all record fields can be thought of as functions.

3 RECURSIVE OBJECTS AND TYPES

Objects are frequently recursive, since methods may invoke further methods on self, the current object. Types are also recursive, since methods may accept or return arguments of the same type as the self-type. These two kinds of recursion are related, but independent. We use a standard technique in the -calculus to introduce and bind the recursion, called fixpoint finding [1] and separate fixpoints are needed at the object-level and the type-level. The technique involves the use of generators (also called functionals), which are functions of the self-value (or self-type). The following introduces type- and object-generators for a more sophisticated Cartesian point type, which has a recursive equal method:

The type generator GenPoint is not yet a record type, but rather a function of (the self-type parameter) that returns a record type. We construct the recursive Point type by binding recursively to the function body, using the fixpoint finder Y (see [1] for full details):

In this way, we build the recursive Point type from first principles. Note how, once the definition is complete, we may unroll Point (denoted by the evaluation step ) to a record type containing recursive reference to the Point type.
Likewise, the object generator genPoint is not yet an object instance, but rather a function that returns this instance. It accepts two arguments: stands for the self-type and self for the self-value. To create a point-instance from this generator, we need to supply a suitable type for and then bind self recursively over the rest of the function body. In the first step, we choose to supply the type Point, and the returned result is a generator, a simple function of self:

In the second step, this simple generator may be recursively fixed using Y:

to bind self over the rest of the body. Note again how, after unrolling, the defined point instance contains recursive references to point. If we consider that the type Point was itself the result of a fixpoint operation, we could equally define point in the style:

illustrating how it takes two fixpoints to bind the recursions at object- and type-level in each instance. From a theoretician’s point of view, this presents some challenges. Bruce and Mitchell [7] were the first to show that Y existed both at both the type- and object-levels, and prove that fixpoint operations converged at infinity, provided that objects were records of functions. (Convergence fails if an object has a field which directly returns self; however, we can always convert the identity method into a function accepting an empty argument – see the earlier footnote¹).

4 CONSTRUCTING OBJECT INSTANCES

In the previous article [6], we settled on a style for defining all classes using type- and object-generators, as illustrated above. It turns out that this is most useful, because we can create subclasses by adapting generators [4, 5], in a style that mimics inheritance. However, the exact process of constructing distinct object instances was not fully described. In fact, genPoint is a generator for a specific instance, with point.x 3. To make this function more general-purpose, we should force it to accept extra initialisation arguments:

Here, initPoint is an extended version of genPoint, which accepts a type , then a pair of Integer arguments a, b, then the usual value for self. This function can be used to construct objects having distinct x, y field values in the following style:

Note the order of application in the creation of p1: first we supply the precise type, Point; then we supply the initialisation values (4, 5). This returns a simple object generator, having the form: (self : Point).{ … }, which we can fix using Y to bind self recursively over the rest of the generator body.

If we wanted, we could define a simple object constructor, makePoint, in the same style as makeCoord (see section 2), which uses the extended function initPoint internally:

In this, you can think of Y as taking a fixed snap-shot of the flexible type-structure and object-structure represented by the two generators. Constructors in practical object-oriented languages always create a specific instance of a specific type.

5 INVOKING OBJECT METHODS

Having created two distinct instances of Point, we can simulate their behaviour in the theory, by evaluating expressions representing method invocations, such as:

The steps shown above are mostly single evaluation steps in the calculus, showing on the right which rule applied in each case. In particular, we use the record selection rule to access the methods for equal and x, and the function application rule when applying the result of p1.equal to the argument p2. The details of primitive Integer and Boolean operations are omitted here. We assume that suitable definitions exist for these types.

6 ROOTING A CLASS HIERARCHY

Within the theory, we can model the notion of a class hierarchy, starting with a root class of all objects. We shall later derive other classes by extending the root class. In this example, we shall assume that the root class only defines a single method, equal, and that the default implementation of this method is identity of reference, represented by ==.

The type generator GenObject describes the type-shape of the most general kind of object, saying that it has an equal method. The object generator genObject provides the default implementation of the equal method, and restricts the applicability of this method to arguments of the type (o : ranges over the family of types expressed by the constraint: . In earlier articles [3, 4] we described how this expresses exactly the notion of a class, a family of related types. This constraint [8] accepts all record types that have at least an equal method, with a type signature Boolean.

The Point from section 3 above appears to match this pattern. It is possible to derive Point’s generators from Object’s generators, in the style that mimics the operation of inheritance in object-oriented programming. Recall that is union with override, creating an extended record by combining a base record with a record of extra methods [5]:

The bold highlights indicate how the old type-generator and object-generator are reused in the definition of the new, extended functions. At the type-level, we added the types of the x, y fields. At the object-level, we added implementations for these, and also redefined equal to compare the x, y field values, rather than test for reference identity. The right-handed preference of causes the new version of equal to override the default version provided in the root class, when the extra record is combined with the base record [5].

To establish that the Point- and Object-classes are in a proper hierarchical relationship, we need to show that the Point-interface is compatible with the Object-interface. This is done formally using the Classify rule [4], in which we have to compare the two type-generators for a pointwise subtyping relationship:

We can show that the relationship holds for a single dummy exemplar type t:

and by generalising t to all types , assert that the relationship holds everywhere. As a result, we maintain that all types belonging to the Point-class will also belong to the Object-class.

7 INTERMEDIATE ABSTRACT CLASSES

We can also define the notion of an abstract class. This is expressed by a pair of generators in which full type information is given in the type-generator, but some of the implementation information is left undefined in the object-generator. Earlier, we used to represent the undefined value [1]. In -calculus, any expression which simplifies to is the formal equivalent of raising an exception in a practical programming language. This is also quite suitable for representing abstract methods, since it is an error to invoke them (instead, you would expect the abstract methods to be overridden in a concrete subclass).

We seek to define a Shape-class, the abstract ancestor of geometric shapes, such as Circle and Rectangle, which provides a concrete origin method, indicating its screen coordinates, but an abstract area method, for which no implementation can yet be given. Furthermore, we define this class by extension from the root Object-class above.

In the type-generator GenPoint, the extra type-information is highlighted in bold. The object-function initShape is written in the same style as initPoint, with extra initialisation arguments, since we want to be able to supply the initial Point coordinate p at which a Shape is to be located. Given this argument p, we can define origin to return p, and can redefine equal to compare the origins of two Shapes – this in turn uses the equal method defined earlier for Points. Note the use of in the body of the area method, indicating that this is so far undefined.

To demonstrate what happens in the model when you try to invoke an abstract method, we exceptionally provide a simple constructor for abstract Shapes, so that we can create an instance and invoke the abstract method (normally, no constructor would be provided, since the class is abstract):

The result of attempting to select an abstract method is the undefined value ?, equivalent to raising an exception.

8 FINAL CONCRETE CLASSES

We now seek to derive a concrete Rectangle-class as a subclass of the abstract Shape-class, with an implementation of its area method. The type-generator declares the extra signatures for the width and height fields, while the object-function supplies implementations for these, and also provides a concrete implementation for the area method, and re-implements the equal method to compare width and height, in addition to origin:

By the usual operation of , the right-hand, concrete version of area is preferred, overriding the abstract version inherited from the Shape-class; likewise, the new version of equal is preferred, replacing the inherited version. The Rectangle-class is now fully defined, with suitable implementations for all of its methods.

In the object-function initRectangle, the initialisation-value is of a different type than the argument supplied to initShape, since we need to initialise a rectangle with (all of) its origin, width and height values. Formally, this initialiser is a single value, a tuple of the product type Point x Integer x Integer, whose projections we access implicitly. Notice how the initialiser passed back to initShape in the inheritance-expression (on the left of ) is just the p : Point value, the first projection from the current initialiser. This models the notion of passing back some construction arguments to a superclass.

We can provide a simple constructor for Rectangle objects, in the usual way, which fixes the object- and type-level recursions to construct an instance of this exact type:

and with this, we may construct a number of distinct Rectangle instances:

The instances r1, r2 : Rectangle have different values for their width and height fields, although they happen to share their origin in this example. Also, their area and equal methods invoke further methods (recursively) on the same instance, as intended.

9 TYPE COMPATIBILITY AND INTERFACE MATCHING

The Rectangle instances should be type-compatible with the Shape-class interface, and transitively with the Object-class interface. Methods defined for these general classes should be type-correct when applied to instances of the specific Rectangle type. One way of checking this is to see whether Rectangle is included in the type-family of each class [4]. First, we want to see if Rectangle is in the class of Shapes:

and secondly, whether Rectangle is in the class of Objects:

In both cases, the answer is yes, because Rectangle extends the interface provided by each superclass. In particular, the equal method was originally declared in the Object-class, with the type: Boolean, so applying this to a Rectangle instance simply produces the substitution: {Rectangle/} and yields the specifically-typed version: Rectangle.equal : Rectangle Boolean. Similarly, the area method was originally declared in the Shape-class with the type: .area : Integer, so applying this to a Rectangle instance produces the same substitution: {Rectangle/} and yields the specifically-typed version: Rectangle.area : Rectangle Integer.

Looking at the unrolled version of the Rectangle type above, we can see that the redefined versions of these methods have exactly the same types. So, Rectangle matches all the expected superclass interfaces, as intended. These interfaces were defined as part of the type-information associated with a class. However, some practical programming languages allow the definition of interfaces that are not associated with any class.

In the theory, the concept of an independent interface may be represented exactly by a type-generator, without a corresponding object-generator. If we wanted to specify (for example) a Locatable interface for all object types providing an origin method, we could do so using just the type generator:

and then give Locatable variables the polymorphic type: ?(? <: GenLocatable[?]). It is clear that both the Shape and Rectangle classes satisfy this interface, by the pointwise subtyping condition expected in the Classify rule [4]:

and the interested reader is encouraged to prove this, using the same kind of strategy as demonstrated in section 6 above.

10 CONCLUSION

This article has focused on two main areas: how to construct specific object instances, and the development of a simple class hierarchy. The overall aim was to demonstrate how the Theory of Classification can model a variety of object-oriented concepts, including objects, types, classes, abstract classes and interfaces.

Regarding object construction, we showed how generators can be extended into flexible object-creation functions with initialisation arguments. Furthermore, initialisation values may be passed back to the superclass functions, mimicking the behaviour of real object-oriented languages. From a formal perspective, creating a unique instance of an exact type always involves taking a double fixpoint, one for the type-generator and one for the object-generator.

Regarding class hierarchy development, we gave examples of recognisable classes, with mixtures of default, abstract and concrete methods. Note especially how the type- and implementation-aspects were able to evolve independently, according to need. A class may introduce a new method, may declare an abstract method, or may re-implement an existing method, replacing it with a more appropriate version. All of this is handled uniformly within the theory. Furthermore, we have shown that classes derived by inheritance (using generators) give rise to object constructors, which create objects of exact types (after fixpoints are taken). These exact types match the expected interfaces of their superclasses, and also of separately-declared interfaces, where appropriate. The matching condition (Classify [4]) is exactly the same in both cases, demonstrating the economy of the theory.

Footnotes

¹ It is possible to think of simple fields as functions accepting an empty argument, eg: x : Empty Integer, and think of all method invocations as supplying the empty value ? automatically, eg: coord.x( 2.

REFERENCES

[1] A J H Simons, “The theory of classification, part 3: Object encodings and recursion”, in Journal of Object Technology, vol. 1, no. 4, September-October 2002, pp. 49-57. http://www.jot.fm/issues/issue_2002_09/column4

[2] A J H Simons, “The theory of classification, part 4: Object types and subtyping”, in Journal of Object Technology, vol. 1, no. 5, November-December 2002, pp. 27-35. http://www.jot.fm/issues/issue_2002_11/column2

[3] A J H Simons, “The theory of classification, part 7: A class is a type family”, in Journal of Object Technology, vol. 2, no. 3, May-June 2003, pp. 13-22. http://www.jot.fm/issues/issue_2003_05/column2

[4] A J H Simons, “The theory of classification, part 8: Classification and inheritance”, in Journal of Object Technology, vol 2, no. 4, July-August 2003, pp. 55-64. http://www.jot.fm/issues/issue_2003_07/column4

[5] A J H Simons, “The theory of classification, part 9: Inheritance and self-reference”, in Journal of Object Technology, vol. 2, no. 6, November-December 2003, pp. 25-34. http://www.jot.fm/issues/issue_2003_11/column2

[6] A J H Simons, “The theory of classification, part 11: Adding class types to object implementations”, in Journal of Object Technology, vol 3, no. 3, March-April 2004, pp. 7-19. http://www.jot.fm/issues/issue_2004_03/column1

[7] K Bruce and J Mitchell, “PER models of subtyping, recursive types and higher-order polymorphism”, Proc. 19th ACM Symp. Principles of Prog. Langs., (1992), 316-327.

[8] P Canning, W Cook, W Hill, W Olthoff and J Mitchell, “F-bounded polymorphism for object-oriented programming”, Proc. 4th Int. Conf. Func. Prog. Lang. and Arch. (Imperial College, London, 1989), 273-280.

About the author

Anthony Simons is a Senior Lecturer and Director of Teaching in the Department of Computer Science, University of Sheffield, where he leads object-oriented research in verification and testing, type theory and language design, development methods and precise notations. He can be reached at a.simons@dcs.shef.ac.uk.

Cite this column as follows: Anthony J.H. Simons: “The Theory of Classification, Part 12: Building the Class Hierarchy”, in Journal of Object Technology, vol. 3, no. 3, May-June 2004, pp. 13-24. http://www.jot.fm/issues/issue_2004_05/column2

Previous column

Next column