The Theory of Classification

Part 8: Classification and Inheritance

Anthony J H Simons, Department of Computer Science, University of Sheffield, U.K.

COLUMN

1 INTRODUCTION

This is the eighth article in a regular series on object-oriented type theory, aimed specifically at non-theoreticians. In earlier articles, we explored the view that a programmer's class in C++ or Java corresponds in some way to a type in the formal model, and that a compatible subclass therefore corresponds to a subtype [1], [2]. In the last article, simple subtyping was found to be inadequate to express systematic relationships between recursive types [3]. Instead, we found that a second-order model with type parameters was needed. In this model, a programmer's class corresponds to a bounded polymorphic type, representing a family of similar types which share a minimum common structure and behaviour. The family likeness was expressed using a constraint, known as a function bound or F-bound [4] on the type parameter, which ensured that the parameter could only be replaced by types having at least the structure and behaviour specified in the F-bound.

This opens up a completely different formal notion of class, and consequently of inheritance. Whereas before, we thought of a class as a type, clearly it is now the pattern for a family of related types. Likewise, whereas inheritance was formerly the simple extension of a type, in the new model it is the extension of a general pattern. In this article, we explore further the differences between classes and types, developing the alternative formal model of classification and inheritance, which is quite different from subtyping [5].

2 SPECIFICATION VERSUS IMPLEMENTATION

So far in the Theory of Classification, I have been seeking to show how the intuitive notion of class in object-oriented languages has a strictly formal interpretation that is more general than the simple notion of type. At this point, I usually come up against a long-held prejudice among practically-minded programmers that the "real" difference between a class and a type is that a class is merely a programming language construct, whereas a type is the formal description of this. In other words, there is out there the entrenched view that "type = specification" and "class = implementation". In the following, I hope to show that this view is a red herring in our thinking about classification in object-oriented languages.

In the early days, we struggled to understand the formal nature of novel object-oriented language features, especially inheritance, which could sometimes be used in a strict way, to derive a family of related types, and sometimes in an opportunistic way, to extend implementations [6]. This led some to believe that objects have class and type independently [7], [8], asserting that an inheritance hierarchy was merely a convenience for describing shared implementation, whereas a separate type hierarchy was necessary to describe the subtyping relationships between the same objects. In some cases, the "classes" and "types" for the very same objects could be linked in different orders (see figure 1).

Figure 1: Sharing type (left) and implementation (right)

Now, while this is an interesting issue, it relates to conceptual design more than it relates to type theory and the notions we have been discussing here. The left-hand hierarchy expresses the conceptual family of Shapes, while the right-hand hierarchy expresses how you might conveniently derive extended records by adding variables, although it leads to conceptual nonsense: a Circle is not a kind of Point, for example. Nonetheless, it is quite possible to define a strange Circle type that is genuinely a subtype of Point - in the theory of subtyping, this would be perfectly legitimate for many definitions of Circle and Point, so long as you obeyed the rules [2]. The real issue here is one of discipline versus opportunism in conceptual design, and is not really related to types and subtyping.

Another blow to the "class = implementation" viewpoint is that programming languages like Pascal and C had types with concrete implementations long before the object-oriented notion of class was popular - nothing necessarily forces "class" to mean implementation. In Computer Science, we have always talked in terms of abstract and concrete types. Abstract types are formal, described in terms of operation signatures and axioms; concrete types have a representation in a programming language. So, why not extend this notion to classes, which can also be both abstract (formal and typeful) and concrete (practical and implemented)?

Modern object-oriented languages, like C++ and Eiffel, link the type hierarchy directly to the implementation hierarchy. Some allow further expression of type compatibility apart from the main implementation hierarchy, like Objective C and Java, which have separate interfaces to express common type relationships that cut across the main divisions in the class hierarchy. Earlier languages like POOL-T [8] developed completely independent implementation- and type-hierarchies, in the hope of preserving "pure subtyping" in a language with multiple, variant implementations. However, even the separate subtype hierarchy of POOL-T is defeated by recursive types and inescapably suffers from the same restrictions and lack of expressiveness that we identified previously for subtyping [3].

3 CLOSED VERSUS OPEN

A more satisfying way of distinguishing object-oriented classes from traditional simple types is to realise that a type is closed, in the sense of being complete, whereas a class is open-ended, in the sense of being subject to arbitrary subdivision and further specialisation. In older object-based languages like Modula-2 and Ada (pre-95, before the addition of inheritance), the type of an object is expressed exactly by its interface. In later object-oriented languages with polymorphic inheritance, the class of an object is understood to express only the minimum interface which members of the class must satisfy. This subtle difference between exact and minimum interfaces is what characterises the essential difference between a traditional programmer's type and a modern class. Taxonomic classification in biology also follows this pattern: a mammal is defined as something with (at least) warm blood and hair that bears and suckles live young. This is not a complete or finished definition - biologists don't exclaim: "Look, there's an instance of a mammal!", but rather identify dogs, cats or gerbils and show that these belong to the class of mammals, by virtue of having the four essential mammalian properties.

Mathematically, we can capture the same distinction between simple, closed types and open-ended polymorphic classes. In the previous article [3], we showed that a basic class of Numbers may be expressed using the F-bound:

According to this definition, the parameter may range over all kinds of numeric types, so long as they have at least a plus method with the specified signature. By contrast, the exact Number type is a recursive type, created from first principles as the least fixed point of the generator GenNumber (see earlier article [1] for a full explanation):

This is a very general numeric type with only a plus method - we are unlikely to use direct instances of this type (like mammal, above), but instead we will want to use instances of Integer, Real, or Complex.
Note that exact types, like Number, are fixed, whereas classes are open-ended and flexible. If we say n : Number, we assert that n is a variable that can only receive objects of exactly the Number type. On the other hand, if we say x : <: GenNumber[]), we assert that x is a variable that can receive objects of any numeric type in the class of Numbers, even types having more than the minimum required methods. This is the difference between monomorphism (exact typing) and bounded polymorphism (constrained flexible typing). In the following, types are always exact and classes are always polymorphic.

4 RELATING CLASSES AND TYPES

There is an interesting relationship between classes and types. It turns out that the exact Number type is the least type which is still a member of the polymorphic Number class. In other words, it has just enough fields to satisfy the F-bound constraint:

which unrolls (on the left) and evaluates (on the right) to give:

from which it is clear that both sides are equal. In fact, for this one case alone, we could rewrite the subtyping condition as a type equivalence:

and the reader may recall that this is exactly the same formula that identifies Number as the fixpoint of the generator GenNumber, that is, a type which is unchanged by the application of the generator [1]. From this, we know that there is exactly one type which "only just" satisfies the conditions for class membership - and this is always the recursive type created from the generator that is used to express the F-bound constraint.

We can visualise this in figure 2 by drawing the Number class as a cone-shaped bounded volume, representing a space of possible numeric types that satisfy the F-bound, and the corresponding "least type" Number, which is only just a member of this class, as the point at the apex of this cone. There are many other numeric types which belong to the Number class, which have more than the minimum required methods: figure 2 also shows the Integer type as a point in the space enclosed by the Number cone.

Figure 2: Classes as volumes containing types as points

Can we demonstrate this mathematically? Recall that the exact Integer type is given by:

after unrolling the recursion. We then apply the test for class membership, manipulating the expression on both sides, by unrolling the recursive type (on the left) and applying the type generator (on the right) to yield a comparison between record types:

which is true by the record subtyping rule [2]. We have therefore shown that the Integer type is indeed in the class of Numbers. As well as the least type Number, we may expect many more numeric types like Integer, which have more than the minimum required methods, to be members of this class. Such numeric types could include Real, Complex, Fraction and any numeric type with at least a plus method. This expresses exactly the object-oriented notion of class membership.

5 A MODEL OF CLASSIFICATION

Figure 2 also visualises how we would expect a class of Integers to nest inside the class of Numbers. This is drawn using "stacking cones" to show that the volume occupied by the Integer class is contained within the Number class. Intuitively, we ought to be able to partition (divide up) the space of Numbers into sub-spaces, corresponding to the spaces occupied by the more specific numeric classes; however, we have yet to demonstrate this idea mathematically.

The Integer class is constructed in the same way as the Number class, first by defining a type generator, a type function which accepts the self-type as an argument:

then by expressing the polymorphic Integer class using an F-bound constructed using the same generator:

This constraint ensures that the parameter may range only over those numeric types which have at least the methods: plus, minus, times and divide with the specified signatures. It should be clear now that the exact Integer type is the least member of this class:

How can we show that the Integer class nests inside the Number class? Intuitively, the purpose of classification is to show that objects which belong to specific classes also belong to the more general classes. We therefore want to be able to show that, if any actual type satisfies the constraint for the Integer class, it will also satisfy the constraint for the Number class. In other words, we seek the condition under which:

Intuitively, this holds true for these two particular generators, because GenInteger defines strictly more methods than GenNumber and otherwise it has an identical plus method. So, any type satisfying <: GenInteger[] is bound to satisfy ]. If GenInteger had not defined a plus method with a matching signature, then no types could satisfy both constraints. The condition we seek therefore depends on the structure of one generator including all the structure of the other generator.

This sounds somewhat similar to a record subtyping rule [2], except that the generators are not record types, they are type functions with distinct self-type arguments. Though we cannot make a rule to compare the generators directly, we can do this indirectly using a rule that compares instantiations of the generators, since this yields proper record types. In fact, we want to compare all possible instantiations of the two generators with the same type. This is known as a pointwise subtyping condition, which we write as:

We read this as: "For all types , the record type you get by instantiating GenInteger with is always a subtype of the record type you get by instantiating GenNumber with ". Literally, the constraint expresses two things:

the subclass generator must have a larger interface than the superclass generator
the self-types must be made to refer to the same type when making any comparison

and this is the condition under which the GenInteger generator stands in a subclass-to-superclass relationship with the GenNumber generator.
Abstracting away from numeric types, we obtain the subclass rule relating any two generators, which we shall name (arbitrarily) GenSub and GenSuper:

"If the generators GenSub and GenSuper stand in a pointwise subtyping relationship, then any type which satisfies the Sub-class constraint will also satisfy the Super-class constraint". This is the rule which describes how classes are nested inside each other. From this, we may derive another rule, which allows us to infer how types which belong to one class also belong to more general classes:

"If t is a member type of the Sub-class, and the Sub-class is nested inside the Super-class, then t is also a member type of the Super-class." These are two of the most important rules in the theory of classification describing the subclass relationship and transitive class membership.

6 A MODEL OF INHERITANCE

In our presentation, we want to distinguish carefully the notions of classification and inheritance. Classification describes the way in which exact types belong to classes and the way in which classes are nested inside one another. Inheritance describes an extension mechanism for defining more specialised classes incrementally. That is, inheritance is merely a short-hand. This means that any class we could develop incrementally using inheritance must look the same as if we had defined it as a whole, from first principles. As a challenge, we shall attempt to derive the Integer class from the Number class.

Previously, we adopted a simplistic model of inheritance, using record type extension [3]. In this approach, a record type is considered to be a set of fields, which can be extended using the set union operator to create a larger record type, which is therefore a subtype of the original record [2]. The basic model for record type extension is:

However, we found an anomaly when extending recursive record types, which is that fields obtained from the base type were fixed with the wrong types when combined in the extended record type [3]. If we derive an Integer type naïvely from a Number type, we find that the plus method for Integer ends up with intuitively the wrong type signature:

Essentially, the reason why this doesn't work is because we are extending simple types using a subtyping model and are hitting the limits of subtyping in the context of recursion. The self-types of Number and Integer are fixed independently, such that after record union, there is no single, uniform self-type, resulting in a kind of "self-type schizophrenia".

The insight offered by Cook and others [5] is that inheritance should not be modelled as the extension of simple types, but rather it is the extension of the patterns described by their generators. With generators, we are better able to control the binding of the self-types, so that these refer uniformly to the same type in the combined result. Starting with the GenNumber generator, we shall derive a GenInteger generator incrementally from it, but also rebind the self-type of the base generator at the same time:

This yields exactly the GenInteger that we hoped for, equivalent to the generator that we defined from first principles in section 5. Note how the inherited plus method is typed in terms of , the self-type of GenInteger, rather than in terms of , the old self-type of GenNumber. All the methods of GenInteger are uniformly typed in terms of , solving the "self-type schizophrenia" problem.

The trick at the heart of this derivation is the type-instantiation of the GenNumber generator, obtained through the application: GenNumber[]. Recall that GenNumber is a type-function, with a formal argument . When GenNumber is applied to , we substitute {} throughout in the body. The result of the application is a base record of method types: {plus : } which is subsequently unioned with the extension record to yield the result.

To obtain the exact Integer type, we may take the fixpoint of the GenInteger generator that we have just derived, to bind the self-type argument recursively:

This has the type signature that we desire, namely one that is uniformly typed in terms of the Integer self-type, rather than one which is schizophrenic. This seems therefore to be a more satisfactory model for explaining incremental derivations in a class hierarchy.

7 CONCLUSION

We have exposed the heart of the theory of classification, using Cook's F-bounded quantification to express the idea that a class is a family of types that share a minimum common structure. By contrasting the notion of type in older object-based languages with the notion of class in newer object-oriented languages with inheritance, we have argued that extensibility and polymorphism are what properly characterise classes apart from traditional programmer's types. Thereafter, we have used class to refer to a polymorphic family and type to refer to one member of this family.

We developed a second-order model of class membership, in which recursive types with similar structure could be shown to belong in the same class. We extended this to a model of subclassing, based on a pointwise relationship between generators, to show how classes are nested hierarchically. Finally, we showed how generator adaptation provided a more satisfactory model of incremental inheritance, avoiding problems of schizophrenic self-reference. The languages Smalltalk and Eiffel adopt this more sophisticated model of inheritance, in which inherited references to self and the self-type are redirected to refer uniformly to the subclass.

While we dispelled one myth about classes and implementation, it is clear that we have only covered the typeful aspects of classes here. We defer a treatment of the concrete aspects of classes, especially the inheritance of implementation, to a later article. This will help us further in understanding the differences between the classification model of Smalltalk and Eiffel, and the subtyping model of Java and C++.

REFERENCES

[1] A J H Simons, “The theory of classification, part 3: Object encodings and recursion”, in Journal of Object Technology, vol. 1, no. 4, September-October 2002, pp. 49-57. http://www.jot.fm/issues/issue_2002_09/column4

[2] A J H Simons, “The theory of classification, part 4: Object types and subtyping”, in Journal of Object Technology, vol. 1, no. 5, November-December 2002, pp. 27-35. http://www.jot.fm/issues/issue_2002_11/column2

[3] A J H Simons, “The theory of classification, part 7: A class is a type family”, in Journal of Object Technology, vol. 2, no. 3, May-June 2003, pp. 13-22. http://www.jot.fm/issues/issue_2003_05/column2

[4] P Canning, W Cook, W Hill, W Olthoff and J Mitchell, “F-bounded polymorphism for object-oriented programming”, Proc. 4th Int. Conf. Func. Prog. Lang. and Arch. (Imperial College, London, 1989), pp. 273-280.

[5] W Cook, W Hill and P Canning, “Inheritance is not subtyping”, Proc. 17th ACM Symp. Principles of Prog. Lang., (ACM Sigplan, 1990), pp. 125-135.

[6] M Sakkinen, “Disciplined inheritance”, Proc. 3rd European Conf. Object-Oriented Prog., (Nottingham: British Computer Society, 1989), pp. 3-24.

[7] A Snyder, “Encapsulation and inheritance in object-oriented programming languages”, Proc. 1st ACM Conf. Object-Oriented Prog., Sys., Lang. and Appl., pub. ACM Sigplan Notices, 21(11), (ACM Sigplan, 986), pp. 38-45.

[8] P America , “Designing an object-oriented language with behavioural subtyping”, Proc. Conf. Foundations of Object-Oriented Lang., (1990), pp. 60-90.

About the author

Anthony Simons is a Senior Lecturer and Director of Teaching in the Department of Computer Science, University of Sheffield, where he leads object-oriented research in verification and testing, type theory and language design, development methods and precise notations. He can be reached at a.simons@dcs.shef.ac.uk.

Cite this column as follows: Anthony J.H. Simons: “The Theory of Classification – Part 8: Classification and Inheritance”, in Journal of Object Technology, vol. 2, no. 4, July-August 2003, pp. 55-64. http://www.jot.fm/issues/issue_2003_07/column4

Previous column

Next column