The Theory of Classification

Part 10: Method Combination and Super-Reference

Anthony J H Simons, Department of Computer Science, University of Sheffield, U.K.

COLUMN

1 INTRODUCTION

This is the tenth article in a regular series on object-oriented type theory, aimed specifically at non-theoreticians. Recent articles have presented a formal model of classification that differs from the conventional model of types and subtyping [1, 2]. The popular object-oriented languages fall into two groups – those based on simple types and subtyping, such as C++ and Java, and those based on polymorphic classes and subclassing, such as Smalltalk and Eiffel [2, 3]. A programmer’s class has a formal interpretation at the typeful level [2] and a concrete interpretation the implementation level [3]. In the theory, we have been careful to distinguish classification from inheritance.

Classification is the hierarchical relationship between classes, whereby one class is judged to be a subclass of another, according to type rules governing subclassing [2]. Inheritance, on the other hand, is a short-hand mechanism for defining a new class in relation to an existing class¹, specifying what is new or different, and otherwise inheriting all existing features [3]. This presents an interesting formal challenge. If inheritance is indeed merely a short-hand, we should be able to prove in our theory that a class constructed by inheritance is equivalent to a class defined as a whole, from first principles [3]. This challenge is made more complicated by the possibility of method combination, the merging of local and inherited versions of a method. Furthermore, Smalltalk and Java can invoke inherited versions of methods through a pseudo-variable called super. So, our theory must be able to explain the meaning of super, and show how super-method invocations are eventually equivalent to ordinary method invocations.

2 METHOD COMBINATION

So far, our treatment of inheritance and method overriding has assumed that individual methods are always replaced as a whole. For example, in the previous article [3] we replaced an equal method of a two-dimensional point:

in which self refers to a Point2D instance, by an equal method of a three-dimensional point:

in which self refers to a Point3D instance. On the surface, the overriding method appears to be syntactically quite similar to the method that was replaced. In most object-oriented languages, programmers don’t have to write out such replacement methods long-hand; instead the languages offer a short-hand mechanism for adapting the inherited version of the method. The idea is that the overriding method may somehow reuse the code body of the inherited method and need only specify what additional computations take place. This is known as method combination.

In C++ and Eiffel, method combination is achieved simply by naming the local and inherited versions of the method differently. In C++ it is always possible to qualify method names globally by their owning class, in the style: ParentClass::method and ChildClass::method. So, the redefined body of the Child’s method may invoke ParentClass::method explicitly, by referring to its global name. In Eiffel, the name of a method may be locally renamed when it is inherited, in the style: class Child inherit Parent rename method as old_method … end. Then, the body of Child’s method may invoke old_method. In both of these cases, invoking an inherited method inside a redefined version is no different from invoking any other differently-named method.

Smalltalk and Java achieve method combination in a much more interesting way. These languages have a pseudo-variable called super, which somehow allows a programmer to invoke an inherited version of a method inside the redefined version of the same method, without renaming the method. In the theoretical model, we can express Point3D’s equal method more simply as:

In this, super somehow stands for the current object in the context of the parent class. Whereas an invocation self.equal(…) would call the local version, super.equal(…) is deemed to call the inherited version of the same method. It is quite difficult to understand exactly what object super refers to! By the end of this article, we hope to answer this important question. However, one consideration is that the sub-expression: super.equal(p) must eventually be shown to be equivalent to the portion of code in the long-hand version of the method:

3 RENAMING METHODS

To handle method combination in C++ and Eiffel, we must first consider how to rename methods in the theoretical model. In C++, we could either decide that the global names exist permanently as alternative, duplicate names for the same methods, or we could introduce them on demand, when we want to combine methods. In Eiffel, however, we must allow methods to be renamed locally, on demand, during inheritance. Recall that inheritance is modelled as record combination in the theoretical model [3], in which a base record is extended by combining it with a record of extra methods, to yield a derived record:

Intuitively, we now want to rename methods in the base record, then combine this with the extra record. For this, we need a renaming operator ®, and expect to use it in the following way:

in which renamings is a map from original labels to revised labels. In the same style as the union with override operator [3], we may define the renaming operator to accept an object record (a map from labels to methods) and a record of renamings (a map from labels to labels) and return a new object record in which some labels have been replaced:

The top line is a polymorphic type signature [1], saying that ® takes two maps with the individual type signatures and , and returns a map with the signature . The type is the label-type, and is the method-type in the object record. Note how the object-type of the result is unchanged, reflecting that methods have merely been renamed.

The full definition follows. This says that ® takes two argument maps, f and g (with the given types) and produces a result map (the whole expression in braces). This result is the set of all those maplets k v that satisfy the following conditions (after the vertical bar | ). For all original labels h in the domain of the base object f, if h is also listed in the domain of the renamings g, the resulting label k is equal to the translation g(h), otherwise k is equal to the original label h. The corresponding values v are always equal to f(h), as in the original object map. Recall that f(h), g(h) denote range values by appealing to the functional interpretation of maps: f(h) “applies” the map f to the domain value h, yielding the corresponding range value.

We can use this operator to rename methods of an object. Here is a simple Point2D instance:

in which we wish to rename the equal method with a longer, qualified name, old_equal:

The renaming operator ® takes, as its operands, the object and a map of renamings (see bold highlight), yielding an object in which the equal method has been renamed.

4 METHOD COMBINATION WITH RENAMING

This allows us to construct a model of inheritance with method combination supported through renaming. The following is an expression to derive aPoint3D from aPoint2D by renaming its equal method and then supplying a redefined version, which refers back to the old version:

The left-hand operand of the record combination operator ? is a renaming expression (see first bold highlight), in which the equal method is renamed before combination. The right-hand operand of ? is a record of extra methods, in which the redefined equal method invokes the renamed method old_equal in its body (see second bold highlight), thereby benefiting from the more succinct syntax offered by method combination.

We want to show that this method combination syntax is equivalent to a regular, wholesale method replacement. Simplifying the renaming expression yields a base record (see bold highlight):

which in turn may be combined with the record of extra methods (see [3]) to yield the following unusual record, in which inherited self-references are resolved to aPoint2D, while the local self is bound over aPoint3D:

The resulting object has both the renamed method old_equal and the redefined version equal. This is quite normal in languages with renaming. Note that we have deliberately given these two functions different formal argument names, p and q, in order to observe what happens to object references in the next stage. We now want to understand the precise meaning of the redefined equal in detail. To do this, we may simplify the embedded call to the old version of the method (see bold highlight above). This simplification is equivalent to replacing the call by the inlined body of the old_equal method, after substituting the argument {q/p}, viz the evaluation:

This internal simplification is performed exactly like any other function simplification; and may be thought of as an evaluation in which the call is replaced by the body after arguments have been substituted. In this case, the actual argument q (a Point3D instance) is substituted in place of the formal argument p (a Point2D variable) yielding the simplified form:

This demonstrates formally how the derived equal method does indeed compare all of the x, y and z dimensions of the Point3D instance q. However, note how the body of this method suffers from the same kind of schizophrenia that we have noted before [2, 3] when dealing with simple object types. The local binding is self aPoint2D. After the internal simplification, the combined method is comparing mixtures of 2D and 3D points for the equality of their x and y fields! For this reason, we cannot yet regard this kind of method combination as wholly equivalent to defining a replacement method wholesale.

5 FLEXIBLE METHOD COMBINATION WITH RENAMING

The problem is one of ensuring uniform self-reference in both local and inherited methods. Recall that in Eiffel, self-reference is flexible, such that inherited occurrences of self (known as current in Eiffel) are redirected to refer to the derived object. In the formal model, this requires the use of an object generator to express the object definition [3]:

This is different from a plain object in that self is a formal argument of the generator. As such, we can bind it to different values, representing the different objects that self may range over. Inheritance is modelled as an adaptation on generators [3]. We now add the renaming scheme to this model:

The critical difference is in the way self-reference is modified, through the generator application: genAPoint2D(self), yielding an adapted record in which self refers to the new object. After simplifying the renaming-expression (see bold highlights above and below), this yields:

and after record combination, this yields a result in which both equal and old_equal methods co-exist, but all self-reference is now uniform:

We now simplify the body of the equal method, by expanding inline the call to old_equal:

This yields the final result (see highlights) in which self refers uniformly to the current object. This is a satisfactory outcome, in that it meets our requirement that method combination should be provably equivalent to redefining the method wholesale. Because of its special treatment of current, Eiffel’s method combination with renaming follows this semantics.

6 THE MEANING OF SUPER

However, it may be considered inelegant for objects to keep both the old and new versions of a method. Having to rename methods is irksome and keeping both versions is redundant, especially if you only want the old version once, during method combination. For this reason, languages like Smalltalk and Java provide “one time access” to the inherited versions of methods, when redefining the same methods, through a special pseudo-variable called super:

It is clear that super must refer in some sense to the current object, somewhat like self, yet different from the point of view of method lookup. The operational explanation of super-method invocation is typically that “the search for a method starts in the immediate superclass of the class of self” [4]. Other attempts at describing super sometimes say that it is “the inherited object” or “the embedded parent-part of the current object”. In fact, the meaning of super is subtle and varies from language to language, as we shall show below.

We may construct a model of super from the operational description of method lookup. The most obvious way of invoking the “next most general” version of a method in our theory is to ensure that we select it from the “next most general” version of the object-instance with which we are dealing. In other words, if our most recently derived object is expressed as:

such that we would expect derived.method to invoke the latest version of some method, then base.method should be the expression that invokes the next most general version of the same method, skipping over any redefinition supplied in the extra record. From this analysis, it seems clear that super is equivalent to the base object record, in a record combination expression.

In the theory, this object is always supplied as the left-hand operand to the record combination operator (see bold highlights below). Recall that in record combination with simple object records (see section 6 in [3]) we have:

which indicates that this operand is in fact equivalent to an instance of the base type. This gives the meaning of super in Java, a language based on simple types and subtyping in our theory. The variable super corresponds to the embedded parent-part of the current object, self, or more exactly to the inherited parent-part before any fields are replaced during record combination. It has the type super : Point2D and behaves exactly like an instance of the base type, in that self-references in super-methods refer back to super.

On the other hand, in flexible record combination with object generators (see section 7 in [3]) we have the completely different left-hand operand:

which is an adapted record, obtained by applying the base generator to the new self reference (of the derived generator). This gives the exact meaning of super in Smalltalk, a language based on classes and subclassing in our theory. The variable super is like an adapted instance of the parent type, in which self-reference is redirected to refer to the current, derived object. It has an adapted type given by the application of a type generator super : GenPoint3D[Point2D] and behaves differently from an instance of the base type, in that self-references in super-methods refer to the whole of the current, derived object, rather than to an embedded parent instance.

7 METHOD COMBINATION WITH SUPER-REFERENCE

In order to model method combination with super-reference in our theory, we need to introduce the super variable, and bind it to a value standing for the (possibly adapted) parent instance, such that we may refer to super throughout the record combination expression, in the style:

The variable may be introduced as the formal argument of a function: super.(...) which is later bound to a suitable value. The order of variable introduction is dictated by the need for super to be bound outside the scope of record combination, but inside the scope of self:

Here, the expression super.(...) is a function, binding super, whose body is a record combination expression that contains free references to super as intended. This super-function is then applied to the value: aPoint2D. To verify that this is equivalent to regular record combination, we may simplify the super-function application internally, with the binding super aPoint2D, to obtain:

Firstly, we see that super is replaced by the desired parent object on the left-hand side of the combination operator. Secondly, we find that super.equal(…) translates exactly into an expression invoking the equal method of a parent instance, which is promising, since it clearly skips the current version. To simplify this internally, we replace the call by the inlined body of the parent’s method, after substituting the argument {q/p} as before. After record combination:

this yields a non-uniform solution similar to that in section 4 above, but without duplicate versions of the equal method. This model of method combination explains the behaviour of languages like Java, in which super always resolves to an instance of the immediate parent type.

The more flexible kind of method combination with object generators may be modelled as:

in which the super-function is bound differently with super genAPoint2D(self). Note in passing how self must be bound before we can bind super. This time, super does not denote a parent instance, but rather an adapted object, the result of applying the parent generator to self. To explore further what this means, we may simplify the super-function application internally, yielding:

From this, it appears that the super.equal(…) invocation is equivalent to invoking equal on an adapted parent object. This is quite subtle, because self-reference is redirected in this object onto the new self of 3D points. We can illustrate this more graphically by expanding super:

From this, it is clear that super.equal will select the inherited equal method body, in which self refers back to the local Point3D instance. Furthermore, super.equal(q) will produce the argument substitution {q/p}, where q is implicitly a variable of the type Point3D:

When this subexpression is replaced in the body of the local equal method, we obtain a combined equal method, which has consistent self-reference and an argument in the same type:

Method combination using super-reference is thereby proved to be equivalent to redefining the method wholesale. Note how the more subtle semantics of super is needed for this to work out fully. Cook et al. were the first to identify this formal interpretation of super in Smalltalk [5, 6], from the operational description of inheritance in that language.

8 CONCLUSION

We have constructed four different models for inheritance with method combination. Two involve the renaming of methods, in the style of C++ and Eiffel. Two involve the use of the pseudo-variable super, in the style of Java and Smalltalk. While C++ and Java have a simple model of inheritance, based on the extension of object records, Smalltalk and Eiffel have a more subtle model of inheritance, based on the extension of object generators. Hereafter in our theory, we shall refer to the simple extension model as derivation, to distinguish it from “genuine” inheritance, in which self-references are redirected to refer to more specific objects [3, 5].

Considering each approach to method combination individually, the first uses derivation and renaming. The second uses inheritance and renaming, of which Eiffel is the exemplar. The third uses derivation and super-reference, of which Java is the exemplar. The fourth uses inheritance and super-reference, of which Smalltalk is the exemplar. An even simpler model exists for C++, if we allow objects to have two names for each method, one local and one global:

and adopt the convention that only the local names are ever overridden. In this way, we can express the combined equal method for Point3D as:

which simplifies in accordance with our first model, above. The method combination strategies using inheritance were shown to be equivalent to wholesale method replacement, demonstrating again the usefulness of the theoretical model. The model also provided the pseudo-variable super with two semantic interpretations, corresponding to the meanings of this variable in Java and Smalltalk. We also provided an original renaming operator ® to account for Eiffel’s behaviour.

Footnotes

¹ Popular books on object-oriented programming sometimes confuse these notions, referring to the hierarchical relationship as “inheritance”. Strictly speaking, inheritance is just the extension mechanism. However, an inheriting class will typically be a subclass, but only by virtue of obeying the rules about classification [2].

REFERENCES

[1] A J H Simons, “The theory of classification, part 7: A class is a type family”, in Journal of Object Technology, vol. 2, no. 3, May-June 2003, pp. 13-22. http://www.jot.fm/issues/issue_2003_05/column2

[2] A J H Simons, “The theory of classification, part 8: Classification and Inheritance”, in Journal of Object Technology, vol. 2, no. 4, July-August 2003, pp. 55-64. http://www.jot.fm/issues/issue_2003_07/column4

[3] A J H Simons, “The theory of classification, part 9: Inheritance and Self-Reference”, in Journal of Object Technology, vol. 2, no. 6, November-December 2003, pp. 25-34. http://www.jot.fm/issues/issue_2003_11/column2

[4] A Goldberg and D Robson, Smalltalk 80: The Language and its Implementation, Addison-Wesley, 1983.

[5] W Cook, W Hill and P Canning, “Inheritance is not subtyping”, Proc. 17th ACM Symp. Principles of Prog. Lang., (ACM Sigplan, 1990), pp. 125-135.

[6] W Cook and J Palsberg, “A denotational semantics of inheritance and its correctness”, Proc. 4th ACM Conf. Obj.-Oriented Prog. Sys. Lang. and Appl., pub. Sigplan Notices, 24(10), (ACM Sigplan, 1989), pp. 433-443.

About the author

Anthony Simons is a Senior Lecturer and Director of Teaching in the Department of Computer Science, University of Sheffield, where he leads object-oriented research in verification and testing, type theory and language design, development methods and precise notations. He can be reached at a.simons@dcs.shef.ac.uk.

Cite this column as follows: Anthony J.H. Simons: “The Theory of Classification, Part 10: Method Combination and Super-Reference”, in Journal of Object Technology, vol. 3, no. 1, January-February 2004, pp. 43-53. http://www.jot.fm/issues/issue_2004_01/column4

Previous column

Next column