The Theory of Classification

Part 14: Modification and Objects like Myself

Anthony J H Simons, Department of Computer Science, University of Sheffield, U.K.

COLUMN

1 INTRODUCTION

This is the fourteenth article in a regular series on object-oriented theory for non-specialists. Previous articles have built up models of objects [1], types [2] and classes [3] in the -calculus. Inheritance has been shown to extend both types [4] and implementations [5, 6], in the contrasting styles found in the two main families of object-oriented languages. One group is based on (first-order) types and subtyping, and includes Java and C++, whereas the other is based on (second-order) classes and subclassing, and includes Smalltalk and Eiffel. The most recent article demonstrated how generic types (templates) and generic classes can be added to the Theory of Classification [7], using various Stack and List container-types as examples.

The last article concentrated just on the typeful aspects of generic classes, avoiding the tricky issue of their implementations, even though previous articles have demonstrated how to model both types and implementations separately [4, 5] and in combination [6]. This is because many of the operations of a Stack or a List return modified objects “like themselves”; and it turns out that this is one of the more difficult things to model in the Theory of Classification. The attentive reader will have noticed that we have so far avoided the whole issue of object modification. This is, after all, quite an important area to consider, since one of the main benefits of object-oriented programming is to encapsulate state and handle state updates in a clean fashion. In the current article, we consider the whole area of environment modelling and the creation of modified objects. Eventually, this leads to an extension to the theory to handle constructor-methods, through the use of which an object can create another object “like itself”.

2 THE GLOBAL ENVIRONMENT

The whole idea of object modification, modelled as updates to the values stored in attribute variables, is problematic in a functional calculus like the -calculus. This is because pure functional languages do not support the notion of reassignment to variables. This would be a side-effect, and pure functional languages are intentionally free of side-effects, a property known as referential transparency. However, the effect of assignment may be approximated using a global set of variable bindings, called the environment, which is passed from function to function as the program is executed. The environment is an associative map from variable names to their bound values. For example, the following map is an environment which contains two variables p1, p2 which map to records representing simple coordinates:

During program execution, we may want certain statements to update the global environment. Modifications cannot literally change the state of the environment, since we are working in a pure functional language without side-effects; instead, they construct new environments in which appropriate changes have been made. The environment must therefore be passed in and out of each function, since assignments may occur at any point and their effect must be recorded. Every function accepts the environment as an extra first argument. Likewise, every function returns a packaged result, which is a pair of the environment and the function’s usual return value. The caller of a function must unpack the returned result to determine the state of the environment, as well as accessing the ordinary return value.

Initially, the environment is empty. As variables are declared and initialised, these are added to the environment using a function like env-add:

which takes an environment, a variable name and a value to bind to this variable in the environment. The function returns the new environment, in which the old environment env is combined with a maplet from the variable var to the value val. Since the function override operator is used [5], this will ensure that {varval} is added to the bindings in the environment, replacing any existing binding for var. This is useful to model both variable declaration with initialisation, when a variable is first added to the environment, and variable reassignment, when the value associated with a variable is replaced.

At any moment, the environment contains the most recently-bound version of each of the variables. In the body of a function, access to any global variable is modelled by looking up the value stored in the environment. Since the environment is basically a map (which is the same thing as a finite function [1]) this can be done by applying the environment, like a function, to the labels used as variable names. For example, the following expression looks up the value of p1 in globalEnv:

To change the coordinate associated with variable p1, we may execute the expression:

and this rebinds the value of the variable p1, returning a new environment in which p1 maps to a different coordinate instance. This models the notion of reassignment.

3 LOCAL AND GLOBAL UPDATES

Things are made slightly more complicated if we want both global and local variable bindings. If all functions merely had local variables, upon function exit the environment could revert to the environment that was originally passed to the function. This would ensure that the bindings set up on entry to the function were forgotten and any older bindings for the same variables were restored. However, we want the global bindings to persist between each function call. The environment passed to a function may possibly be modified and should therefore be handed back as part of the result.

One possible approach is to try to distinguish between the global environment, and local variables, which are bound on entry to functions in the usual way. The problem with this approach is that a local variable may shadow the name of a global variable in the environment. When such a variable is updated, we would expect the local copy to be modified, since the global variable would be hidden. Upon exit from this scope, the global binding would be restored. However, it is hard to imagine how we could integrate primitive -calculus binding with looking up variables in a constructed environment. In any case, most state variables are introduced as local variables within the scope of some object or function, so it is hard to distinguish between the two kinds of variables in practice.

Another approach is to use a multimap for the environment, that is, a kind of map with duplicated keys, rather like the association list provided in Common Lisp. Whenever a scope is entered and a new variable binding is added, it is inserted ahead of any existing bindings for the same variable. Whenever a value is looked up, the lookup function returns the first bound value it finds, which hides any other older bindings found later in the list. Whenever a scope is exited, all local variables are descoped by explicitly removing the first binding found for each variable, so restoring any older bindings. Whenever a variable is reassigned, the first occurrence of the variable is rebound to the new value; and this works whether the variable is global or local. To do this, we need a family of functions to manipulate the environment:

which behave as described above. The implementations of these functions would use primitive list operations, such as cons, head and tail to search through the lists representing the multimaps. We defer a fuller treatment of this until a later article.

4 MODELLING UPDATES AS NEW OBJECTS

For the moment, the simplest approach is to assume that objects in the theoretical model are “pure functional objects”, that is, all modifications to object state do not literally modify the state of the object, rather they create and return a new object in which the changes are manifest. There is no fundamental reason why an imperative language cannot be approximated by a functional calculus in this way. The only difference is that all state-modifying methods, which are typically void-methods in a concrete language, now have to return a new instance of their owning type. Sequences of modifications have to be modelled as nested method invocations. As an example, consider the following Point type:

which, in addition to the usual x, y and equal methods, has a move method to update its position. This method is typed to return another Point object, reflecting the fact that the modified position is in fact a newly-created instance of Point. Issuing a sequence of move instructions to a Point object p could be represented by the following nested method invocations:

since each move returns a new Point, which becomes the receiver of the subsequent move message. Although, in the model, the final Point instance at (6, 7) is a distinct object from the original p : Point, we can still reason about such sequences of update operations. But there is a catch: while the idea sounds straightforward in principle, it turns out that implementing the move method in practice is quite difficult to accomplish in the theoretical model. So far, none of our object types has had the ability to create new instances “like itself”. As we shall see below, this requires yet another level of recursion, in which objects contain their own object constructors.

5 CONSTRUCTORS FOR OBJECTS

In an earlier article [6], we established the basic strategy for constructing new objects. It involved first constructing the object type, then the object implementation, from generators. The basic Point record type (without the move method) must be constructed from a generator, because it refers to itself recursively:

and the recursion variable is later bound to the record type by taking the fixpoint using Y.

The basic point record instance must also be constructed from a generator, because it refers to itself recursively. A typed object generator is used, so that we can attach types to the bound variables. This is the generator for a specific point instance, at the location (2, 3):

The recursive point instance is then constructed by supplying a type Point as the first type-argument, and then taking the fixpoint using Y, to bind the recursion variable self over the rest of the record:

While this works well for examples of specific points, we wanted to allow the creation of points that were initialised to different coordinates. To do this, we extended the generator to accept an extra initialisation argument, a pair of Integers [6]:

And from this, we could define a simple object constructor, makePoint, which uses the type generator GenPoint and the extended object generator initPoint internally to establish the recursive Point type, and the recursive point instance, respectively:

An example of creating a point instance at a different location is given by:

The main thing to notice about all this is that object generators like initPoint contain within them the structure of the instance that they create. The initPoint function accepts some arguments (a type argument, then a pair of Integers, then the value for self) and returns, as the body, the structure of the point instance. The generator must logically exist prior to the creation of any object instance. It is therefore hard to imagine how we might create an object instance that contains its own generator! This is rather like trying to pull yourself up by your own shoelaces.

6 CREATING OBJECTS LIKE MYSELF

However, as we anticipated in section 4 above, every time the move method is invoked, we require a Point object to create a new instance like itself, except that the x and y coordinates will take on different values. This is exactly like requiring an object to have its own constructor as one of its methods. To see this, we shall change the definition of the Point-type, so that it now also has a move method. Defining the type is straightforward:

However, the implementation is less straightforward. If we could assume that an object constructor makePoint already existed, we could provide the extended generator for moveable points with extra initialisation arguments as the following:

Here, the implementation of the move method has been added, in which the body simply calls makePoint with the same pair of Integer arguments that were given to the move method. This should in principle create and return a new Point instance at the desired location.

However, the above is not yet a legal definition. The reason for this is that makePoint must use initPoint internally to construct the new object instance. As a result, the above implies a recursive definition of initPoint. To see the recursion more clearly, we can replace the occurrence of makePoint by its expansion into generators (this comes from the body of the definition of makePoint in section 5 above):

From the bold highlight, it is clear that initPoint occurs both on the left and right-hand sides of this “definition”. As readers of this series will appreciate, a recursive definition is not a proper definition in the -calculus [1], but merely an equation that must be solved for some value of initPoint.

7 CONSTRUCTOR-LEVEL RECURSION

It is interesting to note that the model now requires recursion on three different levels:

object-level recursion, because objects frequently need to refer internally to self, so that methods can call other methods of the same object;
type-level recursion, because object types frequently define methods that accept and return objects of the same type as themselves;
constructor-level recursion, because objects frequently have methods that construct and return other objects like themselves.

Constructor-level recursion was first identified by Cook and others [8]. In the cited paper, they referred to this initially as “class-level” recursion. Later, this was changed to “constructor-level” recursion, to better reflect the facts [9].

The technique for solving constructor-level recursion is the same one we have used for solving recursive definitions before [1], in which we abstract at the point of recursion and introduce a new variable standing for the recursively-defined thing, here the object constructor for Points. At first, it is tempting to think that we need to introduce a recursion variable standing for the whole of initPoint. However, the type-argument of this function doesn’t enter into the constructor-recursion: we know in advance that we are always going to create things of type , where is eventually bound to Point. The object constructor function is something that takes a pair of Integers and returns genPoint : , a simple object generator for building recursive Point instances after their fields have been initialised. We shall therefore introduce a constructor recursion variable ctor, with the type:

standing for the Point constructor. We will introduce this recursion variable after the type argument (so that will be bound) but before the other arguments from initPoint, because we need to fix the constructor-level recursion before we accept Integer arguments and fix the object-level recursion. The result is a typed generator for an object constructor, which we shall call genInitPoint and which has the type signature:

This looks a little daunting, but essentially it is similar to the type signature for initPoint, with an extra type argument in the signature, giving the type of the recursion variable ctor, standing for the constructor. The full definition of genInitPoint is given by:

In this, ctor is introduced as the extra recursion variable standing for the Point constructor. This allows the use of ctor in the body of the move method. The application of the fixpoint finder Y is essentially to bind the object-level recursion inside instances created by ctor. We shall return to this below. Note that genInitPoint is now properly defined, without having to refer recursively to itself.

8 OBJECTS WITH CONSTRUCTOR METHODS

Point instances, which now have their own constructor method move, are created from this generator in several stages. First, we supply the desired type argument Point:

This yields a typed function in which {Point/} has been substituted throughout. The first argument to this function is the recursion variable ctor. We want to bind ctor recursively over the rest of the body, using the fixpoint finder Y. But first, let us consider the type signature of the above function (after step 1), to see whether taking the fixpoint is a legitimate operation. It has the signature:

in other words, it takes a first argument of the ctor constructor-type and then returns something with exactly the same type signature (if we ignore the bracketing of the remaining types). This is useful, because it satisfies the conditions for a generator. Generators must always have the form: gen:, since, when they are applied to some argument, they must return that argument unchanged [1]. So, the step-1 result is indeed a generator-for-a-constructor, which we can now fix in step 2:

This yields a Point-constructor function beginning (a, b : Integer x Integer) ... and in whose body ctor is recursively fixed to refer to this, the same Point-constructor function. We denote this fact by prefixing the function with ctor, according to convention [1]. Below, we must remember that ctor now refers to this function, the step-2 result. In the next step, we supply the desired coordinate position for a particular point instance:

This yields a function in which {2/a, 3/b} have been substituted, thereby supplying the coordinate position (2, 3) for the first point instance. The step-3 result is a function beginning (self : Point)…, in other words, a simple object generator, whose object-level recursion we can fix in the usual way using Y:

This is now a point instance, in which self refers recursively to this instance, and ctor refers back to the constructor which built this instance! Note how the formula for creating an object that contains its own constructor requires an additional fixpoint operation. The above formula could be rewritten to show all three fixpoints, including the type-level fixpoint:

In this, the innermost fixpoint: [Y GenPoint] fixes the type-level recursion, yielding the recursive Point type. The next outermost fixpoint: Y (genInitPoint[YGenPoint]) fixes the constructor-level recursion, yielding the recursive ctor constructor. The outermost fixpoint: Y (Y (genInitPoint[YGenPoint]) (2, 3)) fixes the object-level recursion, after the constructor ctor has been applied to some initialisation arguments (2, 3), yielding the recursive point instance.

9 UPDATING A POINT

Let us call this initial Point instance p1, and construct it using the formula:

This object has a move method, which contains a recursive reference to the object constructor ctor that was used in the building of p1. We would like to see the effect of invoking the move method on p1, to see what kind of object this returns. We should like it to return a new Point instance at a different location, so we shall call this instance p2:

At this stage, we have to refer back to the definition of ctor to understand how to simplify this any further. We obtain this definition formally by unrolling the value of the variable ctor, which was recursively bound at the end of step 2, above. The following sidebar shows this:

So we can now continue the main simplification of p1.move(4,5) with the substitution:

This is the final result, showing that p2 is another Point instance at the coordinates (4, 5) and with its own copy of the constructor ctor embedded inside its move method. So, we have shown that it is possible to create objects with their own constructor-methods embedded inside them. However, it was theoretically dense and required three different levels of fixpoints.

10 CONCLUSION

We started this article with a discussion of how to model object state updates in the Theory of Classification. In -calculus, variable reassignment is prohibited, but the same effect may be approximated by extending all functions to accept and return an environment argument, which is some kind of map storing the current variable bindings. The machinery for binding and unbinding variables is quite complex: eventually, we must use a multimap and handle all binding, unbinding and lookup explicitly. Furthermore, we need implementations of List-methods like cons, head and tail to manipulate the multimaps, which are essentially association lists with duplicated keys.

An alternative approach is to model state updates as the creation of new objects. A sequence of updates is modelled as a nested series of method invocations. However this requires a new level of sophistication in the model, in which objects contain their own constructors. The major part of this article was devoted to explaining constructor-level recursion. This is the third kind of recursion, after object-level and type-level recursion. With this facility, we were able to provide Point objects with a method: move : Integer x Integer Point, which returns a new Point instance. This same facility would be required to define the List-methods cons and tail, which both return new List instances. So, constructor-level recursion is a generally useful feature, essential for the definition of more complex kinds of datatype. With this facility, we may now provide implementations for the container-classes like List and Stack, which had been deferred in the previous article [7].

REFERENCES

[1] A J H Simons: “The Theory of Classification, Part 3: Object Encodings and Recursion”, in Journal of Object Technology, vol. 1, no. 4, September-October 2002, pp. 49-57. http://www.jot.fm/issues/issue_2002_09/column4

[2] A J H Simons: “The Theory of Classification, Part 4: Object Types and Subtyping”, in Journal of Object Technology, vol. 1, no. 5, November-December 2002, pp. 27-35. http://www.jot.fm/issues/issue_2002_11/column2

[3] A J H Simons: “The Theory of Classification, Part 7: A Class is a Type Family”, in Journal of Object Technology, vol. 2, no. 3, May-June 2003, pp. 13-22. http://www.jot.fm/issues/issue_2003_05/column2

[4] A J H Simons: “The Theory of Classification, Part 8: Classification and Inheritance”, in Journal of Object Technology, vol. 2, no. 4, July-August 2003, pp. 55-64. http://www.jot.fm/issues/issue_2003_07/column4

[5] A J H Simons: “The Theory of Classification, Part 9: Inheritance and Self-Reference”, in Journal of Object Technology, vol. 2, no. 6, November-December 2003, pp. 25-34. http://www.jot.fm/issues/issue_2003_11/column2

[6] A J H Simons: “The Theory of Classification, Part 12: Building the Class Hierarchy”, in Journal of Object Technology, vol. 3, no. 5, May-June 2004, pp. 13-24. http://www.jot.fm/issues/issue_2004_05/column2

[7] A J H Simons: “The Theory of Classification, Part 13: Template Classes and Genericity”, in Journal of Object Technology, vol. 3, no. 7, July-August 2004, pp. 15-25. http://www.jot.fm/issues/issue_2004_07/column2

[8] W Cook, W Hill and P Canning: “Inheritance is not Subtyping”, Proc. 17th ACM Symp. Principles of Prog. Lang., (ACM Sigplan, 1990), pp. 125-135.

[9] W Harris, Typed Object-Oriented Programming: ABEL Project Posthumous Report, Hewlett-Packard Laboratories (1991).

About the author

Anthony Simons is a Senior Lecturer and Director of Teaching in the Department of Computer Science, University of Sheffield, where he leads object-oriented research in verification and testing, type theory and language design, development methods and precise notations. He can be reached at a.simons@dcs.shef.ac.uk.

Cite this column as follows: Anthony J.H. Simons: “The Theory of Classification, Part 13: Template Classes and Genericity”, in Journal of Object Technology, vol. 3, no. 8, September-October 2004, pp. 15-26. http://www.jot.fm/issues/issue_2004_09/column2

Previous column

Next column