1 INTRODUCTION This is the third article in a regular series on object-oriented type theory, aimed specifically at non-theoreticians. Eventually, we aim to explain the behaviour of languages such as Smalltalk, C++, Eiffel and Java in a consistent framework, modelling features such as classes, inheritance, polymorphism, message passing, method combination and templates or generic parameters. This will be the "Theory of Classification" of the series title. Along the way, we shall look at some important theoretical approaches, such as subtyping, F-bounds, matching and, in this article, the primitive object calculus and the fixpoint theorem for recursion. The first article [1] introduced the notion of type from both the practical and mathematical points of view and the second article [2] introduced some examples of type rules for constructing and checking simple expressions. Using a starter-kit containing only set theory and boolean logic, we built models for pairs and functions, eventually encoding objects as records, a kind of finite function mapping from labels to values. However, this is only one of three fundamentally different approaches to encoding objects in the primitive model [3, 4, 5]. The first two are based on set theory and the -calculus [5], the calculus of primitive objects. In this article, we investigate the benefits and disadvantages of different object encodings. 2 EXISTENTIAL OBJECT ENCODING The first encoding style is based on data abstraction [3, 4]. It represents an object as an explicit pair of state and methods (rules for constructing pairs were given in the previous article [2]). In this approach, a simple Cartesian Point type is defined as follows: This definition has the sense of "let there be some representation type An instance of a Point type may be defined with a particular concrete representation (here, we assume that
Existential encoding models the hiding of state, rather like the use of One way would be to define a special method invocation operator "" to hide the ungainly syntax, such that the expression: However, this has several drawbacks. Firstly, separate versions of "" would have to accept objects, messages and arguments of all types, requiring a much more complicated higher-order type system to express well-typed messages. 3 FUNCTIONAL OBJECT ENCODING For this reason, we prefer the second encoding, in which objects are represented as defines inc inside the scope of y. The function Using this encoding, objects can be modelled directly as functions. This may sound strange, but recall how a record is really a finite set of label-to-value mappings, while a function is a general set of value-to-value mappings [2]. Records are clearly a subset of functions. In this view, any object is a function: (a : A).e, where the argument a : A is a label and the function body e is a multibranch if-statement, returning different values for different labels. We can model method invocation directly as function application, for example if we have Point p, then p.x in the program may be interpreted as: p(x) in the calculus. In an untyped universe, untyped functions are sufficient to model objects. However, in a typed universe, records are subtly different from functions, in that each field may hold a value of a different type. For this reason, we use a special syntax for records and record selection [2], which allows us to determine the types of particular fields. In this approach, a simple Cartesian Point type is defined as follows: This definition has the sense of "let An instance of this Point type may be defined as follows: in which xv and yv are state variables in whose scope aPoint is defined. The constructor function make-point from the previous article [2] serves exactly the same purpose as the In this encoding, method invocation has a direct interpretation. In the program, we may have a Point p and invoke p.x; the model uses exactly the same syntax and furthermore, we can determine the types of selection expressions using the dot "." operator from the record elimination rule [2]. Note how, in this encoding, the functions representing methods have one fewer argument each. This is because we no longer have to supply the 4 RECURSION EVERYWHERE Objects are naturally recursive things. The methods of an object frequently invoke other methods in the same object. To model this effectively, we need to keep a handle on This declares An As programmers, we take recursion for granted. However, it is a considerable problem from a theoretical point of view. So far, we have not demonstrated that recursion exists in the model, nor have we constructed it from first principles. Consider that the so-called "definition" of a recursive Point type in the (deliberately faulty) style: is not actually a 5 THE FIXPOINT THEOREM In high school algebra, the trick is to isolate the variable x: the above becomes: x Note that GenPoint is not recursive. GenPoint is a type function - it accepts one type argument, pnt, and returns a record type, in which pnt is bound to the supplied argument. We can think of GenPoint as a type generator (hence the name). We may apply GenPoint to any type we like, and so construct a record type that looks something like a Point. However to obtain exactly the Point record type we desire, we must substitute Point/pnt: which is fine, except that it doesn't solve the recursion problem. All we have managed to do is rephrase it as: Point = GenPoint[Point], with Point still on both sides of the equation. This is nonetheless interesting, in that Point is Point Point
which, finally, is a non-recursive definition of Point. Point is called the and, for readers prepared to attempt the following exercise, you can show that:
6 THE OBJECT CALCULUS The third and most radical encoding changes the underlying calculus on which the model is based. To appreciate this contrast, we must understand something of the -
and function application (known as -reduction):
These notions are familiar to anyone who has ever programmed in a language with functions. The *-reduction rule has the sense: "a function of x with body e, when applied to a value v, is simplified to yield a result, which is the body e in which all occurrences of the variable x have been replaced by v". As programmers, we like to think in terms of passing actual argument v to formal argument x and then evaluating body e. From the point of view of the calculus, this is simply a mechanical substitution, written e{v/x} and meaning "v substituted for x in e"; and "evaluation" simply corresponds to further symbolic simplification. Abadi and Cardelli's theory of primitive objects [5] introduced the -calculus in which the fundamental operations are the construction of objects, the invocation of methods, and the replacement of methods (useful for explaining field updates and overriding):
Primitive operators include brackets [], the sigma-binder -calculus, in that it automatically binds the argument x to the object from which the method is selected. In the expression: o.m, the value of m is selected
in which all methods bind the We cannot dispense with recursion altogether, for the Point type requires another Point as the argument of the equal method. The Point type is defined as:
where -calculus. 7 CONCLUSION We have compared three formal encodings for objects and their types. The existential encoding avoided recursion but suffered from an ungainly method invocation syntax. The functional encoding was more direct, but used recursion everywhere. The primitive object encoding avoided recursion for self-invocation but needed it elsewhere. Choosing any of these encoding schemes is largely a matter of personal taste. In later articles, we shall use the functional closure encoding, partly because it has few initial primitives and reflects the syntax of object-oriented languages directly, but also because the notion of generators and fixpoints later proves crucial to understanding the distinct notions of
REFERENCES [1] A J H Simons, [2] A J H Simons, [3] J C Reynolds, [4] W Cook, [5] M Abadi and L Cardelli. [6] A Church, [7] L Cardelli and P Wegner,
Cite this column as follows: Anthony J. H. Simons: "The Theory of Classification, Part 3: Object Encodings and Recursion", in |