1 INTRODUCTION This is the second article in a regular series on object-oriented type theory, aimed specifically at non-theoreticians. Eventually, we aim to explain the behaviour of languages such as Smalltalk, C++, Eiffel and Java in a consistent framework, modelling features such as classes, inheritance, polymorphism, message passing, method combination and templates or generic parameters. Along the way, we shall look at some important theoretical approaches, such as subtyping, F-bounds, matching and the object calculus. Our goal is to find a Figure 1: Dimensions of Type Checking The first article [1] introduced the notion of type, ranging from the programmer's concrete perspective to the mathematician's abstract perspective, pointing out the benefits of abstraction and precision. From these, let us choose three levels of type-checking to consider: representation-checking (bit-schemas), interface-checking (signatures) and behaviour-checking (algebras). Component compatibility was judged according to whether exact type correspondence, simple subtyping or the more ambitious subclassing was expected. Combining these two dimensions, there are up to nine combinations to consider, as illustrated in figure 1. However, we shall be mostly interested in the darker shaded areas. In this second article, we shall build a typechecker that can determine the exact syntactic type (box 2, in figure 1) of expressions involving objects encoded as simple records. 2 THE UNIVERSE OF VALUES AND TYPES Rather like scratch-building a model sailing ship out of matchsticks, all mathematical model-building approaches start from first principles. To help get off the ground, most make some basic assumptions about the universe of values. Primitive sets, such as
3 RULES FOR PAIRS An immediately useful construction which we do not yet have is the notion of a pair of values, , possibly taken from different types N and M. The type of pairs is known as a This rule is expressed in the usual style of For pair constructions to be useful, we need to know how to access the elements of a pair, and determine their types. We define the "If e is a pair of the product type N x M, then the first projection has the type N, and the second projection has the type M." Note that, in this rule, e is presented as a single expression-variable, but we know it stands for a pair from its type N x M. In both rules, the horizontal line has the force of an implication, which we could also write using 4 RULES FOR FUNCTIONS Consider an infinite set of pairs: . This set is an enumeration of a relationship between Natural numbers and Boolean values - it is in fact one possible representation of the function "If variable x has the type D and, as a consequence, expression e has the type C, we may conclude that a function of x with body e has the function type D C." This rule introduces the -syntax for functions and the arrow-syntax for function types. If you happen to be a hellenophobic The function elimination rule explains the type of an expression involving a function application. In so doing, it also defines the parenthesis syntax for function application: "If f is a function from D C, and v is a value of type D, then the result of the function application f(v) is of type C". This rule also expresses the notion of type-sound function application: it allows f to be applied Do we need rules for multi-argument functions? Not really, because we already have the separate product rules. The domain D in the function rules could correspond, if we so wished, to a type that was a product: D N x M. In this case, the argument value would in fact be a pair v : N x M. We assume that any single type variable in one rule can be matched against a constructed type in another rule, if we so desire. 5 RULES FOR RECORDS Most model encodings for objects [4, 5, 6] treat them as some kind of record with a number of labelled fields, each storing a differently-typed value. So far, we do not have a construction for records in our model. However, consider that a record is rather like a finite set of pairs, relating "If there are n distinct labels , and n values e The corresponding "If e has the type of a record, with n labels , mapping to types T 6 APPLYING THE RULES We have a set of rules for constructing pairs, functions and records. With this, we can model simple objects as records. Ignoring the issue of encapsulation for the moment, a simple Cartesian point object may be modelled as a record whose field labels map to simple values (attributes) and to functions (methods). We require a function for constructing points:
This is a type declaration, stating that names the argument expression The rules permit us to deduce that objects can be constructed using
Is p1.x meaningful, and does it have a type? The record elimination rule says this is so, provided p1 is an instance of a suitable record type. Working backwards, p1 must be a record with at least the type: {... x : X ... } for some type X. Working forwards, p1 was constructed using
Is p1.equal(p2) meaningful, and does it have a type? Again, by working backwards through the record elimination rule, we infer that p1 must have at least the type {... equal : Y ...} for some type Y. Working forwards, we see that p1 has a field equal : Point Boolean, so by matching up these, we know Y Point Boolean. So, the result of selecting p1.equal is a function expecting another Point. Let us refer to this function as f. In the rest of the expression, f is applied to p2, but is this type correct? Working forwards, p2 was defined using 7 CONCLUSION We constructed a mathematical model for simple objects from first principles, in order to show how it is possible to motivate the existence of something as relatively sophisticated as an object with a (constant) state and methods, using only the most primitive elements of set theory and Boolean logic as a starting point. The type rules presented were of two kinds: introduction rules describe how more complex constructions, such as functions and records, are formed and under what conditions they are well-typed; elimination rules describe how these constructions may be decomposed into their simpler elements, and what types these parts have. Both kinds of rule were used in a typechecker, which was able to determine the syntactic correctness of method invocations. The formal style of reasoning, chaining both forwards and backwards through the ruleset, was illustrated. The simple model still has a number of drawbacks: there is no updating or encapsulation of state; there are problems looming to do with recursive definitions; and we ignored the context (scope) in which the definitions take effect. In the next article, we shall examine some different object encodings that address some of these issues. Footnote^{1}Hellenophobe: a hater of Greek symbolsREFERENCES [1] A J H Simons, [2] A J H Simons, [3] L Cardelli and P Wegner, [4] J C Reynolds, [5] W Cook, [6] M Abadi and L Cardelli.
Cite this column as follows: Anthony J. H. Simons: "The Theory of Classification, Part 2: The Scratch-Built Typechecker", in |