Inspector Methods for State Abstraction

Bart Jacobs, Katholieke Universiteit Leuven, Belgium
Frank Piessens, Katholieke Universiteit Leuven, Belgium

REFEREED
ARTICLE

Abstract

Most classes in an object-oriented program provide access to an object's state through methods, so that client code does not depend on and cannot interfere with the object's internal representation composed of felds and internal component objects. In order to extend the benefts of methods to specifcations, method contracts may themselves be expressed using methods, hence providing support for state abstraction in specifcations.

In this paper, we propose an approach to the verifcation of programs that use inspector methods in method contracts and object invariants. Inspector methods must not have side-effects and must not throw exceptions, but they may have parameters, and they may depend on the state of objects passed as arguments. Our approach builds on the Boogie methodology for object invariants and ownership.

Performing state abstraction in a programming language that allows aliasing through object references poses a framing problem. Specifcally, client code needs to be able to tell whether modifying a given object or calling a given method may affect the value of a given inspector method call. We solve this by modeling inspector methods as functions that take as arguments only those parts of the heap on which they depend. Thanks to a novel logical encoding of the heap, we can do this without breaking information hiding, even in cases where inspector methods depend on internal component objects.

The core of our approach has been implemented in a custom build of the Spec# program verifer.

1 INTRODUCTION

Consider the program in Figure 1. Class Cell provides access to the state of a Cell object using method getX. It also uses getX to specify the effect of the class's constructor and of the setX method. This makes it possible to prove the correctness of the client program using a proof that does not depend on the internal representation of the Cell object's state using feld x. As a result, when class Cell's internal representation is changed, only class Cell needs to be reverifed. The client program's proof remains valid.

In this paper, we concern ourselves with how to prove the correctness of programs such as the one in Figure 1.

Note that one of the key problems that needs to be solved by our verifcation approach is the framing problem. Specifcally, in order to prove the assertion in Figure 1 that c2.getX() = 5, it must be encoded in the verifcation logic that c2.getX () does not depend on any objects modfed by method call c1.setX (10).

Figure 1: A class specified using an inspector method, and a client program

The remainder of the paper is structured as follows. We introduce our approach in five versions, each extending the previous one. In Section 2, we address the framing problem in its basic form (Version 1). In Section 3, we introduce the Boogie methodology [1] for object invariants (Version 2) and ownership, and we show how to solve the framing problem for inspector methods that may depend on owned objects (Version 3). In Section 4, we show how we allow inspector methods to depend on the state of objects passed as arguments (Version 4). In Section 5, we deal with
a number of formal details. In Section 6, we describe our approach to inheritance (Version 5). In the final sections, we discuss related work and offer a conclusion.

2 FRAMING

In this section, we show how our approach addresses the framing problem in the simple setting where an inspector method may depend only on the fields of the receiver object. In later sections, we extend the approach to allow inspector methods to depend on fields of transitively owned objects and objects passed as arguments.

Our solution consists of three components: the way the heap is modeled, the way inspector method calls are modeled, and the way non-inspector method calls (which may modify the program state) are modeled.

In our verification logic, we represent the heap as an updatable function Heap that maps object references to object states. For the simple setting we discuss here, an object state is a function that maps field names to field values. Later sections extend the notion of object state to deal with transitively owned objects.

We treat inspector methods as follows. For each inspector method C.m declared in the program, we introduce a function symbol C_m in the verification logic, as well as an axiom, derived from the inspector method's body, that defines the function symbol's meaning. All information on which the inspector method depends is passed as arguments to the corresponding function. Hence, in the simple setting we discuss here, we pass only the receiver's reference and state. For example, the assertion

is modeled as

Notice how the function application takes just the state of c2, not Heap as a whole, as an argument. This is sound thanks to the fact that the inspector method is allowed to depend only on the fields of the receiver. It also takes the receiver's identity c2 as an argument since we do not disallow inspector methods from depending on the receiver's identity.

The final ingredient to our approach is the way non-inspector method calls are modeled in our verification logic. The post-state heap of a non-inspector method call is assumed to be an arbitrary new heap, constrained only by the method's declared postconditions and by an implicit postcondition called the frame condition. The frame condition says that for all objects o that were allocated in the pre-state and that are not listed by the method's modifies clause, the object's post-state is equal to its pre-state. Formally:

where W denotes the set of objects listed by the modifies clause. (We allow only items of the form o.*; in modifies clauses. Also, note that a constructor's modifies clause is always considered to implicitly include this.*.)

We can now prove the program of Figure 1. Let Heap and Heap' denote the heap in the pre-state and the post-state, respectively, of the setX call. We need to prove Cell getX(c2, Heap'[c2]) = 5. From the postcondition of Cell's constructor, we have Cel_getX(c2, Heap[c2]) = 5. Finally, from the frame condition of the setX call, we have Heap'[c2] = Heap[c2], which allows us to arrive at our goal by substitution.

3 OBJECT INVARIANTS AND OWNERSHIP

In this section, we integrate inspector methods with the support for object invariants and ownership provided by the Boogie methodology [1]. The example in Figure 2 motivates and illustrates the approach. (Note: the expression (a : b) denotes the range of integers i where a ≤ i < b.)

An object of class IntList in Figure 2 represents a container for a list of integers. Internally, the integers are stored in an array elems. The array may be larger than the length of the list to minimize the number of heap allocations when adding or removing elements. The actual number of elements is stored in the count field.

Object Invariants

Methods that operate on an IntList object, such as the add method, need to know that count is never negative and never greater than the length of elems. It would be unfortunate to require the method's caller to guarantee this; this would cause the caller to depend on the internals of class IntList. To solve this problem, the Boogie methodology provides a mechanism called object invariants that allows a developer to expose conditions on internal state to clients in an abstracted form. Specifically, it allows a developer to declare an object invariant using the invariant keyword, and in each object, it introduces a special boolean field inv and it restricts modifications of this field and the object's other fields in such a way that inv is true only when the object invariant holds. Consequently, by exposing to clients only the inv field and requiring inv to be true on entry to a method, the method can rely on the internal object invariant without having to expose it. Note that methods are not allowed to assume without proof that object invariants hold on entry to the method; this would be unsound because of possible re-entrancy [1].

An object o for which o.inv is false is called mutable; if o.inv is true, the object is called valid. An assignment to a field o.f is allowed only when o is mutable. Field inv is initially false. It may be read only in method contracts, not in program code. Also, it may be updated only through the special statements pack o; and unpack o;. These statements behave as follows (where Inv(o) denotes o's object invariant):

That is, the pack o; operation checks that o is mutable, and that o's object invariant holds. It then marks the object as valid. The unpack o; operation checks that o is valid. It then marks the object as mutable. Provided that an object o's object invariant depends only on the fields of o, the semantics of pack and unpack together with the restriction that target objects of fi eld assignments must be mutable guarantee that whenever the object is valid (i.e. o.inv is true), its object invariant holds. We call this property the soundness of the object invariant methodology.

Figure 2: A class that illustrates object invariants, ownership, parameterized inspectormethods, inspector method preconditions, and derived invariants. (Note: reference types are non-null types by default.)

Inspector methods and object invariants Non-inspector methods that rely on an object's invariant need to require the object's validity as a precondition. Whereas for non-inspector methods we provide the option of either requiring validity of a given object or not requiring it, for inspector methods we always require validity of the receiver object. That is, each inspector method implicitly gets a precondition saying that the receiver object is valid. We made this choice because supporting inspector methods that do not require the validity of their receivers would complicate the approach, and scenarios where the abstraction provided by inspector methods is required but the abstraction provided by object invariants is not are probably rare. Note that this means that inspector method calls on this typically cannot be used in intermediate assertions (such as assert statements or loop invariants) in a method body, since the receiver is typically unpacked at the start of the method body and packed at the end. However, typically no other objects are unpacked so inspector method calls on other objects can still be used in these places.

Since an object invariant could be evaluated in a state where it does not hold, we do not allow inspector method calls on this in an object invariant. However, in addition to an object invariant, a class may declare a derived invariant, using one or more derived_invariant declarations. If a class declares a derived invariant, this implies a proof obligation that the derived invariant follows from the object invariant. Contrary to an object invariant, a derived invariant may include inspector method calls on this. If a class mentions private fields in its object invariant, it must declare the object invariant itself private, and as a result, other classes cannot use it in proofs. Still, the class can expose information to other classes in the form of public derived invariants, provided that the information is stated in terms of public inspector methods.

We could have allowed postconditions on inspector methods instead of, or in addition to, derived invariants. This would be equivalent in terms of expressiveness. However, since such postconditions would in general have to mention other inspector methods, to express relationships between inspector methods, it seems more natural to centralize this information at the class level.

Derived invariants (or an equivalent mechanism) serve to reduce specification effort. For example, if class IntList did not declare a derived invariant, each method that takes an IntList object list as an argument would have to specify 0 ≤ list.getCount() in its precondition and postcondition.

Ownership

Does the client program provided in Figure 2 verify? Without an ownership system, the answer would be yes, regardless of whether IntList's constructor copies xs into a new array or simply stores a reference to xs into the elems field. Clearly, this is unsound.

The cause of this unsoundness is the fact that the getItem inspector reads the elements of elems, even though, as discussed in Section 2, the function generated for the logical encoding of getItem takes only the state of this, i.e., Heap[this], not the state of the elems array, as a parameter.

In order to allow inspector methods like getItem, which depend on the state of objects other than the receiver object, we apply the Boogie methodology's ownership system, where an object can own other objects. Specifically, whenever an object o is valid, it owns the objects referred to by o's rep fields. For example, in Figure 2, whenever an IntList object o's inv field is true, o owns the array pointed to by o.elems.

We allow inspector methods to depend on their receiver object, as well as any objects directly or indirectly owned by it. However, in the verification logic, we still wish to pass just the state of the receiver object as an argument, as opposed to passing an additional argument for each owned object. The reason is that rep fields are typically private fields, so requiring clients to pass an additional argument for each rep field in inspector method function applications when verifying their code would break information hiding.

To make this work, we change the logical encoding of the heap. Object references still map to object states, but the notion of object state is extended to deal with ownership. An object o's object state is extended to contain a copy of the state of each of o's owned objects. Specifically, for each rep field o.f, we introduce an additional field o.f_state which maps to a copy of the state of the object referred to by o:f whenever o.inv is true. Of course these additional fields exist only in the logical encoding; they do not exist at run time. We do not change the program's run time semantics.

This extended notion of object state allows inspector method results to be defined entirely in terms of the state of the receiver object. For example, the axiom that defines the function symbol for inspector method getItem is as follows:

Clearly, using a copy of an object's state instead of the original is sound only if the copy is up-to-date whenever it is used. This is exactly what the heap consistency theorem below shows. The proof of this theorem relies on another aspect of the Boogie methodology. The methodology does not allow updates to fields of committed objects, i.e. objects owned by other objects.

An object p is committed if and only if there is some valid object o that has a rep field o.f such that o.f = p. However, to simplify the verification logic, we track whether an object o is committed explicitly in the form of a boolean field o.committed. Like the inv field, this field can be read only in method contracts and cannot be assigned to explicitly in code.

To deal with ownership, we extend the meaning of the pack and unpack commands as in Figure 3.

Figure 3: The pack and unpack commands in the ownership system

Heap consistency As explained above, if the body of an inspector method dereferences an owned object this.f , we retrieve its state from a special field this.f_state instead of looking it up in the heap as usual. This is sound because this is rep-consistent:

Definition 1 (rep-Consistency). An object o is rep-consistent if, for each non-null rep field o.f , it holds that

In fact, we have the following theorem:

Theorem 1 (rep-Consistency). In each program state of each execution of each valid program, each valid object is rep-consistent.

Proof. Since the restrictions on field assignments and the evolution of inv and committed bits in our approach are as in the Boogie methodology [1], we may assume the Boogie methodology's known properties. In particular, we know that objects pointed to by rep fields of valid objects are valid and committed, and that committed objects have unique owners.

We prove the theorem by induction over the length of an execution. This holds for the empty execution, since in the initial program state no object is valid. Now consider a non-empty execution. We now look at the final command performed in this execution:

A field assignment p.g := v;. We know that p is mutable and rep-consistency involves only valid objects; therefore, this command does not invalidate the theorem.
A pack p; operation. This operation establishes the rep-consistency of p, and it does not break the rep-consistency of the objects pointed to by the rep fi elds of p since changing o.committed does not in uence the rep-consistency of an object o.
An unpack p; operation. Since p is made mutable, the theorem no longer applies to p. Also, since committed objects have unique owners, changing the committed bits of p's owned objects does not invalidate the rep-consistency for any valid object o ≠ p.
No other commands modify existing objects.

Ownership and frame conditions Recall from Section 2 that each method gets
an implicit postcondition, called the frame condition, that encodes in the verification
logic that some objects are not modified by the method. The frame condition given
in Section 2 was:

In the presence of ownership, a different frame condition is required. Specifically, we wish to allow a method that lists an object o in its modifies clause to modify not just o, but objects directly or indirectly owned by o as well. (We cannot require the method to list these owned objects in the modifies clause because of information hiding.) To achieve this, we follow the Boogie methodology [1] in allowing the method to modify any object that is committed in the pre-state, in addition to the objects listed in the modifies clause:

4 MULTI-DEPENDENT INSPECTOR METHODS

In this section, we show how the approach supports multi-dependent inspector methods, i.e. inspector methods that depend on the state of objects passed as arguments, in addition to the state of the receiver object.

A motivating example is shown in Figure 4. It shows part of the Microsoft .NET Framework's support for restricting the access partially trusted code has to system resources¹. A PermissionSet object holds the permissions assigned to a given piece of code. Permissions are represented using objects that implement interface Permission. For example, a SecurityPermission object may represent permission to run, or permission to skip bytecode verification (or both, or neither).

Interface Permission declares an inspector method isSubsetOf that returns whether one permission of a given type is implied by another permission of the same type. Also, class PermissionSet declares an inspector method contains that returns whether the set contains a given permission.

Figure 4: An example demonstrating multi-dependent inspector methods and quanti fi cation over object states

Importantly, in the .NET Framework Permission objects are mutable. That is, a given Permission object may be made to represent different permissions at different points in time. However, a PermissionSet object contains specific permissions, not Permission objects. Therefore, the contains inspector method must be allowed to depend on the state of the Permission object passed as an argument, not just its identity.

For example, in the piece of client code shown at the bottom of Figure 4, permission to execute is added to a permission set, and the contains method returns true for the Permission object that represents this permission. However, if subsequently permission to skip verification is added to the Permission object, calling contains with the same object returns false. This shows the need for multi-dependent inspector methods.

The following issues arise when multi-dependent inspector methods are admitted: how are calls of such methods translated into the verification logic, how do we ensure consistency of the resulting logic, and how do we specify the abstract state of an object using multi-dependent inspector methods.

An inspector method is allowed to depend on the state of the receiver object, the objects that are passed as arguments for parameters marked state, and their transitively owned objects. A call of an inspector method is translated into an application of the corresponding function symbol with the following arguments:

The first two arguments to the function symbol application model the receiver's identity and state, respectively.
For each argument a to the inspector method call, there is an argument to the function symbol application that models the value a.
In addition, for each argument a to the inspector method call for a parameter marked state, there is an argument to the function symbol application that models the state of the object a.

For example, a call p.isSubsetOf(q) is translated to

An inspector method implicitly gets a precondition that says that each argument a for a state parameter must be valid (i.e., a.inv is true).

Consistency of the verification logic For each inspector method, an axiom is added to the verification logic that defines the corresponding function symbol. We restrict inspector method bodies to be of the form {return E;}, so that we can translate an inspector method whose body is {return E;} into an axiom

where [[E]] denotes the translation of programming language expression E to a term in the verification logic. A potential problem with this approach is that the resulting set of axioms may be inconsistent. Notice that this is the case only if there is an inspector method call that does not terminate. To ensure that inspector methods always terminate, we restrict which method calls may appear inside inspector method bodies. Specifically, if a method call o.m₁(...) appears in the body of an inspector method m₂, then m₁ must be an inspector method and the declaring class of m₂ must transitively import the static type of o. A type may import another type explicitly using an import declaration. Also, when a class C implements an interface I, I implicitly imports C (sic). For example, in Figure 4, PermissionSet imports Permission and Permission imports SecurityPermission. (We discuss the import relation in the presence of subclassing in Section 6.) It is checked at load
time that the import relation is acyclic. This ensures that inspector method calls always terminate.

Discussion As stated above, in order to be able to easily generate axioms for inspector method functions and to ensure that the generated axioms are consistent, we restrict inspector method bodies to be of the form {return E;} and we do not allow recursive inspector methods. These restrictions mean that not all methods can be used as inspector methods, even if they are side-effect-free and read only state on which they are allowed to depend. For example, a method that returns the i'th element of a linked list cannot be used as an inspector method.

One can relax these restrictions by using a more complex approach for checking read and write effects and termination. Also, an abstraction relation needs to be provided separately and verified against the method body.

Quantification over object states We wish to make it possible for a postcondition of a non-inspector method to fully specify the post-state of each object o that is modified by the method. It may do so by specifying the value of each inspector method call on o. If an inspector method has parameters, this requires quantifying over the possible argument values. For example, in the IntList example (Figure 2), both the constructor and the Add method have an ensures clause that quantifies over a range of integers, to serve as the argument to the getItem inspector method.

If an inspector method takes an object reference as an argument, one may quantify over object references, using the following syntax:

Variable o ranges over both unallocated and allocated objects, so that if the inspector method whose value is being defined by the quantification is called with an argument object that is allocated after the quantification is asserted, the quantification applies to that call as well. However, expression E is not allowed to access the state of o and E cannot assume the validity of o; therefore, o cannot be passed as an argument for a state parameter.

To support full specification of the abstract state of an object that has inspector methods that take object states as arguments, we support quantification over object states, using the following syntax:

Variable o again ranges over both unallocated and allocated object references, but when the state of o is inspected, it is not looked up in the heap; rather, the state, too, is considered to be bound by the quantification, and it ranges over all possible valid object states. That is, the above quantification translates into the verification logic as follows:

where valid(o,s) denotes that object o's invariant holds in object state s.

For example, referring to Figure 4,

is encoded as

5 FORMAL DETAILS

Well-formedness conditions Method preconditions and postconditions, object invariants, and derived invariants must be pure expressions, and the body of an inspector method must be of the form {return E;} where E is a pure expression. A pure expression is a Java expression that does not contain object or array creations, simple or compound assignments, or increment or decrement operators, and that calls only inspector methods. This ensures that evaluation of pure expressions has no side-effects and that they can be translated easily into first-order logic. Pure expressions are verified to check that they do not throw exceptions.

Additionally, object invariants and derived invariants may depend only on the fi elds of this and on the fields of objects transitively owned by this. An object invariant cannot assume that this is valid; therefore, it cannot include inspector method calls on this. It can, however, include inspector method calls on objects pointed to by rep fields of this. Also, as stated before, inspector method bodies may depend only on the fields of the receiver object, objects passed as arguments for parameters marked state, or objects transitively owned by these objects.

Soundness of derived invariants The declaration of a derived invariant in a class C generates a proof obligation saying that it holds for all valid objects of class C. For discharging this obligation, one may assume that all derived invariants declared by classes that precede C in the import relation and applicable to the owned objects of o hold for those objects. To show that this proof rule is sound, we prove the following theorem:

Theorem 2 (Derived invariants). In each execution state, for each valid object, each derived invariant applicable to it holds.

Proof. By induction on the length of the execution. The only interesting case is when the last operation is a pack o; operation. The induction hypothesis allows
us to apply the aforementioned proof obligation to prove that all derived invariants
applicable to o hold.

Dynamically bound inspector method calls An inspector method call may be either statically or dynamically bound. For example, calls of inspector methods of final classes are statically bound, and calls of inspector methods of interfaces are dynamically bound. (In the presence of subclassing, methods may generally be called both ways. We discuss subclassing in Section 6.)

In contrast with a statically-bound inspector method, the function for a dynamicallybound inspector method is not defined once and for all by the method's declaration; rather, it is partially defined by each non-abstract inspector method that overrides it. Specifically, for each non-abstract inspector method m in class C, and each dynamically-bound inspector method m in class or interface T that it overrides, an axiom is generated that says that if the target object's type is C, both functions coincide. Formally:

6 SUBCLASSING

In this section, we describe an extension of our approach that allows a superclass to hide its internal representation not only from client code, but from code in subclasses as well.

Our approach is based on the observation that if we want to enable subclasses to use an abstract view of superclass state, then objects are no longer the unit of abstraction. Indeed, we must subdivide an object along the classes that contribute state to it. We call each such subdivision an object frame (or frame for short). That is, a frame (or frame reference) is a tuple (o,C) consisting of an object reference o and the name of a class C of which o is an instance. If a class D extends a class C which extends class Object, then an object whose type is D consists of three frames:
(o,D), (o,C), and (o, Object).

We update our heap representation accordingly. Rather than mapping object references to object states, in our new encoding the heap maps frames to frame

Figure 5: Example illustrating the approach for verification of programs with subclassing. An instance of type Cell may hold an arbitrary nonnegative integer value, which may be retrieved using inspector method getX and set using method setX. Both class Cell and class MyCell implement the Cell type. Class Cell does so by storing the value in a field x, whereas MyCell does so by storing the value plus one in its superclass frame. While contrived, this example shows how our approach supports the clean separation of the two aspects of subclassing: interface re-implementation and implementation inclusion. There need be no relationship between how the superclass and the subclass implement the superclass's interface; in particular, if the subclass chooses to re-use the included superclass implementation, it need not do so by direct delegation. This clean separation promotes modularity, i.e. separate development and evolution of superclass and subclass code. Note: all methods in this example are virtual and subclass methods override the corresponding superclass methods.

states. The frame state for a frame (o,C) consists of the values of the fields of o declared in C.

In previous sections, we achieved abstraction of object state for the sake of client code by introducing per-object inv and committed fields, introducing pack and unpack operations on objects, introducing an ownership relation between objects, caching owned object state in owner objects, and passing object states as arguments in inspector method function applications. Similarly, to achieve abstraction of superclass state for the sake of subclass code, we introduce per-frame inv and committed fi elds, pack and unpack operations on frames, and an ownership relation between
frames, and we cache owned frame state in owner frames and we pass frame states as arguments in inspector method function applications. Note that in the absence of subclassing, we again obtain the approach of the previous sections as a special case.

Frame ownership The definition of the ownership relation between frames involves a few design issues. One is: are there ownership relationships between the frames of a given object? In order to allow the subclass object invariant and subclass inspector methods to depend on superclass state, we consider a valid subclass frame to own its superclass frame. In fact, each class other than Object gets a special rep field called super that refers to the superclass frame. (Note that this field is exceptional in that it contains a frame reference instead of an object reference.) It follows that packing a subclass frame requires that the superclass frame is valid and uncommitted and commits it. Also, if the most derived frame of an object (i.e., the frame corresponding to the object's run-time type) is valid, it transitively owns all of the object's frames and encapsulates the entire object's state.

There is one other design decision involving the ownership relation between frames. Specifically, if a rep field o.f declared in a class C of a valid object o points to an object p, which ownership relationship does this give rise to? In order to allow the object invariant and the inspector methods of class C to perform dynamically-bound inspector method calls on p, which access p's most derived frame, we consider frame (o,C) to own p's most derived frame. Due to the previous design decision, it follows that (o,C) transitively owns all of p's frames.

Static and dynamic binding As pointed out earlier, in general inspector method calls may be statically bound or dynamically bound. Calls of abstract methods are always dynamically bound and calls of private methods are always statically bound, but for a non-private non-abstract inspector method of a non-final class, we have a choice of statically binding some calls of such methods and dynamically binding others. (How we make this choice is discussed below.) Since at run time, dynamically bound calls of a given method may bind to different methods and yield different results from statically bound calls of the same method, we encode them differently in the verification logic. Specifically, for calls resolved at compile time to a method m of a class or interface T, we encode statically bound ones as T_m(...) and dynamically bound ones as T_m_D(...). Function symbol T_m is defined fully by the body of method m in T; function symbol T_m_D is defined partially by each non-abstract method that overrides it, as explained earlier.

In the encoding of a statically bound call to an inspector method m of class C on an object o, the frame state passed as the state of the receiver is always the state of frame (o,C), whereas in the encoding of a dynamically bound call, the frame state passed is always the state of the most derived frame. (Also, an argument for a parameter marked state in a call of a multi-dependent inspector method is always interpreted as the most derived frame.)

The object invariant and the derived invariants declared in a class C apply to frames (o,C). Since they must depend only on the state of the frame to which they apply (plus transitively owned frames), they must not include dynamically bound inspector method calls on this. Therefore, we always interpret inspector method calls on this in object invariants and derived invariants as statically bound calls. Inspector method calls may occur in preconditions of inspector methods and in preconditions and postconditions of non-inspector methods. (Recall that inspector methods have no postconditions.) It is crucial for the soundness of verification that there be no confusion as to whether these calls are statically or dynamically bound. Our approach is sound regardless of which choice is made, so long as the choice is the same when verifying the caller and when verifying the callee. Note in this regard that it is sound, and often appropriate, to make different choices for statically and dynamically bound calls of the method in whose contract the inspector method call appears. This is sound so long as it is verified that the contract for dynamically bound calls is implied by the contract for statically bound calls under the assumption that the run-time type of the receiver equals its static type. The method body need then only be verified against the contract for statically bound calls.

The question remains as to how the choice of static or dynamic binding of inspector method calls in method contracts is made. Java specifies that a call in program code is dynamically bound unless the call is a super call or the method being called is private. However, this approach is not always appropriate for inspector method calls in method contracts. For example, consider the getX() call in the contract of method setX of class Cell in Figure 5. Interpreting the call as dynamically bound would be approriate for dynamically bound calls of setX , but not, for example, for the super.setX call that occurs in class MyCell . Indeed, super.setX (5) ensures super.getX () = 5, not getX () = 5.

Therefore, we interpret calls in contracts differently from calls in program code. Also, the interpretation is different depending on whether the contract is used for a statically bound call or a dynamically bound call. The rule is as follows: all calls are bound dynamically, except for super calls, calls of private methods, or calls on this in contracts for statically bound calls. Calls whose target expression is not this (explicitly or implicitly) or calls on this in contracts for dynamically bound calls are bound dynamically. To paraphrase, binding of inspector methods is as per Java,

except that a call on this in a contract for a statically bound call is bound statically. For example, for the purpose of verifying the super.getX call in Figure 5, the getX call in the contract of Cell.setX is interpreted as a statically bound call. Note that this rule ensures that the contract for dynamically bound calls is equivalent to the contract for statically bound calls under the assumption that the receiver's run-time type is equal to its static type.

The import relation In the presence of subclassing, the import relation which we use to ensure termination of inspector method calls, and consistency of their axiomatization, must be refined. In particular, for each class, we introduce two nodes in the import graph, which we shall call the static node and the dynamic node. An interface only has a dynamic node. The rules are as follows:

The dynamic node of C imports the static node of C.
If a class C declares an import T; declaration, this means the static node of C imports the dynamic node of T.
If a class D extends a class C, this means the static node of D imports the static node of C, and the dynamic node of C imports the static node of D.
If a class C implements an interface I, this means the dynamic node of I imports the static node of C.
If a class C declares an inspector method whose body includes a dynamically bound inspector method call resolved at compile time to a method of a type T, then the static node of C must transitively import the dynamic node of T.
If a class C declares an inspector method whose body includes a statically bound inspector method call declared by a class D, then the static node of C must import the static node of D.

This ensures that there can be no inspector method call cycles within an inheritance hierarchy.

Dynamic invariants As stated above, a class C may declare a derived invariant, using the derived_invariant keyword. This invariant applies to frames (o_C) only, and inspector method calls in derived invariants are always interpreted as statically bound calls. This means that derived invariants declared by a class C are useful for a subclass of C when performing super calls, or if C is a final class, but not for client code that accesses an instance of C through dynamically bound calls.

To convey properties of and relationships between dynamically bound inspector methods, a class or interface may declare one or more dynamic invariants, using the dynamic_invariant keyword. A dynamic invariant declared by a type T isenforced against all non-abstract classes that implement or extend T. It follows that a dynamic invariant declared by a type T holds for all instances of T.

Note: dynamic invariants do not subsume derived invariants; for example, an abstract class that implements some of its inspector methods may declare a derived invariant to specify a relationship between the inspector methods it implements.

Method inheritance As in the Boogie methodology [1], we do not allow method inheritance as such. That is, each class must override all visible methods of its superclass. This rule is crucial for the soundness of our approach since our approach is based on the assumption that if a dynamically bound call binds to a method m declared by a class C, then the static type C of m's receiver is equal to its run-time type. This is true only if m was not inherited from C by the receiver's run-time type.

However, to reduce programming overhead, we follow the Boogie methodology in generating a default override for each method that is not overridden explicitly. The body of the default override for a method m not explicitly overridden by a class C is of the form

(or similar if m has parameters or a return value). Note that default overrides are subject to verification just like explicitly declared methods. If a default override fails to verify, an explicit override must be provided.

7 RELATED WORK

The Boogie methodology Our approach is based for object invariants, ownership, and method framing on the Boogie methodology [1]. In order to add support for inspector methods, we applied the following modifications:

In the Boogie methodology, the heap is encoded in the verification logic as a function that maps an object reference and field name combination to a field value. We encode the heap as a function from frame references to frame states. (Note: the notion of object frames was introduced in [1], under the name class frames, but in [1] it was not used as the basis for the encoding of the heap.)
We extend the semantics of the pack command to store a copy of each owned frame's state in a special field of the owner frame. This allows inspector method calls to be encoded as function applications that take frame states.
In the Boogie methodology, method frame conditions are expressed as equalities between field values. We express method frame conditions as equalities between object frame states. Together with the previous modification, this enables a theorem prover to carry information regarding inspector method call return values across method calls.
Instead of introducing a single committed bit and a single inv field per object, and storing in the inv field the name of the most derived type whose object frame is valid, we introduce separate inv and committed bits in each object frame.

Method calls in specifications Darvas and Müller [4] identify and propose solutions for problems that arise when method calls are used in specifications. Specifically, the authors show how to deal with abrupt termination, object creation, and inconsistent axiomatization due to unsatisfiable postconditions. Methods called in specifications must be pure, which means they do not modify existing objects. Inspector methods are like pure methods, with the additional constraint that they depend only on the state of the receiver object and objects passed as arguments for parameters marked state (and their transitively owned objects), so that they can be used for state abstraction. A restriction of our approach is that we do not allow inspector methods to declare a postcondition and that we derive the axiom that defines the corresponding function from the inspector method's body rather than its postcondition. We avoid the abrupt termination issue by verifying that the inspector method body does not throw any exceptions.

Darvas and Müller's solution to the object creation issue is to pass the heap as an argument to the function and have the function return a new heap together with its result value. We did not adopt this solution because it is incompatible with our approach to framing; specifically, our approach requires that only the state of the dependee objects is passed to the inspector method's function, rather than the entire heap. Therefore, we disallow object creation in inspector methods.

We avoid the problem of inconsistent axiomatization by imposing a partial order on classes and by allowing nested inspector method calls to proceed along this partial order only. This ensures that inspector method calls always terminate and have a return value under Java semantics, and the return values thus obtained always satisfy the system of equations that defines the inspector method functions of a given program.

Cok [3] supports pure methods that throw exceptions, which we do not. However, we believe it is preferable to simply rule out exceptions in inspector methods.

State abstraction in ownership systems Müller [9] combines a notion of model fi elds with an ownership type system called Universes. Model fields are comparable with parameterless inspector methods. The Universes ownership system is less exible than that of the Boogie methodology; for example, it does not allow ownership transfer. On the other hand, Müller allows model fields of an object o to depend on fi elds of peer objects of o, i.e. objects that have the same owner as o, provided that the model field definitions are visible to the field declarations, i.e. both are in the same module. We intend to add support for visibility-based inspector methods to our approach as future work.

Leino andMüller [8] achieve state abstraction by combining the Boogie ownership system with another notion of model fields, similar to but distinct from that of Müller [9]. Model fields in [8] are encoded in the verification logic as if they are stored in the heap along with concrete fields. Each model field declaration specifies a model fi eld constraint that serves as an abstraction relation. The constraint for a model fi eld o.m needs to hold only when o is valid. As part of executing a pack statement on an object o, the constraints of the model fields of o are checked and, if they do
not hold, a new value that satisfies the constraint is assigned to the model field. If no such value exists, the pack statement is considered invalid.

A constraint may underspecify a model field, and subclasses may strengthen an inherited model field's constraint. As a result, packing an object for a subclass may assign a new value to a model field declared in a direct or indirect superclass. An underspecified model field is similar to an abstract inspector method with a dynamic invariant, and a strengthening of an inherited model field's constraint by a subclass is similar to an inspector method that overrides an abstract inspector method. However, \overriding" a fully specified model field with another differently fully specified model field, similar to overriding a non-abstract inspector method, is not supported in [8]. This means that a subclass is forced to adopt fully specified public superclass model fields as part of its own abstract state, without the ability to provide a different abstraction function. Alternatively, if a class leaves a model fi eld underspecified and does not tie it to its own concrete state, so that subclasses have maximum freedom in providing an abstraction function, then the class cannot use the model field to abstractly expose its own state. Therefore, it also is not able to fully implement methods specified using the model field. As a result, in practice, classes with underspecified model fields are effectively abstract. In other words, [8] does not fully support specification of classes that may be used both for direct instantiation and as superclasses, while leaving subclasses free to provide their own abstraction functions for superclass model fields. (For example, see class Cell in Figure 5.)

To support the kind of specifications enabled in our approach by parameterized inspector methods, such as the getItem method in the IntList example of Figure 2, [8] would have to use model fields containing special-purpose immutable objects such as immutable list objects (known as model types in JML [7]).

Ownership-free approaches Kassios [6] also uses abstraction functions, but instead of an ownership system he proposes dynamic frames to abstractly specify an abstraction function's dependencies and a mutator method's effects. Dynamic frames are themselves abstraction functions that return sets of locations. A module specification may specify a frame (i.e., an upper bound on the set of locations that an abstraction function depends on) for each abstraction function separately.

The dynamic frames approach subsumes our approach on an abstract level. However, it is formulated in the context of an idealized logical framework; for example, it does not show how to apply the approach to Java-like inheritance. Also, the proposed approach has not been applied in the context of an automatic program verifier.

Parkinson and Bierman [11] and Parkinson [10] extend separation logic with abstract predicates and apply it to Java to achieve state abstraction for Java programs. Abstract predicates are similar to inspector methods that return a boolean value. However, as in the case of the aforementioned dynamic frames approach, Parkinson and Bierman do not restrict the set of locations that an abstract predicate may depend on. Rather, it is up to client code to track the separation between abstract predicates. Parkinson and Bierman solve the problem of well-definedness of abstract
predicates by allowing abstract predicates to appear inside other abstract predicates only in positive positions, and by taking the least fixpoint of a set of abstract predicate definitions as their meaning. Subclassing is addressed by introducing abstract predicate families; that is, an abstract predicate name may be subscripted by a class name, and a separate definition may be given for each subscript. For example, the abstract predicate saying that a Cell instance o holds the value v could be written cell_type(o)(o,v).

The use of abstract predicates in method contracts typically requires the use of universally quantified logical variables whose scope extends across both the preand post-state. For example, the contract for a method that that increments the value of a cell would say that for each value v, if cell (o,v) holds in the pre-state, then cell(o,v + 1) holds in the post-state. This could be a disadvantage compared to model fields or inspector methods, in particular for run-time checking.

Redundant invariants Our derived invariants and dynamic invariants are similar to JML's [7] redundant invariants, in that an object's redundant invariants must be implied by its invariants. However, the interaction between JML's different kinds of invariants and JML model fields is not clear. Specifically, on which invariants is a JML model field's represents clause, which defines its abstraction relation, allowed to depend to prove absence of evaluation errors such as null dereferences or divisions by zero? And which invariants are allowed to mention model fields of this?

8 FUTURE WORK

We are investigating relaxations of the requirement that nested inspector method calls follow a partial order. One relaxed rule would be that for each nested call, the callee must depend on fewer objects (or object frames) than the caller. Or alternatively, that the size of the argument list, defined as the total number of object states, including duplicates and internal owned object states, must decrease.

Further relaxations would include allowing direct recursion controlled by a measure function, and allowing statements, including loops, in inspector method bodies.

Another area of extension is to allow inspector methods to depend on non-owned peer objects of classes declared in the same module.

9 CONCLUSION

We proposed an approach to the verification of object-oriented programs that use inspector methods for state abstraction in specifications.

We solve the problem of encoding in the verification logic whether a given method call or field assignment affects a given inspector method call's return value, by

modeling the heap as a function that maps object references to object states,
logically (but not physically) storing a copy of the state of owned objects in special fields of the owner object, and
encoding inspector method calls as function applications whose arguments include the object states on which the inspector method depends.

We support multi-dependent inspector methods, i.e. inspector methods that depend on the state of objects passed as arguments. To enable the specification of the return values of all possible calls of a multi-dependent inspector method, we support quantification over object states.

Our approach to subclassing is to separate its two aspects: superclass interface re-implementation and superclass implementation inclusion. Subclasses may override superclass inspector methods; our approach preserves soundness by binding inspector method calls statically or dynamically as appropriate. A distinction is made between derived invariants, which apply only to the declaring class, and dynamic invariants, which apply to subclasses as well.

We implemented the core of our approach in a custom build of the Spec# program verifier [2]. This will allow us to gain experience and validate the approach on larger programs.

Currently, we restrict inspector method bodies to be of the form {return E; } and we forbid recursive inspector methods. We hope to relax these restrictions in future work.

A soundness proof of the approach is available [5].

Acknowledgements

The authors thank Rustan Leino, Peter Müller, and Jan Smans, and the participants of the EighthWorkshop on Formal Techniques for Java-like Programs (FTfJP 2006), where this work was originally presented, as well as the anonymous referees of this paper, for helpful comments and discussions. Bart Jacobs is a Research Assistant of the Fund for Scientific Research - Flanders (Belgium) (F.W.O.-Vlaanderen).

Footnote

¹ We changed names to conform to Java naming conventions.

REFERENCES

[1] Mike Barnett, Robert DeLine, Manuel Fähndrich, K. Rustan M. Leino, and Wolfram Schulte. Verification of object-oriented programs with invariants. Journal of Object Technology, 3(6):27{56, June 2004. Special issue: ECOOP 2003 workshop on FTfJP.

[2] Mike Barnett, K. Rustan M. Leino, and Wolfram Schulte. The Spec# programming system: An overview. In CASSIS 2004, volume 3362 of LNCS. Springer, 2004.

[3] David Cok. Reasoning with specifications containing method calls and model fields. Journal of Object Technology, Vol 4, No.8, pp 77-103, October 2005.

[4] Ádám Darvas and Peter Müller. Reasoning about method calls in JML specifications. In Francesco Logozzo, editor, Proceedings of the Seventh Workshop on Formal Techniques for Java-like Programs (FTfJP 2005), 2005.

[5] Bart Jacobs and Frank Piessens. Inspector methods for state abstraction: soundness proof. Available at http://www.cs.kuleuven.be/~bartj/.

[6] Ioannis T. Kassios. Dynamic frames: Support for framing, dependencies and sharing without restrictions. Technical Report 528, Dept. of Computer Science, University of Toronto, July 2005.

[7] Gary T. Leavens, Albert L. Baker, and Clyde Ruby. Preliminary design of JML: A behavioral interface specification language for Java. Technical Report 98-06-rev28, Department of Computer Science, Iowa State University, July 2005.

[8] K. Rustan M. Leino and Peter Müller. A verification methodology for model fields. In Proc. ESOP, volume 3924 of LNCS. Springer-Verlag, 2006.

[9] Peter Müller. Modular Specification and Verification of Object-Oriented Programs, volume 2262 of LNCS. Springer-Verlag, 2002.

[10] Matthew J. Parkinson. Local reasoning for Java. PhD thesis, Computer Laboratory, Cambridge University, 2005.

[11] Matthew J. Parkinson and Gavin M. Bierman. Separation logic and abstraction. In Proc. POPL, 2005.

About the authors

Bart Jacobs is a PhD student at the Department of Computer Science. He can be reached at bartj@cs.kuleuven.be. See also http://www.cs.kuleuven.be/~bartj/.

Frank Piessens is a Professor at the same Department. He can be reached at frank@cs.kuleuven.be. See also http://www.cs.kuleuven.be/~frank/.

Cite this article as follows: Bart Jacobs and Frank Piessens: "Inspector Methods for State Abstraction", in Journal of Object Technology, vol. 6, no. 5, Special Issue ECOOP 2006, June 2007, pp. 55-75 http://www.jot.fm/issues/issue_2007_06/article1