.NET TECHNOLOGIES 2004 WORKSHOP, PILSEN

Experience Integrating a New Compiler and a New Garbage Collector Into Rotor

Todd Anderson, Marsha Eng, Neal Glew, Brian Lewis, Vijay Menon, and James Stichnoth, Microprocessor Technology Lab, Intel Corporation

Abstract

Microsoft’s Rotor is a shared-source CLI implementation intended for use as a research platform. It is particularly attractive for research because of its complete implementation and extensive libraries, and because its modular design allows different implementations of certain components such as just-in-time compilers (JITs). Our group has independently developed our own high-performance JIT and garbage collector (GC) and wanted to take advantage of Rotor to experiment with these components in a CLI environment. In this paper, we describe our experience integrating these components into Rotor and evaluate the flexibility of Rotor’s design toward this goal.

We found it easier to integrate our JIT than our GC because Rotor has a well-defined interface for the former but not the latter. However, our JIT integration still required significant changes to both Rotor and our JIT. For example, we modified Rotor to support multiple JITs. We also added support for a second JIT manager in Rotor, and implemented a new code manager compatible with our JIT. We had to change our JIT compiler to support Rotor’s calling conventions, helper functions, and exception model. Our GC integration was complicated by the many places in Rotor where components make assumptions about how its garbage collector is implemented, as well as Rotor’s lack of a well-defined GC interface. We also had to reconcile the different assumptions made by Rotor and our garbage collector about the layout of objects, virtual-method tables, and thread structures.

1 INTRODUCTION

Rotor, Microsoft’s Shared Source Common Language Infrastructure [7, 9], is an implementation of CLI (the Common Language Infrastructure [6]) and C# [5]. It includes a CLI execution engine, a C# compiler, various tools, and a set of libraries suitable for research purposes (it omits a few security and other commercially important libraries). As such, it provides a basis for doing research in CLI implementation, and Microsoft encourages such use of Rotor. For a number of years, our group has been researching and implementing managed runtime environments for Java and CLI on Intel platforms. Recently, as part of this effort, we developed a high-performance just-in-time compiler (JIT), called StarJIT [1], that can compile both Java and CLI applications, and a high-performance garbage collector (GC), called GcV4. Because Rotor provides a complete platform for CLI experimentation, we set out to integrate StarJIT and GcV4 with Rotor on the IA-32 architecture and to see how well our technologies work in the Rotor environment. The integrated Rotor/StarJIT/GcV4 uses StarJIT instead of FJIT (Rotor’s own JIT), GcV4 instead of Rotor’s own garbage collector, and Rotor as the core VM. Ideally the latter would be unchanged. This paper describes our experience and presents our observations on the suitability of Rotor as a research platform.

Figure 1: ORP’s use of interfaces for modularity

StarJIT and GcV4 were originally developed for use with our virtual machine, ORP (the Open Runtime Platform [3]). ORP was first designed for Java and later adapted to support CLI as well. One of ORP’s key characteristics is its modularity: ORP interacts with JITs and GCs almost exclusively through well-defined function-call interfaces. Each component uses these interfaces to request action from other components. For example, ORP calls an interface function to request a JIT to compile a method or to request the GC to allocate heap storage. Similarly, a JIT can request ORP to look up method names, and the GC can have ORP enumerate the on-stack and in-register object references during a garbage collection. As shown in Figure 1, ORP’s use of interfaces cleanly separates the core VM from particular JITs or GCs, and enables component substitutability. The only exceptions to ORP’s strict use of interfaces are ORP’s assumptions about the layout of a small number of performance-critical data structures including object headers, vtables (virtual-method tables), and some GC information stored in vtables. Cierniak et al. describe in detail how ORP uses interfaces to support flexibility while preserving high application performance [3].

We hoped the use of these interfaces by StarJIT and GcV4 would simplify their integration into Rotor. Rotor also has a well-defined JIT interface, but it lacks a well-defined interface for garbage collectors. Some of Rotor’s interfaces are defined directly in terms of internal data structures and other details of the VM, but others are more abstract, using opaque handles and separating the VM cleanly from other components. Using these abstract interfaces, JITs such as StarJIT can be built independently of Rotor itself, and loaded as DLLs (dynamically-linked libraries) at runtime.

Our ultimate goal is to see how well StarJIT’s and GcV4’s Java optimizations apply to CLI and what further optimizations for CLI can be developed. StarJIT includes advanced optimizations such as guarded devirtualization; synchronization optimization; Class Hierarchy Analysis (CHA [4]); elimination of runtime null pointer, array index, and array-store checks; and dynamic profile-guided optimization (DPGO). GcV4 performs parallel sliding compaction to maximize application throughput. StarJIT and GcV4 can collaborate to insert prefetching based on dynamic profiles of cache misses [2]. All these optimizations are important to managed languages like Java and C#.

Overall, JIT integration was more straightforward than GC integration because the Rotor JIT interface is well defined. In contrast, integrating the GC required many intricate changes that were interspersed throughout the Rotor source code. In both cases, however, we found our work complicated by missing functionality. We start by describing our integration effort at the conceptual module level, then detail which methods and data structures were modified to make integration possible, first for the JIT at both compile time and run time, and then for the GC.

In the descriptions in the remainder of this paper, we explicitly provide the names of Rotor data structures and source files, to provide specific landmarks for others who would like to make similar modifications and experiments on the Rotor code base.

2 INTEGRATION AT THE MODULE LEVEL

A key goal of our JIT and GC integration efforts was to minimize changes to Rotor’s code base, especially the core VM. We realized some modifications to StarJIT and GcV4 would be necessary as a result, so our secondary goal was to avoid making extensive changes to these modules.

Figure 2 shows the general structure of Rotor. The core VM components that, for example, load assemblies and types, return information about methods and fields, and handle exceptions are highlighted in the middle of the figure. Rotor’s GC is shown on the left-hand side, as is the garbage collected heap. The three components that together implement Rotor’s FJIT (“fast JIT”) and manage its compiled code are highlighted on the right-hand side of the figure.

Figure 2: The structure of Rotor including the three components of FJIT

JIT-Related Modifications

Rotor divides the compilation and management of compiled code into three components: JITs, JIT managers, and code managers. A JIT compiles CLI bytecodes into native code. A JIT manager allocates and manages space for a JIT’s compiled code, data, and exception-handler and garbage-collection information. A code manager is responsible for stack operations involving the frames of compiled code that it manages. Each of these components implements the associated Rotor interface, which is a C++ abstract base class. The Rotor JIT design is general, and there is no reason why it cannot support multiple JITs, multiple JIT managers, multiple code managers, JITs that share JIT and code managers, et cetera. Currently, Rotor has one JIT, two JIT managers, and one code manager.

To implement a JIT, JIT manager, or code manager, one writes a C++ class that implements the appropriate interface (abstract base class). Rotor also defines another interface layer on top of the JIT interface. This layer allows JITs to be implemented in DLLs and hides the details of Rotor’s types for classes, methods, fields, et cetera, with the use of handles such as CORINFO_CLASS_HANDLE, CORINFO_METHOD_HANDLE, and CORINFO_FIELD_HANDLE. Throughout the paper, we use the term JIT interface to refer to this layer. In contrast, Rotor’s JIT manager and code manager interfaces have no such layer above them and use Rotor’s internal data structures directly, making them difficult to place in DLLs.

We found that most of the StarJIT integration effort centered around the JIT interface, which is defined in corjit.h and corinfo.h. These files define a number of interface classes, all of whose names begin with the letter I (e.g., ICorClassInfo). The JIT must implement the interface class in corjit.h and can communicate with the VM using the interface classes in corinfo.h.

To date, we have succeeded in using Rotor’s existing interface functions unmodified for method compilation. However, to do this we had to disable some StarJIT optimizations that will require extensions to this interface. For example, CHA requires the JIT to examine the currently loaded class hierarchy to detect whether a particular method in a class has been overridden by a subclass. While Rotor’s JIT interface allows exploration up the class hierarchy, it currently does not allow exploration down the class hierarchy, precluding StarJIT’s CHA.
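To make the missing capability concrete, the following is a minimal sketch of the downward query that CHA needs. It is not Rotor’s or StarJIT’s code: the ClassNode structure, its fields, and isOverriddenInSubclass are hypothetical names chosen for illustration, and the sketch assumes the VM could expose links from each loaded class to its loaded subclasses.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical view of one loaded class. The `subclasses` links are the
// downward information that the text says Rotor's JIT interface lacks.
struct ClassNode {
    std::string name;
    std::set<std::string> declaredMethods;     // virtual methods declared or overridden here
    std::vector<const ClassNode*> subclasses;  // currently loaded direct subclasses
};

// Returns true if any currently loaded subclass of `cls` (direct or indirect)
// declares `method`, in which case a call through `cls` cannot be safely
// devirtualized on the basis of the hierarchy alone.
bool isOverriddenInSubclass(const ClassNode& cls, const std::string& method) {
    for (const ClassNode* sub : cls.subclasses) {
        if (sub->declaredMethods.count(method) != 0 ||
            isOverriddenInSubclass(*sub, method)) {
            return true;
        }
    }
    return false;
}
```

The answer is valid only for the classes loaded at the time of the query, so a real implementation would also need some way to react when a later class load invalidates it.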

Over the years, we have implemented at least four separate JITs for ORP. One of our most valuable JIT debugging tools has been support for multiple JITs. This approach allows several different JITs to be present in the system at the same time and allows different methods to be compiled by different JITs. Although our multiple JIT support allows more than two JITs to coexist, for our integration work we use just two JITs: StarJIT and Rotor’s built-in FJIT. We define an ordering for the JITs: the experimental JIT (StarJIT) comes first, the next JITs are more stable (not applicable for this project), and the most reliable JIT (FJIT) comes last. Rotor first calls StarJIT to compile each new method. If this is unsuccessful (either because of missing StarJIT functionality, or because StarJIT generates incorrect code, or because StarJIT otherwise refuses to compile the method), it returns a special error code and Rotor calls FJIT. If there were more than two JITs in the system, the VM would call each JIT in order until one successfully compiled the method. This support for multiple JITs allows us to continue debugging the experimental JIT without being stuck if it fails to compile some method because of a hard-to-fix bug. We can continue to identify and fix other problems with the experimental JIT while waiting for a fix to the first problem.
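The ordered fallback described above can be sketched as follows. This is an illustration only: the Jit structure, the JitResult codes, and compileWithFallback are hypothetical names, and Rotor’s actual JIT interface and error codes are different.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical result codes; the special error code used in Rotor is not shown here.
enum class JitResult { Ok, Skipped };

// A JIT is modeled as a named callable that either compiles a method or declines.
struct Jit {
    std::string name;
    std::function<JitResult(const std::string& method)> compile;
};

// Tries each JIT in priority order (most experimental first, most reliable last).
// Returns the name of the JIT that compiled the method, or an empty string
// if every JIT declined.
std::string compileWithFallback(const std::vector<Jit>& jits, const std::string& method) {
    for (const Jit& jit : jits) {
        if (jit.compile(method) == JitResult::Ok) {
            return jit.name;
        }
    }
    return std::string();
}
```

With a list holding an experimental JIT followed by a reliable one, a method the first JIT rejects is compiled by the second, so the program keeps running while the rejection is investigated.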

StarJIT includes a method table mechanism for providing fine-grain control over which methods it chooses to compile and which it rejects. This mechanism is controlled via a property file loaded by StarJIT at runtime. Typically, we first run StarJIT configured to generate a method table file. This file is simply a dump of all the method names, one per line of the file. Then we can configure StarJIT to compile specific methods only by specifying the lines in the file that name the methods we are interested in. For example, the configuration string “METHODS=methods.txt:10-20,30,40-50” instructs StarJIT to compile only the methods listed in lines 10–20, line 30, and lines 40–50 of the methods.txt file, and to reject all other methods. We use binary search to isolate bugs. If we know that StarJIT has a bug when run with certain lines of the file, we can run with half of these lines specified. If the bug manifests then we repeat with these lines, otherwise we repeat with the other half. The configuration string also allows specifying methods by name, in addition to referencing methods listed in the file. This technique of debugging a new JIT with the use of method tables and a robust backup JIT proved to be invaluable to our integration effort.
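The selector syntax and the bisection procedure can be sketched as below. This is not StarJIT’s parser: MethodSelection, parseMethodSelection, and isolateFaultyLine are hypothetical names, only the line-range form shown in the example string is handled (selection by name is omitted because its syntax is not given here), and the bisection assumes that a single method triggers the bug.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Parsed form of a selector such as "METHODS=methods.txt:10-20,30,40-50".
struct MethodSelection {
    std::string file;     // method table file name
    std::set<int> lines;  // 1-based line numbers of the methods to compile
};

// Parses the line-range selector syntax. Returns false on malformed input.
bool parseMethodSelection(const std::string& config, MethodSelection& out) {
    const std::string prefix = "METHODS=";
    if (config.compare(0, prefix.size(), prefix) != 0) return false;
    const std::size_t colon = config.find(':', prefix.size());
    if (colon == std::string::npos) return false;
    out.file = config.substr(prefix.size(), colon - prefix.size());
    out.lines.clear();
    std::istringstream ranges(config.substr(colon + 1));
    std::string item;
    while (std::getline(ranges, item, ',')) {
        try {
            const std::size_t dash = item.find('-');
            const int first = std::stoi(item.substr(0, dash));
            const int last =
                (dash == std::string::npos) ? first : std::stoi(item.substr(dash + 1));
            if (first < 1 || last < first) return false;
            for (int line = first; line <= last; ++line) out.lines.insert(line);
        } catch (const std::exception&) {
            return false;  // non-numeric or out-of-range line number
        }
    }
    return !out.lines.empty();
}

// Bisects a list of suspect lines. `bugManifests` stands for one test run with
// only the given lines enabled. Returns the isolated line, or -1 for an empty list.
int isolateFaultyLine(std::vector<int> suspects,
                      const std::function<bool(const std::vector<int>&)>& bugManifests) {
    if (suspects.empty()) return -1;
    while (suspects.size() > 1) {
        const auto mid = suspects.begin() + static_cast<std::ptrdiff_t>(suspects.size() / 2);
        std::vector<int> firstHalf(suspects.begin(), mid);
        std::vector<int> secondHalf(mid, suspects.end());
        suspects = bugManifests(firstHalf) ? firstHalf : secondHalf;
    }
    return suspects.front();
}
```

Each iteration costs one run of the failing program, so isolating one method among n candidates takes about log2(n) runs.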

GC-Related Modifications

Unlike for JITs, there is no clean interface in Rotor for a garbage collector to communicate with the rest of the system. The Rotor GC is responsible for both object allocation and garbage collection, and also interacts with the threading subsystem. As such, it has many touch points with the VM and more extensive modifications of Rotor were required for integrating GcV4.

Garbage-collection problems can be notoriously difficult to debug, since a problem introduced during a collection may not manifest itself until much later, and the method where the problem manifests itself may have little to do with the method in which the problem actually arose. For debugging such problems, we found it useful to use built-in Rotor functionality for forcing collections at more regular intervals. Rotor has a GCStress parameter that can be given various settings. One especially useful setting forces a collection every time an object is allocated. This setting often causes garbage collection problems to show up soon after they occur, when the information needed to debug them is still available.

3 JIT COMPILE-TIME INTERFACE

As previously mentioned, a major part of the StarJIT integration was adapting StarJIT to Rotor’s JIT interface. This adaptation included implementing the function to compile a method, and modifying StarJIT to use the set of functions that Rotor provides for querying classes, fields, methods, et cetera.

Although the StarJIT integration is still under development, we have successfully compiled and run enough programs that we believe the integration is nearly complete. Despite some initial difficulty understanding the semantics of a few of Rotor’s JIT interface functions, our experience has been predominantly positive. This section discusses our modifications and the problems we found.

Supporting the JIT Compile-Time Interface

StarJIT already includes an internal interface, VMInterface, that it uses to isolate itself from any particular VM. The ORP version of StarJIT, for example, is built with an ORP-specific implementation of this interface. The main part of our effort was spent implementing a Rotor-specific implementation of VMInterface.

VMInterface includes about 160 methods. The majority of these methods resolve classes and get information about methods, fields, and other items during compilation. One VMInterface method returns the address of the different runtime helpers, and is described in detail in the next section. Various StarJIT optimizations are supported by other VMInterface methods. One such support method returns a method’s execution frequency and is used in profile-based recompilation.

Most of the VMInterface implementation for Rotor was straightforward. However, the VMInterface implementation is not yet complete—we have not implemented specialized support for certain optimizations that are currently disabled.

To support StarJIT’s requirements, we found two cases where it was necessary to define new data structures in the VMInterface implementation to augment the corresponding Rotor information. In the first case, while Rotor provides a way to break types down, it does not provide a way to build them up. At start up time, StarJIT builds data structures that represent the primitive types, but Rotor provides no way for another component to obtain handles (CORINFO_CLASS_HANDLE) for these types. This means that StarJIT’s type data structures cannot simply be Rotor handles, and instead we designed a RotorTypeInfo data structure that includes enough information for StarJIT’s needs. We can build one of these structures without a Rotor class handle. A RotorTypeInfo cannot answer all queries on types, but in these cases StarJIT would have been given a Rotor handle and Rotor can be queried instead.

In the second case, StarJIT needs the type of the this argument for many methods. In Rotor, this type cannot be obtained using the signature information (CORINFO_SIG_INFO) for a method. Our solution is to represent a method’s signature using a tuple that contains both a CORINFO_SIG_INFO (for arguments other than this) and a CORINFO_METHOD_HANDLE (to get the type for this). This tuple and the RotorTypeInfo tuple are similar to the OpType tuple class used in Rotor’s built-in FJIT.

Conclusions About the JIT Compile-Time Interface

In summary, we found Rotor’s compile-time JIT interface (ICorJitInfo) generally well designed. However, some information needed for optimizations is missing. It was also necessary to work around some limitations such as the inability of a JIT to get handles for primitive classes. We have the impression that ICorJitInfo is narrowly defined to provide just the functionality needed for FJIT. While this makes the interface simple, it complicates adding new, more optimizing JITs to Rotor.

The ICorJitInfo class inherits from a number of abstract superclasses that each define functions in various areas of compile-time information (such as methods, modules, fields) and areas of runtime information (such as helper functions and profiling data). We expect to add support for our optimizations by adding a new superclass. This will contain, for example, methods to get class hierarchy and profile-based recompilation information.

The lack of documentation about Rotor’s internals was another obstacle. While the book, Shared Source CLI Essentials [9], is a great help, too often we resorted to experimentation to discover what Rotor functions to use. To be more widely successful as a VM intended for research, Rotor needs better documentation.

4 JIT RUNTIME INTERFACE

Besides the compile-time cooperation described earlier, StarJIT and Rotor must also cooperate at runtime. For example, although StarJIT generates code for managed methods, StarJIT and its generated code rely on Rotor for VM-specific operations such as object allocation. Similarly, the Rotor VM handles stack unwinding and root-set enumeration but it relies on the JIT to interpret individual stack frames. This section describes the runtime support needed to integrate StarJIT into Rotor.

Helper Calls

The JIT-compiled code of StarJIT and Rotor’s FJIT both rely on calls to runtime-helper routines to perform VM-specific operations and to perform some commonly-used complex operations. These helper calls do such things as allocate objects, throw exceptions, do castclass or isinst operations, and acquire or release locks. Common complex operations include 64-bit operations on a 32-bit architecture. Rotor provides a mechanism to query for helpers in its ICorInfo interface. StarJIT’s Rotor-specific VMInterface, in turn, maps StarJIT helpers to Rotor ones. During our integration work, we encountered several issues specific to helper calls. In most cases, we were able to solve these issues within the Rotor-specific VMInterface layer.

The first issue we encountered involved the different calling conventions used by ORP and Rotor. StarJIT had been hardwired to use the ORP conventions when calling VM helper functions as well as other managed code. Instead, we modified StarJIT to use a new interface that allows the VM to abstractly describe the calling conventions to the JIT for any call. This description includes a specification of which arguments are passed in which registers, which are passed on the stack and in which order, how return values are returned, and whether the caller or callee pops arguments from the stack.
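A calling-convention description of the kind listed above might look like the following sketch. The types and functions (CallingConvention, ArgLocation, assignArgs, callerCleanupBytes) are hypothetical and are not StarJIT’s actual interface; the sketch also assumes that every argument occupies one machine word.

```cpp
#include <cstddef>
#include <vector>

enum class Reg { None, EAX, ECX, EDX };
enum class PushOrder { LeftToRight, RightToLeft };

// What the VM tells the JIT about one kind of call.
struct CallingConvention {
    std::vector<Reg> argRegs;  // registers used for the leading arguments, in order
    PushOrder pushOrder;       // order in which the remaining arguments are pushed
    Reg returnReg;             // where a word-sized return value is found
    bool calleePops;           // true if the callee removes stack arguments
};

struct ArgLocation {
    Reg reg;                // Reg::None means the argument is on the stack
    std::size_t pushIndex;  // 0 = pushed first; meaningful only for stack arguments
};

// Assigns a location to each of `argCount` word-sized arguments.
std::vector<ArgLocation> assignArgs(const CallingConvention& cc, std::size_t argCount) {
    std::vector<ArgLocation> locs(argCount);
    const std::size_t inRegs = argCount < cc.argRegs.size() ? argCount : cc.argRegs.size();
    const std::size_t onStack = argCount - inRegs;
    for (std::size_t i = 0; i < argCount; ++i) {
        if (i < inRegs) {
            locs[i] = {cc.argRegs[i], 0};
        } else {
            const std::size_t k = i - inRegs;  // position among the stack arguments
            locs[i] = {Reg::None,
                       cc.pushOrder == PushOrder::LeftToRight ? k : onStack - 1 - k};
        }
    }
    return locs;
}

// Bytes the caller must pop after the call returns (0 when the callee pops).
std::size_t callerCleanupBytes(const CallingConvention& cc, std::size_t argCount,
                               std::size_t wordSize = 4) {
    if (cc.calleePops) return 0;
    const std::size_t inRegs = argCount < cc.argRegs.size() ? argCount : cc.argRegs.size();
    return (argCount - inRegs) * wordSize;
}
```

A code generator that consults such a description for every call site needs no VM-specific branches of its own; supporting another VM means supplying a different description.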

A second issue we discovered involved differences in both the required parameters and their order for different helpers. For example, ORP’s rethrow helper requires the exception as a parameter but Rotor’s does not. In addition, ORP and Rotor’s castclass helpers have the object and type descriptor in different orders. We considered the use of wrapper stubs to convert between one set of conventions and the others. However, these wrappers complicate stack unwinding and incur additional performance overhead. Instead, we modified StarJIT via #ifdef to use Rotor’s conventions. A more general approach would have an interface that allows the VM to communicate the parameter conventions for the VM helper calls to the JIT.

There are a couple of differences between Rotor and ORP related to type-specific helpers. A number of helpers, including the ones for object allocation, type checks, and interface table lookups, involve types that are known at compile time. In these cases, Rotor returns different helpers for different types, based on a type passed in at compile time. Accordingly, we modified the helper function lookup in StarJIT’s VMInterface to require a type for all type-related helpers. There are also differences in exactly which of several type-related data structures are passed at compile time or runtime to these helpers. We abstracted this detail into VMInterface so that the VM-specific code can give StarJIT the correct data structure to pass.

Another challenge involved helpers that StarJIT expected that were not provided by Rotor. In most cases, these were helpers for 64-bit integer operations (e.g., shifts) not provided by Rotor. In these cases, the helper could easily be implemented within the Rotor-specific VMInterface. Some other cases reflect a more serious mismatch between StarJIT and Rotor. For example, Rotor provides an unbox helper that performs the necessary type check on a reference and then unboxes it. In StarJIT, however, the type check and the actual unbox are broken into separate operations at an early point with the hope of statically removing the type check via optimization. StarJIT expects a helper to perform the unbox-specific type check but generates a simple address calculation to do the actual unboxing. Rotor, on the other hand, only provides a helper to perform the entire unbox. For now, we use the castclass helper instead to perform the unbox type check. However, this approach fails when the unboxed reference is a boxed enumeration type and will have to be corrected.
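As an illustration of how small such missing helpers can be, here is a sketch of three 64-bit shift helpers written in portable C++. The function names are hypothetical and this is not the code we added. The sketch assumes that shift counts are masked to the range 0 to 63, as C# does for 64-bit operands; the CLI itself leaves the result unspecified for counts of 64 or more.

```cpp
#include <cstdint>

// Masks the count to 0..63 (an assumption of this sketch, matching C# semantics).
inline unsigned maskShiftCount(std::int32_t count) {
    return static_cast<unsigned>(count) & 63u;
}

// Left shift. Performed on the unsigned representation to avoid signed overflow.
std::int64_t helperShl64(std::int64_t value, std::int32_t count) {
    return static_cast<std::int64_t>(static_cast<std::uint64_t>(value)
                                     << maskShiftCount(count));
}

// Logical (unsigned) right shift: vacated high bits are filled with zeros.
std::uint64_t helperShrUn64(std::uint64_t value, std::int32_t count) {
    return value >> maskShiftCount(count);
}

// Arithmetic right shift: vacated high bits are filled with the sign bit.
// For negative values, complement, shift logically, and complement back,
// which does not depend on how the compiler shifts signed integers.
std::int64_t helperShr64(std::int64_t value, std::int32_t count) {
    const std::uint64_t bits = static_cast<std::uint64_t>(value);
    const unsigned n = maskShiftCount(count);
    const std::uint64_t shifted = (value < 0) ? ~(~bits >> n) : (bits >> n);
    return static_cast<std::int64_t>(shifted);
}
```

On a 32-bit target the compiler expands these into the multi-instruction sequences that the JIT would otherwise have to emit inline.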

Finally, there are a number of helpers that Rotor provides that are not currently invoked by StarJIT. Some of these additional helpers are provided only to simplify portability: without them, Rotor’s FJIT would need assembly sequences specific to IA-32 and to PowerPC. Other helpers assist in debugging, while still more support additional functionality such as remoting. Up to this point, none of the applications that we have tried to execute with StarJIT have needed the additional functionality provided by these helpers. However, in the future, we plan to extend ORP’s VMInterface to enable StarJIT to query the VM and discover which of these additional helper functions must be called.

Code and JIT Managers

As part of our implementation of the multiple JIT support, we found we needed to use the other JIT manager in Rotor. We could not use a second instance of FJIT’s JIT manager because its implementation uses global variables to, for example, map program counters to methods and to manage memory. Two instances would have conflicting uses of these variables.

Another part of the runtime interface concerns stack walking activities such as root-set enumeration, exception propagation, and stack inspection. The Rotor design, like many other VMs, divides this task into one part that loops over the stack as a whole and another part that deals with individual stack frames. The loop part is in the VM proper and rightly so. Conversely, processing an individual stack frame depends upon the JIT’s stack conventions (e.g., where local and temporary variables of reference type are located and the location of callee-saves registers) and therefore requires the JIT’s cooperation. In Rotor, all processing of individual stack frames is done by the code manager.

The code manager that comes with Rotor makes many assumptions about JIT-compiled code for the IA-32 architecture:

The code for each method is expected to consist of a prologue, followed by the body, followed by an epilogue.
Only one epilogue may exist, and it must appear at the end of the compiled code.
The prologue and epilogue are precisely defined code sequences; no deviations are allowed.
Only ebp and esi are saved and available for use; ebp is used as a frame pointer, while esi is always a valid object reference (but possibly NULL). Registers ebx and edi may not be used.
The security object is at address ebp-8.
JITs give root-set information to the JIT manager in the form of an info block, which the JIT manager then passes to the code manager during root-set enumeration. This information is expected to match the particular structure of Rotor’s JIT.

These assumptions of Rotor’s code manager fundamentally conflict with those of StarJIT. We therefore decided to write our own code manager. This code manager has to be part of the VM, but we decided to try emulating Rotor’s interaction with the JIT by having this new code manager simply convert all its calls into calls to a runtime manager placed in the same DLL as the matching JIT. We defined an interface along the lines of corjit.h, and we allow a DLL to export a runtime manager as well as a JIT. The resulting architecture of Rotor with StarJIT is shown in Figure 3. Note that StarJIT uses a separate set of the three JIT-related components from FJIT, and that StarJIT’s JIT and Code Manager components communicate with the actual implementations that are loaded from a separate StarJIT DLL.
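The forwarding arrangement can be sketched as follows. The names FrameContext, IRuntimeManager, and ForwardingCodeManager and all signatures are hypothetical simplifications; Rotor’s code manager interface has more methods and richer parameters than shown here.

```cpp
#include <cstdint>
#include <functional>

// Simplified register context; real contexts carry far more state.
struct FrameContext {
    std::uintptr_t eip;
    std::uintptr_t esp;
    std::uintptr_t ebp;
};

// Called once per stack slot or register that holds an object reference.
using RootCallback = std::function<void(void** slot)>;

// Hypothetical interface exported by the JIT's DLL alongside the JIT itself.
// All knowledge of the JIT's frame layout lives behind this interface.
class IRuntimeManager {
public:
    virtual ~IRuntimeManager() = default;
    virtual bool unwindFrame(FrameContext& ctx, const void* infoBlock) = 0;
    virtual void enumerateRoots(const FrameContext& ctx, const void* infoBlock,
                                const RootCallback& report) = 0;
};

// Lives inside the VM. It contains no JIT-specific logic and only forwards
// each request to the runtime manager loaded from the DLL.
class ForwardingCodeManager {
public:
    explicit ForwardingCodeManager(IRuntimeManager& target) : target_(target) {}

    bool unwindStackFrame(FrameContext& ctx, const void* infoBlock) {
        return target_.unwindFrame(ctx, infoBlock);
    }

    void enumGcRefs(const FrameContext& ctx, const void* infoBlock,
                    const RootCallback& report) {
        target_.enumerateRoots(ctx, infoBlock, report);
    }

private:
    IRuntimeManager& target_;
};
```

Because the VM-side class only forwards, a different JIT can supply its own runtime manager without further changes to the VM.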

Figure 3: The structure of Rotor with StarJIT loaded from a separate DLL

We found this approach generally straightforward. However, the parameters passed to different code manager methods are inconsistent. For example, the method UnwindStackFrame gets an ICodeInfo object, which can be used to identify the method and some of its attributes, but FixContext does not. Also, these methods need to know the current values of registers for the frame that they are unwinding, fixing up, or enumerating the roots of, and there are different types of contexts for FixContext versus UnwindStackFrame and most of the other methods. We decided to reflect these inconsistencies in the external interface. Since StarJIT’s runtime interface is more uniform and requires the method handle for the method of the frame, we used the info block to pass the missing information from compile time to run time.

Another minor point is that UnwindStackFrame is sometimes called with the context esp equal to either the address just above the arguments of the out-going call, or the lowest address of the out-going arguments. In general, there is no way to tell which of the two cases holds. This situation is fine if frame pointers are used; the context ebp can be used to find everything in the frame. However, requiring frame pointers on IA-32 reduces the number of usable registers from 7 to 6. For now, we have modified StarJIT to use frame pointers.
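The reason frame pointers sidestep the ambiguous esp can be seen in a minimal unwinding sketch. It assumes the conventional layout in which the word at ebp holds the caller’s saved ebp and the next word holds the return address; StarJIT’s actual frame layout may differ, and FrameContext and unwindWithFramePointer are hypothetical names.

```cpp
#include <cstdint>

// Simplified register context (hypothetical).
struct FrameContext {
    std::uintptr_t eip;
    std::uintptr_t esp;
    std::uintptr_t ebp;
};

// Unwinds one frame using only ebp. The incoming ctx.esp is never read,
// so it does not matter which of the two possible values it holds.
inline void unwindWithFramePointer(FrameContext& ctx) {
    const std::uintptr_t* frame = reinterpret_cast<const std::uintptr_t*>(ctx.ebp);
    const std::uintptr_t callerEbp = frame[0];      // saved by the prologue
    const std::uintptr_t returnAddress = frame[1];  // pushed by the call instruction
    // The caller's esp is just above the return address. Under a callee-pops
    // convention the size of the incoming arguments would be added as well.
    ctx.esp = ctx.ebp + 2 * sizeof(std::uintptr_t);
    ctx.ebp = callerEbp;
    ctx.eip = returnAddress;
}
```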

Exception Handling

Another significant difference between Rotor and StarJIT concerns the details of exception propagation. Here, the differences stem directly from the characteristics of CLI and Java. In CLI, there are exception handlers, filters, finally blocks, and fault blocks. Each of these is a separate block of bytecode from the region being protected, and control cannot enter these blocks except through the exception mechanism. Conversely, in Java, there are only exception handlers and these protect a region of bytecode. When an exception is caught in Java, control is transferred to a handler address which can be anywhere in the method’s bytecode.

Since StarJIT was developed against the interfaces of ORP, which originally supported Java and was later adapted to also support CLI, StarJIT’s design reflects the Java exception mechanism. First, StarJIT implements finally and fault blocks by catching all exceptions and then rethrowing them. This behavior is close to but not exactly that required by the CLI specification, although it is correct for code compiled from C#. Second, there is a particular bytecode for leaving an exception handler and returning to the “main” code (a leave). Rotor requires the JIT at such a bytecode to call the runtime helper EndCatch. This helper cleans up stack state generated by the VM for exception handling and ensures that finally blocks are called. We modified StarJIT to call this helper since ORP does not have a corresponding helper. Finally, Rotor needs an exception handler to be compiled to a contiguous region of native code and it needs to know the start and end addresses of that region. StarJIT knows the start address, but not the end address, and might rearrange blocks so that a handler is no longer contiguous. We do not have a solution for this problem yet. For now, we give a zero end address—this causes Rotor to compute incorrect handler nesting depths, but otherwise seems to have no ill effect.
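The catch-and-rethrow strategy for finally and fault blocks can be illustrated with C++ exceptions. This is an analogy rather than StarJIT’s generated code: runWithFinally and runWithFault are hypothetical names, and the sketch does not model the points where the strategy departs from the CLI specification.

```cpp
// Runs `body`, then `cleanup`, in the style of a finally block implemented
// as "catch everything, run the cleanup, rethrow".
template <typename Body, typename Cleanup>
void runWithFinally(Body body, Cleanup cleanup) {
    try {
        body();
    } catch (...) {
        cleanup();  // exceptional exit: run the cleanup, then keep propagating
        throw;
    }
    cleanup();      // normal exit: the cleanup still runs
}

// A fault block differs only in that the cleanup runs on exceptional exit alone.
template <typename Body, typename Cleanup>
void runWithFault(Body body, Cleanup cleanup) {
    try {
        body();
    } catch (...) {
        cleanup();
        throw;
    }
}
```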

Conclusions About the JIT Runtime Interface

Rotor could be significantly improved through better support for multiple JITs. Key additions would be the following, all of which we have prototyped:

A mechanism for loading more than one JIT from a DLL and trying each JIT in turn when compiling a new method.
The ability to load an independently written JIT manager and code manager from a separate DLL (presumably packaged in the same DLL that holds the corresponding JIT). These should interact with the VM through an abstract interface such as those in corjit.h and corinfo.h.
A more consistent set of parameters for the code manager functions, to make for a more uniform interface.

Our experience integrating StarJIT with Rotor also led to changes in StarJIT. For example, StarJIT’s VMInterface had to be generalized to better support requests for type-specific helpers. We also found that StarJIT should allow calling conventions to be specified by the VM. Currently, we use #ifdefs in StarJIT’s source code to control calling conventions, but this makes the code hard to maintain and the resulting code less flexible. If StarJIT queried the VM about the calling conventions to use, it could adapt itself dynamically to the needs of the VM. Also, the design of a clean and flexible runtime helper interface is an interesting problem, and one we would like to address.

5 GC INTEGRATION

Rotor includes a C++ interface class, GCHeap, that declares most of the GC-related methods that the rest of Rotor uses. However, this GC interface is not as complete, explicit, or cleanly-defined as its JIT interface. In addition, Rotor does not support the dynamic loading of garbage collectors from DLLs. As a result, to integrate our GcV4 garbage collector into Rotor, we needed to add the GcV4 code directly to the Rotor VM code base. Our integration work involved three sets of changes: revising the implementation of Rotor’s GC interface to use GcV4, changing the rest of Rotor to use a new GC with different assumptions, and modifying GcV4 to run inside Rotor. This section discusses our experience integrating this collector, including the issues we encountered and our solutions.

Probably the most significant issue we found is that Rotor exposes too much about the implementation of its collector to other components in the system. For example, examining the methods in Rotor’s GCHeap interface class reveals that Rotor assumes a generational collector that treats large objects differently than small ones, and that allows clients to query whether an object is part of the ephemeral generation. Much of this is likely to change if Rotor’s GC is replaced with another GC. As another example, the Rotor VM uses knowledge about the collector’s implementation to allow JITs to emit optimized code. The VM’s function JIT_TrialAlloc::EmitCore can be called by JITs to emit code for the allocation fast path for many types of objects. That code assumes intimate knowledge of the GC’s data structures and object-allocation strategies.

Although Rotor’s GCHeap interface does not hide enough about the implementation of its GC, the GCHeap interface did help to reduce the number of changes we needed to make to the rest of Rotor. GCHeap is a public interface that gives other components access to most of Rotor’s memory management functionality, for example, object allocation and the registration of objects to be finalized. The GCHeap implementation often makes calls on a low-level C++ class, gc_heap, that exposes the low-level methods and data structures of Rotor’s own GC implementation. By modifying GCHeap’s implementation to call GcV4 methods instead of those in gc_heap, we minimized the number of changes required to Rotor and were able to localize many of the changes to just the GC-related files.

Other changes were needed to the Rotor VM. For example, we added calls to initialize and close down GcV4. We modified Rotor’s thread constructors and destructors to keep GcV4 up-to-date with respect to thread existence. Finally, we modified JIT_TrialAlloc::EmitCore to no longer make assumptions about the collector’s data structures. Instead, the allocation stub emitted by EmitCore now first calls a “fast” GcV4 allocation function that succeeds if sufficient memory is readily available, but returns NULL otherwise. If this fast allocation routine fails, the stub emitted by EmitCore falls through to a slow-path allocation routine that may trigger a garbage collection. While we initially hard-coded the name of the GcV4 fast-path allocation routine in EmitCore, we soon realized that this interaction between EmitCore and the GC could be generalized by adding a method to GCHeap that returns a pointer to the GC’s fast allocation function.
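The fast-path/slow-path arrangement described above can be sketched in ordinary C++ rather than emitted machine code. This is only an illustration under stated assumptions: the toy bump allocator, the names toy_gc, FastAllocFn, get_fast_alloc_function, and allocate_object, and the "collection" that simply resets the nursery are all hypothetical and do not reflect the actual GcV4 or Rotor code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical signature of a GC's fast-path allocator: it returns
// nullptr when sufficient memory is not readily available.
using FastAllocFn = void* (*)(std::size_t size);

namespace toy_gc {
alignas(8) unsigned char nursery[64];  // tiny allocation area for the demo
std::size_t used = 0;                  // bytes handed out so far
int collections = 0;                   // number of slow-path "collections"

// Fast path: bump-pointer allocation with no collection.
void* fast_alloc(std::size_t size) {
    size = (size + 7) & ~std::size_t{7};  // round up to 8-byte granules
    if (size > sizeof(nursery) - used) {
        return nullptr;
    }
    void* result = nursery + used;
    used += size;
    return result;
}

// Slow path: may trigger a collection. The toy version reclaims
// everything, which a real collector would of course not do.
void* slow_alloc(std::size_t size) {
    ++collections;
    used = 0;
    return fast_alloc(size);
}
}  // namespace toy_gc

// Hypothetical accessor of the kind suggested in the text: the VM asks
// the GC for its fast allocation function instead of hard-coding a name.
FastAllocFn get_fast_alloc_function() {
    return &toy_gc::fast_alloc;
}

// What the allocation stub does, expressed as a plain function.
void* allocate_object(std::size_t size) {
    assert(size > 0);
    static const FastAllocFn fast = get_fast_alloc_function();
    if (void* object = fast(size)) {
        return object;                 // fast path succeeded
    }
    return toy_gc::slow_alloc(size);   // fall through to the slow path
}
```

The point of the accessor is that the stub generator depends only on a function pointer with an agreed signature, not on the collector's internal data structures.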

Similarly, GcV4 expects the VM to supply a number of functions. One especially important function, used at the start of a garbage collection, requests that the VM stop all threads and enumerate all roots. Since stopping (and restarting) threads in Rotor requires a very specific sequence of events, we reused much of the existing Rotor code for this purpose. We also reused the two CNameSpace methods GcScanRoots and GcScanHandles to do root-set enumeration by passing them our own GcV4 callback function instead of Rotor’s own.
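The callback-based root enumeration described above might look roughly like the sketch below. It is a simplified model, not Rotor's code: ToyVM, RootCallback, scan_roots, scan_handles, and collect_root_set are hypothetical stand-ins, and the real GcScanRoots and GcScanHandles signatures are not reproduced here.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Object {
    int payload;
};

// Hypothetical callback type: the VM reports each root slot to the collector.
using RootCallback = std::function<void(Object** slot)>;

// Toy VM that owns the knowledge of where roots live and how to stop threads.
class ToyVM {
public:
    void add_stack_root(Object** slot) { stack_roots_.push_back(slot); }
    void add_handle(Object** slot) { handles_.push_back(slot); }

    void stop_threads() { threads_stopped_ = true; }
    void resume_threads() { threads_stopped_ = false; }
    bool threads_stopped() const { return threads_stopped_; }

    // Both scanners are collector-agnostic: they only invoke the callback.
    void scan_roots(const RootCallback& report) const {
        assert(threads_stopped_);  // roots are only stable while stopped
        for (Object** slot : stack_roots_) report(slot);
    }
    void scan_handles(const RootCallback& report) const {
        assert(threads_stopped_);
        for (Object** slot : handles_) report(slot);
    }

private:
    std::vector<Object**> stack_roots_;
    std::vector<Object**> handles_;
    bool threads_stopped_ = false;
};

// Collector side: supplies its own callback to the VM's existing scanners.
std::vector<Object**> collect_root_set(ToyVM& vm) {
    std::vector<Object**> roots;
    vm.stop_threads();
    auto record = [&roots](Object** slot) {
        if (*slot != nullptr) roots.push_back(slot);  // skip empty slots
    };
    vm.scan_roots(record);
    vm.scan_handles(record);
    // A real collection would trace and possibly move objects here.
    vm.resume_threads();
    return roots;
}
```

Because the scanners take the callback as a parameter, swapping collectors only changes what the callback does with each slot.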

Figure 4: Rotor and ORP expect different object layouts; our final layout

Integration Issues and Solutions

In the course of our integration, we found a number of conflicts between the assumptions made by GcV4 and Rotor about the layout of several key data structures. These are listed below along with our solutions.

Object Layout. Since GcV4 was originally developed for ORP, GcV4 expected objects to use ORP’s memory layout. Specifically, GcV4 assumed that each object began with a pointer to the vtable, followed immediately by ORP’s multi-use obj_info field. This field holds synchronization, hash code, and garbage collection state, and so resembles Rotor’s sync block index. However, Rotor places other object data at a four-byte offset from the start of an object, where ORP puts obj_info, and expects the sync block index to be at a negative four-byte offset. Realistically, too many parts of Rotor depend on this layout to change it. Also, too many parts of Rotor use the sync block index in ways incompatible with GcV4’s use of the obj_info field, so mapping obj_info to the sync block index is not a solution. The different layouts expected by Rotor and ORP are shown in the left-hand side of Figure 4.
Our solution was to place the ORP obj_info field before each object, at a negative eight-byte offset from the object’s vtable pointer. This offset does not conflict with any part of Rotor’s object layout. As a result, no Rotor component is aware of the extra field. The right-hand side of Figure 4 shows our final object layout.
Vtable Layout. GcV4 assumed that the first four bytes of each vtable hold a pointer to a structure containing GC-related information that indicates, for example, whether the object contains pointers and if so, the offset of each pointer. The start of Rotor’s MethodTable structure, however, contains the component size (for array objects and value classes), the base size of each instance of this class, and a pointer to the corresponding class structure (EEClass). There are many places in Rotor that assume specific offsets to these fields, so changing the field layout would raise many problems.
Storing the pointer at a negative offset from the start of the vtable is also not an option. That would interfere with Rotor’s CGCDesc and CGCDescSeries structures, which are stored before the vtable if the class contains pointers. These structures are used by FJIT as well as Rotor’s collector, so we could not use that space for our pointer.
We solved this by reserving space in Rotor’s MethodTable class at a sufficiently high offset to avoid conflicts with Rotor’s fields.
Thread Layout. GcV4 assumed that a portion of each thread’s data structure is storage reserved for its use, which is an essential part of ORP’s object allocation and garbage collection strategies. However, Rotor does not have an analogous field in its thread data structure. Our solution was to add the extra storage at the end of thread objects.
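The object-layout solution above, a collector-private field hidden at a negative offset in front of the object, can be sketched as follows. This is a minimal illustration, not the actual Rotor or GcV4 code: HiddenHeader, allocate_with_header, obj_info, sync_block_index, and free_with_header are hypothetical names, and both header fields are modeled as 32-bit words so that the offsets are -8 and -4 as in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical hidden header that precedes every object.
// Relative to the object start (where the vtable pointer lives):
//   offset -8: collector-private obj_info word
//   offset -4: sync block index expected by the VM
struct HiddenHeader {
    std::uint32_t obj_info;
    std::uint32_t sync_block_index;
};
static_assert(sizeof(HiddenHeader) == 8, "header must occupy exactly 8 bytes");

// Allocates zeroed storage for header plus body and returns a pointer to
// the object start, so VM code never sees the extra field.
void* allocate_with_header(std::size_t body_size) {
    void* raw = std::calloc(1, sizeof(HiddenHeader) + body_size);
    if (raw == nullptr) throw std::bad_alloc();
    return static_cast<unsigned char*>(raw) + sizeof(HiddenHeader);
}

// Collector-side accessor: reaches back 8 bytes from the object start.
std::uint32_t& obj_info(void* object) {
    assert(object != nullptr);
    return *reinterpret_cast<std::uint32_t*>(
        static_cast<unsigned char*>(object) - 8);
}

// VM-side accessor: reaches back 4 bytes from the object start.
std::uint32_t& sync_block_index(void* object) {
    assert(object != nullptr);
    return *reinterpret_cast<std::uint32_t*>(
        static_cast<unsigned char*>(object) - 4);
}

void free_with_header(void* object) {
    assert(object != nullptr);
    std::free(static_cast<unsigned char*>(object) - sizeof(HiddenHeader));
}
```

The two accessors never overlap, which mirrors the observation that the extra field does not conflict with any part of the VM's own layout.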

Conclusions About the GC Integration

Rotor should expand and abstract its GC interface so that it more resembles its JIT interface. The parts of the current interface that are only applicable to one style of garbage collector should be removed and replaced with more generic versions that support the introduction of different kinds of garbage collectors. Moreover, Rotor should use functions to abstract the interactions between the collector and other Rotor components. These functions would hide details about the collector’s implementation and help to make explicit the assumptions it makes. Such a GC interface would make it easier to modify the collector and to experiment with new implementations without affecting other components. Our experience with ORP’s GC interface has been strongly positive, and it has allowed us to use several different collector implementations without changing the VM or JITs.

To enable easier GC experimentation, it would help if Rotor’s GC could be dynamically loaded like its JITs. New collectors could then be plugged into Rotor, including ones tailored for particular needs, such as when an application needs short GC pause times more than high throughput. Changing Rotor to dynamically load its GC would also help to minimize assumptions made by the VM or other components.

To minimize the problems caused by assumptions about the layout of key data structures, Rotor’s GC interface could include a function that returns the offset of such fields as its sync block index. This would avoid other components assuming a fixed constant for that value. A similar interface function would also help ORP, GcV4, and StarJIT by reducing the number of their layout assumptions.
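A layout-query function of the kind suggested above could take a form like the following sketch. The interface and class names (ObjectLayout, header_word_offset, NegativeOffsetLayout, AfterVtableLayout) are hypothetical, and the two offsets merely model the two styles discussed earlier: a header word four bytes before the object start, and one placed directly after the vtable pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical interface: components ask for the offset of the per-object
// header word instead of assuming a fixed constant.
class ObjectLayout {
public:
    virtual ~ObjectLayout() = default;
    // Byte offset of the header word relative to the object start.
    virtual std::ptrdiff_t header_word_offset() const = 0;
};

// Header word stored four bytes before the object start.
class NegativeOffsetLayout : public ObjectLayout {
public:
    std::ptrdiff_t header_word_offset() const override { return -4; }
};

// Header word stored immediately after the vtable pointer.
class AfterVtableLayout : public ObjectLayout {
public:
    std::ptrdiff_t header_word_offset() const override {
        return static_cast<std::ptrdiff_t>(sizeof(void*));
    }
};

// Client code that works with either layout because it uses the query.
std::uint32_t read_header_word(const ObjectLayout& layout, const void* object) {
    assert(object != nullptr);
    std::uint32_t value;
    std::memcpy(&value,
                static_cast<const unsigned char*>(object) +
                    layout.header_word_offset(),
                sizeof value);
    return value;
}

void write_header_word(const ObjectLayout& layout, void* object,
                       std::uint32_t value) {
    assert(object != nullptr);
    std::memcpy(static_cast<unsigned char*>(object) +
                    layout.header_word_offset(),
                &value, sizeof value);
}
```

A JIT that emits inline accesses could query the offset once at startup and embed it as a constant, so the indirection need not cost anything on the fast path.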

6 STATUS AND FUTURE WORK

When we started our integration work, we wondered how suitable Rotor would be as a research platform. That is, how difficult would it be to add our optimizations and what changes to Rotor would be needed to support them? Our plans were to add StarJIT and GcV4, then later implement in Rotor a number of optimizations such as our synchronization techniques, prefetching, and DPGO. This paper describes the approaches we took to integrate StarJIT and GcV4, and our experience with that effort.

The StarJIT integration was straightforward except for a few issues. While most of the changes needed were within StarJIT, we found that we had to modify Rotor to add support for multiple JITs and to add a new code manager for StarJIT. We also needed to add another JIT manager to Rotor, because we could not create a second instance of FJIT’s JIT manager: its implementation depends on global variables. Although Rotor allows JITs to be loaded dynamically, and communicates with those JITs using its abstract JIT interface, Rotor does not allow JIT or code managers to be loaded dynamically. Adding new code or JIT managers requires modifying Rotor itself, although abstract interfaces for these managers could be added to Rotor without much trouble. Later, we expect to add support for some of the more sophisticated StarJIT optimizations such as DPGO by augmenting Rotor’s JIT interface with a new abstract superclass that defines the required functions.

We found that adding a new garbage collector to Rotor was much more difficult than integrating a new JIT. Rotor does not have a clean interface for GCs that resembles its JIT interface. Its GCHeap class, for example, exposes details about the GC’s implementation that are used by several other parts of the system including FJIT, so adding a different implementation required changing those parts. We tried to minimize the changes to Rotor, but a number of changes were needed, for example, to have Rotor call functions in the GC interface that GcV4 exports. Both Rotor and GcV4 make assumptions about the layout of objects and virtual-method tables, so it was necessary to modify our GcV4 implementation to place the fields that GcV4 needs (such as one used to hold a forwarding pointer during collections) in locations that do not conflict with fields required by Rotor.

Our work integrating StarJIT and GcV4 with Rotor is ongoing. We can run a number of test programs and are currently getting our modified Rotor to work with the C# version of the SPEC JBB2000 benchmark [8]. Our plans for StarJIT include adding support for pinned objects and full support for CLI exceptions (such as filters), as well as support for our optimization technologies such as DPGO and prefetching. We are optimistic about being able to complete this work and look forward to exploring other opportunities for improving Rotor’s performance.

REFERENCES

[1] A.-R. Adl-Tabatabai, J. Bharadwaj, D.-Y. Chen, A. Ghuloum, V. Menon, B. Murphy, M. Serrano, and T. Shpeisman. "The StarJIT Compiler: A Dynamic Compiler for Managed Runtime Environments". Intel Technology Journal, 7(1), February 2003. Available at http://intel.com/technology/itj/2003/volume07issue01/art02_starjit/p01_abstract.htm.

[2] A.-R. Adl-Tabatabai, R. Hudson, M. Serrano, and S. Subramoney. "Prefetch injection based on hardware monitoring and object metadata". In SIGPLAN Conference on Programming Language Design and Implementation, Washington, DC, USA, June 2004.

[3] M. Cierniak, M. Eng, N. Glew, B. Lewis, and J. Stichnoth. "Open Runtime Platform: A Flexible High-Performance Managed Runtime Environment". Intel Technology Journal, 7(1), February 2003. Available at http://intel.com/technology/itj/2003/volume07issue01/art01_orp/p01_abstract.htm.

[4] J. Dean, D. Grove, and C. Chambers. "Optimization of object-oriented programs using static class hierarchy analysis". In Proceedings of European Conference on Object-Oriented Programming, pages 77–101, Aarhus, Denmark, Aug. 1995. Springer-Verlag (LNCS 952).

[5] ISO/IEC 23270 (C#). ISO/IEC standard, 2003.

[6] ISO/IEC 23271 (CLI). ISO/IEC standard, 2003.

[7] Microsoft. Shared source common language infrastructure. Published as a Web page, 2002. See http://msdn.microsoft.com/net/sscli.

[8] Standard Performance Evaluation Corporation. SPEC JBB2000, 2000. See http://www.spec.org/jbb2000.

[9] D. Stutz, T. Neward, and G. Shilling. Shared Source CLI Essentials. O’Reilly, Mar. 2003.

About the authors

Todd Anderson is a staff researcher in Intel’s Programming Systems Lab. Todd joined Intel in 1999. He received his M.S. and Ph.D. degrees in Computer Science from the University of Kentucky. Todd has worked in a variety of areas including distributed file systems, distributed computing, and IETF forwarding/control separation, and is currently focused on memory management in managed runtime environments.

Marsha Eng is a researcher in Intel’s Programming Systems Lab. Marsha joined Intel in 2001, with an M.S. degree in Computer Engineering from the University of California, San Diego, and a B.S. degree, also in Computer Engineering, from the University of Washington.

Neal Glew is a staff researcher in Intel’s Programming Systems Lab. He received a Ph.D. degree in Computer Science from Cornell University in January 2000.

Brian Lewis is a senior staff researcher in Intel’s Programming Systems Lab. Brian joined Intel in 2002. He previously worked at Sun, Olivetti Research, and Xerox. While at Sun Microsystems Laboratories, Brian worked on the development of virtual machines for several languages. He also worked on techniques for binary translation as well as portions of the Spring research operating system. Brian received Ph.D. and M.S. degrees in Computer Science and a B.S. degree in Mathematics from the University of Washington.

Vijay Menon is a staff researcher in Intel’s Programming Systems Lab. He received a B.S. from the University of California, Berkeley in Electrical Engineering and Computer Science and a Ph.D. from Cornell in Computer Science. His current research interests include program analysis, dynamic compilation, and managed runtime environments.

James Stichnoth is a senior staff researcher in Intel’s Programming Systems Lab. Jim joined Intel in 1997, with a Ph.D. degree in Computer Science from Carnegie Mellon University and a B.S. degree in Computer Science from the University of Illinois at Urbana-Champaign. He has worked extensively on Virtual Machines, Just-In-Time Compilers, and development of enterprise Java applications. Jim currently leads a group researching Virtual Machine technology.

Cite this document as follows: Todd Anderson, Marsha Eng, Neal Glew, Brian Lewis, Vijay Menon, James Stichnoth: "Experience Integrating a New Compiler and a New Garbage Collector Into Rotor", in Journal of Object Technology, vol. 3, no. 9, October 2004, Special issue: .NET Technologies 2004 workshop, pp. 53-70. http://www.jot.fm/issues/issue_2004_10/article3