Flexible Language Interoperability
Torbjörn Ekman,
Dept. of Comp. Sci.,
Lund University
Peter Mechlenborg,
DAIMI,
University of Aarhus
Ulrik Pagh Schultz,
Maersk Institute,
Univ. of South Denmark
|
 |
REFEREED ARTICLE

PDF Version |
Abstract
Virtual machines raise the abstraction level of the execution environment at the cost
of restricting the set of supported languages. Moreover, the ability of a language
implementation to integrate with other languages hosted on the same virtual machine
typically constrains the features of the language. In this paper, we present a highly
flexible yet efficient approach to hosting multiple programming languages on an objectoriented
virtual machine. Our approach is based on extending the interface of each
class with language-specific wrapper methods, offering each language a tailored view
of a given class. This approach can be deployed both on a statically typed virtual
machine, such as the JVM, and on a dynamic virtual machine, such as a Smalltalk
virtual machine.
We have implemented our approach to language interoperability on top of a prototype
virtual machine for embedded systems based on the Smalltalk object model, which
provides interoperability for embedded versions of the Smalltalk, Java, and BETA
programming languages.
1 INTRODUCTION
Virtual machines are at the foundation of modern, object-oriented languages such as
Java and C# . Virtual machines raise the level of abstraction and enable dynamic
features such as code mobility, which is central to the emerging trend of pervasive
computing. The virtual machines designed to support Java and C# (JVM and
CLR) have significant differences, but are nonetheless designed to support very
similar languages with similar object models [3, 13]. Although only the CLR is
designed to support multiple programming languages, both virtual machines have
been targeted by a multitude of languages [3, 8, 1, 6, 16]. Languages that compile
to the instruction set and object model of the target virtual machine have obvious
advantages compared to interpretation: apart from an order of magnitude in terms
of performance, compilation is also essential for enabling two-way interoperability
with other languages, e.g., the ability not only to consume classes implemented in
other languages but also to produce classes that can be used from other languages.
The mapping from language to virtual machine must support the semantics of
the language and is normally designed to produce code that is as efficient as possible.
Nevertheless, this same mapping is critical for interoperability with objects
implemented in other languages. Naming is a simple example of a potential conflict, and is a fundamentally challenging issue when considering languages with different
naming schemes, such as Java with type-based overloading and Smalltalk with
keyword-based selectors. To enable programming to an interface independently of
the implementation language, naming must somehow be transparent across languages,
which constrains the mapping of each language to the virtual machine. Differences
in type systems are also dependent on the language mapping. For example,
preserving static typing in a Java object interacting with Smalltalk requires dynamic
checking of parameter and return types. For the sake of performance, it is critical
that such dynamic checking is done only when required, that is, when crossing language
barriers. In general, language implementors are faced with a fundamental
choice: either obey some common language mapping, potentially constraining the
language implementation in terms of features or efficiency, or use a dedicated language
implementation, obscuring the appearance of functionality made available to
other languages.
We have developed a highly flexible approach to multi-language virtual machines
based on run-time or compile-time extension of class interfaces to support integration
between heterogeneous languages. Language integration is achieved without enforcing
a common implementation-level language subset, by enabling components to
exchange implementation requirements using a protocol based on shared, languageindependent
interface definitions. Our system thus provides a very high degree
of freedom for the language implementor while preserving component encapsulation
and ensuring full interoperability between classes implemented in different languages.
The contributions of our work are as follows.
- Simple and efficient approach to language interoperability based on run-time
or compile-time extension of method tables to accommodate an open set of
programming languages.
- Rich cross-language interaction including cross-language inheritance and runtime
type checking for preserving static typing invariants.
- Experimental demonstration of the feasibility of our interoperability approach
using realistic implementations of the Smalltalk, Java and BETA programming
languages.
We have implemented our approach to language interoperability on top of a prototype
dynamic virtual machine for embedded devices. This virtual machine currently
supports embedded versions of the Smalltalk, Java, and BETA programming languages.
Motivation
Our work has been conducted in the context of the Palpable Computing (PalCom)
project (www.ist-palcom.org) which concerns, among other things, the development of a virtual machine for embedded systems. This virtual machine, named PalVM,
is based on a Smalltalk object model, but also provides support for the Java and
Beta programming languages [9, 4, 14]. Software is currently being developed in
all three languages, but language integration is ad-hoc based on language-specific
foreign function interfaces that result in a strong coupling between implementations
otherwise isolated by interfaces.
 Figure 1: Network communication in multiple languages?
As an example, consider the classes shown in Figure 1. The Java class Socket provides a basic means of communication using the UDP protocol. We wish to
use this class from Smalltalk, in the implementation of the class Connection which
encapsulates a remote destination. Both classes are compiled to the same virtual
machine instruction set, and both compilers use similar calling conventions, but this
integration is nevertheless non-trivial. The naming conventions between Java and
Smalltalk are incompatible, and runtime type checking is required to preserve static
type checking within Java. As a starting point, we could modify the implementation
of Connection to use the Java naming convention when calling methods; to support
overloading, the Java compiler embeds types into the selector name. Moreover,
we would probably need to convert the Smalltalk ByteArray object given as an
argument to the Java native representation. Suppose however that we also wish to use the class MessageRedirector implemented in BETA, which wraps a socket
and redirects messages to a new address. Following the same path, this class must
presumably now also use the Java naming convention to be compatible with the
existing implementation. This renaming is obviously contagious and detrimental to
defining a clean API using the proper naming conventions of each language (e.g.,
selectors in Smalltalk versus type-based overloading in Java). The Java naming
conventions cannot be used globally because they are incompatible with Smalltalk
and BETA (the former has no types while the latter for example allows multiple
return values). Furthermore, this approach makes it difficult to reuse code, e.g.,
replacing the Socket class by another class implemented in a different language.
As an alternative to manually implementing glue code for integrating implementations
across languages, we have implemented a language interoperability framework
which allows classes such as the ones shown in Figure 1 to interoperate. More
generally, in the PalCom project, software is developed in different programming
languages and deployed as fine-grained components. Exchangeability of components
is essential, as is runtime efficiency due to limited resources of embedded
systems. In general, we require the following properties from a solution: (1) Sharing of classes and object instances by supporting method invocation and inheritance
between different languages. (2) Insulation for method calls that cross languages to
allow run-time checks of parameters and return values as well as conversion of value
objects. (3) Scalability to a large, open set of languages that are not aware of each
other. (4) Flexible naming to make classes written in one language natural to use
from other languages. (5) Interchangeability of a class with classes written in other
languages but respecing the same interface. Moreover, since we work in an embedded
setting, efficiency in terms of memory usage and execution overhead is essential.
We note that the .NET approach fails on items (2) and (4) (see Section 6 for a
discussion), and that item (5) is essential for properly supporting polymorphism in
a multi-language setting.
2 LANGUAGE INTEROPERABILITY
Object-oriented languages vary significantly in terms of basic features such as inheritance
(single, multiple, mixins, . . . ), how objects are created (constructors or
factory methods), typing system and so on. Nevertheless, it is our hypothesis that
most object-oriented languages can be compiled to a given virtual machine so long
as it does not enforce a strict coupling between interface and implementation. Examples
of such virtual machines are the JVM and the CLR (where interfaces can be
implemented by any class containing the required methods) but also any dynamically
typed virtual machine (here the concept of an interface is completely dynamic).
Naturally, certain features such as dynamic typing lends itself best to a dynamically
typed virtual machine; we return to this issue later, but basically it is mostly independent
of our approach. In general, given a set of object-oriented languages
that compile to a given virtual machine, our goal is to enable basic interoperability between these languages, in the form of message passing and inheritance. This
basic interoperability scheme is characterized in terms of a basic object model that
dictates both what a language must provide to be accessible to classes implemented
in other languages and also what features a language can consume from classes
implemented in other languages.
On a virtual machine where different languages are compiled to the same basic
object model, interoperability becomes an issue of duck typing: “if it walks like a
duck and it talks like a duck then it is a duck.”1. In other words, if objects implemented
in one language can be made to behave as objects implemented in another
language, then they can be considered to be objects implemented in that language.
Thus, the complexity of supporting a very variable set of language features basically
boils down to having an interoperable naming scheme. On a typed virtual machine
such as the JVM or the CLR, interfaces are used to express this common method
naming scheme at the virtual machine level. Nevertheless, fixing a specific naming
scheme limits compiler writers and impedes enriching cross-language interaction
with features such as insulation. Rather, our approach relies on independent naming
schemes for each language based on a symbolic, language-independent Interface
Description Language (IDL) description of classes.
The issue of how to support different basic libraries, thread models etc. is not
considered relevant for our work: memory constraints dictate that a single library
must be used, but it is nevertheless highly relevant to make this library convenient
to use from each language supported by the virtual machine.
Basic example
As an introductory example of our approach to language integration, consider a
Smalltalk virtual machine targeted by Smalltalk and Java (e.g., the PalVM virtual
machine). Java basically compiles to this dynamically typed virtual machine by
encoding type names into selector names, e.g., the signature of the Java method“String substring(int index)” can be encoded as “substring String int:”. The
unit of deployment supported by this virtual machine is a component, which encapsulates
a set of classes. In the context of our work, the classes contained within a
component must all be implemented in the same language, but can export interfaces
using a language-independent notation (e.g., an IDL).
The Smalltalk class Point shown in Figure 2, left trivially compiles to this virtual
machine. This class defines x and y coordinates, accessor methods, and a textual display
method. The class Point is contained within the component Geometry (which is
implemented in Smalltalk). The class is made accessible to other languages through
the component IDL interface graphics shown in Figure 2, right. This IDL interface
defines the operations of the class including the types of the arguments and the return value. (For this example we assume that the types int, String, and PrintStream are identical in all languages supported by the virtual machine.) By implementing the graphics IDL interface, we say that Geometry becomes a producer of graphics.
 Figure 2: Point class in Smalltalk and interface in IDL

Figure 3: The class Point before and after linking with a Java component
Cross-language message passing
Let us now assume that the IDL interface graphics is consumed by another component
named Client implemented in Java, as shown in Figure 3, left. This figure
shows the component Geometry being used by the component Client, but prior to
IDL interface linking; interface linking is performed when a component is resolved
(which is at runtime on PalVM). After interface linking, the class Point contains a
number of new Java support methods that are inserted into the class by the component
loader during interface linking, as shown in Figure 3, right.
For each Smalltalk method defined by the IDL interface, a wrapper method is
introduced into the class Point, using the naming convention employed by the Java
compiler. There are two different kinds of wrapper methods: alias methods for
providing Java-specific names for Smalltalk implementation methods and insulation
methods for performing Java-specific processing around Smalltalk implementation
methods. Alias methods can be implemented using a standard wrapper method, but
on PalVM an alias method is implemented as an extra entry into the method table, pointing to an implementation method in the same class. Such VM-supported alias
methods provide a lightweight approach to decoupling language implementations
without requiring dynamic generation of bytecode. As an example, consider the IDL
method reset (see figure 2) which Java expects to have the name “reset_ _void_”
(the Java naming convention embeds argument and return types into the name.)
This dependency is handled by the component loader during interface linking by
introducing an alias into the concrete Smalltalk class Point:

This alias makes the concrete Smalltalk method available using the Java naming
convention. (We here use a Smalltalk pseudo-syntax to indicate aliases.)
Insulation methods
Insulation methods allow arbitrary code to be executed when crossing language
barriers. This can for example be used for checking the parameter types before a
method call proceeds to the original method in the producer, and checking the type
of the return value before returning to the consumer. As an example, consider the
IDL method getX (see Figure 2) in graphics. Since Java is statically typed, the type
of the return value from getX should be checked. Therefore a consumer insulation
method is inserted into Point, with the name getX_ _int_, which is defined as follows:

This method uses a static method from the class IDLJava to check the return type
of the method. The alias is used to bind the Java naming convention to the
Smalltalk naming convention without requiring dynamic code generation (the insulation
method can then be statically generated in advance by the Java compiler).
Using a similar approach, producer insulation methods can check the types of parameters
when calling Java code from another language.
Cross-language inheritance
A class described by an IDL interface can be subclassed by a class from a different
component, implemented in another language. This also makes the subclass
a consumer of the superclass. Figure 4, left, shows the class ColorPoint (defined
in the component ColorCanvas) which subclasses Point. This class introduces an
additional field for representing the color, and overrides the IDL method toString.
The overriding method is introduced into ColorPoint under the Java-based name“toString_ _String_”. Calling the method display on this object must cause the
newly defined method to be subsequently called, and not toString defined in Point,
e.g. Point should become a consumer of ColorPoint, regarding toString.
 Figure 4: The class ColorPoint which subclasses the class Point
The effect of making Point a consumer of ColorPoint is that a wrapper method for toString is
inserted into ColorPoint, thereby overriding the original Smalltalk implementation.
The resulting class can now be used from Java. In general, we refer to the set of
methods that all correspond to a single IDL method as a method group. Subclasses
should always override all methods in a method group.
A common object model
Our approach enables integration of components according to their IDL declarations.
The IDL defines the PalCom Common Type System (PCTS) which is the
common object model that the different programming languages in PalCom use to
communicate. To be interoperable with other languages, the features of a language
must thus be expressible in terms of the PCTS. We have chosen a minimal common
type system but with both interface and implementation inheritance, allowing us to
express both interface implementation and code reuse across languages (the latter is
essential for supporting frameworks). The choice of a minimal type system simplifies
the overall system and in particular simplifies the consumer role (because there are
fewer features with which to integrate) at the cost of complicating the producer role
(because there are fewer features to use when expressing the produced entity).
The PCTS has a common superclass Object and a set of primitive types which
are represented by the corresponding Smalltalk classes, including strings, and integers.
Based on these primitive types, concrete classes and interface classes can
be declared. Similarly to Java, there is single inheritance for concrete classes and
multiple inheritance for interface classes. A class contains a number of named operations
with named and typed parameters and a typed return value. Concrete classes
can moreover contain static operations. Parameter and return types are required
to be invariant across subclasses. Constructors can be specified for a concrete class
(constructors have the reserved name init in the IDL). There is no concept of fields,
so accessor methods must always be used to access fields across languages.
The PCTS type Object maps to the Smalltalk class Object which is the common
superclass of all object instances in PalVM. Parameters or return values of the type Object must thus either be converted to a local type or the language implementation
must accept arbitrary foreign objects from other languages. Objects can be
wrapped inside a proxy which provides reflective access to the foreign object. Alternatively,
language support methods, described in Section 3, can be used to adapt
arbitrary objects from other programming languages to make them compatible with
the language implementation.
3 FLEXIBLE LANGUAGE INTEROPERABILITY FRAMEWORK
Our approach to language interoperability is implemented in the flexible language interoperability
framework (FLIF). We now define the FLIF algorithms for supporting
cross-language method invocation and cross-language inheritance. Details regarding
our implementation are covered later.
We note that although our approach to language interoperability is expressed
in terms of classes organized into components, a component model is not essential.
The specific role of a component is to aggregate classes into a unit that is treated
uniformly, and to provide information about the set of interfaces provided by the
classes contained in the component. Thus, the component model need not be supported
by the virtual machine, but can simply be implemented by a mapping from
classes to their IDL interfaces and implementation language.
Cross-language method invocation
A component encapsulates a set of classes and contains interface export and import
annotations as well as an annotation identifying the language-specific name mapping
used in the implementation of the component. When the component is loaded, its
classes are processed by FLIF, which manipulates the representation of the classes
according to their relations to components implemented in other programming languages.
In more detail, a component may export selected classes through an IDL
interface, which makes the classes available to other components implemented in
any language supporting FLIF.
Notation. We use X: L = (A1;...;An) to denote a component X implemented in
the language L with annotations Ai. In this context, the relevant annotations are
import and export. An import annotation import I from X specifies consumption
of the interface I from the component X. Interfaces are specified in an interface definition
language (IDL), but may be automatically generated. An export annotation

Figure 5: Enabling cross-language method calls
specifies production of IDL interface I binding each IDL class Ci in IDL interface I to
a concrete class . An IDL interface defines concrete IDL classes Ci and (Java-style)
IDL class interfaces , as follows:
An IDL class simply defines a set of operations: C = (p1,...,pr).
The operation names in IDL classes are abstract in the sense that each language
has a specific mapping from IDL method names to concrete method names such
that any IDL operation name can be mapped to a unique concrete method name.
The concrete method names generated from the IDL for a given language must be
equivalent to the method implementation names generated from source code written
in this language, so that implementation language neutrality is retained (e.g., IDLgenerated
names are equivalent to native names).
A producer component contains a set of classes and an annotation. The component
annotation contains both the IDL interfaces and a binding from IDL classes to
concrete classes of the component; the concrete class can either be defined locally
or in a class required by the component. Consumer components contain a set of
annotations describing which IDL interfaces the component depends on.
When a component is resolved by the virtual machine, its dependencies on other
components are also resolved. In particular, IDL-defined dependencies on other
components are resolved, as follows. A component X written in language LX may
import IDL interfaces from any number of components. Let Y be one such component
which exports the IDL interface I:

For each IDL class C in the interface I, let C be implemented by the concrete class C’ written in the language LY and nested in the component Y. The processing of an
operation pi of class C’ taking parameters P is illustrated in Figure 5, and consists
of three steps that modify C’:
- Insertion of a producer wrapper method piLY ; the function
performs parameter insulation specific to the language LY.
- Insertion of a consumer wrapper method piLX which calls the producer wrapper method through an alias piLXA; the function
performs return value insulation specific to the language LX. The alias is used to decouple the implementation
of the consumer wrapper from the producer language.
- Insertion of the alias piLXA linking piLX to piLY .
It is the responsibility of the consumer wrapper method to call piLXA. If insulation is
not required for a given method, a direct alias is used instead as an optimization. On
a statically typed virtual machine, an interface containing these language-specific
methods would also be added to the class.
The class modifications are performed by the FLIF framework using a languagespecific
name mapper object obtained from the annotation on each component, as
described in Section 5. The language-specific name mapper object is also responsible
for introducing language-support methods into classes when appropriate, for
example on PalVM to allow a Smalltalk object to be used as a Java object (e.g., be
the equivalent of an instance of java.lang.Object).
The overall effect of this process is that wrapper methods are created for each
method in each IDL class, and language support for the language of the consumer
is established in the producer. After IDL dependencies have been resolved, simple
cross-language method invocation is supported from the consumer component to the
producer component. Insulation methods are named according to the calling conventions
of other languages, and hence are never used from consumers implemented
in the same language.
Cross-language object instantiation
A concrete IDL class can be instantiated as an object from any consumer language;
it is up to the concrete language to represent the constructor. Concretely, our
Java mapping uses standard Java constructors, whereas an instance method named init is generated for Smalltalk and Beta (neither of which have an explicit notion
of constructors). Constructor chaining as known from Java is supported but not
enforced between languages, which means calling the superclass constructor is up
to the programmer if not enforced by the compiler. If no constructor is specified in
the IDL, it is up to each language mapping to generate a default constructor.
Cross-language implementation inheritance
IDL class inheritance allows a subclass to not only extend the superclass with additional
state and behavior but also to override existing behavior. The PCTS objectmodel considers all fields private to the object, e.g., the fields of an object can only
be accessed from within methods that have been invoked on this object. Thus,
cross-language interoperability is indifferent with regards to adding new fields in a
subclass. New methods introduced in a subclass may need to call methods from the
superclass, which is supported by making the class a consumer of its superclass. In
effect, we use a black-box approach to inheritance where a consumer class can only
see those properties of the superclass declared in the IDL interface. Overriding existing methods from the superclass means that all consumers of
the superclass should also be able to use the overridden method. Conceptually,
the subclass overrides the method from the IDL class. In more detail, the entire
method group (the concrete method from the superclass and all of its consumer
wrapper methods) must be overridden. This overriding is implemented by introducing
wrapper methods for the overriding method into the subclass, one for each of
the languages used by the consumers of the superclass.
Concretely, consider a component Y exporting the interface I. The interface I is
imported by a number of components, Z1 to Zn, written in different languages. The
method group of operation p in C then consists of the original method p’ in C’ and
wrapper methods p’Z1 ,. . . ,p’Zn generated for the Zi components:

Now let X be a component that imports Y but also defines a subclass D of C that
overrides p with pD. The overriding method pD should be callable from Y and all
consumers of Y. This property is ensured by making all the consumers of Y, and Y itself, into consumers of X, regarding p. Following the algorithm described in
Section 3 ensures that wrapper methods p’, p’Z1 ,. . . ,p’Zn will be inserted into D,
and as a result, all consumers of Y and Y itself will be able to call pD.
Cross-language interface inheritance
In addition to IDL classes, the PCTS also supports IDL class interfaces. Supporting
subclassing of IDL class interfaces is very similar to supporting ordinary subclassing,
as described in the previous section. The difference is that IDL class interfaces are
an abstract construct, and therefore do not have a direct concrete counterpart.
Consequently, the only producers of IDL class interfaces are concrete classes that
implement the IDL class interface.
Analogously to the description given in the previous subsection, consider the
following definitions:

As before, the component X imports Y, but the class D is a now a subclass of the IDL
class interface CI. Being a subclass of CI, D is also a producer of CI. For this reason,
all consumers of CI are also consumers of D. Following the algorithm described in 3
ensures that D is usable from all languages that contains consumers of CI.
4 LANGUAGE INTEGRATION EXAMPLES
We now describe how language integration works for the PalVM versions of Smalltalk,
Java and BETA.
Flexible language integration for Smalltalk
Making Smalltalk a consumer in FLIF is done as follows. Concrete IDL classes in
PCTS can be mapped directly to Smalltalk classes, as described in Section 3. As
for class interfaces, since Smalltalk does not have an interface construct, implementing
an IDL class interface reduces to ensuring that all operations in the IDL class
interface are also present in the class that is implementing the IDL class interface.
When acting as producer, Smalltalk blocks are mapped to an interface with a single
operation named value.
While an untyped language can consume IDL interfaces produced by both typed
and untyped languages, it is difficult for typed languages to consume IDL interfaces
produced by untyped languages. The only type guarantees that an untyped language
gives is that every value has the type Object. To enable automatic generation of
more precise interfaces, type information can be annotated on method declarations
and used to generate IDL interfaces produced by Smalltalk. This optional typing [7]
annotation is not checked by the compiler and has no effect on the runtime semantics,
but is included in the component annotation. In effect, the annotation provides a
structured means for the programmer to document the interface of each Smalltalk
class, but is also used when integrating Smalltalk with other languages.
Flexible language integration for Java
Java packages are used to denote components, meaning that each Java package is
compiled to a separate component. The Java compiler automatically integrates with
FLIF both as a consumer and as a producer. Insulation methods are used to ensure
static typing of Java code; for a given component the Java compiler generates
consumer and producer insulation methods stored in a separate class. IDL classes
and IDL class interfaces are automatically imported from IDL descriptions in components
during interface linking. An IDL interface is visible as a package in Java
and all types defined in that interface are members of that same package. The Java
front-end takes these interfaces into account during name binding and type checking
and can thus statically type-check Java applications that consume classes written in other languages. Full static semantic analysis can thus be done for the Java example
in Figure 3, even though the superclass is implemented in Smalltalk. To perform
the semantic checking, the compiler automatically reads the description of the IDL
interface graphics. The compiler automatically produces IDL interfaces for Java classes. A Java
component will thus have an IDL interface generated for each package. Only language
elements that are supported by PCTS are exported, e.g., nested classes are
not included. From a consumer point of view, the developer need not be aware of the
IDL at all since it integrates transparently with Java. When acting as a producer,
the developer needs to ensure that APIs to classes that are intended to be used by
other languages can be expressed in IDL and the PCTS.
Flexible language integration for BETA
For the larger part, the BETA language integrates straightforwardly with the FLIF
on PalVM. When BETA plays the role of consumer, a BETA external class pattern
is used to describe concrete IDL classes. Here, a single pattern represents the class
and encapsulates a pattern for each method. BETA however does not support
multiple inheritance nor Java-style interfaces, so IDL class interfaces are represented
separately from the class hierarchy. Such interface types are obtained using special
cast operations introduced into all concrete classes.
When BETA plays the role of producer, only patterns nested directly within
the “program” pattern can be used from other languages (this corresponds roughly
to plain, non-inner classes in a Java program). Such a pattern P must contain a
nested pattern P' for each method declared in the IDL interface being produced;
the nested patterns P'i can then be used as methods from other languages. When
an instance of the concrete IDL class bound to P is being created, the origin is
implicitly passed as an argument by a custom constructor introduced into P (this is
possible because of the restriction to a unique origin reference, the program pattern
instance). Nested patterns not playing the role of methods must be obtained by
calling factory methods.
Note that in this paper we do not investigate the type system. See future work
for ideas on how type-checking can be optimized when integrating BETA objects
with a statically typed language.
Super vs. inner
Rather than the traditional super method combinator, BETA uses inner, which
propagates calls from superclass to subclass. For example, a function Inc which
computes f(x) = x+1 and a function IncAndDouble which computes f(x) = (x+1)*2 could be defined using method combination, as follows:

Invoking the method f on an instance of IncAndDouble dispatches to the pattern f in the superclass; only when the statement inner is executed does control dispatch
to the method f of IncAndDouble. The BETA compiler generates code for inner as
follows. The pattern f is represented using a method f that creates an instance of
the class Inc.f and then calls the method do on that instance. This method calls a
hook method inner, which is empty in Inc.f. The pattern IncAndDouble compiles
into classes IncAndDouble and IncAndDouble.f; the latter is a subclass of Inc.f. This
subclass overrides the method inner with the implementation of “i*2 -> i”.
Following the notion of black-box inheritance introduced in Section 3, we integrate
inner and super as follows. When a BETA pattern is subclassed through
FLIF, we allow all methods produced by the BETA pattern to be overridden using
the standard semantics of the consumer language. This means that a BETA method
which has been overridden is not called unless a super-send is generated from the
subclass. Conversely, when a a BETA pattern is a subclass of a class produced
through FLIF, BETA uses override semantics when methods from the superclass
are redefined.
FLIF limitations for BETA
There is no run-time cross-language type checking of formal parameters or return
values for calls to methods from other languages; we expect insulation methods can
be used exactly as for Java to provide a similar functionality. Moreover, external
class declarations cannot be generated automatically from IDL interfaces, but must
be written manually. The module system of BETA currently makes it impossible
to generate a set of external class declarations in a separate file that can both be
used from the program and refer to types in the rest of the program. We plan to
overcome this restriction by extending the BETA compiler to use a simple inclusion
mechanism for auto-generated files, similar to #include from C.
5 IMPLEMENTATION
Runtime infrastructure
FLIF is implemented in Smalltalk and has been integrated as a post-processing
phase for the PalVM component loader (also implemented in Smalltalk, as opposed
to the class resolver which is implemented by the virtual machine). When a component
has been loaded at run time, it is passed to the FLIF IDL-mapper object.
Based on the component annotations, all producers are inserted into the FLIF interface
repository. The repository contains a symbolic representation of the IDL interfaces with references to the concrete components and classes that implement
the interface. Once all producers have been placed in the interface repository, each
consumer in the component is associated with the corresponding producer, in effect
building a graph of dependencies. To complete the resolution process, any producer P (either a concrete IDL class or an IDL class interface) that has been modified is
processed. A producer may have been modified either because it is new, because a
new subclass has been added, or because new consumers have been associated with
it. Processing of P proceeds as follows. The set of implementation languages of the
consumers of P is computed, and each consumer of P is extended to support the
entire set of languages. Extension of a consumer proceeds as explained in the two
previous sections. Annotations on the producer and consumer components identify
a language-specific name mapper class which contains static methods that map IDL
names to language-specific names. Classes are processed with superclasses before
subclasses, to ensure that the language support is in place before processing the
subclass. Of static and dynamic virtual machines
Our approach to language interoperability is implemented on PalVM, a dynamically
typed virtual machine. Nevertheless, our approach is not tied to specific features
of the PalVM virtual machine, and could be used on a more traditional virtual
machine, e.g. on Sun’s KVM, albeit with slightly different semantics, depending
on the properties of the virtual machine. Virtual machines for statically typed
languages typically do not support adding new methods at runtime, which makes
it impossible to add new language support to already loaded classes. Nevertheless,
the JPDA (Java Platform Debugger Architecture) could probably be used to reload
classes with additional interfaces and methods added, allowing dynamic loading to
be used with our approach; a detailed treatment of this implementation approach is
however out of the scope of this paper. Moreover, we note that due to its dynamic
nature, PalVM in principle allows arbitrary cross-language inheritance relationships
to be simulated, as long as the subtype comparison operation (e.g., “instanceof” in
Java) is reified as an operation that can be overridden by FLIF. On a statically typed
virtual machine such as the JVM the same degree of flexibility is not possible: at
the virtual machine level cross-language inheritance can only be expressed in terms
of interfaces.
Name spaces
FLIF is targeted towards an open set of languages, and so it is important that
adding new languages does not cause conflicts with already existing languages. A
potential problem is if two languages use the same naming convention for methods.
If two languages map an IDL method name to the same concrete name, only one of
the languages can be supported by that class. To avoid this issue we require that all languages supporting FLIF use a unique prefix for all method names; we refer to
such a prefix as a name space.2
Status of the implementation
The FLIF has been used to implement the run-time environment for the PalCom
environment. The Smalltalk part of the class library totals 5kLOC, the Java part of
the class library totals 13.7kLOC, but only around 0.5kLOC has been implemented
in BETA for PalVM. FLIF is implemented in roughly 1000 lines of Smalltalk code.
We note however that apart from the language-specific name mappers, the rest of the
framework is language-independent. The size of each of the language-specific name
mappers for Smalltalk, BETA, and Java is between 70 and 100 lines of Smalltalk
code.
Regarding the virtual machine, the only non-standard primitive operations we
require is the ability to create method aliases and the ability to copy methods from
one class to another (which may involve updating literal and symbol tables, depending
on the implementation of the virtual machine). The virtual machine is
currently implemented in two versions, a Java implementation (for desktop experiments)
and a C++ version (for deployment in embedded systems). Nevertheless,
due implementation differences regarding the component model in the C++ version
of PalVM, our full implementation currently only runs on the Java-based version
of PalVM. The Java-based version of PalVM uses objects to represent integers and
stack frames, making it much less efficient than the C++ version, and moreover
causing a significant part of the execution time to be dedicated to integer arithmetic.
On this version of PalVM, FLIF currently causes a significant overhead for
component loading, but does not impose an overhead at runtime, apart from insulation
methods. Preliminary experiments with FLIF for the C++ version of PalVM
reveal a minimal initialization overhead and again no runtime overhead apart from
insulation methods. Regarding memory usage, method definitions can be shared
when making aliases or when copying methods, so the only additional overhead is
the additional space consumed in the method tables. The space overhead is currently
36 bytes per alias in the C++ version of PalVM, but we expect that this
figure can be significantly reduced by improved implementation techniques.
A Smalltalk IDL compiler, a build utility for generating components by invoking
compilers for different languages, and a generic component IDL annotation facility
has been implemented in roughly 1300 lines of Common Lisp code. The Smalltalk
and BETA compilers have been implemented by other members of the PalCom
project, whereas the Java compilation system (described in the appendix) has been
developed by one of the authors of the paper.
6 RELATED WORK
CLR and JVM
The CLR is designed to provide a language-neutral environment, and relies on the
CLS specification to define a common subset for language interoperability [11, 3].
The CLR however has no language-independent notion of method names, which
means that e.g. Smalltalk selector names cannot be used from other languages without
introducing a coupling on the implementation language. Also, insulation methods
cannot be used directly with the CLS, since these methods only should be used
when invoking methods from a different language. Generating differently named
methods for language-internal calls and language-external calls would alleviate this
problem, but impede polymorphism and perhaps even require the programmer to
explicitly differentiate between language-internal and language-external code. Generating
methods with both naming conventions would provide the desired behavior,
but is essentially an ad-hoc approach to the more general approach presented in
this paper, except that we provide a general framework for runtime language interoperability
to reduce the burden of implementing a new, interoperable language.
Regarding dynamically typed languages, since the CLR is statically typed, it is illsuited
to representing dynamically typed languages. Moreover, statically compiled
languages with more advanced type systems are encumbered by redundant dynamic
type checks.
The SML language can be implemented efficiently and mapped to a CLS-compatible
compilation scheme, as shown with SML.NET by Benton et at [6]. SML structures
can be exported as classes and CLS-compatible classes can be imported as SML
structures, given certain restrictions on the external classes. In general, C# classes
are mapped to multiple SML entities, depending on how the features of the class
should be used. Exporting SML code causes glue code to be generated, and so
we can see this as an ad-hoc version of our more general approach, as discussed
in the previous paragraph. Naming issues are not explicitly addressed, since only
integration with C# is investigated as opposed to arbitrary languages running on
the CLR.
Jython and IronPython are implementations of the Python language running on
top of JVM and CLR, respectively [8, 1, 17]. Both systems achieve performance
comparable to the native Python virtual machine, and both systems allow Python
programs to act as consumers for Java/C# classes [12]. The techniques used in
implementing these systems are mostly unpublished, but basically consist of using
static and dynamic code generation to optimize for common cases with generic
(and highly inefficient) fall-backs for uncommon cases [2]. We speculate that both
systems benefit from running on highly optimized virtual machines with mature
dynamic compilers and efficient garbage collectors, features apparently not found
in the C implementation of Python. Moreover, in both cases Python cannot act as
a producer; the interpreter can be embedded into Java or C# programs, or in the case of Java, Python classes can be cross-compiled to Java code (but then they are
no longer Python classes).
Smalltalk has been ported both to the JVM and the CLR, but is currently only
available for CLR [16]. Smalltalk classes are compiled directly to CLR classes, and
every selector declared in a class of the system is installed as a method in the
Smalltalk object class, with a default behavior of “does not understand.” Thus,
new classes cannot be added dynamically. Smalltalk can act both as producer
and consumer, but using a naming convention which gives a high coupling to the
implementation language when Smalltalk acts as a producer, and with partly autogenerated
selector names when Smalltalk acts as a consumer.
Language integration
We support language integration at component granularity, similarly to standard
component models such as CORBA and COM . The concept of an IDL was pioneered
by CORBA with a focus on client-server systems, but CORBA objects differ significantly from standard objects, and CORBA moreover introduced a significant
overhead in terms of time and space. COM defines a binary representation of components,
and allows a tight integration with languages compatible with the COM
object model such as C++ and Visual Basic; we believe that our approach to language
interoperability would be useful with COM to provide the same advantages
as on PalVM.
The integration of languages with very different runtime environments has been
investigated by Mecklenburg [15]. This approach is similar to CORBA, using stub
methods and translation of objects when crossing language boundaries (although
proxy objects can also be used). A language-independent object definition language
is used, similarly to our use of IDL, but we can exploit the homogeneity of PalVM,
which makes our approach more lightweight. As an example, foreign objects can be
referenced directly instead of going through copies or proxies.
We use a simple black-box approach to unifying super and inner method combination,
by restricting inner to only be used within languages that use this method
combinator (in this case, BETA). Alternatively, super and inner can be used together
to provide more flexible method combination, as shown by Goldberg et al [10].
Here, inner dominates super so that super only dispatches to methods not already
included in an inner chain. This approach is the dual of our approach, however our
approach is much simpler and arguably more pragmatic from a language integration
point of view, since super is a better fit for most object-oriented languages.
The idea of using a framework to arbitrate method names at run time depending
on the specific languages being used has also been investigated by Barrett et al in the
context of polylingual systems [5]. Here, the PolySPINner tool determines, for a given
program written in multiple languages, the set of object types that are being used in
the program. Compatible object types are then matched across different languages,
and the implementation of each method is modified to consult the runtime language
arbiter at each invocation, to decide whether to perform data conversion. Thus,
PolySPINner can be said to rely on a call-by-value semantics (always converting
objects) whereas FLIF primarily relies on a call-by-reference semantics with an
option of performing value conversion between languages. Moreover, PolySPINner
has a significant runtime overhead compared to FLIF, but on the other hand does
not rely on a virtual machine.
7 CONCLUSION AND FUTURE WORK
Virtual machines for object-oriented languages provide numerous advantages in
terms of adaptability and robustness, but tend to be dedicated to a restricted object
model. In this paper, we demonstrate how a virtual machine can host languages
with different object models and yet still provide interoperability between these languages.
Language interoperability is achieved with only minor restrictions on how
each language is compiled to the platform and notably with only a minor implementation
effort for each language. We believe such flexible language interoperability
to be essential for embedded devices where sharing of implementation is critical to
reducing resource consumption.
In terms of future work, we are interested in allowing a tighter integration between
specific pairs of programming languages. For example, cross-language type
checking can often be eliminated between Java and BETA since both languages are
statically typed. Moreover, by using runtime reflection during interface linking to
inspect the exact types of BETA classes with covariant attributes, type checking
can be reduced even further e.g. for container classes (note that this optimization
is in general only possible at runtime during interface linking, since the type of a
covariant attribute only is fixed for a given pattern instance).
FOOTNOTES
1 It has been implemented in the MagicDraw tool from NoMagic, Inc.
2 We note that such a naming scheme is trivial to optimize with regards to memory consumption
in the virtual machine, given that the common prefix can be stored in the component annotations.
REFERENCES
[1] Jython home page. www.jython.org.
[2] Personal correspondence between a paper author and Dino Viehland on the
IronPython mailing list.
[3] Common Language Infrastructure, ECMA-335. ECMA International, 3 edition,
2005.
[4] Ken Arnold, James Gosling, and David Holmes. The Java Programming Language.
Addison Wesley, third edition, 2000.
[5] Daniel J. Barrett, Alan Kaplan, and Jack C. Wileden. Automated support
for seamless interoperability in polylingual software systems. In SIGSOFT ’96: Proceedings of the 4th ACM SIGSOFT symposium on Foundations of software
engineering, pages 147–155, New York, NY, USA, 1996. ACM Press.
[6] Nick Benton, Andrew Kennedy, and Claudio V. Russo. Adventures in interoperability:
the SML.NET experience. In PPDP ’04: Proceedings of the 6th
ACM SIGPLAN international conference on Principles and practice of declarative
programming, pages 215–226, New York, NY, USA, 2004. ACM Press.
[7] Gilad Bracha and David Griswold. Strongtalk: typechecking Smalltalk in a
production environment. In OOPSLA ’93: Proceedings of the eighth annual
conference on Object-oriented programming systems, languages, and applications,
pages 215–230, New York, NY, USA, 1993. ACM Press.
[8] Microsoft Corp. IronPython Workspace Home. www.gotdotnet.com.
[9] Adele Goldberg and David Robson. Smalltalk-80: The language and its Implementation.
Addison Wesley, 1983.
[10] David S. Goldberg, Robert Bruce Findler, and Matthew Flatt. Super and
inner: together at last! In OOPSLA ’04: Proceedings of the 19th annual ACM
SIGPLAN conference on Object-oriented programming, systems, languages, and
applications, pages 116–129, New York, NY, USA, 2004. ACM Press.
[11] Jennifer Hamilton. Language integration in the common language runtime. SIGPLAN Not., 38(2):19–28, 2003.
[12] J. Hugunin. IronPython: A fast Python implementation for .NET and Mono.
In PyCon 2004, Washington, D.C., March 2004. http://www.python.org/pycon/dc2004/papers/.
[13] Tim Lindholm and Frank Yellin. Java Virtual Machine Specification. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.
[14] Ole Lehrmann Madsen, Birger Møller-Pedersen, and Kristen Nygaard. Object-Oriented Programming in the BETA Language. ACM Press/Addison Wesley,
1993.
[15] R. W. Mecklenburg. Towards a Language Independent Object System. PhD
thesis, University of Utah, June 1991.
[16] The Refactory, Inc. #Smalltalk. http://www.refactory.com.
[17] G. van Rossum and F.L. Drake. The Python Language Reference Manual.
Network Theory Ltd, September 2003.
About the authors

|
|
Torbjörn Ekman is a Research Fellow in the the Programming
Tools Group at University of Oxford, UK. He received a PhD from
Lund University in 2006. His research interests include extensible
compilers, scriptable refactorings, domain-specific languages,
and aspect oriented programming. He can be reached at torbjorn@comlab.ox.ac.uk |

|
|
Peter Mechlenborg received a Master from the University of
Aarhus in 2006 and is now employed in the Danish company
Mu ApS. He is currently working in the area of highly scalable and
fault tolerant systems for the financial sector using Erlang, Common
Lisp and domain-specific languages. He can be reached at peter@mu.dk |

|
|
Ulrik Pagh Schultz is an Associate Professor in the Software Engineering
Group at the Maersk McKinney Moeller Institute, University
of Southern Denmark. He received a PhD from the University of
Rennes I in 2000. His research interests include software engineering
for modular robotics, program generation and transformation, and
domain-specific languages. He can be reached at ups@mmmi.sdu.dk
|
Cite this article as follows: Torbjörn Ekman, Peter Mechlenborg, Ulrik Pagh Schultz: "Flexible
Language Interoperability", in Journal of Object Technology, vol. 6, no. 8, September-October 2007, pages
95-116, http://www.jot.fm/issues/issue_2007_09/article2
|