The Tale of Java Performance

<!--#set var="title" value="Journal of Object Technology - The Tale of Java Performance, Osvaldo Pinali Doederlein" -->
<!--#include virtual="/include/wide_header.html" -->
<div align="center"> 
  <table border="0" cellpadding="0" cellspacing="2">
    <tr> 
      <td class="text"> <div align="right"> 
          <table border="0" cellpadding="0" cellspacing="2" width="179">
            <tr> 
              <td> <p class="text"><a href="../column2">Previous column</a></p></td>
              <td align="right"> <p class="text"><a href="../column4">Next
                     column</a></p></td>
            </tr>
          </table>
          <hr>
          <table border="0" cellpadding="0" cellspacing="2" width="100%">
            <tr> 
              <td valign="top"> <h1>The Tale of Java Performance</h1>
		<br>
                <p class="text"><strong>Osvaldo Pinali Doederlein</strong>, Visionnaire S/A, Brasil<br>
                </p></td>
              <td align="right" valign="top"><img src="/images/graph/line20h.gif" alt="space" width="20" height="5" border="0"></td>
              <td align="right" valign="top"><span class="toptitle">COLUMN</span><br>
                <br> 
		<a href="../column3.pdf"><img src="/images/logos/pdficonsmall.gif" alt="PDF Icon" width="22" height="24" border="0"><br>
                </a><span class="text_small">PDF Version</span></td>
            </tr>
          </table>
        </div>
        <p>&nbsp;</p>
        <h4>Abstract</h4>
        <p>The Java platform introduced Virtual Machines, JIT Compilers and Garbage
          Collectors to the masses and to mainstream software development. Demand
          and competition drove impressive improvements in the performance of
          Java implementations, and while the state of the art can be learned
          from JVM research papers and product benchmarks, we offer a &#8220;Making
          Of&#8221; exposing the challenges, tensions and strategies behind this
          history, extrapolating to similar platforms such as Microsoft .NET&#8217;s
        CLR.</p>
        <hr noshade width="80%" size="1"> 
        <h3>1	BRIEF HISTORY OF JAVA IMPLEMENTATIONS</h3>
        <p>Java, like most new languages, suffered from immature implementations
          and very weak performance in the early days. The unprecedented success
          of early Java, though, should teach us the first important lesson about
          performance &#8211; it&#8217;s not always critical. The Web was expanding
          like the Big Bang, and the world demanded features like secure, portable,
          Internet-friendly code. Even at that time, (then-) interpreted languages
          like Visual Basic or Perl were highly popular. Moreover, implementations
          always improve and Moore&#8217;s Law is always doing its magic. When
          I first used Java (1.0-beta) my desktop PC was a 166 MHz Pentium with
          64 MB of RAM. Right now I&#8217;m working on a 1.6 GHz Pentium 4 with 640 MB:
          coincidentally, an exact order of magnitude better in both speed and
        space (if we&#8217;re not too picky about hardware performance factors).</p>
        <p>Java
          was not born to handle tiny web page embellishments forever, so it
          had to evolve to support everything from smart card applets to scalable
          enterprise applications. Moore&#8217;s Law barely compensates for the distance
          from the &#8220;Nervous Text&#8221; applet to any current J2EE server.
          More than size, change in applicability imposes more and more efficiency
          constraints. Java debuted as a &#8220;glue&#8221; language, and as
          a front-end language; in these cases, most time is spent waiting for user
          input or invoking external (fast) code, like GUI toolkits or network
          stacks. As soon as the language becomes successful, it must be intrinsically
          fast, not only look fast in easy scenarios (like servers that don&#8217;t
          care for large loading time or runtime footprint). Successful languages
          are always forced into domains their creators didn&#8217;t dream of
          (and didn&#8217;t design for), and the initial trade-offs are either
          removed or replaced by new ones as implementers strive to make developers
          happy.</p>
        <h4>VM performance, By-the-Book</h4>
        <p>Portable code and Virtual Machines have existed
          since the sixties, so by 1996 the field was already mature, with superb
          prior art in interpreters,
          Just In Time compilers, Garbage Collectors, and more general items
          such as threading or multiplatform frameworks. Java introduced at least
          one significant performance challenge, security; but in retrospect,
          the costs of security only appeared important due to the overall bad
          performance of early implementations. Right now, only in the J2ME space
          do the security features cause concern and drive new optimizations like
          pre-verification. Therefore, most of the exciting history of JVM optimisation
          resembles a Hollywood-style remake of an old movie by a famous director
          with lots of money: the result is a blockbuster loved by the public,
          even if some self-defined elite prefers the original black-and-white
          production.</p>
        <p>Not all is remaking, though. The big news in Java history,
          of course, is becoming a mainstream platform. Even acknowledging that
          many successful
          real-world applications were built with precursor technologies, there
          is no comparison with Java, particularly in the economics &#8211; very
          relevant if we consider that top performance costs huge investments
          from commercial implementers (even though most JVM performance tricks
          are rooted in academic research). Thanks to all previous work, the
          formula for the first wave of JVM improvements (the &#8220;JDK 1.1.x
          generation&#8221;) could be summarized as: implement the techniques
          that worked before for the other guys.</p>
        <ul>
          <li><strong>High Performance Interpretation.</strong> The classloader can perform
            a few easy optimisations as bytecode-to-bytecode transformations,
            like devirtualizing
              calls to final methods. An optimal interpreter is typically generated
              at loading time with the best Assembly code for the machine.</li>
          <li> 
            <strong>Direct References.</strong> The Classic VMs implemented references as indirect
            handles, making GC very simple but code slower. This was fixed in Sun&#8217;s
            EVM (aka &#8220;Solaris Production&#8221;), IBM JDK 1.1.6 and finally,
            Sun&#8217;s Java2 releases.</li>
          <li> <strong>Just-In-Time Compilation and Generational GC.</strong> The first
            generation of JIT compilers could do local optimisations and offered
            enough speed
              for applets. Late in the JDK1.1 cycle, the Sun EVM and the IBM
            JDK introduced stronger JITs and also the first decent GCs for Java.
            These
              were times of fast research and poor stability, so most JITs were
            disabled by default.</li>
          <li> <strong>Library Optimisations.</strong> Sun proceeded with many general
            enhancements during the 1.1 series, while Microsoft improved low-level
            libraries
              like the AWT to extract better performance in the Windows OS.</li>
        </ul>        
        <p>This set
          of improvements delivered a Java that was usable for many desktop applications
          and even some servers &#8211; mostly I/O-bound
          apps, like two-tier Servlet-based apps that are a thin bridge between
          a relational database and a web browser &#8211; provided that one didn&#8217;t
          need to serve more than a handful of simultaneous users.</p>
        <h4>Static Compilers</h4>
        <p>People wanted to build large applications with Java
          long before Sun delivered the required performance. Sun initially focused
          on the Solaris
          SPARC platform with the EVM, but if the plan was luring hordes of Windows
          users into Solaris, it certainly didn&#8217;t work. The high-profile,
          next-generation HotSpot project aimed high and would only bear fruit
          after JDK1.2, which created a big market opportunity for others. If
          the VMs couldn&#8217;t show competitive performance, the obvious solution
          was &#8220;generating native code&#8221; with conventional compilers
          that could employ expensive global optimizations just like any other
          language. The main strategy was fixing the inadequacies of the existing
          VMs:</p>
        <ol>
          <li><strong>Stressing global optimizations</strong> that benefit from closed-world
              analysis, whereby all code used by the application is fixed and
            known to the
              compiler.</li>
          <li><strong>Creating high-performance runtime support</strong> in areas where
              the JDK was deficient, like threads, networking and memory management
              / GC.</li>
        </ol>        <p>The business plan was good, but the market was hard: TowerJ (a
          pioneer and leading product) went broke; SuperCede / JOVE was just
          killed by
          Instantiations; IBM&#8217;s HPCJ is long retired (except for AS/400),
          and NaturalBridge&#8217;s BulletTrain technology should belong to somebody
          else by the time you read this. The only commercial vendor left is
          Excelsior with JET, while the Free Software GCJ silently improves.
          Some difficulties made the success of static Java compilers harder
          than in other languages:</p>
        <ul>
          <li><strong>Compliance.</strong> Platform-specific deployment is something unholy
            in Java; the certification rules enforce items like bytecode compatibility
            and
              fully dynamic classloading. The JDK sources could not be reused
            by non-licensed/certified products, so these companies had to choose
            between
              clean-room rewriting of the libraries, or an uncomfortable dependency
              on the JRE (the license does not allow partial redistribution,
            and full compatibility with its native libraries is a pain).</li>
          <li> 
            <strong>Funding.</strong> Products were created by small companies that couldn&#8217;t
            compete with giants; JVMs improved faster and eventually regained leadership
            in most areas. IBM was the only big player to make a static compiler,
            but their commitment to J2EE probably killed HPCJ as much as the performance
            of its own JIT. See <a href="#gu00">[Gu00]</a> for a history of IBM JDK&#8217;s evolution.</li>
          <li> 
            <strong>Strategy.</strong> Static compilers excelled on the server side, but that was
            a bad long-term plan. The introduction of J2EE spelled bad news by
            stressing dynamic behaviour. Products tried to counter this with either
            (a) shared libraries, which requires dropping the closed-world optimizations
            that are a core sales pitch; or (b) a hybrid model, mixing native code
            with bytecode supported by an interpreter or JIT compiler &#8211; an
            apparently good idea that didn&#8217;t suffice to save TowerJ.</li>
          <li> 
            <strong>Recession + commoditization (free high-performance JVMs).</strong> The result
            is a small market, and some vendors practiced heavy pricing when they
            held a large performance advantage, which probably helped to kill them
            in the long run. But it&#8217;s possible that some vendors couldn&#8217;t
            afford to amortize R&amp;D costs more slowly, not to mention giving
            products away.</li>
        </ul>        
        <p>Static compilers still have advantages: Native code is
          the best obfuscation you can find; runtime footprint is smaller as
          it doesn&#8217;t carry
          a sophisticated compiler; robustness tends to be superior, as compilers
          often have bugs themselves and statically compiled code can be more thoroughly
          tested against these; deployment is often easier; loading is faster;
          and code performance is better for applications that are not affected
          by the weaknesses but do benefit from their strengths &#8211; roughly,
          non-dynamic apps that benefit from closed-world optimisations, and
          small apps for which loading time is important.</p>
        <p>There is a place for
          static Java compilers in the future, but ironically, just like the
          JVMs, their new sweet spot may be very different from
          the original plans. JET is now making strides to better support desktop
          GUIs (with Swing and SWT), and GCJ (plus Classpath and GTK) will likely
          become a good alternative for Linux-centric GUI applications and console
          tools. Additionally, there&#8217;s a trend of blurring the line between
          static and JIT compilation. JET, just like TowerJ before it, can mix
          static and JIT-compiled code to support dynamic loading, and conventional
          VMs with JIT compilers may cache generated code, a similar technique
          differing mostly in the deployment format and code management. Microsoft&#8217;s
          CLR uses JIT caching for the .NET platform, and in the Java domain,
          IBM developed this model in <a href="#serr00">[Serrano00]</a> and now JET offers it too.</p>
        <p>It&#8217;s
          worth noting that other languages enjoyed portable code decades before
          Java, but &#8220;portability&#8221; usually meant &#8220;many
          CPUs and OS&#8217;es&#8221;, not &#8220;many vendors and releases&#8221;.
          I wonder how successful Smalltalk could have been if VisualAge, VisualWorks,
          Dolphin and others could agree on standard bytecode and comprehensive
          frameworks. Looking at Java&#8217;s WORA, the network effect created
          by vendor independence certainly contributed more to its success than
          support for any non-Wintel architecture. It&#8217;s enlightening to
          see that users of more pure OO languages like Smalltalk, and more dynamic
          languages like LISP or Haskell, never have philosophical issues against
          static compilers, monolithic deployment or native interfaces. The strategy
          of WORA was important to bring Java to the point where it is today,
          but it&#8217;s likely that from now on the pragmatic deviations will
          only increase &#8211; the recent acceptance (by the community) of IBM&#8217;s
          SWT, a non-Pure replacement for AWT and Swing, is another important
          sign of this tendency.</p>
        <h4>Magic VMs</h4>
        <p>The advent of Java2 (JDK1.2.0+), Sun HotSpot and the IBM JDK
            raised Java to previously undreamed-of performance, and caught
            many hackers
            by surprise. Even these days, it&#8217;s not very hard to find programmers
            who consider it impossible for a language like Java to approach &#8220;native
            languages&#8221; in speed. In the other extreme, some Java advocates
            claim JVMs already beat C for anything, or will do soon. These are
            typically enthusiasts writing weak benchmarks and having little knowledge
            of the challenges still faced by Java. Between these extremes is
            the current status of Java, and we get there with a second batch
            of improvements,
            now bringing some innovations to the realm of VM-based OO systems:</p>
        <ul>
          <li><strong>State of the Art GC, Threading and I/O.</strong> JVMs improved on
            enhancements introduced by static compilers. Current JVMs offer mature
            implementations
              of everything from basic generational collectors to SMP-hot parallel
              and concurrent collectors <a href="#floo01">[Flood01]</a>. For threading, they offer fast monitors,
              multiple threading models (thin, native, N-M mappings) and thread-local
              heaps. Only for I/O did Sun choose a novel approach, with JDK1.4&#8217;s
              New I/O libraries providing non-blocking I/O and faster native
              interfaces (the static compilers could not afford to create new
              APIs, so their
            approach to I/O was making the thread-per-connection model cheaper).</li>
          <li> 
            <strong>Profile-driven optimisers.</strong> The hardest problem for JIT compilers is
              the need to work fast. Profile-driven VMs implement mixed-mode execution
              <a href="#ages00">[Agesen00]</a>, initially running bytecode (or a fast-JITted code) that&#8217;s
              instrumented for profiling. Only critical methods (&#8220;hot spots&#8221;)
              are compiled with full optimisation. The JIT can use expensive optimizations
              for these because only a fraction of all code is made of hot spots,
              but their optimisation delivers virtually the same performance as full
              optimisation. This is a &#8220;10%/90%&#8221; rule that, much like
              the Generational Hypothesis, sounds suspicious when first heard but proves
            valid surprisingly often.</li>
          <li> 
            <strong>Speculative Optimisers.</strong> Java shares with other high-level OOPLs the
              costs of polymorphism and mandatory safety checks. Speculative optimisers
              find opportunities that depend on an optimistic (but not proven) assumption,
              e.g. &#8220;this polymorphic method is never overridden&#8221;, and
              optimise anyway. If the optimiser makes a bad bet, this is trapped
              either by a precondition inserted in the compiled code, or by the
              classloader. This forces the program to one of three solutions:
            <ul>
              <li>Fall back to
                  a &#8220;slow version&#8221; of the same code (a more
                  traditional technique known as <em>versioning</em>, often used for loop
                  optimisations); this requires a check before the optimised
                  code, e.g. ensuring that
                  a receiver is an instance of the predicted class (or a subclass
                that does not override the invoked method).</li>
              <li> Perform <em>patching</em>, overwriting
                  critical instructions to remove the optimisation. This often
                  requires stubs to make patching easier; for
                  example, a speculative devirtualization could be implemented
                  as a call to a &#8220;trampoline&#8221; that jumps to one specific
                  method, so the deoptimization is implemented by overwriting this
                  jump with
                  the
                address of another stub that does dynamic dispatch.</li>
              <li> Perform <em>deoptimization</em>,
                    dropping to interpreted/unoptimized code (at least until
                new optimisation can be done). This approach produces
                    better code than others when all goes well, because the compiled
                  method is simpler (without any checks, stubs or slow versions);
                  inlining is
                  more efficient.</li>
            </ul>
          </li>
          <li> 
            <strong>State of the Art, Conventional Optimisations.</strong> The former enhancements
              allowed JIT compilers to catch up with the harder optimizations. In
              the case of HotSpot, Sun apparently prioritised the next-generation
              optimisations over more conventional ones, so many of the easier optimisations
              were missing! In the first releases, HotSpot excelled in &#8220;heavily
              OO code&#8221; due to aggressive optimisation of method calls; in &#8220;low
              level code&#8221; scenarios though (integer and FP arithmetic, array
              manipulation or loops), HotSpot lagged until recently behind even the
            ancient Symantec JIT included in Sun&#8217;s own JDK1.1.8.</li>
        </ul>        
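        <p>To make the first of the three recovery strategies concrete, here is a minimal sketch of guarded (versioned) devirtualization, written as plain Java source rather than generated machine code. The classes <span class="code2">Shape</span>, <span class="code2">Square</span> and <span class="code2">Circle</span>, the method <span class="code2">areaSpeculative</span> and the counter <span class="code2">slowPathHits</span> are all hypothetical names invented for illustration; no real JIT is claimed to emit exactly this shape.</p>

```java
class GuardedInlineSketch {
    static class Shape {
        double area() { return 0.0; }
    }

    static class Square extends Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override double area() { return side * side; }
    }

    static class Circle extends Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        @Override double area() { return Math.PI * radius * radius; }
    }

    // Counts how often the speculation failed (illustration only).
    static int slowPathHits = 0;

    // Hand-written equivalent of what an optimiser might produce for
    // "s.area()" after profiling has only ever seen Square receivers.
    static double areaSpeculative(Shape s) {
        if (s.getClass() == Square.class) {   // guard: the inserted precondition
            Square q = (Square) s;
            return q.side * q.side;           // inlined body of Square.area()
        }
        slowPathHits++;
        return s.area();                      // slow version: ordinary virtual dispatch
    }

    public static void main(String[] args) {
        System.out.println(areaSpeculative(new Square(3.0)));
        System.out.println(areaSpeculative(new Circle(1.0)));
        System.out.println("slow path taken " + slowPathHits + " time(s)");
    }
}
```

        <p>The guard costs one comparison on every call, which is exactly the overhead that patching and deoptimization try to avoid.</p>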
        <p>Profile-based
          and Speculative JITs like HotSpot and IBM JDK are often seen as the
          Holy Grail of Java performance; the approach is rooted in <a href="#holz94">[H&ouml;lzle94]</a>.
          We should therefore also analyse the disadvantages.
          For profiling, the only problem is the need to run the code in &#8220;slow
          mode&#8221; until hotspots can be identified; this produces a relatively
          long warm-up period (loading time is short, but additional wait is
          required until cruise speed). Footprint is small, so this can be used
          by J2ME VMs like Sun&#8217;s recent CLDC HotSpot.</p>
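        <p>The warm-up behaviour can be sketched with a toy dispatcher that counts invocations and switches to an &#8220;optimised&#8221; body once a method becomes hot. Everything here is an assumption made for illustration: the class <span class="code2">MixedModeSketch</span>, the threshold value, and the use of lambdas to stand in for interpreted and compiled code. Real VMs use richer profiles and much larger, tunable thresholds.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

class MixedModeSketch {
    // Assumed value, chosen small so the effect is visible in a demo.
    static final int HOT_THRESHOLD = 3;

    private final Map<String, Integer> invocationCounts = new HashMap<>();
    private final Map<String, IntUnaryOperator> compiled = new HashMap<>();

    // "slow" stands in for interpreted code, "fast" for the JIT's output.
    int invoke(String name, IntUnaryOperator slow, IntUnaryOperator fast, int arg) {
        IntUnaryOperator code = compiled.get(name);
        if (code != null) {
            return code.applyAsInt(arg);          // cruise speed
        }
        int count = invocationCounts.merge(name, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) {
            compiled.put(name, fast);             // "compile" the hot spot for later calls
        }
        return slow.applyAsInt(arg);              // still in slow mode for this call
    }

    boolean isCompiled(String name) {
        return compiled.containsKey(name);
    }

    public static void main(String[] args) {
        MixedModeSketch vm = new MixedModeSketch();
        IntUnaryOperator slow = x -> x * 2;
        IntUnaryOperator fast = x -> x << 1;
        for (int i = 1; i <= 5; i++) {
            int r = vm.invoke("twice", slow, fast, i);
            System.out.println("call " + i + " -> " + r + ", compiled=" + vm.isCompiled("twice"));
        }
    }
}
```

        <p>Methods that never reach the threshold never pay for compilation, which is where the footprint and compile-time savings come from.</p>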
        <p>Deoptimization is
          very expensive, so the technique depends a lot on profiling and some
          global analysis. It&#8217;s also very difficult
          to implement: if Thread A is halfway through an <em>m<sup>fast</sup>()</em> when Thread
          B does something that invalidates it (typically, classloading), as
          soon as any side effect can propagate across threads, A&#8217;s activation
          of <em>m<sup>fast</sup>()</em> cannot proceed safely. The JVM is forced to freeze all application
          threads; find these activations in their call stacks; and update stack
          frames so the execution continues in the equivalent point of <em>m<sup>slow</sup>()</em>.
          This On-Stack Replacement operation is difficult because <em>m<sup>slow</sup>()</em> and
          <em>m<sup>fast</sup>()</em> may have different control and data structure, so the compiler
          emits metadata describing this mapping at each call site. This metadata
          occupies memory, so OSR-dependent optimisations cannot be employed
          too generously; one of the space-saving distinctions between the Server
          and Client editions of HotSpot is that Client lacks OSR.</p>
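        <p>The bookkeeping side of this can be sketched as a dependency registry: each compiled method records the optimistic assumptions it relies on, and the classloader consults the registry when a new class breaks one. The class <span class="code2">DeoptDependencySketch</span> and its method names are hypothetical, and the sketch deliberately leaves out the hard part described above (freezing threads and rewriting stack frames); it only shows how the VM could know <em>which</em> methods to invalidate.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DeoptDependencySketch {
    // Key: a method assumed to be never overridden, e.g. "Shape.area".
    // Value: compiled methods whose code relies on that assumption.
    private final Map<String, List<String>> dependents = new HashMap<>();
    private final List<String> deoptimized = new ArrayList<>();

    // Called by the (simulated) optimiser when it speculates.
    void recordAssumption(String compiledMethod, String neverOverridden) {
        dependents.computeIfAbsent(neverOverridden, k -> new ArrayList<>())
                  .add(compiledMethod);
    }

    // Called by the (simulated) classloader when a newly loaded class
    // overrides the given method. Returns the methods to deoptimize.
    List<String> classLoadedOverriding(String overridden) {
        List<String> invalid = dependents.remove(overridden);
        if (invalid == null) {
            return new ArrayList<>();
        }
        deoptimized.addAll(invalid);   // a real VM would now fix up live activations
        return invalid;
    }

    List<String> deoptimized() {
        return deoptimized;
    }

    public static void main(String[] args) {
        DeoptDependencySketch vm = new DeoptDependencySketch();
        vm.recordAssumption("Renderer.draw", "Shape.area");
        vm.recordAssumption("Report.total", "Shape.area");
        System.out.println(vm.classLoadedOverriding("Shape.area"));
    }
}
```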
        <p>In addition
          to deoptimization, HotSpot supports the inverse transition: interpreted
          activations can be patched to continue in the middle of
          the generated native code, so when <em>m<sup>slow</sup>()</em> is optimised, execution
          jumps immediately to <em>m<sup>fast</sup>()</em> and the benefit doesn&#8217;t need to
          wait for the next entry to the method. This is only relevant for very
          big, monolithic methods, present mostly in badly designed code and
          microbenchmarks.</p>
        <p>Aggressive devirtualization and inlining <a href="#detl99">[Detlefs99]</a>,
          as implemented by HotSpot, is both seductive and debatable. Some benchmarks
          show huge
          improvement; best-case scenario is when (a) the speed of a critical
          loop depends on inlining of a polymorphic method, and (b) multiple
          overrides of this method block more conservative techniques, and (c)
          the critical code always calls the same override so speculative inlining
          will work. The sceptics will say that the overhead of deoptimization
          is bigger than the benefits in the general case, mostly because any
          well-written code will avoid polymorphism inside critical loops (the
          first optimisation lesson one learns with OO programming). On the other
          hand, an increasing rate of poorly-optimised source code is produced
          by increased time pressure and use of reusable components tuned for
          flexibility (e.g., if your critical loop uses collections like <span class="code2">java.util.List</span>,
          it&#8217;s hard to avoid polymorphic calls).</p>
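        <p>As a small illustration of that last point, consider a loop written against the <span class="code2">java.util.List</span> interface. The class name <span class="code2">ListLoopExample</span> is made up for this sketch; the point is only that <span class="code2">values.get(i)</span> and <span class="code2">values.size()</span> are interface calls in the source, even when the receiver is an <span class="code2">ArrayList</span> on every iteration, so inlining them requires either speculation or whole-program knowledge.</p>

```java
import java.util.ArrayList;
import java.util.List;

class ListLoopExample {
    // Both calls in the loop go through the List interface. If profiling
    // only ever sees ArrayList here, a speculative JIT may inline them.
    static long sum(List<Integer> values) {
        long total = 0;
        for (int i = 0; i < values.size(); i++) {
            total += values.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            values.add(i);
        }
        System.out.println(sum(values));
    }
}
```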
        <p>The impressive evolution
          of Java created a tendency of justifying any performance limitation
          as: &#8220;Sun just didn&#8217;t optimise this
          yet&#8221;. The priority given to next-generation tricks in HotSpot,
          to the detriment of conventional optimisations, created a perception that
          any missing optimisations are only missing due to not being critical
          to Sun&#8217;s strategy (e.g., J2EE apps), but can be added anytime
          if needed. If they did the hardest stuff like speculative inlining,
          they can obviously do all simpler optimisations, right? This thinking
          seems to be validated as only the latest Sun releases show good performance
          for many low-level benchmarks, because only in the 1.4.x series did Sun
          add critical optimisations for loop unrolling, array bounds-check elimination,
          and FP. The optimist expects Java to become fast in everything without
          major spec changes.</p>
        <p>Moore&#8217;s Law helps Java as more time-consuming
          optimisations can be supported by JIT compilers every year, without
          increasing loading
          time. The IBM JDK implemented most low-level optimisations first and
          its performance has been stable for some time, so it seems that we&#8217;re
          pretty close to the upper limits imposed by Java&#8217;s current specs.
          Most further work should bring relatively small improvements,
          or big improvements for a relatively small niche that depends a lot
          on a very specific peephole optimisation.</p>
        <h4>Server versus Client</h4>
        <p>Advanced compilers always carry a cost in footprint
          or compilation time; this is true for JITs as well as in traditional
          ahead-of-time
          compilers. Advanced runtime elements like GC will also perform some
          tradeoffs. The first releases of Sun HotSpot delivered much better
          performance, provided that one could put up with the extra memory usage
          and loading time. The IBM JDK is another speed champion, but has an
          even bigger footprint. Add this to the new, heavyweight Swing libraries,
          and desktop Java virtually died while the Java industry shifted gears
          towards the server side. Sun reacted by splitting HotSpot into two editions.
          The Client VM (default option) omits the most expensive optimisations
          and favours flat profile code (compiles more methods, less aggressively);
          the Server VM <a href="#pale01">[Paleczny01]</a> (&#8220;<span class="code2">-server</span>&#8221;)
          does all optimisations and favors &#8220;spiky&#8221; profiles. The
          latest releases of the IBM JDK adopted a similar strategy; a new switch
          (&#8220;<span class="code2">-Xquickstart</span>&#8221;)
          tunes the VM for better startup and responsiveness. IBM implemented
          this just in time for its own first significant Java desktop app, Eclipse.</p>
        <h4>Lightweight
          Java</h4>
        <p>All discussion so far is specific to J2SE VMs, where fast CPUs
          and large RAM support sophisticated runtimes and speed/space tradeoffs.
          Life is tougher below J2SE. Sun defines subsets for different domains
          and capacities; this started with PersonalJava and JavaCard, but Sun
          later opted for a more structured specification. The result is J2ME,
          now the main architecture for light Java. J2ME <a href="#greh02">[Grehan02]</a> has a growing
          number of &#8220;configurations&#8221; (base VM, language and APIs
          for a class of hardware), &#8220;profiles&#8221; (high-level, horizontal
          APIs like GUI), and &#8220;optional packages&#8221; (vertical, or less
          fundamental APIs, like wireless networking). Each combination of configuration
          + profile defines a stable platform (optional packages are installable
          on demand), so devices can support a quasi best-fit configuration,
          avoiding the proliferation of vendor-specific platforms that plagues
          embeddable OSes and is a real problem with installable or mobile applications.</p>
        <p>The
          higher-level specs, like PersonalJava, are generous subsets of J2SE,
          which means enough functionality that the hardware must be in
          the high-end of the sub-PC realm, and the same basic techniques can
          be used. A PersonalJava VM may not afford a state-of-the-art JIT compiler,
          but it will certainly afford a conventional JIT. In the micro extreme,
          JavaCard (a tiny platform for smart cards) preserves little beyond
          the language syntax: no floating point or 64-bit types; no multithreading,
          classloading or (optionally) GC &#8211; objects should be preallocated,
          and limited support for the otherwise-heretic manual memory management
          is provided. Execution is of course interpreted.</p>
        <p>It&#8217;s safe to
          state that the reason for the success of these platforms was Sun&#8217;s
          pragmatic decision to not enforce WORA beyond physical barriers; the
          standards cut not only on system libraries, but also
          in core language and architectural features. There are no revolutions
          for performance; Sun adopted the usual strategies in embedded systems:</p>
        <ul>
          <li><strong>Cut as many features as possible.</strong> Some APIs are completely
            dropped, others are simplified (e.g., convenience method overloads
            that only
              provide default values for some parameters are removed, leaving
            only the base method).</li>
          <li> <strong>Allow fine-grained tuning.</strong> Embedded JVMs allow device designers
            to fine-tune internal VM settings and options, like memory quotas
            for
              JIT compilation. </li>
          <li><strong> Provide specialised libraries for performance-critical
            tasks.</strong> For example, J2ME offers multimedia and game-oriented APIs
            that are cut to fit the
              needs and resources of small devices. These APIs offer good performance
              and footprint as their implementation will usually be a thin wrapper
              over system APIs.</li>
          <li> <strong>Precompute things at development time.</strong> Java bytecode needs
            to be verified at loading time, but the standard verifier is relatively
            big and slow
              for the lesser devices. J2ME apps use a pre-verification tool that
              augments the bytecode with annotations, so the VM can use a much
            smaller and faster verifier. (This also benefits J2SE, so JDK1.5
            will add the
              feature to standard JVMs <a href="#jsr202">[JSR202]</a>.) Similar techniques can help
            JIT compiler optimisations <a href="#azev99">[Azevedo99]</a>, so the harder phase of optimisation
              can be precomputed too; for instance, the pre-optimiser detects
            that
              some array access is always valid (a potentially expensive analysis),
              then it annotates the bytecode with metadata describing this so
            the JIT compiler uses this information to generate better code very
            easily.</li>
        </ul>        <p>The
            latter optimisation is tricky in Java due to its security architecture:
          annotations (for security, optimisation or any other purpose) cannot
          be trusted more than bytecodes are, so the annotations themselves must
          be validated, and validating them must cost less than repeating the expensive
          analysis that they encode. The most innovative aspect of pre-verification
          (and possibly of pre-optimisation in future JVMs) lies in these easily validated
          annotations.</p>
        <p>The first embedded JVMs (like Sun KVM) were interpreted,
          but JIT compilers followed, and in addition to the design optimisations
          above, vendors
          added a host of implementation optimisations:</p>
        <ul>
          <li> 
            <strong>Bytecode optimisation.</strong> Post-processors can use static analysis to reduce
            bytecode size with simple optimisations (like removing unused methods)
            and replacement of symbolic data with shorter identifiers (e.g., <span class="code2">myClassFieldWithBigName</span>            
			becomes <span class="code2">m0</span>). These optimisations are available for J2SE programs, typically
            implemented by obfuscators, but rarely used as the performance benefit
            is not worth the trouble with good JITs. For J2ME though, it&#8217;s
            a standard practice.</li>
          <li> <strong>Static compilation.</strong> Native code may be deployed to the
            device.</li>
          <li> <strong>JIT compilers.</strong> In their J2ME incarnations, JIT compilers
            have to drop expensive optimisations and be very well-behaved concerning
            resource
              usage. Profile-based optimisation of hot-spot methods is implemented
              by some products.</li>
          <li> <strong>Relaxed threading.</strong> Some VMs implement cooperative threading,
            which saves footprint if the underlying OS does not provide preemptive
            threads.</li>
          <li><strong> Cooperative resource management.</strong> The JIT compiler typically
            shares memory with the application; i.e., temporary compilation data
            is allocated
              from the Java heap. Both JIT and GC try to work when the application
              is idle or using few resources, and ideally both can be preempted
              by application threads.</li>
          <li> 
            <strong>Careful heap configuration.</strong> Under Java&#8217;s dynamic classloading,
            bytecode, native code and JIT metadata may need to be produced (or
            garbage-collected) at any time. VMs like Sun&#8217;s CLDC HotSpot
            implement homogeneous storage: a single heap for application objects,
            code and
            VM overheads; this reduces slack and simplifies all memory management
            (the catch: all these elements, including native code, must be relocatable
            to allow compaction). Other VMs offer tools to precompute the space
            required by each area, adding this data to deployment files, but
            this is less flexible, just like precompiled native code.</li>
          <li> 
            <strong>Mangling the JVM and OS.</strong> Some products (like SavaJe, Symbian and JBed)
            blur the line between the operating system and JVM. OS architecture
            and services are tailored to the needs of the JVM, and the JVM implementation
            benefits from direct, low-level access to system services. These products
            realise the vision of a &#8220;Java OS&#8221; in environments where
            this concept makes sense.</li>
        </ul>        
        <p>Small-footprint platforms present challenges
          to Java implementations, but good tradeoffs exist, and in many cases
          the application model limits the need for maximum
          speed. In consumer-oriented
          devices like PDAs and cell phones, applications are usually bound by
          user input, networking and other OS services, so modest performance is
          acceptable for the application code (written in Java or not). This
          is, not coincidentally, the same rationale that helped Java&#8217;s
          early releases &#8211; typical J2ME apps are not very different (in
          size, complexity and reliance on external resources) from the old browser
          applets.</p>
        <p>Another important relief comes from comparison with native applications:
          these are not very fast either, due to the physical limitations of
          the hardware. Any optimisation that trades space for speed is off-limits
          in all languages; you can&#8217;t have it in your J2ME JIT compiler,
          but you can&#8217;t have it in your embedded C compiler either, as
          in both cases the resulting code bloat is not acceptable. Examples
          of such optimisations are inlining, intrinsic functions, code specialisation
          (as in C++ template expansion), loop unrolling and memory alignment.
          The net effect is that native applications do not set such a high bar
          to compare against. Finally, whatever the language, developers
          of applications for resource-constrained systems are long used to programming
          tradeoffs &#8211; sacrifices of abstraction and clarity are more acceptable.
          For example, public variables and non-polymorphic (<span class="code2">final</span>) methods,
          while not recommended in the J2SE domain and heretical in large-scale
          J2EE systems, are perfectly acceptable in J2ME code. When some
          critical code really needs a space/speed trade-off, programmers typically
          make it manually, e.g. by unrolling loops in the source, instead of relying
          on the compiler for this. In theory, this shows an advantage of profile-driven
          JITs: these could apply space-expensive optimisations, as the bloat
          only affects a small number of hotspot methods.</p>
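        <p>As a minimal sketch of the manual trade-off described above, the hypothetical class below
          sums an array twice: once with a plain loop, and once with the loop unrolled by four in the
          source. The class and method names are illustrative assumptions, not code from any J2ME
          product, and whether the unrolled version is actually faster depends on the VM and the device.</p>

```java
/** Hypothetical example: manual loop unrolling done in the source, not by the compiler. */
final class UnrolledSum {

    /** Plain loop: one add, one index increment and one branch per element. */
    static int sum(int[] data) {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    /** Same result, unrolled by four: fewer branches, but more code. */
    static int sumUnrolled(int[] data) {
        int total = 0;
        int i = 0;
        // Largest multiple of four that fits in the array.
        int limit = data.length - (data.length % 4);
        for (; i < limit; i += 4) {
            total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        }
        // Remainder loop for the last zero to three elements.
        for (; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] samples = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        System.out.println(sum(samples) + " " + sumUnrolled(samples));
    }
}
```

        <p>The unrolled method is roughly twice as long as the plain one, which is exactly the
          space cost that makes this a manual, case-by-case decision on small devices.</p>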
        <h4>RealTime Java</h4>
        <p>RealTime is the Last Frontier. <a href="#jsr1">[JSR1]</a> (the effort that
          started the Java Community Process) delivered a ~300pp specification
          after more
          than three years of work, and only now are the first compliant implementations
          surfacing, so it&#8217;s too early to judge the success of Java with
          real-time applications, in particular hard-RT.</p>
        <p>RealTime-ness is more
          a matter of predictability and control than raw speed. Java&#8217;s
          major sources of unpredictability are GC, JIT compilation and concurrency,
          while the standard threading and synchronisation facilities do not meet
          the needs of many RT apps. The RTJ specification fixes these problems
          with big updates to the JVM architecture.</p>
        <p>The object heap can be arranged
          by the application into a hierarchy of areas with different management
          policies (standard garbage-collected
          Heap, plus Immortal and Scoped areas that need no GC). The application
          creates and &#8220;enters&#8221; areas, so the <span class="code2">new</span> operator will automatically
          allocate from the current area (this resembles C++ allocators, with
          additional facilities and rules). RealTime Threads support detailed
          scheduling options, and the most critical threads may run with higher
          priority than the GC (the trade-off is no access to garbage-collectable
          objects). There are other extensions for timers, synchronization, signal
          handling and asynchronous events.</p>
        <p>This specification is orthogonal to
          Java editions (can be supported by J2SE or J2ME VMs), but RT systems
          are often embedded systems with
          tight hardware resources, so the full set of JSR1 features may impose a
          significant footprint on the tiniest platforms. Another challenge
          is (again) security: JSR1 mandates that references between objects
          do not violate certain rules, so the use of memory regions cannot lead
          to memory corruption. For example, an object stored in Scoped memory
          cannot refer to objects from inner scopes, and Heap or Immortal objects
          cannot refer to Scoped objects. Without such restrictions, the single-shot
          destruction of Scoped areas might produce dangling references. All
          reference assignments must be checked, and an exception is thrown if these
          rules are violated.</p>
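        <p>The assignment rules above can be illustrated with a toy model in plain Java. This is a
          sketch under stated assumptions: the <span class="code2">Area</span> class, its fields and the
          use of <span class="code2">IllegalStateException</span> are hypothetical stand-ins and do not
          reproduce the JSR1 API; the sketch only encodes the two rules quoted in the text.</p>

```java
/** Toy model of the RTJ reference rules; this is NOT the real-time specification's API. */
final class AreaRules {

    /** Hypothetical stand-in for a memory area. */
    static final class Area {
        final String name;
        final boolean scoped; // false for Heap and Immortal
        final Area parent;    // enclosing scope, or null

        Area(String name, boolean scoped, Area parent) {
            this.name = name;
            this.scoped = scoped;
            this.parent = parent;
        }

        /** True if this area is 'outer' itself or is nested (at any depth) inside it. */
        boolean isSameOrInside(Area outer) {
            for (Area a = this; a != null; a = a.parent) {
                if (a == outer) {
                    return true;
                }
            }
            return false;
        }
    }

    /** May an object living in 'holder' store a reference to an object living in 'target'? */
    static boolean mayRefer(Area holder, Area target) {
        if (!target.scoped) {
            return true;  // Heap and Immortal objects are never destroyed with a scope
        }
        if (!holder.scoped) {
            return false; // Heap or Immortal objects cannot refer to Scoped objects
        }
        // A Scoped holder may only point at its own scope or an enclosing one,
        // so the target always lives at least as long as the holder.
        return holder.isSameOrInside(target);
    }

    /** The check a VM would conceptually perform on every reference store. */
    static void checkStore(Area holder, Area target) {
        if (!mayRefer(holder, target)) {
            throw new IllegalStateException(
                "illegal reference from " + holder.name + " to " + target.name);
        }
    }

    public static void main(String[] args) {
        Area heap = new Area("Heap", false, null);
        Area outer = new Area("outer", true, null);
        Area inner = new Area("inner", true, outer);
        checkStore(inner, outer);                   // allowed: the inner scope dies first
        checkStore(outer, heap);                    // allowed: the target is not scoped
        System.out.println(mayRefer(outer, inner)); // false
        System.out.println(mayRefer(heap, inner));  // false
    }
}
```

        <p>Even in this simplified form, the check walks the chain of enclosing scopes on each store,
          which hints at why such checks are a cost worth removing when they can be proven unnecessary.</p>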
        <p>Many solutions developed for embedded Java could
          be applied to RealTime Java, even when the platform used by the RT application
          is not low-powered
          and could afford a full J2SE implementation plus JSR1: although the hardware
          may support expensive on-demand optimisations, the application&#8217;s
          RT restrictions are easier to satisfy if operations of unbounded cost
          can be offloaded to development time. For example, escape analysis
          could remove some runtime checks for reference assignments; this is
          an expensive optimisation, so we could do it at development time and
          employ verifiable annotations.</p>
        <p>The final trick is releasing the safety
          belt. In the embedded / RT domains (especially the latter), error recovery
          is often very limited
          (there&#8217;s no big difference between a NullPointerException and
          a reboot; the latter is often better because it&#8217;s sure to restore
          system sanity and it&#8217;s typically very fast in embedded HW). Developers
          may choose to spend extra time on testing, and then disable all runtime
          checks for production code. In any event, it&#8217;s not worse than
          using languages like C where safety is not even an option.</p>
        <h3>2	PERFORMANCE
          TENSIONS AND CHALLENGES</h3>
        <p>The impressive advance of JVM performance should
          not be interpreted as a statement that all remaining issues can be
          fixed with better implementations.
          There are difficult problems with no perfect solution, and the tradeoffs
          made by existing high-performance JVMs are evidence of the common sense
          that Brook&#8217;s &#8220;free lunch&#8221; is not available here either.
          It&#8217;s useful to assess how far we can go with better implementations
          of the same design, and which problems need fixes in the specification
          (language, bytecode or JVM architecture) if we consider it critical to
          have maximum performance for everything.</p>
        <h4>The economics of language efficiency</h4>
        <p>Java is a general-purpose platform,
          so vendors and users wish to apply Java to every kind of software.
          Fitting very different domains is not
          impossible, but typically requires great flexibility. The most successful
          languages in terms of generality are C and C++, but these languages
          come with high costs for the developer. Low-level language features
          and libraries provide &#8220;cheap&#8221; efficiency for many programs,
          but the cost is the harder task of creating robust, evolvable, large-scale
          applications. Hybrid combinations of low-level and high-level features,
          as in C++, cannot approach the benefits of pure high-level languages.
          Low-level languages can do anything because, in truth, it&#8217;s the
          programmer who does many difficult things by dealing with unsafe or
          complex features.</p>
        <p>This is not necessarily evil. If all these features
          make C++ (say) 5X harder than Java, it&#8217;s still a better choice
          when a 5X increase in development &amp; maintenance cost is a good
          trade-off for C++&#8217;s
          advantages. The software market is usually split into three divisions:</p>
        <ul>
          <li><strong> Corporate.</strong> Most developers write business applications deployed to
            a single company or a handful at best. Development is a large share
            of the system&#8217;s total cost of ownership, because it is not
            diluted through many deployments. Even modest productivity enhancements
            translate
            to significant gains in TCO.</li>
          <li><strong> Shrink-wrapped.</strong> A successful product sells thousands of
            copies. Unless the market is very competitive and the margins too
            tight (e.g., text
              editors), or the product is a mammoth requiring armies of developers
              (e.g., RDB servers), chances are that a 5X decrease in development
              costs will be no good if the resulting code is 30% slower and this
              makes you lose 5% of your customers.</li>
          <li><strong> Embedded.</strong> These applications put all previous categories
            to shame in raw volumes. Unit costs are critical, but the cost
            model is similar
              to shrink-wrapped, typically diluted by large deployment counts.
            Another significant factor is that code size and complexity are restricted
            by
              the small capacity of most platforms: embedded systems rarely suffer
              from featuritis. Developers often use low-level languages, even
            down to Assembly, as abstraction and reuse are less seductive when
            the complexity
              to manage is small. On the other hand, robustness is often a much
            higher concern (crashing a huge e-commerce server is bad, but crashing
            a pacemaker
              or brake controller is simply not an option).</li>
        </ul>        <p>Java is attempting to
            serve all these scenarios, but it&#8217;s obviously
          more suited to the first, so it&#8217;s no surprise that Java&#8217;s
          greatest success lies in the &#8220;enterprise&#8221;. J2ME is becoming
          successful in the embedded scenario, but this happens because the J2ME
          specification makes the necessary tradeoffs and because this market
          is changing &#8211; many so-called &#8220;micro&#8221; devices, like
          latest-generation cell phones and PDAs, offer more horsepower than last
          decade&#8217;s top-of-the-line PCs. In the higher-end consumer devices,
          full J2SE support is fast becoming the standard, with custom APIs
          and programming needed only for device-specific functionality. The lower-end
          J2ME specs should move to &#8220;invisible&#8221; devices (your electric
          razor etc.) which have no flashy GUIs, so consumers cannot be convinced
          to pay for the additional hertz, bytes, pixels, sound channels etc.</p>
        <h4>Costs
          and Limitations</h4>
        <p>Java is a high-level, object-oriented, garbage-collected
          language, etc., and there is a cost behind most of these nice characteristics.
          This cost can often be removed automatically by advanced compilers
          and runtimes, but most often, the required optimisations impose tradeoffs
          that don&#8217;t exist in low-level languages like C. On the other
          hand, the advanced runtimes often offer bonus advantages that beat
          most low-level code. Two examples:</p>
        <ul>
          <li><strong>GC vs. Manual Memory:</strong> Sophisticated Garbage Collectors have
            a bigger memory footprint due to more complex heap organisation and
            semi-spaces
            for copying. Dealing with pauses and unpredictability produces additional
            overhead. On the other hand, compaction allows trivial (stack-like)
            allocation, and some collectors (esp. the &#8220;Train&#8221;) tend
            to keep related objects together, improving the performance of some applications
            that depend heavily on cache efficiency. A competent C programmer
            would optimise the memory layout of data structures manually and
            achieve similar benefits, but with a big cost in development effort.</li>
          <li><strong> JIT vs. Static compilers:</strong> JITs start at a disadvantage as they must consume
            fewer resources; on the other hand, JITs benefit from system-specific
            compilation. Code generation can be tuned to the system&#8217;s configuration,
            as the generated code will never be deployed elsewhere. For example,
            the latest x86 VMs from Sun and IBM will automatically exploit Intel&#8217;s
            SIMD instructions (MMX/SSE/SSE2) on machines that support these (Pentium-IV
            and better will support all). This single factor doubles the score
            of FP-intensive benchmarks like JGF. Using the same VMs and bytecode,
            in lesser CPUs, the JIT will use the slower x87 instructions. In
            comparison, static compiled apps are typically limited to the least
            common denominator.
            Compilers often allow optimising for a CPU level (e.g., Pentium-IV)
            while keeping compatibility with lesser chips (e.g. Pentium); this
            supports architectural optimisations like instruction scheduling
            but not those depending on new instructions, and the optimised code
            may
            hurt performance in the inferior chips. Applications can compile
            the same code for multiple configurations and select the best code
            dynamically
            (by calling different procedures or dynamic libraries); in practice,
            only a handful of applications go through the trouble.</li>
        </ul>        <p>These comparisons
          are difficult and the most definitive and honest conclusion of most
          benchmarking studies, like <a href="#doed">[Doederlein]</a>, is the
          politically correct &#8220;test your own code&#8221;. Nevertheless,
          we&#8217;ll look at some aspects of Java&#8217;s design that, so far,
          imply undisputed performance disadvantages, meaning that the only
          possible fixes depend on specification changes. These issues are very
          important because Sun is very conservative in the advancement of the
          Java platform: the language <a href="#gosl00">[Gosling00]</a> and VM 
		  <a href="#lind96">[Lindholm96]</a> specifications
          are mostly cast in stone and received only minor fixes and enhancements
          since the earliest releases (the intense growth of Java&#8217;s complexity
          and performance has been mostly dependent on new APIs and better implementations
          of the same specs).</p>
        <ul>
          <li><strong>Typesystem.</strong> Java offers a limited set of primitive types
            compared to most languages of its family: there are no unsigned integral types,
            no structured
            types other than classes (no structs or enums), no type aliasing (typedefs),
            and no references to valuetypes. On the plus side, the typesystem
            and syntax are kept simple. The bad news is that developers are often
            forced into awkward programming styles that have some cost (look
            no further than classes emulating enumerations, or the CORBA mapping&#8217;s
            infamous &#8220;Holder&#8221; classes mapping &#8220;out&#8221; parameters).
            JDK 1.5 <a href="#jsr176">[JSR176]</a> will add enums and library-based unsigned arithmetic.</li>
          <li><strong> Lightweight Objects.</strong> This includes several related items: user-defined
            valuetypes <a href="#baum98">[B&auml;umer98]</a>, headerless objects (structs); by-value
            containment of object fields and array items; by-value object parameters
            and returns; <span class="code2">equals()</span> mapped to &#8216;==&#8217;. The idea is to avoid
            the one-size-fits-all object layout; this can be done without compromising
            object purity (see Eiffel&#8217;s &#8220;expanded&#8221; classes)
            but not without adding complexity to the language and VM. The benefits
            are obvious for some niches, e.g. in numerical computing, we really
            need to encode some user-defined types like Complex as efficiently
            as possible. But even conventional apps could benefit from better implementation
            of core libraries (e.g., Java2D &amp; Swing could use LWOs for performance-critical
            geometry and painting objects). A faster and simpler native interface
            would be another boon to general applications: Java code often cannot produce
            data structures with the exact layout required by native libraries,
            so this interface needs wrapper native code that serves only to convert
            between graphs of Java objects (like a Rectangle containing two references
            for Point) and low-level C data structures (like the straight layout
            [x1,y1,x2,y2] produced by headerless, by-value Rectangle and Point
            structs). </li>
          <li> 
            <strong>Multidimensional Arrays.</strong> Java offers only one-dimensional arrays; arrays
            of arrays emulate multidimensional arrays. This severely impacts the
            performance of numerical applications that depend on multidimensional
            arrays, unless some very ugly coding style is adopted, e.g. a <span class="code2">Matrix3x3</span>            
			class with nine <span class="code2">float</span> fields could emulate a <span class="code2">float[3][3]</span> with better
            performance. IBM developed a library-based solution that would fix
            this and implement additional FORTRAN-class trickery <a href="#jsr83">[JSR83]</a> but this
            effort doesn&#8217;t seem to be moving. See also <a href="#more99">[Moreira99]</a> for an
            account of IBM&#8217;s research to make Java a top platform for numerical
            computing; unfortunately, many of these improvements are still far
            from being accepted into the standard Java language and VM specs.</li>
          <li> 
            <strong>VM and Language Specifications.</strong> The Java Memory Model (JMM) is too
            stringent in an attempt to prevent concurrency errors, and makes some
            important code reordering optimisations unsafe for multiprocessors
            <a href="#pugh99">[Pugh99]</a>. The revisions that will be adopted by JDK1.5 should relax
            excessive constraints (properly synchronized code is not affected). The
            JLS contains other obstacles to optimisers: in <span class="code2">for
            (int i = 0; i &lt;=
            array.length; ++i) array[i] = 1</span>, which contains an array bounds error
            (at index <span class="code2">array.length</span>), the exception must be thrown exactly where the
            error happens, preserving any previous side effects. This makes the elimination
            of array bounds checks
            harder than in other languages: in nontrivial cases,
            compilers can use versioning, taking the fast track (exception-free)
            if all indexing
            can be validated at the loop prologue, and otherwise taking the slow
            track (fully checked). This creates bigger methods, which hinders
            other optimisations.
            <a href="#ishi02">[Ishizaki02]</a> seems to present the best solution, but it demands a
            strong optimising compiler and hardware support, both beyond the
            specs of
            small platforms.</li>
          <li><strong> Insufficient control.</strong> Current JVMs offer many start-up tuning switches,
            but programmable runtime options are too scarce. The <span class="code2">System.gc()</span>            
			call is comical: it is the only runtime control you have over GC, and it&#8217;s
            most often ignored! Examples of additional control that would still
            be compatible with Java&#8217;s safety:
            <ul>
              <li><em>Optimisation hints.</em> In addition
                to machine-generated annotations, programmers could add hints
                that are not easy to discover automatically,
                like &#8220;this object is never shared with other threads&#8221;.
                This could be specified by javadoc tags and encoded as bytecode
                annotations. These optimisations are often implemented as language
                keywords (like
                C++&#8217;s <span class="code2">inline</span> or <span class="code2">const</span>), but this complicates the syntax,
                typesystem, and linkage&#8230; with javadoc tags and annotations,
                the source is kept simple, and the bytecode is kept simple and
                compatible with VMs
                that don&#8217;t implement some or all of the annotation-driven
                optimisations.</li>
              <li>                <em>Memory Allocation.</em> In the CLR, unsafe code can explicitly pin
                objects against relocation; in RTJ, objects can be explicitly
                allocated
                in different kinds of heap; in JDK1.4&#8217;s DirectBuffers,
                Java code can allocate raw data from the non-Java heap. These
                are safe
                techniques
                  that can boost performance in some important cases. It would
                be nice if we could further explore these ideas, within the limits
                of
                safety.
                  For example, explicitly declaring at allocation time that an
                object is long-lived or heavily used by native code, could help
                the memory
                  manager to apply tricks that reduce GC and JNI costs.</li>
            </ul>
          </li>
        </ul>        <p>These are probably
          the top items; many performance-hungry Java developers will easily
          add a host of new items from their personal soapboxes.
          The most interesting point is that many critical improvements (like
          optimisation hints or library-based multidimensional arrays) can be
          added with small impact. Even if in some cases the result is less elegant
          code than an ideal solution (like language-based multidimensional arrays),
          the tradeoff is very good, especially given Java&#8217;s culture
          of treating most syntactic sugar as evil.</p>
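        <p>To make the library-based approach to multidimensional arrays concrete, here is a minimal
          sketch of a two-dimensional array class backed by a single flat array in row-major order.
          The class name <span class="code2">FlatMatrix</span> and all of its methods are assumptions made
          for illustration; this is not the API proposed in <a href="#jsr83">[JSR83]</a>.</p>

```java
/** Hypothetical library-style 2D array backed by one flat array; not the JSR 83 API. */
final class FlatMatrix {
    private final int rows;
    private final int cols;
    private final double[] cells; // row-major: element (r, c) lives at r * cols + c

    FlatMatrix(int rows, int cols) {
        if (rows <= 0 || cols <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.rows = rows;
        this.cols = cols;
        this.cells = new double[rows * cols];
    }

    double get(int r, int c) {
        return cells[index(r, c)];
    }

    void set(int r, int c, double value) {
        cells[index(r, c)] = value;
    }

    /** Explicit check: without it, (0, cols) would silently alias (1, 0). */
    private int index(int r, int c) {
        if (r < 0 || r >= rows || c < 0 || c >= cols) {
            throw new IndexOutOfBoundsException("(" + r + ", " + c + ")");
        }
        return r * cols + c;
    }

    /** Returns a * b; the inner loop walks both flat arrays sequentially. */
    static FlatMatrix multiply(FlatMatrix a, FlatMatrix b) {
        if (a.cols != b.rows) {
            throw new IllegalArgumentException("shape mismatch");
        }
        FlatMatrix out = new FlatMatrix(a.rows, b.cols);
        for (int i = 0; i < a.rows; i++) {
            for (int k = 0; k < a.cols; k++) {
                double aik = a.cells[i * a.cols + k];
                for (int j = 0; j < b.cols; j++) {
                    out.cells[i * out.cols + j] += aik * b.cells[k * b.cols + j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FlatMatrix a = new FlatMatrix(2, 2);
        a.set(0, 0, 1); a.set(0, 1, 2); a.set(1, 0, 3); a.set(1, 1, 4);
        FlatMatrix product = multiply(a, a);
        System.out.println(product.get(0, 0) + " " + product.get(1, 1));
    }
}
```

        <p>Compared with a <span class="code2">double[][]</span>, this layout uses one object and one
          header for the whole matrix and keeps rows contiguous, at the price of
          <span class="code2">get</span>/<span class="code2">set</span> calls instead of bracket syntax.</p>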
        <h4>Microsoft .NET</h4>
        <p>This story would not be complete without a look at
          Microsoft.NET. Many environments share aspects of Java, but .NET can
          be directly compared
          to Java in design and relevance. The .NET VM (Common Language Runtime),
          and its primary language C#, are often described in terms of Java: &#8220;proprietary
          Java&#8221; for some people, &#8220;Java done right&#8221; for others;
          either way, the resemblance is clear. Microsoft benefited from the Java experience,
          so they could improve in some areas <a href="#grun02">[Gruntz02]</a>. It&#8217;s tempting
          to analyse the differences; where one architecture is superior; if
          the superiority is due to design or implementation; and if the competitor
          could (and should) catch up. We could identify some core performance
          items:</p>
        <ul>
          <li><strong>Richer typesystem.</strong> Includes a more complete set of primitives,
            non-OO structured types, and user-defined valuetypes (lightweight
            objects).</li>
          <li> 
            <strong>Full support for unsafe code.</strong> The CLR implements many features not
            covered by .NET&#8217;s language-neutral Common Type System, like
            raw pointers, manual memory allocation and unchecked operations.
            These
            features are a quick-and-dirty path to maximum performance, subject
            to abuse by weak programmers but a better choice than JNI when managed
            code cannot do the task.</li>
          <li> <strong>OS Integration.</strong> The CLR benefits from tight integration
            with Windows. Many framework classes can be implemented as thin layers
            over native
              APIs; one could port the same libraries to other OSes (as the
            open-source Mono project is doing), but the Windows implementation
            will usually
              be simpler and more efficient. The interfaces to non-CLR code and
            COM benefit too from Windows-centric design.</li>
          <li> 
            <strong>JIT Compilation.</strong> The JIT compiles all methods at first call. Loading
            time is potentially much worse than in JVMs, so the CLR compensates with
            two strategies: the JIT disables very expensive optimisations, and
            large applications or libraries can use an install-time code generator
            (NGEN) to minimise JIT compilation. NGEN calls the same compiler as
            the JIT, but ahead-of-time, and enabling all optimisations, and stores
            code in the GAC (Global Assembly Cache). The CLR uses Microsoft&#8217;s
            mature compiler back-end, but it doesn&#8217;t seem to add much else.</li>
          <li> <strong>Memory Management.</strong> The CLR offers a three-generation collector
            that can run in stop-the-world or concurrent mode. The vast number
            of GC
              algorithms and tuning options found in JVMs is not available; this
              may change as .NET evolves, but the CLR makes it easy for developers
            to use manual (unsafe) allocation, and the support for valuetypes
            results
              in smaller heap demands even for safe code.</li>
          <li><strong> Language Support.</strong> Central to .NET is the support for many languages.
            This feature is not perfect: mainstream OOPLs like C# are better supported
            than others, but it&#8217;s better than the JVM. This creates performance
            opportunities, especially with managed languages producing executable
            code that&#8217;s trusted to run in tight integration with system
            services and middleware. If a single VM supports all code, the IPC
            barriers
            can be greatly reduced. On the other hand, J2EE products have had
            considerable success in implementing everything you need in Java
            and convincing developers to write everything in Java, so typical
            Java
            deployments already enjoy the advantages of safe, monolithic integration.</li>
        </ul>        
        <p>The
          .NET platform adds some performance-friendly designs (although it&#8217;s
          still difficult to collect significant data about their effectiveness
          versus JVMs for real-world apps). A lot of attention
          is drawn to the typesystem and unsafe code, which stand out in .NET.
          Heated discussions are common, because these items trade OO purity
          or safety for performance. Some of these features are silently finding
          their way to Java (JDK1.4&#8217;s DirectBuffers add restricted unsafe
          memory access).</p>
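        <p>As a hedged illustration of the DirectBuffers mentioned above, the sketch below packs
          four coordinates into a direct buffer in the flat [x1,y1,x2,y2] layout, using only the standard
          <span class="code2">java.nio</span> classes introduced in JDK1.4. The class name
          <span class="code2">DirectRect</span> and the method <span class="code2">packRect</span> are
          hypothetical; how a native library would consume the buffer is outside the scope of the sketch.</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

/** Hypothetical helper: stores a rectangle as four ints in memory outside the Java heap. */
final class DirectRect {

    /** Returns a direct buffer holding x1, y1, x2, y2 in that order, ready for reading. */
    static IntBuffer packRect(int x1, int y1, int x2, int y2) {
        // Four 4-byte ints, allocated outside the garbage-collected heap.
        ByteBuffer raw = ByteBuffer.allocateDirect(4 * 4);
        // Native byte order, so C code could read the ints without swapping bytes.
        raw.order(ByteOrder.nativeOrder());
        IntBuffer ints = raw.asIntBuffer();
        ints.put(x1).put(y1).put(x2).put(y2);
        ints.flip(); // switch from writing to reading
        return ints;
    }

    public static void main(String[] args) {
        IntBuffer rect = packRect(10, 20, 110, 220);
        System.out.println(rect.isDirect() + " " + rect.get(2));
    }
}
```

        <p>Access remains bounds-checked from the Java side, which is why the text calls this
          a restricted form of unsafe memory access: the raw memory is only exposed to native code.</p>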
        <p>System integration helps performance
          in several ways: lightweight frameworks; privileged OS access; fast
          native interface; modifying the OS to benefit
          the VM. This raises more safety issues, as Microsoft tends to move
          critical features to lower layers of Windows. The competition will
          be fairer in sub-PC devices, or OSes produced by Java backers, where
          Java can also be implemented in privileged conditions. System-integrated
          JVMs incur the same risks as the CLR, but Java applications are less
          likely to include any unsafe code, so the absence of critical JVM bugs is
          enough to keep the system sane.</p>
        <p>In some cases, .NET has the advantage, as
          its richer architecture poses fewer implementation problems, so .NET
          may compete with a simpler VM.
          The best example is memory management: GC for all objects is a challenge,
          but with .NET&#8217;s type system more data can be stack-allocated trivially,
          and many remaining heap objects carry less header and indirection overhead.
          If less garbage is produced, the system may use a straightforward GC
          and heap organisation, reducing speed and space costs. On the other
          hand, because of Java&#8217;s additional challenges, JVMs reached a
          sophistication that pays off at the higher end &#8211; it&#8217;s not likely
          that the CLR&#8217;s current memory manager can scale to multi-GB SMP
          servers as well as Java (papers like <a href="#detl02">[Detlefs02]</a> are an indication
          of Java&#8217;s state of the art). It&#8217;s good to have super advanced
          VMs, but not good to depend on them.</p>
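        <p>The header and indirection argument can be made concrete with a little arithmetic.
          The sketch below is illustrative only, and every size in it is an assumption rather
          than a measurement: a 32-bit VM with 4-byte words, two-word object headers, 4-byte
          references, and a hypothetical <span class="code2">Point</span> with two <span class="code2">int</span> fields.
          Alignment padding and the array&#8217;s own header are ignored.</p>

```java
// Back-of-the-envelope arithmetic; all sizes are assumptions, not measurements.
final class HeaderOverhead {
    static final int WORD = 4;               // assumed 32-bit machine word
    static final int HEADER = 2 * WORD;      // assumed two-word object header
    static final int REF = WORD;             // assumed 4-byte reference
    static final int POINT_FIELDS = 2 * 4;   // hypothetical Point: two int fields

    // n Points kept as separate heap objects behind an array of references.
    static long boxedBytes(long n) {
        return n * (REF + HEADER + POINT_FIELDS);
    }

    // n Points stored inline, the way an array of value types could store them.
    static long inlineBytes(long n) {
        return n * POINT_FIELDS;
    }

    public static void main(String[] args) {
        long n = 1000000;
        System.out.println("boxed:  " + boxedBytes(n) + " bytes");
        System.out.println("inline: " + inlineBytes(n) + " bytes");
    }
}
```

        <p>Under these assumptions the reference and the header account for 12 of the 20 bytes
          spent on each element, which is the overhead a value type avoids.</p>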
        <p>In the &#8220;mixed blessing&#8221; category
          we find the bytecode and execution strategy. MSIL, the .NET bytecode,
          uses type-generic opcodes (e.g.
          a single add opcode, where Java has <span class="code2">iadd</span> for ints and so on). This is
          good for the type system and generics, but makes a trivial interpreter less
          efficient (although this can be fixed with loading-time type propagation
          and bytecode rewriting). Microsoft uses this as a marketing point (&#8220;.NET
          never interprets any code&#8221;) as developers equate interpretation
          to bad performance. But interpretation has proven useful for Java,
          enabling low-footprint mixed-mode execution and very compact, yet reasonably
          fast, VMs for constrained devices (bytecode is typically more compact
          than native code, even without space-costing optimisations). The .NET
          Compact Framework competes with J2ME only in the higher-end devices.
          The GAC creates some management problems (the cache lives inside the
          Windows folder, growing with each installation), and it doesn&#8217;t
          enable closed-world optimisations while impeding aggressive dynamic
          optimisations.</p>
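        <p>The interpreter cost mentioned above can be illustrated with a toy example. This is
          a minimal sketch, not real JVM or CLR code: the class, the method names and the use of
          boxed values as runtime type tags are all hypothetical. It only shows why a trivial
          interpreter does more work per instruction when a single generic add opcode replaces
          typed ones.</p>

```java
// Toy illustration; opcode handlers and value representation are hypothetical.
final class ToyInterpreter {
    // Typed opcode in the Java style: the bytecode itself says both operands are ints.
    static int iadd(int a, int b) {
        return a + b;
    }

    // Generic opcode: a naive interpreter has to discover the operand types
    // every time the instruction executes.
    static Object add(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            return Integer.valueOf(((Integer) a).intValue() + ((Integer) b).intValue());
        }
        if (a instanceof Long && b instanceof Long) {
            return Long.valueOf(((Long) a).longValue() + ((Long) b).longValue());
        }
        if (a instanceof Double && b instanceof Double) {
            return Double.valueOf(((Double) a).doubleValue() + ((Double) b).doubleValue());
        }
        throw new IllegalArgumentException("mismatched operand types");
    }

    public static void main(String[] args) {
        System.out.println(iadd(2, 3));
        System.out.println(add(Integer.valueOf(2), Integer.valueOf(3)));
        System.out.println(add(Double.valueOf(1.5), Double.valueOf(2.5)));
    }
}
```

        <p>Loading-time type propagation amounts to resolving those type tests once, when the
          code is loaded, and rewriting each generic instruction into its typed form.</p>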
        <p>Only time will tell if .NET beats or is beaten by Java
          with its different designs. Some apparently obvious improvements (like
          a caching JIT) are
          very seductive at first sight, but may fail to deliver a significant
          advantage; and less flashy improvements (like a better memory model)
          could prove to be more important, yet easier for Java to adopt. There
          is also the possibility that future optimisations will make most low-level
          tradeoffs obsolete. For example, <a href="#baco02">[Bacon02]</a> presents a groundbreaking
          technique to allow Java objects to use a single-word header (current
          JVMs, as well as the CLR, need at least two-word headers), which reduces
          the need for impurity in the object model.</p>
        <h4>The VM model</h4>
        <p>Whatever the kind (Java, .NET or others), Virtual Machines
          (aka &#8220;Managed
          Runtime Environments&#8221;) should dominate the next generation of
          applications. The final question is whether VM-based execution has
          intrinsic performance advantages or disadvantages over traditional
          processes, and if future improvements could remove the disadvantages.</p>
        <p>It&#8217;s
          useful to remember that the current status quo &#8211; processes
          as we know them in standard operating systems &#8211; was not born
          with computing. Ancient operating systems (not to mention even older
          systems without an OS) did not offer memory protection, so all programs
          shared a single address space and other OS resources (like file descriptors),
          and nothing prevented buggy or ill-behaved programs from stepping on
          each other&#8217;s toes. Right now everybody takes for granted the task
          isolation provided by OSes like Unix or Windows (typically using the
          term &#8220;Virtual Machine&#8221;), even though the overhead of this
          isolation is significant. Only in low-end embedded systems is it
          acceptable to go without memory protection in return for extra savings
          in speed and space. The software-based VMs like the JVM and CLR propose
          a higher level of isolation.</p>
        <ul>
          <li> 
            <strong>Sharing.</strong> Both Java and .NET applications demand
            more memory than equivalent native applications. This is partially
            caused by the more complex runtime
            and heap organisation, but typically, most overhead comes from the absence
            of sharing. Native VM code is typically shared, but it&#8217;s harder
            to share bytecode, VM metadata and JIT-generated native code. <a href="#czaj02">[Czajkowski02]</a>
            implements sharing for Sun HotSpot, a feature that might start surfacing
            in JDK1.5. Sharing can be combined with caching for better loading
            time: the CLR&#8217;s Global Assembly Cache organises code into relatively
            coarse-grained packages (&#8220;Assemblies&#8221;) that are stored
            in the filesystem. Any sharing solution has a trade-off in compatibility
            with dynamic optimisations: for instance, if a devirtualization depends
            on absence of (loaded) overrides of a polymorphic method, the code
            cannot be shared with other applications that may contain such overrides,
            so the JIT must either use a different devirtualization strategy
            or not share this specific compiled method.</li>
          <li><strong> Isolation.</strong> The next step in sharing is having
            a single VM process hosting multiple applications (of any kind, not
            only the neatly-packaged and
            well-behaved J2EE Enterprise Applications). Java presents difficulties
            like the single event queue of AWT and system methods with global-VM
            effect like <span class="code2">System.exit()</span>; these issues
            are being fixed with the &#8220;Application Isolation&#8221; API
            <a href="#jsr121">[JSR121]</a>. Full VM sharing is still difficult due to tuning and robustness
            concerns. Modern Java VMs
            offer a rich variety of knobs to select or fine-tune critical components
            like the JIT and GC, but these settings take effect at the VM scope.
            The same applies to management and global policies (e.g., enabling
            of assertions or remote VM monitoring). It&#8217;s likely that we
            will have multiple solutions &#8211; multiple JVM processes with
            shared memory for code; single VMs with JDK1.5&#8217;s Isolates,
            and application server instances &#8211; for different scenarios.
            The Isolates look good for applications executed by the Java PlugIn
            / WebStart, which
            typically are lightweight programs and don&#8217;t require custom
            VM options.</li>
          <li> 
            <strong>Privileged Execution.</strong> A type-safe language does not need memory protection:
            if all applications are pure Java, we could run these apps in a single
            address space without any risk because the constraints in each VM&#8217;s
            type system and object model would not allow producing rogue pointers
            to access other VMs&#8217; memory. One sad effect of .NET&#8217;s liberal
            support for unmanaged code and system calls is that a larger number
            of .NET apps will be &#8220;non-pure&#8221;, and Microsoft would have
            trouble exercising their ownership of Windows to realise the academic
            dream of process isolation through type safety (the SPIN OS, using
            Modula-3, is a proof of concept for this idea <a href="#sire96">[Sirer96]</a>). Application
            servers that outlaw unmanaged code (like J2EE servers) are less sexy,
            but a good compromise. Low-end hardware is another ideal scenario:
            we can already use Java on devices that have no memory protection,
            which is a robustness advantage against native applications written
            for the same platforms (and this safety is absolutely mandatory when
            connectivity allows free installation of software titles, or mobile
            code).</li>
          <li><strong> Platform-specific Functionality.</strong> Binary deployment to multiple platforms
            is a selling point of most VMs (even Microsoft&#8217;s), but portable
            code creates more trouble than just the need for JITs. Full portability is only
            possible with portable APIs, so we quickly lose access to system-specific
            behaviours. Java&#8217;s challenges with multiplatform APIs are well
            known, but even the .NET class frameworks don&#8217;t cover the full
            Win32 APIs. How does a C# programmer write to a serial port? Answer:
            non-portable Win32 or COM calls. RS232 is pretty old stuff, but how
            about the Win32 Fiber Thread API?... Catching up with full OS APIs
            is neither possible, nor desirable in many cases (e.g. kernel APIs
            for drivers). The trick is supporting enough functionality that 99%
            of user-mode applications can be kept pure. The real problem is when
            an important feature is supported by multiple platforms, but very differently.
            Look no further than the event model of GUI toolkits. The only portable
            solution is creating a very heavyweight API that does things its own
            way, like Swing. Emulating or replacing the native features is possible
            (and easier with collaboration of the platform owner, like in MacOSX),
            but the footprint remains an issue as the implementation cannot be
            a relatively simple mapping from the framework to the platform. Notice
            that this problem is shared by any multiplatform solution; it&#8217;s
            not specific to VM-based systems.</li>
        </ul>        <p>Managed, portable applications will never match native applications
          in all aspects, but it doesn&#8217;t
          matter. Satisfying 90% of all applications is an outstanding success,
          if we remember that low-level programming environments cannot please
          everybody either.</p>
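        <p>The conflict between code sharing and devirtualization, described under
          &#8220;Sharing&#8221; above, can be sketched in a few lines. This is a simplified
          illustration, not how any particular JIT works: <span class="code2">canDevirtualize</span> and
          the <span class="code2">Shape</span> classes are hypothetical, and the set of &#8220;loaded&#8221;
          classes is passed in by hand. A call may be bound statically only while no loaded subclass
          overrides the method, and that answer differs from one application to another.</p>

```java
// Simplified class-hierarchy check; names and the "loaded classes" list are hypothetical.
final class DevirtualizationCheck {
    static class Shape { double area() { return 0.0; } }
    static class Circle extends Shape { double area() { return Math.PI; } }
    static class Label extends Shape { }   // inherits area() unchanged

    // True if no loaded subclass of 'base' declares its own version of the no-arg method.
    static boolean canDevirtualize(Class<?> base, String method, Class<?>... loaded) {
        for (Class<?> c : loaded) {
            if (c != base && base.isAssignableFrom(c) && declares(c, method)) {
                return false;
            }
        }
        return true;
    }

    // Checks whether the class itself (not a superclass) declares the no-arg method.
    static boolean declares(Class<?> c, String method) {
        try {
            c.getDeclaredMethod(method);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Application A never loads an override: Shape.area() can be bound statically.
        System.out.println(canDevirtualize(Shape.class, "area", Shape.class, Label.class));
        // Application B loads Circle: code compiled under A's assumption cannot be shared with it.
        System.out.println(canDevirtualize(Shape.class, "area", Shape.class, Label.class, Circle.class));
    }
}
```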
        <h4>Evolution and Compatibility</h4>
        <p>A critical analysis of the Java language,
          VM and frameworks will easily find many cases of &#8220;design rot&#8221;,
          where bad decisions from the past make the current system more limited,
          ugly or confusing than
          necessary. In many cases Sun &#8220;patched&#8221; Java with new solutions:
          Java carries obsolete APIs for GUI, collections, security, I/O, dates
          and more; all replaced by newer designs, but always keeping backwards
          compatibility, so developers still suffer with the old crust. Let&#8217;s
          look at this issue from the performance perspective.</p>
        <p>A common request
          is &#8220;dropping all deprecated APIs&#8221;, but
          this is irrelevant as the obsolete methods and classes amount to a
          very small fraction of the current platform and the savings in footprint
          and clutter would be minimal. More significant improvements would require
          incompatible changes in existing APIs. For example, all methods that
          return <span class="code2">Vector</span> need incompatible changes to return Java2&#8217;s <span class="code2">List</span>          
		  or <span class="code2">Collection</span>, as overloaded methods cannot differ by return type only.
          The problem here is Java&#8217;s lack of versioning support: the system
          should be able to carry multiple versions of the same packages, not
          allowing these to be mixed (each application can access either the
          old or the new API, not both). Sun implemented versioning features
          for application code (in Java Web Start), but they are sorely needed
          in the Java core too. Dropping old APIs may benefit performance by removing
          less efficient APIs in places where they are used only for compatibility,
          and where new APIs allow more efficient application code (e.g. if <span class="code2">StringTokenizer</span>
          took <span class="code2">CharSequence</span> instead of <span class="code2">String</span>, callers that have string
          data in a <span class="code2">StringBuffer</span> or <span class="code2">CharBuffer</span>
          wouldn&#8217;t need <span class="code2">toString()</span>).</p>
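        <p>The <span class="code2">StringTokenizer</span> point can be illustrated with a small
          helper. This is a minimal sketch under stated assumptions: <span class="code2">CharSequenceTokens.split</span> is
          a hypothetical method, not part of the JDK, and it handles only a single delimiter
          character. Because it accepts any <span class="code2">CharSequence</span>, a caller holding a
          <span class="code2">StringBuffer</span> does not have to copy the whole buffer into a <span class="code2">String</span> first.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a JDK API.
final class CharSequenceTokens {
    // Splits on one delimiter character, reading directly from any CharSequence.
    // Empty tokens are skipped, as StringTokenizer does by default.
    static List<String> split(CharSequence text, char delimiter) {
        List<String> tokens = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            if (i == text.length() || text.charAt(i) == delimiter) {
                if (i > start) {
                    // Only the token itself is copied, never the whole input.
                    tokens.add(text.subSequence(start, i).toString());
                }
                start = i + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("alpha,beta,,gamma");
        System.out.println(split(buffer, ','));   // prints [alpha, beta, gamma]
    }
}
```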
        <p>The
          APIs evolve quickly, and newer APIs are generally better designed and
          more extensible (with strong reliance on interfaces) so they can
          evolve more smoothly even without versioning. On the other hand, everything
          else (language, bytecode, VM) moves very slowly, and Sun tends to adopt
          very conservative solutions. Everybody&#8217;s favourite example is
          inner classes, added to JDK1.1 to fix the event model without true
          first-class methods / closures (which would need VM and bytecode changes).
          Pragmatic thinking justifies inner classes: they avoid the performance
          problems of full closures but support their critical features. The
          problem is not the trade-off (C# delegates&#8217; trade-offs are bigger)
          but the conservative implementation: by requiring a new class even for
          a simple
          one-line event handler or comparator, inner classes have a disproportionate
          cost in bytecode size and other areas. JDK1.5 will define a dense
          binary format for faster downloading <a href="#jsr200">[JSR200]</a>, reducing some of these
          costs. This is a good example of a new (implementation) patch fixing
          an old (design) patch. Simultaneously, JDK1.5 will (after a very long
          wait) introduce generic programming, but again, with a very conservative
          model that needs zero changes to existing VMs and code. Considering
          all the research that went into more advanced models of generic Java,
          including support for primitive types and Just-In-Time specialisation
          <a href="#ages97">[Agesen97]</a> (like Microsoft&#8217;s design for generic C#), the major
          reasons for GJ&#8217;s limitations are: not wanting to &#8220;fix what&#8217;s
          not broken&#8221; (like JITs and core APIs), and not scaring users
          and licensees with big language and VM changes.</p>
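        <p>The cost of inner classes is easy to observe. The sketch below, with a hypothetical
          class name, writes a one-line comparator as an anonymous inner class and then asks the
          runtime what it is: a full class of its own, which the compiler emits as a separate
          class file alongside the enclosing one.</p>

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustration only; the class name is hypothetical.
final class InnerClassCost {
    // One line of logic, wrapped in a whole anonymous class.
    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
        public int compare(String a, String b) {
            return a.length() - b.length();
        }
    };

    public static void main(String[] args) {
        String[] words = { "ccc", "a", "bb" };
        Arrays.sort(words, BY_LENGTH);
        System.out.println(Arrays.toString(words));   // prints [a, bb, ccc]
        // The comparator has its own class, distinct from InnerClassCost.
        System.out.println(BY_LENGTH.getClass().getName());
    }
}
```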
        <p>Good initial design,
          versioning, and some strategy (e.g. releasing new APIs as optional
          packages until they mature) help to evolve a platform,
          but at least once in a blue moon, the only solution is a major break
          with the past. Sun did it at least twice &#8211; Java2 (imposing major
          changes in implementations and APIs) and J2ME (abandoning WORA&#8217;s
          one-size-fits-all fundamentalism with platform-specific API, VM and
          language specs). From the performance side, when Java looks bad in
          any domain compared to competing systems, Java advocates (like me)
          are quick to point to research papers / implementations that solve virtually
          all Java problems, which means that the problems have solutions and
          Java implementers have the skills to implement these solutions.</p>
        <p>The
          problem is getting all these improvements included in the Java platform.
          Sun, its major licensees like IBM, and now all JCP members,
          face increased tension between improving Java and maintaining a hard-earned
          reputation for stability, which is utterly important at least in the J2EE space,
          and a huge competitive advantage. Java is mature enough to replace
          very mature technology like C/C++, and Java&#8217;s first direct contender,
          .NET, faces an uphill battle for the hearts of those who value a stable platform
          where an application can be expected to run well today, and to run
          at all ten years later.</p>
        <p>Virtual Machines have a big advantage for software
          evolution: a newer platform can emulate its older versions just like
          it emulates a processor
          when running portable code. In theory, we could have a Java3 release
          some time in the future, fixing all known problems of Java with a clean
          solution, dropping half-baked fixes along with obsolete features.
          Just like an OS that moves from a 32-bit architecture to 64-bit, the
          runtime could support legacy code with several tricks &#8211; loading
          a different VM; using versioning so legacy code sees different versions
          of some APIs; or translating bytecodes at loading time. The latter
          option is very important: the job is much easier than for native code,
          because bytecode is (by design) easy to inspect, translate and instrument.
          Next-generation OSes/CPUs typically demand hardware support to run
          old programs with acceptable performance (see the debate between Intel&#8217;s
          pure IA64 Itanium and AMD&#8217;s 32-bit-compatible Opteron, which finally
          pressed Intel to announce better support for 32-bit applications).</p>
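        <p>As a small illustration of how easy bytecode is to inspect, the sketch below reads
          the fixed header that opens every class file: the magic number followed by the minor
          and major format versions. The class and method names are hypothetical, and locating
          the class file through the system class loader is an assumption that holds for
          ordinary class path and JDK classes. A load-time translator would start from the same
          bytes and rewrite what follows.</p>

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustration only; names are hypothetical.
final class ClassFileHeader {
    // Returns { magic, minor_version, major_version } from the first 8 bytes of the class file.
    static int[] readHeader(Class<?> type) {
        String resource = type.getName().replace('.', '/') + ".class";
        InputStream raw = ClassLoader.getSystemResourceAsStream(resource);
        if (raw == null) {
            throw new IllegalStateException("class file not found: " + resource);
        }
        DataInputStream in = new DataInputStream(raw);
        try {
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new IllegalStateException("cannot read " + resource, e);
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader(String.class);
        System.out.println("magic: 0x" + Integer.toHexString(header[0]).toUpperCase());
        System.out.println("class file version: " + header[2] + "." + header[1]);
    }
}
```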
        <h3>3 CONCLUSIONS</h3>
        <p>High-level programming languages (like OOPLs) and execution
          environments (like VMs) offer many advantages to software developers,
          but these
          advantages usually embed trade-offs in size or speed. Decades of research
          and development in compilers, GCs and related technologies have tried to
          eliminate these trade-offs. In practice nothing is free, and trade-offs
          often only mutate &#8211; just like in physics, where matter and energy
          can be transformed into each other but never created or destroyed within
          a closed system. If we stretch this metaphor, the &#8220;closed system&#8221; must
          include compilation and linking.</p>
        <p>In traditional environments with ahead-of-time
          compilation, programmers perceive a &#8220;free lunch&#8221; in compiler
          optimisations: the biggest trade-off for better application code is
          increased resource
          usage by the compiler. Developers&#8217; workstations typically need
          at least double the memory, disk and CPU of end users&#8217; machines, so
          developers don&#8217;t die of boredom during full builds or profiler sessions.
          The good news is that once finished, the program doesn&#8217;t carry
          heavyweight compilers to end user machines. It is theoretically possible
          to remove all overhead from high-level constructs with advanced compilers.
          This benefits even runtime facilities; for example, some optimisations
          (<a href="#gayy98">[Gayy98]</a>, <a href="#mikh02">[Mikheev02]</a>, 
		  <a href="#whal99">[Whaley99]</a>) reduce heap allocations, removing
          part of the GC overheads at runtime.</p>
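        <p>The kind of allocation these optimisations target can be shown with a hand-written
          before and after. This is only an illustration of the effect, not the algorithm of
          any of the cited papers: the class and its methods are hypothetical, and the second
          method is what a compiler could produce once it has proved that the temporary object
          never escapes the loop.</p>

```java
// Hand-written illustration; names are hypothetical.
final class EscapeSketch {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // As written by the programmer: one short-lived Point per iteration,
    // none of which is stored or returned.
    static long sumAllocating(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            sum += p.x + p.y;
        }
        return sum;
    }

    // After the optimisation: the fields become local variables and
    // no heap allocation remains for the GC to reclaim.
    static long sumScalarReplaced(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            int x = i;
            int y = i + 1;
            sum += x + y;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAllocating(1000));       // prints 1000000
        System.out.println(sumScalarReplaced(1000));   // prints 1000000
    }
}
```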
        <p>VMs keep the compiler inside the
          deployed &#8220;closed system&#8221;.
          This makes it harder to really eliminate overheads, from the end user&#8217;s
          perspective. Even in high-end systems, the remaining overheads are
          sometimes overkill: for example, background daemons like cvs have a
          near-zero memory footprint, and small tools like grep have near-zero
          loading time; in both cases a Java version (with current J2SE VMs)
          is not competitive.</p>
        <p>VMs compensate by enabling some unique opportunities,
          like dynamic optimisation and code management, which more traditional
          environments can sometimes
          approach but never emulate completely. We must accept that each model
          offers some absolute advantages and disadvantages over the other. The &#8220;absolute&#8221; disadvantages
          can be reduced with better or more specialised implementations, but
          never removed completely.</p>
        <p>The Java VM has come a long way approaching
          maximum performance, first with better implementations, and more recently,
          with more specialised
          implementations. The very competitive market, with multiple implementers
          including licensees and clean-room vendors, advanced the state of the art at
          a pace that no single company could achieve alone; competition played a
          major role in the &#8220;better implementations&#8221; stage but that&#8217;s
          certainly not enough. As evidenced by the evolution of J2ME, specialisation
          is very important in scenarios where Java&#8217;s usual trade-offs
          are not acceptable. Part of the solution here is making the VM work
          more like a traditional environment, removing as much hard work as
          possible from the application&#8217;s closed system: either moving
          work to development time (pre-verification, pre-optimisation, or install-time
          code generation) or even better, removing work completely (e.g., fixing
          the overweight memory model so optimised code uses fewer resources,
          or adding language features that reduce allocation in the heap so the
          garbage collector is triggered less often).</p>
        <p>A few issues depend on fixes
          or enhancements in the current Java specs; now the trade-off is not
          against bytes or cycles, it&#8217;s a matter
          of politics and strategy: major VM or language changes impact the investment
          of all Sun licensees, and may force complexity and compatibility issues
          onto developers. The evolution of Java standards is historically much
          faster in the APIs: licensees redistribute most libraries without change,
          and developers face a smoother learning curve for features encapsulated
          by class libraries.</p>
        <p>&nbsp;</p>        <h3>REFERENCES</h3>
        <p><a name="ages00"></a>[Agesen00]	Ole Agesen, David Detlefs: &#8220;Mixed-mode
          Bytecode Execution&#8221;,
          <em>SMLI TR-2000-87</em>, June 2000.</p>
        <p><a name="ages97"></a>[Agesen97]	Ole Agesen, Stephen Freund, John
          Mitchell: &#8220;Adding
          Type Parameterization to the Java Language&#8221;, <em>OOPSLA&#8217;97</em>,
          October 1997.</p>
        <p><a name="azev99"></a>[Azevedo99]	Anna Azevedo, Alex Nicolau, Joe Hummel: &#8220;Java
          annotation-aware Just-in-Time (AJIT) Compilation System&#8221;, <em>JavaGrande&#8217;99</em>,
          June 1999.</p>
        <p><a name="baco02"></a>[Bacon02]	David Bacon, Stephen Fink, David Grove: &#8220;Space-
          and Time-Efficient Implementation of the Java Object Model&#8221;,
          <em>ECOOP &#8217;02</em>,
          June 2002.</p>
        <p><a name="baum98"></a>[B&auml;umer98]	Dirk B&auml;umer, Dirk Riehle, Wolf Siberski,
          Carola Lilienthal, Daniel Megert, Karl-Heinz Sylla, Heinz Z&uuml;llighoven: &#8220;Values
          in Object Systems&#8221;, <em>Ubilab Technical Report 98.10.1</em>, 1998.</p>
        <p><a name="czaj02"></a>[Czajkowski02]
          Grzegorz Czajkowski, Laurent Dayn&egrave;s, Nathaniel Nystrom: &#8220;Code
          Sharing among Virtual Machines&#8221;, <em>ECOOP &#8217;02</em>, June 2002.</p>
        <p><a name="detl99"></a>[Detlefs99]
          David Detlefs, Ole Agesen: &#8220;Inlining of Virtual Methods&#8221;,
          <em>ECOOP &#8217;99</em>, June 1999.</p>
        <p><a name="detl02"></a>[Detlefs02] David Detlefs, Ross Knippel,
          William Clinger, Matthias Jacob: &#8220;Concurrent Remembered Set Refinement
          in Generational Garbage Collection&#8221;, <em>USENIX JVM&#8217;02</em>, April
          2002.</p>
        <p><a name="doed"></a>[Doederlein]	Osvaldo Doederlein, &#8220;The Java Performance Report&#8221;,
          JavaLobby site (<a href="http://www.javalobby.org/members/jpr/">http://www.javalobby.org/members/jpr/</a>).</p>
        <p><a name="floo01"></a>[Flood01]	Christine
          Flood, David Detlefs, Nir Shavit, Xiaolan Zhang: &#8220;Parallel
          Garbage Collection for Shared Memory Multiprocessors&#8221;, <em>USENIX
          JVM&#8217;01</em>, April 2001.</p>
        <p><a name="gayy98"></a>[Gayy98]	David Gay, Bjarne Steensgaard: &#8220;Stack
          Allocating Objects in Java&#8221;,<em> Microsoft Technical Report</em>, November
          1998.</p>
        <p><a name="gosl00"></a>[Gosling00]	Gosling, J., Joy, B., Steele, G., Bracha, G.: &#8220;The
          Java Language Specification&#8221;, 2nd ed., 2000.</p>
        <p><a name="greh02"></a>[Grehan02]	Rick
          Grehan: &#8220;Deliver Big Functionality on Small Devices&#8221;,
          <em>Enabling the Wireless Enterprise</em>, July 25 2002.</p>
        <p><a name="grun02"></a>[Gruntz02]	Dominik Gruntz: &#8220;C#
          and Java: The Smart Distinctions&#8221;,
          in <em>Journal of Object Technology</em>, vol. 1, no. 5, November-December
          2002, pp. 163-176. <a href="http://www.jot.fm/issues/issue_2002_11/article4">http://www.jot.fm/issues/issue_2002_11/article4</a>.</p>
        <p><a name="gu00"></a>[Gu00]
          W. Gu, N. Burns, M. Collins, W. Wong: &#8220;The evolution of
          a high-performing Java virtual machine&#8221;, <em>IBM Systems Journal</em>,
          v39, n&deg;1, 2000.</p>
        <p><a name="holz94"></a>[H&ouml;lzle94]	Urs H&ouml;lzle: &#8220;Adaptive
          Optimization for Self: Reconciling High Performance with Exploratory
          Programming&#8221;, <em>PhD
          Thesis</em>, August 1994.</p>
        <p><a name="ishi02"></a>[Ishizaki02] Kazuaki Ishizaki, Tatsushi Inagaki,
          Hideaki Komatsu, Toshio Nakatani: &#8220;Eliminating Exception Constraints
          of Java Programs for IA-64&#8221;. <em>PACT &#8217;02</em>, September 2002.</p>
        <p><a name="jsr1"></a>[JSR1]
          JSR-1: &#8220;Real-Time Specification for Java&#8221;. <em>Java
          Community Process</em> (<a href="http://www.jcp.org/en/jsr/detail?id=1">http://www.jcp.org/en/jsr/detail?id=1</a>).</p>
        <p><a name="jsr83"></a>[JSR83]	JSR-83: &#8220;Multiarray
          Package&#8221;. <em>Java Community Process</em>          (<a href="http://www.jcp.org/en/jsr/detail?id=83">http://www.jcp.org/en/jsr/detail?id=83</a>).</p>
        <p><a name="jsr121"></a>[JSR121]	JSR-121: &#8220;Application
          Isolation API Specification&#8221;.
          <em>Java Community Process</em> (<a href="http://www.jcp.org/en/jsr/detail?id=121">http://www.jcp.org/en/jsr/detail?id=121</a>).</p>
        <p><a name="jsr176"></a>[JSR176]
          JSR-176: &#8220;J2SE 1.5 (Tiger) Release Contents&#8221;.
          <em>Java Community Process</em> (<a href="http://www.jcp.org/en/jsr/detail?id=176">http://www.jcp.org/en/jsr/detail?id=176</a>).</p>
        <p><a name="jsr200"></a>[JSR200]
          JSR-200: &#8220;Network Transfer Format for Java Archives&#8221;.
          <em>Java Community Process</em> (<a href="http://www.jcp.org/en/jsr/detail?id=200">http://www.jcp.org/en/jsr/detail?id=200</a>).</p>
        <p><a name="jsr201"></a>[JSR201]
          JSR-201: &#8220;Extending the Java Programming Language with
          Enumerations, Autoboxing, Enhanced for loops and Static Import&#8221;.
          <em>Java Community Process</em> (<a href="http://www.jcp.org/en/jsr/detail?id=201">http://www.jcp.org/en/jsr/detail?id=201</a>).</p>
        <p><a name="jsr202"></a>[JSR202]
          JSR-202: &#8220;Java Class File Specification Update&#8221;.
          <em>Java Community Process</em> (<a href="http://www.jcp.org/en/jsr/detail?id=202">http://www.jcp.org/en/jsr/detail?id=202</a>).</p>
        <p><a name="lind96"></a>[Lindholm96]
          T. Lindholm, F. Yellin: &#8220;The Java Virtual Machine
          Specification&#8221;, 1996.</p>
        <p><a name="mikh02"></a>[Mikheev02]	Vitaly Mikheev, Stanislav
          Fedoseev: &#8220;Compiler-Cooperative
          Memory Management in Java&#8221;. <em>IWSP&#8217;02</em>, 2002.</p>
        <p><a name="more99"></a>[Moreira99]	Jose
          Moreira, Samuel P. Midkiff, Manish Gupta: &#8220;From
          Flop to Megaflops: Java for Technical Computing&#8221;, <em>ACM Transactions
          on Programming Languages and Systems</em>, 1999.</p>
        <p><a name="pale01"></a>[Paleczny01]	Michael Paleczny,
          Christopher Vick, Cliff Click: &#8220;The
          Java HotSpot Server Compiler&#8221;, <em>JVM &#8217;01</em>, April 2001.</p>
        <p><a name="pugh99"></a>[Pugh99]
          William Pugh: &#8220;Fixing the Java Memory Model&#8221;,
          <em>JavaGrande&#8217;99</em>, June 1999.</p>
        <p><a name="serr00"></a>[Serrano00] Mauricio Serrano, Rajesh
          Bordawekar, Sam Midkiff, Manish Gupta: &#8220;Quicksilver: A Quasi-Static
          Compiler for Java&#8221;,
          <em>OOPSLA&#8217;00</em>, October 2000.</p>
        <p><a name="sire96"></a>[Sirer96]	Emin Sirer, Stefan Savage,
          Przemyslaw Pardyak: &#8220;Writing
          an Operating System with Modula-3&#8221;, <em>WCSSS&#8217;96</em>, February
          1996.</p>
        <p><a name="whal99"></a>[Whaley99]	John Whaley, Martin Rinard: &#8220;Compositional Pointer
          and Escape Analysis for Java Programs&#8221;, <em>OOPSLA&#8217;99</em>, 1999.</p>
        <p>&nbsp;</p>
		<p>&nbsp;</p>
        <h4>About the author<br>
        </h4>       
	    <p><strong>Osvaldo Pinali Doederlein</strong> is Technology Architect at Visionnaire S/A,
	      specializing in Java application development, object oriented technology
	      and distributed systems. He can be reached at <a href="mailto:osvaldo@visionnaire.com.br">osvaldo@visionnaire.com.br</a>.<br>
        </p>
	    <hr noshade width="80%" size="1">
        <p>Cite this column as follows: Osvaldo Doederlein: &#8220;The Tale of
          Java Performance&#8221;, in <em>Journal of Object Technology</em>, vol. 2, no.
        5, September-October 2003, pp. 17-40. <a href="http://www.jot.fm/issues/issue_2003_09/column3">http://www.jot.fm/issues/issue_2003_09/column3</a> </p>
        <hr> 
	<table width="179" border="0" align="right" cellpadding="0" cellspacing="2">
          <tr> 
            <td> <p class="text"><a href="../column2">Previous column</a></p></td>
            <td align="right"> <p class="text"><a href="../column4">Next column</a></p></td>
          </tr>
        </table></td>
    </tr>
  </table>
</div>
<!--#include virtual="/include/wide_footer.html" -->