An Empirical Study of the Code Pitching Mechanism in the .NET Framework

David Anthony, Witawas Srisa-an, and Michael Leung, Department of Computer Science and Engineering, University of Nebraska-Lincoln, United States


Abstract

The .NET Compact Framework is designed to be a high-performance virtual machine for mobile and embedded devices that run Windows CE (version 4.1 and later). It achieves fast execution time by compiling methods dynamically instead of using interpretation. Once compiled, these methods are stored in a portion of the heap called the code cache and can be quickly reused to satisfy future method calls. While the code cache provides a high level of reusability, it can also use a large amount of memory. As a result, the Compact Framework provides a “code pitching” mechanism that can be used to discard previously compiled methods as needed.

In this paper, we study the effect of code pitching on the overall performance and memory utilization of .NET applications. We conduct our experiments using Microsoft’s Shared-Source Common Language Infrastructure (SSCLI). We profile the access behavior of the compiled methods. We also experiment with various code cache configurations to perform pitching. We find that programs can operate efficiently with a small code cache without incurring substantial recompilation and execution overheads.


1 INTRODUCTION

In both the .NET and Java execution systems, Just-In-Time (JIT) compilers have been used to speed up execution by compiling methods into native code for the underlying hardware [12, 22, 18]. JIT compilation has proved to be much more efficient than interpretation, especially in execution-intensive applications [11, 12, 22, 24]. In the Microsoft .NET Framework, a method is compiled prior to its first use. Afterward, the compiled method is stored in the code cache for future reuse [16]. This code cache is located in the heap region.

The size of the code cache can be increased or decreased depending on the program’s behavior. For example, in the default configuration of the Shared-Source Common Language Infrastructure (SSCLI), frequently referred to as Rotor [14, 18], the initial code cache size is set to 64 MB. Once the accumulation of compiled methods reaches this size, the system can choose either to increase the code cache size or to keep the same size and free all the compiled methods not currently in scope (referred to as throw-away compiling [3] or code pitching [18, 16]). The “code pitching” mechanism has two possible overheads [18, 16]: the overhead of traversing all the compiled methods and the overhead of recompiling methods after pitching. However, pitching provides a means to maintain a small code cache, as the memory is periodically reclaimed.

Currently, code pitching is employed in the .NET Compact Framework (CF), which is used to develop applications for smart devices with limited memory resources [16]. Such devices include smart phones, Pocket PCs, and embedded systems running Windows CE. In these devices, a pitching policy can play a very important role since it determines the memory footprint of the code cache. If pitching occurs infrequently, the code cache occupies a large amount of memory. If pitching occurs too frequently, a large number of methods have to be recompiled. The goal of this paper is to take a preliminary step toward studying the effect of pitching on the overall performance and memory utilization of .NET applications. To date, a few projects have investigated recompilation decisions and method unloading in Java [24, 23, 4]. However, they are implemented in virtual machines that do not support pitching. With the SSCLI, we have an opportunity to study a mechanism that has been built by a major software maker as a standard feature. Our work addresses two research questions:

RQ1: What are the basic behaviors of the compiled methods?—We investigate the access behaviors, compilation frequency, and commonly used metrics such as size and the number of methods.

RQ2: Can we improve the overall performance and memory utilization by manipulating the code cache configuration?—We experiment with multiple code cache sizes and investigate the impacts of utilizing different cache size enlargement policies.

The remainder of this paper is organized as follows. Section 2 introduces related background information. Section 3 describes our challenges and research questions in detail, along with the methodology and constraints used to perform the experiments. Section 4 discusses the experiments conducted with regard to the research questions and analyzes the results in detail. Section 5 classifies the methods that are recompiled most frequently. Section 6 presents future work. Section 7 discusses prior research in this area. The last section concludes the paper.

2 BACKGROUND

This section discusses background information related to this work.

Shared-Source Common Language Infrastructure (SSCLI)

The main objective of the CLI is to allow programmers to develop component-based applications in which the components can be constructed using multiple languages (e.g. C#, C++, Python, etc.). The ECMA-3351 (CLI) standard describes “a language-agnostic runtime engine that is capable of converting lifeless blobs of metadata into self-assembling, robust, and type-safe software systems” [18]. There are several implementations of this standard, including Microsoft’s Common Language Runtime (CLR), Microsoft’s Shared Source Common Language Infrastructure (SSCLI) [14], Microsoft’s .NET Compact Framework, Ximian’s Mono project [15], and DotGNU [8]. For this research, we use the SSCLI due to the availability of its source code. Moreover, it appears to be the most mature implementation when compared to the Mono or DotGNU projects.

The SSCLI is a public implementation of the ECMA-335 standard, released under Microsoft’s shared source license. The code base is very similar to that of the commercial CLR, with a few exceptions. First, the SSCLI does not support ADO.NET and ASP.NET, which are available in the commercial CLR. ADO.NET is a database connectivity API, and ASP.NET is a web API used to create Web services. Second, the SSCLI uses a different Just-In-Time (JIT) compiler than the commercial CLR. The latter provides a more sophisticated JIT compiler with the ability to pre-compile code; however, the commercial CLR does not support code pitching. Notice that both implementations of the CLI adopt JIT compilation rather than the interpretation mode used in some earlier Java Virtual Machine implementations [19]. Third, the SSCLI is designed to provide maximum portability. Thus, a software layer called the Platform Adaptation Layer (PAL) is used to provide the Win32 API for the SSCLI. Currently, the SSCLI has been successfully ported to the Windows, FreeBSD, and Mac OS X operating systems.

One of the major runtime components related to this work is the Just-In-Time (JIT) compiler. It is used to compile methods within components into native code for the underlying hardware [22]. The JIT compiler also ensures that every instruction conforms to the ECMA standard. Once compiled, these methods reside in the code cache, which is located in heap memory. Instead of recompiling a method each time it is called, the native code is retrieved from the code cache [16]. When more memory is needed by the system or when a long-running application is moved to the background, the methods in the code cache are “pitched” to free up memory [16, 18].

Code Pitching Mechanism

There are three important variables in the code pitching mechanism: the reserved cache size, the target cache size, and the maximum cache size. The target cache size is initially set to be the same as the reserved cache size (e.g. 256KB). The maximum cache size is set to be very large to allow the most flexible growth; by default, the SSCLI sets it to 2GB. The execution engine initializes the code cache by allocating 8KB. As program execution continues, additional heap space is allocated to the code cache in 8KB increments as needed to store the compiled methods. The total size of the allocated heap space is called the committed code cache size. As the committed code cache size approaches the reserved code cache size (e.g. 256KB), the allocator decides whether to allocate more heap space beyond the current reserved cache size or to pitch all unused methods. Notice that once the target cache size is reached, the reserved size is adjusted and used to determine the actual cache size. For example, if the initial cache size is 256KB (i.e. the initial value of the target and reserved cache sizes) but the system presently needs 512KB to store compiled methods, the reserved size would be adjusted to 512KB while the target cache size remains at 256KB.

If the reserved size is less than the maximum code cache size and the existing pitch overhead (the amount of time spent on pitching) is over the acceptable maximum (5ms by default), the allocator will attempt to increase the code cache size. Notice that 5ms is chosen as the default value in the SSCLI to avoid excessive pitching. During this attempt, if the needed memory is greater than the reserved size but less than the maximum cache size, the pitch trigger point has not been reached, and the pitch overhead is too high, the allocator will expand the committed code cache size and the reserved size. Otherwise, it will pitch all the unused compiled methods2. If there is still insufficient memory after pitching, the code cache size and the reserved size will be increased until enough memory is available. If at any point during the execution the number of compiled methods reaches the pitch trigger, pitching occurs regardless of other cache conditions.
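To make the decision logic concrete, the following C# sketch restates the grow-or-pitch behavior described above. It is an illustration of the policy rather than the SSCLI source: every identifier and the doubling growth step are our own assumptions, while the 8KB commit granularity, the 2GB default maximum, and the 5ms pitch-overhead threshold are taken from the text.

```csharp
// Illustrative sketch only: names and the doubling growth step are ours, not the SSCLI's.
using System;

class CodeCacheModel
{
    const long CommitChunk = 8 * 1024;                   // heap is committed in 8KB steps
    const long MaxCacheSize = 2L * 1024 * 1024 * 1024;   // default 2GB ceiling
    const double MaxPitchOverheadMs = 5.0;               // default acceptable pitch cost

    long reserved;          // current reserved cache size
    long committed;         // committed code cache size
    long used;              // bytes occupied by compiled methods
    double pitchOverheadMs; // time spent in the most recent pitch

    public int PitchEvents { get; private set; }

    public CodeCacheModel(long initialReserved) { reserved = initialReserved; }

    // Store one freshly compiled method of 'size' bytes.
    public void StoreCompiledMethod(long size)
    {
        if (used + size > reserved)
        {
            // Grow only when pitching has become too expensive; otherwise pitch.
            bool growInstead = reserved < MaxCacheSize
                               && pitchOverheadMs > MaxPitchOverheadMs;
            if (growInstead)
                reserved = Math.Min(MaxCacheSize, reserved * 2);   // hypothetical growth step
            else
                Pitch();

            // If pitching alone did not free enough room, grow until the method fits.
            while (used + size > reserved)
                reserved = Math.Min(MaxCacheSize, reserved * 2);
        }
        used += size;
        committed = RoundUp(used);       // commit heap in 8KB increments
    }

    void Pitch()
    {
        PitchEvents++;
        pitchOverheadMs = 1.0;           // placeholder; a real engine measures this
        used = 0;                        // methods not on a call stack are discarded
    }

    static long RoundUp(long bytes) =>
        (bytes + CommitChunk - 1) / CommitChunk * CommitChunk;

    static void Main()
    {
        var cache = new CodeCacheModel(256 * 1024);
        var rng = new Random(42);
        for (int i = 0; i < 2000; i++)                      // 2000 synthetic compilations
            cache.StoreCompiledMethod(rng.Next(52, 1200));  // method sizes seen in Section 4
        Console.WriteLine($"Pitch events: {cache.PitchEvents}");
    }
}
```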

Currently, code pitching is used in the .NET Compact Framework, which is built for embedded devices. Obviously, it is very important to strike a good balance between memory usage and performance overhead since such devices have a very limited amount of memory. In addition, the Compact Framework is often used on Windows CE, which limits the virtual address space of a process to only 32 MB. Thus, the code cache has to be small enough to work in this computing environment, yet big enough to avoid excessive recompilation of methods.

3 EMPIRICAL STUDY

As stated earlier, the behavior of compiled methods in the .NET framework has yet to be studied. In order to design an efficient pitching policy, a thorough understanding of this behavior is needed. The current lack of this knowledge leads us to our first research question.

RQ1: What are the basic behaviors of compiled methods?

If a large number of methods is frequently used, then it may not be suitable to pitch the code cache frequently. Our contribution is to profile the access behavior of compiled methods so that an efficient pitching decision can be made. We conjecture that a significant performance gain or reduction in memory usage can be obtained by utilizing different pitching policies. Thus, our second research question is:

RQ2: Can we improve the overall performance and memory utilization by manipulating the code cache configuration?

In the default configuration of the SSCLI, the policy is to explore other possibilities before pitching is considered. This may not be the optimal approach, especially for the Compact Framework, in which the amount of memory available in a system may be limited. Our contribution is to identify a cache size and suggest pitching policies that result in a small cache footprint and minimal compilation overhead.

In terms of experimental platform, we conducted our study on an AMD64 workstation with 1 GB of memory running Windows XP Professional.

Variables and Measures

The JIT compiler relies on several variables to control cache size and pitching. These variables determine when the compiler pitches, the maximum and minimum cache sizes, and the cache growth characteristics. As will be described in the next subsection, we use existing benchmark programs written in C# to perform our experiments. It is worth noting that we fix the heap size at 800KB (the default value set by the SSCLI). In doing so, we can be certain that any changes in performance are not due to different heap sizes but mainly to changes in code cache sizes.

Throughout the experiment, we monitored the following variables. They provided useful insight into the operation of the JIT compiler, specifically, its caching mechanism.

  • Number of pitch events
    When the compiler removes compiled code from the cache, it is called a pitch event. Pitching preserves methods that are currently in use (in scope) but removes the rest.
  • Number of recompilations
    After a method has been pitched, each time it has to be compiled again is called a recompilation. A method could be pitched and recompiled multiple times.
  • Number of distinct methods
    This is the number of distinct methods that have been compiled. The number of distinct methods does not include recompilations and does not consider whether the method has been pitched or not.
  • Committed code cache size
    The amount of heap space requested from the system to store code is called the committed code cache size. The compiler asks for heap in increments of 8KB.
  • Code cache usage
    Code cache usage is the actual amount of memory used to store compiled methods at a given time.

To address RQ1, we monitor the basic behavior of compiled methods. Our goal is to derive two important performance metrics based on the variables above:

  1. compilation frequency—we monitor how often methods are compiled and recompiled.
  2. concentration of compiled methods—we monitor in which part of the execution methods are compiled the most.

We also observed the average size of compiled methods and compared it to the sizes of typical objects. In order to perform our experiments, we created an environment in which the amount of memory is similar to that of a typical Java embedded device. To do so, we set the initial code cache size to 256KB; however, we allow the SSCLI to enlarge the code cache as necessary.
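To illustrate how the two metrics above can be derived from raw observations, the following sketch post-processes a hypothetical log of compilation events (method name plus the percentage of execution completed when the compilation occurred). The record layout, helper names, and sample data are our own assumptions; they do not correspond to the SSCLI instrumentation we used.

```csharp
// Hypothetical post-processing of a compilation log; the record layout is ours.
using System;
using System.Collections.Generic;
using System.Linq;

record CompileEvent(string Method, double PercentExecuted);

static class CompileLogMetrics
{
    // Compilation frequency: how many times each method was (re)compiled.
    public static Dictionary<string, int> CompilationFrequency(IEnumerable<CompileEvent> log) =>
        log.GroupBy(e => e.Method)
           .ToDictionary(g => g.Key, g => g.Count());

    // Concentration: fraction of all compilations that fall in the first
    // 'cutoffPercent' of execution time (e.g. the first 10%).
    public static double Concentration(IReadOnlyCollection<CompileEvent> log, double cutoffPercent) =>
        (double)log.Count(e => e.PercentExecuted <= cutoffPercent) / log.Count;

    static void Main()
    {
        var log = new List<CompileEvent>
        {
            new("Main", 0.1), new("Parse", 0.5), new("Parse", 8.0),
            new("Emit", 9.0), new("Emit", 45.0), new("Flush", 97.0)
        };
        var parseCount = CompilationFrequency(log)["Parse"];
        Console.WriteLine($"Parse compiled {parseCount} times");
        Console.WriteLine($"{Concentration(log, 10.0):P0} of compilations in first 10%");
    }
}
```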

Table 1: Basic characteristics of the compiled methods in our benchmarks.

Table 2: Code-cache usage at different points in execution time.

To address RQ2, we went a step further and prevented the SSCLI from enlarging the code cache. The goal of this experiment is to observe the behavior of compiled methods under a hard limit and to explore different code cache configurations to improve the overall performance. We also compared the execution times among different configurations that result in different numbers of pitch events.

Benchmark Programs

To address our research questions, we need a set of programs that compile a large number of methods. In addition, we must be able to manipulate the way these programs operate. As of now, there are very few benchmark programs available for the .NET platform. We have gathered three different programs that compile a reasonable number of methods (over 1000). We also want to observe how the code cache performs during the execution of smaller applications. Therefore, we also experiment with the classic HelloWorld and Adaptive Huffman Compression to gain some insight into how many methods are needed to execute such programs. To our surprise, HelloWorld still requires over 300 compiled methods. This section describes the experimental objects:

  • LCSC
    This benchmark is based on the front end of a C# compiler. The program parses a given C# input file with a generalized LR algorithm. The benchmark is available from Microsoft’s research web site [13], along with the inputs that were used in performing the analysis.
  • AHC
    This program uses an adaptive Huffman compression algorithm to process files. For this program there were three separate inputs for use as test cases. This benchmark is also available from Microsoft’s research web site [13].
  • PNG Decoder
    This program shows “how fast a Java implementation can decode a PNG photo image of a typical size used on a mobile phone.” [9]
  • CLisp Compiler
    This is a small compiler that converts a Lisp source file to an executable. The compiler was used to compile two sample source files, a Fibonacci series generator and a numerical sorting algorithm. This compiler is found in the sscli/compilers/clisp directory.
  • SharpSATbench
    SharpSATbench is a “clause-based satisfiability solver where the logic formula is written in Conjunctive Normal Form (CNF).” [13]

4 RESULTS

In the following subsections, we present the results of our experiments, which address the two research questions proposed in Section 3.

RQ1: Access Behavior

In this section, we discuss the basic behavior of the compiled methods. The issues discussed include the number of compiled methods in each application, the number of methods recompiled, and the sizes of the compiled methods. Table 1 presents the size information of the compiled methods in our benchmark programs.

It is worth noticing that typical objects in object-oriented languages such as Java and C# have an average size of less than 100 bytes [7, 21]. However, the average size of the compiled methods in each application ranges from 300 bytes to over 1100 bytes. It is also worth noting that the smallest compiled method is 52 bytes (see Section 5 for more information); this is true across all applications. At the other extreme, a method can be as large as 44KB. We also conducted experiments with no pitching and found that 1.4MB of memory is needed to store the compiled methods in LCSC.

In our experiment, we first study the code cache usage of every application. We set the cache size to be large enough that pitching does not occur; with the proposed set of benchmarks, the size is set to 2 MB. We then monitor the fraction of the code cache usage at different points in the program’s execution. For example, LCSC requires 1.4 MB of space to store all compiled methods. When the program has consumed 15% of all the needed cache space, or 212KB, we record the percentage of execution completed; in this case, the program has completed only 0.29% of the total execution time (see Table 2). It is worth noting that in three out of five applications, over 45% of all the space needed for the code cache is consumed within the first few percent of the execution time.

Figure 1: Distribution of compiled methods based on the number of compilations (128KB cache size).

We also monitor the distribution of methods based on the number of compilations. We set the code cache size to 128KB to emulate an embedded-device environment and to induce some pitch events. We find that in two applications, AHC and PNGD, no methods are compiled more than twice. However, in larger applications, such as the compilers and the constraint-satisfaction program, methods are compiled multiple times. Notice that CLisp requires at most 7 compilations of a method, while SAT and LCSC require 9 and more than 10 compilations, respectively. Since most of these applications execute repetitive tasks, many compiled methods are reused. If pitch events are forced to occur more often, these programs may need to have methods recompiled more frequently. Figure 1 illustrates our findings.

To investigate the number of recompiled methods, we set the code cache size to 256KB to force pitching. We find that about 70% of recompilations occur during the first 10% of execution time (depicted in Figure 2) in all benchmark programs; the remaining 30% occur during the rest of the execution. Thus, many of these methods are short-lived but seem to be accessed heavily during their lifetimes. This is similar to typical objects, the majority of which are short-lived [10, 20]. This behavior may provide an opportunity for optimization by dynamically adjusting the cache size as needed. For example, the cache size can initially be set to be larger and then reduced after the first 10% of execution. We are currently experimenting with this approach.

Figure 2: Distribution of recompiled methods over the execution time (all applications combined using 256KB cache size).

In summary, we find that compiled methods have the following behavior:

  • The average size of a method is much larger than the average size of a typical object.
  • Even the simplest applications require at least 300 methods to execute.
  • In larger programs, a large number of methods are reused. This conclusion is based on the fact that large programs recompile a large number of methods when the cache size is small and pitching occurs frequently.
  • Methods are compiled more frequently toward the beginning of a program execution.

RQ2: Optimizing Code-Cache Configuration and Pitching Policy

In this section, we apply different pitching policies to LCSC and monitor the differences in runtime behavior. We choose LCSC because it accesses a large number of methods and requires the largest number of pitch events. In the SSCLI, there are two ways to set the size of the code cache. The first method (referred to as the soft limit approach) is to set the target code cache to a certain size (e.g. 256KB). This, however, is not a hard upper bound. When the volume of compiled methods reaches 256KB for the first time, the system will pitch all methods that are not reachable, but it will also consider whether to increase the target cache size. Typically, the cache size is doubled; thus, the next pitch event will occur when the accumulation of methods in the code cache approaches 512KB. The second method (referred to as the hard limit approach) is to set the initial code cache size to be the limit. Notice that the limit must be large enough to contain the initial working set of methods needed to initialize the application; if the cache size is too small to contain all methods during initialization, the program may crash. Figure 3 depicts the pitch events using the soft limit approach, with the initial target code cache set to 256KB.

Figure 3: Monitoring pitch events in LCSC using the soft limit approach.

Figure 3 illustrates the basic behavior of code cache expansion in the soft limit approach. The x-axis in the figure represents all the pitch events that occur in the system. In this example, there are 7 pitch events throughout the execution of LCSC. Table 3 shows the number of pitch events in all applications with different target cache sizes (256KB, 512KB, 1MB, and 2MB). It is worth noting that the benefit gained through this approach is the reduction of the number of pitch events during the initial execution period. For example, by increasing the initial cache size from 256KB to 512KB, the number of pitch events in LCSC decreases by two; these two events occur during the first 5% of the execution.
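The operational difference between the two sizing approaches can be summarized in a short sketch: under the soft limit the target size doubles after a pitch event, whereas under the hard limit it never changes. The class below is purely illustrative; its names and structure are ours and do not correspond to SSCLI configuration settings.

```csharp
// Illustrative contrast between the two sizing approaches; names are ours.
using System;

enum LimitMode { Soft, Hard }

class CacheSizingPolicy
{
    readonly LimitMode mode;
    public long Target { get; private set; }

    public CacheSizingPolicy(LimitMode mode, long initialTarget)
    {
        this.mode = mode;
        Target = initialTarget;
    }

    // Called whenever the volume of compiled methods reaches the target.
    public void OnPitch()
    {
        if (mode == LimitMode.Soft)
            Target *= 2;   // soft limit: the cache is allowed to grow after pitching
        // hard limit: the target never changes, so pitching recurs at the same size
    }

    static void Main()
    {
        var soft = new CacheSizingPolicy(LimitMode.Soft, 256 * 1024);
        soft.OnPitch();
        Console.WriteLine($"Soft limit target after one pitch: {soft.Target / 1024}KB"); // 512KB

        var hard = new CacheSizingPolicy(LimitMode.Hard, 256 * 1024);
        hard.OnPitch();
        Console.WriteLine($"Hard limit target after one pitch: {hard.Target / 1024}KB"); // 256KB
    }
}
```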

Figure 4 depicts the number of methods that are recompiled when the soft limit approach, in which the cache size can be increased as needed, is applied. Notice that more methods are recompiled after the later pitch events (4 to 6). This corresponds to Table 2: most methods are compiled during the early part of the execution, so as we continue to pitch late into the execution, the methods that were compiled and have recently been pitched are still being invoked and must be recompiled.

Figure 4: Ratios between new methods and recompiled methods based on pitch events.

Table 3: The number of pitch events with different code cache sizes.

 

It is worth noting that the initial target size can greatly affect the number of pitch events in a system. This is because the first pitch event takes longer to occur with a larger cache size. As shown in Figure 2, a majority of repeated invocations occur within the first 10% of execution. Thus, a larger initial cache size is advantageous because more reuse occurs at the beginning.

Figure 4 initially appears to contradict Figure 2, as the volume of recompiled methods does not become significant until the fourth pitch event. However, we find that 4 out of 6 pitch events occur in the first 3% of execution. The fifth event occurs at around 33% of execution and the last event at 80%. Thus, most of the recompilation events occur during the initialization of the system.

As with the soft limit approach, we investigate the number of distinct methods and previously used methods that have to be compiled after each pitch event, this time using the hard limit approach. Figure 5 illustrates the number of pitch events that occur when this approach is used. When the code cache size is limited to 128KB, there are nearly 25000 pitch events. As expected, the number of pitch events decreases as the code cache size is enlarged (to 256KB and 512KB, respectively). We also discover that when the code cache is set to 1024KB, there are only 4 pitch events (not shown in the figure). Figure 5 also compares the number of distinct methods to the number of recompiled methods in the three configurations of the code cache (128KB, 256KB, and 512KB). In most instances, no distinct methods are compiled during the middle of the execution; new methods appear only at the beginning and at the end.

Figure 5: New methods vs. recompiled methods in LCSC based on pitch events using the hard limit approach. Note that the y-axis represents the number of times that the two types of methods, distinct and recompiled, have been compiled.

On the other hand, a small code cache in applications that invoke a large number of methods can cause excessive pitching, as in the cases of the 64KB to 256KB cache sizes in the hard limit approach (see Figure 6), and significant runtime overheads. In our experiment, LCSC can take as much as 35 times more execution time than under the soft limit approach when the code cache is set to 64KB. We also find that in all applications except LCSC, the code cache requirements are moderate; thus, the number of pitch events does not increase significantly. For example, adjusting the code cache size in AHC, PNGD, SAT, and CLisp has very little effect on the execution time (see Figure 7). As a reminder, these applications were executed using the same initial heap size (800KB); therefore, the differences in performance are mainly due to the changes in code cache size. From this observation, we conjecture that pitching may be used to reduce the memory footprint without incurring a substantial amount of overhead in memory-constrained systems. We will conduct experiments to further validate this conjecture as part of future work.

Figure 6: Pitching overhead when the hard limit approach is used.

Figure 7: Execution time overhead in the hard limit approach.

Figure 8 depicts the usage of code cache as LCSC is executed. The x-axis represents the percentage of execution completed and the y-axis represents the amount of memory in the code cache used by the program.

 

Figure 8: Code cache usage for LCSC (256KB). Notice that the scale of the x-axis is not linear.

It is worth noting that with a 256KB initial cache size under the soft limit approach, the size of the code cache increases to 1024KB within the first 3% of execution. However, it takes another 30% of execution to accumulate the compiled methods that would result in another pitch. In this situation, it may not be necessary to increase the cache size from 768KB to 1024KB. In addition, after the pitch event at 33% of execution time, the next pitch event does not occur until 81%. One possible improvement to the pitching policy is to reduce the cache size after the program is fully initialized. This may result in a few more pitch events, but a significant reduction in memory usage may also be obtained.

 

Figure 9: Code cache usage (1024KB). Notice that the scale of the x-axis is not linear.

Figure 9 depicts the usage of the code cache for LCSC with a 1024KB cache size under the hard limit approach. It is worth noting that there are no pitch events at all until after 2.25% of execution. The figure also shows that after the first pitch event, there are only two more pitch events, at 33% and 81% of execution. As a reminder, this is similar to the number of pitch events in Figure 8 after 4% of the program has been executed. Thus, a policy that favors a larger cache size clearly reduces the number of pitching activities during the initial stage of execution and would be effective in memory-constrained environments.

In summary, we conclude that the following policies may be used to improve the pitching performance.

  • Moderate pitching activities have very little effect on the overall performance of the system. However, excessive pitching can incur a large amount of overhead. Thus, a policy that favors reducing memory usage over a moderate increase in pitching activity would be effective in memory-constrained environments.
  • Larger initial cache sizes can significantly reduce the number of pitch events during the program initialization. Thus, the policy should allocate a large enough cache at the beginning.
  • Once the program has stabilized, the system compiles fewer methods, which means that we can potentially reduce the cache size at the expense of more pitching activity. However, the number of pitch events should remain moderate and not result in substantial run-time overhead. Thus, the policy should include reducing the cache size after the initialization phase.

5 CLASSIFICATION OF FREQUENTLY RECOMPILED METHODS

As shown in Figure 5, a large portion of methods are compiled and recompiled over and over again. In this section, we further identify the types of methods that exhibit this behavior. This insight may open more avenues to further optimize code cache management strategies.

We conducted experiments by setting the maximum code cache to 64KB to force regular pitching. We then identified the subset of methods that are called at least once by all five benchmarks. We investigated the distribution of such methods and the frequency with which they are invoked. For example, the distribution for call by all is calculated by observing the number of distinct methods that are invoked by all five applications; in our experiments, 85 methods meet this criterion. We then took the summation of the numbers of methods in all applications (last column in Table 1), which is 2438 in this case. The distribution of call by all is calculated from these two numbers, which yields 16%.

Figure 10: Distribution of methods based on the number of applications that used them.

We then investigate the invocation frequency of these methods. There are over 3.5 million total method invocations in all applications when the maximum code cache is set to 64KB. We find that the methods invoked by all applications account for 23.70% of invocations. This is 7% higher than the distribution depicted in Figure 10, which translates to heavier usage of these methods. We also find that all of these methods belong to the System library. We then combine the methods invoked by 5 and by 4 applications and find that together they account for only 24.90% of invocations, which means that methods used in four applications are not as heavily invoked as those used by all applications.

6 FUTURE WORK

Better benchmarks that utilize more methods and force the execution engine to pitch more frequently, especially for larger cache sizes, are needed. In addition, the benchmarks used in this experiment do not demonstrate the diversity of applications the typical end user runs. More practical benchmarks are definitely needed to better simulate a real-world system. On the other hand, some of the chosen experimental objects do compile reasonable numbers of methods.

With that said, many of our results derive from experiments with these few benchmark programs. Thus, our conclusions and suggestions should not be viewed as general ones. Instead, they should be viewed as potential solutions to improve the performance of the code-pitching mechanism in the SSCLI and the .NET Compact Framework. Obviously, experiments with more benchmark programs are needed.

Future work will focus on four primary goals. The first goal is to develop better benchmarks in order to better simulate real-world uses of the SSCLI. These benchmarks should focus on what a more typical user would be expected to run. New benchmarks should include the networking and other communication methods that are inherent to such applications’ proper execution.

The second major goal is to develop a better code pitching mechanism that selectively removes code from the cache, as opposed to the all-or-nothing approach taken in the current Rotor implementation. This improved collection mechanism will likely correlate method usage and size to enable the pitching mechanism to make a better decision about each method’s likely usefulness in the future. In addition, the current Rotor implementation does not decrease the size of the code cache once it has been expanded. We plan to investigate the performance gain from decreasing the cache size after the initial phase of execution.
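As a sketch of what such a selective mechanism might look like, compiled methods could be ranked by recent usage relative to their size, and only the lowest-ranked methods pitched until enough space is reclaimed. The scoring formula and every identifier below are our own assumptions, not an existing Rotor interface.

```csharp
// Hypothetical selective pitching: keep hot or small methods, evict cold or large ones.
using System;
using System.Collections.Generic;
using System.Linq;

record CachedMethod(string Name, int SizeBytes, int RecentCalls, bool OnStack);

static class SelectivePitcher
{
    // Score favors methods that are called often relative to the space they occupy.
    static double Score(CachedMethod m) => (double)(m.RecentCalls + 1) / m.SizeBytes;

    // Pitch the lowest-scoring methods until 'bytesToFree' bytes are reclaimed.
    public static List<CachedMethod> Pitch(IEnumerable<CachedMethod> cache, long bytesToFree)
    {
        long freed = 0;
        var victims = new List<CachedMethod>();
        foreach (var m in cache.Where(m => !m.OnStack).OrderBy(Score))
        {
            if (freed >= bytesToFree) break;
            victims.Add(m);
            freed += m.SizeBytes;
        }
        return victims;
    }

    static void Main()
    {
        var cache = new[]
        {
            new CachedMethod("Parser.Parse", 4096, 900, false),
            new CachedMethod("Startup.Init", 2048, 1,   false),
            new CachedMethod("Main",         512,  1,   true)   // on stack: never pitched
        };
        foreach (var v in Pitch(cache, 2048))
            Console.WriteLine($"pitch {v.Name}");               // pitches Startup.Init only
    }
}
```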

The third goal is to further investigate the effects of code cache management strategies on performance metrics not studied in this paper, such as start-up time. Typically, applications for mobile embedded devices are interactive. This means that start-up time plays a very important role in the user experience. Once a program has started, its overall performance is less noticeable to users, as most of the execution time is spent on input/output. It may therefore be possible to tune the code cache manager to minimize start-up time.

Our last goal is to utilize on-line profiling to create application-specific code cache management policies that perform as well as those created using off-line analysis [23]. Such a policy would determine the proper size, the proper increment, and the proper increment frequency dynamically. Note that [23] introduced these issues but did not study them. The system initializes a small code cache and lets it quickly grow to a sufficient size to contain all methods needed to start the program; no methods are unloaded during this period. However, when the number of methods needed has declined (e.g. after 10% of execution), the methods are unloaded and the code cache is reinitialized to a small size. During this time, the code cache grows very slowly or not at all, as only a few methods are compiled and used. Toward the end of execution, when the number of methods compiled increases dramatically, the growth rate of the code cache is once again adjusted to match the application’s needs. Thus, the system allows the application to start quickly but maintains a very small cache size throughout most of the execution.
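The sketch below captures this phased policy in code. Only the 10% start-up heuristic comes from the discussion above; the phase boundaries, the shrink margin, and all identifiers are our own illustrative choices.

```csharp
// Sketch of the phased policy described above; thresholds and names are ours.
using System;

class AdaptiveCachePolicy
{
    long targetSize = 64 * 1024;          // start small

    // Called periodically with the fraction of execution completed (0..1)
    // and the cache space occupied by live compiled methods.
    public long Advise(double executedFraction, long liveCodeBytes)
    {
        if (executedFraction < 0.10)
        {
            // Start-up phase: grow aggressively so that no methods are unloaded.
            while (targetSize < liveCodeBytes)
                targetSize *= 2;
        }
        else if (executedFraction < 0.90)
        {
            // Steady state: shrink back and grow slowly, if at all.
            targetSize = Math.Max(64 * 1024, liveCodeBytes + 32 * 1024);
        }
        else
        {
            // Wind-down: if compilation picks up again, match the demand.
            targetSize = Math.Max(targetSize, liveCodeBytes);
        }
        return targetSize;
    }

    static void Main()
    {
        var policy = new AdaptiveCachePolicy();
        Console.WriteLine(policy.Advise(0.05, 900 * 1024) / 1024 + "KB");  // start-up: grows to 1024KB
        Console.WriteLine(policy.Advise(0.50, 150 * 1024) / 1024 + "KB");  // steady: shrinks to 182KB
        Console.WriteLine(policy.Advise(0.95, 400 * 1024) / 1024 + "KB");  // wind-down: 400KB
    }
}
```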

7 RELATED WORK

In [2], a multi-level recompilation technique was introduced as part of the Jalapeño Virtual Machine. The basic idea is to use a non-optimizing compiler to compile a method the first time it is called. During execution, the virtual machine keeps track of all the “hot” (frequently invoked) methods and recompiles them with higher optimization levels.

Currently, the code pitching mechanism in the .NET Compact Framework, as well as in the SSCLI, discards all compiled methods that are not in scope. The code cache itself is separate from the main heap memory region. This type of strategy is often referred to as “flush when full” [17]. One of the first systems to use this approach to manage a code cache, or translation cache, was Shade, an instruction-set simulator for execution profiling [5, 6]. In this system, the translation cache (TC) is a separate memory area used to store translations. When the TC is full, the system flushes all the entries in the TC. The authors claimed that flushing is advantageous over other approaches because chaining between translations makes selective freeing tedious. They initialized the TC to a large size to minimize the number of flushes [5].

Recent efforts [24, 23] mainly investigated the performance of a code unloading mechanism in virtual machines that store both compiled methods and regular objects in the same heap space. Specifically, they identified what methods to unload and when to unload them. They used off-line and on-line profiling techniques to improve performance and reported a code size reduction of up to 61%. One interesting observation, that 70% of compiled methods are dead within the first 10% of execution time, was also reported and confirmed by [1], in which .NET applications were studied. However, [23] reported difficulties in developing mechanisms to detect the point in time that represents 10% of execution. Therefore, they used two triggering techniques: unload every 10 GC cycles and unload every 10 seconds.

In addition, they also conducted a preliminary study of a similar scheme used in the .NET Compact Framework, which stores compiled method bodies in a separate code cache; unloading is performed when the cache is full. Their experiment consisted of initializing the code cache size to 64KB and enlarging it by 32KB every 10 unloading cycles. They anticipated two difficulties that were not studied: dynamically determining the size of the code cache and determining how often and by what increments to grow it. Our work further investigates and improves the performance of this code cache management scheme. It is worth noticing that they reported in their earlier work that native IA32 code tends to be 6 to 8 times larger than the corresponding Java bytecode. They also reported that on average 61% of compiled methods are no longer accessed after the first 10% of execution [24].

8 CONCLUSIONS

We have performed experiments to demonstrate the effects of code pitching on the overall performance of .NET applications. We find that the compiled methods have the following properties. First, they are much larger than typical objects, with averages ranging from 300 bytes to 1100 bytes. Second, a large number of methods are repeatedly accessed. Third, these accesses often occur within the first 6% of execution time. Fourth, methods are compiled prolifically: at a 64KB cache size, the applications compiled over 3.3 million methods.

Based on the above findings, we conducted multiple experiments using different code cache configurations. First, we set the initial cache size to 256KB and allowed the system to expand the cache as needed. By setting a larger initial cache size (e.g. 512KB versus 256KB), we can reduce the number of pitch events by 29% (from 7 events to 5 events). Having a large initial cache size can be advantageous since most of the method reuse occurs within the first few percent of execution; a larger cache size may defer pitching and promote more reuse. Second, we find that excessive pitching can cause significant overhead, whereas a moderate amount of pitching barely incurs any. In our experiment, when the cache size is set to 2MB, no pitching occurs; if we reduce the cache size by half, only 3 pitch events occur (in LCSC) and the overall execution time increases only slightly. Thus, we conclude that a well-designed pitching policy can greatly reduce the code cache footprint without incurring substantial overheads. In addition, a policy that reduces the code cache size after the initial stage can be employed to further reduce the code cache footprint.

We also find that a significant number of methods are shared by all applications. These methods account for 16% of all methods. They are invoked prolifically: when the cache size is set to 64KB, they account for nearly 24% of all method invocations. Looking at their types, they are all part of the System library. One possible optimization based on this insight is not to pitch these methods, in order to minimize recompilation effort.
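A minimal sketch of this idea, assuming a simple namespace test, is shown below; the method representation and the filter are hypothetical and are not part of the SSCLI.

```csharp
// Hypothetical pitch filter that exempts shared System library methods.
using System;
using System.Linq;

static class PitchFilter
{
    // A method is kept if it is still on a call stack or belongs to System.*.
    public static bool ShouldKeep(string fullMethodName, bool onStack) =>
        onStack || fullMethodName.StartsWith("System.", StringComparison.Ordinal);

    static void Main()
    {
        var cache = new[] { "System.String.Concat", "Lcsc.Parser.Parse", "Ahc.Encoder.Encode" };
        var pitched = cache.Where(m => !ShouldKeep(m, onStack: false)).ToList();
        Console.WriteLine(string.Join(", ", pitched)); // Lcsc.Parser.Parse, Ahc.Encoder.Encode
    }
}
```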

9 ACKNOWLEDGEMENT

We would like to acknowledge Tyson Stewart for helping during the initial phase of this project. This project is partially supported by the National Science Foundation under grant CNS-0411043, University of Nebraska UCARE program, and University of Nebraska Layman’s Award.

Footnote:

1 European Computer Manufacturers Association.

2 This is contrary to [18], in which the authors state that the whole code cache is unloaded. We investigated the source code and found that only methods that are not currently being accessed are unloaded.


REFERENCES

[1] D. Anthony, M. Leung, and W. Srisa-an. To JIT or not to JIT: The effect of code-pitching on the performance of .NET framework. In Proceedings of 3rd International Conference on .NET Technologies, pages 165–173, Plzen, Czech Republic, May 30-31, 2005.

[2] M. Arnold, S. Fink, D. Grove, M. Hind, and P. F. Sweeney. Adaptive optimization in the Jalapeño JVM. In Proceedings of the 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 47–65, New York, NY, USA, 2000. ACM Press.

[3] P. Brown. Throw-away compiling. Software Practice and Experience, 6:423–434, 1976.

[4] D. Bruening, T. Garnett, and S. Amarasinghe. An infrastructure for adaptive dynamic optimization. In CGO ’03: Proceedings of the international symposium on Code generation and optimization, pages 265–275, Washington, DC, USA, 2003. IEEE Computer Society.

[5] B. Cmelik and D. Keppel. Shade: a fast instruction-set simulator for execution profiling. In SIGMETRICS ’94: Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, pages 128–137, New York, NY, USA, 1994. ACM Press.

[6] R. F. Cmelik and D. Keppel. Shade: A fast instruction set simulator for execution profiling. Technical report, Mountain View, CA, USA, 1993.

[7] S. Dieckmann and U. Hölzle. A study of the allocation behavior of the SPECjvm98 Java benchmarks. In Proceedings of the European Conference on Object-Oriented Programming, June 1999.

[8] DotGNU Project. DotGNU Project - GNU Freedom for the Net. http://www.dotgnu.org, 2006.

[9] Embedded Microprocessor Benchmark Consortium. Grinderbench. http://www.grinderbench.com/.

[10] R. Jones and R. Lins. Garbage Collection: Algorithms for automatic Dynamic Memory Management. John Wiley and Sons, 1998.

[11] A. Krall. Efficient JavaVM just-in-time compilation. In J.-L. Gaudiot, editor, International Conference on Parallel Architectures and Compilation Techniques, pages 205–212, Paris, 1998. North-Holland.

[12] A. Krall and R. Grafl. CACAO — A 64-bit JavaVM just-in-time compiler. Concurrency: Practice and Experience, 9(11):1017–1030, 1997.

[13] Microsoft. Ben’s CLI benchmark. http://research.microsoft.com/

[14] Microsoft. Microsoft shared source CLI. http://www.microsoft.com/downloads.

[15] Mono. What is Mono? http://www.mono-project.com/Main_Page, 2006.

[16] S. Pratschner. Information available from http://weblogs.asp.net/stevenpr/archive/2004/07/26.aspx.

[17] J. Smith and R. Nair. Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufmann, June 2005.

[18] D. Stutz, T. Neward, and G. Shilling. Shared Source CLI Essentials. O’Reilly and Associates, 2003.

[19] Transvirtual. Kaffe virtual machine. http://www.kaffe.org.

[20] D. Ungar. The design and evaluation of a high performance Smalltalk system. ACM Distinguished Dissertations, 1987.

[21] Q. Yang, W. Srisa-an, T. Skotiniotis, and J. M. Chang. Java virtual machine timing probes: A study of object lifespan and garbage collection. In Proceedings of 21st IEEE International Performance Computing and Communication Conference (IPCCC-2002), pages 73–80, Phoenix Arizona, April 3-5, 2001.

[22] F. Yellin. Just-in-time compiler interface specification. ftp://ftp.javasoft.com/docs/jit interface.pdf, 1999.

[23] L. Zhang and C. Krintz. Adaptive code unloading for resource-constrained JVMs. In LCTES ’04: Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools, pages 155–164, New York, NY, USA, 2004. ACM Press.

[24] L. Zhang and C. Krintz. Profile-driven code unloading for resource-constrained JVMs. In International Conference on the Principles and Practice of Programming in Java, Las Vegas, NV, June 2004.

About the authors

  David Anthony is a senior and an undergraduate research assistant in the Department of Computer Science and Engineering at University of Nebraska-Lincoln, USA. He can be reached at danthony@cse.unl.edu.

  Witawas Srisa-an is an Assistant Professor in the Department of Computer Science and Engineering (http://www.cse.unl.edu) at University of Nebraska-Lincoln, USA. His research interests include computer architecture, Object-Oriented systems, programming languages, and distributed systems. He can be reached at witty@cse.unl.edu.

Michael Leung graduated in 2005 with a BS degree in Computer Engineering from the Department of Computer Science and Engineering at University of Nebraska-Lincoln, USA. He can be reached at mleung@cse.unl.edu.

Cite this article as follows: David Anthony, Witawas Srisa-an and Michael Leung: “An Empirical Study of the Code Pitching Mechanism in the .NET Framework”, in Journal of Object Technology, vol. 5, no. 3, April 2006, Special issue: .NET Technologies 2005 Conference, pp. 107-127, http://www.jot.fm/issues/issue_2006_04/article5


 

