Scaling a Shared Object Space to the Internet: Case Study of Virat

A Vijay Srinivas and D Janakiram, Distributed & Object Systems Lab, Department of Computer Science & Engineering, Indian Institute of Technology, Madras, India


Scalability is an important issue in the construction of distributed systems. Shared object spaces provide an elegant and easy-to-program abstraction for building applications. However, existing shared object spaces have been realized only at the cluster level. The use of centralized components, the lack of effective failure handling mechanisms, the lack of efficient object lookup mechanisms, and consistency maintenance are the key issues that inhibit the scalability of existing shared object spaces. We present a case study of scaling an existing shared object space (Virat) to the Internet. Bottlenecks in Virat include the granularity of consistency maintenance and Object Meta-data Repository (OMR) failures. Both the design and implementation of Virat have been modified in order to increase the granularity at which consistency is maintained. Virat has also been redesigned so that the OMRs form a Peer-to-Peer (P2P) overlay, in order to handle OMR failures and improve scalability. Experimental evaluations show that the optimized version of Virat scales better, especially over a wide-area network. In addition, the paper explains how to develop applications over the shared object space, with code sketches.


1 INTRODUCTION

Distributed Shared Memory (DSM) provides the illusion of a globally shared memory in which processors can share data, without the application developer needing to specify explicitly where data is stored and how it should be accessed. The DSM abstraction is particularly useful for parallel computing applications, as demonstrated by TreadMarks [1]. Collaborative applications such as on-line chatting and collaborative browsing are also easier to develop over a DSM.

Java/DSM [2] provides a Java Virtual Machine (JVM) abstraction over TreadMarks. It is an example of a page based DSM, similar to Munin [3] and TreadMarks. Page based DSMs share data at the level of memory pages, while object based DSMs (also known as shared object spaces) share application objects. Page based DSMs can be more efficient, due to the availability of hardware support for detecting memory accesses. But due to the larger granularity of sharing, page based DSMs may suffer from false sharing. Object based DSMs alleviate this problem through a more application specific granularity of sharing. Examples of object based DSMs include Linda [4] and Orca [5]. To the best of our knowledge, none of the existing shared object spaces has truly been scaled to the Internet, as we argue below. An Internet scale shared object space can make it easier to develop applications such as computer supported cooperative work (CSCW) based collaborative designing and multi-player games [6], among others.

Global shared objects [7] allow heap objects in a JVM to be shared across nodes. Based on the memory access patterns of applications, various consistency mechanisms are proposed that can be realized efficiently. However, the system uses locks and a per-object lock manager to keep replicas consistent, and it does not address failures of the lock manager. The JavaSpaces specification from Sun [8] provides a distributed persistent shared object space using Java RMI and Java serialization. It provides Linda-like operations on the tuple space and uses Jini's transaction specification to achieve serializability of write operations. It too does not address fault tolerance, an important issue for Internet scale systems.

Orca relies on an update mechanism based on totally ordered group communication to serialize accesses to replicas. Even though a study has shown that the overhead of totally ordered group communication affects application performance only minimally [9]1, that study was done on a Myrinet cluster. The overhead may become substantial and affect application performance adversely over the Internet, primarily because totally ordered group communication relies on a sequencer or uses broadcast, both of which can be expensive over the Internet. However, Orca has not been evaluated at the Internet scale.

We have proposed a generic scalability model for analyzing distributed systems [10]. It takes the view that the scalability of distributed systems should be analyzed together with related issues such as consistency, synchronization, and availability. By applying the scalability model to shared object spaces, we have identified the key bottlenecks that prevent existing shared object spaces from scaling up to the Internet:

  • Centralized Components
    Many existing DSMs and shared object spaces have some centralized components that affect their scalability. For instance, Orca has a sequencer for realizing totally ordered group communication, while others like T Spaces [11] have a centralized component for object lookups.
  • Failures
    Existing shared object spaces do not handle failures. For instance, JavaSpaces and global shared objects do not handle failures of the transaction coordinator, while Orca does not handle failure of the sequencer.
  • Object Lookup
    Given an object identifier (id), efficient mechanisms must exist that map the id to the node that either stores a replica or stores meta-data about the replica. Existing shared object spaces such as T Spaces use centralized lookup mechanisms. Object lookup in distributed object middleware such as CORBA and DCOM also has difficulty handling failures and scaling up.
  • Consistency
    Several existing DSM systems such as TreadMarks and Munin, and shared object spaces such as JavaSpaces, provide relaxed consistency mechanisms such as release consistency and entry consistency. Relaxed consistency mechanisms have also been explored in other areas [12, 13]. However, to our knowledge, these mechanisms have not been evaluated in Internet scale systems. Peer-to-Peer (P2P) systems that have been scaled to the Internet, such as Pastry [14] and Tapestry [15], assume replicas are read-only.

Scaling up a Shared Object Space to the Internet

We have developed Virat [16], intended as a wide-area shared object space that addresses infrastructure support for large scale distributed applications. It supports various object interaction styles so that applications can be architected easily. Virat uses a novel mechanism for handling failures and provides a data centric Concurrency Control (CC) mechanism to realize various consistency models. Virat has been extended into a fully typed shared event space, facilitating publish-subscribe interactions and large scale event notifications [17]. However, applying the scalability model [10] to Virat reveals bottlenecks. The goal of this paper is to explain how we address the above mentioned problems in the context of Virat and scale it up to the Internet.

Peer-to-Peer systems such as Pastry [14] have been proposed to handle efficient, decentralized object lookups and to tolerate failures. The early versions of Virat handled OMR failures through explicit lookup servers, under the assumption that the lookup server and the OMR in a cluster do not fail simultaneously. We have redesigned Virat to handle OMR failures elegantly and scale up better. The OMRs form a Pastry ring and route messages through the routing protocol of Pastry in order to find which OMR maintains information about a given object. Information maintained in any OMR is also stored on k other OMRs. The former keeps the lookup time at O(log n) for n nodes, and the latter enables Virat to handle OMR failures. Virat has been integrated with the code of FreePastry (available at http://freepastry.rice.edu/) for object lookup and routing among the OMRs. Performance evaluation over a wide-area network shows that the P2P communication based Virat scales better.

We also find that Virat can be optimized with respect to the granularity of consistency maintenance. We evaluate its performance and show that the optimized Virat scales better. In the process, we experimentally compare three different design mechanisms for achieving consistency in Virat: a naive two phase commit (2PC) based mechanism, a CompareAndSwap (CAS)-like update mechanism and the optimized mechanism. We find that the CAS-like mechanism, although it performs well, degenerates to the optimized mechanism when it must handle failures and scale up.

Section 2 provides a brief overview of the existing Virat shared object space. Section 3 describes the scalability model that we have applied to Virat. Section 4 explains the changes in design and implementation made to Virat for scaling it up. Section 5 presents the performance evaluation of Virat over an Institute wide as well as a wide-area testbed. Section 6 compares the modified Virat with a few other large scale systems. Section 7 puts the paper in perspective and provides directions for future research.

2 VIRAT: A SHARED OBJECT, EVENT AND SERVICE SPACE

Virat supports middleware services such as naming and trading, as well as replica object management. Virat uses an independent checkpointing and lazy reconstruction mechanism to handle failures of OMRs. A lookup server is also run, one in each cluster; it maintains the current location of the OMR in that cluster. Failures are handled under the assumption that the lookup server and the OMR of a cluster do not fail simultaneously. The OMRs (one per cluster) are responsible for cluster level management of replicas and communicate among themselves to locate objects across clusters.

The appendix explains how to write programs over the shared object space. A DSM runtime object is present in every node of the system and serves as the interface through which client code accesses the DSM services. When the DSM runtime object gets a request for creating a shared object, it interacts with the DSM services, namely the lookup and OMR services, and returns a replica of the object to the client. The OMR service maintains information about objects and the current accessors of each object. Each object has a data counter for ensuring data centric CC [18]. The data counter is also used as a versioning mechanism that forms the basis of reconstructing global system states, while using independent checkpointing to recover from failures.

The DSM runtime on each node handles the initial object discovery requests. It looks up the object in its cache, failing which it contacts the lookup service of the cluster. If the object has not been created before, or has been created in a different cluster, the DSM runtime sends a request to the OMR of its cluster. The OMR creates a new unique oid for the object and gives back a copy of the object. The OMR maintains a list of current accessors for each object. When an update request message is received from an accessor, it is propagated to all other accessors, with the ordering depending on the application specific consistency criteria.
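
The discovery path can be summarized in code. The following is a minimal sketch: the LookupService and OmrService interfaces and the discover method are illustrative names of ours, while the order of the steps (local cache, then cluster lookup service, then OMR) is as described above.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative stand-ins for the cluster services; only the discovery
    // order (cache, then lookup service, then OMR) is taken from the paper.
    interface LookupService { Object find(String oid); }
    interface OmrService   { Object getReplica(String oid); }

    public class DsmRuntime {
        private final Map<String, Object> cache = new ConcurrentHashMap<>();
        private final LookupService lookup;
        private final OmrService omr;

        public DsmRuntime(LookupService lookup, OmrService omr) {
            this.lookup = lookup;
            this.omr = omr;
        }

        /** Initial object discovery on this node. */
        public Object discover(String oid) {
            Object replica = cache.get(oid);       // 1. local cache
            if (replica == null) {
                replica = lookup.find(oid);        // 2. cluster lookup service
            }
            if (replica == null) {
                replica = omr.getReplica(oid);     // 3. OMR locates or creates a copy
            }
            if (replica != null) {
                cache.put(oid, replica);           // keep the replica locally
            }
            return replica;
        }
    }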

3 THE SCALABILITY MODEL

A number of studies have considered the scalability of distributed systems. A scalability metric based on productivity was defined in [19]; it evaluates scalability as the product of throughput and response time (or any value function) divided by the cost factor. An analytical model of server systems based on Layered Queuing Networks (LQN) was proposed in [20]. A more general analytical model of client-server systems, based on rendezvous networks, was given in [21]; it illustrates how certain slow servers can become bottlenecks and suggests threading/cloning to relieve them. Recent work on design patterns for concurrent and networked systems documents patterns such as Reactor and Half-Sync/Half-Async for efficient multi-threading in server systems [22]. The scalability of Java 2 Enterprise Edition (J2EE) technology has been evaluated in [23].

All the above efforts perceive scalability in isolation and some are even specific to certain technologies. The inter-related issues of synchronization, consistency and availability are neglected. One of our main conjectures is that availability, consistency, synchronization and scalability are closely related and should be looked at in totality, not in isolation. We have proposed an analytical model that considers scalability as a function of synchronization, consistency, availability, workload and faultload [10].

scalability = f(avail, sync, consis, workload, faultload)

  • avail is availability, which can be quantified as the ratio of the number of transactions accepted to the number submitted. Availability itself is a function of network availability (the number of operations reaching any replica) and service availability (the number of operations accepted by a replica). If the availability requirements are very high, a trade-off between scalability and consistency may be required: if strict consistency is also required, the scalability of the system may be compromised, whereas if consistency can be relaxed, both high availability and scalability can be achieved [24].
  • consis is consistency, itself a function of update ordering and consistency granularity. Update ordering refers to the mechanisms for ordering updates across replicas and can be causal, serializable or PRAM. Consistency granularity refers to the grain size at which consistency is maintained in the system, and it has two dimensions. The first relates to caching, an important and well known method for improving performance: the cache size is an important factor governing performance, with a larger size implying more chances of exploiting program locality. The other dimension is the granularity of consistency maintenance: whether it is at the individual replica level or at some level of aggregation. It may be easier to maintain consistency at the aggregate level than at the individual replica level, but at the cost of strictness.
  • sync refers to synchronization among the replicas. The two dimensions of synchronization are how often the replicas are synchronized and the mode of synchronization. The former can be captured as the number of updates that can be buffered at a particular node before synchronizing with the others. The latter can be modeled as a value: zero implies a server push/invalidation model, while a positive value indicates that a replica may miss some updates and can be updated lazily, in either push or pull mode. This can reduce the messaging overhead and improve the scalability of the system. A similar idea has been exploited in the notion of local consistency [13], but without any attempt at quantification.
  • workload can be broken down into workload intensity and workload service demand. Workload intensity refers to the number of transactions per second, the number of clients, etc., while service demand refers to CPU time for operations, network delays, etc. As the workload increases, the performance of the system must degrade gracefully; improving service demand (say, with a faster processor) can help achieve graceful degradation.
  • faultload comprises the failure sequences as well as the number and location of replicas. In Internet scale systems, failures become important: systems as well as networks are prone to failure, and increased message latencies create additional difficulties.

The function f could be a concrete mathematical function used to quantify the scalability of a distributed system. At a higher level, f can also be viewed as an algorithm whose primary goal is to identify scalability bottlenecks in the system. An open issue is whether the arguments to f are complete and minimal.

4 REDESIGNING VIRAT TO THE INTERNET SCALE

The scalability model is a useful tool for identifying bottlenecks in distributed systems. Applying it to Virat gives directions for optimization. First, relating to the consistency parameter, Virat has been optimized to implement different update ordering mechanisms. These imply different consistency models, namely causal consistency, sequential consistency and causal serializability. This is in line with the established principle that relaxing consistency ordering improves scalability [13]. The stricter the consistency mechanism that needs to be enforced, the more overhead is involved in terms of messages and protocols. For instance, to enforce serial consistency, a two phase commit (2PC) or a more sophisticated three phase commit (3PC) protocol may have to be in place. Causal consistency can be implemented by a data centric mechanism [26], and causal serializability can also be implemented without a system wide agreement protocol.

A further optimization in Virat relates to consistency granularity. Serial consistency is enforced by a 2PC protocol: the DSM objects in the nodes that maintain replicas of an object run the 2PC among themselves, with the initiator node acting as the 2PC coordinator. 2PC can be expensive, especially in a wide-area setting. An optimization is to ensure strict consistency only at the level of the OMRs, which are then responsible for propagating updates to the other nodes. Thus, a serializability protocol, if required, is enforced only among the repositories and not among all replicas. This reduces the number of participants in the 2PC algorithm, making it more efficient. To avoid 2PC altogether, an invalidation based approach is used to maintain consistency among the OMRs. These optimizations are expected to improve the scalability of Virat.

Fault tolerance is the other dimension along which Virat has been redesigned. Nodes as well as networks are more prone to failure in an Internet setting, making the initial version of Virat unsuitable, especially with respect to OMR failures. OMRs maintain object meta-data, including details such as the object id and the list of accessors of the object. Each cluster has a pre-designated node as its OMR. To handle OMR failure, a lookup server was used, which keeps up-to-date information about the current location of its (cluster level) OMR. The assumption was that the lookup server and the corresponding OMR do not fail simultaneously, which may be untenable in an Internet setting. Moreover, if an object created in one cluster (with its meta-data maintained by OMR1) needs to be replicated in a different cluster (with OMR2), OMR2 does not know which OMR maintains the meta-data of that object. Hence, OMR2 searches through all the OMRs until it finds that OMR1 maintains this data. This delays replica creation requests, especially in a wide-area setting. Thus, a better mechanism is required to handle OMR failures and to perform efficient object lookups across clusters.

The OMRs form a Pastry ring and route messages through the routing protocol of Pastry in order to find which OMR maintains information about a given object. This keeps the lookup time at O(log n) for n nodes and can reduce cross-cluster replication request times considerably. Information maintained in any OMR is also stored on k other OMRs. Thus, even if an OMR fails, the routing protocol of Pastry ensures that, among the k OMRs that are alive at that point, the message is routed to the OMR with the closest matching id. OMR failures are thus handled elegantly.

Design and Implementation

This section presents the details of redesigning Virat along the lines of consistency granularity and fault tolerance (P2P routing among OMRs) for scalability. We now explain the design and implementation details of the three consistency mechanisms: a naive 2PC mechanism, a CAS-like update mechanism and an optimized 2PC mechanism. The client instantiates a DSM object and makes calls on that object. All the calls are forwarded to a local DSM runtime object, which in turn forwards them to the OMR whenever necessary. The DSM runtime object contains data structures that track the objects created or read by the local node. The OMR maintains details of the objects accessed within a cluster.

Interface Additions for Naive 2PC Based Consistency Mechanism

The methods added to the interface of the DSM object are given below, with brief explanations of their purpose; a sketch of how a coordinator drives them follows the list.

  • String WriteSequential (Object obj, String ClassName, Object[] dependencyList)

This method allows an application to write into an object and ensure sequential consistency. The normal Write method of Virat ensures only causal consistency by default. This method makes the DSM object act as a 2PC coordinator and initiate a 2PC algorithm to ensure that the write is propagated to all replicas consistently.

  • public void UpdateClientList(String oid, server.returnable clientList)

This method is called from the Create method (which creates a new replica). It registers the new replica with all the existing clients, so that whichever client initiates a 2PC, this client is also included as a participant.

  • public boolean Prepare2PC(String oid, Object obj)

This method implements the prepare phase of 2PC among replicas to serialize writes. It is invoked by the 2PC coordinator. It checks local constraints and returns true if there are no other updates pending. It returns false otherwise, in which case the 2PC coordinator will abort in the next phase.

  • public void Commit2PC(String oid, Object obj)

This method implements the commit phase of 2PC among replicas to serialize writes. It is invoked by the 2PC coordinator after it gets 'true' from all replicas; even a single 'no' vote forces the 2PC to abort.

  • public void Abort2PC(String oid, Object obj)

This is the analogous method for the abort phase of 2PC.
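
The following sketch shows how a DSM object, acting as coordinator, would drive these methods in WriteSequential. The Replica interface stands in for remote DSM runtime objects (for instance, RMI stubs); only the method names and the two-phase structure come from the interface described above.

    import java.util.List;

    // Two-phase commit driver for WriteSequential. Replica stands in for
    // a remote DSM runtime object reachable from the coordinator.
    interface Replica {
        boolean Prepare2PC(String oid, Object obj);
        void Commit2PC(String oid, Object obj);
        void Abort2PC(String oid, Object obj);
    }

    public class TwoPhaseCoordinator {
        /** Returns "committed" if all replicas voted yes, "aborted" otherwise. */
        public String writeSequential(String oid, Object obj, List<Replica> replicas) {
            boolean allPrepared = true;
            for (Replica r : replicas) {           // phase 1: collect votes
                if (!r.Prepare2PC(oid, obj)) {
                    allPrepared = false;           // a single 'no' vote aborts the 2PC
                    break;
                }
            }
            for (Replica r : replicas) {           // phase 2: commit or abort everywhere
                if (allPrepared) {
                    r.Commit2PC(oid, obj);
                } else {
                    r.Abort2PC(oid, obj);
                }
            }
            return allPrepared ? "committed" : "aborted";
        }
    }

Note that every replica participates in the agreement; this is precisely the cost that the optimized mechanism below reduces.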

CompareAndSwap(CAS)-like Updates

The naive 2PC based consistency mechanism of Virat can be compared to the Distributed Data Structure (DDS) approach of [27]. DDS was proposed as a centralized interface to persistent distributed data for web services. DDS is actually based on replication and distribution and provides atomic single-element updates. The consistency of the replicas is maintained by a 2PC protocol to ensure serialization. This makes write operations very expensive and a bottleneck for scalability. Thus, an efficient mechanism for updating replicas would be very useful.

We have developed a CAS-like primitive inspired by multi-processor synchronization primitives [28]. We have implemented the CAS-like update mechanism as a hierarchical, tree-based multilevel CAS. It provides only serializability semantics. The main idea is that the CAS-like update mechanism is optimistic, compared to the pessimistic approach of 2PC or mutual exclusion. A simple way to realize it is to have a single node (the owner) at the top of the hierarchy. A CAS operation is done at each level of the hierarchy and propagated up if the local CAS succeeds; if the local CAS fails, the CAS returns failure. Different nodes may be owners for different data items, for load distribution. However, this mechanism may not be able to handle failures. To handle failures, multiple nodes may be required at the root of the hierarchy, but this may lead to consistency problems within the root set. The interface required to realize this CAS-like update mechanism is given below, followed by a sketch of the single-owner variant.

  • Object CompareAndSwap(String oid, Object oldValue, Object newValue)

This is the CAS method on the OMR, invoked by a CAS operation from a child. If the local CAS succeeds, it sends a CAS request to the owner of the object with id oid; if the local CAS fails, it returns failure. There is a similar CAS method on the DSM runtime object, which is invoked by the application; the DSM runtime object in turn invokes CAS on the OMR of its cluster.
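
The single-owner variant can be sketched as follows. The tree wiring and the boolean return value (instead of the Object return type in the interface above) are simplifications of ours; the propagate-up-on-local-success behaviour is as described in the text.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // One level of the hierarchical CAS. A node first attempts the CAS
    // locally; on success it propagates the request towards the owner
    // (the root, which has no parent). A failure at any level is
    // reported back as failure.
    public class CasNode {
        private final CasNode parent;              // null at the owner
        private final Map<String, Object> values = new ConcurrentHashMap<>();

        public CasNode(CasNode parent) { this.parent = parent; }

        public synchronized boolean compareAndSwap(String oid, Object expected, Object update) {
            Object current = values.get(oid);
            if (current != null && !current.equals(expected)) {
                return false;                      // local CAS failed
            }
            if (parent != null && !parent.compareAndSwap(oid, expected, update)) {
                return false;                      // rejected higher up, or by the owner
            }
            values.put(oid, update);               // apply locally once the owner accepts
            return true;
        }
    }

A CAS thus succeeds only if every level up to the owner accepts it; as noted above, different objects can have different owners to distribute load.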

Optimized Consistency Mechanism - OMR Level 2PC

Consistency is maintained at the level of the OMRs only, which are responsible for update propagation to the other nodes. Thus, the serializability protocol, if required, is enforced only among the repositories and not among all replicas. The number of nodes among which agreement must be reached for serializing writes decreases drastically; as a result, the protocol is much easier to implement. Further, the more nodes involved, the greater the failure probability, and 2PC becomes even more expensive under failure scenarios. The optimized mechanism is hence expected to scale better, especially in a wide-area setting (in spite of its dynamic nature). However, this method does not ensure strict serializability among all replicas. It is a different way of relaxing consistency compared to the usual approaches [12]. Applications that can tolerate slight inconsistencies, such as those given in [29], can benefit. The interface changes required to realize this mechanism are listed below, followed by a sketch of the resulting write path:

  • The methods Prepare2PC, Commit2PC and Abort2PC on the DSM runtime object must be moved to the OMR.
  • public boolean WriteObjRepoSeq(String oid, Object obj) method must be added to the OMR. This method will initiate the 2PC among the OMRs, with this OMR acting as 2PC coordinator. The WriteSequential method on the DSM object must be modified to invoke this method of the OMR.
  • void notifyLocalSubscribers(Object obj, String eventType) method of the OMR2 must be modified to propagate updates to objects. Basically, the event space of Virat [17] is being used for update propagation by the OMRs within clusters.
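
A sketch of the resulting write path is given below. OmrPeer is an illustrative interface for remote OMR references, and EventSpace is an illustrative wrapper for Virat's event space; WriteObjRepoSeq, the 2PC methods and notifyLocalSubscribers follow the interface changes just listed, while the "ObjectUpdated" event type is an assumption.

    import java.util.List;

    // OMR-level write path: 2PC runs among the repositories only; each
    // OMR then propagates the committed update inside its own cluster
    // through the event space.
    interface OmrPeer {
        boolean Prepare2PC(String oid, Object obj);
        void Commit2PC(String oid, Object obj);
        void Abort2PC(String oid, Object obj);
    }

    interface EventSpace {
        void notifyLocalSubscribers(Object obj, String eventType);
    }

    public class OmrCoordinator {
        private final List<OmrPeer> repositories;  // the other OMRs, not all replicas
        private final EventSpace eventSpace;

        public OmrCoordinator(List<OmrPeer> repositories, EventSpace eventSpace) {
            this.repositories = repositories;
            this.eventSpace = eventSpace;
        }

        public boolean writeObjRepoSeq(String oid, Object obj) {
            boolean prepared = true;
            for (OmrPeer r : repositories) {       // phase 1, among OMRs only
                if (!r.Prepare2PC(oid, obj)) { prepared = false; break; }
            }
            for (OmrPeer r : repositories) {       // phase 2
                if (prepared) r.Commit2PC(oid, obj); else r.Abort2PC(oid, obj);
            }
            if (prepared) {
                // intra-cluster propagation through Virat's event space
                eventSpace.notifyLocalSubscribers(obj, "ObjectUpdated");
            }
            return prepared;
        }
    }
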
Incorporating Pastry Routing Among OMRs

The FreePastry implementation of Pastry provides the commonapi abstraction, a common API for structured P2P systems. Its important methods include route(Id key, Message msg, NodeHandle hint) and replicaSet(Id id, int maxRank). The route method implements traditional P2P routing, ensuring that a message is routed to the live node with the closest matching id, upon which the deliver method on the receiving Application instance is invoked. The replicaSet method returns a set of nodes on which the replica meta-data of a given object can be stored.

If an OMR invokes the route method, the deliver method on the application instance at the destination OMR is invoked. However, the originating OMR needs a handle to the destination OMR in order to invoke methods such as Read and Write that are part of the OMR interface. Since the route method does not return a handle, in our implementation the deliver method invokes passMyHandle on the originating OMR to pass its own reference, as sketched below. The originating OMR then makes invocations such as GetObjectDetailsWithID (to get the details of an object created under a different OMR) on that handle.
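
The control flow can be sketched as follows. These are simplified stand-ins of ours, not the actual FreePastry classes: OmrHandle and LookupRequest are assumed types, while route, deliver, passMyHandle and GetObjectDetailsWithID are the calls named above.

    import java.io.Serializable;

    // Handle-passing sketch: route() carries a LookupRequest to the OMR
    // whose id is closest to the object's id; deliver() fires there and
    // calls passMyHandle() back on the originator, which can then invoke
    // OMR-interface methods directly on the received reference.
    interface OmrHandle {
        void passMyHandle(OmrHandle handle);
        Object GetObjectDetailsWithID(String oid);
    }

    class LookupRequest implements Serializable {
        final String oid;
        final OmrHandle origin;        // callback target, since route() returns no handle
        LookupRequest(String oid, OmrHandle origin) { this.oid = oid; this.origin = origin; }
    }

    class DestinationOmr {
        private final OmrHandle self;
        DestinationOmr(OmrHandle self) { this.self = self; }

        /** Invoked by the P2P layer when a routed LookupRequest arrives here. */
        void deliver(LookupRequest request) {
            request.origin.passMyHandle(self);   // hand the originator a direct reference
        }
    }

On receiving the handle, the originating OMR invokes GetObjectDetailsWithID(oid) on it to retrieve the meta-data of an object created under a different OMR.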

For each object whose meta-data it stores, an OMR maintains a list of the OMRs that access the object (since it receives invocations from them, it can record their IP addresses). This list is used to update OMRs on Write operations. Currently, we provide only causal consistency in the P2P based version of Virat. It should be noted that for implementing causal consistency, local operations are sufficient and global agreement protocols may not be necessary [30]. Alternative consistency mechanisms for P2P systems have not been explored much in the literature, with many current P2P systems such as Tapestry [15] assuming data is read-only. We leave it for future research to realize other consistency models in the P2P version of Virat.

An optimization that avoids 2PC among the OMRs has been realized, sketched below. When an OMR needs to write an object, instead of becoming a 2PC coordinator and running 2PC to serialize writes, it sends an invalidate message to the other OMRs. When an OMR receives an invalidate message, it invalidates its local copy of the object. A pull based model is used for updating values, meaning that the other OMRs can pull the values from the current writer. The OMRs remain responsible for propagating the updates to the replicas within their clusters.
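
A sketch of this invalidation protocol is given below; the Peer interface and the pull call are illustrative names of ours, while the invalidate-on-write and lazy-pull behaviour follow the description above.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Invalidation-based consistency among OMRs, avoiding 2PC: a writer
    // updates its own copy and invalidates the others; an invalidated
    // OMR pulls the current value lazily, on the next local read.
    public class InvalidationOmr {
        interface Peer {
            void invalidate(String oid, Peer writer);
            Object pull(String oid);
        }

        private final Map<String, Object> copies = new ConcurrentHashMap<>();
        private final Map<String, Peer> writers = new ConcurrentHashMap<>();

        /** Writer side: update locally, then invalidate all other OMRs. */
        public void write(String oid, Object obj, Peer self, List<Peer> others) {
            copies.put(oid, obj);
            for (Peer p : others) {
                p.invalidate(oid, self);
            }
        }

        /** Called remotely when another OMR has written the object. */
        public void invalidate(String oid, Peer writer) {
            copies.remove(oid);                    // drop the stale copy
            writers.put(oid, writer);              // remember whom to pull from
        }

        /** Reader side: pull from the current writer if our copy is stale. */
        public Object read(String oid) {
            Object copy = copies.get(oid);
            if (copy == null && writers.containsKey(oid)) {
                copy = writers.get(oid).pull(oid);
                if (copy != null) {
                    copies.put(oid, copy);
                }
            }
            return copy;
        }
    }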

5 PERFORMANCE STUDIES

This section details some of the performance studies we have conducted over Virat to show its scalability. The initial set of experiments was conducted on an Institute wide network consisting of about thirty-five heterogeneous machines, with memory ranging from 64 MB to 1 GB, processor speeds from 350 MHz to 2.4 GHz, and different operating systems (Linux, Solaris, etc.). The machines are spread across three clusters, each connected by 10/100 Ethernet. A few nodes from an engineering college in Trichy3 were added to form a wide-area testbed, which was used for the wide-area results reported in this paper.

Figure 1 shows the overhead of the WriteSequential method call with an increasing number of nodes, and hence of clients (each node can host hundreds of clients). It can be observed that naive Virat incurs nearly linear overheads. This nontrivial overhead is even more prominent in the wide-area results. Thus, as the number of Internet nodes participating in Virat increases, naive Virat may have trouble scaling up. This is due to the overhead of the 2PC algorithm that is required to serialize the writes.

Figure 1: Write Overheads in Naive Virat on Campus Intranet and Wide-Area Testbeds

Figure 2 shows the overhead of the WriteSequential method call for the CAS-like update mechanism, with only one node at the top of the tree. It can be observed that this mechanism performs far better, even as the number of nodes increases. However, having only one node at the top of the hierarchy makes this mechanism prone to failures, both node and network failures. If the number of nodes forming the top of the hierarchy is increased, failures can be tolerated better, but this leads to synchronization issues among the top level nodes. The update mechanism then becomes similar to the optimized Virat mechanism (with a 2PC at the object repository level), which handles such consistency issues.

Figure 2: Performance of CAS-like Update Mechanism on Campus Intranet

Figure 3 shows the performance of the WriteSequential method in optimized Virat. Since the overheads of 2PC are reduced by increasing the granularity of consistency, the performance degrades slowly, even in the wide-area testbed.

The overhead of object replication has been measured in both the naive Virat (Virat without P2P routing between OMRs) and the P2P based Virat. These overheads have been measured by creating an object replica (via the Create method call on the Virat interface) from an OMR in a different cluster than the one in which the object was originally created. In naive Virat, the originating OMR sequentially searches through each of the other OMRs to find which OMR maintains information about this object. In the P2P version, the originating OMR performs a P2P routing to find the OMR that maintains meta-data about this object. Consequently, the overhead of object replication across clusters is much higher in naive Virat than in the P2P based Virat. This is depicted in Figure 4.

Figure 3: Performance of Optimized Virat on Campus Intranet and Wide-Area Testbeds
Figure 4: Object Replication Overhead in Naive Virat and P2P Based Virat on Campus Intranet Testbed

6 RELATED WORK

Two other systems have endeavoured to extend DSM across clusters: JDSM [31] and Lemuria [32]. JDSM presents an implementation of software DSM written in Java that can be ported to several cluster platforms. JDSM attempts to provide complete distribution transparency, meaning the DSM is entirely transparent to the programmer. This results in non-trivial overheads for checking object accesses, which, coupled with cache coherency maintenance costs and the lack of adequate failure handling mechanisms, limits the scalability of JDSM. Lemuria uses a new consistency criterion called cluster based release consistency to enable the DSM to be used in a wide-area environment. However, no detailed study of the performance and scalability of these DSMs has been documented in the literature.

T Spaces is a shared object space from IBM [11] that adds database functionality to the Linda tuplespace [4] and is implemented in Java to take advantage of its wider usability. In addition to the traditional Linda primitives in, out and read, T Spaces supports set oriented operators and a novel rendezvous operator called rhonda. However, scalability is not one of the design goals of T Spaces, nor has it been evaluated for scalability. It may have difficulty scaling up, as it uses the T Spaces server as a centralized component. Even if an application uses multiple T Spaces servers, the communication between these servers is not P2P. Hence, just like naive Virat, where communication between OMRs was not P2P, T Spaces may not scale. Further, it does not address failures, which become a serious concern at the Internet scale.

Legion is a middleware approach targeted at providing a global virtual computer abstraction, built on top of Mentat, a parallel programming platform. Legion addresses wide-area scalability issues explicitly; however, it does not provide distribution transparency to applications. It is well known that although object request brokers such as CORBA provide distribution transparency and various services such as transactions, replication and fault tolerance are not easily handled. FT-CORBA, the Fault Tolerance specification of CORBA [33], which addresses replication, still has serious limitations with respect to nondeterminism, transparency, the three kinds of state (application, POA, infrastructure), weak identity, and the replication of clients and factories [34].

Globe [35] is a middleware for constructing wide-area applications that addresses scalability issues explicitly. However, to scale up to the wide area, it needs scalable object location services, as documented in [35]; scalable location services for Globe are addressed in [36]. The location service is based on a worldwide distributed search tree comparable to the Distributed Hash Tables (DHTs) on which many structured P2P systems [37], including Pastry and consequently Virat, are based. The tree based approach is scalable in terms of performance but does not handle node failures, whereas DHTs handle both.

7 CONCLUSIONS

We have analyzed both the geographic and the numerical scalability of Virat, a shared object space, using an analytical model that we had proposed earlier. We redesigned (and re-implemented) Virat along two dimensions: consistency granularity, maintaining consistency at the level of OMRs; and fault tolerance, making the OMRs form a P2P overlay to handle failures and scale up better. Thus, Virat becomes a unique scalable P2P based wide-area shared object space. To the best of our knowledge, shared object space or Distributed Shared Memory (DSM) implementations have neither been realized over P2P systems nor scaled to the wide area. We have also presented performance studies over a wide-area testbed to show the scalability improvements made in Virat.

An interesting future direction is to make Virat a completely P2P shared object space, instead of making only the OMRs form a P2P overlay: all the nodes could form the overlay, with the OMRs acting as super-peers. We are integrating Virat with Vishwa, a two-layered P2P routing substrate that we have developed as a middleware for grid computing systems [38]. This would enable OMR data to be replicated within a zone and make it easier to keep the data consistent. We are also building a replication service for large data grids over Virat and Vishwa, a step towards our end goal of realizing large data grids over Virat. The idea of building grids over shared object spaces is also being exploited by other researchers [39].


Footnotes

1 This is due to its choice of which objects to replicate (those with high read/write ratios) and its efficient implementation of totally ordered group communication.

2 This method normally notifies events to the local subscribers (who are within this cluster).

3 Another city in Tamil Nadu, about 200 miles from Chennai.

REFERENCES

[1] C. Amza, A.L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel. TreadMarks: Shared Memory Computing on Networks of Workstations. IEEE Computer, 29(2):18–28, February 1996.

[2] Weimin Yu and Alan Cox. Java/DSM: A Platform for Heterogeneous Computing. In ACM 1997 Workshop on Java for Science and Engineering Computation, June 1997.

[3] John B Carter. Design of the Munin Distributed Shared Memory System. Journal of Parallel and Distributed Computing, 29(2):219–227, February 1995.

[4] Nicholas Carriero and David Gelernter. Linda in Context. Communications of the ACM, 32(4):444–458, April 1989.

[5] Henri E Bal, M Frans Kaashoek, and Andrew S Tanenbaum. Orca: A Language for Parallel Programming of Distributed Systems. IEEE Transactions on Software Engineering, 18(3):190–205, 1992.

[6] Tsun-Yu Hsiao and Shyan-Ming Yuan. Practical Middleware for Massively Multiplayer Online Games. IEEE Internet Computing, 9(5):47–54, September/October 2005.

[7] Weijian Fang, Cho-Li Wang, and Francis C M Lau. On the Design of Global Object Space for Efficient Multi-threading Java Computing on Clusters. Parallel Computing, 29(11-12):1563–1587, 2003.

[8] Sun Microsystems. JS - JavaSpaces Service Specification. http://java.sun.com/products/jini/2.0/doc/specs/html/js-spec.html, 2001.

[9] Henri E Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Koen Langendoen, Tim Ruhl, and M Frans Kaashoek. Performance Evaluation of the Orca Shared-Object System. ACM Transactions on Computer Systems, 16(1):1–40, 1998.

[10] A Vijay Srinivas and D Janakiram. A Model for Characterizing the Scalability of Distributed Systems. ACM SIGOPS Operating Systems Review, 39(3):64–72, 2005.

[11] P Wyckoff, S W McLaughry, T J Lehman, and D A Ford. T Spaces. IBM Systems Journal, 37(3):454–474, 1998.

[12] M Raynal, G Rhia-kime, and M Ahamad. Serializable to Causal Transactions for Collaborative Applications. In Proceedings of the 23rd Euromicro Conference. Budapest, Hungary, September 1997.

[13] Mustaque Ahamad and Rammohan Kordale. Scalable Consistency Protocols for Distributed Services. IEEE Transactions on Parallel and Distributed Systems, 10(9):888–903, September 1999.

[14] Antony Rowstron and Peter Druschel. Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. In Proceedings of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), pages 329–350, Heidelberg, Germany, November 2001.

[15] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C Rhea, Anthony D Joseph, and John D Kubiatowicz. Tapestry: A resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications, 22(1), January 2004.

[16] A Vijay Srinivas, D Janakiram, and Raghavendra Koti. Virat: An Internet Scale Distributed Shared Object, Event and Service Space. Technical Report IITM-CSE-DOS-2004-03, Distributed & Object Systems Lab, Indian Institute of Technology, Madras, 2004.

[17] A Vijay Srinivas, Raghavendra Koti, A Uday Kumar, and D Janakiram. Realizing Large Scale Distributed Event Style Interactions. In Proceedings of the European Conference on Object Oriented Programming (ECOOP) Workshop on Communication Abstractions for Distributed Systems. Oslo, Norway, 2004.

[18] D Janaki Ram, N S K Chandra Sekhar, and M Uma Mahesh. A Data-Centric Concurrency Control Mechanism for Three Tier Systems. In Proceedings of the IEEE Symposium on Web Caching for E-business, part of the IEEE Conference on Systems, Man and Cybernetics. Arizona, USA, 2001.

[19] Prasad Jogalekar and Murray Woodside. Evaluating the Scalability of Distributed Systems. IEEE Transactions on Parallel and Distributed Systems, 11(6):589–603, June 2000.

[20] J A Rolia and K C Sevcik. The Method of Layers. IEEE Transactions on Software Engineering, 21(8):689–700, August 1995.

[21] J E Neilson, C M Woodside, D C Petriu, and S Majumdar. Software Bottlenecking in Client-Server Systems and Rendezvous Networks. IEEE Transactions on Software Engineering, 21(9):776–782, September 1995.

[22] Douglas Schmidt, Michael Stal, Hans Rohnert, and Frank Buschmann. Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects. John Wiley & Sons, Inc., 2000.

[23] Paul Brebner and Jeffrey Gosper. How scalable is J2EE technology? ACM SIGSOFT Software Engineering Notes, 28(3):4–10, May 2003.

[24] Eric Brewer. Lessons from Giant Scale Services. IEEE Internet Computing, 5(4):46–55, July/August 2001.

[25] Chi Zhang and Zheng Zhang. Trading Replication Consistency for Performance and Availability: an Adaptive Approach. In Proceedings of the 23rd International Conference on Distributed Computing Systems. Providence, Rhode Island, USA, May 2003.

[26] D. Janaki Ram, M. Uma Mahesh, N.S.K. Chandra Sekhar, and Chitra Babu. Causal Consistency In Mobile Environment. ACM Operating Systems Review, 35(1):34–40, January 2001.

[27] Steven D Gribble, Eric A Brewer, Joseph M Hellerstein, and David Culler. Scalable, Distributed Data Structures for Internet Service Construction. In 4th Symposium on Operating System Design & Implementation. San Diego, California, USA, October 2000.

[28] Maurice Herlihy, Victor Luchangco, and Mark Moir. Obstruction-Free Synchronization: Double-Ended Queues as an Example. In 23rd International Conference on Distributed Computing Systems (ICDCS 2003). Providence, Rhode Island, USA, May 2003.

[29] Mustaque Ahamad, Ranjit John, Prince Kohli, and Gil Neiger. Causal memory meets the consistency and performance needs of distributed applications! In EW 6: Proceedings of the 6th workshop on ACM SIGOPS European workshop, pages 45–50, New York, NY, USA, 1994. ACM Press.

[30] Mustaque Ahamad, Phillip W Hutto, Gil Neiger, James E Burns, and Prince Kohli. Causal memory: Definitions, implementation and programming. Technical Report GIT-CC-93/55, Georgia Institute of Technology, 1994.

[31] Yukihiko Sohda, Hidemoto Nakada, Hirotaka Ogawa, and Satoshi Matsuoka. Implementation of Portable Software DSM in Java. In Joint ACM Java Grande - ISCOPE 2001 Conference, pages 163–172. Palo Alto, CA, USA, June 2001.

[32] Saito Shoichi, Kunieda Yoshitoshi, and Ojubo Eiji. An Implementation of Distributed Shared Memory in Wide Area Network and Its Performance Evaluation. IPSJ Journal, 6(40), 2001.

[33] Object Management Group. Fault Tolerant CORBA (Final Adopted Specification), December 2001. formal/01-12-29.

[34] Pascal Felber and Priya Narasimhan. Experiences, Strategies, and Challenges in Building Fault-Tolerant CORBA Systems. IEEE Transaction on Computers, 53(5):497–511, May 2004.

[35] Maarten van Steen, Philip Homburg, and Andrew S. Tanenbaum. Globe: A Wide-Area Distributed System. IEEE Concurrency, 7(1):70–78, January-March 1999.

[36] M Van Steen, F J Hauck, P Homburg, and A S Tanenbaum. Locating Objects in Wide-Area Systems. IEEE Communications Magazine, 36(1):104–109, January 1998.

[37] Ion Stoica, Robert Morris, David Karger, M Frans Kaashoek, and Hari Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. IEEE/ACM Transactions on Networking, 11(1):17–32, February 2003.

[38] M Venkateswara Reddy, M A Maluk Mohamed, Tarun Gopinath, A Vijay Srinivas, and D. Janakiram. Vishwa: A Paradigm for Anonymous Remote Computation and Communication for Peer-to-Peer Grid Computing. Technical Report IITM-CSE-DOS-2005-12, Distributed & Object Systems Lab, Department of Computer Science & Engineering, Indian Institute of Technology Madras, 2005.

[39] Tobin J Lehman and James H Kaufman. Optimal Grid: Middleware for Automatic Deployment of Distributed FEM Problems on an Internet-Based Computing Grid. In Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER'03), pages 164–171, Kowloon, Hong Kong, December 2003.

APPENDIX: PROGRAMMING WITH VIRAT SHARED OBJECT SPACE

We first illustrate what the client program looks like. This program creates an object and makes it sharable across nodes. The code for accessing this shared object from a different node is shown next. The application is a simple chat application, chosen for illustrative purposes; one can develop complex applications over Virat.
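
A minimal sketch of such a client follows. The in-memory DSMRuntime stand-in and the ChatMessage type are assumptions of ours, included purely so the sketch runs; in Virat the Create, Read and Write calls go through the node-local DSM runtime object described in Section 2.

    import java.io.Serializable;
    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    // Stand-in for the node-local DSM runtime object, so the sketch is
    // runnable; in Virat these calls reach the lookup and OMR services.
    class DSMRuntime {
        private static final Map<String, Object> space = new ConcurrentHashMap<>();
        String Create(Object obj, String className) {
            String oid = UUID.randomUUID().toString();  // in Virat, the OMR assigns the oid
            space.put(oid, obj);
            return oid;
        }
        Object Read(String oid)            { return space.get(oid); }
        void Write(String oid, Object obj) { space.put(oid, obj); }
    }

    class ChatMessage implements Serializable {
        final String from, text;
        ChatMessage(String from, String text) { this.from = from; this.text = text; }
    }

    // Creates a chat object and makes it sharable; the oid is printed so
    // that clients on other nodes can use it.
    public class ChatCreator {
        public static void main(String[] args) {
            DSMRuntime dsm = new DSMRuntime();
            String oid = dsm.Create(new ChatMessage("alice", "hello"),
                                    ChatMessage.class.getName());
            System.out.println("shared chat object created, oid = " + oid);
            dsm.Write(oid, new ChatMessage("alice", "anyone there?"));
        }
    }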



The above program creates an object and makes it sharable. The object (identified by its oid) can be accessed from a remote node with the code shown below. The assumption here is that the object identifier is known at the remote node. By using other operations such as Remove in addition to the Read and Write shown here, complex distributed applications can be developed over the Internet, as the shared object space has been shown to be scalable. The programmer only has to write this client code and run it on the available machines. He or she must also run an OMR object per cluster and a DSM runtime object on each node.
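
A matching sketch for the remote node follows, reusing the DSMRuntime and ChatMessage stand-ins from the previous listing; only the Read and Write usage reflects the paper's interface.

    // Accesses the shared chat object from a different node, given its
    // oid (passed here as a command-line argument).
    public class ChatParticipant {
        public static void main(String[] args) {
            DSMRuntime dsm = new DSMRuntime();
            String oid = args[0];                            // oid learned out of band

            ChatMessage last = (ChatMessage) dsm.Read(oid);  // read the current message
            if (last != null) {
                System.out.println(last.from + ": " + last.text);
            }
            dsm.Write(oid, new ChatMessage("bob", "hi alice"));  // post a reply
        }
    }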

About the authors



  A Vijay Srinivas obtained the MS (By Research) degree from the Distributed & Object Systems Lab, Indian Institute of Technology, Madras in 2001 and a Bachelors in Engineering from the University of Madras in 1998. He is currently finishing his PhD in the same lab under Prof D Janakiram. His research interests span distributed systems, object technology and software engineering. His thesis focuses on building a data management platform for Peer-to-Peer grids. He is a student member of the IEEE and a member of the ACM. He can be reached at avs@cs.iitm.ernet.in. See also www.cs.iitm.ernet.in/~avs



D Janakiram is currently a professor in the Department of Computer Science and Engineering, Indian Institute of Technology (IIT), Madras, India. He obtained his Ph.D degree from IIT Delhi. He heads and coordinates the research activities of the Distributed and Object Systems Lab at IIT Madras. He has published over 30 international journal papers and 60 international conference papers and edited 5 books. His latest book, on grid computing, was brought out by Tata McGraw-Hill Publishers in 2005. He served as program chair for the 8th International Conference on Management of Data (COMAD). He is the founder of the Forum for Promotion of Object Technology, which conducts the National Conference on Object Oriented Technology (NCOOT) and the Software Design and Architecture (SoDA) workshop annually. He is the principal investigator for a number of projects, including a grid computing project from the Department of Science and Technology, a Linux redesign project from the Department of Information Technology, Middleware Design for Wireless Sensor Networks from Honeywell Research Labs, and A Mobile Data Grid Framework for Telemedicine from Intel Corporation, USA.

He has taught courses on distributed systems, software engineering, object-oriented software development, operating systems, and programming languages at graduate and undergraduate levels at IIT, Madras. He is a consulting engineer in the area of software architecture and design for various organizations. His research interests include distributed and grid computing, object technology, software engineering, distributed mobile systems and wireless sensor networks, and distributed and object databases. He is a member of the IEEE, the IEEE Computer Society, the ACM, and a life member of the Computer Society of India. He can be reached at djram@iitm.ac.in. See also www.cs.iitm.ernet.in/~djram


Cite this document as follows: A Vijay Srinivas and D Janakiram: Scaling a Shared Object Space to the Internet: Case Study of Virat, in Journal of Object Technology, vol. 5, no. 7, September–October 2006, pp. 75-95 http://www.jot.fm/issues/issue_2006_09/article4

