Previous article

Next article


Matching Objects Without Language Extension

Joost Visser, Departamento de Informática, Universidade do Minho, Portugal

space REFEREED
ARTICLE


PDF Icon
PDF Version

Pattern matching is a powerful programming concept which has proven its merits in declarative programming. The absence of pattern-matching in object-oriented programming languages is felt especially when tackling source code processing problems. But existing proposals for pattern matching in such languages rely on language extension, which makes their adoption overly intrusive.

We propose an approach to support pattern matching in mainstream object-oriented languages without language extension. In this approach, a pattern is a first-class entity, which can be created, be passed as argument, and receive method invocations, just like any other object.

We demonstrate how our approach can be used in conjunction with existing parser generators to perform pattern matching on various kinds of abstract syntax representation. We elaborate our approach to include concrete syntax patterns, and mixing of patterns and visitors for the construction of sophisticated syntax tree traversals.

Keywords: Pattern matching, object-oriented programming, syntax, parsing, tree traversal, term rewriting, strategic programming.


1 INTRODUCTION

Pattern matching is a programming concept that plays a central role in declarative paradigms, such as term rewriting and functional programming. Given a term and a pattern, i.e. a term in which variables may occur, the pattern matching problem consists in finding a substitution for the variables in the pattern such that the pattern and the term become equal. For instance, given the following pattern and term:

where x and y are variables, pattern matching yields the following substitutions:

Note that variable x occurs twice. When variables are allowed to occur more than once in a pattern, the pattern-matching problem is called non-linear.

Mainstream object-oriented programming languages, such as Java, C#, and C++, do not count pattern matching among their features. This makes these general purpose languages less suitable for the specific purpose of building source code processors. Switching to a declarative language is often not desirable for compelling practical reasons, such as the availability of programmers, tools, and libraries. Several proposals for extending object-oriented languages with pattern matching features exist, but adoption of such language extensions is likewise intrusive, since they interfere with the use of existing tools, development environments, libraries, modeling methods, refactoring aids, etc.

Figure 1: The basic idea of matching object graphs. The pattern is an object graph that contains variable objects, indicated by empty dashed circles. After matching, these variables are bound to corresponding objects in the term, indicated by the dotted arrows.

As we will see below, pattern-matching can be realized within the boundaries of existing mainstream object-oriented programming languages. Rather than providing pattern matching as a language construct, we will model patterns as objects that implement a particular behavioral interface. Thus, in this approach, pattern matching is not a syntactic construct that the compiler transforms into a matching engine. Rather, patterns are first-class entities that are created, can be passed as arguments, and respond to method invocations, just like any other object.

In Section 2 we provide an abstract specification of object matching in the form of a pattern interface and associated implementation obligations. Subsequently, in Section 3, we provide several concrete implementations of that interface for various kinds of compound objects. In Section 4, we elaborate the implementations for objects that represent abstract syntax trees by showing how patterns can be created via concrete syntax expressions. In Section 5 we show how object matching can be combined with object graph traversal based on visitor combinators. Section 6 briefly relates lessons learned from commercial application of our approach. Section 7 discusses related work, and Section 8 concludes.

2 OBJECT MATCHING, ABSTRACTLY

The basic idea of matching object graphs is illustrated in Figure 1. A pattern is an object graph that contains designated variable objects (indicated with dashed circles). These variables have no name and are not syntactically distinguished; they are identified by reference, as are all objects. The pattern is matched against an object graph that does not contain variable objects. Matching involves a comparison between the structure of both object graphs as well as between the types and states of the objects that appear in them. During matching, bindings are created from variables to corresponding objects (dotted arrows in the figure). To prevent pollution of class hierarchies with artificial variable objects, and to enable reuse of the same pattern for several matches, the variables and their bindings can be modeled with external collections (lists and maps, respectively).

Figure 2: The Pattern interface and an abstract class that implements its addVariable method. The reusable matchVariable method creates variable bindings. In case of non-linear patterns, the value of a previously bound variable is retrieved from the bindings for comparison.

The pattern interface

Given this basic idea, we can give an abstract specification of object matching in terms of a Java class interface, called Pattern in Figure 2, and its implementation obligations. The interface consist of three methods, one of which can be implemented once and for all, as shown by the abstract class AbstractPattern. Concrete implementations of the Pattern interface should have a constructor that initializes a field referencing the root of the pattern object graph. After pattern creation, the addVariable method is used for pattern initialization. It can be implemented by adding the variables to a list of variables in the pattern’s state, as demonstrated in AbstractPattern. Typically, several invocations of the addVariable method will follow the construction of the pattern.

The match method is intended to supply the actual pattern matching behavior. It takes the root of the term object graph, as well as a map into which to store the resulting variable bindings. The boolean result indicates whether the match was successful or not. This method should be implemented by simultaneously traversing the two object graphs. At each step, the pattern node should be checked for being a variable, as exemplified by the method matchVariable. If the pattern node is a variable that has been bound before, its value should be compared for equivalence with the corresponding node in the term graph. If the pattern node is a free variable, a new binding to the corresponding term node is created. Traversal should not descend into variables. In case of pattern nodes that are not variables, the corresponding nodes in the term graph should be checked for type equivalence and state equivalence; when equivalent, traversal proceeds deeper; when different, the traversal can be broken off and false should be returned.

The substitute method takes a map holding variable bindings as argument, and substitutes variables in the pattern by their corresponding values. As a result, an object graph of the type of the root of the pattern is returned. If the map does not contain mappings for all variables in the pattern, the substitution is unsuccessful, and returns null.

Note that the method implementations in AbstractPattern implicitly rely on the behaviour of the equals methods of objects that act as variables and their values. For example, the contains and containsKey tests will result in equality tests on the elements of the variable list and the keys of the map of bindings. In addition, there is an explicit call to the equals method involving the value of a previously bound variable. In cases where it is not appropriate to rely on the equals methods of the objects in question, one may provide alternative implementations of the addVariable and matchVariable methods, or resort to wrapper objects with alternative implementations of equals.

Using the interface

Returning to the example of Figure 1, a typical usage scenario starts with creating a pattern, as follows:

Here we assume that FPattern is a concrete implementation of the pattern interface for objects of type F (specific implementations follow below). Subsequently, the created pattern can be used to perform a match:

The match being successful, the bindings will hold entries for x and y, and the value of success will be true. The bindings can be queried for the values of these variables, or, optionally, to perform a substitution on a second pattern:

Figure 3: Pattern implementation for Lists. The worker method matchList performs simultaneous iteration over the pattern list and the term list. The substitute method creates a new list while iterating over the pattern list.

In this particular case, the resulting object assigned to g would be equal to new G(new B(),new H(new A())).

3 IMPLEMENTATIONS

The abstract specification of matching behaviour given above can be implemented differently for different types of objects, or even differently for the same type. We will discuss implementations for specific collection types and abstract syntax tree types. Finally, we will show a reflection-based implementation that handles a variety of different types by dispatching to type-specific implementations.

Containers

Starting simple, we will explain how a pattern implementation can be provided for any ordered container, such as an array or a list. A code example, provided in Figure 3, shows the implementation for the List class. The state of ListPattern holds a root object of type List, which is initialized with the constructor. The match method first checks the type of the term object, and then calls a worker method matchList. If the two lists are of the same size, this worker performs simultaneous iteration over both. During iteration, the matchVariable method is called to deal with elements of the first list that are registered as variables. Other elements are compared with the equals method1.

The substitute method creates a new list while iterating over the pattern list. At each step, either an element of the pattern list is added to the new list, or, in case of a variable element, the value element to which it is bound.

For other ordered containers, such as arrays, implementations are similar. Matching on containers without order is also imaginable, but would be more involved and computationally more demanding due to a larger search space of possible bindings.

Trees

Tree representations come in different flavors. For instance, the Antlr parser generator [14] comes with an abstract syntax tree interface, called AST, where tree nodes hold references to their first child and to their first sibling. Figure 4 shows this interface, together with a corresponding implementation of Pattern. To avoid repetition, we show only fragments of the worker method matchAST; the remainder of the code is similar to the code for ListPattern, modulo the type of the pattern root2. As can be gleaned from the code, the matching algorithm performs a simultaneous pre-order traversal of two ASTs. At each step, pattern nodes that are free variables give rise to new bindings. Pattern nodes that are previously bound variables, lead to equality checks. Pattern nodes that are constants lead to equality checks on the states of the nodes, and to further traversal into the respective first children and their siblings. The substituteAST method (not shown) likewise performs a pre-order traversal, where each encountered variable node is replaced by its value, and all other nodes are cloned.

For other tree representations where all nodes share a common node interface, similar implementations of the Pattern interface can be given. For instance, the JJTraveler visitor combinator framework [19, 4] offers a common tree interface, called Visitable, where subtrees are accessed by index. The ATerm interface of the Annotated Term Library [1] provides similar access to a maximally shared tree representation. The Node interface of the Document Object Model (DOM) [21] provides access to child nodes both by index, and by child and sibling references.

Figure 4: Pattern implementation for ASTs. The pattern AST and term AST are traversed simultaneously.

In cases where the tree (or graph) is represented with objects with disjunct node interfaces, one must provide equally many implementations of the pattern interface which call each other on pairs of sub-nodes with matching types. In the event that some references between nodes are not typed (for instance when they are held in an untyped collection like List), one can resort to reflection to select an appropriate pattern implementation when available. In the next section, we show how such reflection-based matching can be harnessed in a generic implementation of the pattern interface.

Generic Object Matching

Each of the implementations of the pattern interface discussed so far provides matching functionality for objects of a single type. In each case, the matching process starts with checking (using instanceof) whether the term object type is equal to the designated type of the pattern graph. The GenericPattern implementation in Figure 5 takes a different approach: the pattern graph can be of any type, and a choice is made between different matching algorithms, based on the types of both pattern and term. As can be seen in the figure, the choice is modeled with the logical or operator (cf. ||) where each disjunct represents a different matching attempt.

Figure 5: Generic pattern matching. Various matching policies are tried in turn. Run-time type inspection and type casts guard the dispatch to appropriate methods.

The first attempt takes care of the case where the pattern graph is a variable, using again the matchVariable method. Subsequently, a series of attempts is made using the match policies for various types of objects, exactly as discussed in the previous implementations except for an additional check on the type of the pattern type. For instance, the matchList method tests both pattern and term for being of type List before calling a list matching worker method. To ensure that this worker method descends into container elements, the equivalence check on elements (cf. Figure 3) is replaced by a recursive call to the generic matching algorithm. Finally, as default policy when all other matching attempts fail, an equality test is performed.

4 CONCRETE SYNTAX PATTERNS

The first-class status of patterns in our approach opens opportunities. For instance, pattern creation can be done not only by direct calls to constructor methods, but also by computation from concrete syntax expressions. We explain how this is done with two different parser generators: JJForester and Antlr.

Parsing patterns with JJForester

The input to JJForester are grammars in the purely declarative grammar formalism Sdf [8]. From such a grammar, a Java class hierarchy is generated to represent parse trees: non-terminals become abstract classes, and production rules become their concrete subclasses. The benefit of such a many-typed representation of parse trees is that Java’s type system can be leveraged to guard well-formedness of tree construction and access with respect to the abstract syntax.

Each generated abstract class has a static parse method that allows files or strings to be parsed by invocation of an external parser using the corresponding non-terminal as start symbol. The parsing technology employed is Generalized LR parsing [15, 2]. All generated classes implement the aforementioned Visitable interface of the JJTraveler framework, which allows subtree access by index. We supplied an Sdf grammar of the Java language to JJForester and obtained a class hierarchy of 104 abstract classes and 276 concrete classes.

Now, patterns can be created from concrete syntax expressions, for instance as follows:

Thus, invocations of three different parse methods result in the construction of the pattern itself and its two variables. This style of pattern creation from concrete syntax expressions is not only more convenient than constructor invocations, it also provides some isolation against grammar refactorings by abstracting over the underlying abstract syntax tree shapes.

Of course, these parser invocations are performed at run-time, rather than during compilation, leading to performance loss and potential run-time parse errors. These risks can be attenuated by taking care that each pattern is constructed only once, and by proper (unit) testing of all pattern construction code.

Note that the availability of concrete pattern matching reduces the utility of the generated class hierarchies. Without concrete patterns, the generated classes are used to guard well-formedness. With concrete syntax patterns, the parser performs this task, albeit at run-time only. As a consequence, one may opt to dispense with JJForester’s conversion of generic ASTs to strongly typed ASTs, processing the former directly. We have followed this approach in a case study involving Transact-SQL, on which we comment in Section 6.

Parsing patterns with Antlr

When using Antlr, the situation is slightly more complex. This parser generator consumes Yacc-like grammars, containing semantic actions. Depending on the decision of the programmer, these semantic actions may build abstract syntax trees of the AST type. The structure of the syntax trees does not necessarily strictly follow the structure of the grammar. The tool generates a parser class that provides one method per non-terminal. These parse methods take no arguments; the string or file to be parsed is passed to the parser via its constructor method. We have supplied a C# grammar to Antlr leading to a parser class with 219 parse methods and AST building capability.

To abstract over details of parser creation, we have defined a class CSharpParser-Factory with a method parse that takes as argument a string to be parsed and the name of a non-terminal to use as start symbol. This class allows us to conveniently create patterns from concrete syntax expressions. For instance:

This pattern can subsequently be used to match abstract syntax trees of conditional statements with negative conditions.

Note that this approach works only with Antlr grammars that are in some sense well-behaved with respect to tree construction. If the grammar contains nonterminals that do not return trees, or that return collections of subtrees that are joined only at higher levels of the grammar, then such non-terminals can not appear as pattern roots or in variable positions.

5 MATCHING AND TRAVERSAL

To take further advantage of the first-class status of our patterns, this section investigates some possibilities of combining generic pattern matching with generic visitor combinators. Generic visitor combinators are small, reusable classes that capture basic functionality, and can be combined in different constellations to construct more complex behavior [19].

We will generalize our pattern matching approach by combining patterns with visitors in two complementary ways:

patterns in visitors What if we want to match a pattern, not at the root of an object graph, but at a deeper node? If we capture the pattern inside a visitor combinator, it can be combined with a visitor combinator that performs traversal to realize pattern matching at arbitrary depths.

visitors in patterns What if we want to apply a visitor to a subgraph that matches with a variable in our pattern? We can eliminate the intermediate steps of binding to the variable and retrieving its value to be visited, simply by associating visitors directly to the pattern’s variables. Since visitors encapsulate behavior, this makes our patterns active.

Before explaining these two generalizations in detail, we provide a brief exposition of the concept of visitor combinators. For a more elaborate account, the reader should look elsewhere [19, 8, 4, 20].

Figure 6: Overview of JJTraveler, which offers a simple framework and a library of reusable visitor combinators.

Generic visitor combinators

Visitor combinator programming was introduced in [19], and can be understood as an improvement over the classic visitor design pattern [5] where three well-known short-comings are removed. Classic visitors resist combination, allow little traversal control, and are tied to a given class hierarchy. In contrast, visitor combinators capture pieces of functionality that can easily be combined, they allow the construction of sophisticated traversal strategies, and they are reusable across class hierarchies.

Visitor combinator programming in Java is supported by JJTraveler, which offers a simple framework and a library of reusable visitor combinators, as depicted schematically in Figure 6.

The JJTraveler framework offers two class-hierarchy independent interfaces, Visitor and Visitable. The latter, as mentioned before, provides a minimal interface for nodes that can be visited. The visitor interface provides a single visit method that takes any visitable node as argument. Each visit can succeed or fail, which can be used to control traversal behavior. Failure is indicated by a VisitFailure exception3.

The library of JJTraveler contains a number of predefined visitor combinators. These rely only on the generic visitor and visitable interfaces, not on any specific underlying class hierarchy. An excerpt from the library is shown in Table 1. There are basic combinators for sequencing, biased choice, and one-step traversal. From these, defined combinators can be composed. An example of a recursively defined visitor is the traversal strategy combinator TopDown(v), specified as follows:

Thus, TopDown first applies v to the current node, and then recursively applies the top down strategy to each of the children of the current node, yielding a preorder traversal of a tree. Similar specifications are provided in Figure 1 for other defined combinators. In fact, visitor combinators can be used to program all kinds of sophisticated generic traversal strategies. Details about the actual encodings of such defined combinators can be found elsewhere [19, 4].

Table 1: JJTraveler’s library (excerpt). Basic combinators are shown in the upper part. Defined combinators appear in the lower part together with their specifications in terms of the basic ones.

To use JJTraveler on the class hierarchy of an application, one needs to instantiate the framework appropriately. This is a simple job for tree representations with a single node interface, such as those of Antlr, DOM, or the ATerm Library (cf. Section 3). For more elaborate hierarchies, generative support may be used. For example, JJForester generates classes that implement JJTraveler’s visitor and visitable interfaces. Whether manually coded or generated, an important element in instantiations of the JJTraveler framework is a classical hierarchy-specific visitor which additionally implements the generic visitor interface by dispatching when possible to classical hierarchy-specific visit methods (cf. ClassicVisitor in Figure 6). By virtue of this additional implements relation, hierarchy-specific visitors can be passed as arguments to generic visitor combinators, and their hierarchy-specific behaviour will be triggered whenever appropriate.

Patterns inside visitors

The encapsulation of a pattern inside a visitor is shown in Figure 7. The class Match holds a reference to a pattern of type VisitablePattern that is initialized with its constructor. It implements the Visitor interface of JJTraveler. When visiting a visitable term, it tries to match it with the pattern. The success or failure of the visitor is determined by the success or failure of the match.

 

Figure 7: Patterns in visitors. The visitor combinator Match implements the visit method by matching the pattern to which it holds a reference against the visitable that it visits. Each successful visit leads to a new entry in a list of maps.

An example of using the Match combinator is to find all targets of null assignments inside a Java file:

After the traversal, the getMaps method applied to m will return a list of maps that each represents a singleton set of bindings. Each singleton contains a binding of an Identifier to the pattern’s variable x.

Patterns can also be used inside visitors to implement rewriting. Consider the following rewrite rule:

This law of programming [6] states that an if statement with a negative condition can be replaced by an if statement with a positive condition with swapped branches.

This rewrite rule can be implemented with the visitor combinator defined in Figure 8. The Rewrite visitor holds two patterns that represent the left-hand and right-hand sides of a rewrite rule. It implements the visit method by attempting to match the left-hand side pattern against the term being visited. If this match succeeds, the resulting bindings are substituted into the right-hand side pattern to obtain the rewrite result. Match failure makes the visitor fail.

Figure 8: Rewriting. The visitor Rewrite combines two patterns into a rewrite rule. When the first pattern matches on a visitable, the resulting bindings are used to create a new visitable from the second pattern.

Negative condition elimination can now be implemented as follows:


In this case, we selected the Innermost combinator, which captures the leftmost innermost rewriting strategy. This combinator is part of the JJTraveler library.

Visitors inside patterns

Active patterns, or patterns with visitors inside, can be implemented by refining a concrete pattern class, as shown in Figure 9. The class ActiveATermPattern extends the pattern interface with a putVisitor method for associating visitors with pattern graph nodes. These associations are stored in a map. The method visit applies a visitor associated to a pattern node, if such an association exists, to a corresponding term node. Success of the visit determines success of the match. ActiveATermPattern overrides the match method to attempt such a visitor-based match before the regular match attempt defined in ATermPattern (cf. super).

An example of using active patterns is to find if statements that contain null tests in their condition. First we create a visitor that searches for such tests, using the Match visitor:

Figure 9: Visitors in patterns. The specialization ActiveATermPattern of ATermPattern extends the pattern interface with putVisitor for associating visitors with pattern graph nodes. Also it extends the matchATerm method with an additional match attempt, which calls visit.

Then we embed this visitor in an active pattern, this one for Visitables:

Finally, we embed the active pattern in another Match visitor, and apply it to a file via a traversal combinator:

After traversal, the bindings stored in the state of match hold expressions tested for nullity in conditions of if statements. As we will explain in the related work section, this style of mixing traversal strategies with first-class matching patterns was inspired by the capabilities of the strategic term rewriting language Stratego.

6 APPLICATION AND ITS LESSONS

Our techniques have been applied in the construction of source code analysis components for the Software Analysis Toolkit (SAT) of the Software Improvement Group [16]. The SAT is used for software risk assessment [3] and software portfolio monitoring [9]. The SAT has an open architecture including frameworks for source code analysis and for storing, processing, and visualizing analysis results. The main implementation language is Java. Both Antlr and JJForester are used in its source code analysis components.

Our pattern-matching techniques have been applied for processing Java, PL/SQL, Transact-SQL, and C#. JJForester was used for the first three languages and Antlr for the last. In case of Java, generated AST classes were employed while generic ASTs were used for the SQL dialects and for C#. For each language, patterns have been developed for the construction of several metrics collectors and dependency extractors. In all cases, concrete patterns were employed in combination with implementations of the Pattern interface for common node interfaces, such as antlr.AST, aterm.ATerm, and jjtraveler.Visitable.

Based on lessons learned from using our techniques in a commercial software construction context, some preliminary practical guidelines can be formulated:

  • To prevent repeated parsing of concrete patterns and their variables, caching techniques such as the singleton design pattern can be employed.
  • At least one unit test should be developed for each concrete syntax pattern, to check the syntactic well-formedness of the pattern.
  • Concrete syntax patterns can be decoupled from particular grammars to make them reusable across dialects or families of languages. For examples, in spite of many deviations, the PL/SQL and Transact-SQL dialects are sufficiently similar to allow many patterns to be developed once and used for both.

The application experiences have also led to the development of additional library functionality, for instance for combining multiple patterns, for more convenient variable insertion, and for recurring test tactics.

7 RELATED WORK

Java language extensions The Pizza language [13] is an extension of the Java language with three concepts taken from functional programming: higher-order functions, parametric polymorphism (type parameters), and algebraic data types with pattern-matching. JMatch [10] is also an extension of the Java language with pattern-matching. In this case, matching is not based on algebraic datatypes, but through a generalization of constructor methods, called pattern constructors. Tom [11] also provides support for algebraic data types with pattern-matching. Tom is not a full fledged language extension, but rather a pre-compiler.

In comparison to our approach, which stays completely within Java, language extensions provide several advantages. They allow optimizing compilation of matching expressions. They allow static checking beyond the capabilities of the Java type system. They may provide more concise syntax. On the flip side, the introduction of a new language is more intrusive on the existing programming practices of potential users. Also, modeling pattern matching within Java provides more flexibility, allowing, as we have demonstrated, to introduce variations or to combine the approach with other techniques.

Scala The novel Scala programming language [12] provides a fusion of functional and object-oriented programming and counts pattern matching among its features. This is realized by allowing the user to tag classes with a case modifier, with two effects regarding its primary4 constructor method. Firstly, the constructor can be called without using the new keyword. Secondly, the constructor can be used in the pattern positions of case branches in switch-like pattern-match expressions.

Though Scala programs resemble Java programs and are interoperable with them, Scala is a distinct language with distinct semantics, documentation, libraries, tool support, and user community. Our pattern-matching approach is viable within the confines of any main-stream object-oriented language such as Java, C#, or C++.

Nevertheless, a comparison between Scala and our approach is interesting. Patterns in Scala are syntactically nicer than ours, due to the possibility of dropping the new keyword. Scala’s patterns can be nested, just as ours, but non-linear matches are not supported. Also, there is no notion of active patterns. Scala’s patterns are not first class, but the blocks of case branches in which they occur are anonymous functions, which are first class.

Strategic pattern matching The Stratego language [18, 17] is a term rewriting language that does not offer a single fixed rewriting strategy, but allows the programmer to compose different rewriting strategies from basic strategy combinators. The notion of visitor combinator can be seen as an object-oriented counterpart of Stratego’s strategies. Stratego’s pattern matching features include congruences, which are term patterns that appear in strategy positions, and that can contain strategies in subterm positions. The semantics of a congruence is to match against an incoming term, and apply the strategies in subterm positions to the subterms of the incoming term. Thus, congruences allow mixing of traversal and pattern matching, in a similar vein as our Match and ActivePattern combinators.

8 CONCLUDING REMARKS

We have demonstrated that sophisticated pattern matching support can be obtained within mainstream object-oriented languages without resorting to a language extension. Fundamental to the approach is the observation that object graphs themselves can serve as patterns and their variables. We have provided an abstract specification of object graph matching in terms of a behavioral interface and its implementation obligations. We have shown that these obligations can be met for specific types of objects as well as in a generic, but customizable manner. Taking advantage of the first-class status of patterns, we have elaborated support for concrete syntax patterns, match and rewrite visitors, and active patterns.

Since we do not rely on extending an existing language, the adoption of our approach does not interfere with the use of existing tools, development environments, libraries, modeling methods, refactoring aids, etc. Still, our pattern matching approach is not intended to disavow the merits of language extensions. In fact, a number of caveats of our approach are worth pointing out. For example, since pattern matches are not compiled, there is no opportunity for static optimizations. Also, the concrete syntax expressions used to create patterns are only syntactically checked at run-time and require testing to dispel errors.

A final trade-off between using a language extension or the programmatic matching technique presented in this paper can only be made in the context of particular development projects and processes.

Future work

The work presented in this paper can be generalized in several ways. We list some.

Unification When the pattern is not matched against a closed term, but against another pattern, a more general problem, called unification, must be solved. Unification plays a central role in logic programming, type inference in functional programming, and in some artificial intelligence approaches. A comprehensive survey of unification can be found in [7]. We would like to investigate a generalization of object matching to object unification.

Further host languages We implemented our ideas in Java, but other host languages are equally viable. A C# implementation is high on the wish list.

Availability

The code presented in this paper is part of a library distribution, named MatchO, which is available from the author’s web pages.

Acknowledgments

Thanks to Rob van der Leek of the Software Improvement Group for valuable feedback regarding this paper and the MatchO library. The author is recipient of a research grant from the Fundação para a Ciência e a Tecnologia, under grant number SFRH/BPD/11609/2002.

Footnote

1 At the end of this section, this equivalence check becomes a recursive invocation of our more generic, reflection-based implementation.

2 For simplicity, we ignore that Antlr ASTs do not necessarily implement deep equality (traversing into children) in their equals method. The actual implementation uses a wrapper class to fix that.

3 A variation of this design uses boolean return values to indicate success and failure, at the expense of supporting node replacement via Visitable return values.

4 In Scala, a class definition can have parameters, which at once define instance variables and a so-called primary constructor with those parameters, that initializes these instance variables.


REFERENCES

[1] M.G.J. van den Brand, H.A. de Jong, P. Klint, and P.A. Olivier. Efficient annotated terms. Software, Practice and Experience, 30(3):259–291, 2000.

[2] M.G.J. van den Brand, J. Scheerder, J. Vinju, and E. Visser. Disambiguation filters for scannerless generalized LR parsers. In N. Horspool, editor, Proceedings of the 11th International Conference on Compiler Construction, volume 2304 of LNCS, pages 143–158. Springer, 2002.

[3] A. van Deursen and T. Kuipers. Source-based software risk assessment. In Proceedings 21 International Conference on Software Maintenance (ICSM’03), pages 385–388. IEEE Computer Society, 2003.

[4] A. van Deursen and J. Visser. Source model analysis using the JJTraveler visitor combinator framework. Software: Practice and Experience, 34(14):1345–1379, 2004.

[5] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.

[6] C.A.R. Hoare. Laws of programming. Communications of the ACM, 30(8):672–686, August 1987.

[7] K. Knight. Unification: a multidisciplinary survey. ACM Computing Surveys, 21(1):93–124, 1989.

[8] T. Kuipers and J. Visser. Object-oriented tree traversal with JJForester. Science of Computer Programming, 47(1):59–87, November 2002.

[9] T. Kuipers and J. Visser. A tool-based methodology for software portfolio monitoring. In Mario Piattini and Manuel Serrano, editors, Proceedings of the 1st International Workshop on Software Audits and Metrics, pages 118–128. INSTICC Press, 2004.

[10] J. Liu and A.C. Myers. JMatch: Iterable abstract pattern matching for Java. In Proceedings of the 5th International Symposium on Practical Aspects of Declarative Languages, January 2003.

[11] P.-E. Moreau, C. Ringeissen, and M. Vittek. A pattern matching compiler for multiple target languages. In G. Hedin, editor, Proceedings of the 12th International Conference on Compiler Construction, volume 2622 of LNCS, pages 61–76. Springer, 2003.

[12] M. Odersky et al. An overview of the Scala programming language. Technical Report IC/2004/64, EPFL, 2004.

[13] M. Odersky and P. Wadler. Pizza into Java: Translating theory into practice. In Proceedings of the 24th ACM Symposium on Principles of Programming Languages, pages 146–159. ACM Press, 1997.

[14] T. Parr. The Antlr web page. www.antlr.org.

[15] J. Rekers. Parser Generation for Interactive Environments. PhD thesis, University of Amsterdam, 1992.

[16] Software Improvement Group home page. www.sig.nl.

[17] E. Visser. Strategic pattern matching. In P. Narendran and M. Rusinowitch, editors, Proceedings of the 10th International Conference on Rewriting Techniques and Applications, volume 1631 of LNCS, pages 30–44. Springer, 1999.

[18] E. Visser, Z. Benaissa, and A. Tolmach. Building program optimizers with rewriting strategies. ACM SIGPLAN Notices, 34(1):13–26, 1999. Proceedings of the International Conference on Functional Programming, ICFP’98.

[19] J. Visser. Visitor combination and traversal control. ACM SIGPLAN Notices, 36(11):270–282, 2001. Proceedings of the 2001 ACM Conference on Object-Oriented Systems, Languages and Applications, OOPSLA 2001.

[20] J. Visser. Generic Traversal over Typed Source Code Representations. PhD thesis, University of Amsterdam, 2003.

[21] World Wide Web Consortium. Document object model. www.w3.org/DOM/.

About the author







Joost Visser Currently, Joost Visser is Post-Doctoral fellow at the Departamento de Informática of the Universidade do Minho in Braga, Portugal. Before, he was a researcher at the Centre for Mathematics and Computer Science (CWI) in Amsterdam, The Netherlands, and he worked as senior consultant at the Software Improvement Group (SIG) in Amsterdam, The Netherlands. He can be reached at joost.visser@di.uminho.pt. See also http://www.di.uminho.pt/˜joost.visser.


Cite this article as follows: Joost Visser: "Matching Objects Without Language Extension", in Journal of Object Technology, vol. 5, no. 8, November-December 2006, pages 81-100, http://www.jot.fm/issues/issue_2006_11/article2

 


Previous article

Next article