1 INTRODUCTION This is a multi-part column that will appear in the next several issues of JOT. This first installment outlines a well known branch and bound algorithm for solving the Traveling Salesperson Problem (TSP). A single-threaded Java implementation of this algorithm is presented and discussed along with some results on several moderate sized potentially intractable problems. The implementation provides an opportunity to discuss several important Java implementation issues. The second column presents another well-known branch and bound algorithm and its single-threaded Java implementation. This implementation also provides an opportunity to discuss some interesting implementation issues. In the third column, a multi-threaded implementation will be presented based on the second branch and bound algorithm introduced in the second column. Its performance will be compared to the single-threaded implementations presented earlier. This multi-threaded implementation sets the stage for the multi-process distributed implementation to be presented in the fourth column. This fourth column shall present a distributed implementation for solving TSP using remote method invocation (RMI) in Java. Results using five networked computers running in parallel and featuring three different operating systems will be presented and compared with the single and multi-threaded solutions. Some of the results presented in these columns might be of some value to educators teaching algorithm design or advanced Java or distributed computing. 2 THE TSP PROBLEM As researchers in the area of algorithm design know, the Traveling Salesperson Problem (TSP) is one of many combinatorial optimization problems in the set NP-complete. The only known solutions are of exponential complexity and are therefore intractable. Recall that the TSP assumes that the distances between n cities are specified in a cost matrix that specifies the non-negative cost between any pair of cities. A salesperson is given the task of starting at city 1, traveling to each of the other cities and then returning to city 1. Each city must be visited exactly once and no cities can be skipped. The goal is to find the sequence of cities that starts and ends with city 1 such that the overall cost of the tour is minimized. When one first encounters this classic problem, it seems relatively easy to solve. Perhaps that is because the problem itself is easy to understand. But consider the fact that there are (n – 1)! tours in an n-city problem since each tour must start and end with city 1. So only for toy-size problems, say n <= 14, would a brute force evaluation of all possible tours and subsequent computation of the minimum tour be feasible. For a problem with 4 cities, the 3! = 6 possible tours are: 1, 2, 3, 4, 1 Two different fairly well known branch and bound approaches to solving the TSP will be explored and implemented in Java. Several moderate sized problems of sizes 20, 24 and 33 will be used to test the implementations. 3 BRANCH AND BOUND ALGORITHM 1 FOR TSP WITH SYMMETRIC COST MATRIX A tree of nodes is generated where each node has specified constraints regarding edges connecting two cities in a tour that must be present and edges connecting two cities in a tour that cannot be present. Based on the constraints in a given node, a lower bound is formulated for the given node. This lower bound represents the smallest solution that would be possible if a sub-tree of nodes leading eventually to leaf nodes containing legal tours were generated below the given node. If this lower bound is higher than the best known solution to-date, the node may be pruned. This pruning has the effect of sparing the computational process the need to generate nodes below the given node. This could result in a significant saving if the pruned node were relatively near the top of the tree. Let us explore the mechanism for computing lower bounds for a node.
Consider the tour 1, 2, 3, 1. The sum of the edges going into and out of each node is: Node 1: 2 + 3 = 5 Therefore the tour cost = (5 + 8 + 7) / 2 = 10 For the tour 1, 3, 2, 1 the sum of the edges going into and out of each node is: Node 1: 3 + 4 = 7 Therefore the tour cost = (7 + 8 + 7) / 2 = 11 In general,
A solution tree is constructed by adding edges in lexicographic order. Each time we add a new node we employ decision tree logic regarding which nodes must be included or excluded from tours represented by the nodes. The rules that are used are: - If excluding an edge (x, y) would make it impossible for x or y to have as many as two adjacent edges in the tour, then (x, y) must be included.
- If including (x, y) would cause x or y to have more than two edges adjacent in the tour, or would complete a non-tour cycle with edges already included, then (x, y) must be excluded.
When the algorithm branches, and after imposing the decision logic to include or exclude edges, a lower bound is computed for the node. If the lower bound for a given node is as high or higher than the lowest cost tour found so far, we prune the node. If neither child can be pruned, the algorithm descends to the node with smaller lower bound using a depth-first search in the tree. After considering one child, we must again consider whether the sibling can be pruned since a new best solution may have been found.
Consider the six-city problem with cost matrix given as follows:
Although for this toy-sized problem it would be easy to enumerate the 120 possible tours and compute the tour with lowest cost, we shall illustrate the branch and bound process by constructing a solution tree. To compute the lower bound for a solution to the problem for the root node that contains no constraints (all tours are possible): The five edges incident on node 1 are: 8, 5, 3, 1, 2. The two smallest edges are 1 and 2. From the root node we generate two child nodes with the constraints 12 and *12. The constraint Let us walk through the computation of the lower bound 35 when the constraint is 12. The five edges incident on node 1 are: 8, 5, 3, 1, 2. Since edge 12 must be present, the sum of the two smallest edges in the presence of this constraint is 1 + 8 = 9. The five edges incident on node 2 are: 8, 4, 9, 2, 8. Since edge 12 must be present, the sum of the two smallest edges in the presence of this constraint is 2 + 8 = 10. The five edges incident on node 3 are: 5, 4, 9, 6, 7. The two smallest edges are 4 and 5. Twice the lower bound with the constraint 12 is therefore 9 + 10 + 4 + 5 + 1 + 1 + 1 + 1 + 1 + 2 = Using the same approach, it is easy to show that the lower bound on the node with constraint *12 is Since the right child node has a smaller lower bound than the left child node, the best first algorithm constructs nodes below the right child node. The left child of the node with constraint *12 has the constraints *12 and 13 and the right child the constraints *12 and *13 (the 13 and *13 are introduced in lexicographical order). The reader is left to verify that twice the lower bounds under these constraints are 28 and 26 respectively. The depth-first generation of the tree continues using the node with lower bound 26. Let us consider the right node with twice the lower bound of 26. The three constraints are 15, 16 and *56. These are constraints derived from the decision rules stated earlier. The 15 and 16 are present because of the presence of the *12, *13 and *14 (each tour must have two edges incident from node 1). The *56 is present since a non-tour cycle given by 1, 5, 6, 1 would be possible if the constraint *56 were not present. We shall perform one last lower bound computation on the node with constraints: *12, *13, 15, 16, *56. The five edges incident on node 1 are: 8, 5, 3, 1, 2. Edges 1 and 2 are smallest. For the problem stated above and using a descriptive symbolic notation, we show the complete sequence of nodes generated and pruned using the best-first, branch and bound algorithm outlined above. We define each node by its constraints, its computed lower bound (actually twice the lower bound) and its left and right children, each with their lower bounds. The actual sequence of actions may be followed by examining the sequence of bold-faced lines. It would be instructive to sketch the entire tree based on the sequence of events shown below.
branchAndBound(*12 *13 14 15 *16 23 *45 )
Prune node *12 13 *14 15 *16 *35Prune node 12
Cost of optimum tour: 15 Even for this toy-sized problem, the number of nodes actually generated is a relatively small fraction of the total number of nodes that would have been required if a bounding algorithm were not in play. For larger problems, there is of course no guarantee that the number of nodes that are pruned would make the solution computationally tractable. There is no way (at least no way known to this author) to predict the success of this algorithm other than simply deploying it on a given problem and deciding when “enough-is-enough.” 4 JAVA IMPLEMENTATION There are many challenges involved in implementing the algorithm described above. These include: - Determining how to generate and store the correct set of constraints for each node.
- Determining how to compute the lower bound for a node given its set of constraints.
- Determining when a node represents a complete tour.
- Determining a mechanism for generating nodes in the correct order and pruning nodes based on their computed lower bound.
The answers to the first three questions are encapsulated in a class Among the fields defined for class The methods The method It is recommended that the interested reader “walk-through” the various methods using a specific node taken from the example presented above. Verify that the methods
The fourth question is embedded in the next class TSP presented below. The methods
5 EMPIRICAL RESULTS The process of generating nodes is relatively expensive both in computational as well as memory resources. A tree containing millions, if not billions, of nodes may need to be generated in solving even a moderate sized TSP problem. As nodes are pruned (and hopefully there will be many of these), the facilities of automatic garbage collection is critically important. Storage for these nodes must be reclaimed by the garbage collector. Earlier attempts using languages without automatic garbage collection were not as successful because of heap overflow problems as well as other memory-related problems. A GUI application was constructed that enables the user to specify the size of a TSP problem and either enter the cost matrix by hand or load an external text file that contains one number per line (c A screen shot of the GUI using the 6 city problem discussed earlier is shown below. The entries in each row that comprise a solution are shown in red. The Console window provides a running account of the progress of the algorithm as well as a useful history that may be reviewed later. The most challenging problem solved to-date using this implementation is a 33 city problem that utilizes real data taken from the Rand McNally Road Atlas for 33 US cities. Running on a 1.7 GHz Pentium 4 machine under Windows 2000 with 256Mbytes of RAM, it took 5637 seconds (1.57 hours) to find the optimum solution of 10861 after generating 872,000 nodes of which 435,729 were pruned. It should be clear from the number of nodes generated on this relatively fast processor that the generation of nodes is computationally expensive. The average number of nodes generated per second was about 154 nodes per second. This is very slow. The reader must again be reminded that although a 33 city problem may sound like a small problem, there are 32! tours possible so such a problem qualifies as a computationally hard (perhaps intractable) combinatorial optimization problem. It is rather remarkable that any solution at all was found in reasonable time. It should be mentioned that the optimum solution of 10,861 was determined rather early in the computation process (reported in the system Console) even though it took almost two hours to finalize the solution. One might therefore be tempted to use this algorithm as a heuristic approximation in cases where a solution is not finalized in a reasonable period of time. This author has little experience in doing this for larger problem with known solutions. This suggests an area of potentially fruitful research.
Cite this column as follows: Richard Wiener: Branch and Bound Implementations for the Traveling Salesperson Problem - Part 1, in Journal of Object Technology, vol. 2, no. 2, March-April 2003, pp. 65-86. http://www.jot.fm/issues/issue_2003_03/column7 |