Branch and Bound Implementations for the Traveling
Salesperson Problem
Part 3: Multi-threaded solution with many inexpensive nodes
Richard Wiener, Editor-in-Chief,
JOT, Associate Professor, Department of Computer Science, University
of Colorado at Colorado Springs
|
 |
COLUMN

PDF Version |
The multi-threaded implementation
presented in this column sets the stage for the distributed processing
implementation to be presented in the next column.
In the previous column a best-first branch and bound algorithm was introduced
and implemented. This algorithm forms the basis for the current work
(please review the details of that algorithm before continuing with
this paper).
Class Node is unchanged from the single-threaded implementation
presented in the previous column. A new thread class ProcessNodes
is introduced and is presented in Listing 1.
Listing 1 – Thread class ProcessNodes

Thread instances are spawned by the revised class TSP presented
in Listing 2. When a ProcessNode thread is spawned, the constructor
takes as input a thread number (assigned by a TSP object),
a priority queue (a single node at level 2 in the tree structure –
there are n – 1 such nodes where n is the number of cities in
the TSP problem), the number of cities, the cost matrix, the total node
count to-date for the thread and an instance of class TSP (so
the thread can communicate with the tsp object).
Listing 2 – Class TSP






In method generateSolution, the root node and all its children
(at level 2) are generated before any threads are spawned. Then the
following segment of code is used to create the threads:

Each thread is handed a child node (partial tour consisting of node
1 as well as another city). The child nodes at level 2 are prioritized
based on their lower bound (the smallest lower bound having the highest
priority). The threads are held in an array of ProcessNode
(called threads). The thread[0] starts with a child
node of smallest lower bound. The thread[1] has the second
lowest lower bound. There is of course no guarantee that the solution
exists under any of the initial level 2 nodes handed off to the various
threads (6 are shown by default). When a thread completes the processing
of the node that it was handed (this will probably require the generation
of millions of nodes), it sends the tsp object a stop command.
If the pool of available tsp nodes is depleted (queue size
of size 0), the value of the field numberStopped is incremented
by one (see method stop in class TSP). Otherwise a
new node from among the original level 2 nodes is handed off to a fresh
thread that is spawned (recall that once a thread is terminated it cannot
be re-started) and removed from the tsp queue. The application
terminates when all the level 2 nodes in the tsp object have
been removed (and handed off to threads) and all the threads have terminated
(the value numberStopped is equal to the number of threads).
Empirical Results
We consider some results run on two computers: a single-processor Pentium
4 with a 1.7Ghz processor and a dual G4 processor Powermac running under
Mac OS 10.2.3. It is interesting to see how the Powermac with its dual
processors is able to take advantage of the multiple threads. JDK 1.4.1
is used on each of the machines (a beta release on the Powermac).
The results are summarized in the table below.
Comparison of Single-Threaded vs. Mult-Threaded
TSP Implementations on a Single vs. Dual Processor Machines
| Machine |
Number of Cities |
Execution Time
(Single Threaded Previous Implementation) |
Execution Time
(Using 6 threads with current implementation) |
| Pentium 4, 1.7 GHz |
20 |
21.801 seconds
|
19.358 seconds |
Powermac (Dual g4 processors) |
20 |
30.597 seconds |
16.486 seconds |
| Pentium 4, 1.7GHz |
24 |
299.25 seconds
|
343.944 seconds
|
| Powermac (Dual G4 processors) |
24 |
410.984 seconds |
276.729 seconds |
Several conclusions may be gleaned from the results in the above table.
- The Mac OS is able to take good advantage of multiple threads by
efficiently dispatching threads to each of its two processors. For
both TSP problems the Mac’s multi-threaded solution was significantly
faster than the single-threaded solution. In fact for both problems
the Powermac’s multi-threaded solution was the fastest among
the two computers even allowing for the fact that the Pentium 4 machine
has a clock speed that is 1.36 times faster than the Powermac’s
clock speed.
- The execution time for the single-threaded implementation on the
two computers is close to the ratio of their respective clock speeds.
This suggests that the Powermac is not able to take advantage of its
dual processors in the single-threaded implementation.
- The multi-threaded solution on the Pentium 4 machine was slightly
faster for the 20 city problem and significantly slower for the 24
city problem. This clearly suggests (as should be fairly obvious)
that the problem instance itself affects the ability of the multi-threaded
solution to take advantage of the input data. This is further reflected
in the relative improvement that the multi-threaded solution on the
Powermac displayed for the 20 and 24 city problems. The degree of
improvement was smaller for the 24 city problem than the 20 city problem
where the execution time was almost cut in half.
About the author

|
 |
Richard Wiener is Associate
Professor of Computer Science at the University of Colorado at Colorado
Springs. He is also the Editor-in-Chief of JOT and former Editor-in-Chief
of the Journal of Object Oriented Programming. In addition to University
work, Dr. Wiener has authored or co-authored 21 books and works
actively as a consultant and software contractor whenever the possibility
arises. |
Cite this column as follows: Richard Wiener: “Branch and Bound
Implementations for the Traveling Salesperson Problem – Part 3”,
in Journal of Object Technology, vol. 2, no. 4, July-August
2003, pp. 89-100. http://www.jot.fm/issues/issue_2003_07/column8
|