Part 3: Multi-threaded solution with many inexpensive nodes

Richard Wiener, Editor-in-Chief, JOT, Associate Professor, Department of Computer Science, University of Colorado at Colorado Springs

COLUMN

PDF Version

The multi-threaded implementation presented in this column sets the stage for the distributed processing implementation to be presented in the next column.
In the previous column a best-first branch and bound algorithm was introduced and implemented. This algorithm forms the basis for the current work (please review the details of that algorithm before continuing with this paper).

Class Node is unchanged from the single-threaded implementation presented in the previous column. A new thread class ProcessNodes is introduced and is presented in Listing 1.

Listing 1 – Thread class ProcessNodes

Thread instances are spawned by the revised class TSP presented in Listing 2. When a ProcessNode thread is spawned, the constructor takes as input a thread number (assigned by a TSP object), a priority queue (a single node at level 2 in the tree structure – there are n – 1 such nodes where n is the number of cities in the TSP problem), the number of cities, the cost matrix, the total node count to-date for the thread and an instance of class TSP (so the thread can communicate with the tsp object).

Listing 2 – Class TSP

In method generateSolution, the root node and all its children (at level 2) are generated before any threads are spawned. Then the following segment of code is used to create the threads:

Each thread is handed a child node (partial tour consisting of node 1 as well as another city). The child nodes at level 2 are prioritized based on their lower bound (the smallest lower bound having the highest priority). The threads are held in an array of ProcessNode (called threads). The thread[0] starts with a child node of smallest lower bound. The thread[1] has the second lowest lower bound. There is of course no guarantee that the solution exists under any of the initial level 2 nodes handed off to the various threads (6 are shown by default). When a thread completes the processing of the node that it was handed (this will probably require the generation of millions of nodes), it sends the tsp object a stop command. If the pool of available tsp nodes is depleted (queue size of size 0), the value of the field numberStopped is incremented by one (see method stop in class TSP). Otherwise a new node from among the original level 2 nodes is handed off to a fresh thread that is spawned (recall that once a thread is terminated it cannot be re-started) and removed from the tsp queue. The application terminates when all the level 2 nodes in the tsp object have been removed (and handed off to threads) and all the threads have terminated (the value numberStopped is equal to the number of threads).

Empirical Results

We consider some results run on two computers: a single-processor Pentium 4 with a 1.7Ghz processor and a dual G4 processor Powermac running under Mac OS 10.2.3. It is interesting to see how the Powermac with its dual processors is able to take advantage of the multiple threads. JDK 1.4.1 is used on each of the machines (a beta release on the Powermac).

The results are summarized in the table below.

Comparison of Single-Threaded vs. Mult-Threaded TSP Implementations on a Single vs. Dual Processor Machines

Machine	Number of Cities	Execution Time (Single Threaded Previous Implementation)	Execution Time (Using 6 threads with current implementation)
Pentium 4, 1.7 GHz	20	21.801 seconds	19.358 seconds
Powermac (Dual g4 processors)	20	30.597 seconds	16.486 seconds
Pentium 4, 1.7GHz	24	299.25 seconds	343.944 seconds
Powermac (Dual G4 processors)	24	410.984 seconds	276.729 seconds

Several conclusions may be gleaned from the results in the above table.

The Mac OS is able to take good advantage of multiple threads by efficiently dispatching threads to each of its two processors. For both TSP problems the Mac’s multi-threaded solution was significantly faster than the single-threaded solution. In fact for both problems the Powermac’s multi-threaded solution was the fastest among the two computers even allowing for the fact that the Pentium 4 machine has a clock speed that is 1.36 times faster than the Powermac’s clock speed.
The execution time for the single-threaded implementation on the two computers is close to the ratio of their respective clock speeds. This suggests that the Powermac is not able to take advantage of its dual processors in the single-threaded implementation.
The multi-threaded solution on the Pentium 4 machine was slightly faster for the 20 city problem and significantly slower for the 24 city problem. This clearly suggests (as should be fairly obvious) that the problem instance itself affects the ability of the multi-threaded solution to take advantage of the input data. This is further reflected in the relative improvement that the multi-threaded solution on the Powermac displayed for the 20 and 24 city problems. The degree of improvement was smaller for the 24 city problem than the 20 city problem where the execution time was almost cut in half.

About the author

Richard Wiener is Associate Professor of Computer Science at the University of Colorado at Colorado Springs. He is also the Editor-in-Chief of JOT and former Editor-in-Chief of the Journal of Object Oriented Programming. In addition to University work, Dr. Wiener has authored or co-authored 21 books and works actively as a consultant and software contractor whenever the possibility arises.

Cite this column as follows: Richard Wiener: “Branch and Bound Implementations for the Traveling Salesperson Problem – Part 3”, in Journal of Object Technology, vol. 2, no. 4, July-August 2003, pp. 89-100. http://www.jot.fm/issues/issue_2003_07/column8

Previous column