Synthetic Image Sequence Compression

Douglas Lyon, Fairfield University, Fairfield CT, U.S.A.

COLUMN

Abstract

This paper describes a technique for compressing computer screen shots into a GIF animation file. The goal is to distribute the animations, to a variety of browsers, without requiring a plug-in or helper application. We seek to minimize the size of the image sequence, while maximizing the signal to noise ratio of the sequence. The GIF animation format has several constraints; images may have a maximum of 256 colors, and the images must all be of the same size. Further, to minimize overhead, we seek to make use of a single color lookup table for the entire animation.

Several color quantization algorithms are compared, using SNR (Signal to Noise Ratio) as the metric of quality (as well as subjective appearance).

We present the design of an interface, written in Java, and distributed freely using Java Web Start, that employs a well know neural network program and a color quantization algorithm to capture screen shots and save them to the GIF animation.

GIF animations represent silent movies, as they have no sound to accompany them. However, they are still in wide use and have applications in entertainment and education.

The techniques described are a part of the JSnap project, a joint project between the skunk works of DocJava, Inc. and Fairfield University.

1 INTRODUCTION

Synthetic images sourced from the screen snapshots are different from the typical image sequences sourced from standard sensors (video cameras, scanners, etc.). The images are characterized by having little to no sensor noise. Further, the images can typically range in size from 64x64 pixels to 1024x768 (or larger). Computer displays with 24-bit color depth are presently standard. Image sequences are typically displayed with a refresh rate of 60 Hz or better. However, for the purpose of distribution on the Web, only difference images need to be transmitted and the rate of image change can be up to several seconds per frame, depending on the material and the application.

One of the basic problems with Java (before JDK 1.3) is that there was no way to portably capture the screen without resorting to native method invocations. As of JDK 1.3, a new class was introduced into the Abstract Windowing Toolkit (AWT) called the Robot class. The Robot class was designed for testing of GUI’s and event processing. However, we have made use of it to perform screen captures in order to generate image sequences. Thus, the technique presented in this paper provides the enabling technology for others to acquire image sequence data and perform compression experiments.

GIF images are constrained to 256 colors (i.e., they are 8-bit images) [Murray]. Thus, we are faced with a sub-problem of converting 24-bit color images into 8-bit color images. This is called color requantization and requires that some colors be discarded from the input image and remapped into new color in the output image. There are many algorithms for performing color-requantization, and many criteria for determining the optimality of the algorithms.

1.2 Distortion Metrics

This section describes some common distortion metrics used to measure one aspect of a quantization algorithms performance. The mean-square distortion is, perhaps, one of the most common metrics. Suppose, for example, an input, x is quantized by a function called a quantizer, Q. The mean-square distortion is computed by taking the expectation of the square of the difference between the algorithms output value and the input value of a pixel, then multiplying by the probability of the value. In the continuous one-dimensional domain, we write

(1.1)

where

Typically, the quantizer’s performance is measured using the signal-to-noise ratio (SNR), which is given in dB as

(1.2)

and

Where E(x) is the expected value for x.

Unfortunately, distortion measures, such as the SNR, are not necessarily reflective of any physiological metric for improving the subjective appearance of an image. Hence their use is open to question. For example, histogram equalization has been shown to improve an image’s appearance; however, according to (1.2) such a process will lower the SNR. The reason that the appearance is improved may actually have to do with the improved contrast ratio of the image. Such a subjective improvement is not taken into account with (1.2).

Given a discrete point set, we can modify (1.2) to reflect the distortion function by summing the Euclidean distances between the color of each pixel and its map. This is expressed in

(1.3)

where

In fact, we could obtain the mean-square distortion measure from the total quantization error by dividing it by the total number of pixels, that is;

(1.4)

This is computed by subtracting the original image from the quantized image, squaring the resulting error pixels, summing their color components, then dividing by the total number of pixels. The mean square error (MSE) represented by (1.4) is a widely used measure of distortion and is also called the coding noise power [Netravali].

Another metric of coding performance is the bit rate. It is typical to seek to minimize the bit rate and the MSE. Typical of most engineering tradeoffs, the MSE is inversely related to the bit rate. Bit rate is function of the number of bits needed per pixel. As this can change from pixel to pixel, and image to image, one method for computing bit rate is to measure the image (or image sequence) file size, in bits, then divide by the total number of pixels. This takes into account overhead in writing out the file, in any given format.

Perceptual difficulties aside, it is useful to have an objective fidelity citerion, such as the SNR, to use when evaluating a lossy coding scheme. In addition, SNR is one of the most used fidelity criteria. One way to compute the SNR in dB is

(1.5)

The SNR defined in (1.5) is consistent with [Myler] and can also be used on image compression algorithms (where the quantized image is replaced with the compressed image).

1.2. Color Quantization of Still Images

There are several techniques available for reducing the number of colors (i.e. dynamic range) in an image (or image sequence). A simple, fast method (still in wide use) is called the linear cut algorithm. It works by cutting off bits from the pixels’ least significant bits first, in the integral RGB color space.

Consider the integral RGB color space. Each component is constrained to range from 0 to 255 and resides in a 16 bit short array. To perform the linear cut algorithm on such an array, we need to mask the low-order bits that we want to “cut” out of the pixel, for example:

The mask in the linearCut method is computed, assuming that there are only 8-bits per color. If the programmer performs a linear cut of the last two bits, for example, we shift left 2 bits so that the least significant two bits are cleared (e.g., 11111111 << 2 becomes 1111111100). Thus the mask will thus remove the least significant two bits from the short stored in a[x][y].

Another algorithm in common use is called the median cut algorithm [Heckbert 82]. The goal of the median-cut algorithm is to have each color represent approximately the same number of pixels.

The algorithm is implemented by first computing a color histogram of the image in RGB color space. The histogram is then clustered into k groups. Once the clustering is performed, the pixels are mapped to the centroids of the clusters in order to minimize the color error in the image. A tightly fitting color cubed is created for the color space. It is then cut at the median of the longest axis (hence the name, median cut algorithm). The median cut procedure is applied to subcubes until there are k cubes. The centers of the cubes are used for the k output colors in the color map. Given a color map, each pixel is mapped from its original color to its nearest color neighbor. This mapping is done to minimize the color error metric given by (1.3).

One way to implement the median cut algorithm is to use a queue to enable a breadth-first cutting of the sub-cubes. The idea is that an instance of the Cube class has a static member variable that is used to keep track of the total number of Cube instances. The pseudo-Java follows:

The use of the CubeQueue is for illustration only. The implementation in the MedianCut class (as distributed by [Lyon 99]) does not make use of explicit recursion, due, in part, to the computational expense. In fact, given the a priori knowledge of the number of cubes needed, we allocate an array that is exactly k cubes in length. Then we simulate the queue, using an array index. Once the list of sub-cubes is known, it is a matter of assigning the colors to the centroids of the cubes that minimize the color error.

As initially formulated by Heckbert [Heckbert 80], the color error is measured as the Euclidean distance from the pixel’s original color to the remapped color. The sum of all the errors in the remapping is the objective function whose minima is sought. In fact, this is a standard metric in clustering.

In clustering, we are given a set of points in a Euclidean space and are asked to group them into k partitions to minimize a distortion function.

Two other algorithms we considered include the Wu color quantization algorithm and the Octree encoding algorithm. Java implementations for these algorithms are detailed in and available from http://www.docjava.com [Lyon 99] [Wu][Wu 97].

Converting to 256 colors, the test image, using Wu color quantization, yielded 45.1 dB SNR. Octree encoding yields a 42 dB SNR on the sample image. Linear cut yields a 32 dB SNR while Median cut yields a 45.2 dB SNR. The winner was the neural network color quantization algorithm, at 51 dB SNR.

Figure 1. Original Picture

Figure 1 shows the original picture, with over 300k different colors. The image was taken using a 24-bit color display that included a standard color test image. The over all image is 640x480 pixels in size.

Figure 2. The linear cut algorithm

Figure 2 shows the original test image after the application of a linear cut algorithm. The image was reduced to 9 bits (3 bits per pixel).

Figure 3 Octree color reduction

Figure 3 depicts the test image after being color quantized to 256 colors using the Octree algorithm.

Figure 4. The Median Cut algorithm

Figure 4 shows the test image after application of Heckbert’s median cut algorithm. The image was reduced to 256 colors.

Figure 5. Wu color quantization algorithm

Figure 5 shows the original image after it has been quantized to 256 colors using Wu’s color quantization algorithm.

Figure 6. Neural Network quantization algorithm

Figure 6 shows the original image after it has been quantized to 256 colors using the neural network algorithm presented by [Dekker]. As a result of the improved SNR, we have selected the neural network algorithm for the image sequence compression.

1.3 The JSnap Application

In this section we describe the interface and operation of the JSnap application.

Figure 7. The JSnap Application Interface

The JSnap GUI provides a feature to enable fixed-size resampling of the screen using an output combo-box menu items, as shown in Figure 8.

Figure 8. Setting the output side using a Combo-box

As an alternative you can manually set the output size by dragging a rectangle across the screen.

2 THE IMAGE SEQUENCE COMPRESSION ALGORITHM

The image sequence color requantization algorithm makes the assumption that the color look-up table for the first image is going to be used for all of the following images. The impact of such an assumption is that images will be degraded in their appearance if they contain colors that are radically different from the initial image. The algorithm is based on an API authored by Kevin Weiner [Weiner].

Given a color lookup table, a pixel is remapped to a color that is closest according to the minimum mean square error. This is performed using a search through all 256 colors in the color lookup table. After the color requantization, a difference frame is computed and the pixels are Lempel-Ziv compressed using the algorithm described in [LZ77].

The hybrid algorithm employed in this paper first color reduces the first image using the Wu color quantization algorithm [Wu 97]. The first image is then used to establish the color look-up table for subsequent images. To speed the color mapping to one of the selected colors in the look-up table, a neural network algorithm was employed [Dekker]. This is because the combination of the two algorithms appears to be better than either algorithm by itself.

Also, the subjective appearance was used to make this judgment, not the SNR.

3 EXPERIMENTAL RESULTS

In this experiment we take a series of screen snapshots at 800x600 resolution, converting them to a GIF animation. We then measure the size of the file produced and compare this against the 24-bit estimate of what the size of the image sequence would be, without compression. That is, for the 10 images, we would need 800x600 pixel * 3 bytes per pixel * 10 images = 14.4 MB. The GIF animation produced is 490,336 bytes (29:1 compression ratio). Or, to put it another way, we had (490,336*8) bits / (800x600x10) pixels to obtain 0.8 bits per pixel, on average. As GIF images are internally Lemple-Ziv compressed (after the difference frames are encoded), using utilities, like Gzip provides little benefit (490k is reduced to 487k). On the other hand, there are utilities that will recompress the JSnap output from 490k bytes to 392k bytes [ASG] promising no change in appearance. The question of why the program is out-performed by the utility design for this purpose remains open. It has been suggested that there is an additional advantage to be had in the software implementation by selecting for an appropriate “transparent” color in the GIF sequence, however, experiments in this direction have yet to validate this assertion. Thus, there is room for improvement.

One attribute, speed, was not addressed in the previous metrics of algorithmic performance. That is due, in part, to the stop-and-wait nature of the application. That is, the image sequence author must set up the next frame and then indicate to JSnap that the frame is ready for capture. Manual set-up time can range from a few 10’s of seconds to several minutes, depending on how much work is required. JSnap is able to take a snapshot of the screen and add it to the GIF animation, at 800x600 pixels resolution, in between 0.5 and 3 seconds per frame (on a rather slow 800 Mhz G4 laptop). As of this writing, machines that are 2 and 3 times faster are commonly available, and so speed of execution seems acceptable, for this application.

Another metric of performance is the subjective appearance of the image, to the viewer. This is perhaps, the most important metric, yet it is also one of the hardest to quantify.

Figure 9 shows the original image of Figure 1, after being resampled using JSnap and requantized to look-up table of the original image sequence. Since the look up table was established for an image other than the original image, we observe significant degradation as a result of the output.

Figure 9. The Original Image After Requantization by JSnap

On the other hand, if the original image is used as the start image, the output for the first image is almost identical to the initial image, for a single frame, and the SNR for such an image is 33 dB. Thus the use of the algorithm can go from being nearly as good as a still-frame requantization to being far worse.

4 CONCLUSION

The algorithm used to perform image sequence color requantization makes use of the first image in the sequence. Since the first image will probably not represent the color composition of the following images, a more sophisticated algorithm might, in general give a better SNR, on average. An alternative, a more computationally expensive algorithm could insert extra color look up tables, if the SNR falls below a given threshold. In addition to the computational cost of such an approach, there is the cost of inserting a new color lookup table and of transmitting a frame that contains all the pixels (rather than a difference frame), as well as the cost of computing the SNR. This type of algorithm is left for future work.

Given a color lookup table, a pixel is remapped to a color providing a minimum mean square error. This is performed using a neural network search through the color lookup table. This is probably where the algorithm could benefit most from an algorithm that speeds the search for a best color. Aside from nearest neighbor algorithms based on the construction of a three dimensional Voronoi diagram, there are hash table algorithms for mapping colors into efficient color tables that could be exploited [Lyon 99]. As an alternative, the color look up table can be mapped into a binary tree key derived by requantization of luminance. Efficient inverse color map computation is addressed in [Thomas]. Eppstein also has a fast hierarchical clustering algorithm that also appears to hold promise [Eppstein]. The exploration of such algorithms is left as a topic of future work.

REFERENCES

[ASG] A Smaller GIF, available from http://www.peda.com/smaller/.

[Dekker ] “Kohonen neural networks for optimal colour quantization”, by Anthony Dekker, Network: Computation in Neural Systems, Vol. 5, 1994 pps. 351-367.

[Eppstein] “Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs”, The ACM Journal of Experimental Algorithmics, by David Eppstein, University of California at Irvine, vol. 5, No. 1., 2000, pps. 1-23. Available from http://www.jea.acm.org/2000/EppsteinDynamic/

[Heckbert 80] Color Image Quantization for Frame Buffer Display, by Paul S. Heckbert, B.S. Thesis, 1980, Architecture Machine Group, MIT, Cambridge, MA. Available at http://www.cs.cmu.edu/~ph .

[Heckbert 82] “Color Image Quantization for Frame Buffer Display”, by Paul Heckbert, Computer Graphics, vol. 16, No. 3, July. 1982, pps. 297-307. Available at http://www.cs.cmu.edu/~ph .

[Lyon 99] Douglas A. Lyon, Java for Programmers, Prentice Hall, Upper Saddle River, NJ, 07458. 1999. Available from http://www.docjava.com .

[LZ77] Ziv J., Lempel A., “A Universal Algorithm for Sequential Data Compression," IEEE Transactions on Information Theory, vol. 23, No. 3, pp. 337-343.

[Murray ] Graphics File Formats, by James D. Murray and William Vanryper. O’Reilly & Associates. CD, 1996.

[Myler] Computer Imaging Recipes in C, by Harley R. Myler and Arthur R. Weeks. Prentice Hall, Englewood Cliffs, NJ. 1993

[Netravali] Digital Pictures, by Arun Netravali and Barry Haskell. Plenum Press, NY. 1988.

[Thomas] “Efficient Inverse Color Map Computation” in Graphics Gems Volume II, Adademic Press, Inc., Cambridge, MA, 1991, pps.116-125

[Weiner] AnimatedGifEncoder by Kevin Weiner, FM Software, kweiner@fmsware.com.

[Wu 97] “Lossless Compression of Continuous-Tone Images via Context Selection, Quantization and Modeling”, by Wu. IEEE Transactions on Image Processing, vol. 6, No. 5. May. 1997, pps. 656-664.

[Wu] “Efficient Statistical Computations For Optimal Color Quantization”, by Xiaolin Wu, in Graphics Gems, vol. II, edited, by James Arvo, Academic Press, Inc., Cambridge, MA. 1991, pp. 126-133.

About the author

After receiving his Ph.D. from Rensselaer Polytechnic Institute, Dr. Lyon worked at AT&T Bell Laboratories. He has also worked for the Jet Propulsion Laboratory at the California Institute of Technology. He is currently the Chairman of the Computer Engineering Department at Fairfield University, a senior member of the IEEE and President of DocJava, Inc., a consulting firm in Connecticut. E-mail Dr. Lyon at Lyon@DocJava.com. His website is http://www.DocJava.com.

Cite this column as follows: Douglas Lyon: “Synthetic Image Sequence Compressions", in Journal of Object Technology, vol. 4, no. 4, May-June 2005, pp. 19-31 http://www.jot.fm/issues/issue_2005_05/column2

Previous column

Next column