The Wavelet IDR Center Research Subteam

Image Compression

M. Orchard
Subteam Leader
I. Daubechies
R. DeVore
R. Sharpley
T. Yu



[ Top | Vision | Technology | Recent Work | Problems ]

With the massive sizes of image and volumetric datasets, compression becomes increasingly important to support efficient storage, transmission and processing. Besides its practical importance, the study of parsimonious (in terms of bitrate) representation of image sources offers valuable insight into their structure, suggesting improved methods for a variety of other data processing tasks such as feature extraction and registration. In the past 5 years, wavelet-based image compression algorithms have been introduced that demonstrate significant performance improvements over standard compression algorithms [PSNR]. The successes of these algorithms reflect the fact that the space-frequency localization of the wavelet transform is well matched to the class of typical images that are compressed. This project is directed at better understanding the relationship between characterization of general classes of images and the compression efficiency that is achievable over each class. Our goal is to refine our classifications in order to identify practically useful classes of images for which significant improvements in compression efficiency can be achieved and to extend these ideas to higher dimensional data.



[ Top | Vision | Technology | Recent Work | Problems ]

Compression takes two forms:

    (i) lossless: complete fidelity of the dataset is retained,
    (ii) lossy: some fidelity is sacrificed for more efficiency.

For images, lossless compression typically achieves at most 2-1 or 3-1 compression ratios, while lossy compression ratios are dramatically higher (sometimes more than 100-1) but still retaining visual fidelity required in an intended application. The primary focus of this subteam is research on lossy compression, which naturally must then address fundamental approximation and information theoretic issues. Lossy encoder algorithms which are compatible with progressive transmission provide lossless compression or may be terminated when sufficiently accurate detail has been obtained. Similar ideas apply to signals and other data formats.

A lossy compression scheme converts the file of pixel values from a digitized image I to a new compressed file that represents exactly a compressed image Î. One can view the compressed image Î as an approximation to I with error typically (but not exclusively) measured in the mean-square sense. Performance of the compression method is then measured by the behavior of the error (distortion) D(I,R) for a given bit allocation R. Much of our understanding of image compression has been developed from the study of rate-distortion relationships for either discrete gaussian random processes or samples of bandlimited continuous gaussian random processes. Current image compression standards (e.g. JPEG and MPEG) apply transform coding techniques developed from this approach.


Recent Work

[ Top | Vision | Technology | Recent Work | Problems ]

Recent wavelet-based image compression algorithms have been shown to significantly outperform standard algorithms [PSNR] by exploiting the space-frequency localization of the wavelet decomposition ([Sh], [XRO a], [SP]). Smooth areas of the image are efficiently represented with a few low-frequency wavelet coefficients, while important edge features are represented with a few high-frequency coefficients, localized around the edge. Wavelet-based algorithms have been adopted by government agencies as a standard method for coding fingerprint images [FBI], and are being considered in the [JPEG2000] JPEG2000 standardization activity. Some basic software building blocks for experimenting with wavelet coding of images can be accessed at [Davis]. A general idea of the performance of wavelet based compression for standard images is available on-line (see Compression DEMO).

The successes of these algorithms reflect the fact that the space-frequency localization of the wavelet transform is well matched to the class of typical images that are compressed. Any given coding strategy provides a classification of images according to the rate of decrease of D(I,R) that is achieved as D tends to zero, and strategies that achieve the fastest D(I,R) decay can be understood as uncovering structure within the given class of images. Current research is directed at understanding the interplay between classification and rate-distortion analysis in order to achieve optimal performance over general classes of images.

Mathematically, there are several important theorems which give understanding of the interplay between classification and rate distortion analysis. Approximation theory is very useful since it gives precise characterizations of functions with a given approximation rate. For example, wavelet-based compression can be viewed as a form of n-term wavelet approximation. In [DJP] a theory is given which characterizes those functions that can be compressed in an Lp metric with a prescribed decrease of error. This theory does not apply directly to rate distortion for images because: (i) it measures performance in the number of coefficients rather than bits, (ii) encoders do not retain entire coefficients but only a certain number of bits, and (iii) the location of the coefficients must also be encoded. The first issue was addressed in the companion article [DJL]. More recently, a theory similar to [DJP], counting bits rather than coefficients, has been given in [CDGO] (see also [MF]). These results apply directly to image compression, and merge ideas from nonlinear approximation and information theory to both explain the good performance of a broad collection of algorithms, and to identify the important features (membership in certain Besov spaces) of typical images that allow good coding performance. The problem of encoding significant wavelet coefficient locations was first addressed in [Sh] and more recently in [GAS], [CDDD].



[ Top | Vision | Technology | Recent Work | Problems ]

Redundant representations have already had an impact in image compression. For example, wavelet packet libraries have been used to provide flexible bases for compressing images [XHRO], [XRO b]. A particular wavelet packet basis has proven very efficient for representing fingerprint images, and has been incorporated into current fingerprint compression schemes employed by the FBI. The use of redundant representations alter compression in two regards. The selection and specification of an optimal basis provides another nonlinear scheme for representing image information, a scheme that is not covered by the usual theory of nonlinear approximation [DeV] which requires an analysis of the distribution of images conditioned on having already specified an optimal basis.

In this direction, it would be desirable to

The Besov classes arise naturally when one fixes a wavelet basis and an efficient strategy for coding coefficients, and identifies the images to which the method is well suited. However, real images seem to be a small subset of the Besov class, indicating the possibility of higher compression. This may be illustrated by observing that since wavelets form an unconditional bases in normed spaces where approximation is taking place, any selection of signs for the coefficients will produce an image in the space, but not necessarily one that is of much interest.

Original Image of Lena

Image produced by randomly
altering signs of coefficients

Additionally, our experience with current wavelet coders ([RO], [CDGO]) suggests that a critical issue limiting current performance is the high cost (in bitrate) of identifying the location of edge coefficients. Better classifications of images will rest on better exploiting the structure of location information in the form of higher-order contour features. In this direction, it would be desirable to: We intend to exploit the multiscale (tree) structure of wavelet coefficients to incorporate higher-order contour features in the form of direction and smoothness.
As we develop improved classifications of images, it is desirable to understand: This is closely related to the first problem listed above. A good stochastic model for images would allow the application of Shannon entropy. A good deterministic model would allow the application of the theory of widths or Kolmogorov entropy. There are relations between the two approaches [Do].

[Go to the top]