Michelle Effros (Caltech) : Applications / Universal Data Compression

home

data compression lab:			universal data compression
			related papers
basics In the area of personal telecommunications, where transmission channel capacity is often severely limited by power and bandwidth constraints, it is particularly important to compress the data maximally before transmission. That is, it is important to compress the data to a bit rate that is as close as possible to the rate-distortion bound R(D), which is fundamentally the lowest possible rate to which the source of data can be compressed without distorting the data by more than some amount D (D may be zero in the case of lossless compression). The rate-distortion bound and the data compression algorithms that achieve it depend on the precise source statistics. For example, the Huffman code, which is a lossless compression scheme, is designed for a particular probability distribution of source symbols. The same can be said for lossy compression schemes or quantizers. Standard implementations of JPEG, for example, match their single bit allocation strategy to the expected class of source statistics. If a lossless code or a quantizer is designed for one source and operated on another, the rate-distortion bound for the source in operation is not achieved. Unfortunately, especially in personal telecommunications, the exact source statistics are almost never known in advance and may indeed be time-varying. Hence it is important to study data compression schemes that are robust or adaptive to the source statistics. Universal data compression is data compression that asymptotically achieves the rate-distortion bound as more and more data is processed, regardless of the source (within a class of sources). By observing more and more data, universal compressors essentially learn the statistics of the source in operation. This must be done so that optimal performance is eventually achieved, with convergence to the optimum at the fastest possible rate. top theory Given a class of source distributions, a universal quantizer is a sequence of codes for which the expected performance on any single source in that class approaches, as the length of the incoming data stream increases to infinity, the rate-distortion bound associated with the source in operation. Work on the theory of universal source codes includes investigations into conditions on their existence and on the rates of convergence of a variety of sequential and nonsequential algorithms. top two-stage codes In the attempt to design codes which meet the above criteria for robustness across a class of sources, one approach involves the use of multi-codebook compression systems. A multi-codebook compression system employs a collection of source codes. In using a multi-codebook compression system, each length-N block of source data is encoded with a single member of the collection, but the code used is allowed to vary from block to block. While any single code can achieve maximal compression on only a single source, careful design of multi-codebook compression systems can result in good performance over a class of source distributions. Multi-codebook quantization systems involve two stages: the first stage maps the incoming data block to a particular quantizer or codebook, and the second stage describes the data block given the chosen codebook. For optimal coding, the codes in the collection should be vector quantizers. For (sub-optimal) codes giving better performance for typical complexity constraints, transform or wavelet-based codes may be substituted for vector quantizers. top lossless codes (Burrows Wheeler Transform) The Burrows Wheeler Transform (BWT) is a slightly expansive reversible sequence transformation currently receiving considerable attention from researchers interested in practical lossless data compression. Experimental results on algorithms using this transformation indicate lossless coding rates better than those achieved by Ziv-Lempel codes (LZ'77 and LZ'78) but typically not quite as good as those achieved by the prediction by partial mapping (PPM) schemes. To date, these comparisons have primarily involved experimental studies of implementation speeds and achieved lossless description rates. Ongoing work in the Data Compression Lab involves the design of new universal lossless data compression algorithms based on the Burrows Wheeler Transform and the theoretical analysis of these and existing BWT-based schemes. The algorithms introduced are extremely efficient in their computation and memory requirements and achieve excellent performance. top