basics
In the area of personal telecommunications, where transmission channel
capacity is often severely limited by power and bandwidth constraints,
it is particularly important to compress the data as much as possible
before transmission.
That is, it is important to compress the data to a bit rate that is
as close as possible to the rate-distortion bound R(D), which
is fundamentally the lowest possible rate to which the source of data
can be compressed without distorting the data by more than some amount
D (D may be zero in the case of lossless compression).
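For reference, Shannon's rate-distortion function for a memoryless
source X with single-letter distortion measure d is the standard
textbook expression (stated here only to fix notation):

R(D) = \min_{p(\hat{x} \mid x)\,:\;\mathbb{E}[d(X,\hat{X})] \le D} I(X;\hat{X}),

the smallest mutual information between the source and its reproduction
over all conditional distributions meeting the distortion constraint.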
The rate-distortion bound and the data compression algorithms that
achieve it depend on the precise source statistics. For example, the
Huffman code, which is a lossless compression scheme, is designed
for a particular probability distribution of source symbols. The same
can be said for lossy compression schemes or quantizers. Standard
implementations of JPEG, for example, match a single, fixed
bit-allocation strategy to the expected class of source statistics. If a lossless
code or a quantizer is designed for one source and operated on another,
the rate-distortion bound for the source in operation is not achieved.
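As a small numerical illustration of this mismatch penalty (a
self-contained Python sketch, not code from the lab; the two dyadic
distributions are chosen arbitrarily), the fragment below builds a
Huffman code for one symbol distribution and then evaluates its
expected codeword length under a different distribution:

import heapq
import math

def huffman_lengths(probs):
    # Return Huffman codeword lengths for a list of symbol probabilities.
    # Heap entries: (subtree probability, tie-breaker, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every symbol in the merged subtree gains one bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

design = [0.5, 0.25, 0.125, 0.125]  # distribution the code is designed for
actual = [0.125, 0.125, 0.25, 0.5]  # distribution actually in operation
lengths = huffman_lengths(design)
entropy = -sum(p * math.log2(p) for p in actual)
expected = sum(p * l for p, l in zip(actual, lengths))
print(f"entropy of the actual source:     {entropy:.3f} bits/symbol")
print(f"expected length, mismatched code: {expected:.3f} bits/symbol")

With these particular distributions, the mismatched code spends 2.625
bits per symbol on a source whose entropy is only 1.75 bits per symbol;
a Huffman code designed for the source in operation would achieve the
entropy exactly, since that distribution is dyadic.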
Unfortunately, especially in personal telecommunications, the exact
source statistics are almost never known in advance and may indeed
be time-varying. Hence it is important to study data compression schemes
that are robust or adaptive to the source statistics.
Universal
data compression is data compression that asymptotically achieves
the rate-distortion bound as more and more data is processed, regardless
of the source (within a class of sources).
By observing more and more data, universal compressors essentially
learn the statistics of the source in operation. This learning must be
carried out in such a way that optimal performance is eventually
achieved, ideally with convergence to the optimum at the fastest
possible rate.
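One simple way to see this learning in action is a sequential
probability estimator. The sketch below (illustrative only, and not one
of the lab's algorithms; the source parameter p = 0.2 is an arbitrary
example) codes a binary memoryless source with the Krichevsky-Trofimov
estimator, whose per-symbol ideal code length approaches the entropy of
whatever source is actually in operation:

import math
import random

def kt_code_length(bits):
    # Ideal (arithmetic-coding) cost in bits of a binary string under the
    # Krichevsky-Trofimov sequential estimator: P(next = 1) = (ones + 1/2) / (n + 1).
    zeros = ones = 0
    total = 0.0
    for b in bits:
        p_one = (ones + 0.5) / (zeros + ones + 1.0)
        p = p_one if b == 1 else 1.0 - p_one
        total += -math.log2(p)
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return total

random.seed(0)
p = 0.2  # source parameter, unknown to the coder
for n in (100, 1000, 10000):
    data = [1 if random.random() < p else 0 for _ in range(n)]
    rate = kt_code_length(data) / n
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    print(f"n={n:6d}: {rate:.4f} bits/symbol (source entropy {entropy:.4f})")

The printed per-symbol rate approaches the source entropy as n grows,
with redundancy on the order of (log n)/(2n) bits per symbol, the
classical behavior of this estimator.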
theory
Given a class of source distributions, a universal quantizer is a
sequence of codes whose expected performance on any single source in
the class approaches, as the length of the incoming data stream grows
to infinity, the rate-distortion bound associated with the source
in operation. Work on the theory of universal
source codes includes investigations into conditions on their existence
and on the rates of convergence of a variety of sequential and nonsequential
algorithms.
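In symbols (one standard way of writing the definition above; the
notation here is not taken from any particular paper): a sequence of
block codes C_n, each meeting the per-symbol distortion constraint

\frac{1}{n}\,\mathbb{E}_{\theta}\big[d_n\big(X^n, C_n(X^n)\big)\big] \le D,

is universal for a class \Lambda of sources if

\lim_{n \to \infty} \left( \frac{1}{n}\,\mathbb{E}_{\theta}\big[\ell\big(C_n(X^n)\big)\big] - R_{\theta}(D) \right) = 0
\quad \text{for every } \theta \in \Lambda,

where \ell denotes description length in bits and R_\theta(D) is the
rate-distortion function of the source \theta. Rate-of-convergence
results characterize how quickly the quantity inside the limit, the
per-symbol redundancy, decays with n.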
two-stage codes
In the
attempt to design codes which meet the above criteria for robustness
across a class of sources, one approach involves the use of multi-codebook
compression systems. A multi-codebook compression system employs a
collection of source codes. In using a multi-codebook compression
system, each length-N block of source data is encoded with
a single member of the collection, but the code used is allowed to
vary from block to block. While any single code can achieve maximal
compression on only a single source, careful design of multi-codebook
compression systems can result in good performance over a class of
source distributions.
Multi-codebook
quantization systems involve two stages: the first stage maps the
incoming data block to a particular quantizer or codebook, and the
second stage describes the data block given the chosen codebook. For
optimal coding, the codes in the collection should be vector
quantizers. Under typical complexity constraints, however, transform-
or wavelet-based codes may be substituted for the vector quantizers;
these codes are sub-optimal but give better performance for the
complexity they require.
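A bare-bones sketch of this two-stage structure is given below
(hypothetical scalar codebooks are used purely for brevity; as noted
above, an actual system would use vector, transform, or wavelet
codebooks). The encoder tries every codebook in its collection on each
block, keeps the one giving the smallest distortion, and transmits the
codebook index followed by the per-sample codeword indices:

import random

# Hypothetical collection of scalar codebooks, each suited to a different source.
CODEBOOKS = [
    [-3.0, -1.0, 1.0, 3.0],   # wide-spread source
    [-0.3, -0.1, 0.1, 0.3],   # narrow-spread source
]

def encode_block(block):
    # First stage: pick the codebook with least squared error on this block.
    # Second stage: describe each sample by its nearest-codeword index.
    best = None
    for cb_index, cb in enumerate(CODEBOOKS):
        indices = [min(range(len(cb)), key=lambda j: (x - cb[j]) ** 2) for x in block]
        distortion = sum((x - cb[j]) ** 2 for x, j in zip(block, indices))
        if best is None or distortion < best[0]:
            best = (distortion, cb_index, indices)
    _, cb_index, indices = best
    return cb_index, indices  # the codebook choice varies from block to block

def decode_block(cb_index, indices):
    cb = CODEBOOKS[cb_index]
    return [cb[j] for j in indices]

random.seed(1)
block = [random.gauss(0.0, 2.0) for _ in range(8)]  # one length-N block, N = 8
cb_index, indices = encode_block(block)
print("codebook:", cb_index, "reconstruction:", decode_block(cb_index, indices))

The first-stage overhead is only about log2 of the number of codebooks
per block, so for reasonably long blocks the cost of adapting the
codebook choice is small compared with the second-stage description.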
lossless codes (Burrows-Wheeler Transform)
The
Burrows-Wheeler Transform (BWT) is a slightly expansive reversible
sequence transformation currently receiving considerable attention
from researchers interested in practical lossless data compression.
Experimental results on algorithms using
this transformation indicate lossless coding rates better than those
achieved by Ziv-Lempel codes (LZ77 and LZ78) but typically not quite
as good as those achieved by the prediction by partial matching (PPM)
schemes. To date, these comparisons have primarily involved
experimental studies of implementation speeds and achieved lossless
description rates. Ongoing work in the Data Compression Lab involves
the design of new universal lossless data compression algorithms based
on the Burrows-Wheeler Transform and the theoretical analysis of these
and existing BWT-based schemes. The algorithms introduced are extremely
efficient in their computation and memory requirements and achieve
excellent performance.
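For concreteness, the transform itself can be sketched in a few lines
of Python (the naive, quadratic-time textbook form; the lab's work
concerns what is done with the transform output and how to compute the
transform efficiently, neither of which is shown here):

def bwt(s):
    # Forward Burrows-Wheeler Transform: return the last column of the
    # sorted rotation matrix and the row index of the original string.
    n = len(s)
    rotations = sorted(range(n), key=lambda i: s[i:] + s[:i])
    last_column = "".join(s[(i - 1) % n] for i in rotations)
    primary_index = rotations.index(0)
    return last_column, primary_index

def ibwt(last_column, primary_index):
    # Inverse transform: repeatedly prepend the last column and re-sort.
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    return table[primary_index]

text = "banana"
transformed, idx = bwt(text)
print(transformed, idx)  # "nnbaaa", 3: symbols from similar contexts are grouped
assert ibwt(transformed, idx) == text

Because the transform clusters together symbols that occur in similar
contexts, its output is far more compressible than the input by simple
follow-on stages such as move-to-front, run-length, and entropy coding,
which is what practical BWT-based compressors exploit.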