
Effros Lab - Research Areas & Current Projects


The Effros Laboratory is focused on applying mathematical tools to problems of information transmission and storage in both technological and biological systems. In technological systems, this work involves demonstrating the limits of what is possible in information storage and communication systems and then working to devise practical new techniques to help us bridge the often enormous gap between what is possible and what today's technology can achieve. In biological systems, this work seeks to understand fundamental questions about how natural systems like our brains accomplish the goals of information storage and transmission and to use this understanding to expand the boundaries of science. At its heart, all of the work done in the Effros Lab relies on an understanding of the mathematics that explains how even seemingly unpredictable individual events achieve a kind of consistency and predictability when viewed in aggregate. The applications listed below are just some of the areas that may benefit from the fundamental research of Professor Effros' group.

Communication in a World of Uncertainty

Today's communication systems face enormous variability. The number of users communicating over a given access point, such as a cell phone tower or WiFi hotspot, can grow enormously from one hour to the next and then drop just as quickly. Wireless channel conditions can change with network congestion, weather, and external interference. Network use -- whether by people or by the devices now scattered about our homes and communities -- is erratic, shifting as our demands on the network change. Each of these types of uncertainty presents its own special challenge. How can we achieve the best possible performance for current network conditions if we don't know what network conditions to expect?

Wireless Communications

Wireless communication systems suffer from a number of difficulties caused by the great system uncertainties to which they are subjected. For example, the noise characteristics of a wireless channel can vary widely and unpredictably. As a user moves through space and time, the interference, the distance to the base station, and the competing network traffic experienced by the wireless unit may all vary enormously. Devising channel coding strategies to deal with these changing channel characteristics presents a wide array of new challenges. For example, as the channel characteristics change, so too does the maximal rate at which information can be reliably communicated across the channel. Thus good channel codes for wireless communication systems must be adaptive, or at least robust, to channel variations.
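To make this moving target concrete, the short sketch below (an illustration added here, not a model from the lab's publications) computes the Shannon capacity of a binary symmetric channel as its crossover probability varies; the channel model and probability values are assumptions chosen only to show how the maximal reliable rate shifts with channel conditions.

import math

def bsc_capacity(p: float) -> float:
    """Capacity in bits per channel use of a binary symmetric
    channel with crossover probability p: C = 1 - H2(p)."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h2

# As the channel degrades (p moves toward 1/2), the maximal reliable
# rate shrinks, so a code tuned to one condition is either wasteful
# or unreliable under another.
for p in (0.01, 0.05, 0.11, 0.25, 0.50):
    print(f"crossover p = {p:.2f}: capacity = {bsc_capacity(p):.3f} bits/use")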

Yet developing good channel codes for wireless communication systems may have consequences not only for channel code design but also for source code design, leading to a variety of joint source-channel coding issues. For example, if the channel coding rate is allowed to vary over time, then the source coding rate must also vary to make maximum use of the available channel coding rate. Multiple resolution source codes are particularly well suited to this application, since they allow for varying rate but achieve that varying rate with a single source code rather than a family of source codes, as would otherwise be required.
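A minimal sketch of the multiple resolution idea, using plain bit-plane (successive approximation) quantization of a value in [0, 1) as a stand-in for a true multiresolution source code; the functions and parameters below are hypothetical, chosen only to show that every prefix of a single encoded bitstream yields a usable, progressively finer reconstruction.

def mr_encode(x: float, depth: int) -> str:
    """Successive-approximation (bit-plane) encoding of x in [0, 1).
    Each additional bit halves the quantization cell."""
    bits, lo, hi = [], 0.0, 1.0
    for _ in range(depth):
        mid = (lo + hi) / 2
        if x >= mid:
            bits.append("1")
            lo = mid
        else:
            bits.append("0")
            hi = mid
    return "".join(bits)

def mr_decode(bits: str) -> float:
    """Decode any prefix of the bitstream; longer prefixes give
    finer reconstructions."""
    lo, hi = 0.0, 1.0
    for b in bits:
        mid = (lo + hi) / 2
        if b == "1":
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

codeword = mr_encode(0.637, depth=8)   # one codeword...
for rate in (2, 4, 8):                 # ...served at three rates
    print(f"{rate} bits: x_hat = {mr_decode(codeword[:rate]):.4f}")

The design point is that the rate can be chosen after encoding: truncating the single codeword is all that rate adaptation requires.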

An alternative example of the interaction between source and channel codes in wireless systems may be seen in wireless data communication. Wireless data systems are typically implemented as packet-based networks in order to allow more users to share the same bandwidth simultaneously. In packet-based networks, information is broken into small pieces, or "packets", each of which is carefully labelled and independently sent over the wireless link. Due to network congestion or interference, some of the packets may be lost in transmission. Designing source codes from which the data can be reconstructed appropriately even when some subset of the packets is never received (or is received too late) at the decoder requires a special form of source code, known as a multiple description source code.
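The toy sketch below illustrates the multiple description principle under strong simplifying assumptions (splitting a signal into even- and odd-indexed samples and concealing lost samples by neighbor repetition); real multiple description codes are considerably more sophisticated, and all names here are hypothetical.

def make_descriptions(samples):
    """Two descriptions: even-indexed and odd-indexed samples,
    carried in separate packets."""
    return samples[0::2], samples[1::2]

def reconstruct(even, odd, n):
    """Rebuild n samples from whichever descriptions arrived;
    a sample from a lost packet is concealed by repeating a
    neighbor (crude interpolation, purely for illustration)."""
    out = [None] * n
    if even is not None:
        out[0::2] = even
    if odd is not None:
        out[1::2] = odd
    for i in range(n):
        if out[i] is None:
            out[i] = out[i - 1] if i > 0 else out[i + 1]
    return out

x = [10, 12, 11, 13, 15, 14]
even, odd = make_descriptions(x)
print(reconstruct(even, odd, len(x)))   # both packets arrive: exact
print(reconstruct(even, None, len(x)))  # odd packet lost: degraded but usable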

Speech Recognition

Many of the techniques used to create robust data compression systems can be generalized to create robust systems for other applications. An ongoing study in the Data Compression Laboratory treats the problem of speech recognition. In particular, the goal of this project is to develop speech recognition systems that perform well despite the enormous inter- and intra-speaker variations that occur in natural speech.

Universal Data Compression

In the area of personal telecommunications, where transmission channel capacity is often severely limited by power and bandwidth constraints, it is particularly important to compress the data maximally before transmission. That is, it is important to compress the data to a bit rate that is as close as possible to the rate-distortion bound R(D), which is fundamentally the lowest possible rate to which the source of data can be compressed without distorting the data by more than some amount D (D may be zero in the case of lossless compression). The rate-distortion bound and the data compression algorithms that achieve it depend on the precise source statistics. For example, the Huffman code, which is a lossless compression scheme, is designed for a particular probability distribution of source symbols. The same can be said for lossy compression schemes or quantizers. Standard implementations of JPEG, for example, match their single bit allocation strategy to the expected class of source statistics. If a lossless code or a quantizer is designed for one source and operated on another, the rate-distortion bound for the source in operation is not achieved. Unfortunately, especially in personal telecommunications, the exact source statistics are almost never known in advance and may indeed be time-varying. Hence it is important to study data compression schemes that are robust or adaptive to the source statistics.
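The mismatch penalty described above can be seen directly in a small experiment. The sketch below (the distributions and helper names are our own illustrative assumptions) builds a Huffman code for one symbol distribution and then measures its expected length when operated on a different distribution; the mismatched rate strictly exceeds the matched rate here.

import heapq

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for the given symbol
    probabilities (standard two-smallest-merge construction)."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1          # merged symbols gain one bit
        heapq.heappush(heap, (p1 + p2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths

def expected_length(lengths, probs):
    return sum(l * p for l, p in zip(lengths, probs))

design = [0.5, 0.25, 0.125, 0.125]   # distribution the code was built for
actual = [0.125, 0.125, 0.25, 0.5]   # distribution in operation
print(f"matched:    {expected_length(huffman_lengths(actual), actual):.3f} bits/symbol")
print(f"mismatched: {expected_length(huffman_lengths(design), actual):.3f} bits/symbol")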

Universal data compression is data compression that asymptotically achieves the rate-distortion bound as more and more data is processed, regardless of the source (within a class of sources). By observing more and more data, universal compressors essentially learn the statistics of the source in operation. The challenge is to carry out this learning so that optimal performance is eventually achieved, with convergence to the optimum at the fastest possible rate.
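As a minimal illustration of this learning behavior, the sketch below assumes a binary i.i.d. source and uses a simple add-one sequential probability estimate as a stand-in for genuinely universal schemes: the ideal per-symbol code length converges toward the source entropy as more data is seen. The source model and parameters are assumptions made for the example.

import math, random

random.seed(1)
p = 0.2                  # true (but unknown to the coder) source bias
counts = [1, 1]          # add-one (Laplace) prior over {0, 1}
total_bits = 0.0
n = 20000

for t in range(1, n + 1):
    x = 1 if random.random() < p else 0
    q = counts[x] / (counts[0] + counts[1])   # sequential estimate of P(x)
    total_bits += -math.log2(q)               # ideal code length for x
    counts[x] += 1
    if t in (100, 1000, n):
        print(f"n = {t:6d}: {total_bits / t:.4f} bits/symbol")

h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(f"source entropy H(0.2) = {h:.4f} bits/symbol")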

Theory

Given a class of source distributions, a universal quantizer is a sequence of codes for which the expected performance on any single source in that class approaches, as the length of the incoming data stream increases to infinity, the rate-distortion bound associated with the source in operation. Work on the theory of universal source codes includes investigations into conditions on their existence and on the rates of convergence of a variety of sequential and nonsequential algorithms.
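One common way to make this definition precise (the notation below is introduced here for illustration and does not follow any single paper): let $\Lambda$ be the class of sources and $R_\theta(D)$ the rate-distortion function of source $\theta \in \Lambda$. A sequence of codes $\{C_n\}$, with $C_n$ mapping source blocks $X^n$ to binary strings and reconstructions $\hat{X}^n$, is universal for $\Lambda$ at distortion $D$ if for every $\theta \in \Lambda$

$$\lim_{n \to \infty} \frac{1}{n} \, \mathbb{E}_{\theta}\!\left[\ell\big(C_n(X^n)\big)\right] = R_\theta(D)
\qquad \text{while} \qquad
\limsup_{n \to \infty} \, \mathbb{E}_{\theta}\!\left[d_n\big(X^n, \hat{X}^n\big)\right] \le D,$$

where $\ell(\cdot)$ counts bits and $d_n$ is the per-letter distortion.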

Lossless Source Coding

Basics

Lossless or "noiseless" source coding is data compression for applications where the data must be represented efficiently but cannot be altered or distorted in the compression process. Data types that typically require lossless coding include text and computer executables, where any variation in the original data string may have catastrophic consequences. Lossless data compression is employed in a wide range of technologies, including data storage devices, modems, and fax machines.

Theory

The goal in studying lossless source coding theory is both to understand the best performance that could conceivably be achieved in lossless source coding and to understand how well current algorithms perform relative to those bounds. Current work in the Data Compression Laboratory focuses on the analysis of new and existing low-complexity universal lossless codes.
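For context, the benchmark in the lossless case is the source entropy. For a memoryless source $X$ with symbol distribution $p$, every uniquely decodable code has expected codeword length at least $H(X)$, and an optimal prefix code (Huffman, for example) comes within one bit of it:

$$H(X) \;\le\; \mathbb{E}\big[\ell(X)\big] \;<\; H(X) + 1,
\qquad H(X) = -\sum_{x} p(x) \log_2 p(x).$$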

Practice

While theoretical analyses are useful for bounding the expected performance of a variety of lossless source codes, they do not tell the full story about the algorithms considered. Work currently under way considers the design and implementation of practical universal lossless source codes and the comparison of their performance on text data with that of competing schemes on the same data sets.