The rapid development and availability of cheap storage and sensing devices have led to a deluge of multi-dimensional data. While the dimensionality of modern datasets is growing at an unprecedented rate, our saving grace is that these data often exhibit low-dimensional structure that can be exploited to reduce storage requirements and, in some cases, reveal relationships amongst large collections of signals.
In order to store, analyze, classify, and organize increasingly large collections of multi-dimensional data, modern signal processing and machine learning methods have moved towards data-driven approaches that aim to learn underlying features or low-dimensional structures present within collections of training data. These approaches typically benefit from nonlinear approximation schemes such as sparse recovery to select a small subset of these learned features (the atoms of a dictionary) to represent a new test signal of interest. The application of sparse recovery to supervised and unsupervised learning demonstrates the power of sparsity when a signal of interest is appropriately matched to the structures in a dictionary.
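As a concrete illustration of this matching step, the sketch below codes a test signal against a small dictionary using orthogonal matching pursuit (OMP), a standard greedy sparse recovery algorithm. The dictionary, signal, and sparsity level are synthetic assumptions chosen for the demonstration, not data or methods specific to this thesis.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily select up to k atoms
    (columns of D), then least-squares fit y on the selected set."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit coefficients on the current support, update residual.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

# Toy setup (assumed): a 2-sparse combination of orthonormal atoms
# is recovered exactly from the 16-dimensional signal y.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((16, 8)))  # 8 orthonormal atoms
x_true = np.zeros(8)
x_true[[1, 5]] = [2.0, -1.5]
y = D @ x_true
x_hat = omp(D, y, k=2)
```

With orthonormal atoms, as here, each greedy selection is guaranteed correct; practical dictionaries are merely incoherent, and recovery then holds under the usual sparse recovery conditions.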
Feature learning plays a crucial role in the success of a wide range of methods for classification and clustering. However, when we possess a large number of examples that we wish to cluster or organize, we can simply use the data as our dictionary, forming sparse representations of signals in an ensemble with respect to the other signals in the ensemble. Just as in sparse representation-based classification/clustering with learned dictionaries, this idea of "self-expression" can be used to classify/cluster large ensembles of data. An interesting consequence of using the data to represent itself is that it yields a provable method for segmenting signals that lie on a union of subspaces.
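The self-expression idea can be sketched in a few lines: each signal is coded using the remaining signals as its dictionary, and the resulting coefficients define an affinity whose connected components segment the union of subspaces. The toy data (two orthogonal lines in R^3) and the 1-sparse greedy selection below are illustrative assumptions; practical methods solve a full sparse recovery problem (e.g., l1 minimization) per signal.

```python
import numpy as np

# Toy ensemble (assumed): three points on each of two orthogonal
# lines (1-D subspaces) through the origin in R^3.
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
X = np.stack([2 * u, -1 * u, 0.5 * u, 3 * v, -2 * v, 1.5 * v], axis=1)
Xn = X / np.linalg.norm(X, axis=0)      # unit-norm columns
n = X.shape[1]

# Self-expression: code each signal against all the others.  A
# 1-sparse greedy pick (most correlated other signal) stands in
# for a full sparse recovery step here.
C = np.zeros((n, n))
for i in range(n):
    corr = np.abs(Xn.T @ Xn[:, i])
    corr[i] = 0.0                       # forbid the trivial x_i = x_i
    j = int(np.argmax(corr))
    C[j, i] = corr[j]

# Symmetrized affinity; its connected components give the segmentation.
W = (np.abs(C) + np.abs(C.T)) > 0
labels = -np.ones(n, dtype=int)
for s in range(n):
    if labels[s] >= 0:
        continue
    stack = [s]
    while stack:                        # flood-fill one component
        a = stack.pop()
        if labels[a] >= 0:
            continue
        labels[a] = s
        stack.extend(np.flatnonzero(W[a]).tolist())
```

Because the two subspaces are orthogonal, every signal selects a neighbor from its own subspace, so the affinity graph splits into one component per subspace; this is the mechanism behind the provable segmentation guarantees mentioned above.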
The goal of this thesis is to explore the use of data-driven sparse recovery to both reveal and exploit low-dimensional structure present in large collections of data. In particular, we develop new geometric insights into the behavior of sparse recovery for revealing local low-rank structure in a collection of data. After introducing this geometric analysis, we introduce a host of novel applications of self-expressive signal recovery that exploit local low-rank structure for compression, clustering, and distributed computing. We conclude with a discussion of how the proposed methods can be used to reduce the wiring complexity and communication within neural networks and neuromorphic circuits. The proposed methods are applied to a number of important imaging applications, including: (i) hyperspectral remote sensing of terrain for crop segmentation and (ii) compression and clustering of light field data arising in 3D scene modeling and understanding.