Graduate and Postdoctoral Studies
High-dimensional and dependent data with additional structure
Tuesday, April 18, 2017
to 3:00 PM
1049 Duncan Hall
The age of computing has enabled the collection of massive amounts of data. These data present numerous statistical challenges, because many data sets are high-dimensional and dependent. While statistical inference for high-dimensional and dependent data is challenging, many data come with additional structure that can be exploited to facilitate statistical inference.
This thesis considers two widely used classes of models for high-dimensional and dependent data: high-dimensional multivariate time series and exponential-family random graph models.
In the case of high-dimensional multivariate time series, the additional structure is often spatial: for example, air pollution is measured at monitors whose geographical locations are known. If air pollutants cannot travel long distances, then the estimation of past-present and present-present dependencies of air pollution at a monitor can be restricted to monitors within a short distance.
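The restriction to short distances can be illustrated with a minimal sketch. It is not the estimator developed in the thesis: it assumes a first-order vector autoregression, a known radius, and ordinary least squares, and it covers only the past-present dependencies. The function name `fit_restricted_var1`, the monitor layout, and all parameter values are hypothetical and chosen for illustration.

```python
import numpy as np

def fit_restricted_var1(X, coords, radius):
    """Least-squares fit of x_t = A x_{t-1} + noise, where row i of A
    may be nonzero only for monitors within `radius` of monitor i.

    X      : (T, p) array, one column per monitor (assumed layout)
    coords : (p, d) array of monitor locations
    radius : assumed known range of dependence
    """
    T, p = X.shape
    # Pairwise Euclidean distances between monitors.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    past, present = X[:-1], X[1:]
    A = np.zeros((p, p))
    for i in range(p):
        # Regress monitor i only on the lagged values of nearby monitors.
        nbrs = np.flatnonzero(dist[i] <= radius)
        coef, *_ = np.linalg.lstsq(past[:, nbrs], present[:, i], rcond=None)
        A[i, nbrs] = coef
    return A

# Hypothetical example: 10 monitors on a line, one unit apart,
# each depending on itself and its immediate neighbours.
rng = np.random.default_rng(0)
p, T = 10, 2000
coords = np.arange(p, dtype=float)[:, None]
A_true = 0.3 * np.eye(p) + 0.1 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = np.zeros((T, p))
for t in range(1, T):
    X[t] = A_true @ X[t - 1] + rng.standard_normal(p)

A_hat = fit_restricted_var1(X, coords, radius=1.0)
```

With the restriction in place, each regression involves at most three coefficients instead of ten, which is the sense in which known spatial structure reduces the number of parameters to estimate.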
Here, a novel two-step estimation approach is proposed to estimate the range of dependence along with the parameters of multivariate time series models in high-dimensional settings. Theoretical results show that the two-step estimation approach reduces statistical error in high-dimensional settings. Simulation results confirm that it reduces both statistical error and computing time. An application to air pollution in the U.S. demonstrates that the two-step estimation approach yields results that are consistent with scientific knowledge, whereas estimation approaches that ignore the spatial structure yield results that conflict with it.
In the case of exponential-family random graph models, additional structure is likewise common: for example, many networks, such as insurgencies and terrorist networks, are known to be local in nature. Here, a novel two-step estimation approach is proposed to estimate the local structure along with the dependence pattern of networks. The proposed two-step estimation approach can be implemented in parallel and hence enables massive-scale estimation of exponential-family random graph models. Theoretical results are provided along with simulation results. An application to a large Amazon product network demonstrates the usefulness of the proposed two-step estimation approach.