Kristoffer Herland Hellton: Consistency in high-dimensional principal component analysis

Kristoffer Herland Hellton (Department of Biostatistics, UiO) will talk about

Consistency in high-dimensional principal component analysis

Abstract

Principal component analysis (PCA) is one of the most widely used dimension reduction techniques, and it remains popular for high-dimensional data, where the number of variables exceeds the sample size. PCA reduces the data to a small set of component scores, which can then be used for visualization and in conventional classification and regression methods.
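The score computation described above can be sketched in a few lines via the singular value decomposition (a minimal illustration, not code from the talk; the dimensions are arbitrary toy values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional data: n = 20 samples, p = 500 variables (p >> n).
n, p = 20, 500
X = rng.standard_normal((n, p))

# Center the columns, then take the SVD; the principal component scores
# are the projections of the rows onto the leading right singular vectors.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                      # keep the first two components
scores = Xc @ Vt[:k].T     # n x k matrix of component scores
print(scores.shape)        # (20, 2)
```

Equivalently, the scores are the left singular vectors scaled by the singular values, U[:, :k] * s[:k].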

However, in the high-dimensional setting, PCA is not asymptotically consistent: the population eigenvalues and eigenvectors are not consistently estimated by their sample counterparts. The resulting asymptotic bias has been investigated by Johnstone and Lu (2009) for fixed population eigenvalues and by Jung and Marron (2009) when the largest eigenvalues depend polynomially on the number of variables. We propose a model for high-dimensional data in which the largest eigenvalues depend linearly on the variable dimension, and we illustrate this with genomics data. The consequences of the linear dependence are investigated for eigenvectors and component scores. We also discuss how the asymptotic bias affects subsequent analyses based on the scores, such as principal component regression.
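The inconsistency of sample eigenvectors can be seen in a small simulation under a single-spike covariance model (a hedged sketch with arbitrary parameter choices, not the model or data from the talk): with the sample size held fixed, the leading sample eigenvector drifts away from the population eigenvector as the number of variables grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def leading_cos(n, p, lam):
    """Draw n samples from a single-spike covariance diag(lam, 1, ..., 1)
    and return |<v_hat, e1>|, the cosine of the angle between the leading
    sample eigenvector and the true population eigenvector e1."""
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(lam)   # inject the spike along the first coordinate
    # Leading right singular vector of X = leading eigenvector of X'X / n.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return abs(Vt[0, 0])

n, lam = 50, 5.0
low = leading_cos(n, 10, lam)      # p small: estimate close to the truth
high = leading_cos(n, 2000, lam)   # p >> n: estimate degraded
print(f"p=10:   |cos angle| = {low:.2f}")
print(f"p=2000: |cos angle| = {high:.2f}")
```

With a fixed spike and p growing much faster than n, the cosine printed for the high-dimensional case is markedly smaller than in the low-dimensional case, reflecting the asymptotic bias discussed in the abstract.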

Published Nov. 5, 2012 2:29 PM - Last modified Feb. 12, 2013 2:47 PM