Comparative analysis of principal components can be misleading

Last week we read a paper that used principal components analysis (PCA), so this week we will discuss how this type of analysis can mislead inferences. Friday the 25th: "Comparative Analysis of Principal Components Can be Misleading" (Uyeda et al. 2015).


Most existing methods for modeling trait evolution are univariate, although researchers are often interested in investigating evolutionary patterns and processes across multiple traits. Principal components analysis (PCA) is commonly used to reduce the dimensionality of multivariate data so that univariate trait models can be fit to individual principal components. The problem with using standard PCA on phylogenetically structured data has been previously pointed out, yet it continues to be widely used in the literature. Here we demonstrate precisely how using standard PCA can mislead inferences: the first few principal components of traits evolved under constant-rate multivariate Brownian motion will appear to have evolved via an “early burst” process. A phylogenetic PCA (pPCA) has been proposed to alleviate these issues. However, when the true model of trait evolution deviates from the model assumed in the calculation of the pPCA axes, we find that the use of pPCA suffers from similar artifacts as standard PCA. We show that data sets with high effective dimensionality are particularly likely to lead to erroneous inferences. Ultimately, all of the problems we report stem from the same underlying issue: by considering only the first few principal components as univariate traits, we are effectively examining a biased sample of a multivariate pattern. These results highlight the need for truly multivariate phylogenetic comparative methods. As these methods are still being developed, we discuss potential alternative strategies for using and interpreting models fit to univariate axes of multivariate data.
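For the discussion, the central claim of the abstract can be explored with a small simulation. The sketch below is not the authors' code or their analysis. It assumes a balanced tree with unit branch lengths, independent traits with equal rates, and a crude diagnostic chosen here for illustration: the mean squared difference between sister tips, divided by twice the variance among all tips. All function names (balanced_tree_cov, recent_fraction, simulate) are hypothetical. A low value of the diagnostic means that little of the variation arose on recent branches, which is the pattern an early burst model would be invoked to explain.

```python
import numpy as np

def balanced_tree_cov(depth):
    """Brownian-motion covariance among the tips of a balanced binary tree.

    Assumption: every branch has length 1, so the tree height equals depth.
    """
    n = 2 ** depth
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Shared root-to-tip path length = number of common leading bits
            # in the depth-bit labels of tips i and j.
            C[i, j] = depth - (i ^ j).bit_length()
    return C

def recent_fraction(values):
    """Mean squared sister-tip difference over twice the among-tip variance.

    Tips 2k and 2k+1 are sisters in balanced_tree_cov.
    """
    sister_sq = np.mean((values[0::2] - values[1::2]) ** 2)
    return sister_sq / (2.0 * np.var(values, ddof=1))

def simulate(depth=5, n_traits=20, n_reps=200, seed=1):
    """Average the diagnostic for a raw trait, PC1, and the last PC."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(balanced_tree_cov(depth))
    totals = np.zeros(3)
    for _ in range(n_reps):
        # Independent, equal-rate Brownian motion traits: each column ~ N(0, C).
        X = L @ rng.standard_normal((L.shape[0], n_traits))
        # Standard (non-phylogenetic) PCA via SVD of the column-centered data.
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        scores = Xc @ Vt.T
        totals += [recent_fraction(X[:, 0]),
                   recent_fraction(scores[:, 0]),
                   recent_fraction(scores[:, -1])]
    return totals / n_reps

raw, pc1, pc_last = simulate()
print(f"raw trait: {raw:.3f}  PC1: {pc1:.3f}  last PC: {pc_last:.3f}")
```

Under these assumptions one would expect PC1 to show a smaller value than a raw trait, and the last PC a larger one, even though every trait evolved at a constant rate. This diagnostic is only a stand-in; it is not a fitted early burst model, and comparing model fits would require a proper comparative-methods package.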

Published Sep. 23, 2015 9:15 AM - Last modified Sep. 23, 2015 9:15 AM