Robust approaches to principal component analysis for high-dimensional and directional data
Date: 16 October 2020, Friday
Time: 4pm
Speaker: Prof. Inge Koch (The University of Western Australia)
Abstract: Principal component analysis (PCA) is a widespread tool for selecting a smaller number of dimensions and key features in multivariate and high-dimensional data. More recently, a number of variants of PCA have been developed, including sparse PCA for high-dimensional data and robust PCA. In this talk we focus on PCA developments for multivariate and high-dimensional directional random vectors and data which have been transformed to live on the surface of the d-dimensional sphere. These random vectors are also known as spatial signs. For directional random vectors we review robust covariance-related matrices, including the sign and rank covariance matrices, present theoretical results for them, and relate them to the canonical population covariance matrix. For random vectors and data from the elliptic distribution we point out relationships between these robust population covariance matrices and their sample counterparts. For non-elliptic data, much less is known at either the population level or the sample level about the behaviour of these various covariance matrices. We begin with sample versions of the robust covariance matrices, and show the relationships between them and between sample and corresponding population quantities. We complement these comparisons with calculations based on real data and simulated data ranging from multivariate Gaussian and skew-normal to bimodal and data with high kurtosis and outliers. For such data we study the behaviour of the first few eigenvectors and calculate the closeness of eigenvectors arising from different robust covariance matrices. For simulated data we also calculate their closeness to the eigenvector of the population covariance matrix for a range of dimensions as the sample size increases.
Our findings show that kurtosis is a key feature which affects the closeness of the sample eigenvectors to those of the population and we suggest criteria based on the amount of kurtosis which may provide a guide to choosing the 'best' sample covariance to use for particular datasets.
Link: https://au.bbcollab.com/guest/fcf219c74ac743e89565a9e6e8d349a9