Principal Component Analysis (PCA) can be thought of as a method of dimension reduction. A set of variables is analysed to reveal the major dimensions of variation. The original data set can then be replaced by a new, smaller data set with minimal loss of information, which also reveals the relationships among the variables. (IBM SPSS)
Principal component analysis seeks the standardized linear combinations of the original variables that have maximal variance. A few such linear combinations can be used to summarize the data, losing as little information as possible in the process.
It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences.
PCA is a powerful tool for analyzing data and is used in image compression.
In PCA, the orthogonal axes are re-oriented towards the directions of greatest variation to avoid redundancy, and the level of variation along each axis is measured. The level of variation decreases from the first principal component onwards: the first component captures the most variation, the second, third and later components capture progressively less, and the last one is essentially noise.
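The decreasing level of variation can be illustrated with a minimal sketch. It assumes NumPy and uses made-up synthetic data; the function name explained_variance_ratio is a hypothetical helper, not part of any source cited here. The share of variance per component is taken from the eigenvalues of the covariance matrix, sorted from largest to smallest.

```python
import numpy as np

# Hypothetical synthetic data: two variables driven by one latent factor,
# plus a third variable that is almost pure noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    0.1 * rng.normal(size=(200, 1)),
])


def explained_variance_ratio(X):
    """Return the share of total variance carried by each principal component."""
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order, so reverse them.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    return eigvals / eigvals.sum()


ratios = explained_variance_ratio(X)
print(ratios)  # first entry is the largest, later entries shrink
```

With data like this, the first component should carry nearly all of the variance, and the trailing components mostly reflect the added noise.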
The main objective of principal component analysis is to reduce the original set of variables to a smaller set of components that are not correlated.
These new components represent most of the information in the original variables and are useful when working with a large number of variables. By reducing the dimensions, one can interpret a few components rather than a large number of variables.
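That the new components are uncorrelated can be checked numerically. The sketch below assumes NumPy and two made-up, strongly correlated variables; pca_scores is a hypothetical helper name. It compares the correlation between the original variables with the correlation between the component scores.

```python
import numpy as np


def pca_scores(X):
    """Project mean-centred data onto the eigenvectors of its covariance matrix."""
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    return centered @ eigvecs[:, order]


# Hypothetical data: the second variable depends linearly on the first.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
X = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=300)])

scores = pca_scores(X)
corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
print(corr_before, corr_after)  # strong correlation before, close to zero after
```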
PCA works on the assumption that numeric variables have a linear relationship.
CHARACTERISTICS OF PRINCIPAL COMPONENT ANALYSIS
WORKING OUT PRINCIPAL COMPONENT ANALYSIS
Get some data, e.g. a 2-dimensional data set that can be plotted
Subtract the mean from each dimension: the mean of X is subtracted from every X value and the mean of Y from every Y value, so that the resulting data set has mean = 0
Calculate the covariance matrix, then its eigenvectors and eigenvalues
Compress the data by projecting the mean-subtracted data onto the chosen eigenvectors, then plot the result.
Fig: PCA example data, original data on the left, data with subtracted means on the right
In general, the eigenvector with the highest eigenvalue is the principal component of the data set. Once the eigenvectors have been obtained from the covariance matrix and ranked by eigenvalue from highest to lowest, one can omit the less significant ones.
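The steps above can be sketched in a few lines. This is a minimal illustration assuming NumPy, not a definitive implementation; the small 2-dimensional data set and the function name pca_reduce are made up for the example.

```python
import numpy as np


def pca_reduce(data, n_components):
    """Reduce data to n_components dimensions and map it back again."""
    mean = data.mean(axis=0)
    centered = data - mean                      # subtract the mean per dimension
    cov = np.cov(centered, rowvar=False)        # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]           # rank from highest to lowest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    kept = eigvecs[:, :n_components]            # omit the less significant ones
    reduced = centered @ kept                   # compressed representation
    restored = reduced @ kept.T + mean          # approximate original data
    return reduced, restored, eigvals


# Illustrative 2-dimensional data (hypothetical values).
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

reduced, restored, eigvals = pca_reduce(data, n_components=1)
print(eigvals)         # variance along each principal axis, largest first
print(reduced.shape)   # one value per observation instead of two
```

Keeping both components instead of one reproduces the original data exactly; dropping the component with the smaller eigenvalue discards only the variation along that axis.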
RELATIONSHIP BETWEEN FACTOR ANALYSIS AND PRINCIPAL COMPONENT ANALYSIS
Factor analysis, like principal component analysis, is an attempt to explain a set of data in a smaller number of dimensions than one starts with.
Both are variable reduction methods that can be used to identify groups of observed variables that tend to hang together empirically.
They can be performed with the SAS system’s Factor procedure and they sometimes even provide very similar results.
September 23, 2019