admin    August 18, 2021


Principal Component Analysis (PCA) can be thought of as a method of dimension reduction. It analyses a set of variables to reveal the major dimensions of variation. The original data set can be replaced by a new, smaller data set with minimal loss of information. The analysis also reveals the relationships among the variables. (IBM SPSS)

Principal component analysis seeks the standardized linear combinations of the original variables that have maximal variance. A few of these linear combinations can then be used to summarize the data, losing as little information as possible in the process.

It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences.

PCA is a powerful tool for analyzing data and is used in image compression.

In PCA, the orthogonal axes are re-oriented towards the directions of greatest variation to avoid redundancy, and the level of variation along each axis is measured. The level of variation changes as one moves outwards from the first principal component: the first component captures the strongest variation, which decreases progressively through the second, third and later components, and the last components are often little more than noise.

The main objective of principal component analysis is to reduce the original set of variables to a smaller number of components that are not correlated with one another.

These new components represent most of the information in the original variables and are useful when working with a large number of variables. By reducing the dimensions, one can interpret a few components rather than a large number of variables.

PCA works on the assumption that numeric variables have a linear relationship.


  • The new variables are linear combinations of the old ones.
  • The new axes are orthogonal and oriented in directions of maximum variation.
  • The new variables account for the same total variance, but in decreasing proportions.
  • If explanation of less than 100% of the total variation is acceptable, one can drop the less relevant new variables.
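
The four properties above can be written compactly. This is a sketch in standard textbook notation that does not appear in the original text: x is the vector of the p original variables, S is its covariance matrix, and lambda_1, ..., lambda_p are the eigenvalues of S with unit-length eigenvectors a_1, ..., a_p.

```latex
\begin{align}
z_j &= a_j^{\top} x, \qquad j = 1, \dots, p
  && \text{(linear combinations of the old variables)} \\
a_j^{\top} a_k &= \begin{cases} 1 & j = k \\ 0 & j \neq k \end{cases}
  && \text{(orthogonal axes)} \\
\operatorname{Var}(z_j) &= \lambda_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0
  && \text{(decreasing proportions)} \\
\sum_{j=1}^{p} \operatorname{Var}(z_j) &= \sum_{j=1}^{p} \lambda_j = \operatorname{tr}(S) = \sum_{i=1}^{p} \operatorname{Var}(x_i)
  && \text{(same total variance)} \\
\text{proportion}(k) &= \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{p} \lambda_j}
  && \text{(variance kept by the first $k$ components)}
\end{align}
```

Dropping the less relevant new variables therefore means choosing k < p so that proportion(k) is still acceptably close to 1.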


Step I

Get some data, e.g. a set of 2-dimensional points that can be plotted

Step II

Subtract the mean from each of the dimensions: the mean of X is subtracted from every X value and the mean of Y from every Y value, so that the resulting data set has mean = 0

Step III

Calculate the covariance matrix

Step IV

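Calculate the eigenvectors and eigenvalues of the covariance matrix, then compress the data by projecting the mean-subtracted data onto the chosen eigenvectors and plotting the result.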
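
The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the data values are made up, the helper names (`mean`, `covariance`, `eigen_2x2_symmetric`) are hypothetical, and the closed-form eigen-decomposition only works because the example is 2-dimensional. For more dimensions one would normally rely on a linear algebra library.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    # Sample covariance of two already mean-centred lists (divides by n - 1).
    return sum(p * q for p, q in zip(a, b)) / (len(a) - 1)


def eigen_2x2_symmetric(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]], largest eigenvalue first."""
    mid = (a + c) / 2.0
    half_gap = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = mid + half_gap, mid - half_gap
    if b == 0.0:
        # Matrix is already diagonal: the axes themselves are the eigenvectors.
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        # (b, lam1 - a) solves (A - lam1 * I) v = 0; normalise to unit length.
        norm = math.hypot(b, lam1 - a)
        v1 = (b / norm, (lam1 - a) / norm)
    v2 = (-v1[1], v1[0])  # the second eigenvector is orthogonal to the first
    return (lam1, v1), (lam2, v2)


# Step I: some 2-dimensional data (hypothetical values, for illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.5, 1.9, 3.2, 3.8, 5.1, 5.5]

# Step II: subtract the mean of each dimension so the data has mean 0.
x_mean, y_mean = mean(xs), mean(ys)
xc = [x - x_mean for x in xs]
yc = [y - y_mean for y in ys]

# Step III: entries of the 2 x 2 covariance matrix.
cov_xx = covariance(xc, xc)
cov_xy = covariance(xc, yc)
cov_yy = covariance(yc, yc)

# Step IV: eigenvectors of the covariance matrix, then compression by
# projecting the centred data onto the first eigenvector only.
(lam1, v1), (lam2, v2) = eigen_2x2_symmetric(cov_xx, cov_xy, cov_yy)
scores = [x * v1[0] + y * v1[1] for x, y in zip(xc, yc)]

# Approximate reconstruction of the original points from the 1-D scores.
approx = [(s * v1[0] + x_mean, s * v1[1] + y_mean) for s in scores]

print("eigenvalues:", lam1, lam2)
print("first principal direction:", v1)
```

Here `scores` is the compressed, one-dimensional version of the data, and `approx` shows how much of the original plot survives when the second component is dropped.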


Fig: PCA example data, original data on the left, data with subtracted means on the right

In general, the eigenvector with the highest eigenvalue is the principal component of the data set. Once the eigenvectors have been obtained from the covariance matrix and ranked by eigenvalue from highest to lowest, one can omit the less significant components.
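
The ranking and the decision about what to omit can be sketched as follows. The eigenvalues are hypothetical, the function names `rank_components` and `components_to_keep` are made up for this illustration, and the 95% target is an arbitrary assumption rather than a recommended cutoff.

```python
def rank_components(eigenvalues):
    """Return (original_index, eigenvalue, share_of_variance), highest eigenvalue first."""
    total = sum(eigenvalues)
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    return [(i, eigenvalues[i], eigenvalues[i] / total) for i in order]


def components_to_keep(eigenvalues, target=0.95):
    """Smallest number of top-ranked components whose cumulative share reaches target."""
    cumulative = 0.0
    for count, (_, _, share) in enumerate(rank_components(eigenvalues), start=1):
        cumulative += share
        if cumulative >= target:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues of a 4 x 4 covariance matrix, in no particular order.
eigenvalues = [0.4, 2.9, 0.1, 1.6]

ranked = rank_components(eigenvalues)
k = components_to_keep(eigenvalues, target=0.95)

for index, value, share in ranked:
    print(f"component from eigenvector {index}: eigenvalue={value}, share={share:.2f}")
print("components kept:", k)
```

The components after the first `k` in `ranked` are the ones that would be omitted.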




Factor analysis, like principal component analysis, is an attempt to explain a set of data in a smaller number of dimensions than one starts with.

Both are variable reduction methods that can be used to identify groups of observed variables that tend to hang together empirically.

Both can be performed with the SAS System's FACTOR procedure, and they sometimes even give very similar results.


  • Factor analysis assumes that the covariation in the observed variables is due to the presence of one or more latent variables that exert causal influence on these observed variables, while PCA works only on the assumption that numeric variables have a linear relationship.
  • Principal component analysis is merely a transformation of the data, with no assumptions made about the form of the covariance matrix from which the data comes, while factor analysis supposes that the data comes from a well-defined model in which the underlying factors satisfy that model's assumptions.
  • In principal component analysis, the emphasis is on a transformation from the observed variables to the principal components, whereas in factor analysis the emphasis is on a transformation from the underlying factors to the observed variables.
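
The contrast in the three points above can be sketched with the standard textbook forms of the two models. The notation is an assumption made for illustration and does not come from the original text: x is the vector of observed variables with mean mu, A holds the eigenvectors of the covariance matrix Sigma, f is the vector of latent factors, Lambda is the matrix of factor loadings, and epsilon holds the variable-specific errors. The form shown for factor analysis is the common orthogonal factor model; other variants exist.

```latex
\begin{align}
\text{PCA:} \quad & z = A^{\top}(x - \mu)
  && \text{(components computed from the observed variables)} \\
\text{Factor analysis:} \quad & x - \mu = \Lambda f + \varepsilon
  && \text{(observed variables generated by the factors)} \\
& \operatorname{Cov}(f) = I, \quad
  \operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)}, \quad
  \operatorname{Cov}(f, \varepsilon) = 0 \\
& \Rightarrow \ \Sigma = \Lambda \Lambda^{\top} + \Psi
\end{align}
```

The last line is the structure that factor analysis imposes on the covariance matrix. PCA imposes no comparable structure, which is why it is described as merely a transformation of the data.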