Principal Component Analysis ( PCA) using Scikit

Principal Component Analysis ( PCA) is generally used as an unsupervised algorithm for reducing the data dimensions to address Curse of Dimensionality, detecting outliers, removing noise, speech recognition and other such areas.

The underlying algorithm in PCA is generally a linear algebra technique called Singular Value Decomposition (SVD). PCAs take the original data and create orthogonal components (uncorrelated components) that capture the information contained in the original data however with significantly less number of components.

Either the components themselves or  key loading of the components can be plugged in any further modeling work, rather than the original data to minimize information redundancy and noise.

There are three main ways to select the right number of components-

  1. Number of components should explain at least 80% of the original data variance or information [Preferred One]
  2. Eigen value of each PCA component should be more than or equal to 1. This means that they should express at least one variable worth of information
  3. Elbow or Scree method- look for the elbow in the percentage of variance explained by each components and select the components where an elbow or kink is visible.

You can use any one of the above or combination of the above to select the right number of components. It is very critical to standardize or normalize data before conducting PCA.

In the below case study we will use the first criterion shown above, i.e. 80% or more of the original data variance should be explained by the selected number of components.

PCA1PCA2PCA3PCA4PCA5PCA6

2 thoughts on “Principal Component Analysis ( PCA) using Scikit

  1. Pingback: Linear Discriminant Analysis ( LDA) with Scikit | RP's Blog on data science

  2. Pingback: Learn Data Science using Python Step by Step | RP's Blog on data science

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s