Linear Discriminant Analysis (LDA) is similar to Principal Component Analysis (PCA) in that both reduce dimensionality. However, LDA differs from PCA in several ways that we should be aware of:
- LDA is supervised: it needs a categorical dependent variable, and it finds the linear combinations of the original variables that give the maximum separation among the different groups. PCA, on the other hand, is unsupervised.
- LDA can also be used for classification, whereas PCA is generally used for unsupervised learning.
- LDA doesn’t need the number of discriminants to be passed in ahead of time. Generally speaking, the number of discriminants will be the lower of the number of variables and the number of categories minus 1.
- LDA is less sensitive to the scale of the variables than PCA and, in certain cases, can be conducted without standardizing or normalizing them.
- LDA is often preferred for bigger data sets and machine learning work.
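The rule of thumb for the number of discriminants can be checked numerically. The following is a minimal sketch rather than a full LDA implementation: it assumes NumPy, uses synthetic Gaussian data, and the helper name `lda_directions` is hypothetical. It builds the within-class and between-class scatter matrices and counts how many discriminant directions carry any separation.

```python
import numpy as np

def lda_directions(X, y, tol=1e-8):
    """Hypothetical helper: discriminant directions from scatter matrices."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Directions that maximize between-class relative to within-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only directions whose eigenvalue is not numerically zero
    keep = eigvals > tol * eigvals[0]
    return eigvecs[:, keep], eigvals[keep]

# Synthetic data (an assumption for illustration): 3 groups, 5 variables
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 50, 5, 3
class_means = rng.normal(scale=4.0, size=(n_classes, n_features))
X = np.vstack([
    rng.normal(loc=class_means[k], scale=1.0, size=(n_per_class, n_features))
    for k in range(n_classes)
])
y = np.repeat(np.arange(n_classes), n_per_class)

W, eigvals = lda_directions(X, y)
print("discriminants found:", W.shape[1])
print("min(variables, categories - 1):", min(n_features, n_classes - 1))
```

The between-class scatter matrix is built from the group means, which is why the number of useful directions cannot exceed the number of categories minus 1, no matter how many variables there are.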
Let the action begin now:
Principal Component Analysis (PCA) is an unsupervised algorithm generally used for reducing data dimensions to address the curse of dimensionality, as well as for detecting outliers, removing noise, speech recognition and other such areas.
The underlying algorithm in PCA is generally a linear algebra technique called Singular Value Decomposition (SVD). PCA takes the original data and creates orthogonal (uncorrelated) components that capture the information contained in the original data, but with significantly fewer components.
Either the components themselves or the key loadings of the components can be plugged into any further modeling work in place of the original data, to minimize information redundancy and noise.
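To make the SVD connection concrete, here is a minimal sketch of PCA built directly on NumPy's SVD. The data is synthetic and the function name `pca_svd` is hypothetical; a library implementation would handle more edge cases. The sketch shows that the component directions are orthogonal and that the resulting scores are uncorrelated.

```python
import numpy as np

def pca_svd(X, n_components):
    """Hypothetical helper: PCA via SVD of the mean-centered data."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]        # rows are the principal directions
    scores = X_centered @ components.T    # data expressed in the new components
    explained_variance = S**2 / (X.shape[0] - 1)
    return components, scores, explained_variance

# Synthetic data (an assumption for illustration): 4 correlated columns
# generated from 2 underlying factors plus a little noise
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 4))

components, scores, explained_variance = pca_svd(X, n_components=2)

# Orthogonality of the directions: this product should be the identity matrix
print(np.round(components @ components.T, 6))
# Uncorrelated scores: the off-diagonal covariance should be close to zero
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))
```

The `scores` array is what would be plugged into further modeling work in place of the original four columns.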
There are three main ways to select the right number of components:
- The selected components together should explain at least 80% of the original data variance or information [Preferred One]
- The eigenvalue of each retained component should be greater than or equal to 1. On standardized data, this means that each component should express at least one variable's worth of information
- Elbow or scree method: look at the percentage of variance explained by each component and keep the components up to the point where an elbow or kink is visible.
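You can use any one of the above, or a combination of them, to select the right number of components. It is critical to standardize or normalize the data before conducting PCA: PCA is driven by variance, so variables measured on larger scales would otherwise dominate the components.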
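The selection criteria can be sketched in a few lines. This is an illustrative example under stated assumptions, not the case study's code: it assumes NumPy, synthetic data, and the hypothetical helper names `standardize` and `component_selection`. It standardizes the data, then applies the 80% variance rule and the eigenvalue rule, and prints the per-component variance shares that one would inspect for an elbow.

```python
import numpy as np

def standardize(X):
    """Scale each column to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def component_selection(X, variance_threshold=0.80):
    """Hypothetical helper: apply the variance and eigenvalue criteria."""
    Z = standardize(X)
    # Covariance of standardized data equals the correlation matrix of X
    corr = np.cov(Z, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # sorted largest first
    ratio = eigvals / eigvals.sum()            # share of variance per component
    cumulative = np.cumsum(ratio)
    # Smallest number of components whose cumulative share reaches the threshold
    n_by_variance = int(np.searchsorted(cumulative, variance_threshold) + 1)
    # Components with eigenvalue of at least 1
    n_by_eigenvalue = int(np.sum(eigvals >= 1.0))
    return eigvals, ratio, cumulative, n_by_variance, n_by_eigenvalue

# Synthetic data (an assumption for illustration): 6 columns on very
# different scales, driven by 2 underlying factors plus noise
rng = np.random.default_rng(7)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 6))
scales = np.array([1.0, 10.0, 100.0, 0.5, 5.0, 50.0])
X = (latent @ mixing + 0.5 * rng.normal(size=(300, 6))) * scales

eigvals, ratio, cumulative, n_by_variance, n_by_eigenvalue = component_selection(X)
print("variance share per component:", np.round(ratio, 3))  # inspect for an elbow
print("cumulative share:", np.round(cumulative, 3))
print("components for 80% variance:", n_by_variance)
print("components with eigenvalue >= 1:", n_by_eigenvalue)
```

The elbow method is left as a visual check here, since deciding where the kink sits is a judgment call rather than a fixed formula.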
In the case study below we will use the first criterion above, i.e. the selected components should explain 80% or more of the original data variance.