0. Problem setup
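Suppose we observe $n$ data points $x_1, \dots, x_n \in \mathbb{R}^p$.
We write the data matrix as
$$X = \begin{pmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{pmatrix} \in \mathbb{R}^{n \times p},$$
where each row corresponds to one observation and each column corresponds to one variable.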
Principal Component Analysis (PCA) is a dimension reduction method that represents high-dimensional data through a small number of orthogonal directions that preserve as much variation as possible.
To do this, PCA finds a unit direction $v \in \mathbb{R}^p$, $\|v\| = 1$, such that the projected values
$$v^\top x_1, \dots, v^\top x_n$$
are as spread out as possible. A direction with larger projected variance captures more variation in the data.
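The idea of comparing directions by projected variance can be sketched numerically. The snippet below is a minimal illustration, not part of the original notes: the synthetic data, the helper name `projected_variance`, and the use of the $1/(n-1)$ sample variance are all assumptions made for the example.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 500 points in 2-D,
# stretched along the first coordinate axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])

def projected_variance(X, v):
    """Sample variance of the rows of X projected onto the direction v."""
    v = v / np.linalg.norm(v)        # normalize so only the direction matters
    return np.var(X @ v, ddof=1)     # ddof=1 gives the 1/(n-1) sample variance

var_along_first = projected_variance(X, np.array([1.0, 0.0]))
var_along_second = projected_variance(X, np.array([0.0, 1.0]))
print(var_along_first, var_along_second)
```

In this toy setting the first coordinate axis carries far more projected variance than the second, so it is the better one-dimensional summary of the data.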
1. Population PCA
Let $x \in \mathbb{R}^p$ be a random vector with covariance matrix $\Sigma$.
First principal component direction
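The first population principal component direction is defined as
$$v_1 = \arg\max_{\|v\| = 1} \operatorname{Var}(v^\top x) = \arg\max_{\|v\| = 1} v^\top \Sigma v.$$
Thus, $v_1$ is the unit direction that maximizes the variance of the projection of $x$ onto $v$.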
Subsequent principal component directions
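Similarly, each subsequent principal component direction is obtained by maximizing the variance of the projection of $x$ onto that direction, subject to being orthogonal to the previously chosen directions.
For $k = 2, \dots, p$,
$$v_k = \arg\max_{\|v\| = 1,\; v \perp v_1, \dots, v_{k-1}} v^\top \Sigma v.$$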
Therefore, PCA gives a sequence of mutually orthogonal directions ordered by decreasing projected variance.
2. Eigenvalue decomposition
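The solutions to the above variance maximization problems are obtained from the eigenvalue decomposition of the covariance matrix $\Sigma$.
We can write
$$\Sigma = V \Lambda V^\top = \sum_{k=1}^{p} \lambda_k v_k v_k^\top,$$
where
$$V = [v_1, \dots, v_p], \quad V^\top V = I_p, \quad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p), \quad \lambda_1 \ge \dots \ge \lambda_p \ge 0.$$
The eigenvectors $v_k$ are the population principal component directions, and each eigenvalue $\lambda_k$ gives the variance of the projection of $x$ onto $v_k$.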
Lagrangian formulation
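To see why the PCA directions are eigenvectors, consider the first principal component problem
$$\max_{v} \; v^\top \Sigma v \quad \text{subject to} \quad v^\top v = 1.$$
Using the constraint $v^\top v = 1$, define the Lagrangian
$$L(v, \lambda) = v^\top \Sigma v - \lambda (v^\top v - 1).$$
Taking the derivative with respect to $v$ and setting it equal to zero gives
$$\frac{\partial L}{\partial v} = 2 \Sigma v - 2 \lambda v = 0.$$
Therefore,
$$\Sigma v = \lambda v.$$
Hence, the optimizer must be an eigenvector of $\Sigma$, and the corresponding Lagrange multiplier $\lambda$ is the associated eigenvalue.
For a unit eigenvector $v$, we have
$$v^\top \Sigma v = v^\top (\lambda v) = \lambda v^\top v = \lambda.$$
Thus,
$$\max_{\|v\| = 1} v^\top \Sigma v = \lambda_1.$$
The first principal component direction is the eigenvector $v_1$ corresponding to the largest eigenvalue $\lambda_1$, and the maximum projected variance is $\lambda_1$.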
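Repeating this procedure under orthogonality constraints gives the remaining eigenvectors. Therefore,
$$\Sigma v_k = \lambda_k v_k, \quad k = 1, \dots, p,$$
with
$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0.$$
The $k$-th eigenvalue can be interpreted as the variance of the projected random variable along the $k$-th principal component direction: $\lambda_k = \operatorname{Var}(v_k^\top x)$.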
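The link between eigenvectors and projected variance can be checked numerically. The sketch below uses a hypothetical $2 \times 2$ covariance matrix chosen only for illustration. It relies on `numpy.linalg.eigh`, which returns eigenvalues in ascending order, so they are reordered to match the decreasing convention used here.

```python
import numpy as np

# Hypothetical covariance matrix (not from the notes); eigenvalues are 3 and 1.
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]          # reorder to decreasing eigenvalues
lam = eigvals[order]
V = eigvecs[:, order]                      # columns are unit eigenvectors

# Quadratic form v_k' Sigma v_k for each eigenvector: should equal lambda_k.
quad = np.array([V[:, k] @ Sigma @ V[:, k] for k in range(Sigma.shape[0])])

# No random unit direction should beat the top eigenvalue.
rng = np.random.default_rng(0)
U = rng.normal(size=(1000, 2))
U /= np.linalg.norm(U, axis=1, keepdims=True)
random_var = np.einsum("ij,jk,ik->i", U, Sigma, U)   # u_i' Sigma u_i per row

print(lam, quad, random_var.max())
```

The random search is only a sanity check under these assumptions. The Lagrangian argument is what establishes that the maximum is attained at the leading eigenvector.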
3. Sample PCA
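In practice, the population covariance matrix $\Sigma$ is unknown, so we use the sample covariance matrix $S$ instead.
Let
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
be the sample mean. The sample covariance matrix is
$$S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top.$$
Equivalently, if $X_c = X - \mathbf{1}_n \bar{x}^\top$ denotes the centered data matrix, then
$$S = \frac{1}{n-1} X_c^\top X_c.$$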
Sample principal component directions
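The first sample principal component direction is
$$\hat{v}_1 = \arg\max_{\|v\| = 1} v^\top S v.$$
For $k = 2, \dots, p$,
$$\hat{v}_k = \arg\max_{\|v\| = 1,\; v \perp \hat{v}_1, \dots, \hat{v}_{k-1}} v^\top S v.$$
The $k$-th sample principal component direction $\hat{v}_k$ is obtained as the $k$-th eigenvector of $S$, and the corresponding sample eigenvalue is
$$\hat{\lambda}_k = \hat{v}_k^\top S \hat{v}_k.$$
This value is the sample variance of the data projected onto the direction $\hat{v}_k$.
From an estimation point of view, $\hat{v}_k$ and $\hat{\lambda}_k$ estimate the population quantities $v_k$ and $\lambda_k$, respectively.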
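The sample version can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `sample_pca`, the synthetic data, and the $1/(n-1)$ normalization of the sample covariance are assumptions made for the example.

```python
import numpy as np

def sample_pca(X):
    """Return (eigenvalues, eigenvectors, S) with eigenvalues in decreasing order.

    X is an n x p data matrix with one observation per row.
    Assumes the 1/(n-1) normalization for the sample covariance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # center each column
    S = Xc.T @ Xc / (n - 1)                 # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)    # ascending order for symmetric S
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], S

# Synthetic data (hypothetical): three variables with different spreads.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([2.0, 1.0, 0.5])

lam_hat, V_hat, S = sample_pca(X)
print(lam_hat)
```

Note that each eigenvector is only determined up to sign, so two correct implementations may return directions that differ by a factor of $-1$.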
4. Principal component scores and scree values
PC scores
Assume that the data matrix $X$ has been centered. Let $\hat{v}_k$ be the $k$-th sample principal component direction. The $k$-th PC score vector is defined as
$$z_k = X \hat{v}_k \in \mathbb{R}^n.$$
Equivalently, the $i$-th entry of $z_k$ is
$$z_{ik} = x_i^\top \hat{v}_k,$$
which is the coordinate of the $i$-th observation after projection onto the $k$-th principal component direction.
If the first $q$ principal component directions are used, the score matrix is
$$Z_q = X \hat{V}_q = [z_1, \dots, z_q] \in \mathbb{R}^{n \times q}, \quad \hat{V}_q = [\hat{v}_1, \dots, \hat{v}_q].$$
A score plot usually displays two score vectors, such as $(z_1, z_2)$, as a two-dimensional scatter plot. It is used to explore low-dimensional patterns in the data, such as clusters, outliers, or separation between groups.
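Computing the scores amounts to one matrix product once the directions are available. The sketch below uses synthetic data, illustrative variable names, and assumes the $1/(n-1)$ normalization. It also checks two facts that follow from the eigen-decomposition: each score vector has sample variance equal to its eigenvalue, and different score vectors are uncorrelated.

```python
import numpy as np

# Synthetic correlated data (hypothetical): 100 observations, 4 variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

Xc = X - X.mean(axis=0)                     # center the data matrix
S = Xc.T @ Xc / (Xc.shape[0] - 1)           # sample covariance, 1/(n-1) assumed
eigvals, eigvecs = np.linalg.eigh(S)        # ascending order
order = np.argsort(eigvals)[::-1]
lam_hat, V_hat = eigvals[order], eigvecs[:, order]

q = 2
Z = Xc @ V_hat[:, :q]                       # n x q score matrix

score_var = Z.var(axis=0, ddof=1)           # sample variance of each score vector
score_cov = np.cov(Z, rowvar=False)         # should be diagonal

print(score_var, lam_hat[:q])
```

A score plot is then a scatter plot of `Z[:, 0]` against `Z[:, 1]`, for example with a plotting library such as matplotlib.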
PC Scree values
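Let
$$\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \dots \ge \hat{\lambda}_p \ge 0$$
be the eigenvalues of the sample covariance matrix $S$. We call $\hat{\lambda}_k$ the $k$-th sample scree value. It represents the sample variance explained by the $k$-th principal component direction.
In terms of the score vector $z_k$,
$$\hat{\lambda}_k = \frac{1}{n-1} z_k^\top z_k = \frac{1}{n-1} \sum_{i=1}^{n} z_{ik}^2.$$
A scree plot displays the sequence of eigenvalues
$$\hat{\lambda}_1, \hat{\lambda}_2, \dots, \hat{\lambda}_p,$$
or the proportion of variance explained by each principal component,
$$\frac{\hat{\lambda}_k}{\sum_{j=1}^{p} \hat{\lambda}_j}, \quad k = 1, \dots, p.$$
It summarizes how much variation is explained by each principal component. Since the eigenvalues are sorted in decreasing order, the scree plot is often used to decide how many principal components should be retained.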
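The quantities shown in a scree plot can be computed directly from the sample covariance matrix. The sketch below is illustrative only: the function name `scree_values`, the synthetic data, and the 90% cumulative-variance cutoff are assumptions. The cutoff is one common heuristic, not a rule stated in these notes.

```python
import numpy as np

def scree_values(X):
    """Return (eigenvalues, proportion of variance explained), both decreasing.

    Assumes the 1/(n-1) normalization for the sample covariance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)
    lam_hat = np.linalg.eigvalsh(S)[::-1]   # eigvalsh returns ascending order
    pve = lam_hat / lam_hat.sum()           # proportions sum to one
    return lam_hat, pve

# Synthetic data (hypothetical): five variables with very different spreads.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])

lam_hat, pve = scree_values(X)
cum_pve = np.cumsum(pve)

# Hypothetical retention rule: smallest q whose cumulative proportion reaches 90%.
q = int(np.searchsorted(cum_pve, 0.90) + 1)
print(pve, q)
```

Plotting `lam_hat` or `pve` against the component index gives the scree plot. The sum of the eigenvalues equals the total sample variance across all variables, which is why the proportions are a natural scale.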
Relationship between score plots and scree plots
The scree plot and the score plot show PCA results in different ways.
The scree plot shows how much variance lies along each principal component direction. For the $k$-th principal component direction $\hat{v}_k$, the scree value is
$$\hat{\lambda}_k = \hat{v}_k^\top S \hat{v}_k.$$
This is the sample variance of the projected scores $z_k = X \hat{v}_k$.
The score plot shows the observations in the principal component coordinate system. If
$$\hat{V}_q = [\hat{v}_1, \dots, \hat{v}_q],$$
then the score matrix is
$$Z_q = X \hat{V}_q.$$
The rows of $Z_q$ are the low-dimensional coordinates of the observations.
Therefore, the scree plot helps decide which components are important, while the score plot visualizes the data using those components. For instance, when the first two scree values are large relative to the rest, the two-dimensional score plot
$$(z_{i1}, z_{i2}), \quad i = 1, \dots, n,$$
can give an informative view of the main structure in the data.