In ordinary PCA, the principal component directions are obtained from
the eigenvectors of the sample covariance matrix. In dppca,
these directions can be computed in two different ways.
- Non-private PC directions: eigenvectors of the sample covariance matrix.
- Differentially private PC directions: private principal component directions obtained through the g-DPPCA procedure.
Notation
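Let
$$X = (x_1, \dots, x_n)^\top \in \mathbb{R}^{n \times p}$$
be the data matrix used for PCA, where $x_i \in \mathbb{R}^p$ is the $i$-th observation. We assume that $X$ has been centered, and optionally standardized.
The principal component direction matrix is denoted by
$$V = (v_1, \dots, v_k) \in \mathbb{R}^{p \times k},$$
where each column $v_j$ is a unit vector representing the $j$-th PC direction.
The corresponding score matrix is $Z = XV$.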
1. Non-private PC directions
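The classical sample covariance matrix is
$$S = \frac{1}{n-1} X^\top X.$$
The non-private PCA directions are obtained from the eigenvalue decomposition
$$S = \hat{V} \hat{\Lambda} \hat{V}^\top,$$
where
$$\hat{V} = (\hat{v}_1, \dots, \hat{v}_p), \quad \hat{\Lambda} = \mathrm{diag}(\hat{\lambda}_1, \dots, \hat{\lambda}_p), \quad \hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_p \ge 0.$$
The $j$-th sample principal component direction is $\hat{v}_j$.
Equivalently,
$$S \hat{v}_j = \hat{\lambda}_j \hat{v}_j, \quad \|\hat{v}_j\|_2 = 1, \quad j = 1, \dots, p.$$
In the non-private option of dppca, the direction matrix used for projection is
$$V = (\hat{v}_1, \dots, \hat{v}_k).$$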
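The non-private route can be sketched in a few lines. The following is a minimal illustration in plain Python, not the dppca implementation: the function names (`sample_covariance`, `top_eigenvectors`, `project`) are hypothetical, the covariance uses the divisor n - 1 as an assumption, and power iteration with deflation stands in for a proper numerical eigen-solver.

```python
import math
import random


def sample_covariance(X):
    """Sample covariance X^T X / (n - 1) of an already centered n x p matrix."""
    n, p = len(X), len(X[0])
    return [[sum(row[a] * row[b] for row in X) / (n - 1) for b in range(p)]
            for a in range(p)]


def top_eigenvectors(S, k, iters=500, seed=0):
    """First k eigenvectors of a symmetric PSD matrix, as a p x k matrix.

    Power iteration with deflation is enough for a toy example; a real
    analysis would call a numerical eigen-solver instead.
    """
    rng = random.Random(seed)
    p = len(S)
    A = [row[:] for row in S]              # working copy that gets deflated
    columns = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a * x for a, x in zip(row, v)) for row in A]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:                # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the unit vector v.
        lam = sum(v[a] * sum(A[a][b] * v[b] for b in range(p)) for a in range(p))
        for a in range(p):                 # deflate the component just found
            for b in range(p):
                A[a][b] -= lam * v[a] * v[b]
        columns.append(v)
    return [[columns[j][a] for j in range(k)] for a in range(p)]


def project(X, V):
    """Score matrix obtained by multiplying the data by the directions."""
    k = len(V[0])
    return [[sum(x[a] * V[a][j] for a in range(len(x))) for j in range(k)]
            for x in X]


# Toy centered data whose variance is largest along the first coordinate.
X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
V = top_eigenvectors(sample_covariance(X), k=2)
Z = project(X, V)
```

Eigenvectors are only determined up to sign, so the columns of `V` may come out negated relative to another solver; the projected scores flip sign accordingly.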
2. DP PC directions
Kim and Jung (2025) proposed g-DPPCA, which applies the matrix Gaussian mechanism to the generalized multivariate Kendall’s tau matrix. This matrix is based on a robust data transformation called the generalized spatial sign, proposed by Raymaekers and Rousseeuw (2019).
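For a positive-valued scale function $\xi$, consider the map $g_\xi: \mathbb{R}^p \to \mathbb{R}^p$ defined as
$$g_\xi(t) = \xi(\|t\|_2)\, t.$$
The map $g_\xi$ is called the generalized spatial sign with respect to $\xi$.
The generalized multivariate Kendall’s tau matrix with respect to $\xi$ is defined as
$$K_\xi = \mathbb{E}\left[ g_\xi(x - \tilde{x})\, g_\xi(x - \tilde{x})^\top \right],$$
where $\tilde{x}$ is an independent copy of the random vector $x$. Importantly, if $x$ follows an elliptical distribution (a family that includes the Gaussian and multivariate $t$-distributions), $K_\xi$ shares the same eigenvectors, in the same order, as the covariance matrix $\Sigma$ of $x$. So, one can conduct PCA by estimating $K_\xi$ and then computing its eigenvectors.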
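For convenience, we write $g$ for the given sign function $g_\xi$. For a random sample $x_1, \dots, x_n$, the second-order U-statistic for $K_\xi$ can be written as
$$\hat{K} = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} g(x_i - x_j)\, g(x_i - x_j)^\top.$$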
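As a rough illustration of this U-statistic, the sketch below averages the outer products of the sign-transformed pairwise differences. It is plain Python rather than the package code; the names `generalized_spatial_sign` and `kendall_tau_matrix` are hypothetical, the default scale function 1/r (the ordinary spatial sign) is an assumption, and mapping a zero difference to the zero vector is a convention chosen here.

```python
import math
from itertools import combinations


def generalized_spatial_sign(t, xi=lambda r: 1.0 / r):
    """Scale t by xi(||t||); the default xi(r) = 1/r gives the spatial sign.

    Assumption: a zero vector is mapped to zero, since xi may be undefined at 0.
    """
    r = math.sqrt(sum(x * x for x in t))
    if r == 0.0:
        return [0.0] * len(t)
    return [xi(r) * x for x in t]


def kendall_tau_matrix(X, g=generalized_spatial_sign):
    """Average of g(x_i - x_j) g(x_i - x_j)^T over all pairs i < j."""
    n, p = len(X), len(X[0])
    K = [[0.0] * p for _ in range(p)]
    for xi_row, xj_row in combinations(X, 2):
        s = g([a - b for a, b in zip(xi_row, xj_row)])
        for a in range(p):
            for b in range(p):
                K[a][b] += s[a] * s[b]
    n_pairs = n * (n - 1) / 2
    return [[value / n_pairs for value in row] for row in K]


# Three points in the plane give three pairwise differences.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
K_hat = kendall_tau_matrix(X)
```

The double loop over pairs costs on the order of n^2 p^2 operations, which is fine for a sketch but would normally be vectorized for real data.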
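Note that the sensitivity of $\hat{K}$ with respect to the Frobenius norm, taken over datasets that differ in a single observation, can be upper bounded by
$$\Delta = \frac{2\sqrt{2}}{n} \sup_{t} \|g(t)\|_2^2.$$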
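One way to arrive at a bound of this kind is sketched below. It assumes that neighboring datasets $X$ and $X'$ differ by replacing a single observation, and it writes $M = \sup_t \|g(t)\|_2$, a symbol introduced only for this sketch. The constant obtained this way may differ from the one stated in Kim and Jung (2025). For any vectors $a, b$ with $\|a\|_2, \|b\|_2 \le M$,

$$\|aa^\top - bb^\top\|_F^2 = \|a\|_2^4 + \|b\|_2^4 - 2\,(a^\top b)^2 \le 2M^4 .$$

Replacing one observation changes at most $n - 1$ of the $\binom{n}{2}$ summands of the U-statistic, and each changed summand moves by at most $\sqrt{2}\,M^2$ in Frobenius norm, so

$$\|\hat{K}(X) - \hat{K}(X')\|_F \le \binom{n}{2}^{-1} (n-1)\, \sqrt{2}\, M^2 = \frac{2\sqrt{2}}{n}\, M^2 .$$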
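So, for a dataset $X$, the randomized mechanism defined as
$$\mathcal{M}(X) = \hat{K}(X) + E,$$
where $E$ is a symmetric $p \times p$ matrix whose upper-triangular entries (including the diagonal) are i.i.d. $N(0, \sigma^2)$ and $\sigma = \Delta \sqrt{2 \log(1.25/\delta)} / \epsilon$ with $\Delta$ the sensitivity bound above, satisfies $(\epsilon, \delta)$-DP for $\epsilon \in (0, 1)$.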
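The noise-addition step can be sketched as follows, assuming the classical Gaussian mechanism calibration (noise standard deviation equal to the sensitivity times sqrt(2 log(1.25/delta)) divided by epsilon, valid for 0 < epsilon < 1). The function names are hypothetical, the sensitivity value in the usage lines is an assumed bound for a unit-norm sign function, and Python's `random` module is used only for illustration; it is not a noise source suitable for a real privacy guarantee.

```python
import math
import random


def gaussian_noise_sd(sensitivity, epsilon, delta):
    """Noise sd of the classical Gaussian mechanism (assumes 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("expects 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def symmetric_gaussian_noise(p, sd, rng):
    """Symmetric p x p matrix with i.i.d. N(0, sd^2) upper-triangular entries."""
    E = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a, p):
            E[a][b] = E[b][a] = rng.gauss(0.0, sd)
    return E


def privatize(K_hat, sensitivity, epsilon, delta, rng):
    """Return K_hat + E, where E is symmetric Gaussian noise scaled to the sensitivity."""
    p = len(K_hat)
    sd = gaussian_noise_sd(sensitivity, epsilon, delta)
    E = symmetric_gaussian_noise(p, sd, rng)
    return [[K_hat[a][b] + E[a][b] for b in range(p)] for a in range(p)]


# Usage with made-up inputs: a 2 x 2 estimate and an assumed sensitivity bound.
K_hat = [[0.5, -0.1], [-0.1, 0.5]]
n = 100
sensitivity = 2.0 * math.sqrt(2.0) / n   # assumption: bound for a unit-norm sign
M = privatize(K_hat, sensitivity, epsilon=0.5, delta=1e-5, rng=random.Random(1))
```

Drawing only the upper triangle and mirroring it keeps the noisy matrix symmetric, so its eigenvectors remain real and can be computed with a symmetric eigen-solver.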
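Define $\hat{V}_{\mathrm{DP}}$ as the matrix of the first $k$ eigenvectors of the output $\mathcal{M}(X)$ of the mechanism above. Then $\hat{V}_{\mathrm{DP}}$ satisfies $(\epsilon, \delta)$-DP by the post-processing property, and it can serve as a set of DP principal component directions. Kim and Jung (2025) call this procedure g-DPPCA.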
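In the implementation of the function dp_pc_dir with option g_dppca=TRUE, we use the spherical transformation
$$g(t) = \frac{t}{\|t\|_2}$$
to output differentially private PC directions $\hat{V}_{\mathrm{DP}}$.
In this case, it holds that $\sup_t \|g(t)\|_2 = 1$, and thus the variance of the additive Gaussian noise is set as
$$\sigma^2 = \frac{2 \log(1.25/\delta)\, \Delta^2}{\epsilon^2},$$
with the sensitivity bound $\Delta$ evaluated at $\sup_t \|g(t)\|_2 = 1$.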
Summary
The principal component direction step in dppca can be
summarized as follows.
- Start with a preprocessed data matrix $X$.
- Choose a direction estimation method.
- Obtain a direction matrix $V$.
- Compute projected scores $Z = XV$.
- Use the scores for private scree estimation or private score visualization.
The main distinction is whether $V$ is obtained from the ordinary sample covariance matrix or from a differentially private robust PC direction estimator.
References
Minwoo Kim and Sungkyu Jung (2025), “Robust and differentially private principal component analysis,” Statistical Analysis and Data Mining, 18(6), https://doi.org/10.1002/sam.70053
Jakob Raymaekers and Peter Rousseeuw (2019), “A generalized spatial sign covariance matrix,” Journal of Multivariate Analysis, 171:94–111, https://doi.org/10.1016/j.jmva.2018.11.010