This function computes estimates of scree values, that is, the eigenvalues of the sample covariance matrix, for principal component analysis. It returns both the usual non-private estimates and differentially private estimates. Each private estimate is computed as a private mean of the squared principal component scores. See Details for the estimating equations and the method-specific constructions.
Usage
dp_scree(
  X,
  k,
  method = c("clipped", "pmwm", "huber"),
  control = NULL,
  eps,
  delta,
  center = TRUE,
  standardize = FALSE,
  g_dppca = FALSE,
  cpp.option = FALSE,
  mono = TRUE
)
Arguments
- X
A numeric matrix or data frame. Rows correspond to observations and columns correspond to variables.
- k
Positive integer defining the number of leading principal components to estimate. Must be an integer between 1 and the number of columns in X.
- method
Scree value estimation method or methods. One or more of "clipped", "pmwm", or "huber". If omitted, "clipped" is used.
- control
Optional method-specific control list created by clipped_control(), pmwm_control(), or huber_control(). When multiple methods are requested, use a named list with method names.
- eps
Positive number defining the total epsilon privacy parameter. If g_dppca = TRUE, it is split between private direction estimation and private scree estimation.
- delta
Number in (0, 1) defining the total delta privacy parameter. If g_dppca = TRUE, it is split between private direction estimation and private scree estimation.
- center
A logical value indicating whether to center the columns of X before computing principal component directions. The default is TRUE.
- standardize
A logical value indicating whether to scale the columns of X by their sample standard deviations after optional centering. The default is FALSE.
- g_dppca
A logical value indicating whether to use private principal component directions for scree estimation. The default is FALSE. See dp_pc_dir() for details.
- cpp.option
A logical value passed to dp_pc_dir() when g_dppca = TRUE. The default is FALSE. When g_dppca = TRUE, dp_pc_dir() is called with arguments eps = eps / 2 and delta = delta / 2.
- mono
A logical value indicating whether to apply monotone post-processing to the vector of private scree values. The default is TRUE.
Value
If one method is requested, a list with components:
- method: scree value estimation method.
- scree_np: non-private scree estimates.
- pve_np: non-private proportions of variance explained.
- scree: differentially private scree value estimates.
- pve: differentially private proportions of variance explained.
If multiple methods are requested, a named list of method-specific results is returned.
Details
Let \(X\) denote the preprocessed data matrix and let \(v_l\) be the \(l\)th principal component direction. The \(l\)th score vector is \(z_l = X v_l\). The corresponding sample scree value can be written as $$ \hat{\lambda}_l = v_l^\top \widehat{\Sigma} v_l = \frac{1}{n - 1}\sum_{i = 1}^n z_{il}^2 = \frac{n}{n - 1}\left(\frac{1}{n}\sum_{i = 1}^n w_{il}\right), \qquad w_{il} = z_{il}^2. $$ Therefore, each scree value is estimated by privately estimating the mean of \(w_{1l}, \ldots, w_{nl}\) and multiplying by \(n/(n - 1)\).
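The identity above can be checked numerically. The following is a minimal base R sketch on simulated data, not the package's internal code; the data and the variable names (V, Z, W, scree_from_scores) are illustrative assumptions. It relies on the columns of the matrix being centered, which is what makes the sum of squared scores divided by n - 1 equal to the eigenvalue.

```r
# Simulated data (an assumption for illustration); column-centering plays
# the role of the preprocessing step.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), nrow = n) %*% diag(c(2, 1, 0.5))
X <- scale(X, center = TRUE, scale = FALSE)

# Non-private directions: eigenvectors of the sample covariance matrix.
eig <- eigen(cov(X), symmetric = TRUE)
V <- eig$vectors

# Squared principal component scores, w_il = z_il^2.
Z <- X %*% V
W <- Z^2

# Scree values recovered as n / (n - 1) times the column means of W.
scree_from_scores <- n / (n - 1) * colMeans(W)

# Compare with the eigenvalues of the sample covariance matrix.
all.equal(scree_from_scores, eig$values)
```

Because the scree value is an affine function of a mean, any private mean estimator applied to the columns of W yields a private scree estimate after the same rescaling.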
The supported methods differ in how this private mean is estimated:
- "clipped" clips the squared scores \(w_{il}\) at C_clip and then applies the Gaussian mechanism (Dwork and Roth 2014). This is the simplest option, but its accuracy depends directly on the clipping threshold.
- "pmwm" uses the private modified winsorized mean approach of Ramsay and Spicker (2025), ported to R from the accompanying Python implementation. It privately estimates tail cutoffs, winsorizes the squared scores \(w_{il}\), and releases a noisy winsorized mean.
- "huber" uses a Huber-type private robust mean estimator based on noisy gradient descent, following Yu et al. (2024).
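To make the "clipped" idea concrete, here is a minimal base R sketch of a clipped mean released through the Gaussian mechanism for a single component. It is an illustration under stated assumptions, not the package's implementation: the helper clipped_gaussian_mean() is hypothetical, the noise scale uses the classical calibration from Dwork and Roth (2014), which is stated for eps below 1, and the way the package divides the privacy budget across the k components is not shown here.

```r
# Hypothetical helper for illustration; not part of the package API.
clipped_gaussian_mean <- function(w, C_clip, eps, delta) {
  stopifnot(C_clip > 0, eps > 0, delta > 0, delta < 1)
  n <- length(w)
  # Clip to [0, C_clip]; squared scores are already nonnegative.
  w_clipped <- pmin(pmax(w, 0), C_clip)
  # Replacing one observation changes the clipped mean by at most C_clip / n.
  sensitivity <- C_clip / n
  # Classical Gaussian mechanism noise scale (assumes eps < 1).
  sigma <- sensitivity * sqrt(2 * log(1.25 / delta)) / eps
  mean(w_clipped) + rnorm(1, mean = 0, sd = sigma)
}

set.seed(123)
w <- rnorm(500)^2                      # stand-in for squared scores
private_mean <- clipped_gaussian_mean(w, C_clip = 3, eps = 0.5, delta = 1e-3)

# Rescale the private mean to a scree value, as in the display above.
n <- length(w)
scree_private <- n / (n - 1) * private_mean
```

A small C_clip reduces the noise scale but biases the mean downward whenever many squared scores exceed the threshold, which is the trade-off the "clipped" method exposes through its control argument.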
The argument g_dppca controls how the principal component directions are
obtained. If g_dppca = FALSE, the directions are computed non-privately as
the leading eigenvectors of the sample covariance matrix, and the full privacy parameters
eps and delta are used for private scree value estimation. If g_dppca = TRUE,
the directions are computed privately using dp_pc_dir().
In that case, the privacy parameters are split equally:
dp_pc_dir() receives eps = eps / 2 and delta = delta / 2 for
private direction estimation, and the remaining eps / 2 and delta / 2
are used for private scree value estimation.
When mono = TRUE, the final monotone adjustment is a post-processing step and
does not change the privacy guarantee.
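As an illustration of such a post-processing step, the sketch below forces a noisy scree vector to be non-increasing using isotonic regression from base R. This is one possible adjustment, shown under the assumption that a least-squares projection is acceptable; the helper monotone_scree() is hypothetical and the package's actual adjustment may differ. Because the function only sees values that have already been released privately, it uses no additional privacy budget.

```r
# Hypothetical helper for illustration; not part of the package API.
monotone_scree <- function(scree) {
  # isoreg() fits a non-decreasing sequence, so reverse before and after
  # to obtain a non-increasing fit.
  fit <- rev(stats::isoreg(rev(scree))$yf)
  # Scree values are variances, so truncate any negative values at zero.
  pmax(fit, 0)
}

noisy_scree <- c(1.0, 1.4, 0.5)        # second value violates the ordering
adjusted <- monotone_scree(noisy_scree)
adjusted
```

In this toy input the first two values are pooled to their average, so ties in the adjusted vector indicate where the noisy estimates were out of order.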
For the detailed procedure and mathematical formulation, see https://yejinjo0220.github.io/dppca/articles/dp_scree.
References
Dwork C, Roth A (2014). “The Algorithmic Foundations of Differential Privacy.” Found. Trends Theor. Comput. Sci., 9(3–4), 211–407. ISSN 1551-305X, doi:10.1561/0400000042.
Ramsay K, Spicker D (2025). “Improved subsample-and-aggregate via the private modified winsorized mean.” arXiv:2501.14095, https://arxiv.org/abs/2501.14095. Code available at https://github.com/12ramsake/PMWM.
Yu M, Ren Z, Zhou W (2024). “Gaussian differentially private robust mean estimation and inference.” Bernoulli, 30(4), 3059–3088.
Kim M, Jung S (2025). “Robust and Differentially Private Principal Component Analysis.” Statistical Analysis and Data Mining: An ASA Data Science Journal, 18(6), e70053. doi:10.1002/sam.70053.
See also
dp_pc_dir() for principal component direction estimation.
clipped_control(), pmwm_control(), and huber_control() for
method-specific tuning parameters.
Examples
data(gau, package = "dppca")
# Use a small subset to keep the example fast.
X <- gau[1:100, ]
# Estimate the private scree values using the clipped mean method.
set.seed(123)
dp_scree(
  X,
  k = 2,
  method = "clipped",
  control = clipped_control(C_clip = 3),
  eps = 2,
  delta = 1e-3
)
#> $method
#> [1] "clipped"
#>
#> $scree_np
#> [1] 2.061105 1.771498
#>
#> $pve_np
#> [1] 0.5377819 0.4622181
#>
#> $scree
#> [1] 1.196163 1.196163
#>
#> $pve
#> [1] 0.5 0.5
#>
# Other scree methods can be used by changing `method` and `control`, e.g.,
# method = "pmwm",
# control = pmwm_control(a = 0, b = 50, trim_const = 10, eta = 0.01)
#
# method = "huber",
# control = huber_control(k_min_m2 = -10, k_max_m2 = 10, m2_frac = 1 / 4)