This function computes two-dimensional principal component scores and returns differentially private histogram estimates on the score space. It returns the score coordinates, the plotting frame, the non-private histogram, and the requested private histogram estimates.
Arguments
- X
A numeric matrix or data frame. Rows correspond to observations and columns correspond to variables.
- eps
Positive number defining the total epsilon privacy parameter.
- delta
Number in (0, 1) defining the total delta privacy parameter.
- bins
Integer vector of length 2 defining the number of histogram bins along the first and second score axes, respectively.
- method
Character vector specifying which private histogram methods to compute. Use "add" for the additive Gaussian histogram and "sparse" for the sparse thresholded histogram. The default is c("add", "sparse").
- center
A logical value indicating whether to center the columns of X before computing principal component directions. The default is TRUE.
- standardize
A logical value indicating whether to scale the columns of X by their sample standard deviations after optional centering. The default is FALSE.
- g_dppca
A logical value indicating whether to use private principal component directions. The default is FALSE. See dp_pc_dir() for details.
- cpp.option
A logical value passed to dp_pc_dir() when g_dppca = TRUE. The default is FALSE.
- axes
Integer vector of length 2 specifying the principal components used to construct the score coordinates. The default is c(1, 2).
Value
A list with components:
- score
An \(n \times 2\) matrix containing the PC scores for the two selected axes.
- frame
A list with components xlim and ylim.
- none
Data frame for the non-private empirical histogram.
- add
Data frame for the additive Gaussian private histogram, or NULL if not requested.
- sparse
Data frame for the sparse private histogram, or NULL if not requested.
- method
Character vector of private histogram methods used.
Details
Let \(v_a\) and \(v_b\) be the principal component directions selected
by axes = c(a, b) for some \(1 \le a < b \le ncol(X)\).
After preprocessing, the score point for the \(i\)th observation
is \(s_i = (x_i^\top v_a, x_i^\top v_b)\). A non-private score
plot would display the points \(s_1, \ldots, s_n\) directly. This function
instead summarizes their empirical distribution by a two-dimensional histogram
and releases private versions of the histogram for the visualization.
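The score computation above can be sketched in base R. This is a non-private illustration under stated assumptions: the toy matrix X is made up, centering is on and standardization is off (the documented defaults), and the directions come from an ordinary SVD rather than from dp_pc_dir().

```r
# Toy data standing in for X (assumption: 200 observations, 4 variables).
set.seed(1)
X <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)
axes <- c(1, 2)

# Preprocessing: center the columns, do not rescale them.
Xc <- scale(X, center = TRUE, scale = FALSE)

# Non-private principal component directions: right singular vectors of Xc.
V <- svd(Xc)$v

# Score points s_i = (x_i' v_a, x_i' v_b), one row per observation.
score <- Xc %*% V[, axes]
colnames(score) <- paste0("PC", axes)

head(score)
```

Up to the sign of each column, these scores agree with prcomp(X)$x[, axes].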
The plotting frame is constructed privately from the score coordinates. The frame center is estimated by coordinate-wise private medians, and the frame radius is estimated by the private 0.99 quantile of the Euclidean distances from this private center. The resulting private radius is inflated by a fixed factor and used to form a square plotting frame centered at the private center. The private frame is computed using a smooth-sensitivity-based quantile mechanism (Nissim et al. 2007).
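To make the geometry of the frame concrete, the following sketch builds the same kind of square frame from non-private quantiles. It is not the private mechanism: median() and quantile() stand in for the smooth-sensitivity quantile estimates, and the inflation factor 1.1 is a hypothetical value chosen for illustration, not the factor used by the package.

```r
# Toy score coordinates (assumption: 200 points in two dimensions).
set.seed(1)
score <- matrix(rnorm(400), ncol = 2)

# Non-private stand-ins for the private estimates.
center <- apply(score, 2, median)                   # coordinate-wise median
dists  <- sqrt(rowSums(sweep(score, 2, center)^2))  # distances to the center
radius <- unname(quantile(dists, 0.99))             # 0.99 quantile of distances

# Hypothetical inflation factor, for illustration only.
inflate <- 1.1
r <- inflate * radius

# Square frame centered at the estimated center.
frame <- list(
  xlim = c(center[1] - r, center[1] + r),
  ylim = c(center[2] - r, center[2] + r)
)
frame
```

Both sides of the frame have length 2 * r, so the histogram grid covers a square region around the estimated center.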
The private histogram is computed on the rectangular grid defined by the
private frame and the bin counts in bins. Under
row-level adjacency, changing one observation can increase one bin count by
one and decrease another by one, giving \(\ell_1\) sensitivity at most
\(2\) and \(\ell_2\) sensitivity at most \(\sqrt{2}\) for the count
vector.
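The sensitivity bounds can be checked numerically on a fixed frame. In this sketch the helper bin_counts() is hypothetical (it is not a dppca function), and the frame and data are toy values; one observation is moved to a different cell and the change in the count vector is measured.

```r
# Hypothetical helper: counts on a bins[1] x bins[2] grid over a fixed frame.
bin_counts <- function(score, xlim, ylim, bins) {
  bx <- cut(score[, 1], include.lowest = TRUE,
            breaks = seq(xlim[1], xlim[2], length.out = bins[1] + 1))
  by <- cut(score[, 2], include.lowest = TRUE,
            breaks = seq(ylim[1], ylim[2], length.out = bins[2] + 1))
  as.vector(table(bx, by))
}

set.seed(1)
score <- matrix(runif(200, -1, 1), ncol = 2)
xlim <- c(-1, 1)
ylim <- c(-1, 1)
bins <- c(10, 10)

# Neighboring dataset: the first observation is replaced by a point
# that falls in a different grid cell.
neighbor <- score
neighbor[1, ] <- c(0.95, 0.95)

d  <- bin_counts(neighbor, xlim, ylim, bins) - bin_counts(score, xlim, ylim, bins)
l1 <- sum(abs(d))      # one count goes up by 1, another down by 1
l2 <- sqrt(sum(d^2))
c(l1 = l1, l2 = l2)
```

Here l1 equals 2 and l2 equals sqrt(2), which are the worst-case values; if the replaced point stays in its original cell, the count vector does not change at all.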
Two private histogram mechanisms are supported:
- "add" constructs an additive differentially private histogram by adding Gaussian noise to all bin counts, clipping negative noisy counts to zero, and normalizing the result. This additive-noise approach is commonly used for private histograms; see Wasserman and Zhou (2010).
- "sparse" constructs a sparse differentially private histogram for settings where many bins are empty. It perturbs only nonzero empirical bin proportions and keeps bins whose noisy values exceed a stability threshold, following the stability-based private histogram idea of Karwa and Vadhan (2018).
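The three steps of the "add" mechanism (noise, clip, normalize) can be sketched as follows. The helper gaussian_hist() is hypothetical, and the noise scale uses the classical Gaussian mechanism calibration, which is valid for eps below 1; this is an assumption made for the sketch, and the calibration used inside the package may differ.

```r
# Hypothetical helper, not part of dppca.
gaussian_hist <- function(counts, eps, delta) {
  # The classical calibration below assumes 0 < eps < 1.
  stopifnot(eps > 0, eps < 1, delta > 0, delta < 1)
  sens_l2 <- sqrt(2)  # l2 sensitivity of the count vector
  sigma <- sens_l2 * sqrt(2 * log(1.25 / delta)) / eps

  noisy <- counts + rnorm(length(counts), sd = sigma)  # noise on every bin
  noisy <- pmax(noisy, 0)                              # clip negatives to zero
  total <- sum(noisy)
  if (total > 0) noisy / total else noisy              # normalize to proportions
}

set.seed(123)
counts <- c(40, 0, 25, 0, 10, 25)  # toy bin counts
prob <- gaussian_hist(counts, eps = 0.5, delta = 1e-3)
round(prob, 3)
```

Clipping and normalizing are post-processing steps, so they do not consume additional privacy budget. Because noise is added to every bin, empty bins can receive positive mass, which is the situation the "sparse" method is designed to avoid.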
The privacy parameters are allocated across the privacy-consuming steps. If
g_dppca = FALSE, half of eps and half of delta are used for private frame
construction, and the other halves for the private histogram. If
g_dppca = TRUE, eps and delta are each split equally among private direction
estimation, private frame construction, and private histogram release.
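The allocation rule can be written out as a small table. The helper split_budget() is hypothetical and only restates the rule in the paragraph above; it is not a function exported by the package.

```r
# Hypothetical helper illustrating the equal split of eps and delta.
split_budget <- function(eps, delta, g_dppca = FALSE) {
  steps <- if (g_dppca) {
    c("direction", "frame", "histogram")
  } else {
    c("frame", "histogram")
  }
  k <- length(steps)
  data.frame(step = steps, eps = eps / k, delta = delta / k)
}

split_budget(eps = 2, delta = 1e-3)                  # two steps
split_budget(eps = 2, delta = 1e-3, g_dppca = TRUE)  # three steps
```

In both cases the per-step values sum back to the requested eps and delta, which matches the description of these arguments as total privacy parameters.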
For the detailed procedure and mathematical formulation, see https://yejinjo0220.github.io/dppca/articles/dp_score.
References
Dwork C, Roth A (2014). “The Algorithmic Foundations of Differential Privacy.” Found. Trends Theor. Comput. Sci., 9(3–4), 211–407. ISSN 1551-305X, doi:10.1561/0400000042.
Nissim K, Raskhodnikova S, Smith A (2007). “Smooth Sensitivity and Sampling in Private Data Analysis.” In STOC'07: Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 75–84. ISBN 9781595936318, doi:10.1145/1250790.1250803.
Wasserman L, Zhou S (2010). “A Statistical Framework for Differential Privacy.” Journal of the American Statistical Association, 105(489), 375–389. doi:10.1198/jasa.2009.tm08651.
Karwa V, Vadhan S (2018). “Finite Sample Differentially Private Confidence Intervals.” In Proceedings of the 9th Innovations in Theoretical Computer Science Conference, volume 94 of Leibniz International Proceedings in Informatics, 44:1–44:9. doi:10.4230/LIPIcs.ITCS.2018.44.
Kim M, Jung S (2025). “Robust and Differentially Private Principal Component Analysis.” Statistical Analysis and Data Mining: An ASA Data Science Journal, 18(6), e70053. doi:10.1002/sam.70053.
See also
dp_score_plot() for plotting the output of this function.
dp_score_group() and dp_score_plot_group() for group-wise score
histograms.
dp_pc_dir() for private principal component direction estimation.
Examples
data(gau, package = "dppca")
# Use a small subset to keep the example fast.
X <- gau[1:300, ]
# Compute private two-dimensional PCA scores using the additive histogram method.
set.seed(123)
score_gau <- dp_score(
  X,
  eps = 2,
  delta = 1e-3,
  method = "add",
  bins = c(10, 10)
)
head(score_gau$score)
#> PC1 PC2
#> 1 -1.6418971 -2.9417503
#> 2 2.4192805 -1.9747774
#> 3 -1.5647289 2.4500389
#> 4 1.1818664 0.6632302
#> 5 -0.7668155 -2.7729387
#> 6 1.4701354 3.1142919
head(score_gau$add)
#> xmin xmax ymin ymax prob
#> 1 -5.3429455 -4.4593527 -3.298014 -2.414422 0.016055676
#> 2 -4.4593527 -3.5757599 -3.298014 -2.414422 0.000000000
#> 3 -3.5757599 -2.6921671 -3.298014 -2.414422 0.016720579
#> 4 -2.6921671 -1.8085743 -3.298014 -2.414422 0.000000000
#> 5 -1.8085743 -0.9249815 -3.298014 -2.414422 0.005652482
#> 6 -0.9249815 -0.0413887 -3.298014 -2.414422 0.009703807