This function computes two-dimensional principal component scores and returns differentially private histogram estimates on the score space. It returns the score coordinates, the plotting frame, the non-private histogram, and the requested private histogram estimates.

Usage

dp_score(
  X,
  eps,
  delta,
  bins,
  method = c("add", "sparse"),
  center = TRUE,
  standardize = FALSE,
  g_dppca = FALSE,
  cpp.option = FALSE,
  axes = c(1, 2)
)

Arguments

X

A numeric matrix or data frame. Rows correspond to observations and columns correspond to variables.

eps

Positive number defining the total epsilon privacy parameter.

delta

Number in (0, 1) defining the total delta privacy parameter.

bins

Integer vector of length 2 defining the number of histogram bins along the first and second score axes, respectively.

method

Character vector specifying which private histogram methods to compute. Use "add" for the additive Gaussian histogram and "sparse" for the sparse thresholded histogram. The default is c("add", "sparse").

center

A logical value indicating whether to center the columns of X before computing principal component directions. The default is TRUE.

standardize

A logical value indicating whether to scale the columns of X by their sample standard deviations after optional centering. The default is FALSE.

g_dppca

A logical value indicating whether to use private principal component directions. The default is FALSE. See dp_pc_dir() for details.

cpp.option

A logical value passed to dp_pc_dir() when g_dppca = TRUE. The default is FALSE.

axes

Integer vector of length 2 specifying the principal components used to construct the score coordinates. The default is c(1, 2).

Value

A list with components:

score

An \(n \times 2\) matrix containing the PC scores for the two selected axes.

frame

A list with components xlim and ylim.

none

Data frame for the non-private empirical histogram.

add

Data frame for the additive Gaussian private histogram, or NULL if not requested.

sparse

Data frame for the sparse private histogram, or NULL if not requested.

method

Character vector of private histogram methods used.

Details

Let \(v_a\) and \(v_b\) be the principal component directions selected by axes = c(a, b) for some \(1 \le a < b \le ncol(X)\). After preprocessing, the score point for the \(i\)th observation is \(s_i = (x_i^\top v_a, x_i^\top v_b)\). A non-private score plot would display the points \(s_1, \ldots, s_n\) directly. This function instead summarizes their empirical distribution with a two-dimensional histogram and releases private versions of that histogram for visualization.
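The score definition above can be illustrated with a minimal, non-private sketch in base R. It assumes simulated data, center = TRUE, standardize = FALSE, and directions taken as eigenvectors of the sample covariance matrix; it is not the package's internal code.

```r
# Non-private sketch of the score computation (illustrative only).
set.seed(1)
X <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)  # simulated data (assumption)
axes <- c(1, 2)

# Preprocessing: center the columns, no standardization.
Xc <- scale(X, center = TRUE, scale = FALSE)

# Principal component directions: eigenvectors of the sample covariance.
V <- eigen(cov(Xc), symmetric = TRUE)$vectors

# Score matrix: row i is s_i = (x_i' v_a, x_i' v_b).
score <- Xc %*% V[, axes]
colnames(score) <- paste0("PC", axes)
```

The signs of the columns are arbitrary, since each direction is only defined up to sign.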

The plotting frame is constructed privately from the score coordinates. The frame center is estimated by coordinate-wise private medians, and the frame radius is estimated by the private 0.99 quantile of the Euclidean distances from this private center. The resulting private radius is inflated by a fixed factor and used to form a square plotting frame. The private frame is computed using a smooth-sensitivity based quantile mechanism (Nissim et al. 2007).
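To make the geometry of the frame concrete, the following sketch builds the same kind of square frame from ordinary (non-private) medians and quantiles. It is not differentially private: the smooth-sensitivity quantile mechanism is replaced by median() and quantile(), and the inflation factor 1.1 is a hypothetical value, since the actual factor is not stated here.

```r
# NON-PRIVATE illustration of the frame geometry only.
set.seed(2)
score <- cbind(PC1 = rnorm(500), PC2 = rnorm(500, sd = 2))  # simulated scores

# Stand-ins for the private coordinate-wise medians.
center <- apply(score, 2, median)

# Stand-in for the private 0.99 quantile of distances from the center.
dist <- sqrt(rowSums(sweep(score, 2, center)^2))
radius <- unname(quantile(dist, 0.99))

inflate <- 1.1  # hypothetical inflation factor (assumption)
half <- inflate * radius

# Square frame centered at the estimated center.
frame <- list(
  xlim = center[[1]] + c(-half, half),
  ylim = center[[2]] + c(-half, half)
)
```

Because the frame is square, both axes span the same width even when the two score coordinates have different spreads.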

The private histogram is computed on the rectangular grid defined by the private frame and the bin counts in bins. Under row-level adjacency, changing one observation can increase one bin count by one and decrease another by one, giving \(\ell_1\) sensitivity at most \(2\) and \(\ell_2\) sensitivity at most \(\sqrt{2}\) for the count vector.
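The sensitivity claim can be checked numerically. The sketch below bins simulated scores on a fixed grid, replaces one observation with a point in a different bin, and compares the two count vectors. The helper bin_counts() and the frame are hypothetical and used only for illustration.

```r
# Hypothetical helper: count score points per cell of a bins[1] x bins[2] grid.
bin_counts <- function(score, frame, bins) {
  bx <- cut(score[, 1],
            seq(frame$xlim[1], frame$xlim[2], length.out = bins[1] + 1),
            include.lowest = TRUE)
  by <- cut(score[, 2],
            seq(frame$ylim[1], frame$ylim[2], length.out = bins[2] + 1),
            include.lowest = TRUE)
  as.vector(table(bx, by))
}

set.seed(4)
frame <- list(xlim = c(-1, 1), ylim = c(-1, 1))
bins <- c(4, 4)
score <- matrix(runif(200, -1, 1), ncol = 2)
score[1, ] <- c(-0.9, -0.9)       # observation 1 sits in the lower-left bin

# Neighboring data set: only observation 1 changes, moving to another bin.
neighbor <- score
neighbor[1, ] <- c(0.9, 0.9)

d <- bin_counts(score, frame, bins) - bin_counts(neighbor, frame, bins)
l1 <- sum(abs(d))        # equals 2: one bin gains a count, one loses a count
l2 <- sqrt(sum(d^2))     # equals sqrt(2)
```

If the replaced point stays in the same bin, both distances are zero, so 2 and \(\sqrt{2}\) are the worst cases.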

Two private histogram mechanisms are supported:

  • "add" constructs an additive differentially private histogram by adding Gaussian noise to all bin counts, clipping negative noisy counts to zero, and normalizing the result. This additive-noise approach is commonly used for private histograms; see Wasserman and Zhou (2010).

  • "sparse" constructs a sparse differentially private histogram for settings where many bins are empty. It perturbs only nonzero empirical bin proportions and keeps bins whose noisy values exceed a stability threshold, following the stability-based private histogram idea of Karwa and Vadhan (2018).
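The two mechanisms can be sketched as follows. This is a rough illustration under stated assumptions, not the package's implementation: the Gaussian noise uses the classical calibration \(\sigma = \Delta_2 \sqrt{2 \log(1.25/\delta)} / \varepsilon\) (valid for \(\varepsilon < 1\)), and the sparse variant uses Laplace noise with scale \(2/(n\varepsilon)\) and threshold \(2\log(2/\delta)/(n\varepsilon) + 1/n\), one common formulation of the stability-based histogram. The noise scales and threshold used by dp_score() may differ. The helper rlaplace() is hypothetical.

```r
set.seed(3)
n <- 1000; eps <- 0.5; delta <- 1e-3

# Simulated bin counts: 10 occupied bins and 90 empty bins.
counts <- as.vector(rmultinom(1, n, prob = c(rep(0.1, 10), rep(0, 90))))

# "add"-style sketch: Gaussian noise on every count, clip at zero, normalize.
sigma <- sqrt(2) * sqrt(2 * log(1.25 / delta)) / eps  # l2 sensitivity sqrt(2)
noisy <- pmax(counts + rnorm(length(counts), sd = sigma), 0)
p_add <- noisy / sum(noisy)

# Hypothetical helper: a difference of two Exp(1) draws is standard Laplace.
rlaplace <- function(m, scale) scale * (rexp(m) - rexp(m))

# "sparse"-style sketch: perturb only nonzero proportions, then threshold.
p_hat <- counts / n
nz <- which(p_hat > 0)
noisy_p <- p_hat[nz] + rlaplace(length(nz), scale = 2 / (n * eps))
threshold <- 2 * log(2 / delta) / (n * eps) + 1 / n   # assumed threshold
keep <- noisy_p > threshold
p_sparse <- numeric(length(counts))
p_sparse[nz[keep]] <- noisy_p[keep]
```

In this sketch the additive estimate spreads small positive mass over many truly empty bins, whereas the sparse estimate leaves every empty bin at exactly zero.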

The privacy parameters are allocated across the privacy-consuming steps. If g_dppca = FALSE, half of each of eps and delta is used for private frame construction and the other half for the private histogram. If g_dppca = TRUE, eps and delta are each split equally among private direction estimation, private frame construction, and private histogram release.
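The allocation rule can be written out as a small helper. The function split_budget() below is hypothetical and only mirrors the equal split described above; how the package divides the budget further within each step is not covered.

```r
# Hypothetical helper mirroring the equal budget split described above.
split_budget <- function(eps, delta, g_dppca = FALSE) {
  steps <- c(if (g_dppca) "direction", "frame", "histogram")
  k <- length(steps)
  data.frame(step = steps, eps = eps / k, delta = delta / k)
}

b2 <- split_budget(eps = 2, delta = 1e-3)                  # two steps
b3 <- split_budget(eps = 2, delta = 1e-3, g_dppca = TRUE)  # three steps
```

With eps = 2 and delta = 1e-3, each step receives eps = 1 and delta = 5e-4 when g_dppca = FALSE, and one third of each parameter when g_dppca = TRUE.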

For the detailed procedure and mathematical formulation, see https://yejinjo0220.github.io/dppca/articles/dp_score.

References

Dwork C, Roth A (2014). “The Algorithmic Foundations of Differential Privacy.” Found. Trends Theor. Comput. Sci., 9(3–4), 211–407. ISSN 1551-305X, doi:10.1561/0400000042.

Nissim K, Raskhodnikova S, Smith A (2007). “Smooth Sensitivity and Sampling in Private Data Analysis.” In STOC'07: Proceedings of the 39th Annual ACM Symposium on Theory of Computing, 75–84. ISBN 9781595936318, doi:10.1145/1250790.1250803.

Wasserman L, Zhou S (2010). “A Statistical Framework for Differential Privacy.” Journal of the American Statistical Association, 105(489), 375–389. doi:10.1198/jasa.2009.tm08651.

Karwa V, Vadhan S (2018). “Finite Sample Differentially Private Confidence Intervals.” In Proceedings of the 9th Innovations in Theoretical Computer Science Conference, volume 94 of Leibniz International Proceedings in Informatics, 44:1–44:9. doi:10.4230/LIPIcs.ITCS.2018.44.

Kim M, Jung S (2025). “Robust and Differentially Private Principal Component Analysis.” Statistical Analysis and Data Mining: An ASA Data Science Journal, 18(6), e70053. doi:10.1002/sam.70053.

See also

dp_score_plot() for plotting the output of this function. dp_score_group() and dp_score_plot_group() for group-wise score histograms. dp_pc_dir() for private principal component direction estimation.

Examples

data(gau, package = "dppca")

# Use a small subset to keep the example fast.
X <- gau[1:300, ]

# Compute two-dimensional PC scores and a private histogram using the additive method.
set.seed(123)
score_gau <- dp_score(
  X,
  eps = 2,
  delta = 1e-3,
  method = "add",
  bins = c(10, 10)
)

head(score_gau$score)
#>          PC1        PC2
#> 1 -1.6418971 -2.9417503
#> 2  2.4192805 -1.9747774
#> 3 -1.5647289  2.4500389
#> 4  1.1818664  0.6632302
#> 5 -0.7668155 -2.7729387
#> 6  1.4701354  3.1142919
head(score_gau$add)
#>         xmin       xmax      ymin      ymax        prob
#> 1 -5.3429455 -4.4593527 -3.298014 -2.414422 0.016055676
#> 2 -4.4593527 -3.5757599 -3.298014 -2.414422 0.000000000
#> 3 -3.5757599 -2.6921671 -3.298014 -2.414422 0.016720579
#> 4 -2.6921671 -1.8085743 -3.298014 -2.414422 0.000000000
#> 5 -1.8085743 -0.9249815 -3.298014 -2.414422 0.005652482
#> 6 -0.9249815 -0.0413887 -3.298014 -2.414422 0.009703807