This function computes two-dimensional principal component scores and releases group-wise differentially private histograms on a common score frame and grid. It is useful when observations have group labels and the low-dimensional score distribution should be compared across groups.
Arguments
- X
A matrix or data frame where rows correspond to observations and columns correspond to variables. X can additionally include a named column representing the group label for each observation.
- group
Group labels. This can be a vector of length nrow(X) or a single column name in X. If a column name is supplied, that column is used as the group label and removed from the feature matrix.
- eps
Positive number defining the total epsilon privacy parameter.
- delta
Number in (0, 1) defining the total delta privacy parameter.
- bins
Integer vector of length 2 defining the number of histogram bins along the first and second score axes, respectively.
- method
Character vector specifying which private histogram methods to compute. Use "add" for the additive Gaussian histogram and "sparse" for the sparse thresholded histogram. The default is c("add", "sparse").
- center
A logical value indicating whether to center the columns of X before computing principal component directions. The default is TRUE.
- standardize
A logical value indicating whether to scale the columns of X by their sample standard deviations after optional centering. The default is FALSE.
- g_dppca
A logical value indicating whether to use private principal component directions. The default is FALSE. See dp_pc_dir() for details.
- cpp.option
A logical value passed to dp_pc_dir() when g_dppca = TRUE. The default is FALSE.
- axes
Integer vector of length 2 specifying the principal components used to construct the score coordinates. The default is c(1, 2).
Value
A list with components:
- score
An \(n \times 2\) matrix containing the PC scores for the two selected axes.
- frame
A list with components xlim and ylim.
- groups
A named list of group-specific histogram outputs.
- method
Character vector of private histogram methods used.
Details
The score directions, plotting frame, and histogram grid are shared across all groups. For each group \(g\), the group-specific count in bin \(B_k\) is \(c_k^{(g)} = \sum_i 1\{s_i \in B_k, g_i = g\}\). Private histograms are then computed separately for each group on the common grid. Because the groups form a partition of the rows, the group-wise histograms use the same histogram privacy parameters for each group by parallel composition.
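The group-wise counts \(c_k^{(g)}\) on a common grid can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the scores and labels are simulated, the frame is taken to be the range of the scores, and the names score, group, xbreaks, ybreaks and counts are hypothetical. The private noise step is deliberately left out.

```r
# Illustrative sketch only: simulated scores and labels, not dppca internals.
set.seed(1)
n <- 200
score <- cbind(PC1 = rnorm(n), PC2 = rnorm(n))
group <- sample(c("group1", "group2"), n, replace = TRUE)

# Common frame and grid shared by all groups (assumed here: the score range).
bins <- c(4, 4)
xbreaks <- seq(min(score[, 1]), max(score[, 1]), length.out = bins[1] + 1)
ybreaks <- seq(min(score[, 2]), max(score[, 2]), length.out = bins[2] + 1)

# Bin index of every observation along each axis of the shared grid.
# all.inside = TRUE keeps the maximum score in the last bin.
xbin <- findInterval(score[, 1], xbreaks, all.inside = TRUE)
ybin <- findInterval(score[, 2], ybreaks, all.inside = TRUE)

# Count table over the full grid, including empty bins.
grid_table <- function(idx) {
  table(factor(xbin[idx], levels = seq_len(bins[1])),
        factor(ybin[idx], levels = seq_len(bins[2])))
}

# c_k^(g): one bins[1] x bins[2] count table per group.
counts <- lapply(split(seq_len(n), group), grid_table)

# Every row belongs to exactly one group, so the group tables
# add up to the pooled table on the same grid.
pooled <- grid_table(seq_len(n))
stopifnot(all(Reduce(`+`, counts) == pooled))
```

Because each observation contributes to the count table of exactly one group, the tables partition the pooled counts, which is the property the parallel composition argument relies on.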
See also
dp_score_plot_group() for plotting group-wise score histograms.
dp_score() for pooled score histograms.
Examples
data(gau_g, package = "dppca")
# Compute PC scores and group-wise private histograms.
set.seed(123)
score_gau_g <- dp_score_group(
  gau_g,
  group = "group",
  eps = 3,
  delta = 1e-3,
  bins = c(8, 8)
)
head(score_gau_g$score)
#> PC1 PC2
#> [1,] 0.5752441 0.9711984
#> [2,] 3.1987535 1.5765379
#> [3,] 2.0936989 1.0882390
#> [4,] 0.9725891 1.5285036
#> [5,] 1.0839100 0.4533152
#> [6,] 1.5430497 -1.0954815
head(score_gau_g$groups$group1$add)
#> xmin xmax ymin ymax prob
#> 1 -13.6247864 -10.0056291 -13.7689 -10.14974 0.007860830
#> 2 -10.0056291 -6.3864719 -13.7689 -10.14974 0.000000000
#> 3 -6.3864719 -2.7673146 -13.7689 -10.14974 0.008186366
#> 4 -2.7673146 0.8518427 -13.7689 -10.14974 0.000000000
#> 5 0.8518427 4.4709999 -13.7689 -10.14974 0.000000000
#> 6 4.4709999 8.0901572 -13.7689 -10.14974 0.001209186