Control options for private modified winsorized scree estimation
Source: R/scree_controls.R
pmwm_control.Rd
Creates a control list for the private modified winsorized mean (PMWM) scree
estimator used by dp_scree() and dp_scree_plot() when method = "pmwm".
Arguments
- a, b
Finite lower and upper search bounds supplied to the private quantile routine. The private lower and upper clipping cutoffs are searched within this range. These values have no defaults because they should be chosen on the scale of squared principal component scores.
- trim_const
Positive number controlling the baseline clipping level in the practical clipping proportion. This value has no default.
- eta
Nonnegative number controlling the expected contamination level in the practical clipping proportion. This value has no default.
- beta
Positive number greater than 1 defining the log-binning base used by the private quantile routine. The default is 1.001.
- split_mode
A logical value indicating whether to split the sample into quantile-estimation and mean-estimation subsets. The default is TRUE.
Details
The PMWM method privately estimates lower and upper tail cutoffs, winsorizes the squared scores to those cutoffs, and then releases a noisy winsorized mean. It is based on the private modified winsorized mean of Ramsay and Spicker (2025).
The implementation used here is an R adaptation of the publicly available Python implementation accompanying Ramsay and Spicker (2025). The adaptation is used for scree estimation by applying the PMWM estimator to squared principal component scores.
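The winsorization step on its own can be illustrated without any privacy machinery. The following is a minimal, non-private sketch: the helper name `winsorized_mean`, the fixed cutoffs, and the sample values are all made up for illustration, and the private cutoff search and the noise added before release are deliberately omitted.

```r
# Non-private illustration of winsorizing squared scores.
# `lower` and `upper` are fixed by hand here; PMWM estimates them privately
# and adds noise to the resulting mean, neither of which is shown.
winsorized_mean <- function(x, lower, upper) {
  stopifnot(is.numeric(x), lower <= upper)
  clipped <- pmin(pmax(x, lower), upper)  # pull each value into [lower, upper]
  mean(clipped)
}

squared_scores <- c(0.2, 0.5, 0.9, 1.1, 40)  # made-up values with one outlier
wm <- winsorized_mean(squared_scores, lower = 0.3, upper = 2)
wm
```

Clipping replaces the outlying value 40 by the upper cutoff rather than discarding it, which limits how far a single observation can move the mean.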
The PMWM scree estimator uses additional control parameters for private
quantile estimation and winsorization. The parameter beta determines the
spacing of the geometric search grid used by the private quantile estimator
and must satisfy \(\beta > 1\). Values of beta closer to 1 give a finer grid
but may increase computation.
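To see how beta trades grid resolution against grid size, consider the rough sketch below. It assumes the grid consists of the points a, a·beta, a·beta², ... up to b with 0 < a < b; the helper `geometric_grid` is hypothetical, and the binning inside the package's private quantile routine may be constructed differently.

```r
# Hypothetical helper: geometric grid a, a*beta, a*beta^2, ...,
# stopping at the last point that does not exceed b.
# Assumes 0 < a < b; this is not the package's internal routine.
geometric_grid <- function(a, b, beta) {
  stopifnot(a > 0, b > a, beta > 1)
  k_max <- floor(log(b / a) / log(beta))  # largest exponent with a * beta^k <= b
  a * beta^(0:k_max)
}

coarse <- geometric_grid(a = 0.01, b = 100, beta = 1.1)
fine   <- geometric_grid(a = 0.01, b = 100, beta = 1.001)

# A base closer to 1 yields more grid points over the same range.
length(coarse) < length(fine)
```

Under this assumption the number of grid points grows roughly like log(b / a) / log(beta), so widening the search range or shrinking beta toward 1 both enlarge the grid.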
The bounds a and b define the lower and upper search range supplied to the
private quantile routine. The private lower and upper winsorization cutoffs
are searched within this range. These bounds should be chosen on the scale of
the squared principal component scores.
The parameters trim_const and eta determine the practical clipping
proportion used by the modified winsorized mean. If \(n_q\) denotes the
number of observations used for private quantile estimation, the clipping
proportion is
$$
p = \min\left\{
\max\left(\frac{\mathrm{trim\_const}}{n_q}, \eta\right),
0.49
\right\}.
$$
Here, trim_const / n_q controls the baseline clipping level, while eta
gives a lower bound reflecting the expected contamination level. The cap at
0.49 keeps the clipping proportion strictly below one half.
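The formula for the clipping proportion can be checked numerically with a few lines of R. The helper name `clipping_proportion` and the input values are illustrative only and are not part of the package.

```r
# Direct transcription of p = min(max(trim_const / n_q, eta), 0.49).
# `clipping_proportion` is a hypothetical helper, not a package function.
clipping_proportion <- function(trim_const, eta, n_q) {
  stopifnot(trim_const > 0, eta >= 0, n_q >= 1)
  min(max(trim_const / n_q, eta), 0.49)
}

# Baseline term dominates: 5 / 200 = 0.025 exceeds eta = 0.01.
p_baseline <- clipping_proportion(trim_const = 5, eta = 0.01, n_q = 200)

# Contamination term dominates: eta = 0.10 exceeds 0.025.
p_eta <- clipping_proportion(trim_const = 5, eta = 0.10, n_q = 200)

# Very small n_q: 5 / 4 = 1.25 is capped at 0.49.
p_capped <- clipping_proportion(trim_const = 5, eta = 0.01, n_q = 4)
```

The three calls cover the three regimes of the formula: the baseline term, the eta floor, and the 0.49 cap.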
If split_mode = TRUE, the sample is split into two parts: one part is used
for private quantile estimation and the other part is used for the winsorized
mean step. If split_mode = FALSE, all observations are used in both steps.
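The effect of split_mode on which observations feed each step can be sketched as follows. The helper `split_indices` is hypothetical, and the even random split is an assumption made purely for illustration; the text above does not state how the package divides the sample.

```r
# Hypothetical helper; the even random split is an assumption,
# not a documented property of the package.
split_indices <- function(n, split_mode = TRUE) {
  stopifnot(n >= 2)
  if (!split_mode) {
    # All observations are used in both steps.
    return(list(quantile = seq_len(n), mean = seq_len(n)))
  }
  quantile_idx <- sort(sample.int(n, size = floor(n / 2)))
  list(quantile = quantile_idx,
       mean = setdiff(seq_len(n), quantile_idx))
}

set.seed(1)
parts <- split_indices(10)
length(parts$quantile)  # this count plays the role of n_q
```

With splitting enabled, the two index sets are disjoint, so the observations used for the quantile step are not reused for the mean step.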
The parameters a, b, trim_const, and eta are intentionally not given
defaults. They are data- and robustness-dependent choices and should be set
deliberately by the user.
References
Ramsay K, Spicker D (2025). “Improved subsample-and-aggregate via the private modified winsorized mean.” arXiv:2501.14095, https://arxiv.org/abs/2501.14095. Code available at https://github.com/12ramsake/PMWM.
See also
dp_scree() for computing differentially private scree estimates using these
control options.
dp_scree_plot() for plotting scree estimates.