A synthetic 20-dimensional dataset of Gaussian clusters, sampled from multivariate normal distributions. It serves as an example for principal component analysis (PCA) and differentially private PCA visualization.

Usage

gau

Format

A data frame with 5,000 rows and 20 numerical variables. The variables are named V1, V2, ..., V20.

Source

Generated by the authors of the package.

Details

The dataset contains 5,000 observations in 20 dimensions. It consists of five groups, with 1,000 observations in each group. The data were generated with seed 123. For each group \(g = 1, \ldots, 5\), observations were sampled independently from a 20-dimensional multivariate normal distribution \(N_{20}(\mu_g, \Sigma_g)\). The first two groups have covariance matrix \(I_{20}\), the third and fourth groups have covariance matrix \(5I_{20}\), and the fifth group has covariance matrix \(8I_{20}\). The group mean vectors differ across selected coordinate blocks, producing both separated and partially overlapping cluster structures.
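The sampling scheme described above can be sketched in base R. This is an illustration under stated assumptions, not the code used to build `gau`: the mean vectors (`means`) are hypothetical placeholders, since the actual values are not given here, and the original sampler may differ (for example, it may have used `MASS::mvrnorm`). Setting the seed to 123 therefore does not reproduce `gau` itself. Because every covariance matrix is a scalar multiple of \(I_{20}\), a draw from \(N_{20}(\mu_g, c_g I_{20})\) can be written as \(\mu_g + \sqrt{c_g}\, z\) with \(z\) standard normal.

```r
# Sketch only: the group means below are hypothetical placeholders,
# not the values used to build `gau`.
set.seed(123)

n_per_group <- 1000
p <- 20
variance_scale <- c(1, 1, 5, 5, 8)  # Sigma_g = variance_scale[g] * I_20

# Hypothetical mean vectors: group g is shifted on its own block of four coordinates.
means <- matrix(0, nrow = 5, ncol = p)
for (g in 1:5) {
  means[g, (4 * (g - 1) + 1):(4 * g)] <- 6
}

# With Sigma_g = c * I_20, a draw is mu_g + sqrt(c) * z with z standard normal.
groups <- lapply(1:5, function(g) {
  z <- matrix(rnorm(n_per_group * p), nrow = n_per_group, ncol = p)
  sweep(sqrt(variance_scale[g]) * z, 2, means[g, ], "+")
})

# as.data.frame() on an unnamed matrix names the columns V1, ..., V20.
sim <- as.data.frame(do.call(rbind, groups))
group <- rep(1:5, each = n_per_group)

# Ordinary (non-private) PCA on the simulated data; keep the first two scores.
pca <- prcomp(sim, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]
```

Plotting `scores` colored by `group` gives a rough picture of how the groups with larger variance spread out and overlap their neighbors. The same `prcomp()` call can be applied to `gau` once the package is loaded.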