Skip to contents

A numerical subset of the Adult dataset from the UCI Machine Learning Repository. The original Adult dataset is based on data extracted from the 1994 United States Census database.

Usage

adult

Format

A data frame with 32,561 rows and 5 columns:

age

Age of the individual.

education_num

Number of years of education.

capital_gain

Capital gain.

capital_loss

Capital loss.

hours_per_week

Number of working hours per week.

Source

UCI Machine Learning Repository, Adult dataset. The Adult dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Details

This package dataset retains five numerical variables: age, education_num, capital_gain, capital_loss, and hours_per_week. These selected numerical variables contain no missing values in the original data file used here. The resulting dataset contains 32,561 observations.

Since the variables have substantially different units and scales, scaling is recommended before applying PCA-based methods.

References

Becker B, Kohavi R (1996). “Adult.” UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5XW20, https://archive.ics.uci.edu/dataset/2/adult.