This is a wrapper function for various procedures to reduce the size of surveys by removing variables that are not harmonized.
Usage
subset_surveys(
survey_list,
survey_paths = NULL,
rowid = "rowid",
subset_name = "subset",
subset_vars = NULL,
crosswalk_table = NULL,
import_path = NULL,
export_path = NULL
)
subset_waves(waves, subset_vars = NULL)
subset_save_surveys(
crosswalk_table,
subset_name = "subset",
survey_list = NULL,
survey_paths = NULL,
import_path = NULL,
export_path = NULL
)
Arguments
- survey_list
A list of surveys imported with read_surveys. If set to NULL, survey_paths should give the full paths to the surveys.
- survey_paths
A vector of full file paths to the surveys to subset.
- rowid
The unique row (observation) identifier in the files. Defaults to "rowid", which is the default of the importing functions in this package.
- subset_name
An identifier for the survey subset.
- subset_vars
The names of the variables that should be kept from all surveys in the list that contains the wave of surveys. Defaults to NULL, in which case all variables are returned without subsetting.
- crosswalk_table
A crosswalk table created by crosswalk_table_create, or a manually created crosstable that includes at least filename, var_name_orig, var_name_target and optionally var_label_orig and var_label_target. This parameter is optional and defaults to NULL.
- waves
A list of surveys imported with read_surveys.
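A manually created crosstable can be an ordinary data frame. The following is a minimal sketch that assumes only the column names listed for crosswalk_table above. The file names and the target name "trust" are hypothetical and show nothing more than the expected shape.

```r
# Hypothetical manual crosstable: one row per variable per survey file.
# The file names and the target name "trust" are illustrative assumptions.
manual_crosswalk <- data.frame(
  filename        = c("ZA5913.rds", "ZA5913.rds", "ZA6863.rds", "ZA6863.rds"),
  var_name_orig   = c("rowid", "qa10_1", "rowid", "qa14_1"),
  var_name_target = c("rowid", "trust", "rowid", "trust"),
  stringsAsFactors = FALSE
)

# Check that the minimum set of columns named in the argument description exists.
required_cols <- c("filename", "var_name_orig", "var_name_target")
stopifnot(all(required_cols %in% names(manual_crosswalk)))
```

The optional var_label_orig and var_label_target columns could be added to the same data frame in the same way.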
Details
This function allows several workflows.
Subsetting can be based on a vector of variable names
given by subset_vars, or on a crosstable passed as crosswalk_table.
subset_save_surveys can also be called directly.
subset_surveys will also harmonize the variable names if var_name_target is
defined in the crosswalk_table input.
harmonize_survey_variables is a wrapper and requires that the new (target) variable names are
present in a valid crosstable.
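To make the crosstable-based workflow concrete, the sketch below mimics the idea in base R: keep only the variables listed for a given file, then rename them to their target names. It is an illustration under stated assumptions, not the package's implementation. subset_by_crosswalk is a hypothetical helper, and the toy survey and crosstable values are made up.

```r
# Illustrative sketch only: mimics crosstable-driven subsetting with base R.
# subset_by_crosswalk() is a hypothetical helper, not a retroharmonize function.
subset_by_crosswalk <- function(survey, filename, crosswalk) {
  # Keep the crosstable rows that describe this survey file.
  rows <- crosswalk[crosswalk$filename == filename, , drop = FALSE]
  # Retain only the variables that actually exist in the survey.
  rows <- rows[rows$var_name_orig %in% names(survey), , drop = FALSE]
  subsetted <- survey[, rows$var_name_orig, drop = FALSE]
  # Rename to the target names when they are defined.
  if ("var_name_target" %in% names(rows)) {
    names(subsetted) <- rows$var_name_target
  }
  subsetted
}

# Toy survey and crosstable; all values are made up for illustration.
toy_survey <- data.frame(
  rowid    = c("s1_1", "s1_2"),
  qa10_1   = c(1, 2),
  unused_q = c(9, 9),
  stringsAsFactors = FALSE
)
toy_crosswalk <- data.frame(
  filename        = "survey1.rds",
  var_name_orig   = c("rowid", "qa10_1"),
  var_name_target = c("rowid", "trust"),
  stringsAsFactors = FALSE
)

toy_subset <- subset_by_crosswalk(toy_survey, "survey1.rds", toy_crosswalk)
names(toy_subset)
```

In this toy case the variable unused_q is dropped because it does not appear in the crosstable, and qa10_1 is renamed to the target name.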
Examples
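library(retroharmonize)
# Locate the example survey files (.rds) shipped with the package
examples_dir <- system.file("examples", package = "retroharmonize")
survey_list <- dir(examples_dir)[grepl("\\.rds", dir(examples_dir))]
# Import the surveys into a list
example_surveys <- read_surveys(
  file.path(examples_dir, survey_list)
)
# Keep only the listed variables; each survey retains those it contains
subset_surveys(
  survey_list = example_surveys,
  subset_vars = c("rowid", "isocntry", "qa10_1", "qa14_1"),
  subset_name = "subset_example"
)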
#> [[1]]
#> # A tibble: 35 × 3
#> rowid isocntry qa10_1
#> <chr> <chr> <dbl+lbl>
#> 1 ZA5913_1 NL 2 [Tend not to trust]
#> 2 ZA5913_2 NL 2 [Tend not to trust]
#> 3 ZA5913_3 NL 3 (NA) [DK]
#> 4 ZA5913_4 NL 1 [Tend to trust]
#> 5 ZA5913_5 NL 1 [Tend to trust]
#> 6 ZA5913_6 NL 1 [Tend to trust]
#> 7 ZA5913_7 NL 1 [Tend to trust]
#> 8 ZA5913_8 NL 1 [Tend to trust]
#> 9 ZA5913_9 NL 2 [Tend not to trust]
#> 10 ZA5913_10 NL 2 [Tend not to trust]
#> # … with 25 more rows
#>
#> [[2]]
#> # A tibble: 50 × 3
#> rowid isocntry qa14_1
#> <chr> <chr> <dbl+lbl>
#> 1 ZA6863_1 NL 3 [DK]
#> 2 ZA6863_2 NL 1 [Tend to trust]
#> 3 ZA6863_3 NL 3 [DK]
#> 4 ZA6863_4 NL 1 [Tend to trust]
#> 5 ZA6863_5 NL 1 [Tend to trust]
#> 6 ZA6863_6 NL 2 [Tend not to trust]
#> 7 ZA6863_7 NL 1 [Tend to trust]
#> 8 ZA6863_8 NL 3 [DK]
#> 9 ZA6863_9 NL 1 [Tend to trust]
#> 10 ZA6863_10 NL 1 [Tend to trust]
#> # … with 40 more rows
#>
#> [[3]]
#> # A tibble: 45 × 3
#> rowid isocntry qa14_1
#> <chr> <chr> <dbl+lbl>
#> 1 ZA7576_1 ES 2 [Tend not to trust]
#> 2 ZA7576_2 NL 1 [Tend to trust]
#> 3 ZA7576_3 NL 1 [Tend to trust]
#> 4 ZA7576_4 NL 2 [Tend not to trust]
#> 5 ZA7576_5 NL 1 [Tend to trust]
#> 6 ZA7576_6 NL 1 [Tend to trust]
#> 7 ZA7576_7 NL 1 [Tend to trust]
#> 8 ZA7576_8 NL 3 [DK]
#> 9 ZA7576_9 NL 2 [Tend not to trust]
#> 10 ZA7576_10 NL 2 [Tend not to trust]
#> # … with 35 more rows
#>
