This is a wrapper function for various procedures to reduce the size of surveys by removing variables that are not harmonized.
Usage
subset_surveys(
  survey_list,
  survey_paths = NULL,
  rowid = "rowid",
  subset_name = "subset",
  subset_vars = NULL,
  crosswalk_table = NULL,
  import_path = NULL,
  export_path = NULL
)
subset_waves(waves, subset_vars = NULL)
subset_save_surveys(
  crosswalk_table,
  subset_name = "subset",
  survey_list = NULL,
  survey_paths = NULL,
  import_path = NULL,
  export_path = NULL
)
Arguments
- survey_list
A list of surveys imported with read_surveys. If set to NULL, survey_paths should give the full paths to the surveys.
- survey_paths
A vector of full file paths to the surveys to subset.
- rowid
The unique row (observation) identifier in the files. Defaults to "rowid", which is the default of the importing functions in this package.
- subset_name
An identifier for the survey subset.
- subset_vars
The names of the variables that should be kept from all surveys in the list that contains the wave of surveys. Defaults to NULL, in which case it returns all variables without subsetting.
- crosswalk_table
A crosswalk table created by crosswalk_table_create, or a manually created crosstable including at least filename, var_name_orig, var_name_target, and optionally var_label_orig and var_label_target. This parameter is optional and defaults to NULL.
- waves
A list of surveys imported with read_surveys.
Details
This function allows several workflows.
Subsetting can be based on a vector of variable names given by subset_vars, or on a crosstable given by crosswalk_table.
subset_save_surveys can be called directly. subset_surveys will also harmonize the variable names if var_name_target is defined in the crosswalk_table input.
harmonize_survey_variables is a wrapper and will require that the new (target) variable names are present in a valid crosstable.
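The two selection modes described above can be illustrated with a minimal base R sketch. It is not the package's implementation: toy_surveys, keep_vars() and toy_crosswalk are hypothetical names, and the toy data frames only stand in for surveys imported with read_surveys. The sketch assumes that requested variables missing from a survey are silently skipped, which appears consistent with the output in the Examples section, where each subset keeps only the requested columns it actually contains.

```r
# Toy stand-ins for imported surveys (hypothetical data, not package output)
toy_surveys <- list(
  wave1 = data.frame(rowid = c("w1_1", "w1_2"), isocntry = c("NL", "NL"),
                     qa10_1 = c(2, 1), age = c(34, 51)),
  wave2 = data.frame(rowid = c("w2_1", "w2_2"), isocntry = c("NL", "ES"),
                     qa14_1 = c(3, 1), age = c(45, 29))
)

# Mode 1: keep only the requested variables that exist in a survey
# (hypothetical helper; assumes absent variables are skipped)
keep_vars <- function(survey, subset_vars) {
  survey[, intersect(subset_vars, names(survey)), drop = FALSE]
}

subset_by_name <- lapply(
  toy_surveys, keep_vars,
  subset_vars = c("rowid", "isocntry", "qa10_1", "qa14_1")
)

# Mode 2: a crosswalk-like table maps original names to target names
toy_crosswalk <- data.frame(
  filename = c("wave1", "wave1", "wave1", "wave2", "wave2", "wave2"),
  var_name_orig = c("rowid", "isocntry", "qa10_1",
                    "rowid", "isocntry", "qa14_1"),
  var_name_target = c("rowid", "country", "trust",
                      "rowid", "country", "trust"),
  stringsAsFactors = FALSE
)

subset_by_crosswalk <- lapply(names(toy_surveys), function(id) {
  rows <- toy_crosswalk[toy_crosswalk$filename == id, ]
  out <- keep_vars(toy_surveys[[id]], rows$var_name_orig)
  # Rename the kept columns to their harmonized target names
  names(out) <- rows$var_name_target[match(names(out), rows$var_name_orig)]
  out
})

# Inspect the column names of each harmonized subset
lapply(subset_by_crosswalk, names)
```

In the first mode each survey keeps its own variable names, so the trust item remains qa10_1 in one wave and qa14_1 in the other. In the second mode both waves end up with the same target names, which is the step that makes the subsets directly comparable.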
Examples
# Locate the example .rds survey files shipped with the package
examples_dir <- system.file("examples", package = "retroharmonize")
survey_list <- dir(examples_dir)[grepl("\\.rds", dir(examples_dir))]

example_surveys <- read_surveys(
  file.path(examples_dir, survey_list)
)

# Keep only the listed variables from each survey
subset_surveys(
  survey_list = example_surveys,
  subset_vars = c("rowid", "isocntry", "qa10_1", "qa14_1"),
  subset_name = "subset_example"
)
#> [[1]]
#> # A tibble: 35 × 3
#> rowid isocntry qa10_1
#> <chr> <chr> <dbl+lbl>
#> 1 ZA5913_1 NL 2 [Tend not to trust]
#> 2 ZA5913_2 NL 2 [Tend not to trust]
#> 3 ZA5913_3 NL 3 (NA) [DK]
#> 4 ZA5913_4 NL 1 [Tend to trust]
#> 5 ZA5913_5 NL 1 [Tend to trust]
#> 6 ZA5913_6 NL 1 [Tend to trust]
#> 7 ZA5913_7 NL 1 [Tend to trust]
#> 8 ZA5913_8 NL 1 [Tend to trust]
#> 9 ZA5913_9 NL 2 [Tend not to trust]
#> 10 ZA5913_10 NL 2 [Tend not to trust]
#> # … with 25 more rows
#>
#> [[2]]
#> # A tibble: 50 × 3
#> rowid isocntry qa14_1
#> <chr> <chr> <dbl+lbl>
#> 1 ZA6863_1 NL 3 [DK]
#> 2 ZA6863_2 NL 1 [Tend to trust]
#> 3 ZA6863_3 NL 3 [DK]
#> 4 ZA6863_4 NL 1 [Tend to trust]
#> 5 ZA6863_5 NL 1 [Tend to trust]
#> 6 ZA6863_6 NL 2 [Tend not to trust]
#> 7 ZA6863_7 NL 1 [Tend to trust]
#> 8 ZA6863_8 NL 3 [DK]
#> 9 ZA6863_9 NL 1 [Tend to trust]
#> 10 ZA6863_10 NL 1 [Tend to trust]
#> # … with 40 more rows
#>
#> [[3]]
#> # A tibble: 45 × 3
#> rowid isocntry qa14_1
#> <chr> <chr> <dbl+lbl>
#> 1 ZA7576_1 ES 2 [Tend not to trust]
#> 2 ZA7576_2 NL 1 [Tend to trust]
#> 3 ZA7576_3 NL 1 [Tend to trust]
#> 4 ZA7576_4 NL 2 [Tend not to trust]
#> 5 ZA7576_5 NL 1 [Tend to trust]
#> 6 ZA7576_6 NL 1 [Tend to trust]
#> 7 ZA7576_7 NL 1 [Tend to trust]
#> 8 ZA7576_8 NL 3 [DK]
#> 9 ZA7576_9 NL 2 [Tend not to trust]
#> 10 ZA7576_10 NL 2 [Tend not to trust]
#> # … with 35 more rows
#>