Subset surveys — subset_surveys • retroharmonize

This is a wrapper function for various procedures to reduce the size of surveys by removing variables that are not harmonized.

Usage

subset_surveys(
  survey_list,
  survey_paths = NULL,
  rowid = "rowid",
  subset_name = "subset",
  subset_vars = NULL,
  crosswalk_table = NULL,
  import_path = NULL,
  export_path = NULL
)

subset_waves(waves, subset_vars = NULL)

subset_save_surveys(
  crosswalk_table,
  subset_name = "subset",
  survey_list = NULL,
  survey_paths = NULL,
  import_path = NULL,
  export_path = NULL
)

Arguments

survey_list: A list of surveys imported with read_surveys. If set to NULL, survey_paths should give the full paths to the surveys.
survey_paths: A vector of full file paths to the surveys to subset.
rowid: The unique row (observation) identifier in the files. Defaults to "rowid", which is the default of the importing functions in this package.
subset_name: An identifier for the survey subset.
subset_vars: The names of the variables to keep from every survey in the list. Defaults to NULL, in which case all variables are returned without subsetting.
crosswalk_table: A crosswalk table created by crosswalk_table_create, or a manually created crosswalk table that includes at least filename, var_name_orig and var_name_target, and optionally var_label_orig and var_label_target. This parameter is optional and defaults to NULL.
waves: A list of surveys imported with read_surveys.

Value

A list of surveys, or individual rds files saved to export_path.

Details

This function supports several workflows. Subsetting can be based on a vector of variable names given by subset_vars, or on a crosswalk table. subset_save_surveys can also be called directly.
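
The idea of subsetting by a vector of variable names can be illustrated with a minimal base R sketch. It is not the package's implementation: toy_surveys, keep_vars and toy_subsets are hypothetical names, and the data frames are made-up stand-ins for imported surveys. The sketch assumes that a requested variable missing from a given survey is simply skipped.

```r
# Hypothetical stand-ins for imported surveys (not real package data)
toy_surveys <- list(
  data.frame(rowid = c("a_1", "a_2"), isocntry = c("NL", "NL"),
             qa10_1 = c(1, 2), extra = c(9, 9)),
  data.frame(rowid = c("b_1", "b_2"), isocntry = c("NL", "ES"),
             qa14_1 = c(3, 1), extra = c(9, 9))
)

# Variables to keep; not every survey contains all of them
keep_vars <- c("rowid", "isocntry", "qa10_1", "qa14_1")

# For each survey, keep only the requested variables that are present
toy_subsets <- lapply(toy_surveys, function(survey) {
  survey[, intersect(keep_vars, names(survey)), drop = FALSE]
})

lapply(toy_subsets, names)
```

Each element of toy_subsets keeps three columns, and the unrequested extra column is dropped from both.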

subset_surveys will also harmonize the variable names if var_name_target is defined in the crosswalk_table input. harmonize_survey_variables is a wrapper and requires that the new (target) variable names are present in a valid crosswalk table.
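
How a crosswalk table can drive both subsetting and renaming is sketched below in base R. This is an illustration under stated assumptions, not the package's code: toy_crosswalk, toy_waves and rename_with_crosswalk are hypothetical, and the table uses only the filename, var_name_orig and var_name_target columns named above. A table produced by crosswalk_table_create may carry further columns.

```r
# Hypothetical, manually created crosswalk table
toy_crosswalk <- data.frame(
  filename        = c("wave_a.rds", "wave_a.rds", "wave_b.rds", "wave_b.rds"),
  var_name_orig   = c("rowid", "qa10_1", "rowid", "qa14_1"),
  var_name_target = c("rowid", "trust", "rowid", "trust"),
  stringsAsFactors = FALSE
)

# Hypothetical surveys, named by the file they would come from
toy_waves <- list(
  wave_a.rds = data.frame(rowid = c("a_1", "a_2"), qa10_1 = c(1, 2), extra = c(9, 9)),
  wave_b.rds = data.frame(rowid = c("b_1", "b_2"), qa14_1 = c(3, 1), extra = c(9, 9))
)

# Keep the variables listed for this file and rename them to the target names
rename_with_crosswalk <- function(survey, file, crosswalk) {
  rows <- crosswalk[crosswalk$filename == file, ]
  kept <- survey[, rows$var_name_orig, drop = FALSE]
  names(kept) <- rows$var_name_target
  kept
}

toy_harmonized <- Map(
  rename_with_crosswalk, toy_waves, names(toy_waves),
  MoreArgs = list(crosswalk = toy_crosswalk)
)

lapply(toy_harmonized, names)
```

After this step both toy surveys share the column names rowid and trust, which is what makes them comparable in later harmonization steps.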

Examples

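library(retroharmonize)

# Locate the example .rds surveys shipped with the package
examples_dir <- system.file("examples", package = "retroharmonize")
survey_list <- dir(examples_dir)[grepl("\\.rds", dir(examples_dir))]

# Import the example surveys into a list
example_surveys <- read_surveys(
  file.path(examples_dir, survey_list)
)

# Keep only the requested variables; each survey returns those it contains
subset_surveys(survey_list = example_surveys,
               subset_vars = c("rowid", "isocntry", "qa10_1", "qa14_1"),
               subset_name = "subset_example")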
#> [[1]]
#> # A tibble: 35 × 3
#>    rowid     isocntry                qa10_1
#>    <chr>     <chr>                <dbl+lbl>
#>  1 ZA5913_1  NL       2 [Tend not to trust]
#>  2 ZA5913_2  NL       2 [Tend not to trust]
#>  3 ZA5913_3  NL       3 (NA) [DK]          
#>  4 ZA5913_4  NL       1 [Tend to trust]    
#>  5 ZA5913_5  NL       1 [Tend to trust]    
#>  6 ZA5913_6  NL       1 [Tend to trust]    
#>  7 ZA5913_7  NL       1 [Tend to trust]    
#>  8 ZA5913_8  NL       1 [Tend to trust]    
#>  9 ZA5913_9  NL       2 [Tend not to trust]
#> 10 ZA5913_10 NL       2 [Tend not to trust]
#> # … with 25 more rows
#> 
#> [[2]]
#> # A tibble: 50 × 3
#>    rowid     isocntry                qa14_1
#>    <chr>     <chr>                <dbl+lbl>
#>  1 ZA6863_1  NL       3 [DK]               
#>  2 ZA6863_2  NL       1 [Tend to trust]    
#>  3 ZA6863_3  NL       3 [DK]               
#>  4 ZA6863_4  NL       1 [Tend to trust]    
#>  5 ZA6863_5  NL       1 [Tend to trust]    
#>  6 ZA6863_6  NL       2 [Tend not to trust]
#>  7 ZA6863_7  NL       1 [Tend to trust]    
#>  8 ZA6863_8  NL       3 [DK]               
#>  9 ZA6863_9  NL       1 [Tend to trust]    
#> 10 ZA6863_10 NL       1 [Tend to trust]    
#> # … with 40 more rows
#> 
#> [[3]]
#> # A tibble: 45 × 3
#>    rowid     isocntry                qa14_1
#>    <chr>     <chr>                <dbl+lbl>
#>  1 ZA7576_1  ES       2 [Tend not to trust]
#>  2 ZA7576_2  NL       1 [Tend to trust]    
#>  3 ZA7576_3  NL       1 [Tend to trust]    
#>  4 ZA7576_4  NL       2 [Tend not to trust]
#>  5 ZA7576_5  NL       1 [Tend to trust]    
#>  6 ZA7576_6  NL       1 [Tend to trust]    
#>  7 ZA7576_7  NL       1 [Tend to trust]    
#>  8 ZA7576_8  NL       3 [DK]               
#>  9 ZA7576_9  NL       2 [Tend not to trust]
#> 10 ZA7576_10 NL       2 [Tend not to trust]
#> # … with 35 more rows
#>