Function reference
Importing
Survey data, i.e., data derived from questionnaires or from systematic data collection such as inspecting objects in nature or recording prices at shops, are usually stored in databases and exported to complex files that retain at least the coding and labelling metadata together with the data. These files must be imported into R so that the harmonization tasks can be carried out with the appropriate R types.
-
read_surveys()
read_survey()
- Read survey file(s)
-
read_rds()
- Read rds file
-
read_spss()
- Read SPSS (`.sav`, `.zsav`, `.por`) files. Write `.sav` and `.zsav` files.
-
read_dta()
- Read Stata DTA (`.dta`) files
-
read_csv()
- Read csv file
-
pull_survey()
- Pull a survey from a survey list
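To illustrate why the coding and labelling metadata must survive the import, here is a minimal sketch that uses only the haven package rather than the reading functions listed above. The data frame, its column names, and the labels are hypothetical; the example writes a small `.sav` file to a temporary location and reads it back.

```r
library(haven)

# Hypothetical mini-survey: one id column and one labelled yes/no item.
survey_df <- data.frame(id = c(1, 2, 3))
survey_df$trust <- labelled(
  c(1, 0, 1),
  labels = c(no = 0, yes = 1),
  label  = "Hypothetical trust item"
)

# Write to a temporary SPSS file and import it again.
tmp_file <- tempfile(fileext = ".sav")
write_sav(survey_df, tmp_file)
imported <- read_sav(tmp_file)

# The numeric coding and its value labels are still attached after import.
imported_labels <- attr(imported$trust, "labels")
print(imported_labels)
```

The value labels travel with the column as an attribute, which is what later harmonization steps rely on.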
Harmonizing concepts with metadata
After importing data with some descriptive metadata, such as numerical coding and labelling, we need to create a map of the information that is in our R session to prepare a harmonization plan. We must find information related to sufficiently similar concepts that can be harmonized and joined into a single variable; eventually, tables of such harmonized variables can be joined.
-
metadata_create()
metadata_waves_create()
- Create a metadata table from several surveys
-
metadata_survey_create()
- Create a metadata table
-
retroharmonize
- retroharmonize: Retrospective harmonization of survey data files
Crosswalk
Lay out the harmonization crosswalk scheme (unifying variable names, codes, and labels). See the vignette Working with a Crosswalk Table for examples and further clarification.
-
crosswalk_table_create()
is.crosswalk_table()
- Create a crosswalk table
-
crosswalk_surveys()
crosswalk()
- Crosswalk surveys
Subsetting
Remove variables that cannot be harmonized in your workflow, either in memory (faster for smaller tasks) or sequentially from files. See the vignette Working with a Crosswalk Table for examples and further clarification.
-
subset_surveys()
subset_waves()
subset_save_surveys()
- Subset surveys
Harmonize variable names
Before joining variables containing responses about the same concept, make sure that they have identical names in the re-processed surveys. See the vignette Working with a Crosswalk Table for examples and further clarification.
-
harmonize_var_names()
- Harmonize the variable names of surveys
-
label_normalize()
var_label_normalize()
val_label_normalize()
- Normalize value and variable labels
-
harmonize_survey_variables()
- Harmonize survey variables
Harmonize numerical codes and labels
To merge variables from different surveys into a single variable, you must make sure that the numerical codes and labels, for example 0=‘no’ and 1=‘yes’, are processed identically. See the vignette Harmonize Value Labels for examples and further clarification.
-
collect_val_labels()
collect_na_labels()
- Collect labels from metadata file
-
harmonize_values()
- Harmonize the values and labels of labelled vectors
-
harmonize_survey_values()
harmonize_waves()
- Harmonize values in surveys
-
merge_surveys()
merge_waves()
- Merge surveys
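The need for identical codes and labels can be shown with a minimal sketch that uses only the haven package; it is not the package's own harmonization routine, and the two survey items and their codings are hypothetical. The second item codes ‘no’ as 2, so it is recoded to 0=‘no’ and 1=‘yes’ before the two vectors are combined.

```r
library(haven)

# Two hypothetical items about the same concept with different codings.
survey_a <- labelled(c(1, 0, 1), labels = c(no = 0, yes = 1))
survey_b <- labelled(c(1, 2, 2), labels = c(yes = 1, no = 2))

# Recode survey_b to the target coding: 0 = no, 1 = yes.
b_plain    <- zap_labels(survey_b)
b_recoded  <- ifelse(b_plain == 2, 0, 1)
survey_b_h <- labelled(b_recoded, labels = c(no = 0, yes = 1))

# Only now can the two vectors be combined under one set of labels.
combined <- labelled(
  c(zap_labels(survey_a), zap_labels(survey_b_h)),
  labels = c(no = 0, yes = 1)
)
combined_chr <- as.character(as_factor(combined))
print(combined_chr)
```

Combining the vectors without the recoding step would silently mix two meanings of the same numeric code.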
Harmonize missing and special cases
Some variable codes have a special meaning, such as the various labels of missing values, which need to be converted differently to a numeric, factor or character representation. See the vignette Harmonize Value Labels for examples and further clarification.
-
collect_val_labels()
collect_na_labels()
- Collect labels from metadata file
-
na_range_to_values()
is.na_range_to_values()
- Harmonize user-defined missing value ranges
-
harmonize_na_values()
- Harmonize na_values in haven_labelled_spss
-
document_survey_item()
- Document survey item harmonization
-
document_surveys()
document_waves()
- Document survey lists
-
create_codebook()
codebook_waves_create()
codebook_surveys_create()
- Create a codebook
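As a minimal sketch of why missing-value codes need special treatment, the following example uses `labelled_spss()` from the haven package directly; the items, codes, and labels are hypothetical. One item declares its missing codes as discrete values, the other as a range, yet both should be recognized as missing.

```r
library(haven)

# Hypothetical item with discrete user-defined missing values (8 and 9).
item_a <- labelled_spss(
  c(1, 0, 9, 1, 8),
  labels    = c(no = 0, yes = 1, refused = 8, "do not know" = 9),
  na_values = c(8, 9)
)

# Hypothetical item with a user-defined missing range (90 to 99).
item_b <- labelled_spss(
  c(0, 1, 98, 99),
  labels   = c(no = 0, yes = 1, refused = 98, "do not know" = 99),
  na_range = c(90, 99)
)

# is.na() takes the declared missing values and ranges into account.
missing_a <- is.na(item_a)
missing_b <- is.na(item_b)
print(missing_a)
print(missing_b)
```

The two items use different codes for the same special cases, which is why the codes have to be harmonized before the items are joined.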
Type conversion
Consistently treat labels, missing value ranges, and missing value labels imported from SPSS, Stata or other sources so that you can use R's statistical functions, which mainly work with the base classes numeric and factor. For data visualization, the base class character may be preferred. See the vignette The labelled_spss_survey class for further information.
-
survey()
is.survey()
summary(<survey>)
- Create a survey data frame
-
labelled_spss_survey()
as_character()
is.labelled_spss_survey()
as_numeric()
- Labelled vectors for multiple SPSS surveys
-
as_labelled_spss_survey()
- Labelled to labelled_spss_survey
-
concatenate()
- Concatenate haven_labelled_spss vectors
-
as_factor()
- Convert a labelled_spss_survey vector to a factor
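The difference between the numeric, factor, and character representations can be sketched with the haven package alone. This is an illustration under stated assumptions, not the behaviour of the conversion functions listed above, which may differ; the `trust` vector and its labels are hypothetical.

```r
library(haven)

# Hypothetical item: 9 = "do not know" is a user-defined missing value.
trust <- labelled_spss(
  c(1, 0, 9, 1),
  labels    = c("not trust" = 0, trust = 1, "do not know" = 9),
  na_values = 9
)

# Numeric representation: the user-defined missing value becomes NA,
# so it does not distort statistics such as the mean.
trust_num  <- as.numeric(zap_labels(trust))
trust_mean <- mean(trust_num, na.rm = TRUE)

# Factor and character representations: the missing category is kept
# as its own labelled level, which can be useful in plots and tables.
trust_fct <- as_factor(trust)
trust_chr <- as.character(trust_fct)

print(trust_num)
print(trust_chr)
```

Which representation is appropriate depends on the task: the numeric form suits averaging, while the factor or character form keeps the special cases visible.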