

Survey data, i.e., data derived from questionnaires or other systematic data collection, such as inspecting objects in nature or recording prices at shops, are usually stored in databases and converted to complex files that retain at least the coding and labelling metadata together with the data. These files must be imported into R so that the harmonization tasks can be carried out with the appropriate R types.

read_surveys() read_survey()
Read survey file(s)
Read rds file
Read SPSS (`.sav`, `.zsav`, `.por`) files. Write `.sav` and `.zsav` files.
Read Stata DTA (`.dta`) files
Read csv file
Pull a survey from a survey list

Harmonizing concepts with metadata

After importing data together with descriptive metadata such as numerical codes and labels, we need to map the information available in our R session to prepare a harmonization plan. We must identify variables that measure sufficiently similar concepts so that they can be harmonized and joined into a single variable; eventually, the harmonized variables are joined into a single table.
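As a rough illustration of such a map, the Python sketch below (not the package's R interface; every name in it is hypothetical) flattens per-survey variable metadata into one row per variable and then groups variables whose normalized variable labels match. Matching labels are only a first hint that two variables share a concept; the grouping still needs human review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRow:
    survey_id: str
    var_name: str
    var_label: str
    val_labels: tuple  # (code, label) pairs, sorted by code


def metadata_table(surveys):
    """Flatten per-survey variable metadata into one row per survey variable."""
    rows = []
    for survey_id, variables in surveys.items():
        for var_name, meta in variables.items():
            labels = tuple(sorted(meta.get("labels", {}).items()))
            rows.append(MetadataRow(survey_id, var_name, meta.get("label", ""), labels))
    return rows


def group_by_label(rows):
    """Group variables whose normalized labels match: candidates for harmonization."""
    groups = {}
    for row in rows:
        # Normalize case and whitespace before comparing labels.
        key = " ".join(row.var_label.lower().split())
        groups.setdefault(key, []).append((row.survey_id, row.var_name))
    return groups


surveys = {
    "wave_1": {"q1": {"label": "Trust in  Parliament", "labels": {1: "Yes", 0: "No"}}},
    "wave_2": {"qa5": {"label": "trust in parliament", "labels": {1: "yes", 2: "no"}}},
}
rows = metadata_table(surveys)
print(group_by_label(rows))
# {'trust in parliament': [('wave_1', 'q1'), ('wave_2', 'qa5')]}
```

Note that the two candidates are coded differently (0/1 versus 1/2), which is exactly what the later value-harmonization step has to resolve.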

metadata_create() metadata_waves_create()
Create a metadata table from several surveys
Create a metadata table
retroharmonize: Retrospective harmonization of survey data files


Lay out the harmonization crosswalk scheme (unifying variable names, codes, and labels). See the vignette Working with a Crosswalk Table for examples and further clarification.
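A crosswalk can be thought of as a table with one row per original variable in each survey, pairing the original name with a target name. The Python sketch below assumes hypothetical column names (`id`, `var_name_orig`, `var_name_target`) and does not reproduce the package's R functions. It starts every target name equal to the original name and checks that no survey maps two variables to the same target.

```python
from collections import Counter


def crosswalk_skeleton(variables_by_survey):
    """One row per (survey, variable); the target name starts as the original name.

    `variables_by_survey` maps a survey id to its original variable names.
    The user then edits `var_name_target` to unify names across surveys.
    """
    return [
        {"id": survey_id, "var_name_orig": name, "var_name_target": name}
        for survey_id, names in variables_by_survey.items()
        for name in names
    ]


def check_crosswalk(rows):
    """Return (survey id, target name) pairs that occur more than once within a survey."""
    counts = Counter((row["id"], row["var_name_target"]) for row in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


crosswalk = crosswalk_skeleton({"wave_1": ["q1", "q2"], "wave_2": ["qa5"]})
# Unify the names of the two trust questions by editing the target column.
crosswalk[0]["var_name_target"] = "trust_parliament"  # wave_1 / q1
crosswalk[2]["var_name_target"] = "trust_parliament"  # wave_2 / qa5
print(check_crosswalk(crosswalk))  # [] : no duplicated targets within a survey
```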


Remove variables that cannot be harmonized from your workflow, either in memory (faster for smaller tasks) or sequentially from files. See the vignette Working with a Crosswalk Table for examples and further clarification.
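The in-memory variant can be sketched as follows, assuming a survey is represented as a plain Python dict of columns (a stand-in for an R data frame) and a crosswalk with the hypothetical row layout `id`, `var_name_orig`, `var_name_target`. This is an illustration of the idea, not the package's R implementation; a sequential, file-based variant would apply the same filter to each file in turn after reading it.

```python
def subset_survey(survey_id, columns, crosswalk):
    """Keep only the columns that the crosswalk lists for this survey.

    `columns` maps a variable name to its list of values. Identifier columns
    survive only if they are listed in the crosswalk as well.
    """
    keep = {row["var_name_orig"] for row in crosswalk if row["id"] == survey_id}
    return {name: values for name, values in columns.items() if name in keep}


crosswalk = [
    {"id": "wave_1", "var_name_orig": "rowid", "var_name_target": "rowid"},
    {"id": "wave_1", "var_name_orig": "q1", "var_name_target": "trust_parliament"},
    {"id": "wave_2", "var_name_orig": "qa5", "var_name_target": "trust_parliament"},
]
wave_1 = {"rowid": [1, 2], "q1": [1, 0], "q2": [3, 4]}
print(subset_survey("wave_1", wave_1, crosswalk))  # {'rowid': [1, 2], 'q1': [1, 0]}
```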

Harmonize variable names

Before joining variables containing responses about the same concept, make sure that they have identical names in the re-processed surveys. See the vignette Working with a Crosswalk Table for examples and further clarification.
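Renaming then reduces to a per-survey lookup from original to target names. This Python sketch uses a hypothetical dict-based survey and crosswalk layout as a stand-in for R objects (it is not the package's API); it leaves unlisted columns unchanged and refuses to create two columns with the same name.

```python
def rename_survey(survey_id, columns, crosswalk):
    """Rename columns to their harmonized target names; unlisted columns keep their names."""
    mapping = {
        row["var_name_orig"]: row["var_name_target"]
        for row in crosswalk
        if row["id"] == survey_id
    }
    renamed = {}
    for name, values in columns.items():
        target = mapping.get(name, name)
        if target in renamed:
            # Two original columns would collapse into one: the crosswalk is inconsistent.
            raise ValueError(f"duplicate target name {target!r} in {survey_id}")
        renamed[target] = values
    return renamed


crosswalk = [
    {"id": "wave_1", "var_name_orig": "q1", "var_name_target": "trust_parliament"},
    {"id": "wave_2", "var_name_orig": "qa5", "var_name_target": "trust_parliament"},
]
wave_1 = rename_survey("wave_1", {"rowid": [1, 2], "q1": [1, 0]}, crosswalk)
wave_2 = rename_survey("wave_2", {"rowid": [3], "qa5": [2]}, crosswalk)
print(sorted(wave_1) == sorted(wave_2))  # True: both now share the same column names
```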

Harmonize the variable names of surveys
label_normalize() var_label_normalize() val_label_normalize()
Normalize value and variable labels
Harmonize survey variables

Harmonize numerical codes and labels

To merge variables from different surveys into a single variable, you must make sure that their numerical codes and labels, for example 0=‘no’ and 1=‘yes’, are processed identically. See the vignette Harmonize Value Labels for examples and further clarification.
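One way to picture this step, sketched in Python under the assumption of a hypothetical target coding 0 = 'no', 1 = 'yes' (this is not the package's R API), is to recode each survey's values through their labels, so that answers that are coded differently but labelled identically end up with the same code.

```python
HARMONIZED = {"no": 0, "yes": 1}  # hypothetical target coding: 0 = 'no', 1 = 'yes'


def harmonize_codes(values, labels, target=HARMONIZED):
    """Recode `values` through their labels so that identical labels share one code.

    `labels` maps each original code to its label in this survey. Codes whose
    normalized label is not in `target` become None here for brevity; a real
    plan would keep them as labelled special categories instead.
    """
    recode = {code: target.get(label.strip().lower()) for code, label in labels.items()}
    return [recode.get(v) for v in values]


wave_1 = harmonize_codes([0, 1, 1], {0: "No", 1: "Yes"})
wave_2 = harmonize_codes([2, 1, 9], {1: "yes", 2: "no", 9: "Do not know"})
print(wave_1, wave_2)  # [0, 1, 1] [0, 1, None]
```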

collect_val_labels() collect_na_labels()
Collect labels from metadata file
Harmonize the values and labels of labelled vectors
harmonize_survey_values() harmonize_waves()
Harmonize values in surveys
merge_surveys() merge_waves()
Merge surveys

Harmonize missing and special cases

Some variable codes have a special meaning, such as the various labels of missing values, which need to be converted differently into numeric, factor, or character representations. See the vignette Harmonize Value Labels for examples and further clarification.
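To illustrate why such codes need separate treatment, the Python sketch below (hypothetical names, not the package's R functions) expands an inclusive user-defined missing range, such as an SPSS-style range of 97 to 99, into the discrete codes that occur within it, and then builds a numeric representation in which all of those codes become NaN. Both labelled and observed codes are checked, so an unlabelled code inside the range is caught too.

```python
import math


def na_range_to_codes(na_range, labels, values):
    """Expand an inclusive (low, high) missing range into the discrete codes it covers."""
    low, high = na_range
    candidates = set(labels) | set(values)
    return sorted(c for c in candidates if low <= c <= high)


def to_numeric(values, na_codes):
    """Numeric representation: every user-defined missing code becomes NaN."""
    na = set(na_codes)
    return [math.nan if v in na else float(v) for v in values]


labels = {0: "no", 1: "yes", 98: "Do not know", 99: "Refused"}
values = [1, 0, 98, 99, 97]  # 97 falls in the range but has no label
na_codes = na_range_to_codes((97, 99), labels, values)
numeric = to_numeric(values, na_codes)
print(na_codes)  # [97, 98, 99]
print(numeric)   # [1.0, 0.0, nan, nan, nan]
```

The numeric form can no longer tell a refusal from a "do not know", which is why the distinction has to be settled before conversion.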

collect_val_labels() collect_na_labels()
Collect labels from metadata file
na_range_to_values() is.na_range_to_values()
Harmonize user-defined missing value ranges
Harmonize na_values in haven_labelled_spss

Documentation functions

This functionality requires a thorough review.

Document survey item harmonization
document_surveys() document_waves()
Document survey lists
create_codebook() codebook_waves_create() codebook_surveys_create()
Create a codebook

Type conversion

Consistently treat labels, missing value ranges, and missing value labels imported from SPSS, Stata, or other sources so that R's statistical functions, which mainly work with the numeric or factor base classes, can be used. For data visualization, the character base class may be preferred. See the vignette The labelled_spss_survey class for further information.
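The following Python sketch (hypothetical helper names, not the package's `as_numeric()` or `as_character()` methods) contrasts three views of the same labelled responses with one user-defined missing code: numeric, factor-like, and character. Only the character view keeps the 'refused' label visible, which is why it can suit visualization, while the numeric and factor-like views turn it into a missing value for statistical use.

```python
def as_numeric_view(values, na_values):
    """Numeric view: user-defined missing codes become None, other codes stay numbers."""
    na = set(na_values)
    return [None if v in na else v for v in values]


def as_character_view(values, labels):
    """Character view: every code is replaced by its label (falling back to the code)."""
    return [labels.get(v, str(v)) for v in values]


def as_factor_view(values, labels, na_values):
    """Factor-like view: (observed labels, levels ordered by code); missing codes -> None.

    Unlabelled codes fall back to their string form and are not added to the levels.
    """
    na = set(na_values)
    levels = [labels[c] for c in sorted(labels) if c not in na]
    observed = [None if v in na else labels.get(v, str(v)) for v in values]
    return observed, levels


labels = {0: "no", 1: "yes", 99: "refused"}
values = [1, 0, 99]
print(as_numeric_view(values, [99]))         # [1, 0, None]
print(as_character_view(values, labels))     # ['yes', 'no', 'refused']
print(as_factor_view(values, labels, [99]))  # (['yes', 'no', None], ['no', 'yes'])
```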

survey() is.survey() summary(<survey>)
Create a survey data frame
labelled_spss_survey() as_character() is.labelled_spss_survey() as_numeric()
Labelled vectors for multiple SPSS surveys
Labelled to labelled_spss_survey
Concatenate haven_labelled_spss vectors
Convert labelled_spss_survey vector To Factor