R Tools for Eurostat Open Data

This rOpenGov R package provides tools to access the Eurostat database, which you can also browse online for data sets and documentation. For contact information and source code, see the package website.

Installation

Release version (CRAN):

install.packages("eurostat")

Development version (GitHub):

library(remotes)
remotes::install_github("ropengov/eurostat")

Overall, the eurostat package includes the following functions:

check_access_to_data    Check access to ec.europa.eu
clean_eurostat_cache    Clean Eurostat Cache
cut_to_classes          Cuts the Values Column into Classes and
                        Polishes the Labels
dic_order               Order of Variable Levels from Eurostat
                        Dictionary.
eu_countries            Countries and Country Codes
eurostat-package        R Tools for Eurostat open data
eurostat_geodata_60_2016
                        Geospatial data of Europe from GISCO in 1:60
                        million scale from year 2016
eurotime2date           Date Conversion from Eurostat Time Format
eurotime2date2          Date Conversion from New Eurostat Time Format
eurotime2num            Conversion of Eurostat Time Format to Numeric
eurotime2num2           Conversion of Eurostat Time Format to Numeric
get_bibentry            Create A Data Bibliography
get_eurostat            Read Eurostat Data
get_eurostat_dic        Download Eurostat Dictionary
get_eurostat_geospatial
                        Download Geospatial Data from GISCO
get_eurostat_json       Get Data from Eurostat API in JSON
get_eurostat_raw        Download Data from Eurostat Database
get_eurostat_raw2       Download Data from Eurostat Dissemination API
get_eurostat_toc        Download Table of Contents of Eurostat Data
                        Sets
harmonize_country_code
                        Harmonize Country Code
label_eurostat          Get Eurostat Codes
label_eurostat2         Get Eurostat Codes for data downloaded from new
                        dissemination API
search_eurostat         Grep Datasets Titles from Eurostat
set_eurostat_cache_dir
                        Set Eurostat Cache
tgs00026                Auxiliary Data

# Flag recording whether an internet connection is available
evaluate <- curl::has_internet()

Finding data

The get_eurostat_toc() function downloads the table of contents of Eurostat datasets. Use the values in the ‘code’ column to download a selected dataset.

# Load the package
library(eurostat)
# library(rvest)

# Get Eurostat data listing
toc <- get_eurostat_toc()

# Check the last items
library(knitr)
kable(tail(toc))
| title | code | type | last update of data | last table structure change | data start | data end | values |
|:------|:-----|:-----|:--------------------|:----------------------------|:-----------|:---------|:-------|
| Persons living in households with very low work intensity by age and sex (population aged 0 to 64 years) | ilc_lvhl11n | dataset | 07.07.2023 | 04.07.2023 | 2014 | 2022 | NA |
| People living in households with very low work intensity by group of country of birth (population aged 18 to 64 years) | ilc_lvhl16n | dataset | 07.07.2023 | 12.06.2023 | 2014 | 2022 | NA |
| In-work at-risk-of-poverty rate by age and sex - EU-SILC survey | ilc_iw01 | dataset | 27.06.2023 | 12.06.2023 | 2003 | 2022 | NA |
| Severe housing deprivation rate by age, sex and poverty status - EU-SILC survey | ilc_mdho06a | dataset | 09.09.2022 | 19.05.2021 | 2003 | 2020 | NA |
| Overcrowding rate by age, sex and poverty status - total population - EU-SILC survey | ilc_lvho05a | dataset | 07.07.2023 | 12.06.2023 | 2003 | 2022 | NA |
| Housing cost overburden rate by age, sex and poverty status - EU-SILC survey | ilc_lvho07a | dataset | 07.07.2023 | 12.06.2023 | 2003 | 2022 | NA |

Some of the data sets (e.g. in the ‘comext’ type) are not accessible through the standard interface. See the get_eurostat() function documentation for more details.

With search_eurostat() you can search the table of contents for particular patterns, e.g. all datasets related to passenger transport. The kable() function produces nice markdown output. Note that the type argument of search_eurostat() lets you restrict the search to, for instance, datasets or tables.

# info about passengers
kable(head(search_eurostat("passenger transport")))
| title | code | type | last update of data | last table structure change | data start | data end | values |
|:------|:-----|:-----|:--------------------|:----------------------------|:-----------|:---------|:-------|
| Air passenger transport | enps_avia_pa | dataset | 13.03.2023 | 13.03.2023 | 2005 | 2021 | NA |
| Modal split of air, sea and inland passenger transport | tran_hv_ms_psmod | dataset | 29.06.2023 | 29.06.2023 | 2008 | 2021 | NA |
| Modal split of inland passenger transport | tran_hv_psmod | dataset | 29.06.2023 | 29.06.2023 | 1990 | 2021 | NA |
| Volume of passenger transport relative to GDP | tran_hv_pstra | dataset | 11.08.2023 | 29.06.2023 | 1990 | 2021 | NA |
| Maritime passenger transport performed in the Exclusive Economic Zone (EEZ) of the countries | mar_tp_pa | dataset | 25.07.2023 | 21.02.2023 | 2005 | 2021 | NA |
| Air passenger transport by reporting country | avia_paoc | dataset | 31.08.2023 | 31.08.2023 | 1993 | 2023Q2 | NA |

Dataset codes can also be looked up in the Eurostat database. Its Data Navigation Tree shows the code in parentheses after every dataset.

Downloading data

The package supports two of Eurostat's download methods: the bulk download facility and the Web Services’ JSON API. The bulk download facility is the fastest way to download whole datasets. It is also often the only way, because the JSON API is limited to a maximum of 50 sub-indicators at a time and whole datasets usually exceed that. For downloading only a small section of a dataset the JSON API is faster, as it allows you to select the data before downloading.

Users do not usually need to choose between the methods, as both are used via the main function get_eurostat(). If only the table id is given, the whole table is downloaded from the bulk download facility. If filters are also defined, the JSON API is used.

Here is an example for the indicator ‘Modal split of passenger transport’. This is the percentage share of each mode of transport in total inland transport, expressed in passenger-kilometres (pkm) based on transport by passenger cars, buses and coaches, and trains. All data should be based on movements on national territory, regardless of the nationality of the vehicle. However, the data collection is not harmonized at the EU level.

Pick and print the id of the data set to download:

# For the original data, see
# http://ec.europa.eu/eurostat/tgm/table.do?tab=table&init=1&plugin=1&language=en&pcode=tsdtr210
id <- search_eurostat("Modal split of passenger transport",
  type = "table"
)$code[1]
print(id)

[1] NA
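The `[1] NA` printed above means the search returned no match, so the download steps below cannot use this id. A minimal sketch of a guard for that case follows. `first_code()` is a hypothetical helper, not part of the eurostat package, and the toy data frame only stands in for a search result with a `code` column (its title and code are taken from the search output earlier in this tutorial).

```r
# Hypothetical helper (not part of eurostat): return the first dataset code
# from a search result, or stop with a clear message if nothing matched.
first_code <- function(result) {
  if (is.null(result) || nrow(result) == 0 || is.na(result$code[1])) {
    stop("No matching dataset found; try a broader search pattern.")
  }
  result$code[1]
}

# Toy search results standing in for search_eurostat() output
hits <- data.frame(
  title = "Modal split of inland passenger transport",
  code = "tran_hv_psmod",
  stringsAsFactors = FALSE
)
no_hits <- hits[0, ]

# A non-empty result yields its first code
first_code(hits)

# An empty result raises an error instead of silently returning NA
res <- tryCatch(first_code(no_hits), error = function(e) conditionMessage(e))
res
```

Failing early like this makes an empty search visible at the point where it happens, rather than in a later download call.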

Get the whole corresponding table. As the table contains annual data, a numeric time variable is more convenient than the default date format:

dat <- get_eurostat(id, time_format = "num")

Investigate the structure of the downloaded data set:

str(dat)
kable(head(dat))

Alternatively, you can download only a part of the dataset by defining the filters argument. It should be a named list, where the names correspond to variable names (lower case) and the values are vectors of codes for the desired series (upper case). For the time variable, sinceTimePeriod and lastTimePeriod can be used in addition to time.

dat2 <- get_eurostat(id, filters = list(geo = c("EU28", "FI"), lastTimePeriod = 1), time_format = "num")
kable(dat2)
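The shape of the filters argument can be checked before any download is attempted. The following is a minimal sketch; `is_named_list()` is a hypothetical helper, not part of the eurostat package, and it only tests that every element of the list carries a name.

```r
# A filters list in the shape described above: named elements whose values
# are vectors of codes, plus an optional time restriction.
filters <- list(
  geo = c("EU28", "FI"),
  lastTimePeriod = 1
)

# Hypothetical check (not part of eurostat): TRUE only if x is a list
# and every element has a non-empty name.
is_named_list <- function(x) {
  is.list(x) && !is.null(names(x)) && all(nzchar(names(x)))
}

ok <- is_named_list(filters)                 # all elements are named
bad <- is_named_list(list("EU28", "FI"))     # no names at all
partly <- is_named_list(list(geo = "FI", 1)) # one element is unnamed
```

An unnamed list gives no hint of which variable each vector of codes belongs to, which is why the names matter here.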

Replacing codes with labels

By default, variables are returned as Eurostat codes. To get human-readable labels instead, use the type = "label" argument.

datl2 <- get_eurostat(id,
  filters = list(
    geo = c("EU28", "FI"),
    lastTimePeriod = 1
  ),
  type = "label", time_format = "num"
)
kable(head(datl2))

Eurostat codes in the downloaded data set can be replaced with human-readable labels from the Eurostat dictionaries with the label_eurostat() function.

datl <- label_eurostat(dat)
kable(head(datl))

The label_eurostat() function can also convert individual variable vectors or variable names.
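The idea behind replacing codes with labels can be illustrated with a plain lookup in a named vector. This is only a sketch of the concept and not how label_eurostat() is implemented; the toy dictionary `geo_dic` is an assumption made for illustration and holds just four country codes.

```r
# Toy dictionary standing in for a Eurostat dictionary (code -> label)
geo_dic <- c(
  AT = "Austria",
  BE = "Belgium",
  FI = "Finland",
  SE = "Sweden"
)

# A vector of codes, as it might appear in a `geo` column
codes <- c("FI", "SE", "FI", "AT")

# Replace each code by its label via a named-vector lookup
labels <- unname(geo_dic[codes])
labels
```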

Vehicle information has 3 levels. You can check them now with:

levels(datl$vehicle)

Selecting and modifying data

EFTA, Eurozone, EU and EU candidate countries

To facilitate smooth visualization of standard European geographic areas, the package provides ready-made lists of the country codes used in the Eurostat database for EFTA (efta_countries), Euro area (ea_countries), EU (eu_countries) and EU candidate countries (eu_candidate_countries). These can be used to select specific groups of countries for closer investigation. For conversions with other standard country coding systems, see the countrycode R package. To retrieve the country code list for EFTA, for instance, use:

data(efta_countries)
kable(efta_countries)
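A country list like this can be used to keep only the rows for one group of countries. The sketch below uses toy stand-ins so that it runs without the package data. It assumes the real list has a `code` column (check with names(efta_countries)), and the values in `toy_dat` are made up.

```r
# Toy stand-in for a country list with a `code` column
toy_efta <- data.frame(
  code = c("IS", "LI", "NO", "CH"),
  name = c("Iceland", "Liechtenstein", "Norway", "Switzerland"),
  stringsAsFactors = FALSE
)

# Toy data in the shape of a downloaded table (values are made up)
toy_dat <- data.frame(
  geo = c("FI", "NO", "CH", "SE"),
  values = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Keep only the rows whose country code appears in the list
efta_rows <- subset(toy_dat, geo %in% toy_efta$code)
efta_rows
```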

EU data from 2012 in all vehicles:

dat_eu12 <- subset(datl, geo == "European Union - 28 countries" & time == 2012)
kable(dat_eu12, row.names = FALSE)

Reshaping the data is best done with spread() in tidyr.

EU data from 2000 - 2012 with vehicle types as variables:

library("tidyr")
dat_eu_0012 <- subset(dat, geo == "EU28" & time %in% 2000:2012)
dat_eu_0012_wide <- spread(dat_eu_0012, vehicle, values)
kable(subset(dat_eu_0012_wide, select = -geo), row.names = FALSE)

Train passengers for selected EU countries in 2000 - 2012:

dat_trains <- subset(datl, geo %in% c("Austria", "Belgium", "Finland", "Sweden") &
  time %in% 2000:2012 &
  vehicle == "Trains")
dat_trains_wide <- spread(dat_trains, geo, values)
kable(subset(dat_trains_wide, select = -vehicle), row.names = FALSE)
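In current tidyr releases spread() is superseded by pivot_wider(), which performs the same long-to-wide reshaping. A minimal sketch on toy data follows; the data frame and its values are made up for illustration and only mimic the geo, time and values columns used above.

```r
library(tidyr)

# Toy long-format data in the shape used above (values are made up)
toy <- data.frame(
  geo = c("FI", "FI", "SE", "SE"),
  time = c(2011, 2012, 2011, 2012),
  values = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Equivalent of spread(toy, geo, values): one column per country
toy_wide <- pivot_wider(toy, names_from = geo, values_from = values)
toy_wide
```

The result has one row per year and one column per country code, matching what spread() produces for the same input.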

SDMX

Eurostat data is also available through the Statistical Data and Metadata eXchange (SDMX) Web Services. The eurostat R package does not provide custom tools for this, but the following generic R packages provide access to the Eurostat SDMX version:

Further examples

For further examples, see the package homepage.

NOTE: we also recommend checking out the giscoR package (https://dieghernan.github.io/giscoR/). This is another API package that provides R tools for Eurostat geographic data to support geospatial analysis and visualization.

Citing the data sources

Eurostat data: cite Eurostat.

Administrative boundaries: cite EuroGeographics.

Citing the eurostat R package

For main developers and contributors, see the package homepage.

This work can be freely used, modified and distributed under the BSD-2-clause (modified FreeBSD) license:

citation("eurostat")
## Kindly cite the eurostat R package as follows:
## 
##   (C) Leo Lahti, Janne Huovari, Markus Kainu, Przemyslaw Biecek.
##   Retrieval and analysis of Eurostat open data with the eurostat
##   package. R Journal 9(1):385-392, 2017. doi: 10.32614/RJ-2017-019
##   Package URL: http://ropengov.github.io/eurostat Article URL:
##   https://journal.r-project.org/archive/2017/RJ-2017-019/index.html
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Retrieval and Analysis of Eurostat Open Data with the eurostat Package},
##     author = {Leo Lahti and Janne Huovari and Markus Kainu and Przemyslaw Biecek},
##     journal = {The R Journal},
##     volume = {9},
##     number = {1},
##     pages = {385--392},
##     year = {2017},
##     doi = {10.32614/RJ-2017-019},
##     url = {https://doi.org/10.32614/RJ-2017-019},
##   }

Contact

For contact information, see the package homepage.

Version info

This tutorial was created with:

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] eurostat_3.8.3 knitr_1.43    
## 
## loaded via a namespace (and not attached):
##  [1] xfun_0.40          bslib_0.5.1        tzdb_0.4.0         vctrs_0.6.3       
##  [5] tools_4.3.1        ISOweek_0.6-2      generics_0.1.3     curl_5.0.2        
##  [9] parallel_4.3.1     tibble_3.2.1       proxy_0.4-27       fansi_1.0.4       
## [13] RefManageR_1.4.0   pkgconfig_2.0.3    KernSmooth_2.23-21 desc_1.4.2        
## [17] readxl_1.4.3       assertthat_0.2.1   lifecycle_1.0.3    compiler_4.3.1    
## [21] stringr_1.5.0      textshaping_0.3.6  htmltools_0.5.6    class_7.3-22      
## [25] sass_0.4.7         yaml_2.3.7         pillar_1.9.0       pkgdown_2.0.7     
## [29] crayon_1.5.2       jquerylib_0.1.4    tidyr_1.3.0        regions_0.1.8     
## [33] classInt_0.4-9     cachem_1.0.8       countrycode_1.5.0  tidyselect_1.2.0  
## [37] digest_0.6.33      stringi_1.7.12     dplyr_1.1.2        purrr_1.0.2       
## [41] bibtex_0.5.1       rprojroot_2.0.3    fastmap_1.1.1      here_1.0.1        
## [45] cli_3.6.1          magrittr_2.0.3     utf8_1.2.3         broom_1.0.5       
## [49] e1071_1.7-13       readr_2.1.4        backports_1.4.1    bit64_4.0.5       
## [53] lubridate_1.9.2    timechange_0.2.0   rmarkdown_2.24     httr_1.4.7        
## [57] bit_4.0.5          cellranger_1.1.0   ragg_1.2.5         hms_1.1.3         
## [61] memoise_2.0.1      evaluate_0.21      rlang_1.1.1        Rcpp_1.0.11       
## [65] glue_1.6.2         xml2_1.3.5         vroom_1.6.3        jsonlite_1.8.7    
## [69] R6_2.5.1           plyr_1.8.8         systemfonts_1.0.4  fs_1.6.3