Searching, downloading and manipulating Eurostat data with R
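We have published a new package, eurostat, on CRAN. To install it, run install.packages("eurostat") in R.
The eurostat package is based on the SmarterPoland package, which was revised and expanded with new functionality. Among others, the new eurostat package includes the functions used in this post: search_eurostat, get_eurostat_toc, label_eurostat, cut_to_classes and merge_eurostat_geospatial.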
This blog post walks you through the basic functionality of the package. To reproduce the examples, run the following code to install the required dependencies.
In this exercise we use the indicator tgs00026 (Disposable income of private households by NUTS 2 regions) from Eurostat. Most indicators in the Eurostat database are of the country-year type. Some, however, also have data at a lower level of regional breakdown, as tgs00026 does at the NUTS2 level. (See more information on regional classification here).
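A minimal sketch of such a setup step follows. The package list is our assumption (eurostat, ggplot2 and countrycode are the packages named in this post), and missing_packages and install_missing are hypothetical helper names, not part of any package.

```r
# Packages this walkthrough relies on. This list is an assumption based on
# the packages named in the post; extend it if your session needs more.
required <- c("eurostat", "ggplot2", "countrycode")

# Return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only what is missing (needs a CRAN mirror and network access).
install_missing <- function(pkgs = required) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}
```

Calling install_missing() once per machine should be enough; after that, missing_packages(required) returns an empty vector.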
Searching and downloading data from Eurostat
Although we have already picked the data for the following demonstrations, you can search the available variables with the search_eurostat function. The data for tables resides in datasets, which are organised in folders. You can specify in the function whether you want to look for a table, a dataset or a folder.
Alternatively, you can download the whole table of contents of the database with the get_eurostat_toc function. In both cases, use the values in the code column to download a selected dataset.
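To illustrate what such a search does, here is a sketch that filters a small, hand-made table of contents by title and type. With the package itself, the equivalent would be along the lines of search_eurostat("disposable income", type = "table"). The toy rows (other than tgs00026) and the helper find_entries are made up for illustration.

```r
# Toy stand-in for the table of contents. The real one would come from
# eurostat::get_eurostat_toc() and is much larger. Every row except the
# tgs00026 one is invented for this example.
toc <- data.frame(
  title = c("Regional statistics",
            "Disposable income of private households by NUTS 2 regions",
            "Example dataset on household income"),
  code  = c("reg", "tgs00026", "demo_ds"),
  type  = c("folder", "table", "dataset"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep entries of one type whose title matches a pattern.
find_entries <- function(toc, pattern, type = "dataset") {
  hit <- grepl(pattern, toc$title, ignore.case = TRUE) & toc$type == type
  toc[hit, c("title", "code", "type")]
}

# The code column holds the identifier needed for the download.
find_entries(toc, "disposable income", type = "table")$code
```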
Downloading and plotting time-series data at the NUTS2 regional level
However, we are interested in disposable household income, so we first download the data and convert the time column into numeric format.
Then we plot the data at the regional level and color the lines by country name, derived with the countrycode package.
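The conversion step can be sketched on a toy data frame, assuming the time column arrives as text or as a factor (as it may when the data is requested in raw format). The geo/time/values layout mirrors the downloaded data, but the values here are invented.

```r
# Toy stand-in for the raw download. With the package, the call would be
# along the lines of eurostat::get_eurostat("tgs00026", time_format = "raw").
# The region codes are real NUTS2 codes; the values are invented.
dat <- data.frame(
  geo    = c("AT11", "AT11", "BE10", "BE10"),
  time   = factor(c("2010", "2011", "2010", "2011")),
  values = c(20100, 20500, 18900, 19200)
)

# Go through as.character() first: as.numeric() on a factor would return
# the internal level indices (1, 2, ...) rather than the years.
dat$time <- as.numeric(as.character(dat$time))
```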
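A sketch of such a plot on invented values is shown below. It assumes that the first two characters of a NUTS code give the Eurostat country code, and that the countrycode package accepts "eurostat" as an origin scheme. The plotting part only runs if ggplot2 and countrycode are installed.

```r
# Invented values for four real NUTS2 codes, in the geo/time/values layout.
dat <- data.frame(
  geo    = rep(c("AT11", "AT12", "BE10", "BE21"), each = 2),
  time   = rep(c(2010, 2011), times = 4),
  values = c(20100, 20500, 21000, 21400, 18900, 19200, 19800, 20100),
  stringsAsFactors = FALSE
)

# The first two characters of a NUTS code identify the country.
dat$cc <- substr(dat$geo, 1, 2)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("countrycode", quietly = TRUE)) {
  # Translate Eurostat country codes into English country names.
  dat$country <- countrycode::countrycode(dat$cc, origin = "eurostat",
                                          destination = "country.name")
  # One line per region, colored by country.
  p <- ggplot2::ggplot(dat, ggplot2::aes(x = time, y = values,
                                         group = geo, color = country)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Year", y = "Disposable income", color = "Country")
  print(p)
}
```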
Labelling the data
The label_eurostat function provides a straightforward way to convert the codes in the data into more meaningful labels. It requires that the data be in the “default format” with no added columns. First, check the unlabeled example data:
Then label the data based on definitions from the dictionary:
For clarity, we plot the first four countries in alphabetical order in their own facets, with region names.
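Conceptually, labeling is a dictionary lookup. The sketch below shows the idea with a toy dictionary and a hypothetical helper, label_codes. With the package, the call would simply be label_eurostat(dat) on the unmodified download. The labels here are simplified and need not match Eurostat's wording.

```r
# Toy dictionary mapping region codes to simplified labels.
geo_dic <- data.frame(
  code  = c("AT11", "BE10"),
  label = c("Burgenland", "Brussels region"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each code with its label from the dictionary.
label_codes <- function(codes, dic) {
  labels <- dic$label[match(codes, dic$code)]
  # Fall back to the original code where the dictionary has no entry.
  ifelse(is.na(labels), as.character(codes), labels)
}

label_codes(c("BE10", "AT11", "XX99"), geo_dic)
```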
Mapping the household incomes at NUTS2 level
In the following exercise we plot household income data from Eurostat on a map for three different years. In addition to downloading and manipulating data from Eurostat, we demonstrate how to access and use the shapefiles of Europe published by Eurostat at Administrative units / Statistical units.
For this exercise you need a few more dependencies, which can be installed by running the following script.
Downloading and manipulating the tabular data
First, we retrieve the NUTS2-level figures of the variable tgs00026 (Disposable income of private households by NUTS 2 regions) and keep only the year 2011.
Second, we use the cut_to_classes function to classify our numeric values for the choropleth map.
Third, we use the merge_eurostat_geospatial function to download the shapefile at 1:60 million resolution from the year 2010 and merge it with our Eurostat attribute data.
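The first two steps can be sketched in base R on invented values. Note that this uses cut() with equal-width breaks as a stand-in for cut_to_classes, whose exact defaults we do not reproduce here. The assumption that NUTS2 codes are four characters long follows from the NUTS coding scheme.

```r
# Invented values: one country-level row, four NUTS2 rows for 2011 and one
# NUTS2 row for 2010.
dat <- data.frame(
  geo    = c("AT", "AT11", "AT12", "BE10", "BE21", "AT11"),
  time   = c(2011, 2011, 2011, 2011, 2011, 2010),
  values = c(20000, 20500, 21400, 19200, 20100, 20100),
  stringsAsFactors = FALSE
)

# NUTS2 codes have four characters: two for the country, two for the region.
nuts2_2011 <- dat[nchar(dat$geo) == 4 & dat$time == 2011, ]

# Three equal-width classes over the observed range. include.lowest = TRUE
# keeps the minimum value inside the first class.
breaks <- seq(min(nuts2_2011$values), max(nuts2_2011$values), length.out = 4)
nuts2_2011$cat <- cut(nuts2_2011$values, breaks = breaks,
                      include.lowest = TRUE, dig.lab = 5)
```

The resulting cat column is a factor, which is convenient for a discrete fill scale on the map.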
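The merge itself boils down to a join on the region code. Here is a sketch with a toy geometry table, assuming both tables share a geo column. The shape_id column and all values are invented, and real geodata would carry polygon coordinates.

```r
# Toy stand-in for the geodata: one row per region.
geodata <- data.frame(
  geo      = c("AT11", "AT12", "BE10"),
  shape_id = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Attribute data with one region missing.
income <- data.frame(
  geo    = c("AT11", "BE10"),
  values = c(20500, 19200),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every region of the map, with NA where no value exists,
# so regions without data can still be drawn.
map_data <- merge(geodata, income, by = "geo", all.x = TRUE)
```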
Plotting the maps using ggplot2
We are done! That was a small exercise on how to use the main functions of the eurostat package. We hope you find the package useful! All suggestions, bug reports, and contributions are warmly welcome at https://github.com/ropengov/eurostat. When using the package, please cite it accordingly; running citation("eurostat") in R prints the recommended reference.