vignettes/website/regional_data.Rmd
This rOpenGov R package provides tools to access the Eurostat database, which you can also browse online for the data sets and documentation. For contact information and source code, see the package website.
See the eurostat vignette for installation and basic use.
Working with regional data has many advantages and many challenges. I had three aims when creating this article:
This work has some similarities with iotables, my other extension of the eurostat package, released on CRAN and rOpenGov. The iotables package deals with national accounts data, where the use of Eurostat’s metadata system requires domain-specific knowledge, and where correctly joining different tables based on this knowledge, coded in the metadata, is critical to make the downloaded tables work. Similarly, domain-specific knowledge of regional metadata is necessary to put the regional data into tidy data pipelines, or even to put them on a map (see the article on using maps).
The advantage over national data lies in the homogeneity of the units and in their larger number, which enables us to better understand social and economic differences. National boundaries, i.e. NUTS0 regions, are historical and political constructions. They vary greatly in size and complexity. Within the EU, Germany and Malta are equally NUTS0 regions or countries, although Malta’s size would make it a small NUTS3 region in Germany. Comparing Germany with Malta hides a huge diversity within Germany.
Statistical regions are largely homogeneous in size and in urban complexity. The smallest, NUTS3 regions are cities, or towns with their rural hinterland; it can be expected that most people go to school or work within such a region. Malta itself is the size of a NUTS3 region, so it can be compared most meaningfully with the NUTS3 regions of Germany. NUTS1 units are usually provinces of larger countries, such as Bavaria in Germany. NUTS2 units usually consist of several NUTS3 units within a large NUTS1 region.
The smallest member states are the size of NUTS2 and NUTS3 regions and can be best compared with all the similar-sized regions of Europe. Somewhat larger member states like Slovakia are NUTS1 regions, and they can be best compared with all NUTS1 regions of Europe: Bavaria and Slovakia make a more meaningful comparison in many cases than Germany and Slovakia.
There are several difficulties with working with sub-national data. These are related to data availability, changes in boundaries, and data and metadata quality.
Changes in boundaries mean that, unlike national boundaries, regional boundaries change very often. Since the NUTS regions were standardized within the EU in 2003, boundary changes have been made on average every three years. Boundary changes make organizing data panels (several time instances of the cross-sectional regional data) very tedious.
You can review the NUTS change history on the Eurostat website.
Data availability means that many statistical products are only available at the NUTS0 country level. The creation of NUTS1-NUTS3 statistics is usually slow, and the range of data products is narrower at these levels.
NUTS-level data is often disaggregated from higher levels with the use of various estimations. While some original data sources, such as population or mortality data, are available at the NUTS3 level (or at an even higher geographical resolution, i.e. a lower level of aggregation), many economic activities are theoretically difficult to connect to one place, and their geographical disaggregation is only estimated. For example, since GDP is mainly produced in companies, and many companies work in several locations across municipal and regional borders, locating their contribution to GDP is the result of a more or less precise estimation.
Pan-European surveys are very important data sources for many social data products, but they are often created with the use of nationally representative samples. Even if they contain regional coding, and they can be re-arranged into regional statistics, the results are of lower quality, as the original survey sample is not representative of each and every NUTS2 or NUTS3 region of Germany, for example. (Of course, since Malta is a NUTS2 region, survey data from Malta is representative on NUTS2 = NUTS1 = NUTS0 level.) In practice, this means that many statistical products of Eurostat are mixed products, i.e. they contain NUTS1 level data for larger member states, such as Germany, France or Italy, and NUTS2 level data for other member states.
One problem with Eurostat’s data products is that Eurostat has no legal mandate to force national statistical offices to create consistent datasets. Sometimes data ‘goes missing’ because the national statistical offices, which are responsible for the quality and validity of the data, do not recode the historical data with the new geographic label definitions.
Finally, the metadata quality of Eurostat’s regional products is not as good as at the NUTS0 national level. A particularly problematic issue is that Eurostat’s tables do not differentiate between the current NUTS2016 regional boundaries and the NUTS2013 or NUTS2010 boundaries. Some data tables contain rows that cannot and must not be compared. For example, France underwent a very thorough change in its regional boundaries, meaning that NUTS2013 regional data from 2013 can be compared with NUTS2016 data from 2016 or 2018 only for a very small fraction of the country.
We programmatically coded the NUTS2013-2016 changes into the new functions presented in this article. You can download the correspondence table in Excel or review it with data(nuts_correspondence). Whenever we found examples of the use of NUTS2010 data (in Slovenia and Greece), we treated them as exceptions in the functions.
# download to a temporary file
tf <- tempfile(fileext = ".xlsx")
download.file(url = 'https://ec.europa.eu/eurostat/documents/345175/629341/NUTS2013-NUTS2016.xlsx', destfile = tf, mode = 'wb' )
The correspondence tables themselves are not tidy, and they are spread over several sheets which are not fully consistent. In the 2013-2016 table the French region FR7 or Centre-Est is marked as discontinued in the sheet Correspondence NUTS-1 and at the same time as relabelled and recoded to FRK, or Auvergne-Rhône-Alpes. We believe that the latter case is correct and use only this row in the correspondence table to avoid duplications in joining.
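The duplication problem can be illustrated with a toy table. The following is only a minimal sketch, not the package's internal code: the data frame `corr` and its column names (`code13`, `code16`, `change`) are hypothetical, only the FR7/FRK pair is taken from the text above, and the FR1 row is added purely as an unaffected control.

```r
# Hypothetical miniature of a correspondence sheet: FR7 appears twice,
# once as discontinued and once as recoded to FRK.
corr <- data.frame(
  code13 = c("FR7", "FR7", "FR1"),
  code16 = c(NA, "FRK", "FR1"),
  change = c("discontinued", "relabelled and recoded", "unchanged"),
  stringsAsFactors = FALSE
)

# Codes that occur more than once would duplicate rows in a join.
dup_codes <- unique(corr$code13[duplicated(corr$code13)])

# Keep every row except the 'discontinued' duplicate of such codes.
drop <- corr$code13 %in% dup_codes & corr$change == "discontinued"
corr_unique <- corr[!drop, ]
corr_unique
```

With one row per old code, a later join on `code13` cannot multiply observations.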
Furthermore, Eurostat has a very problematic practice of simply removing statistical products when metadata definitions change. So, you may have downloaded industry-level data with the NACE Rev2 definition or French regional data with the NUTS 2013 definition, but under the same title, you will be downloading a differently defined dataset in 2020. Or, you will not be able to reproduce your results, because the data with your earlier definition will have been removed. While it is clear that Eurostat cannot take care of boundary changes if the responsible national statistical offices fail to do this, removing the history of data products makes the validation of professional and academic work made with such data impossible in some cases.
The logical workflow is the following:
It is important to note that data missingness is often caused by incorrect joining on the wrong metadata labels. In a limited number of further cases, the missing data is functionally dependent on other data. In these cases general imputation methods give misleading or plainly wrong results. You must get the metadata right to make a valid imputation of missing data, or to join several data tables meaningfully (and successfully) together. So data imputation should be the last step.
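A small base R sketch may make the point about label-induced missingness concrete. It is an illustration under stated assumptions, not the package's method: the `income` values and the FR21/FRF2 and FR24/FRB0 pairs appear in the harmonize_geo_code() output in this article, while the second table `other` and its variable `x` are invented for the example.

```r
# Values for 2012, keyed by NUTS2013 codes.
income <- data.frame(
  geo = c("AT11", "FR21", "FR24"),
  income = c(20900, 16400, 17500),
  stringsAsFactors = FALSE
)
# Hypothetical second variable, keyed by NUTS2016 codes.
other <- data.frame(
  geo = c("AT11", "FRF2", "FRB0"),
  x = c(1.2, 0.8, 1.0),
  stringsAsFactors = FALSE
)

# Naive join: the French rows do not match, so x only looks missing.
naive <- merge(income, other, by = "geo", all.x = TRUE)
sum(is.na(naive$x))

# Recode the NUTS2013 labels first, then join again.
recode <- c(FR21 = "FRF2", FR24 = "FRB0")
hit <- income$geo %in% names(recode)
income$geo[hit] <- unname(recode[income$geo[hit]])
fixed <- merge(income, other, by = "geo", all.x = TRUE)
sum(is.na(fixed$x))
```

Imputing `x` in the `naive` table would invent values for data that was never missing, which is why the label check belongs before any imputation.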
Most regional statistical products are made on the NUTS2 level, or they are mixed NUTS1-NUTS2 level statistics. This means that when you open a Eurostat data table, some rows refer to NUTS1 regions and others to NUTS2 regions, or you may even find data from all NUTS0-NUTS3 levels in the same table. And sometimes not.
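One way to see what a mixed table contains is to derive the NUTS level from the length of the geo code, since a NUTS code is the two-letter country code followed by one character per level. The sketch below assumes a character vector of geo codes; the helper name `nuts_level_from_code()` is hypothetical and is not part of the eurostat package.

```r
# Hypothetical helper: NUTS level = number of characters after the
# two-letter country code; anything outside 0-3 is returned as NA.
nuts_level_from_code <- function(geo) {
  level <- nchar(geo) - 2L
  level[level < 0L | level > 3L] <- NA_integer_
  level
}

geo <- c("DE", "DE2", "DE21", "DE212", "MT00")
table(nuts_level_from_code(geo))
```

Aggregate codes such as EU28 have four characters and would be counted as NUTS2 by this rule, so they should be filtered out before applying it.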
The power of statistical analysis can be increased when you organize such data into panels, because the differing changes over a time interval in this huge cross-section usually contain a lot more information about the underlying social or economic process. However, organizing panels, or even the simple time series of an individual region, is often hindered by changes in regional boundaries.
Usually you have 150-300 units to compare, which gives an unprecedented richness in cross-sectional analysis. Most US or Australian datasets are not so detailed in cross-section, and data availability in the rest of the world is simply lower. But joining this data with spatial maps or other data is challenging because the data tables are not made consistently, and their titles or descriptions are often misleading: for example, the description claims that you will get NUTS2 level data, but in reality you get an assortment of data from all levels.
A simple strategy is to create a panel of only those data that do not change boundaries. However, if you have many variables, this very quickly leads to a huge loss of data, because missing data is often independent of boundary changes. With the addition of each new variable you are likely to lose more and more rows of observations when you keep only complete cases.
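The loss of rows under a complete-cases strategy can be sketched with a few lines of base R. The panel below is entirely hypothetical (five regions R1-R5 and three variables, each missing for a different region); it only illustrates the mechanism, not any real Eurostat table.

```r
# Hypothetical panel: each variable is missing for a different region.
panel <- data.frame(
  geo = c("R1", "R2", "R3", "R4", "R5"),
  v1 = c(1, NA, 3, 4, 5),
  v2 = c(1, 2, NA, 4, 5),
  v3 = c(1, 2, 3, NA, 5),
  stringsAsFactors = FALSE
)

# Number of rows that survive as more variables must be complete.
vars <- c("v1", "v2", "v3")
remaining <- sapply(seq_along(vars), function(k) {
  sum(complete.cases(panel[, vars[seq_len(k)], drop = FALSE]))
})
names(remaining) <- vars
remaining  # 4, 3, 2
```

Each added variable removes another region here, although every variable is observed for four of the five regions.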
Keeping track of the changes is a much better strategy, and up to a point it costs little extra work, because often only the metadata changes: the data itself is not missing, it is just labelled inconsistently. Member states, even when they change the boundary of only two regions, will nevertheless create new regional codes for all their regions, to make sure that regional labels do not mix. However, Eurostat does not follow this practice well, and it does mix up different labels.
With the new helper function harmonize_geo_code() you can see if your geo label codes are affected by these changes, and you get a first view of how you can continue your work.
library(eurostat)
library(dplyr)
# keep the 2012 observations, then check their geo codes
eurostat::tgs00026 %>%
  filter(time == 2012) %>%
  harmonize_geo_code()
## In this data frame 238 observations are coded with the current NUTS2016
## geo labels and 276 observations/rows have NUTS2013 historical labels.
## Not checking for regional label consistency in non-EU countries.
## In this data frame not controlled countries: NO
## with altogether 7 observations/rows.
## geo time values code13 code16
## 1 AT11 2012 20900 AT11 AT11
## 2 AT12 2012 21800 AT12 AT12
## 3 AT13 2012 21000 AT13 AT13
## 4 AT21 2012 20200 AT21 AT21
## 5 AT22 2012 20500 AT22 AT22
## 6 AT31 2012 21000 AT31 AT31
## 7 AT32 2012 21300 AT32 AT32
## 8 AT33 2012 20400 AT33 AT33
## 9 AT34 2012 21800 AT34 AT34
## 10 BE10 2012 15600 BE10 BE10
## 11 BE21 2012 17900 BE21 BE21
## 12 BE22 2012 17100 BE22 BE22
## 13 BE23 2012 18600 BE23 BE23
## 14 BE24 2012 20100 BE24 BE24
## 15 BE25 2012 18100 BE25 BE25
## 16 BE31 2012 18900 BE31 BE31
## 17 BE32 2012 15000 BE32 BE32
## 18 BE33 2012 15400 BE33 BE33
## 19 BE34 2012 15700 BE34 BE34
## 20 BE35 2012 15800 BE35 BE35
## 21 BG31 2012 4700 BG31 BG31
## 22 BG32 2012 5400 BG32 BG32
## 23 BG33 2012 5800 BG33 BG33
## 24 BG34 2012 5800 BG34 BG34
## 25 BG41 2012 9100 BG41 BG41
## 26 BG42 2012 5600 BG42 BG42
## 27 CY00 2012 15500 CY00 CY00
## 28 CZ01 2012 14500 CZ01 CZ01
## 29 CZ02 2012 12200 CZ02 CZ02
## 30 CZ03 2012 10800 CZ03 CZ03
## 31 CZ04 2012 9700 CZ04 CZ04
## 32 CZ05 2012 10500 CZ05 CZ05
## 33 CZ06 2012 10800 CZ06 CZ06
## 34 CZ07 2012 10200 CZ07 CZ07
## 35 CZ08 2012 10000 CZ08 CZ08
## 36 DE11 2012 22800 DE11 DE11
## 37 DE12 2012 21500 DE12 DE12
## 38 DE13 2012 21500 DE13 DE13
## 39 DE14 2012 21800 DE14 DE14
## 40 DE21 2012 24500 DE21 DE21
## 41 DE22 2012 20200 DE22 DE22
## 42 DE23 2012 20100 DE23 DE23
## 43 DE24 2012 21000 DE24 DE24
## 44 DE25 2012 22100 DE25 DE25
## 45 DE26 2012 21100 DE26 DE26
## 46 DE27 2012 21800 DE27 DE27
## 47 DE30 2012 18100 DE30 DE30
## 48 DE40 2012 17500 DE40 DE40
## 49 DE50 2012 19400 DE50 DE50
## 50 DE60 2012 22600 DE60 DE60
## 51 DE71 2012 21800 DE71 DE71
## 52 DE72 2012 19500 DE72 DE72
## 53 DE73 2012 19300 DE73 DE73
## 54 DE80 2012 16500 DE80 DE80
## 55 DE91 2012 19500 DE91 DE91
## 56 DE92 2012 19500 DE92 DE92
## 57 DE93 2012 20000 DE93 DE93
## 58 DE94 2012 18700 DE94 DE94
## 59 DEA1 2012 20200 DEA1 DEA1
## 60 DEA2 2012 20200 DEA2 DEA2
## 61 DEA3 2012 19000 DEA3 DEA3
## 62 DEA4 2012 20400 DEA4 DEA4
## 63 DEA5 2012 20000 DEA5 DEA5
## 64 DEB1 2012 20500 DEB1 DEB1
## 65 DEB2 2012 19900 DEB2 DEB2
## 66 DEB3 2012 20500 DEB3 DEB3
## 67 DEC0 2012 18800 DEC0 DEC0
## 68 DED2 2012 17400 DED2 DED2
## 69 DED4 2012 17400 DED4 DED4
## 70 DED5 2012 17400 DED5 DED5
## 71 DEE0 2012 16800 DEE0 DEE0
## 72 DEF0 2012 20400 DEF0 DEF0
## 73 DEG0 2012 17000 DEG0 DEG0
## 74 DK01 2012 15100 DK01 DK01
## 75 DK02 2012 14100 DK02 DK02
## 76 DK03 2012 13800 DK03 DK03
## 77 DK04 2012 13900 DK04 DK04
## 78 DK05 2012 13900 DK05 DK05
## 79 EE00 2012 9400 EE00 EE00
## 80 EL30 2012 13400 EL30 EL30
## 81 EL41 2012 11900 EL41 EL41
## 82 EL42 2012 11900 EL42 EL42
## 83 EL43 2012 9600 EL43 EL43
## 84 EL51 2012 10100 EL51 EL51
## 85 EL52 2012 11200 EL52 EL52
## 86 EL53 2012 11800 EL53 EL53
## 87 EL54 2012 11400 EL54 EL54
## 88 EL61 2012 11000 EL61 EL61
## 89 EL62 2012 12000 EL62 EL62
## 90 EL63 2012 9900 EL63 EL63
## 91 EL64 2012 9700 EL64 EL64
## 92 EL65 2012 10900 EL65 EL65
## 93 ES11 2012 12700 ES11 ES11
## 94 ES12 2012 14300 ES12 ES12
## 95 ES13 2012 13400 ES13 ES13
## 96 ES21 2012 18400 ES21 ES21
## 97 ES22 2012 17300 ES22 ES22
## 98 ES23 2012 14300 ES23 ES23
## 99 ES24 2012 15200 ES24 ES24
## 100 ES30 2012 17300 ES30 ES30
## 101 ES41 2012 13900 ES41 ES41
## 102 ES42 2012 11400 ES42 ES42
## 103 ES43 2012 10400 ES43 ES43
## 104 ES51 2012 15800 ES51 ES51
## 105 ES52 2012 12200 ES52 ES52
## 106 ES53 2012 13800 ES53 ES53
## 107 ES61 2012 10800 ES61 ES61
## 108 ES62 2012 11100 ES62 ES62
## 109 ES63 2012 11800 ES63 ES63
## 110 ES64 2012 10700 ES64 ES64
## 111 ES70 2012 11500 ES70 ES70
## 112 FI19 2012 15300 FI19 FI19
## 113 FI1B 2012 18200 FI1B FI1B
## 114 FI1C 2012 15700 FI1C FI1C
## 115 FI1D 2012 14700 FI1D FI1D
## 116 FI20 2012 18400 FI20 FI20
## 117 FR10 2012 21000 FR10 FR10
## 118 HR03 2012 8500 HR03 HR03
## 119 HR04 2012 8900 HR04 HR04
## 120 HU21 2012 9100 HU21 HU21
## 121 HU22 2012 9100 HU22 HU22
## 122 HU23 2012 8200 HU23 HU23
## 123 HU31 2012 7700 HU31 HU31
## 124 HU32 2012 7600 HU32 HU32
## 125 HU33 2012 8100 HU33 HU33
## 126 ITC1 2012 18600 ITC1 ITC1
## 127 ITC2 2012 18700 ITC2 ITC2
## 128 ITC3 2012 18900 ITC3 ITC3
## 129 ITC4 2012 20200 ITC4 ITC4
## 130 ITF1 2012 14700 ITF1 ITF1
## 131 ITF2 2012 13400 ITF2 ITF2
## 132 ITF3 2012 11600 ITF3 ITF3
## 133 ITF4 2012 12300 ITF4 ITF4
## 134 ITF5 2012 12000 ITF5 ITF5
## 135 ITF6 2012 11500 ITF6 ITF6
## 136 ITG1 2012 12000 ITG1 ITG1
## 137 ITG2 2012 13400 ITG2 ITG2
## 138 ITH1 2012 21100 ITH1 ITH1
## 139 ITH2 2012 19300 ITH2 ITH2
## 140 ITH3 2012 17600 ITH3 ITH3
## 141 ITH4 2012 18300 ITH4 ITH4
## 142 ITH5 2012 19800 ITH5 ITH5
## 143 ITI1 2012 17700 ITI1 ITI1
## 144 ITI2 2012 16900 ITI2 ITI2
## 145 ITI3 2012 16600 ITI3 ITI3
## 146 ITI4 2012 17100 ITI4 ITI4
## 147 LU00 2012 25000 LU00 LU00
## 148 LV00 2012 7700 LV00 LV00
## 149 MT00 2012 11600 MT00 MT00
## 150 NL11 2012 14100 NL11 NL11
## 151 NL12 2012 14500 NL12 NL12
## 152 NL13 2012 14400 NL13 NL13
## 153 NL21 2012 14500 NL21 NL21
## 154 NL22 2012 15700 NL22 NL22
## 155 NL23 2012 15200 NL23 NL23
## 156 NL31 2012 16500 NL31 NL31
## 157 NL32 2012 16800 NL32 NL32
## 158 NL33 2012 16000 NL33 NL33
## 159 NL34 2012 15900 NL34 NL34
## 160 NL41 2012 15500 NL41 NL41
## 161 NL42 2012 15000 NL42 NL42
## 162 PL21 2012 10200 PL21 PL21
## 163 PL22 2012 12200 PL22 PL22
## 164 PL41 2012 11200 PL41 PL41
## 165 PL42 2012 10400 PL42 PL42
## 166 PL43 2012 9800 PL43 PL43
## 167 PL51 2012 11100 PL51 PL51
## 168 PL52 2012 9700 PL52 PL52
## 169 PL61 2012 9600 PL61 PL61
## 170 PL62 2012 9300 PL62 PL62
## 171 PL63 2012 10600 PL63 PL63
## 172 PT11 2012 10500 PT11 PT11
## 173 PT15 2012 12800 PT15 PT15
## 174 PT16 2012 11400 PT16 PT16
## 175 PT17 2012 15300 PT17 PT17
## 176 PT18 2012 11500 PT18 PT18
## 177 PT20 2012 12300 PT20 PT20
## 178 PT30 2012 12200 PT30 PT30
## 179 RO11 2012 5900 RO11 RO11
## 180 RO12 2012 6300 RO12 RO12
## 181 RO21 2012 4600 RO21 RO21
## 182 RO22 2012 5600 RO22 RO22
## 183 RO31 2012 5300 RO31 RO31
## 184 RO32 2012 12200 RO32 RO32
## 185 RO41 2012 5200 RO41 RO41
## 186 RO42 2012 7700 RO42 RO42
## 187 SE11 2012 19500 SE11 SE11
## 188 SE12 2012 16500 SE12 SE12
## 189 SE21 2012 16300 SE21 SE21
## 190 SE22 2012 16800 SE22 SE22
## 191 SE23 2012 17300 SE23 SE23
## 192 SE31 2012 16100 SE31 SE31
## 193 SE32 2012 16200 SE32 SE32
## 194 SE33 2012 16300 SE33 SE33
## 195 SI03 2012 11700 SI03 SI03
## 196 SI04 2012 12800 SI04 SI04
## 197 SK01 2012 16500 SK01 SK01
## 198 SK02 2012 10600 SK02 SK02
## 199 SK03 2012 10100 SK03 SK03
## 200 SK04 2012 9200 SK04 SK04
## 201 UKC1 2012 14800 UKC1 UKC1
## 202 UKC2 2012 15400 UKC2 UKC2
## 203 UKD1 2012 17100 UKD1 UKD1
## 204 UKD3 2012 15000 UKD3 UKD3
## 205 UKD4 2012 15100 UKD4 UKD4
## 206 UKD6 2012 18500 UKD6 UKD6
## 207 UKD7 2012 15200 UKD7 UKD7
## 208 UKE1 2012 14900 UKE1 UKE1
## 209 UKE2 2012 18100 UKE2 UKE2
## 210 UKE3 2012 14300 UKE3 UKE3
## 211 UKE4 2012 14700 UKE4 UKE4
## 212 UKF1 2012 15200 UKF1 UKF1
## 213 UKF2 2012 16300 UKF2 UKF2
## 214 UKF3 2012 16200 UKF3 UKF3
## 215 UKG1 2012 17800 UKG1 UKG1
## 216 UKG2 2012 15900 UKG2 UKG2
## 217 UKG3 2012 13900 UKG3 UKG3
## 218 UKH1 2012 16900 UKH1 UKH1
## 219 UKH2 2012 19500 UKH2 UKH2
## 220 UKH3 2012 18200 UKH3 UKH3
## 221 UKI3 2012 37200 UKI3 UKI3
## 222 UKI4 2012 19900 UKI4 UKI4
## 223 UKI5 2012 17900 UKI5 UKI5
## 224 UKI6 2012 21300 UKI6 UKI6
## 225 UKI7 2012 21200 UKI7 UKI7
## 226 UKJ1 2012 20600 UKJ1 UKJ1
## 227 UKJ2 2012 21300 UKJ2 UKJ2
## 228 UKJ3 2012 18400 UKJ3 UKJ3
## 229 UKJ4 2012 18000 UKJ4 UKJ4
## 230 UKK1 2012 17700 UKK1 UKK1
## 231 UKK2 2012 17700 UKK2 UKK2
## 232 UKK3 2012 15900 UKK3 UKK3
## 233 UKK4 2012 16500 UKK4 UKK4
## 234 UKL1 2012 14800 UKL1 UKL1
## 235 UKL2 2012 16000 UKL2 UKL2
## 236 UKM5 2012 20300 UKM5 UKM5
## 237 UKM6 2012 16800 UKM6 UKM6
## 238 UKN0 2012 14600 UKN0 UKN0
## 239 FR21 2012 16400 FR21 FRF2
## 240 FR22 2012 16600 FR22 FRE2
## 241 FR23 2012 17100 FR23 FRD2
## 242 FR24 2012 17500 FR24 FRB0
## 243 FR25 2012 16900 FR25 FRD1
## 244 FR26 2012 17400 FR26 FRC1
## 245 FR30 2012 14900 FR30 FRE1
## 246 FR41 2012 16300 FR41 FRF3
## 247 FR42 2012 17100 FR42 FRF1
## 248 FR43 2012 16800 FR43 FRC2
## 249 FR51 2012 16800 FR51 FRG0
## 250 FR52 2012 16900 FR52 FRH0
## 251 FR53 2012 17000 FR53 FRI3
## 252 FR61 2012 17000 FR61 FRI1
## 253 FR62 2012 17000 FR62 FRJ2
## 254 FR63 2012 17100 FR63 FRI2
## 255 FR71 2012 17800 FR71 FRK2
## 256 FR72 2012 17700 FR72 FRK1
## 257 FR81 2012 15800 FR81 FRJ1
## 258 FR82 2012 17400 FR82 FRL0
## 259 FR83 2012 16400 FR83 FRM0
## 260 FRA1 2012 13700 FRA1 FRY1
## 261 FRA2 2012 13800 FRA2 FRY2
## 262 FRA3 2012 8700 FRA3 FRY3
## 263 FRA4 2012 13700 FRA4 FRY4
## 264 FRA5 2012 4600 FRA5 FRY5
## 265 HU10 2012 10300 HU10 <NA>
## 266 IE01 2012 13400 IE01 <NA>
## 267 IE02 2012 15100 IE02 <NA>
## 268 LT00 2012 10700 LT00 <NA>
## 269 PL11 2012 10900 PL11 PL71
## 270 PL12 2012 12900 PL12 <NA>
## 271 PL31 2012 9300 PL31 PL81
## 272 PL32 2012 8500 PL32 PL82
## 273 PL33 2012 9500 PL33 PL72
## 274 PL34 2012 9000 PL34 PL84
## 275 UKM2 2012 17500 UKM2 UKM7
## 276 UKM3 2012 16000 UKM3 <NA>
## 277 NO01 2012 20600 <NA> <NA>
## 278 NO02 2012 17500 <NA> <NA>
## 279 NO03 2012 18000 <NA> <NA>
## 280 NO04 2012 19000 <NA> <NA>
## 281 NO05 2012 18700 <NA> <NA>
## 282 NO06 2012 18200 <NA> <NA>
## 283 NO07 2012 18100 <NA> <NA>
## name unit
## 1 Burgenland PPCS_HAB
## 2 Niederösterreich PPCS_HAB
## 3 Wien PPCS_HAB
## 4 Kärnten PPCS_HAB
## 5 Steiermark PPCS_HAB
## 6 Oberösterreich PPCS_HAB
## 7 Salzburg PPCS_HAB
## 8 Tirol PPCS_HAB
## 9 Vorarlberg PPCS_HAB
## 10 Région de Bruxelles-Capitale/ Brussels Hoofdstedelijk Gewest PPCS_HAB
## 11 Prov. Antwerpen PPCS_HAB
## 12 Prov. Limburg (BE) PPCS_HAB
## 13 Prov. Oost-Vlaanderen PPCS_HAB
## 14 Prov. Vlaams-Brabant PPCS_HAB
## 15 Prov. West-Vlaanderen PPCS_HAB
## 16 Prov. Brabant Wallon PPCS_HAB
## 17 Prov. Hainaut PPCS_HAB
## 18 Prov. Liège PPCS_HAB
## 19 Prov. Luxembourg (BE) PPCS_HAB
## 20 Prov. Namur PPCS_HAB
## 21 Северозападен PPCS_HAB
## 22 Северен централен PPCS_HAB
## 23 Североизточен PPCS_HAB
## 24 Югоизточен PPCS_HAB
## 25 Югозападен PPCS_HAB
## 26 Южен централен PPCS_HAB
## 27 Κύπρος PPCS_HAB
## 28 Praha PPCS_HAB
## 29 Střední Čechy PPCS_HAB
## 30 Jihozápad PPCS_HAB
## 31 Severozápad PPCS_HAB
## 32 Severovýchod PPCS_HAB
## 33 Jihovýchod PPCS_HAB
## 34 Střední Morava PPCS_HAB
## 35 Moravskoslezsko PPCS_HAB
## 36 Stuttgart PPCS_HAB
## 37 Karlsruhe PPCS_HAB
## 38 Freiburg PPCS_HAB
## 39 Tübingen PPCS_HAB
## 40 Oberbayern PPCS_HAB
## 41 Niederbayern PPCS_HAB
## 42 Oberpfalz PPCS_HAB
## 43 Oberfranken PPCS_HAB
## 44 Mittelfranken PPCS_HAB
## 45 Unterfranken PPCS_HAB
## 46 Schwaben PPCS_HAB
## 47 Berlin PPCS_HAB
## 48 Brandenburg PPCS_HAB
## 49 Bremen PPCS_HAB
## 50 Hamburg PPCS_HAB
## 51 Darmstadt PPCS_HAB
## 52 Gießen PPCS_HAB
## 53 Kassel PPCS_HAB
## 54 Mecklenburg-Vorpommern PPCS_HAB
## 55 Braunschweig PPCS_HAB
## 56 Hannover PPCS_HAB
## 57 Lüneburg PPCS_HAB
## 58 Weser-Ems PPCS_HAB
## 59 Düsseldorf PPCS_HAB
## 60 Köln PPCS_HAB
## 61 Münster PPCS_HAB
## 62 Detmold PPCS_HAB
## 63 Arnsberg PPCS_HAB
## 64 Koblenz PPCS_HAB
## 65 Trier PPCS_HAB
## 66 Rheinhessen-Pfalz PPCS_HAB
## 67 Saarland PPCS_HAB
## 68 Dresden PPCS_HAB
## 69 Chemnitz PPCS_HAB
## 70 Leipzig PPCS_HAB
## 71 Sachsen-Anhalt PPCS_HAB
## 72 Schleswig-Holstein PPCS_HAB
## 73 Thüringen PPCS_HAB
## 74 Hovedstaden PPCS_HAB
## 75 Sjælland PPCS_HAB
## 76 Syddanmark PPCS_HAB
## 77 Midtjylland PPCS_HAB
## 78 Nordjylland PPCS_HAB
## 79 Eesti PPCS_HAB
## 80 Aττική PPCS_HAB
## 81 Βόρειο Αιγαίο PPCS_HAB
## 82 Νότιο Αιγαίο PPCS_HAB
## 83 Κρήτη PPCS_HAB
## 84 Aνατολική Μακεδονία, Θράκη PPCS_HAB
## 85 Κεντρική Μακεδονία PPCS_HAB
## 86 Δυτική Μακεδονία PPCS_HAB
## 87 Ήπειρος PPCS_HAB
## 88 Θεσσαλία PPCS_HAB
## 89 Ιόνια Νησιά PPCS_HAB
## 90 Δυτική Ελλάδα PPCS_HAB
## 91 Στερεά Ελλάδα PPCS_HAB
## 92 Πελοπόννησος PPCS_HAB
## 93 Galicia PPCS_HAB
## 94 Principado de Asturias PPCS_HAB
## 95 Cantabria PPCS_HAB
## 96 País Vasco PPCS_HAB
## 97 Comunidad Foral de Navarra PPCS_HAB
## 98 La Rioja PPCS_HAB
## 99 Aragón PPCS_HAB
## 100 Comunidad de Madrid PPCS_HAB
## 101 Castilla y León PPCS_HAB
## 102 Castilla-La Mancha PPCS_HAB
## 103 Extremadura PPCS_HAB
## 104 Cataluña PPCS_HAB
## 105 Comunidad Valenciana PPCS_HAB
## 106 Illes Balears PPCS_HAB
## 107 Andalucía PPCS_HAB
## 108 Región de Murcia PPCS_HAB
## 109 Ciudad Autónoma de Ceuta PPCS_HAB
## 110 Ciudad Autónoma de Melilla PPCS_HAB
## 111 Canarias PPCS_HAB
## 112 Länsi-Suomi PPCS_HAB
## 113 Helsinki-Uusimaa PPCS_HAB
## 114 Etelä-Suomi PPCS_HAB
## 115 Pohjois- ja Itä-Suomi PPCS_HAB
## 116 Åland PPCS_HAB
## 117 Ile-de-France PPCS_HAB
## 118 Jadranska Hrvatska PPCS_HAB
## 119 Kontinentalna Hrvatska PPCS_HAB
## 120 Közép-Dunántúl PPCS_HAB
## 121 Nyugat-Dunántúl PPCS_HAB
## 122 Dél-Dunántúl PPCS_HAB
## 123 Észak-Magyarország PPCS_HAB
## 124 Észak-Alföld PPCS_HAB
## 125 Dél-Alföld PPCS_HAB
## 126 Piemonte PPCS_HAB
## 127 Valle d’Aosta/Vallée d’Aoste PPCS_HAB
## 128 Liguria PPCS_HAB
## 129 Lombardia PPCS_HAB
## 130 Abruzzo PPCS_HAB
## 131 Molise PPCS_HAB
## 132 Campania PPCS_HAB
## 133 Puglia PPCS_HAB
## 134 Basilicata PPCS_HAB
## 135 Calabria PPCS_HAB
## 136 Sicilia PPCS_HAB
## 137 Sardegna PPCS_HAB
## 138 Provincia Autonoma di Bolzano/Bozen PPCS_HAB
## 139 Provincia Autonoma di Trento PPCS_HAB
## 140 Veneto PPCS_HAB
## 141 Friuli-Venezia Giulia PPCS_HAB
## 142 Emilia-Romagna PPCS_HAB
## 143 Toscana PPCS_HAB
## 144 Umbria PPCS_HAB
## 145 Marche PPCS_HAB
## 146 Lazio PPCS_HAB
## 147 Luxembourg PPCS_HAB
## 148 Latvija PPCS_HAB
## 149 Malta PPCS_HAB
## 150 Groningen PPCS_HAB
## 151 Friesland (NL) PPCS_HAB
## 152 Drenthe PPCS_HAB
## 153 Overijssel PPCS_HAB
## 154 Gelderland PPCS_HAB
## 155 Flevoland PPCS_HAB
## 156 Utrecht PPCS_HAB
## 157 Noord-Holland PPCS_HAB
## 158 Zuid-Holland PPCS_HAB
## 159 Zeeland PPCS_HAB
## 160 Noord-Brabant PPCS_HAB
## 161 Limburg (NL) PPCS_HAB
## 162 Małopolskie PPCS_HAB
## 163 Śląskie PPCS_HAB
## 164 Wielkopolskie PPCS_HAB
## 165 Zachodniopomorskie PPCS_HAB
## 166 Lubuskie PPCS_HAB
## 167 Dolnośląskie PPCS_HAB
## 168 Opolskie PPCS_HAB
## 169 Kujawsko-pomorskie PPCS_HAB
## 170 Warmińsko-mazurskie PPCS_HAB
## 171 Pomorskie PPCS_HAB
## 172 Norte PPCS_HAB
## 173 Algarve PPCS_HAB
## 174 Centro (PT) PPCS_HAB
## 175 Área Metropolitana de Lisboa PPCS_HAB
## 176 Alentejo PPCS_HAB
## 177 Região Autónoma dos Açores PPCS_HAB
## 178 Região Autónoma da Madeira PPCS_HAB
## 179 Nord-Vest PPCS_HAB
## 180 Centru PPCS_HAB
## 181 Nord-Est PPCS_HAB
## 182 Sud-Est PPCS_HAB
## 183 Sud-Muntenia PPCS_HAB
## 184 Bucureşti-Ilfov PPCS_HAB
## 185 Sud-Vest Oltenia PPCS_HAB
## 186 Vest PPCS_HAB
## 187 Stockholm PPCS_HAB
## 188 Östra Mellansverige PPCS_HAB
## 189 Småland med öarna PPCS_HAB
## 190 Sydsverige PPCS_HAB
## 191 Västsverige PPCS_HAB
## 192 Norra Mellansverige PPCS_HAB
## 193 Mellersta Norrland PPCS_HAB
## 194 Övre Norrland PPCS_HAB
## 195 Vzhodna Slovenija PPCS_HAB
## 196 Zahodna Slovenija PPCS_HAB
## 197 Bratislavský kraj PPCS_HAB
## 198 Západné Slovensko PPCS_HAB
## 199 Stredné Slovensko PPCS_HAB
## 200 Východné Slovensko PPCS_HAB
## 201 Tees Valley and Durham PPCS_HAB
## 202 Northumberland and Tyne and Wear PPCS_HAB
## 203 Cumbria PPCS_HAB
## 204 Greater Manchester PPCS_HAB
## 205 Lancashire PPCS_HAB
## 206 Cheshire PPCS_HAB
## 207 Merseyside PPCS_HAB
## 208 East Yorkshire and Northern Lincolnshire PPCS_HAB
## 209 North Yorkshire PPCS_HAB
## 210 South Yorkshire PPCS_HAB
## 211 West Yorkshire PPCS_HAB
## 212 Derbyshire and Nottinghamshire PPCS_HAB
## 213 Leicestershire, Rutland and Northamptonshire PPCS_HAB
## 214 Lincolnshire PPCS_HAB
## 215 Herefordshire, Worcestershire and Warwickshire PPCS_HAB
## 216 Shropshire and Staffordshire PPCS_HAB
## 217 West Midlands PPCS_HAB
## 218 East Anglia PPCS_HAB
## 219 Bedfordshire and Hertfordshire PPCS_HAB
## 220 Essex PPCS_HAB
## 221 Inner London — West PPCS_HAB
## 222 Inner London — East PPCS_HAB
## 223 Outer London — East and North East PPCS_HAB
## 224 Outer London — South PPCS_HAB
## 225 Outer London — West and North West PPCS_HAB
## 226 Berkshire, Buckinghamshire and Oxfordshire PPCS_HAB
## 227 Surrey, East and West Sussex PPCS_HAB
## 228 Hampshire and Isle of Wight PPCS_HAB
## 229 Kent PPCS_HAB
## 230 Gloucestershire, Wiltshire and Bristol/Bath area PPCS_HAB
## 231 Dorset and Somerset PPCS_HAB
## 232 Cornwall and Isles of Scilly PPCS_HAB
## 233 Devon PPCS_HAB
## 234 West Wales and The Valleys PPCS_HAB
## 235 East Wales PPCS_HAB
## 236 North Eastern Scotland PPCS_HAB
## 237 Highlands and Islands PPCS_HAB
## 238 Northern Ireland PPCS_HAB
## 239 Champagne-Ardenne PPCS_HAB
## 240 Picardie PPCS_HAB
## 241 Haute-Normandie PPCS_HAB
## 242 Centre — Val de Loire PPCS_HAB
## 243 Basse-Normandie PPCS_HAB
## 244 Bourgogne PPCS_HAB
## 245 Nord-Pas de Calais PPCS_HAB
## 246 Lorraine PPCS_HAB
## 247 Alsace PPCS_HAB
## 248 Franche-Comté PPCS_HAB
## 249 Pays de la Loire PPCS_HAB
## 250 Bretagne PPCS_HAB
## 251 Poitou-Charentes PPCS_HAB
## 252 Aquitaine PPCS_HAB
## 253 Midi-Pyrénées PPCS_HAB
## 254 Limousin PPCS_HAB
## 255 Rhône-Alpes PPCS_HAB
## 256 Auvergne PPCS_HAB
## 257 Languedoc-Roussillon PPCS_HAB
## 258 Provence-Alpes-Côte d’Azur PPCS_HAB
## 259 Corse PPCS_HAB
## 260 Guadeloupe PPCS_HAB
## 261 Martinique PPCS_HAB
## 262 Guyane PPCS_HAB
## 263 La Réunion PPCS_HAB
## 264 Mayotte PPCS_HAB
## 265 Közép-Magyarország PPCS_HAB
## 266 Border, Midland and Western PPCS_HAB
## 267 Southern and Eastern PPCS_HAB
## 268 Lietuva PPCS_HAB
## 269 Łódzkie PPCS_HAB
## 270 Mazowieckie PPCS_HAB
## 271 Lubelskie PPCS_HAB
## 272 Podkarpackie PPCS_HAB
## 273 Świętokrzyskie PPCS_HAB
## 274 Podlaskie PPCS_HAB
## 275 Eastern Scotland PPCS_HAB
## 276 South Western Scotland PPCS_HAB
## 277 <NA> PPCS_HAB
## 278 <NA> PPCS_HAB
## 279 <NA> PPCS_HAB
## 280 <NA> PPCS_HAB
## 281 <NA> PPCS_HAB
## 282 <NA> PPCS_HAB
## 283 <NA> PPCS_HAB
## na_item nuts_level change
## 1 B6N 2 unchanged
## 2 B6N 2 unchanged
## 3 B6N 2 unchanged
## 4 B6N 2 unchanged
## 5 B6N 2 unchanged
## 6 B6N 2 unchanged
## 7 B6N 2 unchanged
## 8 B6N 2 unchanged
## 9 B6N 2 unchanged
## 10 B6N 2 unchanged
## 11 B6N 2 unchanged
## 12 B6N 2 unchanged
## 13 B6N 2 unchanged
## 14 B6N 2 unchanged
## 15 B6N 2 unchanged
## 16 B6N 2 unchanged
## 17 B6N 2 unchanged
## 18 B6N 2 unchanged
## 19 B6N 2 unchanged
## 20 B6N 2 unchanged
## 21 B6N 2 unchanged
## 22 B6N 2 unchanged
## 23 B6N 2 unchanged
## 24 B6N 2 unchanged
## 25 B6N 2 unchanged
## 26 B6N 2 unchanged
## 27 B6N 2 unchanged
## 28 B6N 2 unchanged
## 29 B6N 2 unchanged
## 30 B6N 2 unchanged
## 31 B6N 2 unchanged
## 32 B6N 2 unchanged
## 33 B6N 2 unchanged
## 34 B6N 2 unchanged
## 35 B6N 2 unchanged
## 36 B6N 2 unchanged
## 37 B6N 2 unchanged
## 38 B6N 2 unchanged
## 39 B6N 2 unchanged
## 40 B6N 2 unchanged
## 41 B6N 2 unchanged
## 42 B6N 2 unchanged
## 43 B6N 2 unchanged
## 44 B6N 2 unchanged
## 45 B6N 2 unchanged
## 46 B6N 2 unchanged
## 47 B6N 2 unchanged
## 48 B6N 2 unchanged
## 49 B6N 2 unchanged
## 50 B6N 2 unchanged
## 51 B6N 2 unchanged
## 52 B6N 2 unchanged
## 53 B6N 2 unchanged
## 54 B6N 2 unchanged
## 55 B6N 2 unchanged
## 56 B6N 2 unchanged
## 57 B6N 2 unchanged
## 58 B6N 2 unchanged
## 59 B6N 2 unchanged
## 60 B6N 2 unchanged
## 61 B6N 2 unchanged
## 62 B6N 2 unchanged
## 63 B6N 2 unchanged
## 64 B6N 2 unchanged
## 65 B6N 2 unchanged
## 66 B6N 2 unchanged
## 67 B6N 2 unchanged
## 68 B6N 2 unchanged
## 69 B6N 2 unchanged
## 70 B6N 2 unchanged
## 71 B6N 2 unchanged
## 72 B6N 2 unchanged
## 73 B6N 2 unchanged
## 74 B6N 2 unchanged
## 75 B6N 2 unchanged
## 76 B6N 2 unchanged
## 77 B6N 2 unchanged
## 78 B6N 2 unchanged
## 79 B6N 2 unchanged
## 80 B6N 2 unchanged
## 81 B6N 2 unchanged
## 82 B6N 2 unchanged
## 83 B6N 2 unchanged
## 84 B6N 2 unchanged
## 85 B6N 2 unchanged
## 86 B6N 2 unchanged
## 87 B6N 2 unchanged
## 88 B6N 2 unchanged
## 89 B6N 2 unchanged
## 90 B6N 2 unchanged
## 91 B6N 2 unchanged
## 92 B6N 2 unchanged
## 93 B6N 2 unchanged
## 94 B6N 2 unchanged
## 95 B6N 2 unchanged
## 96 B6N 2 unchanged
## 97 B6N 2 unchanged
## 98 B6N 2 unchanged
## 99 B6N 2 unchanged
## 100 B6N 2 unchanged
## 101 B6N 2 unchanged
## 102 B6N 2 unchanged
## 103 B6N 2 unchanged
## 104 B6N 2 unchanged
## 105 B6N 2 unchanged
## 106 B6N 2 unchanged
## 107 B6N 2 unchanged
## 108 B6N 2 unchanged
## 109 B6N 2 unchanged
## 110 B6N 2 unchanged
## 111 B6N 2 unchanged
## 112 B6N 2 unchanged
## 113 B6N 2 unchanged
## 114 B6N 2 unchanged
## 115 B6N 2 unchanged
## 116 B6N 2 unchanged
## 117 B6N 2 unchanged
## 118 B6N 2 unchanged
## 119 B6N 2 unchanged
## 120 B6N 2 unchanged
## 121 B6N 2 unchanged
## 122 B6N 2 unchanged
## 123 B6N 2 unchanged
## 124 B6N 2 unchanged
## 125 B6N 2 unchanged
## 126 B6N 2 unchanged
## 127 B6N 2 unchanged
## 128 B6N 2 unchanged
## 129 B6N 2 unchanged
## 130 B6N 2 unchanged
## 131 B6N 2 unchanged
## 132 B6N 2 unchanged
## 133 B6N 2 unchanged
## 134 B6N 2 unchanged
## 135 B6N 2 unchanged
## 136 B6N 2 unchanged
## 137 B6N 2 unchanged
## 138 B6N 2 unchanged
## 139 B6N 2 unchanged
## 140 B6N 2 unchanged
## 141 B6N 2 unchanged
## 142 B6N 2 unchanged
## 143 B6N 2 unchanged
## 144 B6N 2 unchanged
## 145 B6N 2 unchanged
## 146 B6N 2 unchanged
## 147 B6N 2 unchanged
## 148 B6N 2 unchanged
## 149 B6N 2 unchanged
## 150 B6N 2 unchanged
## 151 B6N 2 unchanged
## 152 B6N 2 unchanged
## 153 B6N 2 unchanged
## 154 B6N 2 unchanged
## 155 B6N 2 unchanged
## 156 B6N 2 unchanged
## 157 B6N 2 unchanged
## 158 B6N 2 unchanged
## 159 B6N 2 unchanged
## 160 B6N 2 unchanged
## 161 B6N 2 unchanged
## 162 B6N 2 unchanged
## 163 B6N 2 unchanged
## 164 B6N 2 unchanged
## 165 B6N 2 unchanged
## 166 B6N 2 unchanged
## 167 B6N 2 unchanged
## 168 B6N 2 unchanged
## 169 B6N 2 unchanged
## 170 B6N 2 unchanged
## 171 B6N 2 unchanged
## 172 B6N 2 unchanged
## 173 B6N 2 unchanged
## 174 B6N 2 unchanged
## 175 B6N 2 unchanged
## 176 B6N 2 unchanged
## 177 B6N 2 unchanged
## 178 B6N 2 unchanged
## 179 B6N 2 unchanged
## 180 B6N 2 unchanged
## 181 B6N 2 unchanged
## 182 B6N 2 unchanged
## 183 B6N 2 unchanged
## 184 B6N 2 unchanged
## 185 B6N 2 unchanged
## 186 B6N 2 unchanged
## 187 B6N 2 unchanged
## 188 B6N 2 unchanged
## 189 B6N 2 unchanged
## 190 B6N 2 unchanged
## 191 B6N 2 unchanged
## 192 B6N 2 unchanged
## 193 B6N 2 unchanged
## 194 B6N 2 unchanged
## 195 B6N 2 unchanged
## 196 B6N 2 unchanged
## 197 B6N 2 unchanged
## 198 B6N 2 unchanged
## 199 B6N 2 unchanged
## 200 B6N 2 unchanged
## 201 B6N 2 unchanged
## 202 B6N 2 unchanged
## 203 B6N 2 unchanged
## 204 B6N 2 unchanged
## 205 B6N 2 unchanged
## 206 B6N 2 unchanged
## 207 B6N 2 unchanged
## 208 B6N 2 unchanged
## 209 B6N 2 unchanged
## 210 B6N 2 unchanged
## 211 B6N 2 unchanged
## 212 B6N 2 unchanged
## 213 B6N 2 unchanged
## 214 B6N 2 unchanged
## 215 B6N 2 unchanged
## 216 B6N 2 unchanged
## 217 B6N 2 unchanged
## 218 B6N 2 unchanged
## 219 B6N 2 unchanged
## 220 B6N 2 unchanged
## 221 B6N 2 unchanged
## 222 B6N 2 unchanged
## 223 B6N 2 unchanged
## 224 B6N 2 unchanged
## 225 B6N 2 unchanged
## 226 B6N 2 unchanged
## 227 B6N 2 unchanged
## 228 B6N 2 unchanged
## 229 B6N 2 unchanged
## 230 B6N 2 unchanged
## 231 B6N 2 unchanged
## 232 B6N 2 unchanged
## 233 B6N 2 unchanged
## 234 B6N 2 unchanged
## 235 B6N 2 unchanged
## 236 B6N 2 unchanged
## 237 B6N 2 unchanged
## 238 B6N 2 unchanged
## 239 B6N 2 recoded
## 240 B6N 2 recoded
## 241 B6N 2 recoded
## 242 B6N 2 recoded and relabelled
## 243 B6N 2 recoded
## 244 B6N 2 recoded
## 245 B6N 2 recoded
## 246 B6N 2 recoded
## 247 B6N 2 recoded
## 248 B6N 2 recoded
## 249 B6N 2 recoded
## 250 B6N 2 recoded
## 251 B6N 2 recoded
## 252 B6N 2 recoded
## 253 B6N 2 recoded
## 254 B6N 2 recoded
## 255 B6N 2 recoded
## 256 B6N 2 recoded
## 257 B6N 2 recoded
## 258 B6N 2 recoded
## 259 B6N 2 recoded
## 260 B6N 2 recoded
## 261 B6N 2 recoded
## 262 B6N 2 recoded
## 263 B6N 2 recoded
## 264 B6N 2 recoded
## 265 B6N 2 discontinued; split into new HU11 and HU12
## 266 B6N 2 discontinued
## 267 B6N 2 discontinued
## 268 B6N 2 split into new LT01 and LT02
## 269 B6N 2 recoded
## 270 B6N 2 discontinued; split into new PL91 and PL92
## 271 B6N 2 recoded
## 272 B6N 2 recoded
## 273 B6N 2 recoded
## 274 B6N 2 recoded
## 275 B6N 2 boundary shift; lost ex-UKM24
## 276 B6N 2 discontinued; split into new UKM8 and UKM9
## 277 B6N 2 not in EU - not controlled
## 278 B6N 2 not in EU - not controlled
## 279 B6N 2 not in EU - not controlled
## 280 B6N 2 not in EU - not controlled
## 281 B6N 2 not in EU - not controlled
## 282 B6N 2 not in EU - not controlled
## 283 B6N 2 not in EU - not controlled
## resolution nuts_2016 nuts_2013
## 1 <NA> TRUE TRUE
## 2 <NA> TRUE TRUE
## 3 <NA> TRUE TRUE
## 4 <NA> TRUE TRUE
## 5 <NA> TRUE TRUE
## 6 <NA> TRUE TRUE
## 7 <NA> TRUE TRUE
## 8 <NA> TRUE TRUE
## 9 <NA> TRUE TRUE
## 10 <NA> TRUE TRUE
## 11 <NA> TRUE TRUE
## 12 <NA> TRUE TRUE
## 13 <NA> TRUE TRUE
## 14 <NA> TRUE TRUE
## 15 <NA> TRUE TRUE
## 16 <NA> TRUE TRUE
## 17 <NA> TRUE TRUE
## 18 <NA> TRUE TRUE
## 19 <NA> TRUE TRUE
## 20 <NA> TRUE TRUE
## 21 <NA> TRUE TRUE
## 22 <NA> TRUE TRUE
## 23 <NA> TRUE TRUE
## 24 <NA> TRUE TRUE
## 25 <NA> TRUE TRUE
## 26 <NA> TRUE TRUE
## 27 <NA> TRUE TRUE
## 28 <NA> TRUE TRUE
## 29 <NA> TRUE TRUE
## 30 <NA> TRUE TRUE
## 31 <NA> TRUE TRUE
## 32 <NA> TRUE TRUE
## 33 <NA> TRUE TRUE
## 34 <NA> TRUE TRUE
## 35 <NA> TRUE TRUE
## 36 <NA> TRUE TRUE
## 37 <NA> TRUE TRUE
## 38 <NA> TRUE TRUE
## 39 <NA> TRUE TRUE
## 40 <NA> TRUE TRUE
## 41 <NA> TRUE TRUE
## 42 <NA> TRUE TRUE
## 43 <NA> TRUE TRUE
## 44 <NA> TRUE TRUE
## 45 <NA> TRUE TRUE
## 46 <NA> TRUE TRUE
## 47 <NA> TRUE TRUE
## 48 <NA> TRUE TRUE
## 49 <NA> TRUE TRUE
## 50 <NA> TRUE TRUE
## 51 <NA> TRUE TRUE
## 52 <NA> TRUE TRUE
## 53 <NA> TRUE TRUE
## 54 <NA> TRUE TRUE
## 55 <NA> TRUE TRUE
## 56 <NA> TRUE TRUE
## 57 <NA> TRUE TRUE
## 58 <NA> TRUE TRUE
## 59 <NA> TRUE TRUE
## 60 <NA> TRUE TRUE
## 61 <NA> TRUE TRUE
## 62 <NA> TRUE TRUE
## 63 <NA> TRUE TRUE
## 64 <NA> TRUE TRUE
## 65 <NA> TRUE TRUE
## 66 <NA> TRUE TRUE
## 67 <NA> TRUE TRUE
## 68 <NA> TRUE TRUE
## 69 <NA> TRUE TRUE
## 70 <NA> TRUE TRUE
## 71 <NA> TRUE TRUE
## 72 <NA> TRUE TRUE
## 73 <NA> TRUE TRUE
## 74 <NA> TRUE TRUE
## 75 <NA> TRUE TRUE
## 76 <NA> TRUE TRUE
## 77 <NA> TRUE TRUE
## 78 <NA> TRUE TRUE
## 79 <NA> TRUE TRUE
## 80 <NA> TRUE TRUE
## 81 <NA> TRUE TRUE
## 82 <NA> TRUE TRUE
## 83 <NA> TRUE TRUE
## 84 <NA> TRUE TRUE
## 85 <NA> TRUE TRUE
## 86 <NA> TRUE TRUE
## 87 <NA> TRUE TRUE
## 88 <NA> TRUE TRUE
## 89 <NA> TRUE TRUE
## 90 <NA> TRUE TRUE
## 91 <NA> TRUE TRUE
## 92 <NA> TRUE TRUE
## 93 <NA> TRUE TRUE
## 94 <NA> TRUE TRUE
## 95 <NA> TRUE TRUE
## 96 <NA> TRUE TRUE
## 97 <NA> TRUE TRUE
## 98 <NA> TRUE TRUE
## 99 <NA> TRUE TRUE
## 100 <NA> TRUE TRUE
## 101 <NA> TRUE TRUE
## 102 <NA> TRUE TRUE
## 103 <NA> TRUE TRUE
## 104 <NA> TRUE TRUE
## 105 <NA> TRUE TRUE
## 106 <NA> TRUE TRUE
## 107 <NA> TRUE TRUE
## 108 <NA> TRUE TRUE
## 109 <NA> TRUE TRUE
## 110 <NA> TRUE TRUE
## 111 <NA> TRUE TRUE
## 112 <NA> TRUE TRUE
## 113 <NA> TRUE TRUE
## 114 <NA> TRUE TRUE
## 115 <NA> TRUE TRUE
## 116 <NA> TRUE TRUE
## 117 <NA> TRUE TRUE
## 118 <NA> TRUE TRUE
## 119 <NA> TRUE TRUE
## 120 <NA> TRUE TRUE
## 121 <NA> TRUE TRUE
## 122 <NA> TRUE TRUE
## 123 <NA> TRUE TRUE
## 124 <NA> TRUE TRUE
## 125 <NA> TRUE TRUE
## 126 <NA> TRUE TRUE
## 127 <NA> TRUE TRUE
## 128 <NA> TRUE TRUE
## 129 <NA> TRUE TRUE
## 130 <NA> TRUE TRUE
## 131 <NA> TRUE TRUE
## 132 <NA> TRUE TRUE
## 133 <NA> TRUE TRUE
## 134 <NA> TRUE TRUE
## 135 <NA> TRUE TRUE
## 136 <NA> TRUE TRUE
## 137 <NA> TRUE TRUE
## 138 <NA> TRUE TRUE
## 139 <NA> TRUE TRUE
## 140 <NA> TRUE TRUE
## 141 <NA> TRUE TRUE
## 142 <NA> TRUE TRUE
## 143 <NA> TRUE TRUE
## 144 <NA> TRUE TRUE
## 145 <NA> TRUE TRUE
## 146 <NA> TRUE TRUE
## 147 <NA> TRUE TRUE
## 148 <NA> TRUE TRUE
## 149 <NA> TRUE TRUE
## 150 <NA> TRUE TRUE
## 151 <NA> TRUE TRUE
## 152 <NA> TRUE TRUE
## 153 <NA> TRUE TRUE
## 154 <NA> TRUE TRUE
## 155 <NA> TRUE TRUE
## 156 <NA> TRUE TRUE
## 157 <NA> TRUE TRUE
## 158 <NA> TRUE TRUE
## 159 <NA> TRUE TRUE
## 160 <NA> TRUE TRUE
## 161 <NA> TRUE TRUE
## 162 <NA> TRUE TRUE
## 163 <NA> TRUE TRUE
## 164 <NA> TRUE TRUE
## 165 <NA> TRUE TRUE
## 166 <NA> TRUE TRUE
## 167 <NA> TRUE TRUE
## 168 <NA> TRUE TRUE
## 169 <NA> TRUE TRUE
## 170 <NA> TRUE TRUE
## 171 <NA> TRUE TRUE
## 172 <NA> TRUE TRUE
## 173 <NA> TRUE TRUE
## 174 <NA> TRUE TRUE
## 175 <NA> TRUE TRUE
## 176 <NA> TRUE TRUE
## 177 <NA> TRUE TRUE
## 178 <NA> TRUE TRUE
## 179 <NA> TRUE TRUE
## 180 <NA> TRUE TRUE
## 181 <NA> TRUE TRUE
## 182 <NA> TRUE TRUE
## 183 <NA> TRUE TRUE
## 184 <NA> TRUE TRUE
## 185 <NA> TRUE TRUE
## 186 <NA> TRUE TRUE
## 187 <NA> TRUE TRUE
## 188 <NA> TRUE TRUE
## 189 <NA> TRUE TRUE
## 190 <NA> TRUE TRUE
## 191 <NA> TRUE TRUE
## 192 <NA> TRUE TRUE
## 193 <NA> TRUE TRUE
## 194 <NA> TRUE TRUE
## 195 <NA> TRUE TRUE
## 196 <NA> TRUE TRUE
## 197 <NA> TRUE TRUE
## 198 <NA> TRUE TRUE
## 199 <NA> TRUE TRUE
## 200 <NA> TRUE TRUE
## 201 <NA> TRUE TRUE
## 202 <NA> TRUE TRUE
## 203 <NA> TRUE TRUE
## 204 <NA> TRUE TRUE
## 205 <NA> TRUE TRUE
## 206 <NA> TRUE TRUE
## 207 <NA> TRUE TRUE
## 208 <NA> TRUE TRUE
## 209 <NA> TRUE TRUE
## 210 <NA> TRUE TRUE
## 211 <NA> TRUE TRUE
## 212 <NA> TRUE TRUE
## 213 <NA> TRUE TRUE
## 214 <NA> TRUE TRUE
## 215 <NA> TRUE TRUE
## 216 <NA> TRUE TRUE
## 217 <NA> TRUE TRUE
## 218 <NA> TRUE TRUE
## 219 <NA> TRUE TRUE
## 220 <NA> TRUE TRUE
## 221 <NA> TRUE TRUE
## 222 <NA> TRUE TRUE
## 223 <NA> TRUE TRUE
## 224 <NA> TRUE TRUE
## 225 <NA> TRUE TRUE
## 226 <NA> TRUE TRUE
## 227 <NA> TRUE TRUE
## 228 <NA> TRUE TRUE
## 229 <NA> TRUE TRUE
## 230 <NA> TRUE TRUE
## 231 <NA> TRUE TRUE
## 232 <NA> TRUE TRUE
## 233 <NA> TRUE TRUE
## 234 <NA> TRUE TRUE
## 235 <NA> TRUE TRUE
## 236 <NA> TRUE TRUE
## 237 <NA> TRUE TRUE
## 238 <NA> TRUE TRUE
## 239 FRF2=FR21 FALSE TRUE
## 240 FRE2=FR22 FALSE TRUE
## 241 FRD2=FR23 FALSE TRUE
## 242 FRB0=FR24 FALSE TRUE
## 243 FRD1=FRFR25 FALSE TRUE
## 244 FRC1=FR26 FALSE TRUE
## 245 FRE1=FR30 FALSE TRUE
## 246 FRF3=FR41 FALSE TRUE
## 247 FRF1=FR42 FALSE TRUE
## 248 FRC2=FR43 FALSE TRUE
## 249 FRG0=FR51 FALSE TRUE
## 250 FRH0=FR52 FALSE TRUE
## 251 FRI3=FR53 FALSE TRUE
## 252 FRI1=FR61 FALSE TRUE
## 253 FRJ2=FR62 FALSE TRUE
## 254 FRI2=FR63 FALSE TRUE
## 255 FRK2=FR71 FALSE TRUE
## 256 FRK1=FR72 FALSE TRUE
## 257 FRJ1=FR81 FALSE TRUE
## 258 FRL0=FR82 FALSE TRUE
## 259 FRM0=FR83 FALSE TRUE
## 260 FRY1=FRA1 FALSE TRUE
## 261 FRY2=FRA2 FALSE TRUE
## 262 FRY3=FRA3 FALSE TRUE
## 263 FRY4=FRA4 FALSE TRUE
## 264 FRY5=FRA5 FALSE TRUE
## 265 <NA> FALSE TRUE
## 266 <NA> FALSE TRUE
## 267 <NA> FALSE TRUE
## 268 <NA> FALSE TRUE
## 269 PL71=PL11 FALSE TRUE
## 270 <NA> FALSE TRUE
## 271 PL81=PL31 FALSE TRUE
## 272 PL82=PL32 FALSE TRUE
## 273 PL72=PL33 FALSE TRUE
## 274 PL84=PL34 FALSE TRUE
## 275 UKM7=UKM2-UKM24 FALSE TRUE
## 276 <NA> FALSE TRUE
## 277 check with national authorities FALSE FALSE
## 278 check with national authorities FALSE FALSE
## 279 check with national authorities FALSE FALSE
## 280 check with national authorities FALSE FALSE
## 281 check with national authorities FALSE FALSE
## 282 check with national authorities FALSE FALSE
## 283 check with national authorities FALSE FALSE
Zooming in on the `UKM` regions, you can see that `UKM5` and `UKM6` are unchanged, `UKM3` gave birth to two new regional units, `UKM8` and `UKM9` (this is an additive change), and `UKM2` lost a NUTS3 unit, `UKM24`. The latter is also an additive change, but it may be far more difficult to handle in practice, because data about `UKM24` may not be available in most cases: only a few of the basic indicators published at NUTS1 and NUTS2 level are also available at NUTS3 level. You can, however, easily maintain backward compatibility among `UKM3`, `UKM8` and `UKM9`, because the new data is simply available in higher resolution, or, in other words, for two halves of the earlier `UKM3` region.
# for readability the previous example is filtered and reduced
eurostat::tgs00026 %>%
filter ( time == 2012 ) %>%
harmonize_geo_code() %>%
filter ( grepl("UKM", geo) ) %>%
select ( geo, values, change )
## In this data frame 238 observations are coded with the current NUTS2016
## geo labels and 276 observations/rows have NUTS2013 historical labels.
## Not checking for regional label consistency in non-EU countries.
## In this data frame not controlled countries: NO
## with altogether 7 observations/rows.
## geo values change
## 1 UKM5 20300 unchanged
## 2 UKM6 16800 unchanged
## 3 UKM2 17500 boundary shift; lost ex-UKM24
## 4 UKM3 16000 discontinued; split into new UKM8 and UKM9
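The backward compatibility between `UKM3`, `UKM8` and `UKM9` can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame `new_data` and its values are hypothetical, and the indicator is assumed to be additive (for example, a population count). A per-inhabitant indicator, such as the one in `tgs00026`, cannot simply be summed and would need a weighted average instead.

```r
# Hypothetical, additive indicator (e.g. population in thousands)
# reported under the NUTS2016 codes. The values are made up.
new_data <- data.frame(
  geo    = c("UKM8", "UKM9", "UKM5"),
  time   = c(2017, 2017, 2017),
  values = c(1500, 800, 500),
  stringsAsFactors = FALSE
)

# UKM3 was split into UKM8 and UKM9, so its historical equivalent
# is the sum of the two new units.
parts <- new_data[new_data$geo %in% c("UKM8", "UKM9"), ]
ukm3  <- aggregate(values ~ time, data = parts, FUN = sum)
ukm3$geo <- "UKM3"

# Panel following the NUTS2013 definition: the two halves are
# replaced by the reconstructed UKM3 row.
old_definition <- rbind(
  new_data[!new_data$geo %in% c("UKM8", "UKM9"), ],
  ukm3[, c("geo", "time", "values")]
)
old_definition
```

The same idea does not carry over to `UKM2`, because reconstructing it from `UKM7` would require data for the NUTS3 unit `UKM24`.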
For easier filtering in further use, two logical variables are added to the data frame: `nuts_2013` and `nuts_2016`. Many datasets contain non-EU regions that are not covered in the Eurostat correspondence tables; they can be selected with the filter `nuts_2013 == FALSE & nuts_2016 == FALSE`.
The following example keeps only the rows that use a geo code which is defined in NUTS2013 and cannot be found in NUTS2016. These are the main sources of incompatibility in your data panel.
eurostat::tgs00026 %>%
filter ( time == 2012 ) %>%
harmonize_geo_code() %>%
filter ( nuts_2013, ! nuts_2016 )
## In this data frame 238 observations are coded with the current NUTS2016
## geo labels and 276 observations/rows have NUTS2013 historical labels.
## Not checking for regional label consistency in non-EU countries.
## In this data frame not controlled countries: NO
## with altogether 7 observations/rows.
## geo time values code13 code16 name unit na_item
## 1 FR21 2012 16400 FR21 FRF2 Champagne-Ardenne PPCS_HAB B6N
## 2 FR22 2012 16600 FR22 FRE2 Picardie PPCS_HAB B6N
## 3 FR23 2012 17100 FR23 FRD2 Haute-Normandie PPCS_HAB B6N
## 4 FR24 2012 17500 FR24 FRB0 Centre — Val de Loire PPCS_HAB B6N
## 5 FR25 2012 16900 FR25 FRD1 Basse-Normandie PPCS_HAB B6N
## 6 FR26 2012 17400 FR26 FRC1 Bourgogne PPCS_HAB B6N
## 7 FR30 2012 14900 FR30 FRE1 Nord-Pas de Calais PPCS_HAB B6N
## 8 FR41 2012 16300 FR41 FRF3 Lorraine PPCS_HAB B6N
## 9 FR42 2012 17100 FR42 FRF1 Alsace PPCS_HAB B6N
## 10 FR43 2012 16800 FR43 FRC2 Franche-Comté PPCS_HAB B6N
## 11 FR51 2012 16800 FR51 FRG0 Pays de la Loire PPCS_HAB B6N
## 12 FR52 2012 16900 FR52 FRH0 Bretagne PPCS_HAB B6N
## 13 FR53 2012 17000 FR53 FRI3 Poitou-Charentes PPCS_HAB B6N
## 14 FR61 2012 17000 FR61 FRI1 Aquitaine PPCS_HAB B6N
## 15 FR62 2012 17000 FR62 FRJ2 Midi-Pyrénées PPCS_HAB B6N
## 16 FR63 2012 17100 FR63 FRI2 Limousin PPCS_HAB B6N
## 17 FR71 2012 17800 FR71 FRK2 Rhône-Alpes PPCS_HAB B6N
## 18 FR72 2012 17700 FR72 FRK1 Auvergne PPCS_HAB B6N
## 19 FR81 2012 15800 FR81 FRJ1 Languedoc-Roussillon PPCS_HAB B6N
## 20 FR82 2012 17400 FR82 FRL0 Provence-Alpes-Côte d’Azur PPCS_HAB B6N
## 21 FR83 2012 16400 FR83 FRM0 Corse PPCS_HAB B6N
## 22 FRA1 2012 13700 FRA1 FRY1 Guadeloupe PPCS_HAB B6N
## 23 FRA2 2012 13800 FRA2 FRY2 Martinique PPCS_HAB B6N
## 24 FRA3 2012 8700 FRA3 FRY3 Guyane PPCS_HAB B6N
## 25 FRA4 2012 13700 FRA4 FRY4 La Réunion PPCS_HAB B6N
## 26 FRA5 2012 4600 FRA5 FRY5 Mayotte PPCS_HAB B6N
## 27 HU10 2012 10300 HU10 <NA> Közép-Magyarország PPCS_HAB B6N
## 28 IE01 2012 13400 IE01 <NA> Border, Midland and Western PPCS_HAB B6N
## 29 IE02 2012 15100 IE02 <NA> Southern and Eastern PPCS_HAB B6N
## 30 LT00 2012 10700 LT00 <NA> Lietuva PPCS_HAB B6N
## 31 PL11 2012 10900 PL11 PL71 Łódzkie PPCS_HAB B6N
## 32 PL12 2012 12900 PL12 <NA> Mazowieckie PPCS_HAB B6N
## 33 PL31 2012 9300 PL31 PL81 Lubelskie PPCS_HAB B6N
## 34 PL32 2012 8500 PL32 PL82 Podkarpackie PPCS_HAB B6N
## 35 PL33 2012 9500 PL33 PL72 Świętokrzyskie PPCS_HAB B6N
## 36 PL34 2012 9000 PL34 PL84 Podlaskie PPCS_HAB B6N
## 37 UKM2 2012 17500 UKM2 UKM7 Eastern Scotland PPCS_HAB B6N
## 38 UKM3 2012 16000 UKM3 <NA> South Western Scotland PPCS_HAB B6N
## nuts_level change resolution
## 1 2 recoded FRF2=FR21
## 2 2 recoded FRE2=FR22
## 3 2 recoded FRD2=FR23
## 4 2 recoded and relabelled FRB0=FR24
## 5 2 recoded FRD1=FRFR25
## 6 2 recoded FRC1=FR26
## 7 2 recoded FRE1=FR30
## 8 2 recoded FRF3=FR41
## 9 2 recoded FRF1=FR42
## 10 2 recoded FRC2=FR43
## 11 2 recoded FRG0=FR51
## 12 2 recoded FRH0=FR52
## 13 2 recoded FRI3=FR53
## 14 2 recoded FRI1=FR61
## 15 2 recoded FRJ2=FR62
## 16 2 recoded FRI2=FR63
## 17 2 recoded FRK2=FR71
## 18 2 recoded FRK1=FR72
## 19 2 recoded FRJ1=FR81
## 20 2 recoded FRL0=FR82
## 21 2 recoded FRM0=FR83
## 22 2 recoded FRY1=FRA1
## 23 2 recoded FRY2=FRA2
## 24 2 recoded FRY3=FRA3
## 25 2 recoded FRY4=FRA4
## 26 2 recoded FRY5=FRA5
## 27 2 discontinued; split into new HU11 and HU12 <NA>
## 28 2 discontinued <NA>
## 29 2 discontinued <NA>
## 30 2 split into new LT01 and LT02 <NA>
## 31 2 recoded PL71=PL11
## 32 2 discontinued; split into new PL91 and PL92 <NA>
## 33 2 recoded PL81=PL31
## 34 2 recoded PL82=PL32
## 35 2 recoded PL72=PL33
## 36 2 recoded PL84=PL34
## 37 2 boundary shift; lost ex-UKM24 UKM7=UKM2-UKM24
## 38 2 discontinued; split into new UKM8 and UKM9 <NA>
## nuts_2016 nuts_2013
## 1 FALSE TRUE
## 2 FALSE TRUE
## 3 FALSE TRUE
## 4 FALSE TRUE
## 5 FALSE TRUE
## 6 FALSE TRUE
## 7 FALSE TRUE
## 8 FALSE TRUE
## 9 FALSE TRUE
## 10 FALSE TRUE
## 11 FALSE TRUE
## 12 FALSE TRUE
## 13 FALSE TRUE
## 14 FALSE TRUE
## 15 FALSE TRUE
## 16 FALSE TRUE
## 17 FALSE TRUE
## 18 FALSE TRUE
## 19 FALSE TRUE
## 20 FALSE TRUE
## 21 FALSE TRUE
## 22 FALSE TRUE
## 23 FALSE TRUE
## 24 FALSE TRUE
## 25 FALSE TRUE
## 26 FALSE TRUE
## 27 FALSE TRUE
## 28 FALSE TRUE
## 29 FALSE TRUE
## 30 FALSE TRUE
## 31 FALSE TRUE
## 32 FALSE TRUE
## 33 FALSE TRUE
## 34 FALSE TRUE
## 35 FALSE TRUE
## 36 FALSE TRUE
## 37 FALSE TRUE
## 38 FALSE TRUE
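The two logical flags can also be combined into a single label, which may be convenient for tabulating how many rows fall into each case. The following is a minimal sketch on a toy data frame; the four rows are copied by hand from the outputs in this article, and the `status` column and its wording are hypothetical additions, not something the helper functions return.

```r
# Toy data frame with the two logical flags described above.
flags <- data.frame(
  geo       = c("UKM5", "FR26", "UKM3", "NO01"),
  nuts_2013 = c(TRUE,  TRUE,  TRUE,  FALSE),
  nuts_2016 = c(TRUE,  FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: one readable label per flag combination.
flags$status <- with(flags, ifelse(
  nuts_2013 & nuts_2016, "valid in both definitions",
  ifelse(nuts_2013 & !nuts_2016, "NUTS2013 only",
  ifelse(!nuts_2013 & nuts_2016, "NUTS2016 only",
         "not covered by the correspondence table"))))

table(flags$status)
```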
The first logical step is to find those data points that are in fact identical and only their regional codes have changed. For example, `FRC1` is in fact identical to the region with the NUTS2013 label `FR26` (the Bourgogne region in France). In this case, you can simply re-label the regions that appear to be different only because of the different codes applied.
The helper function `harmonize_geo_code()` will assist you with these cases.
To make the example clearer, let’s zoom in on the changes in France. You can see that many regions changed, but some of them only changed labels. For forward compatibility, `harmonize_geo_code()` changed all geo labels to the current, `NUTS2016` definition. In fact, this is needed to use maps, for example.
# for readability the previous example is filtered and reduced
eurostat::tgs00026 %>%
filter ( time == 2012 ) %>%
harmonize_geo_code() %>%
filter ( grepl("FR", geo) ) %>%
select ( geo, code13, code16, change, values )
## In this data frame 238 observations are coded with the current NUTS2016
## geo labels and 276 observations/rows have NUTS2013 historical labels.
## Not checking for regional label consistency in non-EU countries.
## In this data frame not controlled countries: NO
## with altogether 7 observations/rows.
## geo code13 code16 change values
## 1 FR10 FR10 FR10 unchanged 21000
## 2 FR21 FR21 FRF2 recoded 16400
## 3 FR22 FR22 FRE2 recoded 16600
## 4 FR23 FR23 FRD2 recoded 17100
## 5 FR24 FR24 FRB0 recoded and relabelled 17500
## 6 FR25 FR25 FRD1 recoded 16900
## 7 FR26 FR26 FRC1 recoded 17400
## 8 FR30 FR30 FRE1 recoded 14900
## 9 FR41 FR41 FRF3 recoded 16300
## 10 FR42 FR42 FRF1 recoded 17100
## 11 FR43 FR43 FRC2 recoded 16800
## 12 FR51 FR51 FRG0 recoded 16800
## 13 FR52 FR52 FRH0 recoded 16900
## 14 FR53 FR53 FRI3 recoded 17000
## 15 FR61 FR61 FRI1 recoded 17000
## 16 FR62 FR62 FRJ2 recoded 17000
## 17 FR63 FR63 FRI2 recoded 17100
## 18 FR71 FR71 FRK2 recoded 17800
## 19 FR72 FR72 FRK1 recoded 17700
## 20 FR81 FR81 FRJ1 recoded 15800
## 21 FR82 FR82 FRL0 recoded 17400
## 22 FR83 FR83 FRM0 recoded 16400
## 23 FRA1 FRA1 FRY1 recoded 13700
## 24 FRA2 FRA2 FRY2 recoded 13800
## 25 FRA3 FRA3 FRY3 recoded 8700
## 26 FRA4 FRA4 FRY4 recoded 13700
## 27 FRA5 FRA5 FRY5 recoded 4600
In the change log, `recoded` means that the geo code was changed in the transition to NUTS2016, while `recoded and relabelled` means that not only the code, but also the official name of the region changed.
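For purely recoded regions, the re-labelling amounts to a lookup. The sketch below shows the idea in base R on a three-row excerpt of the French recoding listed above; it is not how `harmonize_geo_code()` is implemented, and the object names `lookup`, `dat` and `geo16` are hypothetical.

```r
# Small excerpt of the NUTS2013 -> NUTS2016 recoding shown above.
lookup <- c(FR24 = "FRB0", FR26 = "FRC1", FR43 = "FRC2")

# Values copied from the 2012 output above.
dat <- data.frame(
  geo    = c("FR10", "FR24", "FR26", "FR43"),
  values = c(21000, 17500, 17400, 16800),
  stringsAsFactors = FALSE
)

# Replace a code only when the lookup has an entry for it;
# unchanged codes such as FR10 are kept as they are.
recoded   <- unname(lookup[dat$geo])
dat$geo16 <- ifelse(is.na(recoded), dat$geo, recoded)
dat
```

Because the values are not touched, the same lookup can be reversed to go back to the NUTS2013 codes.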
You can decide which coding you prefer to use. Take care to use consistent map definitions if you visualize your work: data labelled according to NUTS2013 should be added to a map that contains the NUTS2013 boundary definitions.
For comparing with additional data sources, it may be useful to make sure that you use the current name of the region. The function `recode_to_nuts_2016()` changes the name column to the NUTS2016 definition, when applicable, and `recode_to_nuts_2013()` will use the earlier definition.
# for readability the previous example is filtered and reduced
eurostat::tgs00026 %>%
filter ( time == 2012 ) %>%
recode_to_nuts_2016() %>%
filter ( grepl("FR", geo) ) %>%
select ( geo, name, code16, change, resolution, values )
## In this data frame 238 observations are coded with the current NUTS2016
## geo labels and 276 observations/rows have NUTS2013 historical labels.
## Not checking for regional label consistency in non-EU countries.
## In this data frame not controlled countries: NO
## with altogether 7 observations/rows.
## Warning in recode_to_nuts_2016(.): The following regions have no NUTS2016
## labels: HU10, IE01, IE02, LT00, PL12, UKM3.
## geo name code16 change resolution
## 1 FR10 Ile-de-France FR10 unchanged <NA>
## 2 FRB0 Centre — Val de Loire FRB0 recoded and relabelled FRB0=FR24
## 3 FRC1 Bourgogne FRC1 recoded FRC1=FR26
## 4 FRC2 Franche-Comté FRC2 recoded FRC2=FR43
## 5 FRD1 Basse-Normandie FRD1 recoded FRD1=FRFR25
## 6 FRD2 Haute-Normandie FRD2 recoded FRD2=FR23
## 7 FRE1 Nord-Pas de Calais FRE1 recoded FRE1=FR30
## 8 FRE2 Picardie FRE2 recoded FRE2=FR22
## 9 FRF1 Alsace FRF1 recoded FRF1=FR42
## 10 FRF2 Champagne-Ardenne FRF2 recoded FRF2=FR21
## 11 FRF3 Lorraine FRF3 recoded FRF3=FR41
## 12 FRG0 Pays de la Loire FRG0 recoded FRG0=FR51
## 13 FRH0 Bretagne FRH0 recoded FRH0=FR52
## 14 FRI1 Aquitaine FRI1 recoded FRI1=FR61
## 15 FRI2 Limousin FRI2 recoded FRI2=FR63
## 16 FRI3 Poitou-Charentes FRI3 recoded FRI3=FR53
## 17 FRJ1 Languedoc-Roussillon FRJ1 recoded FRJ1=FR81
## 18 FRJ2 Midi-Pyrénées FRJ2 recoded FRJ2=FR62
## 19 FRK1 Auvergne FRK1 recoded FRK1=FR72
## 20 FRK2 Rhône-Alpes FRK2 recoded FRK2=FR71
## 21 FRL0 Provence-Alpes-Côte d’Azur FRL0 recoded FRL0=FR82
## 22 FRM0 Corse FRM0 recoded FRM0=FR83
## 23 FRY1 Guadeloupe FRY1 recoded FRY1=FRA1
## 24 FRY2 Martinique FRY2 recoded FRY2=FRA2
## 25 FRY3 Guyane FRY3 recoded FRY3=FRA3
## 26 FRY4 La Réunion FRY4 recoded FRY4=FRA4
## 27 FRY5 Mayotte FRY5 recoded FRY5=FRA5
## values
## 1 21000
## 2 17500
## 3 17400
## 4 16800
## 5 16900
## 6 17100
## 7 14900
## 8 16600
## 9 17100
## 10 16400
## 11 16300
## 12 16800
## 13 16900
## 14 17000
## 15 17100
## 16 17000
## 17 15800
## 18 17000
## 19 17700
## 20 17800
## 21 17400
## 22 16400
## 23 13700
## 24 13800
## 25 8700
## 26 13700
## 27 4600
Another useful filter is `change == "not in EU - not controlled"`. The regional definitions of non-EU countries (and their possible changes) are not covered in the Eurostat correspondence table.
# for readability the previous example is filtered and reduced
eurostat::tgs00026 %>%
filter ( time == 2012 ) %>%
recode_to_nuts_2016() %>%
filter ( ! nuts_2013, ! nuts_2016 )
## In this data frame 238 observations are coded with the current NUTS2016
## geo labels and 276 observations/rows have NUTS2013 historical labels.
## Not checking for regional label consistency in non-EU countries.
## In this data frame not controlled countries: NO
## with altogether 7 observations/rows.
## Warning in recode_to_nuts_2016(.): The following regions have no NUTS2016
## labels: HU10, IE01, IE02, LT00, PL12, UKM3.
## geo time values code13 code16 unit na_item nuts_level
## 1 NO01 2012 20600 <NA> <NA> PPCS_HAB B6N 2
## 2 NO02 2012 17500 <NA> <NA> PPCS_HAB B6N 2
## 3 NO03 2012 18000 <NA> <NA> PPCS_HAB B6N 2
## 4 NO04 2012 19000 <NA> <NA> PPCS_HAB B6N 2
## 5 NO05 2012 18700 <NA> <NA> PPCS_HAB B6N 2
## 6 NO06 2012 18200 <NA> <NA> PPCS_HAB B6N 2
## 7 NO07 2012 18100 <NA> <NA> PPCS_HAB B6N 2
## change resolution nuts_2016
## 1 not in EU - not controlled check with national authorities FALSE
## 2 not in EU - not controlled check with national authorities FALSE
## 3 not in EU - not controlled check with national authorities FALSE
## 4 not in EU - not controlled check with national authorities FALSE
## 5 not in EU - not controlled check with national authorities FALSE
## 6 not in EU - not controlled check with national authorities FALSE
## 7 not in EU - not controlled check with national authorities FALSE
## nuts_2013 name
## 1 FALSE <NA>
## 2 FALSE <NA>
## 3 FALSE <NA>
## 4 FALSE <NA>
## 5 FALSE <NA>
## 6 FALSE <NA>
## 7 FALSE <NA>
You may need to review these manually, and if you have a problem with the boundaries, refer to the national statistical authorities of these non-EU countries.
Eurostat released an untidy Excel document that contains all boundary changes from the `NUTS2013` to the `NUTS2016` boundary definition. You can load tidy versions of these tables into your global environment with `data("nuts_correspondence")` and `data("regional_changes_2016")`, or simply reference them as `eurostat::nuts_correspondence` and `eurostat::regional_changes_2016`. (The `eurostat::` part can be omitted if you have called `library(eurostat)` earlier in your code.)
Because NUTS3 level data is very scarce, we did not create a programmatic solution for filling in the new boundaries of NUTS2 regions.
However, using this correspondence information, many NUTS1 regions can be filled in with historical data by simple equivalence or addition, provided that NUTS2 data is present in the dataset.
## # A tibble: 28 x 2
## code16 resolution
## <chr> <chr>
## 1 FRB FRB=FR24
## 2 FRC FRC=FR26+FR43
## 3 FRD FRD=FR23+FR25
## 4 FRE FRE=FR22+FR30
## 5 FRF FRF=FR21+FR41+FR42
## 6 FRG FRG=FR51
## 7 FRH FRH=FR52
## 8 FRI FRI=FR53+FR61+FR63
## 9 FRJ FRJ=FR62+FR81
## 10 FRK FRK=FR7
## # … with 18 more rows
For example, the new NUTS1 region `FRB` is simply the continuation of the earlier NUTS2 region `FR24`. Similarly, the new NUTS1 region `FRC` can be filled with historical data by simply adding the `FR26` and `FR43` NUTS2 observations.
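Backfilling by addition can be sketched as follows. The `resolution` string is one row of the correspondence information above, while the `nuts2` data frame and its values are hypothetical and assume an additive indicator. The parsing only handles formulas made of `=` and `+`; a resolution with a subtraction, such as `UKM7=UKM2-UKM24`, would need separate treatment.

```r
# One row of the correspondence information shown above.
resolution <- "FRC=FR26+FR43"

# Hypothetical NUTS2 observations of an additive indicator
# (the values are made up for illustration).
nuts2 <- data.frame(
  geo    = c("FR24", "FR26", "FR43"),
  values = c(2570, 1640, 1180),
  stringsAsFactors = FALSE
)

# Split "FRC=FR26+FR43" into the new code and its old components.
sides      <- strsplit(resolution, "=", fixed = TRUE)[[1]]
new_code   <- sides[1]
components <- strsplit(sides[2], "+", fixed = TRUE)[[1]]

# Backfill only when every component is present in the data.
stopifnot(all(components %in% nuts2$geo))
backfilled <- data.frame(
  geo    = new_code,
  values = sum(nuts2$values[nuts2$geo %in% components]),
  stringsAsFactors = FALSE
)
backfilled
```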
When applying the latest boundaries (and visualizing according to current boundaries) is not important, it may be easier to use the correspondence information to backfill the new, NUTS2016 data into the NUTS2013 boundaries. This may also leave you with a larger panel of data, simply because more data is available following the earlier definition.
There are many imputation methodologies implemented in various R libraries (see the CRAN Task View: Missing Data). Be aware that most of these methods are not satisfactory in regional datasets. Whenever missingness is caused by boundary changes, it will certainly violate the conditions of many imputation methods. For example, many imputation strategies work when missingness is random. Therefore, it is very important that you first align the boundaries, and only then apply imputation.
Consider the following very simple, hypothetical example:
## # A tibble: 4 x 4
## regions Y2014 Y2015 Y2016
## <chr> <dbl> <dbl> <dbl>
## 1 A02 - from 2015 in D1 greater region 1 NA NA
## 2 B01 - from 2015 in D1 greater region 2 NA NA
## 3 C1 10 NA 10
## 4 D1 - from 2015 A02+B01 NA NA 5
How would you interpolate the missing 2015 data? In the case of region `C1`, there are no boundary changes, and the data seems constant. You would interpolate the value to be 10.
However, in the case of the new `D1` region, we first reconstruct it as the sum of its smaller regions, `A02` + `B01`, for which we have historical data. If `D1` had been defined as a region in 2014, its value would have been 1 + 2 = 3. So the correct interpolation for 2015 is 4, the midpoint between 3 (2014) and 5 (2016).
## # A tibble: 4 x 4
## regions Y2014 Y2015 Y2016
## <chr> <dbl> <dbl> <dbl>
## 1 A02 - from 2015 in D1 greater region 1 NA NA
## 2 B01 - from 2015 in D1 greater region 2 NA NA
## 3 C1 - 2015: intrapolated 10 10 10
## 4 D1 - 2014: A02+B01 3 4 5
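The two steps, aligning the boundaries first and interpolating afterwards, can be sketched in base R. This is a minimal illustration of the hypothetical example above, using plain vectors and linear interpolation with `stats::approx()`; the object names are assumptions, and other interpolation methods could be swapped in at the second step.

```r
# The hypothetical panel from the example above, as plain vectors.
years <- c(2014, 2015, 2016)
a02   <- c(1, NA, NA)
b01   <- c(2, NA, NA)
d1    <- c(NA, NA, 5)

# Step 1: align boundaries. Where D1 is missing but both of its
# constituent regions are observed, D1 is their sum.
fill     <- is.na(d1) & !is.na(a02) & !is.na(b01)
d1[fill] <- a02[fill] + b01[fill]

# Step 2: only now interpolate the remaining gap linearly.
observed  <- !is.na(d1)
d1_filled <- approx(x = years[observed], y = d1[observed], xout = years)$y
d1_filled
```

Interpolating `d1` before the first step would have nothing to work with, because only a single observation (2016) exists for `D1` in the raw data.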
You may still wonder if you should use the old boundary definitions instead, because they offered a higher resolution of data: the statistics of `D1` were detailed by its constituent subregions, `A02` and `B01`.
## regions Y2014 Y2015 Y2016
## 1 A02 - extrapolated with D1 data 1 1.5 2
## 2 B01 - extrapolated with D1 data 2 2.5 3
## 3 C1 - 2015: intrapolated 10 10.0 10
## 4 D1 - 2014: A02+B01 3 4.0 5
There are a few things to keep in mind when you actually start analysing the data.
If you fill up your data set according to both the old and the new boundary definitions, your dataset appears to be bigger, but it does not contain more information. Keeping `A02` and `B01` together with `D1` in your panel duplicates the new `D1` region, which was formerly known as `A02` and `B01`. If you measure growth, you will overestimate average growth, because the high-growth region is duplicated in the dataset. You must remove either `A02` and `B01`, or `D1` from your panel, otherwise you will skew the effect that you analyse towards `D1`.
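The effect of the duplication can be checked on the filled-in hypothetical table above. The sketch below compares the unweighted average growth between 2014 and 2016 for three versions of the panel; the object names are assumptions made for this illustration.

```r
# Filled-in hypothetical panel from the example above.
panel <- data.frame(
  region = c("A02", "B01", "C1", "D1"),
  Y2014  = c(1, 2, 10, 3),
  Y2016  = c(2, 3, 10, 5),
  stringsAsFactors = FALSE
)
panel$growth <- panel$Y2016 / panel$Y2014 - 1

# Average growth with the duplicated territory, and with each
# consistent boundary definition on its own.
mean_all <- mean(panel$growth)
mean_new <- mean(panel$growth[panel$region %in% c("C1", "D1")])
mean_old <- mean(panel$growth[panel$region %in% c("A02", "B01", "C1")])

round(c(all_rows = mean_all,
        new_boundaries = mean_new,
        old_boundaries = mean_old), 3)
```

In this toy panel the average computed over all four rows is higher than under either consistent definition, because the growing territory is counted twice while the stagnant `C1` is counted once.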
The use of the old boundaries makes sense if you have more data in the old definition prior to 2014. In this case, your dataset will contain fewer estimated values if you stick to the historical boundaries, extrapolate the discontinued `A02` and `B01` regions, and leave `D1` out of your models.
The use of the new boundaries is useful when you have more data after 2016. In this case, the switch to a lower geographical resolution (merging `A02` and `B01` into `D1`) is balanced by the fact that you have more recent and more factual data about the less detailed `D1` observation. Backfilling the `D1` data via reverse extrapolation is then the better strategy, and you should leave `A02` and `B01` out of your further analysis.
There are problems with Eurostat’s data products on two levels: with the data and with the metadata.
The data problems affect the work of the national statistical authorities, because they are responsible for the creation, validation and, when necessary, the later correction of the data. Eurostat cannot change the data they submit; however, it can change the harmonization methodology and guidelines and, when necessary, initiate changes in statistical regulation. I think that updating the guidelines, and possibly even the regulation, would not be controversial if member states were asked to provide the history of their statistics in the cases when the content of the data did not change, only its metadata, i.e. the labelling. If a member state changed the boundaries of a region, it may or may not be possible to re-calculate the data for this region. However, when only the name and the short code changed, the data points are there, and they should be included in the data products.
Regarding metadata, Eurostat could improve its products without the involvement of member states. The current problem with the metadata of the regional statistics is that they are not tidy and not fully consistent. The variable column ‘geo’ in the statistical products in fact contains at least four different pieces of information: the level of aggregation, the label in the NUTS2013 definition, the label in the NUTS2016 definition and, at least in the case of Greece and Slovenia, sometimes the label in the NUTS2010 definition, too. Depending on what view you take on the contents of the table, this means that a seemingly single data table is in fact an unlabelled join of four tables: a national data table and three regional data tables following different regional boundaries.
Adding the NUTS (or NUTS-equivalent non-EU) level would already remove a lot of confusion and several metadata errors. The source of the confusion is that many products claim to contain NUTS2 information, but they contain a mixture of NUTS0, NUTS1 and NUTS3 information. While the geo column can be easily filtered (by the number of characters of the geo code), this is not known to all users. Adding the `nuts_level` variable in our case makes joining various data sources far easier and less confusing.
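The character-count rule mentioned above can be written down in one line: a NUTS code starts with a two-character country code, and each further character adds one level. The sketch below is only an illustration on hand-picked codes; the vector name `geo` mirrors the column name, and aggregates such as `EU28` are assumed to have been removed beforehand, because they would be misclassified by this rule.

```r
# Hand-picked geo codes at different levels of aggregation.
geo <- c("FR", "FR1", "FR10", "FR101", "UKM5")

# Two characters for the country, one more for each NUTS level.
nuts_level <- nchar(geo) - 2

data.frame(geo, nuts_level)
```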
Several ways could be found to add the information currently contained in the (otherwise not tidy) Correspondence Table to each regional product. This would require recording which NUTS definition each row (observation) in the dataset complies with, which could be done in several ways from a data presentation and organization point of view. At a minimum, the NUTS definition (vocabulary) in which the NUTS unit can be found should be added, and potentially, as our helper functions do, further information about conversion.
A solution to the metadata presentation of the regional statistical products does not require the modification of statistical regulations (which must be adopted by the member states of the EU). It is also very urgent, because the next NUTS changes have already been announced, and if NUTS2021 is implemented in the same way, the usability of the data tables will decrease even more, as more joining errors will occur in mapping or modelling use.
Finally, a non-controversial change, which may require updating guidelines or regulations, would be to add non-EU countries to the Correspondence tables, at least on a non-mandatory basis. It is very unlikely that EEA countries like Norway or potential candidate countries like North Macedonia would object to reporting their regional boundary changes to the Correspondence tables. This is a self-evident change, which is also necessary after Brexit, given that the United Kingdom’s boundary data will have to remain in the Correspondence tables.
This tutorial was created with
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.10
##
## Matrix products: default
## BLAS: /home/lemila/bin/R-4.0.3/lib/libRblas.so
## LAPACK: /home/lemila/bin/R-4.0.3/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tibble_3.0.6 dplyr_1.0.3 eurostat_3.7.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.20 purrr_0.3.4 sf_0.9-7
## [5] lattice_0.20-41 vctrs_0.3.6 generics_0.1.0 htmltools_0.5.1.1
## [9] yaml_2.2.1 utf8_1.1.4 rlang_0.4.10 e1071_1.7-4
## [13] pkgdown_1.6.1 pillar_1.4.7 glue_1.4.2 DBI_1.1.1
## [17] sp_1.4-5 RColorBrewer_1.1-2 lifecycle_0.2.0 plyr_1.8.6
## [21] stringr_1.4.0 ragg_0.4.1 memoise_2.0.0 evaluate_0.14
## [25] knitr_1.31 fastmap_1.1.0 curl_4.3 class_7.3-18
## [29] fansi_0.4.2 broom_0.7.4 Rcpp_1.0.6 KernSmooth_2.23-18
## [33] readr_1.4.0 backports_1.2.1 classInt_0.4-3 cachem_1.0.1
## [37] desc_1.2.0 jsonlite_1.7.2 countrycode_1.2.0 systemfonts_0.3.2
## [41] fs_1.5.0 textshaping_0.2.1 hms_1.0.0 digest_0.6.27
## [45] stringi_1.5.3 rprojroot_2.0.2 grid_4.0.3 cli_2.2.0
## [49] tools_4.0.3 magrittr_2.0.1 RefManageR_1.3.0 crayon_1.3.4
## [53] tidyr_1.1.2 pkgconfig_2.0.3 ellipsis_0.3.1 xml2_1.3.2
## [57] lubridate_1.7.9.2 assertthat_0.2.1 rmarkdown_2.6.4 httr_1.4.2
## [61] R6_2.5.0 units_0.6-7 compiler_4.0.3