Skip to contents

This sotkanet R package provides access to data from the Sotkanet portal. Your contributions and bug reports and other feedback are welcome.

Introduction

The Sotkanet portal provides over 2000 demographic indicators across Finland and Europe. It is maintained by the National Institute for Health and Welfare (THL). For more information, see Information about Sotkanet and API description.

The sotkanet R package enables access to the Sotkanet API using R facilitating the use of the data from the API. This package is part of rOpenGov.

Installation

To install latest release version from CRAN, use:

install.packages("sotkanet")

To install development version from GitHub, use:

library(remotes)
remotes::install_github("ropengov/sotkanet")

Test the installation by loading the package:

Usage

Listing availabe indicators

Load sotkanet and other packages used in the vignette.

List available Sotkanet indicators using sotkanet_indicators():

# Using a preset list of indicators to avoid a large download
indicators <- sotkanet_indicators(id = c(4, 5, 6, 127, 10012, 10027),
                                  type = "table", lang = "en")
kable(head(indicators))
indicator indicator.title indicator.organization indicator.organization.title
4 Hospital care for mental disorders, recipients aged 0-17 per 1000 persons of the same age 2 Finnish institute for Health and Welfare (THL)
5 Social assistance recipients aged 25-64, as % of total population of same age 2 Finnish institute for Health and Welfare (THL)
6 Specialised somatic inpatient health care, care days for those aged 75 and over per 1000 persons of same age 2 Finnish institute for Health and Welfare (THL)
127 Population at year end 3 Statistics Finland
10012 (EU) GDP per capita in Purchasing Power Standards (PPS) 58 Statistical Office of the European Communities (Eurostat)
10027 (EU) Standardised death rate due to suicides per 100 000 persons 58 Statistical Office of the European Communities (Eurostat)

List geographical regions with available indicators using sotkanet_regions():

# List of the first few regions
regions <- sotkanet_regions(type = "table", lang = "en")
kable(head(regions))
region region.title region.code region.category region.uri
833 Area for Southern Finland AVI 1 ALUEHALLINTOVIRASTO http://www.yso.fi/onto/kunnat/ahv1
834 Area for Southwestern Finland AVI 2 ALUEHALLINTOVIRASTO http://www.yso.fi/onto/kunnat/ahv2
835 Area for Eastern Finland AVI 3 ALUEHALLINTOVIRASTO http://www.yso.fi/onto/kunnat/ahv3
836 Area for Western and Inland Finland AVI 4 ALUEHALLINTOVIRASTO http://www.yso.fi/onto/kunnat/ahv4
837 Area for Northern Finland AVI 5 ALUEHALLINTOVIRASTO http://www.yso.fi/onto/kunnat/ahv5
838 Area for Lapland AVI 6 ALUEHALLINTOVIRASTO http://www.yso.fi/onto/kunnat/ahv6

Querying Sotkanet data

To download the data, we need to know the indicator for it. You can look for the right indicator using aforementioned sotkanet_indicators() or by browsing the Sotkanet website. For example, the indicator no. 10012 responds to the (EU) GPD per capita in Purchasing Power Standards (PPS) dataset. The data can be downloaded with get_sotkanet() function. If we want, for example, the GPD data from Finland for 2000-2010, the function call is:

# Get the indicator data
dat <- get_sotkanet(indicators = 10012, years = 2000:2010,
                    genders = c("total"), lang = "en", regions = "Finland")

# The first few lines of the data
kable(head(dat)) %>%
  kable_styling() %>%
  scroll_box(width = "100%")
indicator region year gender primary.value absolute.value indicator.title region.title region.code region.category indicator.organization.title
10012 1045 2005 total 114 NA (EU) GDP per capita in Purchasing Power Standards (PPS) Finland 246 POHJOISMAAT Statistical Office of the European Communities (Eurostat)
10012 1022 2010 total 116 NA (EU) GDP per capita in Purchasing Power Standards (PPS) Finland 246 EUROOPPA Statistical Office of the European Communities (Eurostat)
10012 1022 2002 total 115 NA (EU) GDP per capita in Purchasing Power Standards (PPS) Finland 246 EUROOPPA Statistical Office of the European Communities (Eurostat)
10012 1045 2006 total 114 NA (EU) GDP per capita in Purchasing Power Standards (PPS) Finland 246 POHJOISMAAT Statistical Office of the European Communities (Eurostat)
10012 1022 2001 total 115 NA (EU) GDP per capita in Purchasing Power Standards (PPS) Finland 246 EUROOPPA Statistical Office of the European Communities (Eurostat)
10012 1045 2003 total 112 NA (EU) GDP per capita in Purchasing Power Standards (PPS) Finland 246 POHJOISMAAT Statistical Office of the European Communities (Eurostat)

The data can also be downloaded by using interactive function sotkanet_interactive(). It gives user interactive alternative for downloading the dataset. This function can also print dataset citation, code for the get_sotkanet() call and fixity checksum.

Dataset citation can be printed for any indicator using the function sotkanet_cite(). The citation for the GPD data is:

sotkanet_cite(10012, lang = "en")
#> @Misc{,
#>   title = {(EU) GDP per capita in Purchasing Power Standards (PPS)},
#>   url = {https://sotkanet.fi/sotkanet/en/metadata/indicators/10012},
#>   organization = {Statistical Office of the European Communities (Eurostat)},
#>   year = {2017},
#>   urldate = {2024-07-15},
#>   type = {Dataset},
#>   note = {Accessed 2024-07-15, dataset last updated 2017-10-24},
#> }

Examples

Let’s now demonstrate the use of the package with two examples. For the first example we will use the GPD data from Nordic countries (Pohjoismaat) for 2000-2010 and draw a time series of the data comparing the countries.

# Get indicator data
dat <- get_sotkanet(indicators = 10012, years = 2000:2010,
                    genders = "total", lang = "en", region.category = "POHJOISMAAT")

indicator_name <- as.character(unique(dat$indicator.title))
indicator_source <- as.character(unique(dat$indicator.organization.title))

# Retrive metadata
dat_meta <- sotkanet_indicator_metadata(id = 10012)

# Visualize
library(ggplot2)
p <- ggplot(dat, aes(x = year, y = primary.value,
                     group = region.title, color = region.title)) + 
  geom_line() + ggtitle(paste0(indicator_name, " \n", indicator_source)) +
  labs(x = "Year", y = "Value",caption = paste0(
    "Data source: https://sotkanet.fi/sotkanet",  "\n", "Data date: ", dat_meta$`data-updated`)) +
  scale_x_continuous(breaks = seq(2000,2010, by = 1)) +
  theme(title = element_text(size = 10)) +
  theme(axis.title.x = element_text(size = 15)) +
  theme(axis.title.y = element_text(size = 15)) +
  theme(legend.title = element_text(size = 15))
print(p)

For the second example we will plot the population of Finnish municipalities against a measure of educational level.

# Get the data for the two indicators
dat <- get_sotkanet(indicators = c(127, 180), 
                    years = 2022, lang = "en",
                    genders = c("total"), region.category = c("KUNTA"))
# Pick the fields of interest and remove duplicates
datf <- dat[,c("region.title", "indicator.title", "primary.value")]
datf <- datf[!duplicated(datf),]
dw <- reshape(datf, idvar = "region.title",
              timevar = "indicator.title", direction = "wide")
names(dw) <- c("Municipality", "Population", "Education_level")

# Vizualise
p <- ggplot(dw, aes(x = log(Population), y = Education_level)) + geom_point(size = 3) +
  ggtitle("Education level vs. population size") +
    theme(title = element_text(size = 10)) +
  labs(y = "Education level", caption = "Data source: https://sotkanet.fi/sotkanet") +
  theme(axis.title.x = element_text(size = 15)) +
  theme(axis.title.y = element_text(size = 15)) +
  theme(legend.title = element_text(size = 15))
plot(p)

Licensing and Citations

Sotkanet data

Cite Sotkanet and link to https://sotkanet.fi/sotkanet/fi/index. Also mention indicator provider.

Central points:

  • SOTKAnet REST API is meant for non-regular data queries. Avoid regular and repeated downloads.
  • SOTKAnet API can be used as the basis for other systems
  • Metadata for regions and indicators are under CC-BY 4.0
  • THL indicators are under CC-BY 4.0
  • Indicators provided by third parties can be used only by separate agreement!

Sotkanet R package

This work can be freely used, modified and distributed under the Two-clause BSD license.

citation("sotkanet")
#> Kindly cite the sotkanet R package as follows:
#> 
#>   Leo Lahti, Einari Happonen, Juuso Parkkinen, Joona Lehtomaki, Vesa
#>   Saaristo, Aleksi Lahtinen and Pyry Kantanen (rOpenGov 2024).
#>   sotkanet: Sotkanet Open Data Access and Analysis. R package version
#>   0.10.1 https://github.com/rOpenGov/sotkanet
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Misc{,
#>     title = {sotkanet: Sotkanet Open Data Access and Analysis},
#>     author = {Leo Lahti and Einari Happonen and Joona Lehtomäki and Juuso Parkkinen and Joona Lehtomaki and Vesa Saaristo and Pyry Kantanen and Aleksi Lahtinen},
#>     url = {https://github.com/rOpenGov/sotkanet},
#>     year = {2024},
#>     note = {R package version 0.10.1},
#>   }
#> 
#> Many thanks for all contributors!

Suggestions and bug reports

You can check the package GitHub page for known issues. You can can also use it to report new bugs and to make suggestions for improving the package.

Session info

This vignette was created with

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.1    kableExtra_1.4.0 sotkanet_0.10.1 
#> 
#> loaded via a namespace (and not attached):
#>  [1] rappdirs_0.3.3     utf8_1.2.4         sass_0.4.9         generics_0.1.3    
#>  [5] xml2_1.3.6         stringi_1.8.4      hms_1.1.3          digest_0.6.36     
#>  [9] magrittr_2.0.3     grid_4.4.1         evaluate_0.24.0    timechange_0.3.0  
#> [13] fastmap_1.2.0      plyr_1.8.9         jsonlite_1.8.8     backports_1.5.0   
#> [17] httr_1.4.7         fansi_1.0.6        viridisLite_0.4.2  scales_1.3.0      
#> [21] httr2_1.0.1        bibtex_0.5.1       textshaping_0.4.0  jquerylib_0.1.4   
#> [25] cli_3.6.3          rlang_1.1.4        munsell_0.5.1      withr_3.0.0       
#> [29] cachem_1.1.0       yaml_2.3.9         tools_4.4.1        tzdb_0.4.0        
#> [33] dplyr_1.1.4        colorspace_2.1-0   frictionless_1.1.0 curl_5.2.1        
#> [37] vctrs_0.6.5        R6_2.5.1           lifecycle_1.0.4    lubridate_1.9.3   
#> [41] RefManageR_1.4.0   stringr_1.5.1      fs_1.6.4           htmlwidgets_1.6.4 
#> [45] ragg_1.3.2         pkgconfig_2.0.3    desc_1.4.3         gtable_0.3.5      
#> [49] pkgdown_2.1.0      bslib_0.7.0        pillar_1.9.0       glue_1.7.0        
#> [53] Rcpp_1.0.12        systemfonts_1.1.0  highr_0.11         tidyselect_1.2.1  
#> [57] xfun_0.45          tibble_3.2.1       rstudioapi_0.16.0  knitr_1.48        
#> [61] farver_2.1.2       htmltools_0.5.8.1  labeling_0.4.3     rmarkdown_2.27    
#> [65] svglite_2.1.3      readr_2.1.5        compiler_4.4.1