
Accessing Spatial and Population Data from Statistics Finland OGC API
Markus Kainu
2025-04-29
Source: vignettes/geofi_statfi_ogc.Rmd
Introduction
The geofi package provides tools to access spatial data from Statistics
Finland’s OGC API, including administrative boundaries, population data
by administrative units, and population data by statistical grid cells.
This vignette demonstrates how to use the package’s core functions to:
- Retrieve Finnish administrative area polygons (e.g., municipalities, regions).
- Fetch population data linked to administrative units.
- Access population data for statistical grid cells.
Unlike some other spatial data APIs, no API key is required
to access Statistics Finland’s OGC API, making it straightforward to get
started. The package handles pagination, spatial filtering, and
coordinate reference system (CRS) transformations, delivering data as
sf objects compatible with the sf package for spatial analysis and
visualization.
Package Overview
The geofi package includes the following key functions for accessing
Statistics Finland data:
- ogc_get_statfi_area(): Retrieves administrative area polygons (e.g., municipalities, wellbeing areas) for specified years, scales, and tessellation types.
- ogc_get_statfi_area_pop(): Fetches administrative area polygons with associated population data, pivoted into a wide format.
- ogc_get_statfi_statistical_grid(): Retrieves population data for statistical grid cells at different resolutions (1km or 5km).
- fetch_ogc_api_statfi(): An internal function that handles low-level API requests and pagination (not typically called directly by users).
All functions return spatial data as sf objects, making it easy to
integrate with spatial analysis workflows in R.
Step 1: Retrieving Administrative Area Polygons
The ogc_get_statfi_area() function retrieves polygons for Finnish
administrative units, such as municipalities (kunta), wellbeing areas
(hyvinvointialue), or regions (maakunta). You can customize the output
with parameters like:
- year: The year of the boundaries (2020–2022).
- scale: Map resolution, 1:1,000,000 or 1:4,500,000 (the examples below pass 4500 for the latter).
- tessellation: Type of administrative unit (e.g., kunta, hyvinvointialue).
- crs: Coordinate reference system (EPSG:3067 or EPSG:4326).
- limit: Maximum number of features (or NULL for all).
- bbox: Bounding box for spatial filtering.
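As a quick illustration of how these parameters combine, the sketch below requests only the first ten municipalities in EPSG:4326. It uses only the arguments listed above. The object name munis_sample is arbitrary, and the NULL check relies on the behaviour described under Error Handling later in this vignette. It needs an internet connection.

```r
library(geofi)

# Request a small sample: 10 municipalities, delivered in WGS84 (EPSG:4326)
munis_sample <- ogc_get_statfi_area(
  year = 2022,
  scale = 4500,
  tessellation = "kunta",
  crs = 4326,
  limit = 10
)

# The function may return NULL (with a warning) if no data is retrieved
if (!is.null(munis_sample)) {
  print(nrow(munis_sample))        # number of features returned
  print(sf::st_crs(munis_sample))  # CRS of the returned sf object
}
```

Starting with a small limit like this is a cheap way to check the column layout before downloading every feature.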
Example: Downloading Municipalities
Fetch all municipalities for 2022 at the 1:4,500,000 scale:
library(geofi)

munis <- ogc_get_statfi_area(year = 2022, scale = 4500, tessellation = "kunta")
print(munis)
Visualize the municipalities using ggplot2:
library(ggplot2)

ggplot(munis) +
  geom_sf() +
  theme_minimal() +
  labs(title = "Finnish Municipalities (2022)")
Example: Spatial Filtering with a Bounding Box
To retrieve municipalities within a specific area (e.g., southern Finland), use the bbox parameter. The bounding box is a comma-separated string of minimum x, minimum y, maximum x, and maximum y, and its coordinates should match the specified crs.
bbox <- "200000,6600000,500000,6900000" # In EPSG:3067
munis_south <- ogc_get_statfi_area(
  year = 2022,
  scale = 4500,
  tessellation = "kunta",
  bbox = bbox,
  crs = 3067
)
Visualize the filtered results:
ggplot(munis_south) +
  geom_sf() +
  theme_minimal() +
  labs(title = "Municipalities in Southern Finland (2022)")
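When the bounding box is built from numbers rather than typed by hand, note that R may render round values in scientific notation (for example, as.character(200000) gives "2e+05"), which would not match the plain coordinate string used above. The following is a minimal sketch of a formatting helper. The name make_bbox_string is hypothetical and not part of geofi.

```r
# Hypothetical helper (not part of geofi): build a bbox string in the order
# xmin, ymin, xmax, ymax, avoiding scientific notation such as "2e+05".
make_bbox_string <- function(xmin, ymin, xmax, ymax) {
  coords <- c(xmin, ymin, xmax, ymax)
  stopifnot(is.numeric(coords), !anyNA(coords), xmin < xmax, ymin < ymax)
  paste(
    format(coords, scientific = FALSE, trim = TRUE, digits = 15),
    collapse = ","
  )
}

bbox_south <- make_bbox_string(200000, 6600000, 500000, 6900000)
print(bbox_south)
```

The resulting string can be passed as the bbox argument in the same way as the hand-written one.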
Example: Fetching Wellbeing Areas
Retrieve wellbeing areas (hyvinvointialue) for 2022:
wellbeing <- ogc_get_statfi_area(
  year = 2022,
  tessellation = "hyvinvointialue",
  scale = 4500
)
Step 2: Retrieving Population Data by Administrative Area
The ogc_get_statfi_area_pop() function fetches administrative area
polygons with associated population data, pivoted into a wide format
where each population variable is a column. Parameters include:
- year: The year of the data (2019–2021).
- crs: Coordinate reference system (EPSG:3067 or EPSG:4326).
- limit: Maximum number of features (or NULL for all).
- bbox: Bounding box for spatial filtering.
Example: Fetching Population Data
Retrieve population data for 2021:
pop_data <- ogc_get_statfi_area_pop(year = 2021, crs = 3067)
print(pop_data)
Visualize population counts (assuming a variable like population_total
exists; check the column names of pop_data first):
ggplot(pop_data) +
  geom_sf(aes(fill = population_total)) +
  scale_fill_viridis_c(option = "plasma") +
  theme_minimal() +
  labs(title = "Population by Administrative Area (2021)", fill = "Population")
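Because the plot above assumes a column name, it can help to list the numeric columns actually present before choosing one to map. The sketch below does this on a small sample. The object names pop_preview and numeric_cols are arbitrary, and the request needs an internet connection.

```r
library(geofi)

# Fetch a handful of features only, to inspect the wide-format columns
pop_preview <- ogc_get_statfi_area_pop(year = 2021, crs = 3067, limit = 5)

# The function may return NULL (with a warning) if no data is retrieved
if (!is.null(pop_preview)) {
  # Drop the geometry column so only attribute columns remain
  attrs <- sf::st_drop_geometry(pop_preview)
  # Keep the names of numeric columns, which are the candidates for 'fill'
  numeric_cols <- names(attrs)[vapply(attrs, is.numeric, logical(1))]
  print(numeric_cols)
}
```

Any of the printed names can then replace population_total in the aes() call.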
Example: Population Data with Bounding Box
Fetch population data within a bounding box:
bbox <- "200000,6600000,500000,6900000"
pop_south <- ogc_get_statfi_area_pop(year = 2021, bbox = bbox, crs = 3067)
Step 3: Retrieving Population Data by Statistical Grid
The ogc_get_statfi_statistical_grid() function retrieves population
data for statistical grid cells at 1km or 5km resolution. Data is
returned in EPSG:3067 (ETRS89 / TM35FIN). Parameters include:
- year: The year of the data (2019–2021).
- resolution: Grid cell size (1000m or 5000m).
- limit: Maximum number of features (or NULL for all).
- bbox: Bounding box for spatial filtering.
Example: Fetching 5km Grid Data
Retrieve population data for a 5km grid in 2021:
grid_data <- ogc_get_statfi_statistical_grid(year = 2021, resolution = 5000)
print(grid_data)
Visualize the grid data (again assuming a variable like population_total exists):
ggplot(grid_data) +
  geom_sf(aes(fill = population_total), color = NA) +
  scale_fill_viridis_c(option = "magma") +
  theme_minimal() +
  labs(title = "Population by 5km Grid Cells (2021)", fill = "Population")
Example: 1km Grid with Bounding Box
Fetch 1km grid data within a bounding box:
bbox <- "200000,6600000,500000,6900000"
grid_south <- ogc_get_statfi_statistical_grid(
  year = 2021,
  resolution = 1000,
  bbox = bbox
)
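Before sending a 1km request, a rough cell count for the bounding box gives a sense of how much data to expect. The sketch below simply divides the box's width and height by the cell size. The helper name estimate_grid_cells is hypothetical, and the number of features the API actually returns may differ from this estimate.

```r
# Hypothetical helper (not part of geofi): rough count of grid cells that a
# bbox string "xmin,ymin,xmax,ymax" (in metres, EPSG:3067) could contain.
estimate_grid_cells <- function(bbox, resolution) {
  coords <- as.numeric(strsplit(bbox, ",", fixed = TRUE)[[1]])
  stopifnot(length(coords) == 4, !anyNA(coords), resolution > 0)
  n_cols <- ceiling((coords[3] - coords[1]) / resolution)
  n_rows <- ceiling((coords[4] - coords[2]) / resolution)
  n_cols * n_rows
}

bbox <- "200000,6600000,500000,6900000"
cells_1km <- estimate_grid_cells(bbox, 1000)
cells_5km <- estimate_grid_cells(bbox, 5000)
print(c(cells_1km = cells_1km, cells_5km = cells_5km))
```

For this 300 km by 300 km box the estimate is 90,000 cells at 1km and 3,600 at 5km, so the 1km request is about 25 times larger.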
Advanced Features
Pagination
When limit = NULL, the fetch_ogc_api_statfi() function automatically
paginates through large datasets, fetching up to 10,000 features per
request. This ensures all available data is retrieved, even for large
administrative or grid datasets.
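To make the idea of pagination concrete, here is a generic offset-based loop run against a mock data source. This is an illustrative sketch only, not the actual implementation of fetch_ogc_api_statfi(). The names fetch_all_pages, mock_fetch, and mock_features are hypothetical, and the page size is reduced to 10 so the loop is easy to follow.

```r
# Hypothetical helper: 'fetch_page' is any function(offset, limit) that
# returns a data frame with at most 'limit' rows.
fetch_all_pages <- function(fetch_page, page_size = 10000) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch_page(offset = offset, limit = page_size)
    if (nrow(page) == 0) break                 # nothing left to fetch
    pages[[length(pages) + 1]] <- page
    if (nrow(page) < page_size) break          # short page means last page
    offset <- offset + nrow(page)
  }
  if (length(pages) == 0) return(NULL)
  do.call(rbind, pages)
}

# Mock "server" holding 25 features, standing in for a real API
mock_features <- data.frame(id = seq_len(25))
mock_fetch <- function(offset, limit) {
  idx <- seq_len(nrow(mock_features))
  keep <- idx > offset & idx <= offset + limit
  mock_features[keep, , drop = FALSE]
}

all_features <- fetch_all_pages(mock_fetch, page_size = 10)
print(nrow(all_features))
```

With 25 mock features and a page size of 10, the loop makes three requests (10, 10, and 5 rows) and stops on the short final page.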
Error Handling
The package includes robust error handling:
- Validates inputs (e.g., year, scale, tessellation, CRS, bounding box format).
- Provides informative error messages for API failures or invalid responses.
- Returns NULL with a warning if no data is retrieved, helping users diagnose issues.
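In scripts, one way to work with this behaviour is to wrap a request so that errors are turned into a message and a NULL result, and then to check for NULL before continuing. The wrapper below is a minimal sketch using base R's tryCatch(). The name safe_fetch is hypothetical and not part of geofi.

```r
# Hypothetical wrapper (not part of geofi): call 'fetch_fun' with the given
# arguments, returning NULL and printing a message if it raises an error.
safe_fetch <- function(fetch_fun, ...) {
  tryCatch(
    fetch_fun(...),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Demonstration with stand-in functions instead of a real API call
ok_result <- safe_fetch(function(x) x + 1, 1)
failed_result <- safe_fetch(function() stop("service unavailable"))

# Intended usage with geofi (requires the package and a connection):
# munis <- safe_fetch(ogc_get_statfi_area,
#                     year = 2022, scale = 4500, tessellation = "kunta")
# if (is.null(munis)) message("No data retrieved; check the arguments.")
```

Warnings are deliberately not intercepted here, so the package's own warning about missing data still reaches the user.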
Coordinate Reference Systems
The functions support two CRS options:
- EPSG:3067 (ETRS89 / TM35FIN): The default for Finnish spatial data, suitable for local analyses.
- EPSG:4326 (WGS84): Useful for global compatibility or web mapping.
Note that ogc_get_statfi_statistical_grid() is fixed to EPSG:3067, as
per the API’s design.
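If data delivered in EPSG:3067 is needed in EPSG:4326, it can be reprojected afterwards with sf::st_transform(). The sketch below shows this on a single illustrative point so that it runs without a download. The coordinates are an assumption chosen to fall roughly in the Helsinki area.

```r
library(sf)

# A single point in EPSG:3067 (coordinates chosen for illustration,
# roughly in the Helsinki area)
pt_tm35fin <- st_sfc(st_point(c(385000, 6672000)), crs = 3067)

# Reproject to WGS84 longitude/latitude
pt_wgs84 <- st_transform(pt_tm35fin, 4326)
print(st_coordinates(pt_wgs84))
```

The same call works on a whole sf object, for example st_transform(grid_data, 4326) for the grid data fetched in Step 3.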
Best Practices
- Test with Limits: For large datasets (e.g., 1km grids), start with a small limit or bbox to estimate runtime before fetching all features.
- CRS Selection: Use EPSG:3067 for Finnish data unless you need EPSG:4326 for compatibility with other systems.
- Check Tessellation Types: Verify valid tessellation options (kunta, hyvinvointialue, etc.) when using ogc_get_statfi_area().
- Inspect Output: Population data from ogc_get_statfi_area_pop() and ogc_get_statfi_statistical_grid() is pivoted into wide format. Check column names to identify available variables.
Additional Resources
- Statistics Finland Geoserver: Documentation for the OGC API.
- geofi GitHub Repository: Source code and issue tracker.
- sf Package Documentation: For working with sf objects.
- ggplot2 Documentation: For visualizing spatial data.
Conclusion
The geofi package simplifies access to Statistics Finland’s spatial and
population data, enabling analyses of administrative boundaries,
population distributions, and grid-based statistics. With no API key
required, users can quickly retrieve and visualize data using sf and
ggplot2. Try the examples above to explore Finland’s spatial and
demographic datasets!