Skip to contents

Introduction

The purpose of the refine_metadata() function is to:

  • Ensure completeness by filling in missing values with placeholder text.
  • Standardize key metadata fields for easier analysis.
  • Select only the most relevant fields, simplifying the dataset.

This refinement process makes the metadata more consistent and user-friendly, reducing potential issues in subsequent analysis or reporting.

library(finna)
sibelius_data <- search_finna("sibelius")
refined_data <- refine_metadata(sibelius_data)
print(refined_data)
## # A tibble: 100 × 8
##    Title                   Author Year  Language Formats Subjects Library Series
##    <chr>                   <chr>  <chr> <chr>    <chr>   <chr>    <chr>   <chr> 
##  1 Sibelius favourites : … Sibel… 2001  Unknown… Äänite… orkeste… Lapin … Unkno…
##  2 SIBELIUS                TAWAS… 1997  fin      Kirja,… SIBELIUS Anders… Unkno…
##  3 Sibelius                Tawas… 1997  fin      Kirja,… Sibeliu… Anders… Unkno…
##  4 Sibelius                Lampi… 1984  fin      Kirja,… Sibeliu… Helka-… Unkno…
##  5 Sibelius                Tawas… 2003  fin      Kirja,… Sibeliu… Kansal… Unkno…
##  6 Sibelius                Ringb… 1948  fin      Kirja,… Sibeliu… Kirkes… Unkno…
##  7 Sibelius                Downe… 1945  fin      Kirja,… Sibeliu… OUTI-k… Unkno…
##  8 SIBELIUS                Lampi… 1995  fin      Kirja,… Sibeliu… Vaasan… Unkno…
##  9 Sibelius                Tawas… 2003  fin      Kirja,… Sibeliu… Vaasan… Unkno…
## 10 Sibelius                Tawas… 1968  swe      Kirja,… Sibeliu… Helle-… Unkno…
## # ℹ 90 more rows

integrate to other metadata

To integrate two datasets using full_join() from dplyr, you can write the code directly like this:

library(dplyr)

# Example Finna metadata (metadata1)
finna_data <- search_finna("sibelius",limit = 4)

# Example other dataset to merge with (metadata2)
other_data <- tibble::tibble(
  Title = c("Sibelius Symphony No. 5", "Finlandia", "Valse Triste"),
  Rating = c(5, 4, 3)
)

# Integrate the two datasets using full_join by the "Title" column
integrated_data <- full_join(finna_data, other_data, by = "Title")

# Print the integrated dataset
print(integrated_data)
## # A tibble: 7 × 10
##   id          Title Author Year  Language Formats Subjects Library Series Rating
##   <chr>       <chr> <chr>  <chr> <chr>    <chr>   <chr>    <chr>   <chr>   <dbl>
## 1 lapinkirja… Sibe… Sibel… 2001  NA       Äänite… orkeste… Lapin … NA         NA
## 2 anders.429… SIBE… TAWAS… 1997  fin      Kirja,… SIBELIUS Anders… NA         NA
## 3 anders.149… Sibe… Tawas… 1997  fin      Kirja,… Sibeliu… Anders… NA         NA
## 4 helka.9916… Sibe… Lampi… 1984  fin      Kirja,… Sibeliu… Helka-… NA         NA
## 5 NA          Sibe… NA     NA    NA       NA      NA       NA      NA          5
## 6 NA          Finl… NA     NA    NA       NA      NA       NA      NA          4
## 7 NA          Vals… NA     NA    NA       NA      NA       NA      NA          3

Analyze using analyze_metadata() Function

sibelius_data <- search_finna("sibelius")
refined_data <- refine_metadata(sibelius_data)
analysis_results <- analyze_metadata(refined_data)
print(analysis_results)
## $format_distribution
## # A tibble: 11 × 2
##    Formats                            n
##    <chr>                          <int>
##  1 Kirja, Kirja                      55
##  2 Lehti/Artikkeli, Artikkeli        19
##  3 Äänite, CD                        11
##  4 Video, DVD                         4
##  5 Äänite, Äänilevy                   4
##  6 Video, Elokuva, lyhyt              2
##  7 Arkisto/Kokoelma, Arkistosarja     1
##  8 Taideteos, Taideteos               1
##  9 Taideteos, Veistos                 1
## 10 Video, Elokuva, pitkä              1
## 11 Äänite, Musiikkitallenne           1
## 
## $year_distribution
## # A tibble: 38 × 2
##    Year             n
##    <chr>        <int>
##  1 1997             9
##  2 1948             8
##  3 1999             8
##  4 2003             7
##  5 1945             6
##  6 1998             6
##  7 1968             5
##  8 Unknown Year     4
##  9 1949             3
## 10 1984             3
## # ℹ 28 more rows
## 
## $author_distribution
## # A tibble: 51 × 2
##    Author                                          n
##    <chr>                                       <int>
##  1 Häyrynen, Antti                                12
##  2 Sibelius, Jean                                 10
##  3 Layton, Robert                                  5
##  4 Downes, Olin, Sjöblom, Paul, Jalas, Jussi       4
##  5 Ringbom, Nils-Eric                              4
##  6 Pickenhayn, Jorge Oscar                         3
##  7 Schouwman, Hans                                 3
##  8 Tawaststjerna, Erik                             3
##  9 Tawaststjerna, Erik, Tawaststjerna, Erik T.     3
## 10 Valsta, Heikki                                  3
## # ℹ 41 more rows

1. Applying the visualize_year_distribution() Function

sibelius_data <- search_finna("sibelius")
refined_data <- refine_metadata(sibelius_data)
analysis_results <- analyze_metadata(refined_data)
visualize_year_distribution(analysis_results$year_distribution)

1.1 Line plot of yearly distribution

library(finna)
sibelius_data <- search_finna("sibelius")
refined_data <- refine_metadata(sibelius_data)
visualize_year_distribution_line(refined_data)
## Warning: There was 1 warning in `mutate()`.
##  In argument: `Year = as.numeric(Year)`.
## Caused by warning:
## ! NAs introduced by coercion

2. Applying the visualize_top_20_titles() Function

This function will visualize the top 20 titles from your dataset.

# Assuming you have a tibble with Finna metadata called `refined_data`
top_20_titles_plot <- visualize_top_20_titles(refined_data)

# To display the plot
print(top_20_titles_plot)

2.1 Visualize Heatmap of Titles by Year

library(finna)
sibelius_data <- search_finna("sibelius")
refined_data <- refine_metadata(sibelius_data)
visualize_title_year_heatmap(refined_data)

3. Applying the visualize_format_distribution() Function

This function visualizes the distribution of the records by format.

# Plot the format distribution
format_distribution_plot <- visualize_format_distribution(refined_data)

# To display the plot
print(format_distribution_plot)

### 3.1 Visualize Format Distribution as Pie Chart

library(finna)
sibelius_data <- search_finna("sibelius")
refined_data <- refine_metadata(sibelius_data)
visualize_format_distribution_pie(refined_data)

4. Applying the visualize_library_distribution() Function

This function shows the distribution of the records by library.

# Plot the library distribution
library_distribution_plot <- visualize_library_distribution(refined_data)

# To display the plot
print(library_distribution_plot)

### 4.1 Visualize Correlation Between Formats and Libraries

This function shows the distribution of the records by library.

library(finna)
sibelius_data <- search_finna("sibelius")
refined_data <- refine_metadata(sibelius_data)
visualize_format_library_correlation(refined_data)

5. Applying the visualize_author_distribution() Function

This function visualizes the distribution of the records by author.

# Plot the author distribution
author_distribution_plot <- visualize_author_distribution(refined_data)

# To display the plot
print(author_distribution_plot)

6. Applying the visualize_subject_distribution() Function

This function visualizes the distribution of the records by subject.

# Plot the subject distribution
subject_distribution_plot <- visualize_subject_distribution(refined_data)

# To display the plot
print(subject_distribution_plot)

### 6.1 Visualize Word Cloud of Titles or Subjects

This function visualizes the distribution of the records by subject.

music_data <- search_finna("music")
refined_data <- refine_metadata(music_data)
visualize_word_cloud(refined_data, "Title")
## Warning in tm_map.SimpleCorpus(corpus, content_transformer(tolower)):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(corpus, removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(corpus, removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(corpus, stripWhitespace): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(corpus, removeWords, c(finnish_stopwords, :
## transformation drops documents