By

Digital humanities with R

Digital humanities is one of the fields where reproducible research is now becoming increasingly popular. We got an opportunity to highlight the latest advancement in this field in Paris Digital Humanities event, June 12 at Deutsches Historische Institut Paris.

The ‘digital humanities lab’ is an interactive experiment where the audience actively participates in designing the analysis of massive library catalogues. In particular, we focused on publishing activity in the early modern period by mining the British Library ESTC catalogue 1470-1800. To facilitate this interactive session, we prepared reproducible presentation slides with Rmarkdown. To be completed in the workshop, carrying out data analysis on-the-fly together with the audience. To reproduce the preliminary slides (we will complete them during the workshop!), clone the rOpenGov slide repository and run the following commands in R:

library(rmarkdown)
render("slides/20150612-Paris/20150611-Paris.Rmd")

Unfortunately the ESTC data itself is not public, so we could not share it. The slides and source code are fully reproducible, however, so you can modify the template for your own purposes and check out our summaries of the ESTC data collection at our estc site. You may also like to read the related excellent blog post of Douglas Duhaime on the same data set.

We used a combination of RStudio and the estc and bibliographica R packages that are designed for bibliographic data analysis. Combined with the vast analytical capabilities of the R statistical ecosystem, these custom tools for digital humanities provide a rapid development toolkit for reproducible research of historical document collections.