R made personal (at least for Swedes)!


– Who are you? asked Mr Doe.

– I’m a Hindu! Namrata from India replied.

– I’m a statistician! said Günther from Germany.

People of different nationalities tend to identify themselves using different characteristics. In India, your identity might rely on your religion, while in other countries your profession might take its place. In Sweden, you might identify yourself with your almost-world-known (!?) personal identification number (“pin”). This 10-digit number is given to you almost immediately after birth and often stays with you until your very last breath. The number is similar to a “social security number”, but it has a much broader use and is considered public. It is used in public registers (for education, work, tax payment, healthcare, car ownership etc.) and often serves as a membership number or customer ID within companies and member unions. It is also essential in, for example, the public health and quality registers that are maintained in Sweden (and other Scandinavian countries) and used for research.


Naturally, the “pin” is used extensively to distinguish individuals in data sets analysed with R. The number also helps to match data from different sources, and it brings some demographic background data into the bargain, such as birth date (age), sex and geographic origin (depending on your birth year).

Up until now, however, there has been no consistent R convention for handling “pins”. The number might be stored as a 10- or 12-digit numeric (with or without century prefix), as a character (with a hyphen or a ‘+’ sign separating the birth date from the suffix digits) or as a factor variable. But the pin is not a number (adding, subtracting or taking the logarithm of pins is just nonsense), and it contains more information than is captured by the individual characters in a string. Luckily, the new R package sweidnumbr (released on CRAN) is here to the rescue!
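To see why a plain numeric is a poor container for a pin, here is a small base-R illustration. The pin string is made up for the example, and nothing here uses sweidnumbr itself.

```r
# A made-up 10-digit pin for someone born on 1 January 2000 (YYMMDDNNNC)
pin_chr <- "0001012384"

# Storing it as a number silently drops the leading zeros ...
pin_num <- as.numeric(pin_chr)
pin_as_text <- format(pin_num, scientific = FALSE)
pin_as_text

# ... so the birth date can no longer be read from fixed character positions
nchar(pin_chr)      # 10 characters in the original
nchar(pin_as_text)  # only 7 characters after the round trip
```

A character vector avoids this particular problem, but it still knows nothing about what the digits mean.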


Let’s look at some data (all pins are fake; they have a valid syntax but do not identify any real individuals):

library(sweidnumbr)
tail(fake_pins, 10)

So far, pin is just a standard character vector but let’s change that to benefit from all of sweidnumbr’s features:

pin <- as.pin(fake_pins$pin)
str(pin)

We can now also investigate some demographic characteristics almost on the fly (note that pins contained geographical information only up to 1989):

par(mfrow = c(1,2))
hist(pin_age(pin), 20, col = "lightgreen", main = "Age distribution")
pie(table(pin_sex(pin)), main = "Sex distribution")
pin_birthplace(pin[1:8])
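For readers curious about where this information sits in the number, the following base-R sketch reads birth date and sex directly from 12-digit pin strings. The helper names `pin_birthdate_sketch()` and `pin_sex_sketch()` are hypothetical and are not part of sweidnumbr; the only rule assumed is that the second-to-last digit is odd for men and even for women. The example strings are illustrative only.

```r
# Hypothetical helpers, not the sweidnumbr implementation.
# Input: character vector of 12-digit pins in the form YYYYMMDDNNNC.

pin_birthdate_sketch <- function(pin) {
  # The first eight characters are the birth date
  as.Date(substr(pin, 1, 8), format = "%Y%m%d")
}

pin_sex_sketch <- function(pin) {
  # The 11th character is the second-to-last digit: odd = male, even = female
  digit <- as.integer(substr(pin, 11, 11))
  ifelse(digit %% 2 == 1, "Male", "Female")
}

example_pins <- c("191212121212", "191212121220")
pin_birthdate_sketch(example_pins)
pin_sex_sketch(example_pins)
```

In practice the package functions are the better choice, since they operate on the pin class and work on whole vectors of validated pins.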

Formats

as.pin can recognize pins in several different formats such as:

as.pin(c("191212121212", "1212121212", "121212-1212", "121212+1212"))
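As a rough idea of what such format handling involves, here is a base-R sketch that brings the four spellings above to a common 12-digit form. `normalise_pin_sketch()` is a hypothetical helper and not how sweidnumbr does it. It assumes that a 10-digit pin with a hyphen (or no separator) belongs to someone who turns less than 100 in the reference year, and that a ‘+’ marks someone who turns 100 or more.

```r
# Hypothetical helper, not the sweidnumbr implementation.
normalise_pin_sketch <- function(pin, today = Sys.Date()) {
  # Keep digits only; this drops "-" and "+"
  digits <- gsub("[^0-9]", "", pin)
  short <- nchar(digits) == 10

  # Assumption: pick the latest year ending in YY that is not after this year
  this_year <- as.integer(format(today, "%Y"))
  yy <- as.integer(substr(digits, 1, 2))
  year <- this_year - ((this_year - yy) %% 100)

  # Assumption: "+" pushes the birth year back one century
  year <- year - ifelse(grepl("+", pin, fixed = TRUE), 100, 0)

  # 12-digit input already carries its century and is returned unchanged
  ifelse(short, paste0(year, substr(digits, 3, 10)), digits)
}

normalise_pin_sketch(
  c("191212121212", "1212121212", "121212-1212", "121212+1212"),
  today = as.Date("2015-06-01")
)
```

The sketch makes the ambiguity visible: without a century prefix or a ‘+’, the same ten digits can only be resolved relative to a reference date.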

It also checks that the numbers follow the correct pin syntax:

as.pin("181212121212") # Pins were introduced in 1946 and only for people not deceased before that
pin_ctrl("191212121211") # The last digit is a control number that is checked against preceding digits
luhn_algo("191212121211") # The correct control number can be calculated by the Luhn algorithm
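The control digit calculation is simple enough to sketch in base R. The helper `luhn_digit_sketch()` below is hypothetical and separate from the package's own `luhn_algo()`; it assumes the usual Luhn scheme applied to the nine digits YYMMDDNNN (century dropped), with weights alternating 2, 1, 2, 1, and so on.

```r
# Hypothetical helper, not the sweidnumbr implementation.
# Input: one string with the nine digits YYMMDDNNN.
luhn_digit_sketch <- function(nine_digits) {
  d <- as.integer(strsplit(nine_digits, "")[[1]])
  stopifnot(length(d) == 9)

  # Multiply the digits alternately by 2 and 1
  products <- d * c(2, 1, 2, 1, 2, 1, 2, 1, 2)

  # Two-digit products contribute the sum of their digits (e.g. 14 -> 1 + 4)
  digit_sum <- sum(products %/% 10 + products %% 10)

  # The control digit brings the total up to the next multiple of 10
  (10 - digit_sum %% 10) %% 10
}

luhn_digit_sketch("121212121")  # 2
```

For "191212121212" the nine digits are "121212121" and the control digit comes out as 2, which is why the variant ending in 1 fails the check.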

Organisational numbers

Individuals are not the only ones with identification numbers; companies and NGOs have them too. These organisational numbers are covered by the oin group of functions in the package. Feel free to try them out …

Other countries

An analogous conversion function is available for Finnish social security numbers in the sorvi package.

Keep in touch!

… and feel free to suggest enhancements and report bugs to