– Who are you? asked Mr Doe.
– I’m a Hindu! Namrata from India replied.
– I’m a statistician! said Günther from Germany.
People of different nationalities tend to identify themselves using different characteristics. In India, your identity might rely on your religion, while in other countries your profession might take its place. In Sweden, you might identify yourself with your almost-world-known (!?) personal identification number (“pin”). This 10-digit number is given to you almost immediately after birth and it often stays with you until your very last breath. The number is similar to a “social security number” but it has a much broader use and it is considered public. It is used in public registers (for education, work, tax payment, healthcare, car ownership etc.) and it often serves as a membership number or customer id within companies and member unions. It is also essential, for example, in the public health and quality registers that are maintained in Sweden (and other Scandinavian countries) and used for research.
Naturally, the “pin” is used extensively to distinguish individuals in data sets analysed in R. The number also helps to match data from different sources, and it brings some demographic background data into the bargain, such as birth date (age), sex and geographic origin (the latter depending on your birth year).
Until now, however, there has been no consistent R convention for handling “pins”. The number might be stored as a 10- or 12-digit numeric (without or with century prefix), as a character string (with a hyphen or a ‘+’ sign separating the birth date from the suffix digits) or as a factor. But the pin is not a number (adding, subtracting or taking the logarithm of pins is just nonsense) and it carries more information than the individual characters in a string capture. Luckily, the new R package sweidnumbr (released on CRAN) comes to the rescue!
Let’s look at some data (all pins are fake; they have a valid syntax but do not identify any real individuals):
| 54 | 19440311-1131 | NOBLESSE, RAGNAR JOHN |
| 58 | 20050111-1123 | MINT, MARIA ADA |
So far, pin is just a standard character vector, but let’s change that to benefit from all of sweidnumbr.
We can now also investigate some demographic characteristics almost on the fly (note that pins contained geographical information only up to 1989):
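To see where such information comes from, here is a rough base R sketch that reads the birth date and sex straight off a 12-digit pin. The two helpers are hypothetical stand-ins written for this illustration, not functions from sweidnumbr, and the pins are made-up strings in the form "YYYYMMDDNNNC".

```r
# Hypothetical helpers (NOT part of sweidnumbr): read demographics off a
# 12-digit pin written as "YYYYMMDDNNNC".
pin_birth_date_sketch <- function(pin) {
  # The first eight characters are the birth date.
  as.Date(substr(pin, 1, 8), format = "%Y%m%d")
}

pin_sex_sketch <- function(pin) {
  # The second-to-last digit is odd for men and even for women.
  digit <- as.integer(substr(pin, 11, 11))
  ifelse(digit %% 2 == 1, "Male", "Female")
}

# Made-up pins for illustration only.
pins <- c("196408233234", "196408233226")
birth_dates <- pin_birth_date_sketch(pins)
sexes <- pin_sex_sketch(pins)
print(data.frame(pin = pins, birth_date = birth_dates, sex = sexes))
```

The sketch ignores special cases, so the package's own functions remain the better choice for real data.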
as.pin can recognize pins in several different formats such as:
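As a rough illustration of what bringing several spellings to one form involves, the base R sketch below normalises a few character formats to the 12-digit layout "YYYYMMDDNNNC". The function normalise_pin_sketch is hypothetical and is not how as.pin is implemented. In particular, it simply assumes a fixed century for pins written without one.

```r
# Hypothetical normaliser (NOT the package's as.pin): bring a few common
# spellings of a pin to the 12-digit form "YYYYMMDDNNNC".
normalise_pin_sketch <- function(x, century = "19") {
  x <- as.character(x)
  digits <- gsub("[^0-9]", "", x)            # drop "-" or "+" separators
  short <- !is.na(digits) & nchar(digits) == 10
  # ASSUMPTION: every pin without a century prefix belongs to `century`.
  # A real converter has to infer this from the separator and today's date.
  digits[short] <- paste0(century, digits[short])
  bad <- !is.na(digits) & nchar(digits) != 12
  digits[bad] <- NA_character_               # anything else is rejected
  digits
}

# Made-up pin written in four different ways.
same_pin <- normalise_pin_sketch(
  c("196408233234", "19640823-3234", "640823-3234", "6408233234")
)
print(same_pin)
```

Storing pins as plain numerics is riskier than it looks: a 10-digit pin for someone born in the 2000s starts with a zero, which a numeric silently drops.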
It also checks that the numbers follow the correct pin syntax:
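Part of that syntax is a check digit: the last digit of a Swedish pin is chosen so that the final ten digits pass the Luhn algorithm. A minimal base R sketch of that check follows. The function luhn_ok_sketch is hypothetical, written for this illustration rather than taken from the package, and it expects pins already in the 12-digit form "YYYYMMDDNNNC".

```r
# Hypothetical check (NOT the package's implementation): Luhn checksum
# over the last ten digits of a 12-digit pin "YYYYMMDDNNNC".
luhn_ok_sketch <- function(pin) {
  vapply(pin, function(p) {
    d <- as.integer(strsplit(substr(p, 3, 12), "")[[1]])
    prod <- d * rep(c(2L, 1L), 5)               # weights 2, 1, 2, 1, ...
    prod <- ifelse(prod > 9L, prod - 9L, prod)  # digit sum of two-digit products
    sum(prod) %% 10 == 0                        # valid if divisible by 10
  }, logical(1), USE.NAMES = FALSE)
}

# Made-up pins: the second differs from the first only in its check digit.
checks <- luhn_ok_sketch(c("196408233234", "196408233235"))
print(checks)
```

A full syntax check would also verify that the first eight digits form a real calendar date, which this sketch leaves out.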
Individuals are not the only ones with identification numbers; companies and NGOs have them too. These are covered by the oin group of functions in the package. Feel free to try them out …
An analogous conversion function is available for Finnish social security numbers in the sorvi package.
Keep in touch!
… and feel free to suggest enhancements and report bugs to https://github.com/rOpenGov/sweidnumbr/issues