Introduction

This package provides easy access from R to the API of the Open Government Data Platform - India, so that you can download its datasets directly. The datasets available through the API are listed on the data portal.

Basic Usage

library(ogdindiar)

## Welcome to ogdindiar

When calling the OGD India API, you need to provide at least two parameters:

res_id: Resource id of the dataset you want to access
api_key: Your personal API key (See Package Installation instructions)

The resource id for a dataset can be found on that dataset's page on the data portal. For example, the page for Annual And Seasonal Mean Temperature Of India lists its resource id. The resource id is a string that is part of the Datastore API URL.

The Datastore API URL for this dataset, as shown on the page, is https://data.gov.in/api/datastore/resource.json?resource_id=98fe9271-a59d-4834-b05b-fd5ddb94ac01&api-key=OGDINDIA_API_KEY
The resource id is the value of the resource_id parameter in this URL: 98fe9271-a59d-4834-b05b-fd5ddb94ac01.
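
If you copy the whole Datastore API URL from the dataset page, the resource id can also be pulled out programmatically. The following is a minimal sketch in base R; extract_res_id() is a hypothetical helper written for this illustration and is not part of ogdindiar.

```r
# Datastore API URL as shown on the dataset page (copied from the text above)
api_url <- "https://data.gov.in/api/datastore/resource.json?resource_id=98fe9271-a59d-4834-b05b-fd5ddb94ac01&api-key=OGDINDIA_API_KEY"

# Hypothetical helper (not part of ogdindiar): return the value of the
# resource_id query parameter, or NA if the URL does not contain one
extract_res_id <- function(url) {
  m <- regmatches(url, regexpr("resource_id=[^&]+", url))
  if (length(m) == 0) return(NA_character_)
  sub("^resource_id=", "", m)
}

res_id <- extract_res_id(api_url)
print(res_id)
```

The resulting string is what you would pass as res_id.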

The main function this package provides is fetch_data(). Once you have figured out the resource id, you can download that dataset as follows:

mean_temp_data = fetch_data(res_id = "98fe9271-a59d-4834-b05b-fd5ddb94ac01")

This function returns a list of 2 elements.

The first element is the data

knitr::kable(head(mean_temp_data[[1]]))

id	timestamp	year	annual	jan_feb	mar_may	jun_sep	oct_dec
1123	1424778424	1957	23	18	25	27	21
1423	1424778424	1972	24	18	25	27	21
1443	1424778424	1973	24	19	26	27	21
1463	1424778424	1974	24	18	26	27	21
1483	1424778424	1975	23	18	25	26	21
1503	1424778424	1976	24	18	25	26	22

The second element is a data frame containing metadata about the columns.

knitr::kable(mean_temp_data[[2]])

.id	type	size	unsigned	not null	description
id	serial	normal	TRUE	TRUE
timestamp	int	normal	TRUE	FALSE	The Unix timestamp for the data.
year	int	normal	NA	FALSE
annual	int	normal	NA	FALSE
jan_feb	int	normal	NA	FALSE
mar_may	int	normal	NA	FALSE
jun_sep	int	normal	NA	FALSE
oct_dec	int	normal	NA	FALSE

Advanced Usage

Instead of downloading an entire dataset, you can download only the parts of it that you need. This is done with additional arguments to the fetch_data() function. Currently you can use:

filter to filter the dataset using equality constraints on specific columns.
select to select specific set of columns to be downloaded.
sort to sort the resulting dataset based on multiple columns.

The following example illustrates this:

mean_temp_25 = fetch_data(res_id = "98fe9271-a59d-4834-b05b-fd5ddb94ac01",
                        filter = c("annual" = "25"),
                        select = c("year", "annual", "jan_feb", "mar_may", "jun_sep", "oct_dec"),
                        sort = c("jan_feb" = "asc", "mar_may" = "desc")
                        )

The returned dataset:

knitr::kable(head(mean_temp_25[[1]]))

year	annual	jan_feb	mar_may	jun_sep	oct_dec
2002	25	19	27	27	22
2010	25	20	27	27	22
1995	25	20	26	28	23
2009	25	20	26	27	22
2006	25	21	26	27	22

Metadata about the returned dataset

knitr::kable(mean_temp_25[[2]])

.id	type	size	not null
year	int	normal	FALSE
annual	int	normal	FALSE
jan_feb	int	normal	FALSE
mar_may	int	normal	FALSE
jun_sep	int	normal	FALSE
oct_dec	int	normal	FALSE
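
To make the meaning of the three arguments concrete, here is a rough local equivalent in base R. It is only a sketch of what filter, select and sort mean, not how fetch_data() is implemented. The data frame holds the six rows shown in the Basic Usage table, and the filter value is 24 rather than 25 because none of those six rows has an annual mean of 25.

```r
# Six rows copied from the temperature table in Basic Usage
temps <- data.frame(
  id      = c(1123, 1423, 1443, 1463, 1483, 1503),
  year    = c(1957, 1972, 1973, 1974, 1975, 1976),
  annual  = c(23, 24, 24, 24, 23, 24),
  jan_feb = c(18, 18, 19, 18, 18, 18),
  mar_may = c(25, 25, 26, 26, 25, 25),
  jun_sep = c(27, 27, 27, 27, 26, 26),
  oct_dec = c(21, 21, 21, 21, 21, 22)
)

# Like filter = c("annual" = "24"): equality constraint on one column
kept <- temps[temps$annual == 24, ]

# Like select = c(...): keep only the named columns
kept <- kept[, c("year", "annual", "jan_feb", "mar_may")]

# Like sort = c("jan_feb" = "asc", "mar_may" = "desc"):
# ascending on jan_feb, ties broken by descending mar_may
kept <- kept[order(kept$jan_feb, -kept$mar_may), ]
rownames(kept) <- NULL
print(kept)
```

The difference with fetch_data() is that the arguments are passed along with the request, so only the matching rows and columns are downloaded.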

fetch_data() accepts one more argument, field_type_correction. As a side effect of the data fetch process, all columns arrive as character. With the default setting, field_type_correction = TRUE, these columns are converted back to numeric type based on the accompanying metadata.
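
The idea behind this kind of correction can be sketched in a few lines of base R. The helper correct_types() below is hypothetical and is not the package's actual code; the two small data frames are made-up stand-ins for a fetched dataset and its metadata, using the .id and type columns shown in the metadata tables above.

```r
# Made-up stand-in for a freshly fetched dataset: every column is character
raw <- data.frame(
  year   = c("1957", "1972"),
  annual = c("23", "24"),
  stringsAsFactors = FALSE
)

# Metadata in the shape shown above: one row per column, with its type
meta <- data.frame(
  .id  = c("year", "annual"),
  type = c("int", "int"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert every column that the metadata marks as
# "int" from character to numeric, leaving other columns untouched
correct_types <- function(data, metadata) {
  int_cols <- metadata[[".id"]][metadata[["type"]] == "int"]
  for (col in intersect(int_cols, names(data))) {
    data[[col]] <- as.numeric(data[[col]])
  }
  data
}

fixed <- correct_types(raw, meta)
str(fixed)
```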

Introduction

Dhrumin Shah

2015-08-22
