This function scrapes the data.gov.in search results and returns most of the information available for the datasets. Because it parses web pages rather than using an API, there needs to be a delay between successive requests, and the number of pages the function downloads should be limited. A single search input may return multiple pages of search results. Each result page contains a list of catalogs, and each catalog in turn contains multiple pages, with each page containing a list of datasets. Default limits apply at each of these stages. Set them to Inf if you need all the results or if you don't expect a large number of results. Please refer to the vignette for a detailed overview.
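The way a page limit, an item limit, and a delay between requests interact can be illustrated with a minimal sketch. This is not the package's actual implementation: `fake_fetch_page()` and `collect_with_limits()` are hypothetical names invented for illustration, and the "pages" are simulated locally so that no request is sent to data.gov.in.

```r
# Hypothetical stand-in for a page request (assumption, not a package function).
# Simulates 3 pages of 4 items each and returns NULL once the pages run out.
fake_fetch_page <- function(page) {
  if (page > 3) return(NULL)
  paste0("item_", (page - 1) * 4 + seq_len(4))
}

# Collect items page by page, stopping at whichever limit is reached first.
collect_with_limits <- function(fetch_page, limit_pages = 5, limit_items = 10,
                                delay_seconds = 0) {
  items <- character(0)
  page <- 1
  while (page <= limit_pages && length(items) < limit_items) {
    new_items <- fetch_page(page)
    if (is.null(new_items)) break      # no more pages available
    items <- c(items, new_items)
    page <- page + 1
    Sys.sleep(delay_seconds)           # pause between successive requests
  }
  # Trim any overshoot contributed by the last page.
  items[seq_len(min(length(items), limit_items))]
}

# Finite limits stop early; Inf limits run until the pages are exhausted.
res_default <- collect_with_limits(fake_fetch_page)
res_all <- collect_with_limits(fake_fetch_page, limit_pages = Inf, limit_items = Inf)
```

With finite limits the loop ends as soon as either the page count or the item count is reached. Passing Inf for both simply lets it run until no further pages are returned, which is why Inf is only advisable when the result set is expected to be small.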
Usage
search_for_datasets(
  search_terms,
  limit_catalog_pages = 5L,
  limit_catalogs = 10L,
  return_catalog_list = FALSE,
  limit_dataset_pages = 5L,
  limit_datasets = 10L
)
Arguments
- search_terms
Either a single string with multiple words separated by spaces, or a character vector containing all the search terms.
- limit_catalog_pages
Number of pages of search results to request. Default is 5. Set to Inf to get all.
- limit_catalogs
Number of catalogs that the function should parse to get the datasets. Default is 10. Set to Inf to get all.
- return_catalog_list
Default is FALSE. If TRUE, the function will not look for data sets, and will only return the list of catalogs found.
- limit_dataset_pages
Maximum number of pages to request and parse when collecting the datasets. Default is 5. Set to Inf to request all.
- limit_datasets
Request more pages until the number of datasets obtained reaches this limit. Default is 10. Set to Inf to request all.
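The two accepted forms of search_terms can be thought of as equivalent inputs. A minimal sketch of one way such inputs could be normalized is shown below; `normalize_terms()` is a hypothetical helper written for illustration and is not confirmed to be how the package handles its input.

```r
# Hypothetical helper (assumption): split on whitespace and drop empty pieces,
# so a single string and a character vector yield the same set of terms.
normalize_terms <- function(search_terms) {
  terms <- unlist(strsplit(search_terms, "\\s+"))
  terms[nzchar(terms)]
}

from_string <- normalize_terms("state gdp")
from_vector <- normalize_terms(c("state", "gdp"))
same_terms <- identical(from_string, from_vector)
```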
Examples
if (FALSE) { # \dontrun{
# Basic Use:
search_for_datasets('train usage')
# Advanced Use, specifying additional parameters
search_for_datasets(search_terms = c('state', 'gdp'),
                    limit_catalog_pages = 1,
                    limit_catalogs = 3,
                    limit_dataset_pages = 2)
# Return only the list of catalogs, without looking for datasets
search_for_datasets(search_terms = c('state', 'gdp'),
                    limit_catalog_pages = 2,
                    return_catalog_list = TRUE)
} # }