[Experimental]

These experimental functions provide a minimal interface to the ABS.Stat API.

More information on the ABS.Stat API can be found on the ABS website.

Note that an ABS.Stat 'dataflow' is similar to a table, and a 'datastructure' contains metadata describing the variables in a dataflow. To find and load data from the ABS.Stat API, use:

  • read_api_dataflows() to get information on the available dataflows.

  • read_api_datastructure() to get metadata for a specific dataflow, including the variables it contains and the codes each variable can take.

  • read_api() to get the data belonging to a given dataflow.

  • read_api_url() to get the data for a query URL generated using the online data viewer.

read_api_dataflows()

read_api(
  id,
  datakey = NULL,
  start_period = NULL,
  end_period = NULL,
  version = NULL
)

read_api_url(url)

read_api_datastructure(id)

Arguments

id

A dataflow id. Use read_api_dataflows() to obtain a data frame listing the available dataflows.

datakey

A named list mapping filter variables to codes. All variables with a position in the datastructure can be filtered on. Use read_api_datastructure() to obtain information about the variables in a dataflow and the codes each variable can take.
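To make the datakey argument more concrete, here is a minimal base-R sketch of how a named list of filters could be turned into an SDMX-style key string, in which dimension values are separated by `.`, several codes for one dimension are joined with `+`, and an unfiltered dimension is left empty. This is an illustration under assumptions, not the package's internal implementation: the helper `build_datakey()` is hypothetical, and the dimension order passed to it is made up (in practice it would follow the positions listed in the datastructure).

```r
# Hypothetical helper: convert a named list of filters into an SDMX key string.
# `dims` is the ordered vector of dimension names (assumed to follow the
# positions listed in the datastructure).
build_datakey <- function(datakey, dims) {
  unknown <- setdiff(names(datakey), dims)
  if (length(unknown) > 0) {
    stop("Not a filterable variable: ", paste(unknown, collapse = ", "))
  }
  parts <- vapply(dims, function(d) {
    codes <- datakey[[d]]
    # An unfiltered dimension is left empty, which SDMX treats as a wildcard
    if (is.null(codes)) "" else paste(codes, collapse = "+")
  }, character(1), USE.NAMES = FALSE)
  paste(parts, collapse = ".")
}

# Example with a hypothetical dimension order
build_datakey(list(asgs_2016 = 0, sex_abs = 3),
              dims = c("sex_abs", "age", "asgs_2016"))
#> [1] "3..0"
```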

start_period

The start period (used to filter by time). This is inclusive. The supported formats are:

  • "YYYY" for annual data (e.g. 2019)

  • "YYYY-S[1-2]" for semi-annual data (e.g. 2019-S1)

  • "YYYY-Q[1-4]" for quarterly data (e.g. 2019-Q1)

  • "YYYY-MM", with MM from 01 to 12, for monthly data (e.g. 2019-01)

  • "YYYY-W[01-53]" for weekly data (e.g. 2019-W01)

  • "YYYY-MM-DD" for daily and business data (e.g. 2019-01-01)
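The formats above can be checked locally before a request is sent. The following base-R sketch classifies a period string with regular expressions; the helper name `period_frequency()` is hypothetical, it checks the shape of the string only (it would accept an impossible date such as 2019-02-31), and the API itself remains the authority on which periods it accepts.

```r
# Hypothetical helper: return the frequency implied by a period string,
# or NA if it matches none of the documented formats.
period_frequency <- function(period) {
  period <- as.character(period)  # allows numeric input such as 2020
  patterns <- c(
    annual     = "^[0-9]{4}$",
    semiannual = "^[0-9]{4}-S[1-2]$",
    quarterly  = "^[0-9]{4}-Q[1-4]$",
    monthly    = "^[0-9]{4}-(0[1-9]|1[0-2])$",
    weekly     = "^[0-9]{4}-W(0[1-9]|[1-4][0-9]|5[0-3])$",
    daily      = "^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$"
  )
  # Test the single period string against each pattern in turn
  hits <- names(patterns)[vapply(patterns, grepl, logical(1), x = period)]
  if (length(hits) == 0) NA_character_ else hits[1]
}

period_frequency("2019-Q1")
#> [1] "quarterly"
```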

end_period

The end period (used to filter by time). This is inclusive. The supported formats are the same as for start_period.

version

A version number. If unspecified, the latest version of the dataflow is used. Use read_api_dataflows() to see the available dataflow versions.

url

A complete query URL, such as one generated using the online data viewer.
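read_api_url() takes a full URL, whereas read_api() takes its components separately. As a rough illustration of how the two relate, the sketch below assembles a query URL following the SDMX REST pattern of the example URL used on this page (`https://api.data.abs.gov.au/data/ABS,WPI/all`), adding optional `startPeriod` and `endPeriod` query parameters. The helper `build_query_url()` is hypothetical, and the URLs that read_api() requests internally may differ.

```r
# Hypothetical helper: assemble an SDMX REST data query URL.
# Assumes the layout <base>/data/ABS,<id>[,<version>]/<key>,
# with "all" used when no key is supplied.
build_query_url <- function(id, key = NULL, start_period = NULL,
                            end_period = NULL, version = NULL,
                            base = "https://api.data.abs.gov.au") {
  flow <- paste(c("ABS", id, version), collapse = ",")  # NULL version is dropped
  if (is.null(key) || !nzchar(key)) key <- "all"
  url <- paste0(base, "/data/", flow, "/", key)
  # NULL periods are dropped by c(), so only supplied ones become parameters
  params <- c(startPeriod = start_period, endPeriod = end_period)
  if (length(params) > 0) {
    url <- paste0(url, "?", paste0(names(params), "=", params, collapse = "&"))
  }
  url
}

build_query_url("WPI")
#> [1] "https://api.data.abs.gov.au/data/ABS,WPI/all"
build_query_url("ERP_COB", start_period = 2020)
#> [1] "https://api.data.abs.gov.au/data/ABS,ERP_COB/all?startPeriod=2020"
```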

Value

A data.frame

Details

Note that the API enforces a fairly strict gateway timeout. If you are trying to access a large dataset, you will need to filter it on the server side using datakey. You may find it helpful to explore the data manually on the ABS website first, to work out which subset of the data you need.

Note also that the datastructure contains a complete codebook for the variables appearing in the relevant dataflow. Because some variables are shared across multiple dataflows, the datastructure for a particular id may list values for a variable that do not appear in the corresponding dataflow.
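One way to spot such values is to compare the codes listed in the datastructure with those that actually come back in a (filtered) download. The sketch below does this with base R's setdiff(). It assumes the datastructure has `var` and `code` columns (the `code` column name is an assumption), and it uses small made-up data frames in place of real read_api_datastructure() and read_api() results; the helper name `unused_codes()` is hypothetical.

```r
# Hypothetical helper: codes listed in the datastructure for `variable`
# that never appear in `data`. Assumes `ds` has columns `var` and `code`.
unused_codes <- function(ds, data, variable) {
  listed <- ds$code[ds$var == variable]
  setdiff(as.character(listed), as.character(unique(data[[variable]])))
}

# Made-up stand-ins for a datastructure and a downloaded dataflow
ds <- data.frame(var  = c("regiontype", "regiontype", "sex_abs"),
                 code = c("AAA", "BBB", "3"))
x  <- data.frame(regiontype = c("AAA", "AAA"), sex_abs = c(3, 3))

unused_codes(ds, x, "regiontype")
#> [1] "BBB"
```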

Examples

if (FALSE) { # \dontrun{
# List available dataflows
read_api_dataflows()

# Say we want the "Estimated resident population, Country of birth"
# dataflow, which has the id ERP_COB. We can get the full data set
# from 2020 onwards by providing the id and a start period:
read_api("ERP_COB", start_period = 2020)

# In some cases, loading a whole dataflow (as above) won't work.
# For example, the `ABS_C16_T10_SA` dataflow is very large,
# so the gateway will time out if we try to collect the full data set:
try(read_api("ABS_C16_T10_SA"))

# We need to filter the dataflow before downloading it.
# To figure out how to filter it, we get metadata ('datastructure').
ds <- read_api_datastructure("ABS_C16_T10_SA")

# The `asgs_2016` code for 'Australia' is 0
ds[ds$var == "asgs_2016" & ds$label == "Australia", ]

# The `sex_abs` code for 'Persons' (i.e. all persons) is 3
ds[ds$var == "sex_abs" & ds$label == "Persons", ]

# So we can filter on both variables:
x <- read_api("ABS_C16_T10_SA", datakey = list(asgs_2016 = 0, sex_abs = 3))
unique(x["asgs_2016"]) # Confirming only 'Australia' level records came through
unique(x["sex_abs"]) # Confirming only 'Persons' level records came through

# Note, however, that not all values in the datastructure necessarily
# appear in the data. Requesting such a value returns a 404 error:
ds[ds$var == "regiontype" & ds$label == "Destination Zones", ]
try(read_api("ABS_C16_T10_SA", datakey = list(regiontype = "DZN")))

# If you already have a query url, then use `read_api_url()`
wpi_url <- "https://api.data.abs.gov.au/data/ABS,WPI/all"
read_api_url(wpi_url)
} # }