These experimental functions provide a minimal interface to the ABS.Stat API.
More information on the ABS.Stat API can be found on the ABS website.
Note that an ABS.Stat 'dataflow' is like a table. A 'datastructure' contains metadata that describes the variables in the dataflow. The following functions are available for working with the ABS.Stat API:
Using read_api_dataflows() you can get information on the available dataflows.
Using read_api_datastructure() you can get metadata relating to a specific dataflow, including the variables available in that dataflow.
Using read_api() you can get the data belonging to a given dataflow.
Using read_api_url() you can get the data for a given query url generated using the online data viewer.
read_api_dataflows()
read_api(
id,
datakey = NULL,
start_period = NULL,
end_period = NULL,
version = NULL
)
read_api_url(url)
read_api_datastructure(id)
A dataflow id. Use read_api_dataflows() to obtain a dataframe listing available dataflows.
A named list matching filter variables to codes. All variables with a position in the datastructure are filterable. Use read_api_datastructure() to obtain information about the variables in a dataflow and the values each variable can take.
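As a rough illustration of how a datakey could be assembled from a datastructure, the sketch below looks up codes by their labels. The `make_datakey()` helper is hypothetical and not part of the package, the `ds` data frame is a small mock, and the `code` column name is an assumption (only `var` and `label` appear in the examples).

```r
# Mock datastructure. `var` and `label` are used in the package examples;
# the `code` column name is an assumption made for this sketch.
ds <- data.frame(
  var   = c("asgs_2016", "sex_abs"),
  code  = c("0", "3"),
  label = c("Australia", "Persons"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn c(variable = "label") into a named list of codes.
make_datakey <- function(ds, labels) {
  vars <- names(labels)
  lapply(stats::setNames(vars, vars), function(v) {
    code <- ds$code[ds$var == v & ds$label == labels[[v]]]
    # Fail loudly if the label is missing or ambiguous for this variable
    if (length(code) != 1) stop("No unique code for ", v, " = ", labels[[v]])
    code
  })
}

datakey <- make_datakey(ds, c(asgs_2016 = "Australia", sex_abs = "Persons"))
str(datakey)
```

A list built this way has the same shape as the datakey argument described above, with one named element per filter variable.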
The start period (used to filter by time). This is inclusive. The supported formats are:
"YYYY" for annual data (e.g. 2019)
"YYYY-S[1-2]" for semi-annual data (e.g. 2019-S1)
"YYYY-Q[1-4]" for quarterly data (e.g. 2019-Q1)
"YYYY-MM[01-12]" for monthly data (e.g. 2019-01)
"YYYY-W[01-53]" for weekly data (e.g. 2019-W01)
"YYYY-MM-DD" for daily and business data (e.g. 2019-01-01)
The end period (used to filter by time). This is inclusive. The supported formats are the same as for start_period.
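To make the period formats concrete, here is a minimal sketch that classifies a period string before it is used as start_period or end_period. The `period_frequency()` helper and the regular expressions are not part of the package; they are one reading of the formats listed for start_period.

```r
# One regular expression per supported format (an interpretation of the
# documented formats, not taken from the package source).
period_patterns <- c(
  annual      = "^[0-9]{4}$",
  semi_annual = "^[0-9]{4}-S[1-2]$",
  quarterly   = "^[0-9]{4}-Q[1-4]$",
  monthly     = "^[0-9]{4}-(0[1-9]|1[0-2])$",
  weekly      = "^[0-9]{4}-W(0[1-9]|[1-4][0-9]|5[0-3])$",
  daily       = "^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$"
)

# Hypothetical helper: name of the first matching format, or NA if none match.
period_frequency <- function(x) {
  stopifnot(length(x) == 1)
  x <- as.character(x)
  hits <- vapply(period_patterns, function(p) grepl(p, x), logical(1))
  if (any(hits)) names(period_patterns)[which(hits)[1]] else NA_character_
}

period_frequency("2019-Q1")
period_frequency("2019-13")
```

The daily pattern only checks the shape of the string, so it would still accept impossible dates such as 2019-02-31.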
A version number. If unspecified, the latest version of the dataset is used. Use read_api_dataflows() to see available dataflow versions.
A complete query url
A data.frame
Note that the API enforces a fairly strict gateway timeout policy. This
means that, if you're trying to access a large dataset, you will need to
filter it on the server side using the datakey argument. You might like to
review the data manually via the ABS website to figure out what subset of
the data you require.
Note, furthermore, that the datastructure contains a complete codebook for
the variables appearing in the relevant dataflow. Since some variables are
shared across multiple dataflows, this means that the datastructure
corresponding to a particular id may contain values for a given variable
which are not in the corresponding dataflow.
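The mismatch between codebook and data can be sketched with mock data frames. Everything here is illustrative: `unused_codes()` is a hypothetical helper, the `code` column name is an assumption, and the "AUS" value is made up for the example.

```r
# Mock codebook. `var` and `label` appear in the package examples; the
# `code` column name and the "AUS" value are assumptions.
ds <- data.frame(
  var   = c("sex_abs", "regiontype", "regiontype"),
  code  = c("3", "AUS", "DZN"),
  label = c("Persons", "Australia", "Destination Zones"),
  stringsAsFactors = FALSE
)

# Mock data for the matching dataflow (values are illustrative only).
x <- data.frame(
  sex_abs    = c("3", "3"),
  regiontype = c("AUS", "AUS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: codebook rows whose code never appears in the data.
unused_codes <- function(ds, x) {
  in_data <- mapply(
    function(v, code) v %in% names(x) && code %in% as.character(x[[v]]),
    ds$var, ds$code
  )
  ds[!unname(in_data), , drop = FALSE]
}

unused <- unused_codes(ds, x)
unused$label
```

In this mock, the "Destination Zones" code is listed in the codebook but absent from the data, which is the situation described above.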
if (FALSE) { # \dontrun{
# List available dataflows
read_api_dataflows()
# Say we want the "Estimated resident population, Country of birth"
# dataflow, with the id ERP_COB. We can load the data for this dataflow
# by providing its id and a start period:
read_api("ERP_COB", start_period = 2020)
# In some cases, loading a whole dataflow (as above) won't work.
# For example, the `ABS_C16_T10_SA` dataflow is very large,
# so the gateway will time out if we try to collect the full data set
try(read_api("ABS_C16_T10_SA"))
# We need to filter the dataflow before downloading it.
# To figure out how to filter it, we get metadata ('datastructure').
ds <- read_api_datastructure("ABS_C16_T10_SA")
# The `asgs_2016` code for 'Australia' is 0
ds[ds$var == "asgs_2016" & ds$label == "Australia", ]
# The `sex_abs` code for 'Persons' (i.e. all persons) is 3
ds[ds$var == "sex_abs" & ds$label == "Persons", ]
# So we have:
x <- read_api("ABS_C16_T10_SA", datakey = list(asgs_2016 = 0, sex_abs = 3))
unique(x["asgs_2016"]) # Confirming only 'Australia' level records came through
unique(x["sex_abs"]) # Confirming only 'Persons' level records came through
# Please note, however, that not all values in the datastructure necessarily
# appear in the data. You get a 404 error in this case.
ds[ds$var == "regiontype" & ds$label == "Destination Zones", ]
try(read_api("ABS_C16_T10_SA", datakey = list(regiontype = "DZN")))
# If you already have a query url, then use `read_api_url()`
wpi_url <- "https://api.data.abs.gov.au/data/ABS,WPI/all"
read_api_url(wpi_url)
} # }