
Introduction

The Australian Bureau of Statistics conducts a Census every five years. In 2001 and 2016 a federal election and a Census fell in the same year, but for the other election years (2004, 2007, 2010 and 2013) there is no Census to match directly, so any modelling must use Census data from neighbouring years. This vignette documents how to impute Census data for a desired election year by interpolating between the neighbouring Censuses.

We impute Census data for the electoral divisions in the 2013 federal election. Maps of 2013 and 2016 electoral divisions are obtained from the Australian Electoral Commission http://www.aec.gov.au/Electorates/gis/gis_datadownload.htm, and the map of divisions in place at the time of the 2011 Census is from the Australian Bureau of Statistics https://datapacks.censusdata.abs.gov.au/datapacks/.

The Australian Electoral Commission shifts the electoral boundaries regularly, so the electoral divisions in place at the 2013 election may not match those in 2016, nor those in 2011. This means that, to obtain Census information for a particular electoral division, we cannot necessarily interpolate directly between that division's Census profiles in the neighbouring Censuses.

To account for these boundary shifts, we use a spatio-temporal algorithm to estimate Census information about each electorate, at the time of the election of interest. Electoral boundaries are superimposed onto each of the neighbouring Censuses, in order to estimate Census characteristics for each of those years. By linearly interpolating between these time points, we get an estimate for the election year of interest.
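As a small illustration of the temporal part of this algorithm, the sketch below finds the Censuses on either side of each election year. The helper neighbouring_censuses is hypothetical and not part of the package, and the 2006 Census is inferred from the five-yearly cycle rather than named in the text.

```r
# Census years: 2001, 2011 and 2016 are named in this vignette;
# 2006 is assumed from the five-yearly cycle.
census_years <- c(2001, 2006, 2011, 2016)

# Hypothetical helper (not part of the package): return the Census
# years immediately before and after an election year.
neighbouring_censuses <- function(election_year, census_years) {
  if (election_year %in% census_years) {
    # Election and Census coincide, so no interpolation is needed.
    return(c(before = election_year, after = election_year))
  }
  c(before = max(census_years[census_years < election_year]),
    after  = min(census_years[census_years > election_year]))
}

election_years <- c(2004, 2007, 2010, 2013)
brackets <- t(sapply(election_years, neighbouring_censuses,
                     census_years = census_years))
rownames(brackets) <- election_years
print(brackets)
```

For the 2013 election this returns 2011 and 2016, the two Censuses used in the rest of this vignette.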

To illustrate this algorithm, consider the example of the Hume electoral division in the 2013 federal election.

Example: Hume in the 2013 election

To illustrate the spatio-temporal algorithm, consider the imputation of a socio-demographic variable for the electorate of Hume in New South Wales (NSW), at the time of the 2013 federal election. The figure below shows this region amongst other NSW electorates.

Some of the electoral boundaries in NSW for 2013, with the electoral boundary for Hume, shown in purple.

The Censuses neighbouring the 2013 election are those in 2011 and 2016. By plotting the Hume boundary (purple) in the 2013 election over the divisions in 2016, we see that its boundary has changed.

Census division boundaries in NSW for 2016, with the 2013 electoral boundary for Hume, shown in purple. The purple region is not contained within a single Census division.

Our aim is to impute Census information for this purple region.

Several 2016 divisions intersect the purple region (the Hume boundary for 2013): Riverina, Eden-Monaro and Hume, along with smaller intersecting areas in Fenner, Calare, Gilmore and Whitlam.

For each 2016 division that intersects the purple region, we calculate the percentage of its area that falls inside the purple region. Assuming that each division's population is evenly distributed over its area, the population it contributes to the purple region is that percentage of the division's population.

| Electoral division (2016) | Percentage | Population in Division | Estimated Population Allocated to Purple Region |
|---|---|---|---|
| Hume | 90.58% | 150643 | 136458 |
| Riverina | 24.89% | 155793 | 38780 |
| Eden-Monaro | 9.96% | 147532 | 14691 |
| Whitlam | 0.55% | 152280 | 844 |
| Calare | 0.39% | 161298 | 633 |
| Fenner | 0.33% | 202955 | 683 |
| Gilmore | 0.03% | 150436 | 49 |
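The allocation in the table can be reproduced approximately in a few lines of base R. This is a sketch of the arithmetic only, not the package's code. The data frame and its column names are made up for illustration, and the figures are copied from the table.

```r
# Figures copied from the table above (2016 divisions intersecting
# the 2013 Hume boundary). Column names are illustrative only.
hume <- data.frame(
  division   = c("Hume", "Riverina", "Eden-Monaro", "Whitlam",
                 "Calare", "Fenner", "Gilmore"),
  percentage = c(90.58, 24.89, 9.96, 0.55, 0.39, 0.33, 0.03),
  population = c(150643, 155793, 147532, 152280, 161298, 202955, 150436),
  reported   = c(136458, 38780, 14691, 844, 633, 683, 49)
)

# With population spread evenly over each division, the population
# allocated to the purple region is population x share of area.
hume$estimated <- hume$population * hume$percentage / 100

# Gap between this recomputation and the reported figures.
hume$difference <- hume$estimated - hume$reported
print(hume)
```

The recomputed values differ slightly from the reported column, presumably because the percentages shown in the table are rounded to two decimal places.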

This is done for each of the 2013 electoral divisions using the mapping_fn function, which computes the composition of each electoral division in terms of the divisions in place at Census time. Note that shapefiles must be loaded with load_shapefile so that they are in the format mapping_fn expects. Populations and other Census data are then estimated using weighted_avg_census.

Applying function to all electorates in the 2013 election

In mapping_fn, the aec_sF argument is the shapefile containing the polygons of the electoral boundaries for which Census information is to be imputed, and the abs_sF argument is the shapefile containing the polygons of the divisions in place at the time of the Census.

# Load the electoral boundaries used for the 2013 election and those in place in 2016
sF_13 <- load_shapefile("path-to-your-shapefiles/national-midmif-16122011/COM20111216_ELB.MIF")
sF_16 <- load_shapefile("path-to-your-shapefiles/national-midmif-09052016/COM_ELB.TAB")
# Mapping
mapping_2016 <- mapping_fn(aec_sF = sF_13, abs_sF = sF_16)

The resultant object is a data.frame which contains information about the intersecting areas, and the variable Percent_Census_Composition is the percentage of the Census division population that will be attributed to the electoral division.

The next step is to estimate Census data with a weighted average of these intersecting populations, for each electoral division. This is done using weighted_avg_census.

imputed_data_2016 <- weighted_avg_census(mapping_df = mapping_2016, abs_df = abs2016)
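To make the weighted average concrete, here is a minimal base R sketch of the idea on toy data. It is not the implementation of weighted_avg_census. Every value is invented, and every column name other than Percent_Census_Composition and Owned is hypothetical.

```r
# Toy mapping: one row per overlap between an electoral division and
# a Census division. All values are invented for illustration.
mapping <- data.frame(
  electoral_division = c("A", "A", "B"),   # hypothetical column name
  census_division    = c("X", "Y", "Y"),   # hypothetical column name
  Percent_Census_Composition = c(80, 25, 75)
)

# Toy Census profiles for the Census divisions.
census <- data.frame(
  census_division = c("X", "Y"),
  Population      = c(1000, 2000),
  Owned           = c(30, 40)              # a percentage variable
)

overlaps <- merge(mapping, census, by = "census_division")

# People each Census division contributes to each electoral division.
overlaps$allocated <- overlaps$Population *
  overlaps$Percent_Census_Composition / 100

# Population-weighted average of the Census variable per electoral division.
by_division <- split(overlaps, overlaps$electoral_division)
imputed <- data.frame(
  electoral_division = names(by_division),
  Population = sapply(by_division, function(d) sum(d$allocated)),
  Owned      = sapply(by_division,
                      function(d) weighted.mean(d$Owned, d$allocated)),
  row.names  = NULL
)
print(imputed)
```

Weighting by allocated population rather than by raw area share means that a densely populated Census division influences the estimate in proportion to the people it contributes.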

We have now estimated 2016 Census information for the electoral boundaries of interest (2013 election). The steps above are repeated with the 2011 Census.

# Load 2011 boundaries
sF_11 <- load_shapefile("/Users/Jeremy/Documents/R/eechidna/data-raw/Shapefiles/2011_CED_shape/CED_2011_AUST.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/Jeremy/Documents/R/eechidna/data-raw/Shapefiles/2011_CED_shape/CED_2011_AUST.shp", layer: "CED_2011_AUST"
## with 168 features
## It has 3 fields
# Mapping
mapping_2011 <- mapping_fn(aec_sF = sF_13, abs_sF = sF_11)

# Weighted average
imputed_data_2011 <- weighted_avg_census(mapping_df = mapping_2011, abs_df = abs2011)

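We can then linearly interpolate between the 2011 and 2016 estimates to arrive at our final estimate of Census data for the electorates in place at the 2013 federal election. This is equivalent to inverse distance weighting with a power of 1, so the Census closer in time to the election receives the larger weight. The first few rows of the resulting estimates are: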

| DivisionNm | Age00_04 | Age05_14 | Age15_19 | BachelorAbv | MedianPersonalIncome | Owned | NoReligion |
|---|---|---|---|---|---|---|---|
| ADELAIDE | 5.238255 | 9.509769 | 5.834180 | 32.096012 | 457.7773 | 28.41219 | 30.84079 |
| ASTON | 5.768565 | 12.496998 | 7.152311 | 21.434418 | 451.1648 | 35.85664 | 27.13446 |
| BALLARAT | 6.487517 | 12.955704 | 6.914614 | 16.460032 | 392.9592 | 34.29433 | 31.20174 |
| BANKS | 6.161189 | 11.673810 | 6.112022 | 24.201877 | 416.1158 | 34.10598 | 21.11699 |
| BARKER | 5.963957 | 12.941691 | 6.242635 | 8.215222 | 376.8047 | 35.89354 | 30.30135 |
| BARTON | 6.310199 | 10.776653 | 5.323435 | 24.429438 | 431.5335 | 34.46361 | 18.10582 |

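The interpolation step can be sketched as follows. This assumes distances are measured in whole years (two years from 2011 and three years from 2016), whereas the package may use exact dates. The toy data frames stand in for imputed_data_2011 and imputed_data_2016, and their values are invented.

```r
election_year <- 2013
census_before <- 2011
census_after  <- 2016

# Inverse distance weights with power 1: the closer Census gets the
# larger weight. Assumes whole-year gaps (2 and 3 years).
inv_dist <- 1 / abs(election_year - c(census_before, census_after))
weights  <- inv_dist / sum(inv_dist)

# Toy stand-ins for the two sets of imputed profiles (values invented).
imputed_2011 <- data.frame(DivisionNm = c("A", "B"), Owned = c(30, 36))
imputed_2016 <- data.frame(DivisionNm = c("A", "B"), Owned = c(35, 31))

# The rows must describe the same divisions in the same order.
stopifnot(identical(imputed_2011$DivisionNm, imputed_2016$DivisionNm))

imputed_2013 <- data.frame(
  DivisionNm = imputed_2011$DivisionNm,
  Owned = weights[1] * imputed_2011$Owned + weights[2] * imputed_2016$Owned
)
print(imputed_2013)
```

With two time points, these weights give the same result as drawing a straight line between the 2011 and 2016 values and reading it off at 2013.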
Summary

We have demonstrated how to use the functions needed to impute Census information for the 2004, 2007, 2010 and 2013 elections. The same functions can be used for future elections. Until the next Census becomes available there is nothing to interpolate towards, so simply use the most recent Census.