This package provides a generic implementation of the iterative proportional fitting (IPF) algorithm in the ipf() function. It also provides, in the ipu() function, an iterative proportional updating (IPU) algorithm, based on a paper from Arizona State University, for balancing household- and person-level marginals.
library(dplyr)
library(tidyr)
library(ipfr)
Iterative proportional updating is a method developed by Arizona State University that allows the IPF procedure to match household- and person-level marginals. In the basic IPF procedure, all marginal distributions must describe the same thing (e.g. households). IPU allows you to say, for example, that a zone needs a total household count of 500, but also needs 800 people.
This example creates a random seed table and target values to illustrate how the package is used. The targets are specified for two separate geographies (geo_clusters). Any field name can be used as long as it:
This simple example is small enough that it could be solved directly without ipu(). However, it is designed to show the basics needed to run the function.
The seed table is the starting point for the IPF procedure. In this example, we make up some survey data.
The seed table below includes a pid (“primary ID”) field and a geography field (geo_taz).
hh_seed <- tribble(
~pid, ~siz, ~inc, ~weight, ~geo_taz,
1, 1, 1, 12, 1,
2, 1, 2, 3, 1,
3, 2, 1, 6, 1,
4, 2, 2, 5, 1
)
The number of households by size (e.g., 1-person, 2-person, etc.) is referred to as a marginal distribution. Often, we know from the Census the total number of households in each category of a marginal. This information becomes the target that the IPU process tries to match.
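As a quick illustration of a marginal distribution, the base-R sketch below sums the seed weights by household size. The values are copied by hand from hh_seed so the snippet runs on its own; seed_weight, seed_siz, and seed_marginal are illustrative names, not part of ipfr.

```r
# Weights and household sizes copied from hh_seed
seed_weight <- c(12, 3, 6, 5)
seed_siz <- c(1, 1, 2, 2)

# Weighted number of households in each size category: the seed's own marginal
seed_marginal <- tapply(seed_weight, seed_siz, sum)
seed_marginal
```

The seed implies 15 one-person and 11 two-person households; the siz targets (18 and 12) are what the weights must be adjusted to match.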
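Marginal targets are specified below for each TAZ:
- geo_taz matches the seed table.
- The target names (siz and inc) match the siz and inc columns of the seed.
- The target column names (`1` and `2`) match the values found in the siz and inc columns.
hh_targets <- list()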
hh_targets$siz <- tribble(
~geo_taz, ~`1`, ~`2`,
1, 18, 12
)
hh_targets$inc <- tribble(
~geo_taz, ~`1`, ~`2`,
1, 20, 10
)
hh_targets
## $siz
## # A tibble: 1 x 3
## geo_taz `1` `2`
## <dbl> <dbl> <dbl>
## 1 1 18 12
##
## $inc
## # A tibble: 1 x 3
## geo_taz `1` `2`
## <dbl> <dbl> <dbl>
## 1 1 20 10
result <- ipu(hh_seed, hh_targets)
ipu() returns a named list.
names(result)
## [1] "weight_tbl" "weight_dist" "primary_comp"
The first element is the resulting weight table: the primary seed table with its weight column updated and two columns added, avg_weight and weight_factor (weight divided by avg_weight).
result$weight_tbl
## # A tibble: 4 x 7
## pid siz inc weight geo_taz avg_weight weight_factor
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 12 1 7.5 1.6
## 2 2 1 2 6 1 7.5 0.8
## 3 3 2 1 8 1 7.5 1.07
## 4 4 2 2 4 1 7.5 0.533
The second element is a histogram of the weight_factor values. This provides a quick overview of the distribution of weights.
result$weight_dist
The next element is a comparison back to the targets provided. With complex seed and target tables, this makes investigating results quick and easy.
result$primary_comp
## # A tibble: 4 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_taz_1 inc_1 20 20 0 0
## 2 geo_taz_1 inc_2 10 10 0 0
## 3 geo_taz_1 siz_1 18 18 0 0
## 4 geo_taz_1 siz_2 12 12 0 0
If secondary targets are provided to ipu(), a fourth item in the list will contain a secondary_comp table.
In addition to making sure the marginal targets are matched, it is important to ensure that the underlying distribution of households still resembles the seed data. As an example, if your seed data says that most low-income households are also one-person households, that information should be preserved.
hh_seed %>%
mutate(inc = paste0("inc", inc)) %>%
filter(geo_taz == 1) %>%
select(siz, inc, weight) %>%
spread(inc, weight)
## # A tibble: 2 x 3
## siz inc1 inc2
## <dbl> <dbl> <dbl>
## 1 1 12 3
## 2 2 6 5
result$weight_tbl %>%
mutate(inc = paste0("inc", inc)) %>%
filter(geo_taz == 1) %>%
select(siz, inc, weight) %>%
spread(inc, weight)
## # A tibble: 2 x 3
## siz inc1 inc2
## <dbl> <dbl> <dbl>
## 1 1 12 6
## 2 2 8 4
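To make "preserving the seed distribution" concrete, here is a minimal base-R sketch of classic two-way IPF (not ipfr's code) applied to the same 2 x 2 seed and targets. Classic IPF starts from the seed weights and only rescales whole rows and columns, so the cross-product (odds) ratio of the seed table carries over into the fitted table. ipf_2d() and odds_ratio() are hypothetical helpers written for this illustration.

```r
# Seed weights from hh_seed: rows are siz 1-2, columns are inc 1-2
seed <- matrix(c(12, 3,
                 6, 5), nrow = 2, byrow = TRUE,
               dimnames = list(siz = c("1", "2"), inc = c("1", "2")))
row_targets <- c(18, 12)  # siz targets
col_targets <- c(20, 10)  # inc targets

ipf_2d <- function(x, row_targets, col_targets, iterations = 100) {
  for (i in seq_len(iterations)) {
    x <- x * (row_targets / rowSums(x))                  # scale each row to its target
    x <- sweep(x, 2, col_targets / colSums(x), `*`)      # scale each column to its target
  }
  x
}

odds_ratio <- function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])

fit <- ipf_2d(seed, row_targets, col_targets)
round(fit, 2)
odds_ratio(seed)  # 10/3
odds_ratio(fit)   # same as the seed, up to floating-point error
```

The fitted table matches both marginals and keeps the seed's odds ratio of 10/3. For comparison, the ipu() weights shown above (12, 6, 8, 4) have an odds ratio of 1.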
In household survey expansion, it is common to want to control for certain features that describe households (like size) while controlling for other attributes that describe people (like age). This is possible with the ipu() function.
This example is taken directly from the Arizona paper on page 20: http://www.scag.ca.gov/Documents/PopulationSynthesizerPaper_TRB.pdf
In this example, household type could represent size (e.g. 1-person and 2-person households). Person type could represent age groups (e.g. under 18, between 18 and 50, and over 50).
The code block below re-creates the seed and target tables for both persons and households.
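The household seed carries the geography field (geo_region), and both seed tables carry a pid field. The pid field in the persons seed table links each person to a record in the household seed.
hh_seed <- tibble(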
geo_region = 1,
pid = c(1:8),
hhtype = c(1, 1, 1, 2, 2, 2, 2, 2)
)
per_seed <- tibble(
pid = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 8),
pertype = c(1, 2, 3, 1, 3, 1, 1, 2, 1, 3, 3, 2, 2, 3, 1, 2, 1, 1, 2, 3, 3, 1, 2)
)
hh_targets <- list()
hh_targets$hhtype <- tibble(
geo_region = 1,
`1` = 35,
`2` = 65
)
per_targets <- list()
per_targets$pertype <- tibble(
geo_region = 1,
`1` = 91,
`2` = 65,
`3` = 104
)
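The core IPU update can be sketched in a few lines of base R, assuming the update rule described in the Arizona paper: build a frequency matrix with one row per household and one column per constraint, then, for each constraint in turn, rescale the weights of every household that contributes to it so that the weighted column total hits the target. This is an illustrative sketch, not ipfr's implementation; it omits convergence checks, geographies, and the other options ipu() offers, and the names (ipu_sketch, freq, az_targets, az_weights) are hypothetical. The vectors repeat the Arizona seed and targets defined above.

```r
# Arizona seed, repeated as plain vectors so the sketch runs on its own
hh_type  <- c(1, 1, 1, 2, 2, 2, 2, 2)
per_pid  <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 8)
per_type <- c(1, 2, 3, 1, 3, 1, 1, 2, 1, 3, 3, 2, 2, 3, 1, 2, 1, 1, 2, 3, 3, 1, 2)

# Frequency matrix: household-type indicators plus person counts by type
person_counts <- unclass(table(factor(per_pid, levels = 1:8), per_type))
freq <- cbind(as.numeric(hh_type == 1), as.numeric(hh_type == 2), person_counts)
colnames(freq) <- c("hh1", "hh2", "p1", "p2", "p3")
az_targets <- c(hh1 = 35, hh2 = 65, p1 = 91, p2 = 65, p3 = 104)

ipu_sketch <- function(freq, targets, iterations = 1000) {
  w <- rep(1, nrow(freq))                  # start every household at weight 1
  for (it in seq_len(iterations)) {
    for (j in seq_len(ncol(freq))) {
      contributes <- freq[, j] > 0         # households that count toward constraint j
      current <- sum(freq[, j] * w)        # current weighted total for constraint j
      w[contributes] <- w[contributes] * targets[[j]] / current
    }
  }
  w
}

az_weights <- ipu_sketch(freq, az_targets)
round(colSums(freq * az_weights), 2)       # weighted totals to compare with az_targets
```

Because p3 is the last constraint adjusted in each pass, it is matched exactly after every pass; the other constraints approach their targets as the number of iterations grows.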
In the interest of keeping vignette build time short, the ipu() algorithm is only run for 30 iterations. After 400 or more iterations, the results closely match those shown in the paper.
The first four arguments of ipu(), passed positionally below, are primary_seed, primary_target, secondary_seed, and secondary_target.
result <- ipu(hh_seed, hh_targets, per_seed, per_targets, max_iterations = 30)
The first table shows the result. The second table shows the primary comparison table. Since we added secondary seeds and targets, the output now contains a secondary comparison table. Feel free to run the code chunk above for 400 or more iterations and then look again.
result$weight_tbl %>%
mutate(weight = round(weight, 2))
## # A tibble: 8 x 6
## geo_region pid hhtype weight avg_weight weight_factor
## <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 3.77 12.5 0.301
## 2 1 2 1 23 12.5 1.84
## 3 1 3 1 6.5 12.5 0.520
## 4 1 4 2 25.7 12.5 2.06
## 5 1 5 2 17.4 12.5 1.39
## 6 1 6 2 7.26 12.5 0.581
## 7 1 7 2 4.21 12.5 0.337
## 8 1 8 2 7.26 12.5 0.581
result$primary_comp %>%
mutate(result = round(result, 2))
## # A tibble: 2 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 hhtype_1 35 33.3 -1.73 -4.94
## 2 geo_region_1 hhtype_2 65 61.8 -3.15 -4.84
result$secondary_comp %>%
mutate(result = round(result, 2))
## # A tibble: 3 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 pertype_1 91 88.4 -2.58 -2.83
## 2 geo_region_1 pertype_2 65 63.8 -1.16 -1.79
## 3 geo_region_1 pertype_3 104 104 0 0
ipu()
allows different geographies to be specified for different marginal tables. There are a few rules that make this possible, but in short, the geo field on each target table tells the algorithm which scale to constrain to.
The algorithm checks all of the following rules, and a warning message will show if one is violated.
pid
field.
To demonstrate, the Arizona example from example 1 is modified to add two different clusters for household controls but to still control the person targets at the regional level.
# Modifying example 1 for example 2
# Repeat the hh_seed to create cluster 1 and 2 households
hh_seed <- hh_seed %>%
rename(geo_cluster = geo_region)
hh_seed <- bind_rows(
hh_seed,
hh_seed %>%
mutate(geo_cluster = 2, pid = pid + 8)
)
hh_seed$geo_region = 1
hh_seed
## # A tibble: 16 x 4
## geo_cluster pid hhtype geo_region
## <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 1
## 2 1 2 1 1
## 3 1 3 1 1
## 4 1 4 2 1
## 5 1 5 2 1
## 6 1 6 2 1
## 7 1 7 2 1
## 8 1 8 2 1
## 9 2 9 1 1
## 10 2 10 1 1
## 11 2 11 1 1
## 12 2 12 2 1
## 13 2 13 2 1
## 14 2 14 2 1
## 15 2 15 2 1
## 16 2 16 2 1
# Repeat the household targets for two clusters
hh_targets$hhtype <- bind_rows(hh_targets$hhtype, hh_targets$hhtype)
hh_targets$hhtype <- hh_targets$hhtype %>%
rename(geo_cluster = geo_region) %>%
mutate(geo_cluster = c(1, 2))
hh_targets$hhtype
## # A tibble: 2 x 3
## geo_cluster `1` `2`
## <dbl> <dbl> <dbl>
## 1 1 35 65
## 2 2 35 65
# Repeat the per_seed to create cluster 1 and 2 persons
per_seed <- bind_rows(
per_seed,
per_seed %>%
mutate(pid = pid + 8)
)
per_seed %>%
head()
## # A tibble: 6 x 2
## pid pertype
## <dbl> <dbl>
## 1 1 1
## 2 1 2
## 3 1 3
## 4 2 1
## 5 2 3
## 6 3 1
# Double the regional person targets
per_targets$pertype <- per_targets$pertype %>%
mutate(across(c(`1`, `2`, `3`), ~ .x * 2))
per_targets$pertype
## # A tibble: 1 x 4
## geo_region `1` `2` `3`
## <dbl> <dbl> <dbl> <dbl>
## 1 1 182 130 208
Run the IPU algorithm. Again, for vignette build time, only 30 iterations are performed. Run the code yourself with max_iterations set to 600 to see the converged result.
result <- ipu(hh_seed, hh_targets, per_seed, per_targets, max_iterations = 30)
The tables below show the results compared back to targets. More iterations would make a better match.
result$primary_comp %>%
mutate(result = round(result, 2))
## # A tibble: 4 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_cluster_1 hhtype_1 35 33.3 -1.73 -4.94
## 2 geo_cluster_1 hhtype_2 65 61.8 -3.15 -4.84
## 3 geo_cluster_2 hhtype_1 35 33.3 -1.73 -4.94
## 4 geo_cluster_2 hhtype_2 65 61.8 -3.15 -4.84
result$secondary_comp %>%
mutate(result = round(result, 2))
## # A tibble: 3 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 pertype_1 182 177. -5.16 -2.83
## 2 geo_region_1 pertype_2 130 128. -2.33 -1.79
## 3 geo_region_1 pertype_3 208 208 0 0
This section shows how ipu() addresses some common problems found in basic IPF procedures. It uses the example data from the first example.
IPF works by successively multiplying the table weights by factors. Cells with a zero weight cannot be modified by this process. As the number of zero weights increases, the flexibility of the process is reduced, and convergence becomes more difficult. ipfr solves this problem by setting a minimum weight of .0001 for all cells. This minimum weight can be adjusted using the min_weight parameter and should be very small compared to your seed table weights.
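Why zero cells are a problem shows up in a tiny base-R sketch: any multiplicative factor leaves a zero weight at zero, whereas flooring weights at a small minimum first (the role the min_weight parameter plays in ipfr) lets that cell respond. The values and names here are illustrative only.

```r
cell_weights <- c(0, 5, 5)
factor_up <- 3                                # an IPF-style adjustment factor

scaled_raw <- cell_weights * factor_up        # first cell is still exactly 0
min_weight <- 1e-4                            # illustrative floor, matching the .0001 described above
scaled_floored <- pmax(cell_weights, min_weight) * factor_up  # first cell is now positive
scaled_raw
scaled_floored
```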
Not every combination of marginal categories is required to be included in the seed table; however, at least one observation of each category must exist. For example, a particular combination involving size-1 households (a specific number of workers and vehicles) may not have been observed in the survey, and thus may be missing from the seed table. As long as other combinations of size-1 households exist (e.g. with 0 workers and 1 vehicle), ipfr will work fine. On the other hand, if there are no observations of any size-1 households, ipfr will stop with an error message.
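The rule that every category needs at least one seed observation (while individual combinations may be missing) can be checked before calling ipfr. A minimal base-R sketch with hypothetical names and made-up data, in which the targets include a size-3 category that the seed never observed:

```r
seed_siz <- c(1, 1, 2, 2)                    # household sizes observed in the seed
target_siz_categories <- c("1", "2", "3")    # column names of a siz target table

# Categories that appear in the targets but never in the seed
missing_categories <- setdiff(target_siz_categories, as.character(unique(seed_siz)))
if (length(missing_categories) > 0) {
  message("No seed records for siz categories: ",
          paste(missing_categories, collapse = ", "))
}
```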
See the first IPU example for how this works.
ipfr handles two separate issues concerning marginal agreement:
A basic implementation of iterative proportional fitting requires that all targets agree on the total. For example, if the households by size target table has a total of 100 households, but the households by income table has a total of 120, both cannot be satisfied.
ipfr handles this by scaling all tables in the same target list (either primary or secondary) to match the total of the first table.
In the example below, the size marginal sums to a total of 100 households, while the vehicle marginal sums to 300. With the verbose option set to TRUE, a message is displayed showing which target tables, if any, are scaled.
hh_seed <- tibble(
geo_region = 1,
pid = c(1:8),
hhsiz = c(1, 1, 1, 2, 2, 2, 2, 2),
hhveh = c(0, 2, 1, 1, 1, 2, 1, 0)
)
hh_targets <- list()
hh_targets$hhsiz <- tibble(
geo_region = 1,
`1` = 35,
`2` = 65
)
hh_targets$hhveh <- tibble(
geo_region = 1,
`0` = 100,
`1` = 100,
`2` = 100
)
result <- ipu(hh_seed, hh_targets, max_iterations = 30, verbose = TRUE)
## Scaling target tables: hhveh
##
Finished iteration 2 . %RMSE = 9.044233
Finished iteration 3 . %RMSE = 0.4706742
Finished iteration 4 . %RMSE = 0.02385747
Finished iteration 5 . %RMSE = 0.001207629
## IPU converged
## All targets matched within the absolute_diff of 10
Importantly, the performance measures below compare the result to the scaled targets, not the original targets. Note that the vehicle targets have been scaled down.
result$primary_comp %>%
mutate(across(c(target, result), ~ round(.x, 2)))
## # A tibble: 5 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 hhsiz_1 35 35 0 0
## 2 geo_region_1 hhsiz_2 65 65 0 0
## 3 geo_region_1 hhveh_0 33.3 33.3 0 0
## 4 geo_region_1 hhveh_1 33.3 33.3 0 0
## 5 geo_region_1 hhveh_2 33.3 33.3 0 0
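The scaled vehicle targets in the table above (33.3 each) follow from rescaling each target table to the total of the first one. A minimal base-R sketch of that rule, using a hypothetical helper scale_to_first() rather than ipfr's internal code:

```r
scale_to_first <- function(marginals) {
  first_total <- sum(marginals[[1]])
  # Multiply every table by (first total / its own total)
  lapply(marginals, function(m) m * first_total / sum(m))
}

hh_marginals <- list(
  hhsiz = c(`1` = 35, `2` = 65),
  hhveh = c(`0` = 100, `1` = 100, `2` = 100)
)
scaled <- scale_to_first(hh_marginals)
scaled$hhveh   # 100 * 100 / 300 for each vehicle category
```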
In population synthesis or survey expansion, adding a secondary set of person-level targets can lead to a different issue: target balance. Naturally, the total number of households and the total number of persons will be very different. A balance issue arises when the average weights for household records and person records are very different.
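In the original Arizona example (the seed and target tables from the first IPU example), note that the average weights for household and person records are similar: 100 / 8 = 12.5 and 260 / 23 ≈ 11.3.
avg_hh_weight <- sum(select(hh_targets$hhtype, -geo_region)) / nrow(hh_seed)
avg_per_weight <- sum(select(per_targets$pertype, -geo_region)) / nrow(per_seed)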
In real applications, this is often not true. The example below demonstrates the consequences by modifying the Arizona example to double the person targets.
per_targets$pertype <- per_targets$pertype %>%
mutate(across(c(`1`, `2`, `3`), ~ .x * 2))
result <- ipu(hh_seed, hh_targets, per_seed, per_targets, max_iterations = 30)
The resulting weights tend toward extremes as the algorithm attempts to match unbalanced primary and secondary targets. In effect, the algorithm makes a large shift in the basic persons-per-household ratio found in the seed table. Households with more people get large weights, while households with fewer people get small weights.
result$weight_dist
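A quick check in R shows how large that persons-per-household shift is for the Arizona data (8 households and 23 persons in the seed, compared with the doubled person targets):

```r
seed_pph <- 23 / 8                                  # persons per household in the seed
target_pph <- (182 + 130 + 208) / (35 + 65)         # implied by the doubled targets
c(seed = seed_pph, target = target_pph)             # 2.875 vs 5.2
```

Matching both sets of targets requires the weighted persons per household to rise from about 2.9 to 5.2, which can only happen by moving weight onto the larger households.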
ipu can fix the underlying problem using the secondary_importance argument. It is 1 by default, which means the algorithm will attempt to match the absolute values of the secondary targets (as above). As this value is decreased toward 0, the secondary targets are scaled to match the average weight of the primary targets.
The examples below set secondary_importance to 0.80, 0.20, and 0.00 to show the effect on results. With each decrease in importance, the match to person targets gets worse, but weight extremes are reduced.
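The rescaling behind secondary_importance can be sketched as follows. At 0, the secondary targets are rescaled so that the average person weight equals the average household weight (100 / 8 = 12.5 for the Arizona data, times 23 person records). How ipfr blends values between 0 and 1 is not stated here, so the linear interpolation in this sketch is an assumption, and rescale_secondary() is a hypothetical helper rather than an ipfr function.

```r
# Hypothetical helper. Endpoints: importance 1 keeps the original totals,
# importance 0 balances average person and household weights.
# The linear blend in between is an assumption, not ipfr's documented formula.
rescale_secondary <- function(sec_targets, n_sec_records,
                              prim_total, n_prim_records, importance) {
  balanced_total <- prim_total / n_prim_records * n_sec_records
  blended_total <- importance * sum(sec_targets) + (1 - importance) * balanced_total
  sec_targets * blended_total / sum(sec_targets)
}

pertype_targets <- c(`1` = 182, `2` = 130, `3` = 208)   # doubled Arizona person targets
rescale_secondary(pertype_targets, n_sec_records = 23,
                  prim_total = 100, n_prim_records = 8, importance = 0)
```

At importance 0 this gives 100.625, 71.875, and 115, consistent with the person results reported in the secondary_importance = 0 run that follows.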
result <- ipu(hh_seed, hh_targets, per_seed, per_targets, max_iterations = 30,
secondary_importance = .80)
result
## $weight_tbl
## # A tibble: 8 x 6
## geo_region pid hhtype weight avg_weight weight_factor
## <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 41.1 12.5 3.29
## 2 1 2 1 0.635 12.5 0.0508
## 3 1 3 1 1.90 12.5 0.152
## 4 1 4 2 1.12 12.5 0.0893
## 5 1 5 2 0.782 12.5 0.0626
## 6 1 6 2 3.35 12.5 0.268
## 7 1 7 2 72.3 12.5 5.78
## 8 1 8 2 3.35 12.5 0.268
##
## $weight_dist
##
## $primary_comp
## # A tibble: 2 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 hhtype_1 35 43.7 8.66 24.7
## 2 geo_region_1 hhtype_2 65 80.9 15.9 24.5
##
## $secondary_comp
## # A tibble: 3 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 pertype_1 182 198. 16.0 8.79
## 2 geo_region_1 pertype_2 130 124. -6.41 -4.93
## 3 geo_region_1 pertype_3 208 189. -18.6 -8.95
result <- ipu(hh_seed, hh_targets, per_seed, per_targets, max_iterations = 30,
secondary_importance = .20)
result
## $weight_tbl
## # A tibble: 8 x 6
## geo_region pid hhtype weight avg_weight weight_factor
## <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 19.6 12.5 1.57
## 2 1 2 1 12.7 12.5 1.02
## 3 1 3 1 3.30 12.5 0.264
## 4 1 4 2 17.4 12.5 1.39
## 5 1 5 2 13.0 12.5 1.04
## 6 1 6 2 4.50 12.5 0.360
## 7 1 7 2 26.7 12.5 2.14
## 8 1 8 2 4.50 12.5 0.360
##
## $weight_dist
##
## $primary_comp
## # A tibble: 2 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 hhtype_1 35 35.6 0.63 1.79
## 2 geo_region_1 hhtype_2 65 66.1 1.13 1.73
##
## $secondary_comp
## # A tibble: 3 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 pertype_1 182 119. -63.2 -34.7
## 2 geo_region_1 pertype_2 130 84.6 -45.4 -34.9
## 3 geo_region_1 pertype_3 208 134. -74.4 -35.8
result <- ipu(hh_seed, hh_targets, per_seed, per_targets, max_iterations = 30,
secondary_importance = 0)
result
## $weight_tbl
## # A tibble: 8 x 6
## geo_region pid hhtype weight avg_weight weight_factor
## <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 9.29 12.5 0.743
## 2 1 2 1 19.8 12.5 1.58
## 3 1 3 1 5.65 12.5 0.452
## 4 1 4 2 23.8 12.5 1.90
## 5 1 5 2 16.0 12.5 1.28
## 6 1 6 2 6.79 12.5 0.543
## 7 1 7 2 11.2 12.5 0.894
## 8 1 8 2 6.79 12.5 0.543
##
## $weight_dist
##
## $primary_comp
## # A tibble: 2 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 hhtype_1 35 34.7 -0.26 -0.75
## 2 geo_region_1 hhtype_2 65 64.5 -0.47 -0.73
##
## $secondary_comp
## # A tibble: 3 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 pertype_1 182 100. -81.9 -45.0
## 2 geo_region_1 pertype_2 130 71.6 -58.4 -44.9
## 3 geo_region_1 pertype_3 208 115 -93 -44.7
Often, it is preferable to constrain weights so that certain under-sampled observations do not end up with extreme weights. ipu() supports this through the min_ratio and max_ratio arguments.
First, the average weight is calculated per geography as the total of the target tables divided by the number of records in the seed table. Then, max_ratio and min_ratio set a cap and floor as multiples of that average.
Common values to use are:
However, care should be taken when moving these variables from their default values. These variables impose another constraint on the algorithm and increase the chance of failure. In the example below, very strict values are used with a small, two-cluster seed and target table.
Values of 1.2 and .8 mean that all weights must be within 20% of the average weight.
hh_seed <- tibble(
pid = c(1, 2, 3, 4),
siz = c(1, 2, 2, 1),
weight = c(1, 1, 1, 1),
geo_cluster = c(1, 1, 2, 2)
)
hh_targets <- list()
hh_targets$siz <- tibble(
geo_cluster = c(1, 2),
`1` = c(75, 100),
`2` = c(25, 150)
)
result <- ipu(hh_seed, hh_targets, max_iterations = 10,
max_ratio = 1.2, min_ratio = .8)
Consider the effect on geo_cluster 1. With a total target of 100 households and two records in the seed table, the average weight is 50. This means that the weights must be between 40 and 60. The algorithm does not have enough flexibility to meet the controls.
result$primary_comp
## # A tibble: 4 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_cluster_1 siz_1 75 60 -15 -20
## 2 geo_cluster_1 siz_2 25 40 15 60
## 3 geo_cluster_2 siz_1 100 100 0 0
## 4 geo_cluster_2 siz_2 150 150 0 0
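The floor and cap quoted above can be reproduced with a small base-R helper. weight_bounds() is hypothetical (not an ipfr function) and simply applies the rule described earlier: average weight = target total / number of seed records, with min_ratio and max_ratio as multiples of it.

```r
weight_bounds <- function(target_total, n_records, min_ratio, max_ratio) {
  avg_weight <- target_total / n_records
  c(avg = avg_weight, floor = min_ratio * avg_weight, cap = max_ratio * avg_weight)
}

cluster1 <- weight_bounds(75 + 25, 2, min_ratio = 0.8, max_ratio = 1.2)
cluster2 <- weight_bounds(100 + 150, 2, min_ratio = 0.8, max_ratio = 1.2)
cluster1   # avg 50, floor 40, cap 60
cluster2   # avg 125, floor 100, cap 150
```

In cluster 1 the size-1 record would need a weight of 75 (above the cap of 60) and the size-2 record a weight of 25 (below the floor of 40), which is why the results stop at 60 and 40. In cluster 2 the targets of 100 and 150 sit exactly on the floor and cap, so they can still be matched.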
A second problem can arise from capping weights based on the average weight. In the example below, I change the targets so that, for geo_cluster 1, they are very unbalanced: cluster 1 now has 100,000 1-person households but only 5 2-person households.
hh_targets <- list()
hh_targets$siz <- tibble(
geo_cluster = c(1, 2),
`1` = c(100000, 100),
`2` = c(5, 150)
)
result <- ipu(hh_seed, hh_targets, max_iterations = 10,
max_ratio = 5, min_ratio = .2)
result$primary_comp
## # A tibble: 4 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_cluster_1 siz_1 100000 100000 0 0
## 2 geo_cluster_1 siz_2 5 10000. 9996. 199910
## 3 geo_cluster_2 siz_1 100 100 0 0
## 4 geo_cluster_2 siz_2 150 150 0 0
Even with reasonable values for the weight caps, the minimum allowable weight is much higher than 5: the average weight in cluster 1 is 100,005 / 2 = 50,002.5, so the floor is 0.2 × 50,002.5 ≈ 10,000. This is an extreme example, and is unlikely to be an issue in applications related to housing and population, where the targets are generally on the same scale. However, when expanding a through-trip table, it is common to have some external stations with large targets and others with small ones. In these cases, it is advisable to leave the ratio arguments (min_ratio and max_ratio) at their default values.
The function ipu_nr only differs from ipu in one significant way: the method used to balance primary and secondary targets.
As in the more detailed ipu example above, we modify the Arizona example (which is balanced) to double the person targets. This creates a significant imbalance that standard approaches struggle with.
per_targets$pertype <- per_targets$pertype %>%
mutate(across(c(`1`, `2`, `3`), ~ .x * 2))
While ipu balances the secondary targets directly using secondary_importance, ipu_nr uses an iterative approach and the target_priority argument.
By default, all target tables have an equally high priority, which means that the algorithm will attempt to match all targets exactly. However, target_priority can be modified in several ways. In the code below, a data frame is used to assign the hhtype target a higher priority. (If using a data frame, the column names must be target and priority.) A simple named list can also be used (both options shown below).
# Option 1: a data frame
target_priority <- tibble(
target = c("hhtype", "pertype"),
priority = c(10000, 10)
)
# Option 2: a named list
target_priority <- list()
target_priority$hhtype <- 10000
target_priority$pertype <- 10
result <- ipu_nr(hh_seed, hh_targets, per_seed, per_targets, max_iterations = 30,
target_priority = target_priority)
As ipu_nr runs, it relaxes the target constraints on pertype much faster than on hhtype. As a result, the final weights match the household type targets much more closely. The two methods generally match targets to the same degree, but often lead to very different distributions of weight ratios. In addition, ipu tends to reach convergence levels around .1 %RMSE faster than ipu_nr, but for lower levels, ipu_nr tends to be faster.
result
## $weight_tbl
## # A tibble: 8 x 6
## geo_region pid hhtype weight avg_weight weight_factor
## <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 32.2 12.5 2.58
## 2 1 2 1 2.53 12.5 0.203
## 3 1 3 1 1.86 12.5 0.149
## 4 1 4 2 4.09 12.5 0.327
## 5 1 5 2 5.74 12.5 0.459
## 6 1 6 2 3.00 12.5 0.240
## 7 1 7 2 52.1 12.5 4.17
## 8 1 8 2 3.00 12.5 0.240
##
## $weight_dist
##
## $primary_comp
## # A tibble: 2 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 hhtype_1 35 36.6 1.61 4.59
## 2 geo_region_1 hhtype_2 65 67.9 2.91 4.48
##
## $secondary_comp
## # A tibble: 3 x 6
## geo category target result diff pct_diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 geo_region_1 pertype_1 182 153. -29.3 -16.1
## 2 geo_region_1 pertype_2 130 104. -26.4 -20.3
## 3 geo_region_1 pertype_3 208 153. -55.2 -26.5