Use incidence2 accessor functions or subset columns directly? #79
You've convinced me I need to write a "design" vignette (it's been planned for a while but ... time). I'll address accessors below and leave a separate comment on the general use of {incidence2} (so you can hide it as a little off-topic for the issue). On accessors (
Following from above, some thoughts on {incidence2} and where it is best used (and not used). I'll use a crude dichotomy of "interactive" to mean any sort of analysis pipeline and "programmatic" to mean in a package. In interactive settings, the benefit of {incidence2} is most apparent for complex aggregations of a linelist with multiple date indices, or of pre-aggregated data with multiple count variables, e.g.
library(incidence2)
library(outbreaks)
library(dplyr)
# linelist example
ebola <- ebola_sim_clean$linelist
(grouped_inci <- incidence(
    ebola,
    date_index = c(
        onset = "date_of_onset",
        infection = "date_of_infection"
    ),
    interval = "isoweek",
    groups = "gender"
))
#> # incidence: 218 x 4
#> # count vars: infection, onset
#> # groups: gender
#> date_index gender count_variable count
#> * <isowk> <fct> <chr> <int>
#> 1 2014-W12 f infection 1
#> 2 2014-W15 f onset 1
#> 3 2014-W15 m infection 1
#> 4 2014-W16 f infection 1
#> 5 2014-W16 m onset 1
#> 6 2014-W17 f infection 4
#> 7 2014-W17 f onset 4
#> 8 2014-W17 m onset 1
#> 9 2014-W18 f infection 7
#> 10 2014-W18 f onset 4
#> # ℹ 208 more rows
plot(grouped_inci, angle = 45, border_colour = "white")
# pre-aggregated example
covid <- covidregionaldataUK
(monthly_covid <-
    covid |>
    filter(!region %in% c("England", "Scotland", "Northern Ireland", "Wales")) |>
    incidence(
        date_index = "date",
        groups = "region",
        counts = c("cases_new", "deaths_new"),
        interval = "yearmonth"
    ))
#> # incidence: 324 x 4
#> # count vars: cases_new, deaths_new
#> # groups: region
#> date_index region count_variable count
#> * <yrmon> <chr> <fct> <dbl>
#> 1 2020-Jan East Midlands cases_new NA
#> 2 2020-Jan East Midlands deaths_new NA
#> 3 2020-Jan East of England cases_new NA
#> 4 2020-Jan East of England deaths_new NA
#> 5 2020-Jan London cases_new NA
#> 6 2020-Jan London deaths_new NA
#> 7 2020-Jan North East cases_new NA
#> 8 2020-Jan North East deaths_new NA
#> 9 2020-Jan North West cases_new NA
#> 10 2020-Jan North West deaths_new NA
#> # ℹ 314 more rows
# exclude deaths from plot due to scale
monthly_covid |>
    subset(count_variable == "cases_new") |>
    plot(nrow = 3, angle = 45, border_colour = "white")
#> Warning: Removed 26 rows containing missing values (`position_stack()`).
Where it may be preferable to use {grates} directly is for simpler aggregations over a single date_index, where you are not worried about the additional output formatting and the default print methods:
# e.g. for some this may be sufficient
ebola |>
    mutate(isoweek = as_isoweek(date_of_onset)) |>
    count(isoweek, gender) |>
    head(n = 10L)
#> isoweek gender n
#> 1 2014-W15 f 1
#> 2 2014-W16 m 1
#> 3 2014-W17 f 4
#> 4 2014-W17 m 1
#> 5 2014-W18 f 4
#> 6 2014-W19 f 9
#> 7 2014-W19 m 3
#> 8 2014-W20 f 7
#> 9 2014-W20 m 10
#> 10 2014-W21 f 8
# as opposed to
incidence(
    ebola,
    date_index = c(onset = "date_of_onset"),
    interval = "isoweek",
    groups = "gender"
)
#> # incidence: 109 x 4
#> # count vars: onset
#> # groups: gender
#> date_index gender count_variable count
#> * <isowk> <fct> <chr> <int>
#> 1 2014-W15 f onset 1
#> 2 2014-W16 m onset 1
#> 3 2014-W17 f onset 4
#> 4 2014-W17 m onset 1
#> 5 2014-W18 f onset 4
#> 6 2014-W19 f onset 9
#> 7 2014-W19 m onset 3
#> 8 2014-W20 f onset 7
#> 9 2014-W20 m onset 10
#> 10 2014-W21 f onset 8
#> # ℹ 99 more rows
For programmatic use the benefits are more apparent: knowledge of the object's invariants and structure makes it simple for developers to enable nice workflows such as
library(i2extras)
out <-
    ebola |>
    incidence(date_index = "date_of_onset", interval = "week", groups = "hospital") |>
    slice_head(n = 120L) |>
    fit_curve(model = "poisson", alpha = 0.05)
# plot with a prediction interval but not a confidence interval
plot(out, ci = FALSE, pi = TRUE, angle = 45, border_colour = "white")
# estimate growth rate
growth_rate(out)
#> # A tibble: 6 × 10
#> count_variable hospital model r r_lower r_upper growth_or_decay time
#> <chr> <fct> <lis> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 date_of_onset Connaught Ho… <glm> 0.197 0.177 0.217 doubling 3.53
#> 2 date_of_onset Military Hos… <glm> 0.173 0.147 0.200 doubling 4.00
#> 3 date_of_onset other <glm> 0.170 0.141 0.200 doubling 4.09
#> 4 date_of_onset Princess Chr… <glm> 0.142 0.101 0.188 doubling 4.87
#> 5 date_of_onset Rokupa Hospi… <glm> 0.178 0.133 0.228 doubling 3.89
#> 6 date_of_onset <NA> <glm> 0.184 0.164 0.205 doubling 3.77
#> # ℹ 2 more variables: time_lower <dbl>, time_upper <dbl>
Created on 2023-04-11 with reprex v2.0.2
From #77 (review):
We then have two ways to extract count data, groups or dates from an incidence2 object:
incidence().
Benefits accessor functions
The extra level of abstraction likely makes it more robust to possible future breaking changes. For example, even if a future version of incidence2 chose not to rename the columns and instead used a purely tag-based system, as in linelist, the accessors would likely absorb the breaking change under the hood and provide a stable interface.
On the other hand, we are already supposed to deal with breaking changes by pinning specific versions of our dependencies, as discussed in #69.
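The two ways of extracting counts can be sketched side by side. This is a minimal illustration on made-up data, not a recommendation from the maintainers. It assumes the `get_count_value_name()` and `get_group_names()` accessors exported by recent {incidence2} releases, and the output column name `count` seen in the printed objects in this thread. The toy `dat` data frame is hypothetical.

```r
library(incidence2)

# Hypothetical toy linelist (made-up data, for illustration only)
dat <- data.frame(
    date_of_onset = as.Date("2023-01-02") + c(0, 0, 1, 7, 8, 8),
    gender = c("f", "m", "f", "f", "m", "m")
)

inci <- incidence(
    dat,
    date_index = c(onset = "date_of_onset"),
    interval = "isoweek",
    groups = "gender"
)

# 1. Accessor route: ask the object for the name of its count column,
#    so the calling code never hard-codes "count"
count_col <- get_count_value_name(inci)
via_accessor <- inci[[count_col]]

# 2. Direct subsetting: relies on the column being called "count"
#    (an assumption about the current incidence2 output format)
via_column <- inci[["count"]]

# Both routes give the same vector for the current column layout
identical(via_accessor, via_column)

# The grouping columns can be looked up the same way
get_group_names(inci)
```

If a future release changed the column names, only the second route would need updating in downstream code; the first would keep working as long as the accessor itself stayed stable.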
Benefits direct subsetting
incidence2 class