
incidence() calculates the incidence of different events across specified time periods and groupings.

Usage

incidence(
  x,
  date_index,
  groups = NULL,
  counts = NULL,
  count_names_to = "count_variable",
  count_values_to = "count",
  date_names_to = "date_index",
  rm_na_dates = TRUE,
  interval = NULL,
  offset = NULL,
  complete_dates = FALSE,
  fill = 0L,
  ...
)

Arguments

x

A data frame object representing a linelist or pre-aggregated dataset.

date_index

character.

The time index(es) of the given data.

This should be the name(s) corresponding to the desired date column(s) in x.

A named vector can be used for convenient relabelling of the resultant output.

Multiple indices only make sense when x is a linelist.

groups

character.

An optional vector giving the names of the columns in x by which the incidence should be grouped.

A named vector can be used for convenient relabelling of the resultant output.
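The relabelling can be pictured with a minimal base-R sketch. It assumes that, for groups, each name in the vector becomes the output column name and that unnamed entries keep their original name. The helper output_labels() and the toy data frame are hypothetical and not part of incidence2.

```r
# Hypothetical helper: turn a (possibly partially) named vector of column
# names into output labels. Unnamed entries keep their original name.
output_labels <- function(cols) {
  labels <- names(cols)
  if (is.null(labels)) labels <- cols
  unnamed <- labels == ""
  labels[unnamed] <- cols[unnamed]
  unname(labels)
}

df <- data.frame(gender = c("f", "m"), hospital = c("A", "B"))
groups <- c(sex = "gender", "hospital")

# Select the requested columns by their original names, then relabel them.
out <- df[, unname(groups), drop = FALSE]
names(out) <- output_labels(groups)
names(out)  # c("sex", "hospital")
```

Under that assumption, groups = c(sex = "gender") would report the grouping column as sex rather than gender.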

counts

character.

The count variables of the given data. If NULL (default), the data are taken to be a linelist of individual observations.

A named vector can be used for convenient relabelling of the resultant output.
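To make the linelist versus pre-aggregated distinction concrete, here is a base-R sketch using hypothetical toy data (it does not call incidence2). When counts is NULL every row contributes a count of one, so aggregating a linelist by date gives the same numbers that a pre-aggregated dataset would already store in a count column, here a hypothetical column named cases.

```r
# Hypothetical linelist: one row per individual observation.
linelist <- data.frame(
  onset = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02", "2024-01-04"))
)

# With counts = NULL, each row contributes a count of 1 to its date.
from_linelist <- aggregate(
  list(count = rep(1L, nrow(linelist))),
  by = list(date_index = linelist$onset),
  FUN = sum
)

# The same information, pre-aggregated: counts already live in a column.
pre_aggregated <- data.frame(
  date_index = as.Date(c("2024-01-01", "2024-01-02", "2024-01-04")),
  cases = c(2L, 1L, 1L)
)

from_linelist
```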

count_names_to

character.

The name of the column to create for storing the count variable names, provided that counts is not NULL.

count_values_to

character.

The name of the column to store the resultant count values in.

date_names_to

character.

The name of the column to store the date variables in.
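For pre-aggregated data with several count columns, the three naming arguments above can be read as describing a wide-to-long reshape. The sketch below assumes that interpretation. stack_counts() and the example columns cases and deaths are hypothetical; its defaults simply mirror the defaults shown in the Usage section.

```r
# Hypothetical pre-aggregated data with two count columns.
wide <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-02")),
  cases = c(5L, 3L),
  deaths = c(1L, 0L)
)

# Hypothetical helper (not part of incidence2): stack the count columns so
# that their names go in one column and their values in another.
stack_counts <- function(x, date_col, counts,
                         count_names_to = "count_variable",
                         count_values_to = "count",
                         date_names_to = "date_index") {
  pieces <- lapply(counts, function(col) {
    out <- data.frame(x[[date_col]], col, x[[col]], stringsAsFactors = FALSE)
    names(out) <- c(date_names_to, count_names_to, count_values_to)
    out
  })
  do.call(rbind, pieces)
}

long <- stack_counts(wide, "date", c("cases", "deaths"))
long
```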

rm_na_dates

bool.

Should NA dates be removed prior to aggregation?
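A small base-R illustration of what removing NA dates before aggregation changes, using table() on a hypothetical vector of onset dates. Treating table(..., useNA = "ifany") as an analogue of rm_na_dates = FALSE is an assumption made for illustration only: the missing dates then form their own NA row instead of disappearing.

```r
onset <- as.Date(c("2024-01-01", NA, "2024-01-01", "2024-01-03"))

# Analogue of rm_na_dates = TRUE: drop missing dates before counting.
kept <- onset[!is.na(onset)]
counts_without_na <- table(kept)

# Analogue of rm_na_dates = FALSE: missing dates are counted as an NA row.
counts_with_na <- table(onset, useNA = "ifany")

counts_without_na
counts_with_na
```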

interval

An optional scalar integer or string indicating the (fixed) size of the time interval to use for computing the incidence.

Defaults to NULL in which case the date_index columns are left unchanged.

Numeric values are coerced to integer and treated as a number of days to group.

Text strings can be one of:

* day or daily
* week(s) or weekly
* epiweek(s)
* isoweek(s)
* month(s) or monthly
* yearmonth(s)
* quarter(s) or quarterly
* yearquarter(s)
* year(s) or yearly

More details can be found in the "Interval specification" section.
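A minimal base-R sketch of what a numeric interval means, assuming periods of n days counted from the Unix epoch with no offset and labelled by their first day. n_day_period() is hypothetical and only approximates the grates classes that incidence() actually uses. Calendar-based strings such as "month" or "year" are not fixed-length, so for comparison the sketch just labels the same dates with format().

```r
# Hypothetical helper: map dates onto fixed n-day periods counted from
# 1970-01-01 (offset 0) and return each period's first day.
n_day_period <- function(dates, n) {
  n <- as.integer(n)                             # numeric intervals -> integer
  days <- as.integer(floor(as.numeric(dates)))   # whole days since the epoch
  as.Date((days %/% n) * n, origin = "1970-01-01")
}

dates <- as.Date(c("1970-01-01", "1970-01-06", "1970-01-08", "1970-01-15"))
n_day_period(dates, 7)  # 1970-01-01, 1970-01-01, 1970-01-08, 1970-01-15

# Rough analogues of calendar intervals: label each date by its unit.
format(dates, "%Y-%m")  # "month"
format(dates, "%Y")     # "year"
```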

offset

Only applicable when interval is not NULL.

An optional scalar integer or date indicating the value you wish to start counting periods from relative to the Unix Epoch:

  • Default value of NULL corresponds to 0L.

  • For other integer values, the offset is stored modulo n, where n is the interval size (offset <- as.integer(offset) %% n).

  • For date values, the offset is first converted to an integer (offset <- floor(as.numeric(offset))) and then reduced modulo n as above.
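The offset rules above can be traced with a short base-R sketch. normalise_offset() is a hypothetical helper that reproduces the three listed cases, taking n to be the interval size in days; it is not the grates or incidence2 implementation. The second half assumes a period rule of ((days - offset) %/% n) * n + offset to show why an offset is useful: 1970-01-05 was a Monday, so using it as the offset makes 7-day periods start on Mondays.

```r
# Hypothetical helper (not the grates/incidence2 code) following the rules above.
normalise_offset <- function(offset, n) {
  n <- as.integer(n)
  if (is.null(offset)) {
    return(0L)                               # NULL corresponds to 0L
  }
  if (inherits(offset, "Date")) {
    offset <- floor(as.numeric(offset))      # date -> whole days since 1970-01-01
  }
  as.integer(offset) %% n                    # reduce modulo the interval size
}

normalise_offset(NULL, 7)                    # 0
normalise_offset(10L, 7)                     # 3
normalise_offset(as.Date("1970-01-05"), 7)   # 4

# Assumed period rule: a period starts at ((days - offset) %/% n) * n + offset.
off <- normalise_offset(as.Date("1970-01-05"), 7)  # 1970-01-05 was a Monday
dates <- as.Date(c("2024-01-01", "2024-01-03", "2024-01-08"))
days <- as.integer(dates)
starts <- as.Date(((days - off) %/% 7L) * 7L + off, origin = "1970-01-01")
starts                                       # each period starts on a Monday
```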

complete_dates

bool.

Should the resulting object have the same range of dates for each grouping?

Missing counts will be filled with 0L unless the fill argument is provided (and this value will take precedence).

Will attempt to use function(x) seq(min(x), max(x), by = 1) on the resultant date_index column to generate a complete sequence of dates.

More flexible completion is possible by using the complete_dates() function.
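The completion step can be sketched in base R as follows. complete_by_group() and the toy data are hypothetical and handle a single group column only. The sketch builds the full date sequence with seq(min(x), max(x), by = 1), as described above, crosses it with every group, and fills the resulting gaps with fill (0L unless overridden).

```r
# Hypothetical aggregated data: two groups observed on different dates.
inc <- data.frame(
  date_index = as.Date(c("2024-01-01", "2024-01-03", "2024-01-02")),
  group = c("a", "a", "b"),
  count = c(2L, 1L, 4L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: give every group the same complete run of dates.
complete_by_group <- function(x, fill = 0L) {
  all_dates <- seq(min(x$date_index), max(x$date_index), by = 1)
  grid <- expand.grid(
    date_index = all_dates,
    group = unique(x$group),
    stringsAsFactors = FALSE
  )
  out <- merge(grid, x, by = c("date_index", "group"), all.x = TRUE)
  out$count[is.na(out$count)] <- fill       # counts missing after completion
  out[order(out$group, out$date_index), ]
}

completed <- complete_by_group(inc)
completed
```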

fill

numeric.

Only applicable when complete_dates = TRUE.

The value to replace missing counts caused by completing dates.

If unset, this defaults to 0L.

...

Not currently used.

Value

A tibble with subclass incidence2.

Details

incidence2 objects are a subclass of data frame with some additional invariants. That is, an incidence2 object must:

  • have one column representing the date index (this does not need to be a date object but must have an inherent ordering over time);

  • have one column representing the count variable (i.e. what is being counted) and one column representing the associated count;

  • have zero or more columns representing groups;

  • not have duplicated rows with regard to the date and group variables.
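These invariants can be turned into a small validator. check_incidence_shape() below is a hypothetical base-R sketch, not a function exported by incidence2. It assumes the default output column names (date_index, count_variable, count) and, as an assumption beyond the text, treats the count-variable column as part of the uniqueness key, since one object can hold several count variables.

```r
# Hypothetical validator (not part of incidence2) for the invariants above.
check_incidence_shape <- function(x, groups = character(),
                                  date_col = "date_index",
                                  count_var_col = "count_variable",
                                  count_col = "count") {
  required <- c(date_col, count_var_col, count_col, groups)
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Rows must be unique with respect to date, count variable and groups.
  key <- x[, c(date_col, count_var_col, groups), drop = FALSE]
  if (anyDuplicated(key) > 0) {
    stop("duplicated rows for the date/count variable/group columns")
  }
  invisible(TRUE)
}

toy <- data.frame(
  date_index = as.Date(c("2024-01-01", "2024-01-02")),
  count_variable = "date_of_onset",
  count = c(1L, 2L)
)
check_incidence_shape(toy)  # passes silently
```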

Interval specification

Where interval is specified, incidence() predominantly uses the grates package to generate appropriate date groupings. The grouping used depends on the value of interval. This can be specified as either an integer value or a string corresponding to one of the classes:

For a "day" or "daily" interval, we provide a thin wrapper around as.Date() that ensures the underlying data are whole numbers and that time zones are respected. Note that additional arguments are not forwarded to as.Date(), so for greater flexibility users are advised to modify their input prior to calling incidence().
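One way such a wrapper might behave is sketched below. as_whole_day() is hypothetical and is not incidence2's internal code: it converts POSIXct values to Date in the vector's own time zone rather than defaulting to UTC, and drops fractional days from Date values. The America/New_York time zone is only an example and relies on the system time-zone database.

```r
# Hypothetical helper illustrating "whole days" and "time zones respected".
as_whole_day <- function(x) {
  if (inherits(x, "POSIXct")) {
    tz <- attr(x, "tzone")
    if (is.null(tz)) tz <- ""              # "" means the session's local zone
    return(as.Date(x, tz = tz[1]))         # convert in x's own zone, not UTC
  }
  if (inherits(x, "Date")) {
    # Date objects can carry fractional days; drop the fraction.
    return(as.Date(floor(as.numeric(x)), origin = "1970-01-01"))
  }
  stop("expected a Date or POSIXct vector")
}

late_evening <- as.POSIXct("2024-01-01 23:30:00", tz = "America/New_York")
as_whole_day(late_evening)                  # 2024-01-01 (UTC would give 2024-01-02)
as_whole_day(as.Date("2024-01-01") + 0.7)   # 2024-01-01
```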

See also

  • browseVignettes("grates") for more details on the grates object classes.

  • incidence_() for a version supporting tidy-select semantics in some arguments.

Examples

if (requireNamespace("outbreaks", quietly = TRUE)) {
    data(ebola_sim_clean, package = "outbreaks")
    dat <- ebola_sim_clean$linelist
    incidence(dat, "date_of_onset")
    incidence(dat, "date_of_onset", groups = c("gender", "hospital"))
}
#> # incidence:  2,535 x 5
#> # count vars: date_of_onset
#> # groups:     gender, hospital
#>    date_index gender hospital                               count_variable count
#>    <date>     <fct>  <fct>                                  <chr>          <int>
#>  1 2014-04-07 f      Military Hospital                      date_of_onset      1
#>  2 2014-04-15 m      Connaught Hospital                     date_of_onset      1
#>  3 2014-04-21 f      other                                  date_of_onset      1
#>  4 2014-04-21 m      other                                  date_of_onset      1
#>  5 2014-04-25 f      NA                                     date_of_onset      1
#>  6 2014-04-26 f      other                                  date_of_onset      1
#>  7 2014-04-27 f      NA                                     date_of_onset      1
#>  8 2014-05-01 f      Princess Christian Maternity Hospital… date_of_onset      1
#>  9 2014-05-01 f      Rokupa Hospital                        date_of_onset      1
#> 10 2014-05-03 f      Connaught Hospital                     date_of_onset      1
#> # ℹ 2,525 more rows