Descriptive Metrics With daytime

Paul R. Hibbing

Introduction

A core capability of the daytime package is calculating meaningful descriptive metrics for circular (i.e., daytime) data. As an illustrative scenario, suppose we have a vector of timestamps that we wish to summarize.


timestamps <- c(
  "10:34:06", "11:53:23", "08:20:46", "12:16:05", "13:02:51",
  "12:45:48", "10:15:58", "13:58:50", "12:13:54", "12:44:06",
  "13:10:09", "12:07:28", "14:16:43", "10:26:01", "10:19:23",
  "10:33:04", "11:14:41", "10:39:40", "09:41:24", "13:35:08",
  "13:53:38", "11:13:16", "09:08:14", "11:47:55", "13:06:04",
  "10:01:25", "10:40:07", "09:39:11", "08:54:12", "13:06:01",
  "09:43:35", "13:00:51", "12:46:27", "12:19:23", "12:17:08",
  "14:33:06", "12:03:18", "11:33:20", "10:27:35", "12:49:09",
  "10:41:28", "09:30:26", "10:07:00", "10:52:15", "11:32:56",
  "10:47:48", "10:42:18", "07:29:19", "10:20:52", "14:31:59",
  "13:14:41", "09:24:25", "12:40:12", "13:24:54", "11:25:20", 
  "09:51:33", "12:06:54", "11:00:05", "10:44:36", "13:07:10",
  "08:27:44", "08:43:09", "10:44:38", "14:29:52", "10:46:13"
)

We can obtain a visual summary by converting to a daytime object and using the associated plot method.


timestamps <- daytime::as_daytime(
  timestamps, rational = TRUE, format = "%H:%M:%S"
)

plot(timestamps)
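For intuition about the numbers behind a daytime object, the sketch below converts "HH:MM:SS" strings to minutes since midnight in base R. It is not how as_daytime works internally: hms_to_minutes is a hypothetical helper introduced only for illustration, and the minutes-since-midnight scale is an assumption based on the numeric output shown later in this vignette.

```r
## Hypothetical helper (not part of daytime): "HH:MM:SS" -> minutes since midnight
hms_to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(
    parts,
    function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)), ## hr, min, sec -> min
    numeric(1)
  )
}

hms_to_minutes(c("00:00:00", "10:34:06", "23:59:00"))
```

Under this assumption, the first timestamp in the example data (10:34:06) corresponds to 634.1 minutes after midnight.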

But how do we calculate meaningful summary values of central tendency and spread for circular data, akin to the familiar mean and standard deviation (SD) for non-circular data? That is the topic of this vignette.

Examining central tendency

The mean is well defined for circular data, and is nicely discussed and illustrated in the article by Cremers & Klugkist (2018). It is determined by finding the “mean direction” of the points on the circle. In daytime, this is coded intuitively, as follows:


mean(timestamps)
#> [1] 686.9274
#> attr(,"x")
#> [1] "Circular: 10.6, 11.9,  8.3, 12.3, 13.0, 12.8, 10.3, 14.0, 12.2, 12.7, 13.2, 12.1, 14.3, 10.4, 10.3, 10.6, 11.2, 10.7,  9.7, 13.6, 13.9, 11.2,  9.1, 11.8, 13.1, 10.0, 10.7,  9.7,  8.9, 13.1,  9.7, 13.0, 12.8, 12.3, 12.3, 14.6, 12.1, 11.6, 10.5, 12.8, 10.7,  9.5, 10.1, 10.9, 11.5, 10.8, 10.7,  7.5, 10.3, 14.5, 13.2,  9.4, 12.7, 13.4, 11.4,  9.9, 12.1, 11.0, 10.7, 13.1,  8.5,  8.7, 10.7, 14.5, 10.8"
#> attr(,"rational")
#> [1] TRUE
#> attr(,"class")
#> [1] "daytime" "numeric"

This shows that the mean is roughly the 687th minute of the day. We can represent this as a string using the time of day (tod) function:


daytime::tod(
  mean(timestamps)
)
#> [1] "11:26:55"
#> attr(,"x")
#> [1] "Circular: 10.6, 11.9,  8.3, 12.3, 13.0, 12.8, 10.3, 14.0, 12.2, 12.7, 13.2, 12.1, 14.3, 10.4, 10.3, 10.6, 11.2, 10.7,  9.7, 13.6, 13.9, 11.2,  9.1, 11.8, 13.1, 10.0, 10.7,  9.7,  8.9, 13.1,  9.7, 13.0, 12.8, 12.3, 12.3, 14.6, 12.1, 11.6, 10.5, 12.8, 10.7,  9.5, 10.1, 10.9, 11.5, 10.8, 10.7,  7.5, 10.3, 14.5, 13.2,  9.4, 12.7, 13.4, 11.4,  9.9, 12.1, 11.0, 10.7, 13.1,  8.5,  8.7, 10.7, 14.5, 10.8"
#> attr(,"rational")
#> [1] TRUE

Notably, this differs slightly (by about 8 seconds) from the result we get when taking the non-circular mean:


daytime::tod(
  mean(
    as.numeric(timestamps)
  )
)
#> Warning: Setting `rational` to TRUE
#> [1] "11:26:47"
#> attr(,"x")
#> [1] 686.7872
#> attr(,"rational")
#> [1] TRUE
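The two means are close here because the example timestamps cluster far from midnight. To make the "mean direction" idea concrete, here is a minimal base-R sketch of the standard circular mean, applied to times that straddle midnight. The helper circular_mean_minutes is hypothetical and is not claimed to be how daytime computes its mean internally.

```r
## Hypothetical helper (not part of daytime): circular mean of minutes since midnight
circular_mean_minutes <- function(minutes, period = 1440) {
  theta <- 2 * pi * minutes / period  ## Minutes -> angle in radians
  ## Mean direction: angle of the averaged unit vectors
  mean_angle <- atan2(mean(sin(theta)), mean(cos(theta)))
  (mean_angle * period / (2 * pi)) %% period  ## Angle -> minutes of the day
}

late_night <- c(1380, 60)  ## 23:00 and 01:00

mean(late_night)                   ## Arithmetic mean: 720, i.e., noon
circular_mean_minutes(late_night)  ## Circular mean: midnight (0, or about 1440 after rounding error)
```

The arithmetic mean lands on the opposite side of the clock from both observations, whereas the circular mean falls between them.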

Overall, the circular mean is straightforward to work with, and overlaying it on the plot shows that it captures the central tendency of the data points intuitively.


plot(timestamps)

mean_radians <-
  as.numeric(mean(timestamps)) /
  -1440  * ## Fraction of the circle (clockwise)
  2 * pi + ## Convert to radians
  (pi / 2) ## Rotate to match the clock

arrows(
  x0 = 0, y0 = 0,
  x1 = cos(mean_radians),
  y1 = sin(mean_radians),
  col = "#E66100",
  lwd = 4.5
)
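The angle arithmetic used for the arrow can be wrapped in a small function, which makes it easier to check against known clock positions. This is only a sketch: minutes_to_plot_angle is a hypothetical helper, and it assumes (as the comments in the code above imply) a 24-hour clock face that runs clockwise with midnight at the top.

```r
## Hypothetical helper (not part of daytime): minutes since midnight -> plotting angle
minutes_to_plot_angle <- function(minutes) {
  ## Clockwise fraction of the circle, in radians, rotated so that 0 is at the top
  (pi / 2) - 2 * pi * minutes / 1440
}

angle_0600 <- minutes_to_plot_angle(360)  ## 06:00, a quarter of the way around

## Arrow tip on the unit circle: points right (x near 1, y near 0)
c(x = cos(angle_0600), y = sin(angle_0600))
```

The same function could supply the x1 and y1 coordinates for arrows when marking any other time of day on the plot.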

What about variability?

With circular data, it is harder to capture variability than central tendency. Several metrics exist to capture variability, but none are in the original units of measurement. We can circumvent this by inventing new metrics, but it is important to note that their descriptive utility may not correspond to statistical utility the way similar metrics (e.g., SD) do for non-circular data.
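As an illustration of why the standard metrics are not in clock units, the sketch below computes three textbook quantities from circular statistics: the mean resultant length, the circular variance (one minus the resultant length), and the circular standard deviation (the square root of minus two times the log of the resultant length). The helper circular_dispersion is hypothetical, and these definitions are not claimed to match what any daytime function computes.

```r
## Hypothetical helper (not part of daytime): textbook circular dispersion measures
circular_dispersion <- function(minutes, period = 1440) {
  theta <- 2 * pi * minutes / period
  ## Mean resultant length, capped at 1 to guard against rounding error
  R <- min(1, sqrt(mean(cos(theta))^2 + mean(sin(theta))^2))
  c(
    resultant_length = R,            ## 1 = identical times, near 0 = widely spread
    circular_variance = 1 - R,       ## Unitless, between 0 and 1
    circular_sd = sqrt(-2 * log(R))  ## In radians, not minutes or hours
  )
}

circular_dispersion(c(600, 660, 720))      ## Tightly clustered: resultant length near 1
circular_dispersion(c(0, 360, 720, 1080))  ## Evenly spread: resultant length near 0
```

None of the three values can be read directly as "plus or minus so many minutes", which is the gap the alternative metrics aim to fill.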

daytime provides several options for examining variation in the data, using an object-oriented approach to the sd function in R: when you call sd on a daytime object, the function daytime:::sd.daytime is called. This function lets you specify the desired units (“min” or “hr”) and, through the type argument, the desired method of calculation. Three types are available (“MSD”, “SRL”, and “circular”):


sd(timestamps, units = "hr", type = "MSD")
#> [1] 1.404713
#> attr(,"x")
#>  [1]  52.827356  26.455978 186.160689  49.155978  95.922644  78.872644
#>  [7]  70.960689 151.905978  46.972644  77.172644 103.222644  40.539311
#> [13] 169.789311  60.910689  67.544022  53.860689  12.244022  47.260689
#> [19] 105.527356 128.205978 146.705978  13.660689 138.694022  20.989311
#> [25]  99.139311  85.510689  46.810689 107.744022 152.727356  99.089311
#> [31] 103.344022  93.922644  79.522644  52.455978  50.205978 186.172644
#> [37]  36.372644   6.405978  59.344022  82.222644  45.460689 116.494022
#> [43]  79.927356  34.677356   6.005978  39.127356  44.627356 237.610689
#> [49]  66.060689 185.055978 107.755978 122.510689  73.272644 117.972644
#> [55]   1.594022  95.377356  39.972644  26.844022  42.327356 100.239311
#> [61] 179.194022 163.777356  42.294022 182.939311  40.710689
#> attr(,"x")attr(,"rational")
#> [1] TRUE
#> attr(,"rational")
#> [1] TRUE
#> attr(,"class")
#> [1] "daytime" "numeric"

sd(timestamps, units = "min", type = "MSD")
#> [1] 84.2828
#> attr(,"x")
#>  [1]  52.827356  26.455978 186.160689  49.155978  95.922644  78.872644
#>  [7]  70.960689 151.905978  46.972644  77.172644 103.222644  40.539311
#> [13] 169.789311  60.910689  67.544022  53.860689  12.244022  47.260689
#> [19] 105.527356 128.205978 146.705978  13.660689 138.694022  20.989311
#> [25]  99.139311  85.510689  46.810689 107.744022 152.727356  99.089311
#> [31] 103.344022  93.922644  79.522644  52.455978  50.205978 186.172644
#> [37]  36.372644   6.405978  59.344022  82.222644  45.460689 116.494022
#> [43]  79.927356  34.677356   6.005978  39.127356  44.627356 237.610689
#> [49]  66.060689 185.055978 107.755978 122.510689  73.272644 117.972644
#> [55]   1.594022  95.377356  39.972644  26.844022  42.327356 100.239311
#> [61] 179.194022 163.777356  42.294022 182.939311  40.710689
#> attr(,"x")attr(,"rational")
#> [1] TRUE
#> attr(,"rational")
#> [1] TRUE
#> attr(,"class")
#> [1] "daytime" "numeric"


sd(timestamps, units = "hr", type = "SRL")
#> [1] 1.09163

sd(timestamps, units = "min", type = "SRL")
#> [1] 65.49778


## `units` argument is not relevant for the `circular` method
sd(timestamps, type = "circular")
#> [1] 2.008734
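The values in the "x" attribute of the MSD output above look like each observation's shortest distance from the circular mean, in minutes. For example, 10:34:06 is 634.1 minutes after midnight, about 52.83 minutes from the mean of 686.9274. The reported MSD of 84.28 minutes appears consistent with the average of those distances. Assuming that reading is right, the idea can be sketched in base R as follows; circular_distance and mean_shortest_distance are hypothetical helpers, not daytime functions.

```r
## Hypothetical helper: shortest distance between two times of day,
## going either way around the clock
circular_distance <- function(a, b, period = 1440) {
  d <- abs(a - b) %% period
  pmin(d, period - d)
}

## Hypothetical helper: average shortest distance from a chosen center
mean_shortest_distance <- function(minutes, center, period = 1440) {
  mean(circular_distance(minutes, center, period))
}

circular_distance(1380, 60)         ## 23:00 vs 01:00: 120 minutes, not 1320
circular_distance(634.1, 686.9274)  ## First timestamp vs the mean: about 52.83
mean_shortest_distance(c(1380, 60), center = 0)  ## 60 minutes
```

Because the result stays in minutes (or hours, after dividing by 60), it can be reported alongside a time of day in a way the unitless metrics cannot.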

Examining mean and variability together

Through the PAutilities package, we can summarize mean and variability together, as follows:


## The below code will only run if the PAutilities package is installed
if (requireNamespace("PAutilities", quietly = TRUE)) {
  
  PAutilities::mean_sd(
    timestamps, units = "hr", method = "MSD",
    digits = 1, nsmall = 1
  )
  
}
#>       mean       sd     sum_string
#> 1 686.9274 1.404713 11:26:55 ± 1.4



## The below code will only run if the PAutilities package is installed
if (requireNamespace("PAutilities", quietly = TRUE)) {
  
  PAutilities::mean_sd(
    timestamps, units = "min", method = "MSD",
    give_df = FALSE, digits = 1, nsmall = 1
  )
  
}
#> [1] "11:26:55 ± 84.3"
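If PAutilities is not available, a similar summary string can be assembled by hand. The sketch below is a base-R approximation, not the PAutilities implementation: summary_string is a hypothetical helper, and truncating the mean to whole seconds is an assumption chosen to match the 11:26:55 shown above.

```r
## Hypothetical helper (not part of PAutilities): build a "mean ± SD" string
summary_string <- function(mean_minutes, sd_value, digits = 1) {
  total_sec <- floor(mean_minutes * 60)  ## Truncate to whole seconds
  clock <- sprintf(
    "%02d:%02d:%02d",
    as.integer(total_sec %/% 3600),         ## Hours
    as.integer((total_sec %% 3600) %/% 60), ## Minutes
    as.integer(total_sec %% 60)             ## Seconds
  )
  sd_text <- format(round(sd_value, digits), nsmall = digits)
  paste(clock, "\u00b1", sd_text)
}

summary_string(686.9274, 84.2828)
```

With the mean and MSD values reported earlier, this returns "11:26:55 ± 84.3", matching the string produced by mean_sd.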

Conclusion

In this vignette, we have discussed how to calculate descriptive metrics for circular (i.e., daytime) data. Although the descriptive utility is not necessarily coupled with statistical utility the way it is for non-circular data, these features of the daytime package are nevertheless valuable for adding context to investigations of circular data.