daytime
One of the major capabilities of the daytime package is the calculation
of meaningful descriptive metrics for circular (i.e., daytime) data. As
an illustrative scenario, let us assume we have a vector of timestamps
that we wish to summarize.
timestamps <- c(
"10:34:06", "11:53:23", "08:20:46", "12:16:05", "13:02:51",
"12:45:48", "10:15:58", "13:58:50", "12:13:54", "12:44:06",
"13:10:09", "12:07:28", "14:16:43", "10:26:01", "10:19:23",
"10:33:04", "11:14:41", "10:39:40", "09:41:24", "13:35:08",
"13:53:38", "11:13:16", "09:08:14", "11:47:55", "13:06:04",
"10:01:25", "10:40:07", "09:39:11", "08:54:12", "13:06:01",
"09:43:35", "13:00:51", "12:46:27", "12:19:23", "12:17:08",
"14:33:06", "12:03:18", "11:33:20", "10:27:35", "12:49:09",
"10:41:28", "09:30:26", "10:07:00", "10:52:15", "11:32:56",
"10:47:48", "10:42:18", "07:29:19", "10:20:52", "14:31:59",
"13:14:41", "09:24:25", "12:40:12", "13:24:54", "11:25:20",
"09:51:33", "12:06:54", "11:00:05", "10:44:36", "13:07:10",
"08:27:44", "08:43:09", "10:44:38", "14:29:52", "10:46:13"
)
We can obtain a visual summary by converting to a daytime object and
using the associated plot method.
timestamps <- daytime::as_daytime(
  timestamps, rational = TRUE, format = "%H:%M:%S"
)
plot(timestamps)
But how do we calculate meaningful summary values of central tendency and spread for circular data, akin to familiar metrics of mean and SD for non-circular data? That is the topic of this vignette.
The mean is well defined for circular data, and is nicely discussed
and illustrated in the article by Cremers & Klugkist (2018). It is
determined by finding the “mean direction” of the points on the circle.
In daytime, this is coded intuitively, as follows:
mean(timestamps)
#> [1] 686.9274
#> attr(,"x")
#> [1] "Circular: 10.6, 11.9, 8.3, 12.3, 13.0, 12.8, 10.3, 14.0, 12.2, 12.7, 13.2, 12.1, 14.3, 10.4, 10.3, 10.6, 11.2, 10.7, 9.7, 13.6, 13.9, 11.2, 9.1, 11.8, 13.1, 10.0, 10.7, 9.7, 8.9, 13.1, 9.7, 13.0, 12.8, 12.3, 12.3, 14.6, 12.1, 11.6, 10.5, 12.8, 10.7, 9.5, 10.1, 10.9, 11.5, 10.8, 10.7, 7.5, 10.3, 14.5, 13.2, 9.4, 12.7, 13.4, 11.4, 9.9, 12.1, 11.0, 10.7, 13.1, 8.5, 8.7, 10.7, 14.5, 10.8"
#> attr(,"rational")
#> [1] TRUE
#> attr(,"class")
#> [1] "daytime" "numeric"
This shows that the mean is roughly the 687th minute of the day. We can
represent this as a string using the time of day (tod) function:
daytime::tod(
  mean(timestamps)
)
#> [1] "11:26:55"
#> attr(,"x")
#> [1] "Circular: 10.6, 11.9, 8.3, 12.3, 13.0, 12.8, 10.3, 14.0, 12.2, 12.7, 13.2, 12.1, 14.3, 10.4, 10.3, 10.6, 11.2, 10.7, 9.7, 13.6, 13.9, 11.2, 9.1, 11.8, 13.1, 10.0, 10.7, 9.7, 8.9, 13.1, 9.7, 13.0, 12.8, 12.3, 12.3, 14.6, 12.1, 11.6, 10.5, 12.8, 10.7, 9.5, 10.1, 10.9, 11.5, 10.8, 10.7, 7.5, 10.3, 14.5, 13.2, 9.4, 12.7, 13.4, 11.4, 9.9, 12.1, 11.0, 10.7, 13.1, 8.5, 8.7, 10.7, 14.5, 10.8"
#> attr(,"rational")
#> [1] TRUE
Notably, this differs from the result we get when taking the non-circular mean:
daytime::tod(
  mean(
    as.numeric(timestamps)
  )
)
#> Warning: Setting `rational` to TRUE
#> [1] "11:26:47"
#> attr(,"x")
#> [1] 686.7872
#> attr(,"rational")
#> [1] TRUE
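The gap between the two means is small here because all of the timestamps fall in the middle of the day. As a rough illustration of the “mean direction” idea, the base R sketch below uses a hypothetical helper, `circular_mean_minutes()`, which is not part of daytime and is not claimed to match its internals. It converts minutes since midnight to angles, averages the sine and cosine components, and converts the resulting direction back to minutes.

```r
## Hypothetical helper for illustration only (not a daytime function):
## circular mean of times expressed as minutes since midnight
circular_mean_minutes <- function(minutes, period = 1440) {
  theta <- minutes / period * 2 * pi                     # time -> angle
  mean_theta <- atan2(mean(sin(theta)), mean(cos(theta))) # mean direction
  (mean_theta / (2 * pi) * period) %% period             # angle -> minutes
}

## Two times that straddle midnight: 23:00 and 02:00
late_night <- c(23 * 60, 2 * 60)

mean(late_night)                  # arithmetic mean: 750 minutes (12:30)
circular_mean_minutes(late_night) # circular mean: about 30 minutes (00:30)
```

When values straddle midnight, the arithmetic mean lands on the opposite side of the clock, whereas the circular mean stays between the two observations.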
Overall, the mean is straightforward to work with, and plotting it shows that it gives an intuitive picture of central tendency among the data points.
plot(timestamps)
mean_radians <-
  as.numeric(mean(timestamps)) /
  -1440 * ## Fraction of the circle (clockwise)
  2 * pi + ## Convert to radians
  (pi / 2) ## Rotate to match the clock
arrows(
x0 = 0, y0 = 0,
x1 = cos(mean_radians),
y1 = sin(mean_radians),
col = "#E66100",
lwd = 4.5
)
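The angle arithmetic in the plotting code can be checked in isolation. The sketch below wraps the same expression in a hypothetical helper, `minutes_to_plot_angle()`, which is not part of daytime. It assumes, as the comments in the plotting code suggest, a 24-hour clock face with midnight at the top and time advancing clockwise.

```r
## Hypothetical helper for illustration only (not a daytime function):
## convert minutes since midnight to a standard plotting angle in radians
minutes_to_plot_angle <- function(minutes) {
  fraction_of_day <- minutes / 1440   # share of the 24-hour circle
  -fraction_of_day * 2 * pi + pi / 2  # clockwise, rotated so midnight is up
}

minutes_to_plot_angle(0)   # midnight: pi / 2 (arrow points straight up)
minutes_to_plot_angle(360) # 06:00: 0 (arrow points right)
minutes_to_plot_angle(720) # noon: -pi / 2 (arrow points straight down)
```

The negative sign is needed because angles in `cos()` and `sin()` increase counterclockwise, while clock time advances clockwise.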
With circular data, it is harder to capture variability than central tendency. Several metrics exist to capture variability, but none are in the original units of measurement. We can circumvent this by inventing new metrics, but it is important to note that their descriptive utility may not correspond to statistical utility the way similar metrics (e.g., SD) do for non-circular data.
daytime provides several options for looking at variation in the data,
and this is accomplished using an object-oriented approach to the sd
function in R. When you call sd on a daytime object, the function
daytime:::sd.daytime is called. This function allows you to specify the
desired units (“min” or “hr”), and the desired method for calculation.
Using the type argument, you can choose from these three:
MSD: This is the default option, which stands for ‘mean shorter distance’. Essentially, we want to measure the mean distance between each of the individual data points and the overall circular mean. Each of the individual distances can be calculated going clockwise or counterclockwise around the circle, and the MSD method takes whichever option gives the shorter distance. Thus, the mean distance reflects the mean of the ‘shorter’ individual distances.
SRL: This stands for ‘scaled resultant length’. In Section 3.3 of Cremers & Klugkist (2018), they show that the “mean resultant length” is an indicator of variability. If all circular values are identical, there is no variability in the data, and the mean resultant length is 1. If data are evenly split on opposite sides of the circle, the data are maximally variable, and the mean resultant length is 0. The idea of the SRL metric is, by straightforward scaling, to map this range of variability onto the equivalent range in units of measure, so that no variability corresponds to 0 and maximal variability corresponds to 12 hours (720 minutes).
circular: This method simply invokes the sd method for circular data, based on code in the circular package. For more information, see ?circular::sd.circular.
sd(timestamps, units = "hr", type = "MSD")
#> [1] 1.404713
#> attr(,"x")
#> [1] 52.827356 26.455978 186.160689 49.155978 95.922644 78.872644
#> [7] 70.960689 151.905978 46.972644 77.172644 103.222644 40.539311
#> [13] 169.789311 60.910689 67.544022 53.860689 12.244022 47.260689
#> [19] 105.527356 128.205978 146.705978 13.660689 138.694022 20.989311
#> [25] 99.139311 85.510689 46.810689 107.744022 152.727356 99.089311
#> [31] 103.344022 93.922644 79.522644 52.455978 50.205978 186.172644
#> [37] 36.372644 6.405978 59.344022 82.222644 45.460689 116.494022
#> [43] 79.927356 34.677356 6.005978 39.127356 44.627356 237.610689
#> [49] 66.060689 185.055978 107.755978 122.510689 73.272644 117.972644
#> [55] 1.594022 95.377356 39.972644 26.844022 42.327356 100.239311
#> [61] 179.194022 163.777356 42.294022 182.939311 40.710689
#> attr(,"x")attr(,"rational")
#> [1] TRUE
#> attr(,"rational")
#> [1] TRUE
#> attr(,"class")
#> [1] "daytime" "numeric"
sd(timestamps, units = "min", type = "MSD")
#> [1] 84.2828
#> attr(,"x")
#> [1] 52.827356 26.455978 186.160689 49.155978 95.922644 78.872644
#> [7] 70.960689 151.905978 46.972644 77.172644 103.222644 40.539311
#> [13] 169.789311 60.910689 67.544022 53.860689 12.244022 47.260689
#> [19] 105.527356 128.205978 146.705978 13.660689 138.694022 20.989311
#> [25] 99.139311 85.510689 46.810689 107.744022 152.727356 99.089311
#> [31] 103.344022 93.922644 79.522644 52.455978 50.205978 186.172644
#> [37] 36.372644 6.405978 59.344022 82.222644 45.460689 116.494022
#> [43] 79.927356 34.677356 6.005978 39.127356 44.627356 237.610689
#> [49] 66.060689 185.055978 107.755978 122.510689 73.272644 117.972644
#> [55] 1.594022 95.377356 39.972644 26.844022 42.327356 100.239311
#> [61] 179.194022 163.777356 42.294022 182.939311 40.710689
#> attr(,"x")attr(,"rational")
#> [1] TRUE
#> attr(,"rational")
#> [1] TRUE
#> attr(,"class")
#> [1] "daytime" "numeric"
sd(timestamps, units = "hr", type = "SRL")
#> [1] 1.09163
sd(timestamps, units = "min", type = "SRL")
#> [1] 65.49778
## `units` argument is not relevant for the `circular` method
sd(timestamps, type = "circular")
#> [1] 2.008734
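To make the MSD and SRL descriptions concrete, here is a minimal base R sketch. All function names are hypothetical and this is not the daytime implementation. In particular, the SRL scaling used here, one minus the mean resultant length multiplied by half the 1440-minute day, is an assumption based on the verbal description above.

```r
## Hypothetical helpers for illustration only (not the daytime implementation).
## All inputs are minutes since midnight.
circular_mean_minutes <- function(minutes, period = 1440) {
  theta <- minutes / period * 2 * pi
  mean_theta <- atan2(mean(sin(theta)), mean(cos(theta)))
  (mean_theta / (2 * pi) * period) %% period
}

## Shorter of the clockwise and counterclockwise distances to `center`
shorter_distance <- function(minutes, center, period = 1440) {
  d <- abs(minutes - center) %% period
  pmin(d, period - d)
}

## MSD: mean of the shorter distances to the circular mean
msd_sketch <- function(minutes, period = 1440) {
  center <- circular_mean_minutes(minutes, period)
  mean(shorter_distance(minutes, center, period))
}

## SRL (assumed scaling): (1 - mean resultant length) * half the period
srl_sketch <- function(minutes, period = 1440) {
  theta <- minutes / period * 2 * pi
  resultant_length <- sqrt(mean(cos(theta))^2 + mean(sin(theta))^2)
  (1 - resultant_length) * (period / 2)
}

msd_sketch(c(23 * 60, 2 * 60)) # 23:00 and 02:00: about 90 minutes
srl_sketch(rep(600, 5))        # identical values: about 0 minutes
srl_sketch(c(0, 720))          # opposite sides of the circle: about 720 minutes
```

Under these assumptions, both metrics are expressed in minutes and are bounded above by half a day, which is the farthest apart two times can be on a 24-hour circle.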
Through the PAutilities package, we can summarize mean and variability
together, as follows:
## The below code will only run if the PAutilities package is installed
if (requireNamespace("PAutilities", quietly = TRUE)) {
  PAutilities::mean_sd(
    timestamps, units = "hr", method = "SRL",
    digits = 1, nsmall = 1
  )
}
#>       mean       sd     sum_string
#> 1 686.9274 1.404713 11:26:55 ± 1.4
## The below code will only run if the PAutilities package is installed
if (requireNamespace("PAutilities", quietly = TRUE)) {
  PAutilities::mean_sd(
    timestamps, units = "min", method = "MSD",
    give_df = FALSE, digits = 1, nsmall = 1
  )
}
#> [1] "11:26:55 ± 84.3"
In this vignette, we have discussed how to calculate descriptive
metrics for circular (i.e., daytime) data. Although the descriptive
utility is not necessarily coupled with statistical utility the way it
is for non-circular data, these features of the daytime package are
nevertheless valuable for adding context to investigations of circular
data.