2.2 Creating a calendar display

2.2.1 Data transformation

The algorithm for transforming data to construct a calendar plot uses linear algebra, similar to that used in the glyph map displays for spatio-temporal data (Wickham et al. 2012). A year-long calendar requires cells for days, embedded in blocks corresponding to months, organized into a grid layout for the year. Each month can be captured with 35 (5 \(\times\) 7) cells, where by default the top left is Monday of week 1 and the bottom right is Sunday of week 5. These cells provide a micro canvas on which to plot the data. The first day of the month can fall on any day from Monday to Sunday, depending on the year of the calendar. Months differ in length, ranging from 28 to 31 days, and some months extend over six weeks; the convention for these months is to wrap the last few days up to the top row of the block. The notation for creating these cells is as follows:

  • \(k = 1, \dots , 7\) is the day of the week on which the first day of the month falls.
  • \(d = 28, 29, 30\) or \(31\) is the number of days in the month.
  • \((i, j)\) is the grid position, where \(1 \le i \le 5\) is the week within the month and \(1 \le j \le 7\) is the day of the week.
  • \(g = k, \dots,(k+d-1)\) indexes the day in the month, inside the 35 possible cells.

The grid position for any day in the month is given by

\[\begin{equation} \begin{aligned} i &= \lceil (g \text{ mod } 35) / 7\rceil, \\ j &= g \text{ mod } 7. \end{aligned} \tag{2.1} \end{equation}\]

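In both expressions a remainder of zero is read as the modulus itself, so that a Sunday gives \(j = 7\) and cell 35 gives \(i = 5\), consistent with the ranges of \(i\) and \(j\) above. Figure 2.4 illustrates this \((i,j)\) layout for a month where \(k=5\).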

Figure 2.4: Illustration of the indexing layout for cells in a month, where \(k\) is day of the week, \(g\) is day of the month, \((i, j)\) indicates grid position.
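The mapping from the cell index \(g\) to the grid position \((i, j)\) can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: the function name `grid_position` is hypothetical, and the sketch assumes a 1-based modulo so that a zero remainder is read as the last cell of a row or block.

```r
# Map a cell index g (counted from the Monday of week 1) to the grid
# position (i, j) inside a 5 x 7 month block.
# Assumption of this sketch: a 1-based modulo is used, so a zero remainder
# is read as the last cell (Sunday, or week 5) instead of 0.
grid_position <- function(g, n_rows = 5, n_cols = 7) {
  n_cells <- n_rows * n_cols
  wrapped <- (g - 1) %% n_cells + 1    # cells beyond 35 wrap to the top row
  data.frame(
    g = g,
    i = ceiling(wrapped / n_cols),     # week within the month
    j = (g - 1) %% n_cols + 1          # day of the week, 1 = Monday
  )
}

# A 31-day month that starts on a Friday (k = 5) occupies cells 5 to 35.
k <- 5
d <- 31
cells <- grid_position(seq(k, k + d - 1))
head(cells, 3)

# A 31-day month that starts on a Sunday (k = 7) needs cells 36 and 37,
# which wrap to the Monday and Tuesday of the top row.
grid_position(c(36, 37))
```

In this sketch the wrapped days reuse top-row cells that the month leaves empty, which is what keeps each block to five rows.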

To create the layout for a full year, let \((m, n)\) denote the position of a month in the plot, where \(1 \le m \le M\) and \(1 \le n \le N\), and let \(b\) denote the small amount of white space between months for visual separation. Figure 2.5 illustrates this layout where \(M = 3\) and \(N = 4\).

Figure 2.5: Illustration of the indexing layout for months of one year, where \(M\) and \(N\) indicate number of rows and columns, \(b\) is a space parameter separating month blocks.

Each cell forms a canvas on which to draw the data. Initialize the canvas to have limits \([0, 1]\) both horizontally and vertically. For the pedestrian sensor data, within each cell, hour is plotted horizontally and count is plotted vertically. Each variable is scaled to have values in \([0, 1]\), using the minimum and maximum of all the data values to be displayed, assuming fixed scales. Let \(h\) be the scaled hour, and \(c\) the scaled count.

Then the final points for making the calendar line plots of the pedestrian sensor data are given by:

\[\begin{equation} \begin{aligned} x &= j + (n - 1) \times 7 + (n - 1) \times b + h, \\ y &= i - (m - 1) \times 5 - (m - 1) \times b + c. \end{aligned} \tag{2.2} \end{equation}\]

Note that in the vertical direction the grid starts at the top left (Figure 2.4), which is why subtraction is performed. Within each cell, the starting position is the bottom left.
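As a rough illustration of how a scaled data point is placed on the yearly canvas, the base R sketch below combines the grid position, the month position and the scaled values. The function name `calendar_xy` and the default gap `b = 0.5` are hypothetical. The sketch also takes the week index \(i\) with a negative sign, an assumption made here so that later weeks are drawn lower on the canvas, in line with the top-left starting point.

```r
# Coordinates of a point on the yearly calendar canvas.
# i, j: week and week day within the month block
# m, n: row and column of the month block in the layout
# h, c: hour and count, already scaled to [0, 1]
# b   : gap between month blocks (0.5 is an arbitrary value for this sketch)
# Assumption of this sketch: i enters with a negative sign, so that week 2
# sits below week 1 when y increases upwards.
calendar_xy <- function(i, j, m, n, h, c, b = 0.5) {
  x <- j + (n - 1) * 7 + (n - 1) * b + h
  y <- -i - (m - 1) * 5 - (m - 1) * b + c
  data.frame(x = x, y = y)
}

# Midday (h = 0.5) at half the maximum count (c = 0.5), on a Wednesday
# (j = 3) of week 2, for the month in row 1, column 2 of the layout.
calendar_xy(i = 2, j = 3, m = 1, n = 2, h = 0.5, c = 0.5)
```

Because \(h\) and \(c\) lie in \([0, 1]\), every point stays inside its own cell, and the terms in \(b\) only shift whole month blocks apart.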

Figure 2.6: The calendar-based display of hourly foot traffic at Flagstaff Station using line glyphs. The disparities between week days, weekends and public holidays are immediately apparent. The arrangement of the data into a \(3 \times 4\) monthly grid represents all the traffic in 2016. Note that the algorithm wraps the last few days in the sixth week to the top row of each month block for a compact layout, which occurs in May and October.

Figure 2.6 shows the line glyphs framed in the monthly calendar over the year 2016. This is achieved by the frame_calendar function, which computes the coordinates on the calendar for the input data variables. These can then be plotted using the functions of the ggplot2 R package (H. Wickham, Chang, et al. 2018), so the full grammar of graphics can be applied.

In order to make calendar-based graphics more accessible and informative, reference lines dividing each cell and block as well as labels indicating week day and month are also computed before plot construction.

Regarding the monthly calendar, the major reference lines separate every month panel and the minor ones separate every cell, represented by the thick and thin lines in Figure 2.6, respectively. The major reference lines are placed surrounding every month block: for each \(m\), the vertical lines are determined by \(\min{(x)}\) and \(\max{(x)}\); for each \(n\), the horizontal lines are given by \(\min{(y)}\) and \(\max{(y)}\). The minor reference lines are only placed on the left side of every cell: for each \(i\), the vertical division is \(\min{(x)}\); for each \(j\), the horizontal is \(\min{(y)}\).

The month labels are placed at the top left of each month block, at \((\min{(x)}, \max{(y)})\) for every \((m, n)\). The week day labels are positioned uniformly along the bottom of the whole canvas, that is at \(\min{(y)}\), and centred horizontally in each cell (\(x / 2\)) for each \(j\).

2.2.2 Options

The algorithm has several optional parameters that modify the layout, direction of display, scales and plot size, and that switch to polar coordinates. These are available to the user as arguments of the function frame_calendar:

frame_calendar(data, x, y, date, calendar = "monthly", dir = "h", 
  sunday = FALSE, nrow = NULL, ncol = NULL, polar = FALSE, scale = "fixed", 
  width = 0.95, height = 0.95, margin = NULL)

It is assumed that the data is in tidy format (Wickham 2014), and that x and y are the variables to be mapped to the horizontal and vertical axes in each cell. For example, x is the time of day and y is the count (Figure 2.6). The date argument specifies the date variable used to construct the calendar layout.

The algorithm handles displaying a single month or several years. The arguments nrow and ncol specify the layout of multiple months. For some time frames, some arrangements may be more beneficial than others. For example, to display data for three years, setting nrow = 3 and ncol = 12 would show each year on a single row.
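A minimal usage sketch follows, assuming the sugrrants and ggplot2 packages are installed. The data frame `toy` and its columns `hour`, `count` and `date` are synthetic and purely illustrative; `prettify()` is the sugrrants helper that adds reference lines and labels to a plot built from `frame_calendar()` output.

```r
library(ggplot2)
library(sugrrants)

# Hypothetical hourly counts for every day of 2016 (synthetic values).
set.seed(2016)
days <- seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day")
toy <- data.frame(
  date = rep(days, each = 24),
  hour = rep(0:23, times = length(days))
)
toy$count <- rpois(nrow(toy), lambda = 50 + 40 * sin(pi * toy$hour / 23))

# Compute the calendar coordinates; the new columns are prefixed with a dot.
calendar_df <- frame_calendar(
  toy, x = hour, y = count, date = date,
  calendar = "monthly", nrow = 3, ncol = 4
)

# One line per day, drawn with the usual ggplot2 functions.
p <- ggplot(calendar_df, aes(x = .hour, y = .count, group = date)) +
  geom_line()

# Add the reference lines and the week day and month labels.
prettify(p)
```

Other arguments from the signature above, such as calendar, scale, polar or dir, can be changed in the same call while the plotting code stays the same.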

2.2.2.1 Layouts

The monthly calendar is the default, but two other formats, weekly and daily, are available with the calendar argument. The daily calendar arranges days along a row, one row per month. The weekly calendar stacks weeks of the year vertically, one row for each week, and one column for each day. The reader can scan down all the Mondays of the year, for example. The daily layout puts more emphasis on day of the month. The weekly calendar is appropriate if most of the variation can be characterized by days of the week. On the other hand, the daily calendar should be used when there is a yearly effect but not a weekly effect in the data (for example weather data). When both effects are present, the monthly calendar would be a better choice. Temporal patterns motivate which variant should be employed.

2.2.2.2 Polar transformation

When polar = TRUE, a polar transformation is carried out on the data. The computation is similar to the one described in Wickham et al. (2012). The result is a set of star glyphs embedded in the monthly calendar layout. Star glyphs are time series lines transformed into polar coordinates.
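The idea behind a star glyph can be sketched in base R as follows. This is a generic polar mapping written for illustration, not the package's exact computation: the function name `star_glyph`, the choice of time as angle (clockwise from 12 o'clock) and of value as radius, and the counts themselves are all assumptions of the sketch.

```r
# Turn a scaled time series (h and c in [0, 1]) into a star glyph that fits
# inside a unit cell centred at (0.5, 0.5).
# Assumption of this sketch: time maps to the angle and the value maps to
# the radius; the package may differ in detail.
star_glyph <- function(h, c) {
  theta <- 2 * pi * h
  data.frame(
    x = 0.5 + 0.5 * c * sin(theta),
    y = 0.5 + 0.5 * c * cos(theta)
  )
}

# Hypothetical hourly counts for one day with two commuter peaks.
hours <- 0:23
counts <- c(2, 1, 1, 1, 3, 10, 40, 90, 100, 60, 40, 45,
            55, 50, 45, 50, 70, 95, 80, 40, 20, 10, 6, 3)

# Dividing by 24 keeps hour 23 from landing on top of hour 0.
glyph <- star_glyph(h = hours / 24, c = counts / max(counts))
range(glyph$x)
```

The resulting x and y values lie in \([0, 1]\), so they can be positioned within a cell in the same way as the scaled hour and count of a line glyph.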

2.2.2.3 Scales

By default, global scaling is applied to the values in each plot, with the global minimum and maximum used to fit values into each cell. If the emphasis is on comparing trend rather than magnitude, it is useful to scale locally. For temporal data, local scaling can follow the temporal components. The choices include: free scale within each cell (free), cells derived from the same day of the week (free_wday), or cells from the same day of the month (free_mday). The scaling allows for comparisons of absolute or relative values, and emphasizes different temporal variations.

With local scaling, the overall variation gives way to the individual shape. Figure 2.7 shows the same data as Figure 2.6 scaled locally using scale = "free". The daily trends are magnified.

Figure 2.7: Line glyphs on the calendar format showing hourly foot traffic at Flagstaff Station, scaled locally within each day (scale = "free"). The individual shape on a single day becomes more distinctive; however, it is impossible to compare the size of peaks between days.

The free_wday option scales each week day together. It can be useful for comparing trends across week days, allowing relative patterns for weekends versus week days to be examined. Similarly, the free_mday option scales together the cells that share the same day of the month.
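The difference between the scaling options can be sketched with min-max scaling in base R. This is a sketch under stated assumptions rather than the package's code: the helper `rescale01`, the data frame `toy` and its values are hypothetical, and the column names simply mirror the option names.

```r
# Min-max scaling to [0, 1]; constant groups are mapped to 0 in this sketch.
rescale01 <- function(v) {
  rng <- range(v, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(v)))
  (v - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical counts at three hours on three days:
# Monday 4 January, Monday 11 January and Thursday 4 February 2016.
toy <- data.frame(
  date  = rep(as.Date(c("2016-01-04", "2016-01-11", "2016-02-04")), each = 3),
  hour  = rep(c(8, 12, 17), times = 3),
  count = c(100, 40, 80, 50, 20, 60, 10, 5, 8)
)
day  <- format(toy$date)         # one group per cell
wday <- format(toy$date, "%u")   # ISO day of the week, "1" = Monday
mday <- format(toy$date, "%d")   # day of the month

toy$fixed     <- rescale01(toy$count)                  # one scale for all cells
toy$free      <- ave(toy$count, day,  FUN = rescale01) # one scale per cell
toy$free_wday <- ave(toy$count, wday, FUN = rescale01) # per day of the week
toy$free_mday <- ave(toy$count, mday, FUN = rescale01) # per day of the month
toy
```

Under free, every day reaches the top of its cell, whereas under free_wday the quieter Monday stays below the busier one, so magnitudes remain comparable within a week day.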

2.2.2.4 Orientation

By default, grids are laid out horizontally. This can be transposed by setting the dir parameter to "v", in which case \(i\) and \(j\) are swapped in Equation (2.1). This can be useful for creating calendar layouts for countries where vertical layout is the convention.

2.2.2.5 Language support

Most countries have adopted this western calendar layout, but the languages used for week day and month labels differ across countries. Languages other than English are therefore also supported for text labelling.

2.2.3 Variations

2.2.3.1 Overlaying and faceting subsets

Plots can be layered. The comparison of sensors can be done by overlaying the values for each (Figure 2.8). Differences between the pedestrian patterns at these sensors can be seen. Flagstaff Station exhibits strong commuter patterns, with fewer pedestrian counts during the weekends and public holidays. This suggests that Flagstaff Station has limited functionality on non-work days. From Figure 2.8 it can be seen that Birrarung Marr has a distinct temporal pattern from the other two all year round. The nighttime events, such as White Night, have barely affected the operation of Flagstaff Station but heavily affected the incoming and outgoing traffic to the State Library and Birrarung Marr.

Figure 2.8: Overlaying line graphs of the three sensors in the monthly calendar. The three sensors demonstrate very different traffic patterns. Birrarung Marr tends to attract many pedestrians for special events held on weekends, in contrast to the bimodal commuting traffic at Flagstaff Station.

To avoid the overlapping problem, the calendar layout can be embedded into a series of subplots for the different sensors. Figure 2.9 presents the idea of faceting calendar plots. This allows the overall structure to be compared between sensors, while emphasizing individual sensor variation. In particular, it is immediately apparent that Birrarung Marr was busy during the Australian Open, a major international tennis tournament, in the last two weeks of January. This is concealed in conventional graphics.

Figure 2.9: Line charts, embedded in the \(6 \times 2\) monthly calendar, colored and faceted by the 3 sensors. The variations of an individual sensor are emphasised, and the shapes can be compared across the cells and sensors.

2.2.3.2 Different types of plots

The frame_calendar function is not constrained to line plots. The full range of plotting capabilities in ggplot2 is essentially available. Figure 2.10 shows a lag scatterplot at Flagstaff Station, where the lagged hourly count is assigned to the x argument and the current hourly count to the y argument. This figure is organized in the daily calendar layout. Figure 2.10 indicates two primary patterns: strong autocorrelation on weekends, and weaker autocorrelation on work days. At higher counts on week days, the next hour may see a substantial increase or decrease in counts, essentially revealing a bimodal distribution of consecutive counts, as supported by Figure 2.6.

Figure 2.10: Lag scatterplot in the daily calendar layout. Each hour’s count is plotted against previous hour’s count at Flagstaff Station to demonstrate the autocorrelation at lag 1. The correlation between them is more consistent on non-work days than work days.
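Preparing the lagged variable for such a plot can be sketched in base R. The data frame `toy`, its values and the helper `lag_one` are hypothetical. Lagging within each day, so that the first hour of a day is not paired with the last hour of the previous day, is a choice made for this sketch and not necessarily how the figure was produced.

```r
# Hypothetical hourly counts for two days (synthetic values).
toy <- data.frame(
  date  = rep(as.Date(c("2016-01-04", "2016-01-05")), each = 4),
  hour  = rep(c(7, 8, 9, 10), times = 2),
  count = c(30, 90, 60, 40, 35, 95, 55, 45)
)

# Shift the counts by one position; the first value has no predecessor.
lag_one <- function(v) c(NA, head(v, -1))

# Apply the lag separately within each day.
toy$lagged_count <- ave(toy$count, format(toy$date), FUN = lag_one)
toy
```

The columns lagged_count and count would then play the roles of the x and y arguments described above, with rows containing missing lags dropped or ignored at plotting time.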

The algorithm can also produce more complicated plots, such as boxplots. Figure 2.11 uses a loess smooth line (Cleveland 1979) superimposed on side-by-side boxplots. It shows the distribution of hourly counts across all 43 sensors during December. The last week of December is the holiday season: people are off work on the day before Christmas (December 24), go shopping on Boxing Day (December 26), and stay out for the fireworks on New Year’s Eve.

Figure 2.11: Side-by-side boxplots of hourly counts for all the 43 sensors in December 2016, with the loess smooth line superimposed on each day. It shows the hourly distribution in the city as a whole. There is one sensor attracting a larger number of people on New Year’s Eve than the rest.

Figure 2.12: The same plot as Figure 2.11, but with the month and week day labels in Chinese. It demonstrates the natural support for languages other than English.

2.2.3.3 Interactivity

Because frame_calendar is a data restructuring tool, calendar-based displays can easily be made interactive, as long as the interactive graphics system remains true to the spirit of the grammar of graphics, for example plotly (Sievert 2018) in R. As a standalone display, an interactive tooltip can be added to show labels when hovering over the calendar layout, for example the hourly count with the time of day. It is difficult to read exact values from the static display, but the tooltip makes this possible. Options in the frame_calendar function can be exposed as selection buttons or text inputs in a graphical user interface such as R shiny (Chang et al. 2018). The display then updates on the fly in response to clicks or text input.

Linking calendar displays to other types of charts is valuable for visually exploring the relationships between variables. An example can be found in the wanderer4melb shiny application (Wang 2018). The calendar most naturally serves as a tool for date selection: selecting and brushing the glyphs in the calendar highlights the elements for the corresponding dates in other time-based plots. Conversely, selecting on weather data plots linked to the calendar can help to assess whether very hot or cold days and heavy rain affect the number of people walking in downtown Melbourne. The linking between weather data and calendar display is achieved using the common dates.