4.1 Introduction
Temporal missingness occurs when a value is absent at a point in time. In regularly spaced data, which time series methods often assume, implicit missing values are relatively easy to spot because they leave gaps in the regular sequence of time points. These gaps can easily be converted to explicit missing values. In irregular temporal data there is no regular interval against which to detect gaps, so missings should be recorded explicitly in the raw data. Once missings are declared explicitly, their patterns can be explored and appropriate imputation methods applied. Missing value research has a long history, but little attention has been paid to temporal missings.
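The conversion from implicit to explicit missings in regularly spaced data can be sketched as follows. This is a minimal illustration, assuming the grid is defined by a known `start`, `end`, and positive `step`, and that explicit missings are encoded as `None`; `make_explicit` is a hypothetical helper, not a function from naniar or any other package.

```python
from datetime import date, timedelta


def make_explicit(observations, start, end, step):
    """Expand observations onto a regular time grid (hypothetical helper).

    observations: dict mapping each observed time point to its value.
    start, end, step: define the regular grid; step must be positive.
    Grid points with no observation become explicit missings, encoded as
    None. Observations that fall off the grid are not returned.
    """
    grid = []
    t = start
    while t <= end:
        # dict.get returns None for time points that were never recorded
        grid.append((t, observations.get(t)))
        t += step
    return grid


# Daily series in which 2020-01-03 was never recorded (an implicit missing)
obs = {
    date(2020, 1, 1): 5.0,
    date(2020, 1, 2): 6.5,
    date(2020, 1, 4): 7.1,
}
full = make_explicit(obs, date(2020, 1, 1), date(2020, 1, 4), timedelta(days=1))
for t, v in full:
    print(t, v)
# 2020-01-01 5.0
# 2020-01-02 6.5
# 2020-01-03 None
# 2020-01-04 7.1
```

Because the grid is generated from the interval rather than from the data, the gap at 2020-01-03 surfaces as an explicit `None` row. For irregular data no such grid exists, which is why missings there must be recorded at collection time.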
Little (1988) established a taxonomy of missing data mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). This is a probabilistic view of missingness: each mechanism posits a distribution that generates the missing values, and this assumption guides the choice of imputation method. A data-centric approach to missings is described by Unwin et al. (1996), who show how to explore missing value patterns with interactive graphics. D F Swayne and Buja (1998) illustrated how a shadow matrix, which records whether each value is missing, can be used to explore multivariate missings with interactive graphics. Cheng, Cook, and Hofmann (2015) provide a graphical user interface for exploring missing values in multivariate data using static plots. More recently, Tierney and Cook (2018) developed a collection of tidy tools in the R package naniar to facilitate transforming, visualizing, and imputing missing data.
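To make the shadow matrix idea concrete, the sketch below builds a 0/1 indicator matrix from a small table and summarizes it by column and by row pattern. It assumes missing cells are encoded as `None` in a list of rows; the helper names are hypothetical, and this is neither the original implementation of Swayne and Buja nor naniar's shadow representation.

```python
from collections import Counter


def shadow_matrix(data):
    """Return a 0/1 matrix with 1 wherever the cell is missing (None)."""
    return [[1 if value is None else 0 for value in row] for row in data]


def missing_per_column(shadow):
    """Count missing cells in each column of a shadow matrix."""
    return [sum(column) for column in zip(*shadow)]


def pattern_counts(shadow):
    """Count how many rows share each missingness pattern."""
    return Counter(tuple(row) for row in shadow)


# Three variables observed on four cases; None marks a missing value
data = [
    [1.2, None, 3.0],
    [None, None, 2.5],
    [0.7, 4.1, 3.3],
    [1.9, None, 2.8],
]
shadow = shadow_matrix(data)
print(shadow)                      # [[0, 1, 0], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
print(missing_per_column(shadow))  # [1, 3, 0]
print(pattern_counts(shadow).most_common(1))  # [((0, 1, 0), 2)]
```

Keeping the indicators alongside the data, rather than discarding incomplete rows, is what allows missingness patterns to be plotted and linked to the observed values.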
In contrast to multivariate data, temporal data has a time dimension that must be explored to understand the temporal dynamics of missing values. Little work has been done in this area. Gschwandtner et al. (2012) provide a taxonomy of time-oriented data quality problems arising from single and multiple sources. This work is accompanied by an interactive visual system, TimeCleanser (Gschwandtner et al. 2014), for assessing the quality of time-oriented data, which facilitates cleaning data recorded in different time formats. Within that system, missing values are treated as one component of data quality. The R package imputeTS (Moritz and Bartz-Beielstein 2017) provides time series imputation methods, such as temporal interpolation and Kalman smoothing (Welch, Bishop, and others 2006), along with a few graphical methods for summarizing missing values in time series. None of the existing work fully addresses the problem of handling temporal missing data. Better data structures and visualization methods are needed to explore temporal missing data, to understand its temporal dynamics, and to prepare it for imputation and subsequent modeling.
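As an illustration of the simplest form of temporal interpolation, the sketch below fills interior gaps in an equally spaced series by drawing a straight line between the nearest observed neighbours. It is a plain-Python sketch assuming `None` marks a missing value and equal spacing between time points; `interpolate_linear` is a hypothetical helper and is not imputeTS's implementation or API.

```python
def interpolate_linear(values):
    """Fill interior runs of None by straight-line interpolation.

    Assumes the values are equally spaced in time. Leading and trailing
    missings are left as None, since they lack an observed neighbour on
    one side.
    """
    result = list(values)
    observed = [i for i, v in enumerate(result) if v is not None]
    # Walk over each pair of consecutive observed positions
    for left, right in zip(observed, observed[1:]):
        gap = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / gap
            result[i] = result[left] + fraction * (result[right] - result[left])
    return result


print(interpolate_linear([None, 2.0, None, 6.0, None]))
# [None, 2.0, 4.0, 6.0, None]
```

The method uses only the ordering of observations in time, which is exactly the information a purely multivariate treatment of missings ignores; it also shows why the position and length of gaps matter before choosing an imputation method.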
The paper is organized as follows. Section 4.2 outlines four categories of temporal missing patterns. Section 4.3 proposes a new vector class for encoding missing values in time, which is coupled with visual tools in Section 4.4. A new suite of polishing techniques for handling missings in large collections of series is discussed in Section 4.5. Section 4.6 presents applications illustrating the new techniques, and Section 4.7 concludes the paper.