Vautard and Ghil (1989; VG hereafter) recognized the formal similarity between classical lagged-covariance analysis and the method of delays. They exploited this similarity further by pointing out that a pair of SSA eigenmodes with nearly equal eigenvalues, whose associated (temporal) principal components are nearly in phase quadrature, can efficiently represent a nonlinear, anharmonic oscillation. This is because a single pair of data-adaptive eigenmodes can capture the basic periodicity of, say, a boxcar- or seesaw-shaped oscillation, rather than requiring the many overtones that would appear in methods with fixed basis functions.
There are three basic steps in SSA: (i) embedding the sampled time series in a vector space of dimension M; (ii) computing the M x M lag-covariance matrix C_D of the data; and (iii) diagonalizing C_D.
Step (i): The time series {x(t): t = 1, …, N} is embedded into a vector space of dimension M by considering M lagged copies {x(t-j): j = 1, …, M} thereof. The choice of the dimension M is not obvious, but SSA is typically successful at analyzing periods in the range (M/5, M).
Step (ii): One defines the M x M lag-covariance matrix estimator C_D. Three distinct methods are in wide use. In the BK (Broomhead and King) algorithm, a window of length M is moved along the time series, producing a sequence of N' = N - M + 1 vectors in the embedding space. This sequence is used to form the N' x M trajectory matrix D, whose i-th row is the i-th view of the time series through the window. In this approach, C_D is defined by

C_D = (1/N') D^T D.
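For concreteness, a minimal numpy sketch of the BK estimate follows; the function name bk_covariance is ours, and the code illustrates the formula above rather than kSpectra's actual implementation.

import numpy as np

def bk_covariance(x, M):
    """BK estimate: C_D = (1/N') D^T D, with D the N' x M trajectory matrix."""
    x = np.asarray(x, dtype=float)
    Np = len(x) - M + 1                                   # N' = N - M + 1
    D = np.column_stack([x[j:j + Np] for j in range(M)])  # row i: i-th windowed view
    return D.T @ D / Np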
In the VG algorithm, C_D is estimated directly from the data as a Toeplitz matrix with constant diagonals, i.e., its entries c_ij depend only on the lag k = |i - j|:

c_ij = (1/(N - k)) Σ_{t=1}^{N-k} x(t) x(t + k).
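A matching sketch of the VG estimator (again with an illustrative name) computes the lag covariances and arranges them in Toeplitz form.

import numpy as np
from scipy.linalg import toeplitz

def vg_covariance(x, M):
    """VG estimate: entry (i, j) is the lag covariance at lag k = |i - j|."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    c = np.array([x[:N - k] @ x[k:] / (N - k) for k in range(M)])
    return toeplitz(c)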
Burg covariance estimation is an iterative process based on fitting an AR model with a number of AR components equal to the SSA window length. Both the Burg and VG methods impose a Toeplitz structure upon the autocovariance matrix, whereas the BK method does not. The Burg approach should, in principle, involve less "power leakage" due to the finite length of the time series and should therefore improve resolution. However, the Burg estimate can induce significant biases when nonstationarities and very low-frequency variations are present in the series; in such cases it is worthwhile to try the VG method. Also, for long series (N on the order of 5,000), the VG estimate is less computationally burdensome and thus completes more quickly.
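One way to realize a Burg-type estimate in Python is sketched below, assuming the statsmodels package is available: its burg routine fits the AR coefficients, and converting the fitted model to autocovariances via arma_acovf is our illustrative choice, not necessarily the toolkit's.

import numpy as np
from scipy.linalg import toeplitz
from statsmodels.regression.linear_model import burg
from statsmodels.tsa.arima_process import arma_acovf

def burg_covariance(x, M):
    """Fit an AR(M) model by Burg's method, then use the model's theoretical
    autocovariances at lags 0, ..., M-1 as the Toeplitz diagonals."""
    rho, sigma2 = burg(np.asarray(x, dtype=float), order=M, demean=True)
    ar = np.r_[1.0, -rho]                             # lag polynomial: 1 - rho_1 z - ...
    c = arma_acovf(ar, np.ones(1), nobs=M) * sigma2   # acovf is linear in sigma2
    return toeplitz(c)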
In practice, the VG method is more prone than the Burg method to numerical instabilities (yielding negative eigenvalues) when pure oscillations are analyzed. The BK method is somewhat less prone to problems with nonstationary time series, although the VG method seems untroubled by all but the most extreme nonstationarities. The Toeplitz methods do, however, appear to yield more stable results under minor perturbations of the time series.
The Burg, BK, and VG algorithms are all implemented in the kSpectra Toolkit.
Step (iii): The covariance matrix calculated from the N sample points is then diagonalized, and the eigenvalues are ranked in decreasing order. Each eigenvalue λ_k (k = 1, …, M) gives the variance of the time series in the direction specified by the corresponding eigenvector E_k; the square roots of the eigenvalues are called singular values, and the corresponding set is the singular spectrum. These terms give SSA its name; BK showed how to obtain the singular spectrum by applying singular value decomposition (SVD) to the trajectory matrix D. VG called the E_k's temporal empirical orthogonal functions (EOFs), by analogy with the meteorological term used when PCA is applied in the spatial domain.
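In code, step (iii) reduces to a symmetric eigendecomposition; the sketch below (illustrative names) returns the eigenvalues in decreasing order together with the matching temporal EOFs.

import numpy as np

def ssa_spectrum(C):
    """Diagonalize the symmetric lag-covariance matrix C; return the eigenvalues
    (variances) in decreasing order and the EOFs as the columns of E."""
    lam, E = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    return lam[order], E[:, order]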
When the singular values are plotted in decreasing order, one can often distinguish an initial steep slope, representing the signal, and a (more or less) "flat floor," representing the noise level. To separate signal from noise correctly, this visual inspection of the spectral plot must be confirmed by suitable criteria for statistical significance, as discussed later.
Once these three steps have been completed, a number of other useful results can be obtained. For each EOF E_k with components {E_k(j): j = 1, …, M}, we can construct the time series of length N' given by

a_k(t) = Σ_{j=1}^{M} x(t + j - 1) E_k(j),   1 ≤ t ≤ N',
called the k-th principal component (PC). It represents the projection of the original time series onto the k-th EOF. The sum of the power spectra of the PCs is identical to the power spectrum of the time series x(t); therefore we can study separately the spectral contribution of the various components. The PCs, however, have length N', not N, and do not contain phase information.
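An illustrative sketch of this projection: each of the N' windowed views of the series is projected onto the EOFs, so all PCs are obtained in one matrix product.

import numpy as np

def principal_components(x, E):
    """a_k(t) = sum_{j=1..M} x(t+j-1) E_k(j); column k of the result is the k-th PC."""
    x = np.asarray(x, dtype=float)
    M = E.shape[0]
    Np = len(x) - M + 1
    D = np.column_stack([x[j:j + Np] for j in range(M)])  # trajectory matrix
    return D @ E                                          # shape (N', M)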
In order to extract time series of length N corresponding to a chosen subset K of eigenelements, the associated PCs are combined to form the partial reconstruction R_K(t) of the original time series x(t). Over the central part of the time series (M ≤ t ≤ N'),

R_K(t) = (1/M) Σ_{k∈K} Σ_{j=1}^{M} a_k(t - j + 1) E_k(j),

and near either endpoint the inner sum runs only over the lags j for which 1 ≤ t - j + 1 ≤ N', with the factor 1/M replaced by one over the number of terms actually summed.
These series of length N are called reconstructed components (RCs). They have the important property of preserving the phase of the time series; therefore x(t) and R_K(t) can be superimposed. When K contains a single mode E_k, R_K(t) is called the k-th RC. The time series can be reconstructed completely, without losing any information, by summing all of its RCs. When a linear trend is present, reconstruction based on the BK algorithm has certain advantages near the endpoints of the sample.
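A sketch of the reconstruction, with the endpoint normalization handled explicitly (names illustrative; K holds zero-based column indices here). Summing the RCs over all K = {0, …, M-1} recovers x(t) to machine precision, which is a useful sanity check.

import numpy as np

def reconstruct(A, E, K):
    """Diagonal averaging: combine the PCs (columns of A) with the EOFs
    (columns of E) over the index set K to form the length-N series R_K(t)."""
    Np, M = A.shape[0], E.shape[0]
    N = Np + M - 1
    R = np.zeros(N)
    count = np.zeros(N)                       # M_t: number of terms available at time t
    for j in range(M):
        count[j:j + Np] += 1.0
        for k in K:
            R[j:j + Np] += A[:, k] * E[j, k]  # accumulate a_k(t-j+1) E_k(j) per t
    return R / count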
The correct partial reconstruction of the signal, i.e., the reconstruction over the correct set K = {1, 2, …, S} of EOFs, yields the optimal signal-to-noise (S/N) enhancement with respect to white noise. MEM spectral estimation works very well on this clean signal, which, unlike the raw time series, is band-limited. As a result, a low-order AR-process fit can yield good spectral resolution without spurious peaks.
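For illustration, such a low-order MEM estimate can be written directly from the standard AR spectral formula; the Burg fit below again assumes statsmodels.

import numpy as np
from statsmodels.regression.linear_model import burg

def mem_spectrum(r, p, nfreq=512):
    """AR(p) (maximum-entropy) spectral estimate of the signal r:
    P(f) = sigma2 / |1 - sum_m phi_m exp(-2 pi i f m)|^2."""
    phi, sigma2 = burg(np.asarray(r, dtype=float), order=p, demean=True)
    f = np.linspace(0.0, 0.5, nfreq)          # frequency in cycles per sample
    ejw = np.exp(-2j * np.pi * np.outer(f, np.arange(1, p + 1)))
    return f, sigma2 / np.abs(1.0 - ejw @ phi) ** 2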
In order to perform a good reconstruction of the signal or of its oscillatory part(s), several methods - either heuristic or based on Monte Carlo ideas - can be used for S/N separation and for the reliable identification of oscillatory pairs and trends; these are included in the kSpectra Toolkit.
Monte Carlo SSA (MC-SSA) can be used to establish whether a given time series is linearly distinguishable from any well-defined process, including the output of a deterministic chaotic system, but we will focus on testing against the linear stochastic processes normally considered as "noise". "Red noise" is often used to refer to any linear stochastic process in which power declines monotonically with increasing frequency, but we prefer to use the term to refer specifically to a first-order autoregressive, or AR(1), process, whose value at time t depends only on its value at time t - 1:

x(t) = a1 [x(t - 1) - x0] + ξ(t) + x0,
where ξ(t) is a Gaussian-distributed white-noise process with variance σ², x0 is the process mean, and a1 is a constant coefficient.
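An illustrative surrogate generator for this process; the initial value is drawn from the stationary distribution, whose variance is σ²/(1 - a1²).

import numpy as np

def ar1_surrogate(N, a1, sigma, x0=0.0, rng=None):
    """One realization of x(t) = a1 [x(t-1) - x0] + xi(t) + x0,
    with xi(t) ~ N(0, sigma^2)."""
    rng = np.random.default_rng(rng)
    x = np.empty(N)
    x[0] = x0 + rng.normal(0.0, sigma / np.sqrt(1.0 - a1 ** 2))
    for t in range(1, N):
        x[t] = x0 + a1 * (x[t - 1] - x0) + rng.normal(0.0, sigma)
    return x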
When testing against a red-noise null hypothesis, the first step in MC-SSA is to determine the red-noise coefficients σ and a1 from the time series x(t) using a maximum-likelihood criterion. Based on these coefficients, an ensemble of surrogate red-noise series is generated and, for each realization, a covariance matrix C_R is computed. These covariance matrices are then projected onto the eigenvector basis E_X of the original data:

Λ_R = E_X^T C_R E_X.
The resulting matrix Λ_R is not necessarily diagonal, but it measures the resemblance of a given surrogate set to the data set. This resemblance can be quantified by computing the statistics of the diagonal elements of Λ_R. The statistical distribution of these elements, determined from the ensemble of Monte Carlo simulations, gives confidence intervals outside which a time series can be considered significantly different from a generic red-noise realization. For instance, if an eigenvalue λ_k lies outside the 95% noise percentile, then the red-noise null hypothesis for the associated EOF (and PC) can be rejected with this confidence. Otherwise, that particular SSA component of the time series cannot be considered significantly different from red noise.
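Putting the pieces together, a bare-bones MC-SSA sketch, reusing the vg_covariance, ssa_spectrum, and ar1_surrogate sketches above (ensemble size and percentile are illustrative):

import numpy as np

def mc_ssa(x, M, a1, sigma, n_surr=1000, pct=97.5, seed=0):
    """Project each surrogate covariance C_R onto the data EOF basis E_X and
    compare the data eigenvalues with percentiles of diag(Lambda_R)."""
    x = np.asarray(x, dtype=float)
    lam_data, E = ssa_spectrum(vg_covariance(x, M))   # data eigenbasis E_X
    rng = np.random.default_rng(seed)
    diag_proj = np.empty((n_surr, M))
    for i in range(n_surr):
        s = ar1_surrogate(len(x), a1, sigma, x0=x.mean(), rng=rng)
        diag_proj[i] = np.diag(E.T @ vg_covariance(s, M) @ E)  # diag of Lambda_R
    upper = np.percentile(diag_proj, pct, axis=0)
    return lam_data, upper, lam_data > upper          # True where noise is rejected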
Special care must be taken at the parameter-estimation stage because, for the results of a test against AR(1) noise to be meaningful, we must ensure that we are testing against the specific AR(1) process that maximizes the likelihood of our failing to reject the null hypothesis. Only if we can reject this "worst case" red-noise process can we be confident of rejecting all other red-noise processes at the same or higher confidence level. A second important point is test multiplicity: comparing M data eigenvalues with M confidence intervals computed from the surrogate ensemble, we expect about M/10 of them to lie above the 90th percentile even if the null hypothesis is true. Thus a small number of excursions above a relatively low percentile should be interpreted with caution.
The MC-SSA algorithm described above can be adapted to eliminate known periodic components and test the residual against noise. This adaptation provides sharper insight into the dynamics captured by the data, since known periodicities (like orbital forcing on the Quaternary time scale or seasonal forcing on the intraseasonal-to-interannual one) often generate much of the variance at the lower frequencies manifest in a time series and alter the rest of the spectrum. The refinement of MC-SSA consists in restricting the projections to the EOFs that do not account for known periodic behavior. This refinement is implemented in the kSpectra Toolkit, where it is referred to as the "Include EOFs" option.
Other variations of the basic MC-SSA include projecting C_R onto the eigenvectors of the null-hypothesis covariance matrix (the AR(1) basis), and a computationally much faster chi-squared approximation. In contrast to full MC-SSA, the chi-squared test approximates the distribution of the surrogate projections as chi-squared with 3N/M degrees of freedom; this approximation is quite accurate for sinusoidal EOFs and a linear (AR-type) noise null hypothesis. The probability of n excursions above the m-th percentile is then estimated using the binomial distribution.
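For example, scipy makes the multiplicity arithmetic immediate (numbers illustrative):

from scipy.stats import binom

# With M = 40 eigenvalues tested against 90th-percentile bounds, how surprising
# are n = 5 excursions if the null hypothesis is true?
M, n = 40, 5
p = binom.sf(n - 1, M, 0.10)   # P(X >= 5) for X ~ Binomial(40, 0.1)
print(round(p, 2))             # ~0.37: five excursions are quite unremarkable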
A typical testing procedure for the Monte Carlo or chi-squared tests is as follows: (i) begin by testing against pure noise and identify frequencies at which the data display anomalously high power against this null hypothesis; (ii) find the data EOFs that correspond to these frequencies (being data-adaptive, these will often provide a better filter for the signal than the corresponding EOFs of the noise); and (iii) re-test with these EOFs included in the null hypothesis, to check whether other features of the spectrum were concealed by the dominant signal. It is, of course, consistent to iterate this procedure, but beware of test multiplicity: iteration makes the test increasingly liberal, so results that are still on the margins of acceptable significance after a couple of iterations, with successively more EOFs included in the "signal", should be ignored.
Results of the Monte Carlo or chi-squared tests should also be compared with the output of the Pairing tests and Heuristic error bars, and one should check that the relevant EOFs and PCs are indeed in phase quadrature.
1. Allen, M.R., and Smith, L.A., 1996: Monte Carlo SSA: detecting irregular oscillations in the presence of colored noise. J. Climate, 9, 3373-3404.
2. Broomhead, D.S., and King, G.P., 1986: Extracting qualitative dynamics from experimental data. Physica D, 20, 217-236.
3. Burg, J.P., 1978: A new analysis technique for time series data. In Modern Spectrum Analysis (D.G. Childers, Ed.), IEEE Press, New York, 42-48.
4. Kondrashov, D., and Ghil, M., 2006: Spatio-temporal filling of missing points in geophysical data sets. Nonlin. Processes Geophys., 13, 151-159.
5. Vautard, R., and Ghil, M., 1989: Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. Physica D, 35, 395-424.