
Nonlin. Processes Geophys., 14, 3–4, 2007
www.nonlin-processes-geophys.net/14/3/2007/
© Author(s) 2007. This work is licensed under a Creative Commons License.

Nonlinear Processes in Geophysics

Reply to T. Schneider’s comment on “Spatio-temporal filling of missing points in geophysical data sets”

D. Kondrashov¹ and M. Ghil¹,*

¹University of California, Los Angeles, CA, USA

*Ecole Normale Supérieure, Paris, France

Received: 9 August 2006 – Revised: 21 December 2006 – Accepted: 22 December 2006 – Published: 15 January 2007

First, we thank T. Schneider (TS hereafter) for his positive and constructive comments about Kondrashov and Ghil (2006) (KG hereafter). KG focused on exploiting temporal covariability in geophysical data sets, an idea that Schneider (2001; S01 hereafter) had suggested, but not applied to any data, synthetic or geophysical. Two unfortunate inaccuracies, corrected in comments (iii) and (iv) of TS, did crop up when KG described the expectation-maximization (EM) algorithm and its regularized version used by S01 for filling in missing data. We regret this slip, being thoroughly familiar with the general EM framework, which we used for probability density estimation when studying multiple weather regimes (Smyth et al., 1999; Kondrashov et al., 2004, 2006). We thus agree with TS that the regularized EM algorithm and KG’s method are both based on estimating mean and covariance components of the gappy data set under study (his comment (iii)), and that several gap-filling methods, including regularized EM and our own (multi-channel) singular-spectrum analysis (M-)SSA, rely, among many other assumptions, also on the probability of a data point’s absence being independent of the missing value itself (his comment (iv)).

Singular-value decomposition (SVD; Golub and Van Loan, 1996) underlies both regression and principal components analysis, and thus represents a common basis for KG’s M-SSA as well as S01’s regularized EM method. In this reply, we concentrate on discussing several differences between our methods, which might look minor to TS, but lead to differences in computational performance and numerical results in practical applications. Before submitting KG, we tried out the free gap-filling software kindly provided on TS’s personal website, and we plan to add KG’s gap-filling feature as soon as feasible to the SSA-MTM Toolkit, available for free at https://www.wendangku.net/doc/c31884974.html,/tcd/ssa; see also Ghil et al. (2002).

Correspondence to: D. Kondrashov

(dkondras@https://www.wendangku.net/doc/c31884974.html,)

KG aim to fill the gaps with smooth information from an iteratively inferred “signal” that represents coherent spatio-temporal structures, and discard the “noise” variance. This idea is best illustrated by our synthetic example of a noise-contaminated oscillatory signal with a gap; see Fig. 2 in KG. Doing so can be quite valuable in a wide variety of applications, ranging from climate predictability and paleoclimate reconstruction to oceanographic and space physics data. To illustrate as simply as possible the difference between regularized EM (S01) and M-SSA (KG), consider a multivariate data set with just one missing value $x_{ij}$ in one record at time $t_j$; the number of points $i$ in each record (channels) is $M$, while the number of records (i.e., of sampling times $t_j$) is $N$. S01 (see Eq. 1 there) stresses regularization of the EM algorithm for rank-deficient cases, while in M-SSA regularization comes in the form of discarding “noise” EOFs, regardless of whether the data set is rank-deficient or not; see also Fig. A1 in Ghil et al. (2002).
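The contrast between the two kinds of regularization can be written schematically in terms of filter factors. This is a standard textbook way of comparing ridge regression with truncation, not notation taken from S01 or KG; the symbols $A$, $y$, $\hat{b}$, $h$, $K$ and $f_k$ are assumptions introduced here for illustration only. For a generic least-squares problem $A b \approx y$, with SVD $A = \sum_k s_k u_k v_k^{\mathsf T}$, a regularized solution can be expressed as

```latex
\hat{b} = \sum_{k} f_k \, \frac{u_k^{\mathsf T} y}{s_k} \, v_k ,
\qquad
f_k^{\mathrm{ridge}} = \frac{s_k^2}{s_k^2 + h^2},
\qquad
f_k^{\mathrm{trunc}} =
\begin{cases}
1, & k \le K, \\
0, & k > K .
\end{cases}
```

Ridge regression with parameter $h$ damps every mode smoothly, so the trailing modes still contribute a little, whereas truncation keeps the $K$ leading modes untouched and discards the rest. The second choice corresponds, in this simplified picture, to discarding “noise” EOFs whether or not the problem is rank-deficient.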

For simplicity, we consider in this reply reconstruction based on spatial correlations only; our method is identical, in this case, to that of Beckers and Rixen (2003), while the emphasis in KG was on the use of purely temporal or mixed, spatio-temporal correlations. In our approach, we start an inner-loop iteration by computing the leading empirical orthogonal function (EOF) of the centered, zero-padded record. Then we perform the algorithm again on the new time series, in which the principal component corresponding to that EOF alone is used to obtain nonzero values in place of the missing point, and we correct the channel’s mean, the covariance matrix and the EOF. When this inner iteration has converged, we perform an outer-loop iteration by adding a second EOF for reconstruction and repeat the inner iteration. The outer iteration is performed only for a few significant, or “signal”, EOFs, whose number is found by cross-validation. Beckers and Rixen (2003) discuss, in their Appendix A, how the bias introduced into the EOFs by missing data disappears as the iteration progresses.
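The inner and outer iterations described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors’ code or the SSA-MTM Toolkit implementation: the function name `eof_fill` and its parameters are hypothetical, gaps are marked by NaN, only spatial correlations are used, and the number of EOFs is passed in directly instead of being chosen by cross-validation.

```python
import numpy as np


def eof_fill(data, max_eofs=2, tol=1e-8, max_inner=500):
    """Fill NaN gaps in an (N records x M channels) array with leading EOFs.

    Hypothetical helper for illustration; the EOF count is an input here,
    whereas the text selects it by cross-validation.
    """
    x = np.asarray(data, dtype=float)
    missing = np.isnan(x)
    # Center each channel on its observed values, then zero-pad the gaps.
    mean = np.nanmean(x, axis=0)
    filled = np.where(missing, 0.0, x - mean)

    for k in range(1, max_eofs + 1):      # outer loop: add one EOF at a time
        for _ in range(max_inner):        # inner loop: iterate to convergence
            # Correct the channel means using the current gap estimates;
            # the sum filled + mean is left unchanged by this step.
            shift = filled.mean(axis=0)
            mean = mean + shift
            filled = filled - shift
            # EOFs are the right singular vectors of the centered data.
            u, s, vt = np.linalg.svd(filled, full_matrices=False)
            recon = (u[:, :k] * s[:k]) @ vt[:k]
            change = np.max(np.abs(recon[missing] - filled[missing]),
                            initial=0.0)
            filled[missing] = recon[missing]  # only the gaps are overwritten
            if change < tol:
                break
    return filled + mean
```

Observed values are never modified; only the gap entries are replaced by the truncated reconstruction, and the channel means and EOFs are re-estimated at every pass. Recomputing a full SVD each time is a simplification for readability, since only the `k` leading EOFs are actually needed.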

Published by Copernicus GmbH on behalf of the European Geosciences Union and the American Geophysical Union.

In S01, iterations are also used in order to obtain an estimate of the missing value $x_{ij}$, along with the temporal mean in the spatial channel $i$, and the spatial covariance matrix. Estimating the regression coefficients in the EM algorithm with ridge regression has to be done record by record (see Eq. 2 in S01), including both “signal” and “noise” EOFs in the covariance matrix. As the number of records with missing data increases, KG’s method offers potential computational savings, since KG need (and want) to compute only a few leading EOFs in the outer iteration.
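To make the record-by-record cost concrete, the following sketch imputes gaps by plain ridge regression on the full covariance matrix. It is a deliberately simplified illustration, not S01’s regularized EM algorithm: the function name `ridge_impute`, the fixed ridge parameter and the identity-scaled penalty are assumptions made here, and the sketch omits the residual-covariance correction and the adaptive choice of the regularization parameter.

```python
import numpy as np


def ridge_impute(data, ridge=0.1, n_iter=50):
    """Iteratively impute NaNs by ridge regression, one record at a time.

    Simplified illustration only; `ridge` is a fixed, user-chosen penalty.
    """
    x = np.asarray(data, dtype=float)
    missing = np.isnan(x)
    # Start from the mean of the observed values in each channel.
    filled = np.where(missing, np.nanmean(x, axis=0), x)

    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        # Full covariance matrix: "signal" and "noise" EOFs both enter.
        cov = np.cov(filled, rowvar=False)
        # A separate regression is solved for every record with gaps.
        for j in np.flatnonzero(missing.any(axis=1)):
            m, a = missing[j], ~missing[j]
            if not a.any():
                continue  # nothing to regress on; keep the mean fill
            # Ridge-regularized regression of missing on available channels.
            caa = cov[np.ix_(a, a)] + ridge * np.eye(a.sum())
            coef = np.linalg.solve(caa, cov[np.ix_(a, m)])
            filled[j, m] = mean[m] + (filled[j, a] - mean[a]) @ coef
    return filled
```

Each gappy record has its own pattern of available and missing channels, so each requires its own linear solve; this is the step whose cost grows with the number of records containing missing data.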

KG’s double iteration differs, furthermore, from the Karhunen-Loève procedure for image processing of Everson and Sirovich (1995), where a previously fixed number of EOFs are estimated simultaneously in a single estimation loop. Everson and Sirovich’s version of gap filling is closest to using the truncated total least-squares (TTLS) option in combination with the EM algorithm in S01. In this case, the relative computational performance of both KG and EM methods will largely depend on the number of EOFs involved, but ideas from both approaches could be useful in devising even better gap-filling methods in the future.

Edited by: B. D. Malamud

Reviewed by: T. Schneider and another referee

References

Beckers, J. and Rixen, M.: EOF calculations and data filling from incomplete oceanographic data sets, J. Atmos. Ocean. Technol., 20, 1839–1856, 2003.

Everson, R. M. and Sirovich, L.: The Karhunen-Loève transform for incomplete data, J. Opt. Soc. Am., 12, 1657–1664, 1995.

Ghil, M., Allen, M. R., Dettinger, M. D., Ide, K., Kondrashov, D., et al.: Advanced spectral methods for climatic time series, Rev. Geophys., 40(1), 3.1–3.41, doi:10.1029/2000RG000092, 2002.

Golub, G. H. and Van Loan, C. F.: Matrix Computations (3rd ed.), The Johns Hopkins University Press, Baltimore and London, 728 pp., 1996.

Kondrashov, D., Ide, K., and Ghil, M.: Weather regimes and preferred transition paths in a three-level quasigeostrophic model, J. Atmos. Sci., 61, 568–587, 2004.

Kondrashov, D., Kravtsov, S., and Ghil, M.: Empirical mode reduction in a model of extratropical low-frequency variability, J. Atmos. Sci., 63, 1859–1877, 2006.

Schneider, T.: Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values, J. Climate, 14, 853–871, 2001.

Smyth, P., Ide, K., and Ghil, M.: Multiple regimes in Northern Hemisphere height fields via mixture model clustering, J. Atmos. Sci., 56, 3704–3723, 1999.
