
Remote Sens. 2013, 5, 927-948; doi:10.3390/rs5020927

OPEN ACCESS

Remote Sensing

ISSN 2072-4292

www.mdpi.com/journal/remotesensing

Article

Global Data Sets of Vegetation Leaf Area Index (LAI)3g and Fraction of Photosynthetically Active Radiation (FPAR)3g Derived from Global Inventory Modeling and Mapping Studies (GIMMS) Normalized Difference Vegetation Index (NDVI3g) for the Period 1981 to 2011

Zaichun Zhu 1,2,*,†, Jian Bi 1,†, Yaozhong Pan 2, Sangram Ganguly 3, Alessandro Anav 4, Liang Xu 1, Arindam Samanta 5, Shilong Piao 6,7, Ramakrishna R. Nemani 8 and Ranga B. Myneni 1

1Department of Earth and Environment, Boston University, 685 Commonwealth Avenue, Boston, MA 02215, USA; E-Mails: bijian.bj@https://www.wendangku.net/doc/0014374291.html, (J.B.); bireme@https://www.wendangku.net/doc/0014374291.html, (L.X.);

ranga.myneni@https://www.wendangku.net/doc/0014374291.html, (R.B.M.)

2 College of Resources Science & Technology, State Key Laboratory of Earth Processes and

Resource Ecology, Beijing Normal University, Beijing 100875, China; E-Mail: pyz@https://www.wendangku.net/doc/0014374291.html,

3 Bay Area Environmental Research Institute, NASA Ames Research Center, Moffett Field,

CA 94035, USA; E-Mail: sangramganguly@https://www.wendangku.net/doc/0014374291.html,

4 College of Engineering, Mathematics & Physical Sciences, Harrison Building, University of Exeter,

North Park Road, Exeter EX4 4QF, UK; E-Mail: A.Anav@https://www.wendangku.net/doc/0014374291.html,

5 Atmospheric and Environmental Research Inc., 131 Hartwell Avenue, Lexington, MA 02421, USA;

E-Mail: arindam.sam@https://www.wendangku.net/doc/0014374291.html,

6 Department of Ecology, Peking University, Beijing 100871, China; E-Mail: slpiao@https://www.wendangku.net/doc/0014374291.html,

7 Institute of Tibetan Plateau Research, Chinese Academy of Sciences, Beijing 100085, China

8 NASA Advanced Supercomputing Division, NASA Ames Research Center, Moffett Field,

CA 94035, USA; E-Mail: rama.nemani@https://www.wendangku.net/doc/0014374291.html,

† These authors contributed equally to this work.

*Author to whom correspondence should be addressed;

E-Mails: zzc@https://www.wendangku.net/doc/0014374291.html,; zhu.zaichun@https://www.wendangku.net/doc/0014374291.html,; Tel.: +1-617-353-8828; Fax: +1-617-353-8399.

Received: 28 December 2012; in revised form: 7 February 2013 / Accepted: 16 February 2013 / Published: 22 February 2013

Abstract: Long-term global data sets of vegetation Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation absorbed by vegetation (FPAR) are critical to monitoring global vegetation dynamics and for modeling exchanges of energy, mass and momentum between the land surface and planetary boundary layer. LAI and FPAR are also state variables in hydrological, ecological, biogeochemical and crop-yield models. The generation, evaluation and an example case study documenting the utility of 30-year long data sets of LAI and FPAR are described in this article. A neural network algorithm was first developed between the new improved third generation Global Inventory Modeling and Mapping Studies (GIMMS) Normalized Difference Vegetation Index (NDVI3g) and best-quality Terra Moderate Resolution Imaging Spectroradiometer (MODIS) LAI and FPAR products for the overlapping period 2000–2009. The trained neural network algorithm was then used to generate corresponding LAI3g and FPAR3g data sets with the following attributes: 15-day temporal frequency, 1/12 degree spatial resolution and temporal span of July 1981 to December 2011. The quality of these data sets for scientific research in other disciplines was assessed through (a) comparisons with field measurements scaled to the spatial resolution of the data products, (b) comparisons with broadly-used existing alternate satellite data-based products, (c) comparisons to plant growth limiting climatic variables in the northern latitudes and tropical regions, and (d) correlations of dominant modes of interannual variability with large-scale circulation anomalies such as the El Niño-Southern Oscillation and Arctic Oscillation. These assessment efforts yielded results that attested to the suitability of these data sets for research use in other disciplines. The utility of these data sets is documented by comparing the seasonal profiles of LAI3g with profiles from 18 state-of-the-art Earth System Models: the models consistently overestimated the satellite-based estimates of leaf area and simulated delayed peak seasonal values in the northern latitudes, a result that is consistent with previous evaluations of similar models with ground-based data. The LAI3g and FPAR3g data sets can be obtained freely from the NASA Earth Exchange (NEX) website.

Keywords: LAI; FPAR; NDVI3g; MODIS; NASA NEX; artificial neural networks; remote sensing of vegetation

1. Introduction

Monitoring and modeling global vegetation dynamics in the context of climate variability and change studies require long-term data sets of key biophysical variables that characterize vegetation structure and functioning [1]. Leaf Area Index (LAI) and the Fraction of Photosynthetically Active Radiation absorbed by vegetation (FPAR) are two examples of such variables. LAI is defined as the one-sided green leaf area per unit vegetated ground area in broadleaf canopies and as one-half the total needle surface area per unit vegetated ground area in coniferous canopies. It characterizes the physiologically functioning surface area with which energy, mass (e.g., water and CO2) and momentum are exchanged between the vegetated land surface and the planetary boundary layer [2]. Similarly, FPAR is a relative measure of the vegetation-absorbed radiation in the 0.4–0.7 μm spectral region of solar radiation, and hence, characterizes the energy that is potentially used in the process of photosynthesis. LAI and FPAR are therefore key state variables in many biogeochemical, ecological, hydrological and crop yield models [3–17].

There are several different approaches for estimating LAI and FPAR from remotely sensed reflectance data in the optical domain, i.e., the wavelength span of solar radiation. They can be broadly categorized as:

(a) Empirical methods based on relationships between vegetation indices, e.g., the Normalized Difference Vegetation Index (NDVI), and LAI or FPAR [18,19]. These relationships are generally sensitive to soil background, leaf optical properties, the orientation and spatial distribution of leaves in a canopy and the general architecture of vegetation stands within the spatial scale of measurements [20]. Site- and vegetation-specific empirical relationships between NDVI and LAI, for example, have been used in some studies [21,22]. The relationships tend to vary seasonally and inter-annually. Consequently, empirical methods tend to be site-, time-, and species-specific, and are therefore not well-suited for large-scale operational use [23].

(b) Physical methods based on the physics of radiation interaction with elements of a canopy and transport within the vegetative medium. These methods provide a physically-based linkage between biophysical variables and vegetation canopy reflectance at different wavelengths [24–26]. These methods can be categorized into four broad groups: (1) radiative transfer models [24,25], (2) geometric-optical models [27], (3) hybrid models that incorporate both radiative transfer and geometric optics [28], and (4) Monte-Carlo simulation models [29,30]. The methods involve iterative techniques and are thus computationally intensive for operational use. However, methods that alleviate this burden have also been developed, e.g., the use of Look-Up-Tables [25].

(c) Machine learning algorithms, which are accurate, fast and require less computational power, are increasingly being used to mimic the underlying physical processes in the remote sensing of vegetation [31–33]. The efficacy of these algorithms rests on a knowledge-based inference paradigm, which in turn depends on the robustness and availability of training data.
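The vegetation index that underlies approach (a), and that serves as the input to the algorithm developed in this article, can be illustrated with a minimal sketch. It assumes the standard definition NDVI = (NIR − Red) / (NIR + Red); the function name and the sample reflectances are hypothetical and are not taken from the GIMMS processing chain.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI from near-infrared and red reflectances (standard definition).

    Cells where NIR + Red equals zero are returned as NaN.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances: dense canopy, sparse canopy, no signal.
sample = ndvi([0.50, 0.30, 0.0], [0.10, 0.20, 0.0])
```

Dense green canopies reflect strongly in the near infrared and absorb in the red, so the first sample yields a higher index than the second.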

LAI and FPAR products from the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Système Pour l’Observation de la Terre (SPOT) sensor have gradually acquired a large user community due to ease of access, provision of pixel-level quality indicators and validation information. Research on inter-sensor product consistency and collaborative validation efforts have helped provide accuracy and precision information for existing products [34–40]. Decade-long global and regional data sets of LAI and FPAR from these sensors are now available for scientific use. These records will likely be extended by the Visible/Infrared Imager Radiometer Suite instrument onboard the Suomi National Polar-orbiting Partnership, the Advanced Baseline Imager onboard the Geostationary Operational Environmental Satellite-R series satellite, the Charge-Coupled Device onboard the Huan Jing series satellites and the Advanced Visible and Near Infrared Radiometer onboard the Advanced Land Observation Satellite [41–45]. These existing products span a short time, which precludes the determination of long-term trends. There is therefore a continuing interest in, and need for, data from the Advanced Very High Resolution Radiometer (AVHRR) sensors, a record that is now more than three decades long and continuing.

The first generation NDVI data (NDVIg) from AVHRR sensors onboard the National Oceanic and Atmospheric Administration (NOAA) 7 to 14 series of satellites have been processed by the Global Inventory Modeling and Mapping Studies (GIMMS) group to a consistent time series of NDVI and are made available to the research community [46]. The latest version, termed the third generation NDVI data set (GIMMS NDVI3g), has been recently produced for the period July 1981 to December 2011 with AVHRR sensor data from NOAA 7 to 18 satellites. This data set specifically aims to improve data quality in the high latitudes where the growing season is shorter than 2 months. It also has improved calibration that is tied to the Sea-Viewing Wide-Field-of-View Sensor, as opposed to earlier versions of GIMMS NDVI data sets that were based on inter-calibration with the SPOT sensor. The availability of this new improved NDVI3g data set and its overlap with the Terra MODIS LAI and FPAR products for the period 2000 to 2009 provides an opportunity to design and implement a neural network algorithm to generate and evaluate the corresponding LAI and FPAR data sets; that is the objective of this article. These data sets will be termed LAI3g and FPAR3g henceforth and have the following attributes: 15-day temporal frequency, 1/12 degree spatial resolution and temporal span of July 1981 to December 2011.

The presentation is organized as follows. Section 2 describes the algorithmic details and generation of the LAI3g and FPAR3g data sets. Section 3 is focused on validation and evaluation of these data sets in order to assess their suitability for research use in other disciplines. Section 4 describes a test case where the seasonal profiles of LAI3g are compared to simulations from 18 state-of-the-art Earth System Models to document the utility of these data sets. Concluding remarks are briefly stated in Section 5.

2. Production of LAI3g and FPAR3g Data Sets

2.1. Input Data and Preprocessing

We used improved versions of Collection 5 Terra MODIS LAI and FPAR products and the NDVI3g data for developing the algorithm. The MODIS BNU (Beijing Normal University version) LAI product is an improved version of the standard MODIS LAI product (MOD15A2), which provides 8-day global LAI data from 2000 to 2009 at 1 km spatial resolution [47]. The MODIS BU (Boston University) FPAR product is also an improved version of the standard MODIS FPAR product, which provides monthly global FPAR data from 2000 to 2010 at 0.072 degree spatial resolution [48]. The accuracies of the MODIS LAI and FPAR products are 0.66 LAI units RMSE and 0.12 FPAR units RMSE, respectively [49]. The improved MODIS LAI and FPAR products both provide higher accuracy owing to spatio-temporal filtering and the introduction of quality flags [47,48]. The three data sets (GIMMS NDVI3g, MODIS BNU LAI and MODIS BU FPAR) were resampled and composited to a uniform spatial grid and temporal frequency. The details are given in Sections S5 and S6 of the supplementary material. The generation of LAI3g and FPAR3g required a land cover classification product. We used the Collection 5 MODIS land cover product (MCD12C1) with International Geosphere Biosphere Programme (IGBP) classes. This product was resampled to match the spatial resolution of the NDVI3g data (1/12 degree) using the nearest neighbor algorithm. The IGBP classes are defined in [50]. It should be noted that we used a constant land cover map because we do not have access to land cover maps of equal quality and accuracy for the entire research period. This may lead to some uncertainties in our products due to land cover change over the thirty-year period [2,51,52].
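The nearest neighbor resampling of the land cover map can be sketched as follows. This is an illustration only, under the assumption that the source and target grids cover the same geographic extent; the function name is hypothetical, and the actual preprocessing is described in the supplementary material. Nearest neighbor is the natural choice for class labels because averaging categorical codes would produce meaningless values.

```python
import numpy as np

def nearest_neighbor_resample(src, dst_shape):
    """Resample a 2-D categorical grid to dst_shape by nearest neighbor.

    Assumes both grids span the same extent, so only the cell size differs.
    Each destination cell takes the class of the source cell that contains
    the destination cell centre.
    """
    src = np.asarray(src)
    n_rows, n_cols = dst_shape
    # Destination cell centres expressed in source-cell index units.
    rows = np.floor((np.arange(n_rows) + 0.5) * src.shape[0] / n_rows).astype(int)
    cols = np.floor((np.arange(n_cols) + 0.5) * src.shape[1] / n_cols).astype(int)
    rows = np.clip(rows, 0, src.shape[0] - 1)
    cols = np.clip(cols, 0, src.shape[1] - 1)
    return src[np.ix_(rows, cols)]

# Toy example: a 4 x 4 class map reduced to 2 x 2.
toy_classes = np.arange(16).reshape(4, 4)
toy_coarse = nearest_neighbor_resample(toy_classes, (2, 2))
```

For a global map, a call such as `nearest_neighbor_resample(land_cover, (2160, 4320))` would produce a 1/12 degree grid, where `land_cover` is a hypothetical array holding the source classification.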

2.2. Algorithm Development

We used a Feed-Forward Neural Network (FFNN) as the algorithm to generate the LAI3g and FPAR3g data sets. The FFNN models consisted of four neurons in the input layer (four input parameters corresponding to the land cover class, pixel-center latitude, pixel-center longitude, and NDVI3g), 11 neurons in the hidden layer and 1 neuron in the output layer (LAI3g or FPAR3g). These models were trained through the Back-Propagation process, which is one of the most popular and widely-used methods for training neural networks [53]. An FFNN model was generated for each month, thus producing a set of 12 FFNN models for generating LAI3g and another set of 12 FFNN models for generating FPAR3g. These 24 FFNN models were developed with the data sets described in Section 2.1. To prevent over-fitting and to test the performance of the FFNN, the training data set was split into three sets: 70% as training data, 15% as validation data and 15% as test data. The network was trained with the training data until its performance began to decrease on the validation data, which indicates that generalization has peaked. Ten networks with different initial values were trained independently. The network providing the best performance was selected as the final FFNN model that was used for generating the LAI3g and FPAR3g data sets. More detailed technical descriptions are given in Section S7 of the supplementary material.
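The training procedure described above can be sketched in a few lines of NumPy. The 4-11-1 layout, the 70/15/15 split, early stopping on the validation set and the ten independent initializations follow the text. Everything else is an assumption made for illustration: the tanh hidden activation, the linear output, plain full-batch gradient descent, the learning rate, the patience value, the input standardization, the use of validation error to pick the best network, and the synthetic training data. The authors' actual configuration is given in Section S7 of the supplementary material.

```python
import numpy as np

def init_params(rng, n_in=4, n_hidden=11):
    # 4 inputs -> 11 hidden neurons -> 1 output, as stated in the text.
    return {"W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, 0.5, (n_hidden, 1)), "b2": np.zeros(1)}

def forward(p, X):
    H = np.tanh(X @ p["W1"] + p["b1"])          # hidden layer (tanh is an assumption)
    return H, (H @ p["W2"] + p["b2"]).ravel()   # linear output

def mse(p, X, y):
    return float(np.mean((forward(p, X)[1] - y) ** 2))

def train_one(X_tr, y_tr, X_va, y_va, seed, lr=0.05, max_epochs=2000, patience=20):
    """Back-propagation with early stopping on the validation set."""
    p = init_params(np.random.default_rng(seed), X_tr.shape[1])
    best = {k: v.copy() for k, v in p.items()}
    best_err, wait, n = mse(p, X_va, y_va), 0, len(y_tr)
    for _ in range(max_epochs):
        H, pred = forward(p, X_tr)
        d_out = (2.0 / n) * (pred - y_tr)[:, None]     # gradient of MSE w.r.t. output
        d_hid = (d_out @ p["W2"].T) * (1.0 - H ** 2)   # propagated back through tanh
        p["W2"] -= lr * (H.T @ d_out)
        p["b2"] -= lr * d_out.sum(axis=0)
        p["W1"] -= lr * (X_tr.T @ d_hid)
        p["b1"] -= lr * d_hid.sum(axis=0)
        err = mse(p, X_va, y_va)
        if err < best_err:
            best, best_err, wait = {k: v.copy() for k, v in p.items()}, err, 0
        else:
            wait += 1
            if wait >= patience:   # validation error stopped improving
                break
    return best, best_err

def train_best_of(X, y, n_restarts=10, seed=0):
    """Split 70/15/15, train n_restarts networks, keep the best on validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_tr, n_va = int(0.70 * len(y)), int(0.15 * len(y))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-12   # scaling is an assumption
    Xs = (X - mu) / sd
    runs = [train_one(Xs[tr], y[tr], Xs[va], y[va], seed=s) for s in range(n_restarts)]
    best, _ = min(runs, key=lambda r: r[1])
    return best, mse(best, Xs[te], y[te])

# Synthetic stand-in data: land cover class, latitude, longitude, NDVI.
rng = np.random.default_rng(42)
n = 600
X = np.column_stack([rng.integers(1, 13, n), rng.uniform(-60, 80, n),
                     rng.uniform(-180, 180, n), rng.uniform(0, 1, n)])
y = 6.0 * X[:, 3] ** 2            # hypothetical NDVI-LAI relation, for the demo only
model, test_mse = train_best_of(X, y)
```

In the setting of the article, one such model would be trained per calendar month and per output variable, giving 24 models in total.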

2.3. LAI3g and FPAR3g Production

The NDVI3g data from July 1981 through December 2011 were used together with the 24 trained FFNN models to generate the corresponding LAI3g and FPAR3g data sets. These data sets have the same attributes as the input NDVI3g data: 1/12 degree spatial resolution and 15-day temporal frequency. Figure 1(a,b) shows color-coded maps of 30-year averages of annual mean LAI3g and FPAR3g. Figure 1(c,d) shows the time series of LAI3g anomalies for different latitudinal zones and land cover classes. The impacts of the Mount Pinatubo eruption in mid-1991 and the significant orbit loss of NOAA 11 are clearly visible in the time series, especially in the tropics and in the forested regions of the globe. Data from year 2011 also exhibit significant positive anomalies, the reasons for which are not known.

3. Assessment of LAI3g and FPAR3g Data Sets

The two objectives of our assessment of LAI3g and FPAR3g data sets are: (a) to provide uncertainty estimates through comparisons with field measurements and (b) to evaluate their suitability for use in research related to climate, hydrological, ecological, biogeochemical and crop yield models [3–17]. Analyses addressing these objectives are described below.

3.1. Uncertainty Assessment

Providing uncertainty estimates for coarse resolution (1/12 degree) LAI3g and FPAR3g data sets is a challenging task as it requires a large number of comparable values derived from ground measurements. Further, the comparisons should be made for all major vegetation types and also cover the phenological cycle. Field campaigns are man-power intensive and therefore expensive. There have been very few suitable field campaigns before the NASA Earth Observing System (EOS) era, i.e., prior to year 2000. However, since the launch of the Terra MODIS instrument, the scientific community has collaboratively developed a network of sites (e.g., BigFoot, AErosol RObotic NETwork, FLUXNET, EOS Land Validation Core Sites, VAlidation of Land European remote sensing Instruments, etc.), data from which have been used to validate moderate resolution (1 km) MODIS LAI and FPAR products [36,54–63]. In the process, the community has also developed common protocols for sampling the field sites and scaling methodologies that translate point-based field measurements of LAI and FPAR to the spatial scale of remotely-sensed products to facilitate accurate validation [35,64]. These efforts have resulted in establishing uncertainty estimates for the MODIS LAI and FPAR products [36].

Figure 1. Leaf Area Index (LAI)3g and Fraction of Photosynthetically Active Radiation (FPAR)3g products. (a) Thirty year average annual mean LAI3g. (b) Thirty year average annual mean FPAR3g. (c) Time series of LAI3g anomalies for different latitudinal bands. (d) Time series of LAI3g anomalies for different vegetation types. The background shading in (c) and (d) shows the occurrence and intensity of El Niño-Southern Oscillation (ENSO) events as defined by the Multivariate ENSO Index. The black dashed lines indicate transition times for the various National Oceanic and Atmospheric Administration (NOAA) satellites (N07 to N18). The two major volcanic eruptions (El Chichón and Mount Pinatubo) and the two recent Amazonian droughts are depicted by the orange and purple dashed lines, respectively.

Most of the field campaigns, and therefore, the validation efforts have been focused on the MODIS LAI product. The FPAR product is a by-product of the MODIS LAI algorithm and the underlying relationship between LAI and FPAR is based on the physics of radiative transfer [25]. The underlying relationship alleviates the need for independent and comprehensive validation of the FPAR3g product.

In this study, we selected sites with field measurements recorded over regions with more or less homogeneous groups of land cover classes, called biomes. Even then, a pixel-by-pixel comparison between LAI3g and field-measured LAI values scaled to LAI3g spatial resolution is not feasible because: (a) the spatial location of the satellite pixel contains uncertainties due to geo-location errors and pixel-shift errors resulting from the point spread function [65], and (b) point field measurements scaled to the resolution of the sensor necessarily involve uncertainty arising from the scaling methodology [55–59,64]. Therefore, the comparisons were performed on groups of pixels belonging to a particular biome and the assessment was based on the distribution properties of the respective values [36,57–59]. Specifically, the LAI3g data were compared to 45 sets of appropriately scaled field measurements from 29 sites listed in Table A4 of [40]. Monthly LAI3g values from nearby pixels of the same biome type were used for comparison purposes. The results indicate satisfactory agreement between LAI3g and scaled field measurements (p < 0.001; RMSE = 0.68 LAI) (Figure 2). This RMSE value may be taken as the uncertainty estimate of the LAI3g product, i.e., the average difference between LAI3g and the ground truth value of LAI at the spatial resolution of the LAI3g product (1/12 degree).

Figure 2. Comparison of LAI3g with scaled field measurements from six biomes representative of the global land cover classes. A total of 45 field data sets from 29 sites listed in Table A4 of [40] were used (details of field data handling to derive LAI values comparable to satellite retrievals of LAI can be found in [36]).

3.2. Evaluation-Part 1: Comparison with the CYCLOPES LAI and FPAR Products

The goal of evaluation of LAI3g and FPAR3g products is to further imbue confidence in the use of these data sets in studies on monitoring of global vegetation dynamics and in modeling and applications research. One way to achieve this goal is to compare the new products (LAI3g and FPAR3g) with those already in use by the research community. The Carbon Cycle and Change in Land Observational Products from an Ensemble of Satellites (CYCLOPES) LAI and FPAR products (version 3.1) derived from the SPOT VEGETATION sensor are available at 1/112° Plate-Carrée spatial resolution and 10-day temporal frequency [66]. These products have reached a level of maturity comparable to the MODIS LAI and FPAR products [38]. We compared LAI3g and FPAR3g products with CYCLOPES LAI and FPAR products at global and site scales; information regarding the required preprocessing for these comparative analyses is given in Section S3. All data from the overlapping period between the two product sets, years 1999 to 2007, were used.

3.2.1. Global Scale Comparison

Figure 3 shows a comparison between LAI3g and corresponding CYCLOPES LAI values for four broad vegetation classes (Table S5) at the monthly time scale. The analysis suggests that: (a) the two products show satisfactory agreement (slope close to unity and minimal bias) only in cropland/natural vegetation mosaics, (b) the slopes are considerably larger than unity in the case of forests and other woody vegetation classes, and (c) the slope is less than unity in the case of herbaceous vegetation. To investigate the general disagreement between the two LAI products, annual mean LAI from the two data sets for each of the IGBP land covers was evaluated (Table 1). Annual mean LAI3g is greater than the corresponding CYCLOPES LAI for all IGBP land covers with the exception of mixed forests and evergreen needleleaf forests. It is evident that the large disagreement in the case of forests (Figure 3(a)) is largely due to the evergreen broadleaf forest cover type, which occupies 12.87% of the global total vegetated area (46.40% of the total forested area) and contributes the maximum difference between the two data sets (1.03 in absolute LAI units). The CYCLOPES algorithm in general saturates at comparatively low values of LAI [38,39]. Results from a similar analysis between FPAR3g and CYCLOPES FPAR are shown in Figure S4. In addition, comparisons of multi-year average monthly values from CYCLOPES and our products are presented in Figure S5 (CYCLOPES LAI and LAI3g) and Figure S6 (CYCLOPES FPAR and FPAR3g).

Figure 3. Comparison of monthly LAI values from CYCLOPES and LAI3g data sets for four broad vegetation classes (forests, herbaceous vegetation, other woody vegetation and cropland/natural vegetation mosaics) for the period 1999 to 2007. These classes are groups of International Geosphere Biosphere Programme (IGBP) land cover types as per Table S5.

Table 1. Comparison of annual mean values of LAI3g and CYCLOPES LAI for different IGBP land covers. The table is sorted by descending order of the area fraction of the land covers. The data are averages for the years 1999 to 2007.

IGBP Land Covers                       GIMMS LAI3g   CYCLOPES LAI   Area Fraction (%)
Open shrublands                        0.56          0.45           19.18
Grasslands                             0.6           0.45           13.89
Evergreen broadleaf forests            4.19          3.16           12.87
Woody savannas                         1.72          1.44           12.29
Croplands                              1.05          1.01           10.77
Savannas                               1.51          1.17           8.1
Cropland/natural vegetation mosaics    1.89          1.59           6.71
Mixed forests                          1.94          1.95           5.86
Evergreen needleleaf forests           1.43          1.69           5.35
Deciduous needleleaf forests           1.46          1.4            2.08
Deciduous broadleaf forests            2.34          1.91           1.58
Closed shrublands                      0.81          0.57           1.33

3.2.2. Site Scale Comparison

In this exercise, CYCLOPES LAI values from the Benchmark Land Multisite Analysis and Intercomparison of Products (BELMANIP) sites [67] were compared to corresponding LAI3g values. We chose sites representative of four broad vegetation classes (Table S5), each of areal extent 24 × 24 km2 (about 3 × 3 GIMMS pixels), and calculated monthly LAI values. Figure 4 shows comparison plots between the two products for 323 BELMANIP sites. In all cases, LAI3g and CYCLOPES LAI values lie in the proximity of the 1:1 line (slopes of 1.05, 1.11, 1.06 and 1.02, respectively, with corresponding offsets of 0.53, 0.26, 0.22 and 0.20). LAI3g explains 72.0%, 84.0%, 82.0% and 81.0% of the variability in the CYCLOPES LAI data and, on average, shows an error of 0.92, 0.54, 0.45 and 0.52 (in absolute LAI units). These results indicate that the agreement between the two data sets is better at the site scale than at the global scale, probably because of the higher homogeneity of vegetation types at the sites. Results from a similar analysis between FPAR3g and CYCLOPES FPAR are shown in Figure S7.
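The four statistics quoted for each vegetation class (slope, offset, explained variability and error) can be computed with a short sketch such as the one below. It assumes an ordinary least-squares line, the squared Pearson correlation for the explained variability and the RMSE of the paired differences for the error; the article does not state which product sits on which axis or exactly how the error was derived, so those choices, the function name and the toy numbers are assumptions.

```python
import numpy as np

def compare_products(x, y):
    """Slope, offset, r-squared and RMSE between two paired LAI series.

    x and y are 1-D sequences of paired monthly values, e.g. one product
    per argument; pairs containing NaN are dropped first.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))
    x, y = x[ok], y[ok]
    slope, offset = np.polyfit(x, y, 1)          # least-squares line y = slope*x + offset
    r = np.corrcoef(x, y)[0, 1]                  # Pearson correlation
    rmse = np.sqrt(np.mean((y - x) ** 2))        # departure from the 1:1 line
    return {"slope": float(slope), "offset": float(offset),
            "r2": float(r ** 2), "rmse": float(rmse)}

# Toy paired values (hypothetical, not taken from either product).
stats = compare_products([0.0, 1.0, 2.0, 3.0, np.nan], [1.0, 3.0, 5.0, 7.0, 2.0])
```

A slope near one with a small offset places the points close to the 1:1 line, which is the pattern reported for the BELMANIP sites.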

Figure 4. Density scatter plots of monthly LAI3g and CYCLOPES LAI for 323 BELMANIP sites for the time period from 1999 to 2007. The plots show the correlation between the two products for four broad groups of vegetation, which are groupings of the IGBP land covers (Table S5). The black dashed line is the 1:1 line. The solid black lines are regression lines derived from the scatter plot.

3.3. Evaluation-Part 2: Comparison with Climatic Variables

A second method of evaluating LAI3g and FPAR3g data sets is to assess the degree of statistical association between these and climatic variables that limit plant growth [40,68]. Temperature, solar radiation and precipitation are the three key climatic variables that govern plant growth [69]. Vegetated areas showing temperature limitations to plant growth are mostly located in the northern latitudes, while areas strongly governed by precipitation are located in the tropical latitudes [69]. Therefore, examining the statistical association between covariations of LAI3g (and FPAR3g) and temperature in the northern latitudes and precipitation in the tropical regions provides an independent means of evaluating the new data sets. It is important to note that the LAI3g and FPAR3g data sets were generated without using climatic data—thus, the following statistical analyses are indeed independent evaluations of these data sets.

3.3.1. LAI3g Variation with Surface Temperature in the Northern Latitudes

Vegetation photosynthetic activity and net primary production in the northern latitudes (50°N–90°N) have been reported to increase as a result of amplified surface warming during the period 1981 to 1999 [70–76]. We examine if the LAI3g and FPAR3g data sets reproduce and extend these results. First, we calculated the approximate growing season (May to September) averages of LAI3g and surface temperature for the region 50°N to 90°N for each year of the time series. All the vegetated pixels in the 50°N to 90°N latitudinal zone were area-weighted with the square root of pixel area to eliminate geometrical effects. Figure 5(a) shows a statistically significant relationship between the growing season averages of LAI3g and surface temperature (R2 = 0.525, p < 0.001). To reduce the likelihood of spurious correlation, we also analyzed the relationship between standardized anomalies of growing season averages of LAI3g and surface temperature [71] (Figure 5(a), inset). This analysis also resulted in a statistically significant correlation (R2 = 0.574, p < 0.001). These results thus not only reproduced previously published analyses but also show that the northern latitude greening trend is continuing as a result of continuing surface warming. Results from a similar analysis with FPAR3g data are shown in Figure S8(a).

Figure 5. Statistical evaluation of LAI3g with temperature in the northern latitudes and precipitation in the tropical regions. (a) Statistical analyses between approximate growing season (May to September) averages of LAI3g and surface temperature in the northern latitudes (50°N–90°N) for the overlapping period of the two data sets (1982 to 2009). The inset in (a) shows temporal variations of standardized anomalies of growing season averages of LAI3g and temperature. (b) Correlation between annual mean LAI3g and annual total precipitation in the tropical latitudes (23°S–23°N). Similar plots for FPAR3g are shown in Figure S6.

3.3.2. LAI3g Variation with Precipitation in the Tropical Regions

Pixel-level data of annual total precipitation and annual mean LAI3g were averaged over the overlapping record length of these two variables (1982 to 2009) because precipitation, unlike temperature, exhibits greater spatial variability. The precipitation range, 200 mm/year to 2,000 mm/year, was divided into 18 intervals of width 100 mm/year. The mean and standard deviation of annual mean LAI3g of all pixels in each of these 18 precipitation intervals were evaluated. The resulting relationship, shown in Figure 5(b), indicates a statistically significant relationship (R2 = 0.988, p < 0.001). The dispersion in annual mean LAI3g (standard deviation plotted along the y-axis of Figure 5(b)) is greater than the dispersion in annual total precipitation (plotted along the x-axis of Figure 5(b)). This indicates that although precipitation may be the dominant control of plant growth in the tropical regions, other factors also play an important role [69]. Results from a similar analysis with FPAR3g are shown in Figure S8(b).
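The precipitation binning described above can be sketched as follows. The 200 to 2,000 mm/year range and the 100 mm/year bin width come from the text; the function name, the half-open bin convention (lower edge included, upper edge excluded), the use of the population standard deviation and the toy values are assumptions made for illustration.

```python
import numpy as np

def lai_by_precip_bin(precip, lai, lo=200.0, hi=2000.0, width=100.0):
    """Mean and standard deviation of LAI within precipitation intervals.

    precip and lai hold one multi-year average per pixel. Pixels outside
    [lo, hi) are ignored; empty intervals are returned as NaN.
    """
    precip = np.asarray(precip, dtype=float).ravel()
    lai = np.asarray(lai, dtype=float).ravel()
    edges = np.arange(lo, hi + width / 2, width)   # 19 edges -> 18 intervals
    n_bins = len(edges) - 1
    which = np.digitize(precip, edges) - 1         # interval index of each pixel
    centers = edges[:-1] + width / 2
    mean = np.full(n_bins, np.nan)
    std = np.full(n_bins, np.nan)
    for b in range(n_bins):
        vals = lai[(which == b) & ~np.isnan(lai)]
        if vals.size:
            mean[b] = vals.mean()
            std[b] = vals.std()
    return centers, mean, std

# Toy pixels: two fall in the first interval, one in the last, two outside the range.
centers, bin_mean, bin_std = lai_by_precip_bin(
    [250.0, 260.0, 1950.0, 100.0, 2500.0], [1.0, 3.0, 5.0, 9.0, 9.0])
```

Plotting `bin_mean` against `centers`, with `bin_std` as error bars, gives a curve of the kind described for Figure 5(b).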

3.4. Evaluation-Part 3: Identifying Dominant Modes of Interannual Variability Using CCA

The correlations observed between LAI3g and temperature (Figure 5(a)) and between LAI3g and precipitation (Figure 5(b)) can be explained in terms of large-scale circulation anomalies, such as the El Niño-Southern Oscillation (ENSO) and Arctic Oscillation (AO) [40,77]. The canonical correlation analysis (CCA) is well-suited to explore these connections as it seeks to estimate dominant and independent modes of co-variability between two sets of spatio-temporal variables [77], e.g., LAI3g and temperature.

The CCA is designed to select those temporal features in the LAI3g, or FPAR3g, fields that are best correlated with temporal features in springtime climatic variables such as temperature and/or precipitation. The methodology is illustrated here with LAI3g; “climatic variable” below denotes either temperature or precipitation. The springtime (March to May) average value of LAI3g, or of the corresponding climatic variable, at each pixel is treated as a variable (for temperature, the total number of variables is the number of pixels in the latitudinal zone 10°N to 90°N; for precipitation, it is the number of pixels in the latitudinal zone 40°S to 40°N). Further, each year is treated as an observation, i.e., 28 observations for the overlapping time period, 1982 to 2009. The anomaly fields of these variables for each pixel were weighted by the respective pixel area to avoid geometrical effects. Each set of variables was then transformed into principal components (PCs) using singular value decomposition. In each case, only the first six PCs were retained as they explain a large fraction of the variance in the input set of variables (LAI3g: 58.34%; FPAR3g: 56.55%; temperature: 69.88%; precipitation: 47.34%). PCs of LAI3g and climatic variables were input to the CCA. The CCA generates two canonical loading matrices, one for PCs of LAI3g and the other for PCs of the climatic variable. These are then used to construct canonical factors (CFs) from the original PC time series. This results in an eigenvalue matrix that depicts the correlation between the CFs. These eigenvalues (Table 2) suggest that the correlations between the first two CFs are reasonably high (r > 0.6), with the next two being moderate and the last two showing the weakest correlation. Table 2 indicates that: (1) there are strong correlations between CFs of LAI3g (and FPAR3g) and climatic variables; (2) the first two CFs are suitable to explore correlations with ENSO or AO.
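The PC-then-CCA procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic random fields, the function names `leading_pcs` and `cca_on_pcs` are hypothetical, and the square root of the cosine of latitude is assumed as the area weight (the text only says the anomalies were weighted by pixel area). Because the retained PCs have orthonormal, zero-mean time series, the canonical correlations reduce to the singular values of their cross-product matrix.

```python
import numpy as np

def leading_pcs(field, lat_deg, n_pcs=6):
    """Return the first n_pcs PC time series (years x n_pcs) and the
    fraction of variance they explain. `field` has shape (years, pixels)."""
    anomalies = field - field.mean(axis=0)            # remove each pixel's time mean
    weights = np.sqrt(np.cos(np.deg2rad(lat_deg)))    # assumed area weighting
    u, s, _ = np.linalg.svd(anomalies * weights, full_matrices=False)
    explained = float((s[:n_pcs] ** 2).sum() / (s ** 2).sum())
    return u[:, :n_pcs], explained                    # columns are orthonormal PCs

def cca_on_pcs(pcs_x, pcs_y):
    """CCA of two sets of orthonormal PC time series.
    Returns canonical factors for x and y and the canonical correlations."""
    a, r, bt = np.linalg.svd(pcs_x.T @ pcs_y)         # loadings and correlations
    return pcs_x @ a, pcs_y @ bt.T, r

# Synthetic stand-ins: 28 "years", a shared signal plus noise (not real data).
rng = np.random.default_rng(0)
years, n_lai, n_temp = 28, 50, 40
shared = rng.standard_normal(years)
lai = np.outer(shared, rng.standard_normal(n_lai)) + rng.standard_normal((years, n_lai))
temp = np.outer(shared, rng.standard_normal(n_temp)) + rng.standard_normal((years, n_temp))
lat_lai = rng.uniform(10.0, 80.0, n_lai)
lat_temp = rng.uniform(10.0, 80.0, n_temp)

pcs_lai, var_lai = leading_pcs(lai, lat_lai)
pcs_temp, var_temp = leading_pcs(temp, lat_temp)
cf_lai, cf_temp, r = cca_on_pcs(pcs_lai, pcs_temp)
eigenvalues = r ** 2   # squared canonical correlations, as in the Table 2 caption
print(np.round(eigenvalues, 2))
```

The canonical correlations come out sorted from largest to smallest, so the first columns of `cf_lai` and `cf_temp` play the role of the first canonical factors.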

We used the September to November of the preceding year (SON-1) NINO3 index to represent ENSO and the January to March (JFM) average AO index because strong correlations between these indices and springtime climatic variables and NDVI were previously reported [77]. Figure 6(a) shows a low correlation between SON-1 NINO3 index and the first CFs of temperature and LAI3g (0.17 and 0.10, respectively). Compared to a previous study [77], the decline in correlation may be attributed to weak ENSO activity during the past decade [40]. We did observe a strong correlation between SON-1 NINO3 and the first CFs of temperature and LAI3g during the period 1982 to 1998 (0.66 and 0.67, respectively; figure not shown for brevity), consistent with results reported in [77]. Figure 6(b) shows a moderately strong correlation between JFM AO index and the second CFs of temperature and LAI3g (0.57 and 0.45, respectively), which is concordant with earlier studies [40,74].
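The comparison of full-period and sub-period correlations discussed above amounts to computing a Pearson correlation over a selected range of years. The sketch below uses only the standard library; the helper names are hypothetical and the two series are artificial (constructed so that they agree until 1998 and disagree afterwards), so the printed values say nothing about the actual NINO3 index or canonical factors.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_for_period(years, index, factor, first_year, last_year):
    """Correlation restricted to first_year..last_year (inclusive)."""
    keep = [i for i, yr in enumerate(years) if first_year <= yr <= last_year]
    return pearson([index[i] for i in keep], [factor[i] for i in keep])

# Artificial series: identical through 1998, sign-flipped afterwards.
years = list(range(1982, 2010))
index = [math.sin(0.9 * k) for k in range(len(years))]
factor = [v if yr <= 1998 else -v for v, yr in zip(index, years)]

r_early = correlation_for_period(years, index, factor, 1982, 1998)
r_full = correlation_for_period(years, index, factor, 1982, 2009)
print(round(r_early, 2), round(r_full, 2))
```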

Table 2. Eigenvalues from the canonical correlation analysis (CCA) of springtime anomalies of vegetation biophysical (LAI3g and FPAR3g) and climate variables (Temperature, abbreviated as TEMP; Precipitation, abbreviated as PRECIP). The eigenvalues represent the squared correlation between the reconstructed temporal canonical factors of biophysical and climatic variables.

| Canonical Factor | LAI3g/TEMP | LAI3g/PRECIP | FPAR3g/TEMP | FPAR3g/PRECIP |
|---|---|---|---|---|
| 1 | 0.93 | 0.90 | 0.93 | 0.86 |
| 2 | 0.86 | 0.63 | 0.91 | 0.72 |
| 3 | 0.69 | 0.49 | 0.80 | 0.49 |
| 4 | 0.57 | 0.40 | 0.78 | 0.47 |
| 5 | 0.33 | 0.29 | 0.54 | 0.12 |
| 6 | 0.02 | 0.09 | 0.00 | 0.04 |
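If the eigenvalues are read as squared correlations, as the Table 2 caption states, the corresponding canonical correlations are simply their square roots. A short worked conversion for the LAI3g/TEMP column (values copied from Table 2; the variable names are illustrative only):

```python
import math

# LAI3g/TEMP eigenvalues for canonical factors 1 to 6, taken from Table 2.
eigenvalues_lai_temp = [0.93, 0.86, 0.69, 0.57, 0.33, 0.02]

# Canonical correlation = square root of the eigenvalue (squared correlation).
correlations = [round(math.sqrt(ev), 2) for ev in eigenvalues_lai_temp]
print(correlations)  # [0.96, 0.93, 0.83, 0.75, 0.57, 0.14]
```

Under this reading, the first two factors correspond to correlations above 0.9, and the sixth factor to essentially no correlation.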

Figure 6. Correlations between the standardized time series of the first and second canonical factors (CF1 and CF2) of land surface temperature, precipitation and LAI3g with NINO3 and AO indices in the northern (10°N to 90°N) and tropical/extra-tropical regions (40°S to 40°N) for the period 1982 to 2009. The standardized September through November average NINO3 index time series of the preceding year and the January through March average AO index are shown in these plots as black dashed lines.

Reasonable correlations were also observed between SON-1 NINO3 and the first CFs of precipitation and LAI3g (0.50 and 0.48, respectively; Figure 6(c)). This is consistent with previous reports of ENSO influence on interannual variability of tropical and extra-tropical precipitation [78]. The second CFs of precipitation and LAI3g are uncorrelated with the JFM AO index (Figure 6(d)), which is expected as the AO is not known to be a prime driver of precipitation in the 40°S to 40°N latitudinal zone [78]. The CCA was also performed on the FPAR3g data set. The results are similar to those reported here (Figure S9).

In summary, in the northern hemisphere, ENSO-driven linked variations between surface temperature and vegetation activity have weakened since 2000, in contrast to the period 1982 to 1998. In the tropical and extra-tropical regions, ENSO is still a strong driver of precipitation and vegetation variability. The influence of AO remains a factor in the interannual variability of temperature and vegetation activity in the northern hemisphere over the past three decades. These results reproduce published reports [40,77] but also reflect an updated picture. Additional analysis to assess the impact of loss of NOAA 11 orbit on these results is presented in Section S8.

4. Simple Case Study to Illustrate the Utility of LAI3g and FPAR3g Data Sets

Earth System Models (ESMs) are being used to project changes in different components of the climate system for various forcing scenarios [79]. The latest generation of ESMs includes dynamic vegetation models that simulate global vegetation dynamics [80]. The confidence in model projections depends to a large degree on how well these models can simulate present-day climatic variables, and several studies are geared to address this, e.g., [81]. Here, we illustrate the utility of the newly developed LAI3g and FPAR3g data in the context of model evaluation with a simple case study where we compare LAI3g to model simulated values (the ESM output did not include the FPAR variable).

A strict comparison is difficult, if not impossible, because of the manner in which vegetation is modeled within the grid cell in each model and the lack of this quantitative information. Comparisons at the grid cell level are also difficult because of differences in spatial resolution of the ESMs (model grid cells varying from 0.938 to 3.75 degrees) and satellite products (1/12 degree). Therefore, we performed comparisons at the zonal scale. We chose six latitudinal bands that may be approximately characterized as: (a) Arctic (65°N–90°N), (b) Boreal (45°N–65°N), (c) Northern Temperate (23°N–45°N), (d) Northern Equatorial (0°–23°N), (e) Southern Equatorial (0°–23°S) and (f) Southern Temperate (23°S–90°S). We evaluated the vegetated fraction of each model grid cell using the satellite-data based land cover map (Section S5.2) because it is difficult to obtain this quantity for each model grid cell and for all 18 models used in this exercise.
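A zonal comparison of this kind can be sketched as an area-weighted average within each latitudinal band. The example below is an assumption-laden illustration rather than the procedure used in the paper: the function names are hypothetical, each cell is weighted by the cosine of its latitude times its vegetated fraction, band boundaries are assigned to the band on their northern side, and the three input cells are made-up values.

```python
import math

# Latitudinal bands from the text, as (name, southern edge, northern edge).
BANDS = [
    ("Arctic", 65.0, 90.0),
    ("Boreal", 45.0, 65.0),
    ("Northern Temperate", 23.0, 45.0),
    ("Northern Equatorial", 0.0, 23.0),
    ("Southern Equatorial", -23.0, 0.0),
    ("Southern Temperate", -90.0, -23.0),
]

def band_of(lat):
    """Name of the band containing latitude `lat` (degrees, north positive)."""
    for name, south, north in BANDS:
        if south <= lat < north or (north == 90.0 and lat == 90.0):
            return name
    raise ValueError(f"latitude out of range: {lat}")

def zonal_mean_lai(cells):
    """cells: iterable of (lat_deg, lai, vegetated_fraction) tuples.
    Returns {band name: weighted mean LAI} for bands with nonzero weight."""
    sums = {}
    for lat, lai, veg_fraction in cells:
        weight = math.cos(math.radians(lat)) * veg_fraction  # assumed weighting
        num, den = sums.get(band_of(lat), (0.0, 0.0))
        sums[band_of(lat)] = (num + weight * lai, den + weight)
    return {name: num / den for name, (num, den) in sums.items() if den > 0}

# Made-up cells: two in the Arctic band, one in the Northern Equatorial band.
cells = [(70.0, 1.0, 0.5), (80.0, 3.0, 0.5), (10.0, 4.0, 1.0)]
zonal = zonal_mean_lai(cells)
print({name: round(value, 2) for name, value in zonal.items()})
```

Applying the same function to model grid cells and to satellite pixels puts both on a common zonal footing despite their different native resolutions.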

Figure 7 shows the annual cycle of LAI3g and the ensemble mean LAI from 18 ESMs from the Coupled Model Intercomparison Project Phase 5 (CMIP5) for the six latitudinal zones. The annual cycle of LAI3g is generally close to the ensemble mean LAI minus one standard deviation of the ESM LAI. The bell-shaped patterns of LAI3g and the ensemble mean ESM LAI in the six latitudinal zones are approximately consistent. However, the timing of maximum ESM LAI is delayed by about 1 month relative to LAI3g in the Arctic and Boreal zones. Comparisons of MODIS LAI with the Community Land Model and Carbon-Nitrogen model simulations also show that the timing of the maximum simulated LAI lagged the observations by 1–2 months [82]. The fact that the ESMs are “greener” and their phenological cycle lags observations has important implications for simulated fluxes of energy, mass and momentum in the ESMs. Could it be that the dynamic vegetation models overestimate carbon fixation and/or allocation of biomass to leaves? This simple exercise illustrates the utility of satellite-data based products by highlighting an area of research that requires refinement in ESMs [83].
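The lag in the timing of maximum LAI can be quantified by comparing the month of peak LAI in two mean annual cycles. The following sketch assumes monthly values (January to December) and uses invented cycles purely for illustration; `peak_month` and `peak_lag_months` are hypothetical helper names, and the lag is wrapped so that a December-to-January shift counts as one month.

```python
def peak_month(cycle):
    """Month (1-12) at which a 12-value annual cycle is largest."""
    return max(range(12), key=lambda m: cycle[m]) + 1

def peak_lag_months(model_cycle, obs_cycle):
    """Signed lag of the model peak relative to the observed peak,
    wrapped to the range -5..6 months (positive = model peaks later)."""
    lag = (peak_month(model_cycle) - peak_month(obs_cycle)) % 12
    return lag - 12 if lag > 6 else lag

# Invented monthly LAI cycles (not values from the paper).
obs_cycle = [0.2, 0.2, 0.3, 0.5, 1.0, 1.8, 2.2, 2.0, 1.2, 0.6, 0.3, 0.2]
model_cycle = [0.5, 0.5, 0.6, 0.8, 1.4, 2.2, 2.8, 3.0, 2.2, 1.2, 0.7, 0.5]

lag = peak_lag_months(model_cycle, obs_cycle)
print(peak_month(obs_cycle), peak_month(model_cycle), lag)  # 7 8 1
```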

Figure 7. Comparison of the 1982 to 2005 average seasonal cycle between LAI simulated by 18 Earth System Models (ESMs) and LAI3g. The shaded area shows the standard deviation for the 18 ESMs. This analysis is based on the assumption that ESM LAI is defined with respect to the vegetated area of the model grid cell. In the southern hemisphere, dates for regions south of 23°S are shifted by 6 months.

5. Concluding Remarks

The objective of this work was to generate, assess and document the utility of long-term (30-year) global LAI and FPAR data sets. The availability of a new improved NDVI data set from the NASA GIMMS research group, termed the third generation NDVI, or NDVI3g, and its overlap with the Terra MODIS LAI and FPAR products provided an opportunity to design and implement a neural network algorithm to generate the corresponding LAI3g and FPAR3g data sets with the following attributes: 15-day temporal frequency, 1/12 degree spatial resolution and temporal span of July 1981 to December 2011.

The suitability of LAI3g and FPAR3g data sets for monitoring and modeling global vegetation in the context of climate, biogeochemistry, eco-physiology, hydrology and agriculture [3–17] was comprehensively assessed. The LAI3g data were compared to 45 sets of appropriately scaled field measurements from 29 sites representative of all major biomes. The results indicated satisfactory agreement (p < 0.001; RMSE = 0.68 LAI). Compared to the widely used alternative CYCLOPES LAI and FPAR products, the LAI3g and FPAR3g showed higher values, especially in forests; this is concordant with previous reports of CYCLOPES products as underestimates of ground truth values of LAI and FPAR. The LAI3g and FPAR3g products exhibited expected behavior with respect to their relationship with climatic variables: temperature in the northern latitudes and precipitation in the tropical regions. The interannual variability embedded in the LAI3g and FPAR3g data sets was evaluated for its relationship to large scale circulation anomalies that are dominant modes of interannual variability in climatic variables. The resulting strong correlations between the two dominant modes of variability in LAI3g (and FPAR3g) with ENSO and AO imbued confidence in the interannual variability of these data sets.

The utility of these data sets was documented by comparing the annual profile of LAI3g with profiles generated by 18 Earth System Models for various latitudinal zones. The results indicated that the models consistently overestimated satellite-based estimates of leaf area. Moreover, the models’ simulation of the seasonal cycle in the northern latitudes is at odds with the satellite product. Ground based studies have confirmed this inability of models to simulate the phenological cycle, which lends support to the seasonal cycle observed in the satellite-based LAI3g product. Finally, the LAI3g and FPAR3g data sets can be obtained freely from the NASA Earth Exchange (NEX) web site.

Acknowledgments

We thank C.J. Tucker and J. Pinzon of NASA GSFC for making available the GIMMS NDVI3g data set. We also thank the BELMANIP Project for providing the BELMANIP site for validation of our products. This study was partially funded by the China Scholarship Council and NASA Earth Science Division.

References

1. Sellers, P.J.; Tucker, C.J.; Collatz, G.J.; Los, S.O.; Justice, C.O.; Dazlich, D.A.; Randall, D.A. A revised land surface parameterization (SiB2) for atmospheric GCMs. Part II: The generation of global fields of terrestrial biophysical parameters from satellite data. J. Climate 1996, 9, 706–737.

2. Myneni, R.B.; Hoffman, S.; Knyazikhin, Y.; Privette, J.L.; Glassy, J.; Tian, Y.; Wang, Y.; Song, X.; Zhang, Y.; Smith, G.R.; Lotsch, A.; Friedl, M.; Morisette, J.T.; Votava, P.; Nemani, R.R.; Running, S.W. Global products of vegetation leaf area and fraction absorbed PAR from year one of MODIS data. Remote Sens. Environ. 2002, 83, 214–231.

3. Sellers, P.J.; Mintz, Y.; Sud, Y.C.; Dalcher, A. A simple biosphere model (SiB) for use within general-circulation models. J. Atmos. Sci. 1986, 43, 505–531.

4. Melillo, J.M.; McGuire, A.D.; Kicklighter, D.W.; Moore, B.; Vorosmarty, C.J.; Schloss, A.L. Global climate change and terrestrial net primary production. Nature 1993, 363, 234–240.

5. Ji, J. A climate-vegetation interaction model: Simulating physical and biological processes at the surface. J. Biogeogr. 1995, 22, 445–451.

6. Foley, J.A.; Prentice, I.C.; Ramankutty, N.; Levis, S.; Pollard, D.; Sitch, S.; Haxeltine, A. An integrated biosphere model of land surface processes, terrestrial carbon balance, and vegetation dynamics. Glob. Biogeochem. Cy. 1996, 10, 603–628.

7. Bonan, G.B.; Levis, S.; Sitch, S.; Vertenstein, M.; Oleson, K.W. A dynamic global vegetation model for use with climate models: Concepts and description of simulated vegetation dynamics. Glob. Change Biol. 2003, 9, 1543–1566.

8. Zhang, P.; Anderson, B.; Tan, B.; Huang, D.; Myneni, R. Potential monitoring of crop production using a satellite-based Climate-Variability Impact Index. Agric. For. Meteorol. 2005, 132, 344–358.

9. Krinner, G.; Viovy, N.; de Noblet-Ducoudré, N.; Ogée, J.; Polcher, J.; Friedlingstein, P.; Ciais, P.; Sitch, S.; Prentice, I.C. A dynamic global vegetation model for studies of the coupled atmosphere-biosphere system. Glob. Biogeochem. Cy. 2005, doi: 10.1029/2003GB002199.

10. Demarty, J.; Chevallier, F.; Friend, A.D.; Viovy, N.; Piao, S.; Ciais, P. Assimilation of global MODIS leaf area index retrievals within a terrestrial biosphere model. Geophys. Res. Lett. 2007, doi: 10.1029/2007GL030014.

11. Wesely, M.L. Parameterization of surface resistances to gaseous dry deposition in regional-scale numerical models. Atmos. Environ. 1989, 23, 1293–1304.

12. Lu, L.; Shuttleworth, W.J. Incorporating NDVI-Derived LAI into the climate version of RAMS and its impact on regional climate. J. Hydrometeorol. 2002, 3, 347–362.

13. Brown, M.E.; Pinzon, J.E.; Didan, K.; Morisette, J.T.; Tucker, C.J. Evaluation of the consistency of long-term NDVI time series derived from AVHRR, SPOT-vegetation, SeaWiFS, MODIS, and Landsat ETM+ sensors. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1787–1793.

14. Lathière, J.; Hauglustaine, D.A.; Friend, A.; de Noblet-Ducoudré, N.; Viovy, N.; Folberth, G. Impact of climate variability and land use changes on global biogenic volatile organic compound emissions. Atmos. Chem. Phys. Discuss. 2005, 5, 10613–10656.

15. Alessandri, A.; Gualdi, S.; Polcher, J.; Navarra, A. Effects of land surface-vegetation on the boreal summer surface climate of a GCM. J. Climate 2007, 20, 255–278.

16. Piao, S.; Ciais, P.; Friedlingstein, P.; de Noblet-Ducoudré, N.; Cadule, P.; Viovy, N.; Wang, T. Spatiotemporal patterns of terrestrial carbon cycle during the 20th century. Glob. Biogeochem. Cy. 2009, doi: 10.1029/2008GB003339.

17. Anav, A.; Menut, L.; Khvorostyanov, D.; Viovy, N. A comparison of two canopy conductance parameterizations to quantify the interactions between surface ozone and vegetation over Europe. J. Geophys. Res. 2012, doi: 10.1029/2012JG001976.

18. Asrar, G.; Fuchs, M.; Kanemasu, E.T.; Hatfield, J.L. Estimating absorbed photosynthetic radiation and leaf area index from spectral reflectance in wheat. Agron. J. 1984, 76, 300–306.

19. Chen, J.M.; Pavlic, G.; Brown, L.; Cihlar, J.; Leblanc, S.G.; White, H.P.; Hall, R.J.; Peddle, D.R.; King, D.J.; Trofymow, J.A.; Swift, E.; van der Sanden, J.; Pellikka, P.K.E. Derivation and validation of Canada-wide coarse-resolution leaf area index maps using high-resolution satellite imagery and ground measurements. Remote Sens. Environ. 2002, 80, 165–184.

20. Myneni, R.B.; Ramakrishna, R.; Nemani, R.; Running, S.W. Estimation of global leaf area index and absorbed par using radiative transfer models. IEEE Trans. Geosci. Remote Sens. 1997, 35, 1380–1393.

21. Fassnacht, K.S.; Gower, S.T.; MacKenzie, M.D.; Nordheim, E.V.; Lillesand, T.M. Estimating the leaf area index of North Central Wisconsin forests using the Landsat Thematic Mapper. Remote Sens. Environ. 1997, 61, 229–245.

22. Colombo, R.; Bellingeri, D.; Fasolini, D.; Marino, C.M. Retrieval of leaf area index in different vegetation types using high resolution satellite data. Remote Sens. Environ. 2003, 86, 120–131.

23. Houborg, R.; Soegaard, H.; Boegh, E. Combining vegetation index and model inversion methods for the extraction of key vegetation biophysical parameters using Terra and Aqua MODIS reflectance data. Remote Sens. Environ. 2007, 106, 39–58.

24. Myneni, R.B.; Ross, J.; Asrar, G. A review on the theory of photon transport in leaf canopies. Agric. For. Meteorol. 1989, 45, 1–153.

25. Knyazikhin, Y.; Martonchik, J.V.; Myneni, R.B.; Diner, D.J.; Running, S.W. Synergistic algorithm for estimating vegetation canopy leaf area index and fraction of absorbed photosynthetically active radiation from MODIS and MISR data. J. Geophys. Res. 1998, 103, 32257–32275.

26. Combal, B.; Baret, F.; Weiss, M.; Trubuil, A.; Mace, D.; Pragnere, A.; Myneni, R.B.; Knyazikhin, Y.; Wang, L. Retrieval of canopy biophysical variables from bidirectional reflectance using prior information to solve the ill-posed inverse problem. Remote Sens. Environ. 2002, 84, 1–15.

27. Li, X.; Strahler, A.H. Geometric-optical bidirectional reflectance modeling of the discrete crown vegetation canopy: Effect of crown shape and mutual shadowing. IEEE Trans. Geosci. Remote Sens. 1992, 30, 276–292.

28. Li, X.; Strahler, A.H.; Woodcock, C.E. A hybrid geometric optical-radiative transfer approach for modeling albedo and directional reflectance of discontinuous canopies. IEEE Trans. Geosci. Remote Sens. 1995, 33, 446–480.

29. Ross, J.K.; Marshak, A.L. Calculation of canopy bidirectional reflectance using the monte-carlo method. Remote Sens. Environ. 1988, 24, 213–225.

30. Lewis, P. Three-dimensional plant modelling for remote sensing simulation studies using the Botanical Plant Modelling System. Agron. Sustain. Dev. 1999, 19, 185–210.

31. Baret, F.; Clevers, J.G.P.W.; Steven, M.D. The robustness of canopy gap fraction estimates from red and near-infrared reflectances: A comparison of approaches. Remote Sens. Environ. 1995, 54, 141–151.

32. Kimes, D.S.; Ranson, K.J.; Sun, G. Inversion of a forest backscatter model using neural networks. Int. J. Remote Sens. 1997, 18, 2181–2199.

33. Weiss, M.; Baret, F.; Leroy, M.; Hautecœur, O.; Bacour, C.; Prévot, L.; Bruguier, N. Validation of neural net techniques to estimate canopy biophysical variables from remote sensing data. Agron. Sustain. Dev. 2002, 22, 547–553.

34. Fernandes, R.; Butson, C. A Landsat TM/ETM+ based accuracy assessment of leaf area index products for Canada derived from SPOT4/VGT data. Can. J. Remote Sens. 2003, 29, 241–258.

35. Morisette, J.T.; Baret, F.; Privette, J.L.; Myneni, R.B.; Nickeson, J.E.; Garrigues, S.; Shabanov, N.V.; Weiss, M.; Fernandes, R.A.; Leblanc, S.G.; et al. Validation of global moderate-resolution LAI products: A framework proposed within the CEOS land product validation subgroup. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1804–1817.

36. Yang, W.; Tan, B.; Huang, D.; Rautiainen, M.; Shabanov, N.V.; Wang, Y.; Privette, J.L.; Huemmrich, K.F.; Fensholt, R.; Sandholt, I.; et al. MODIS leaf area index products: From validation to algorithm improvement. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1885–1898.

37. Pisek, J.; Chen, J.M. Comparison and validation of MODIS and VEGETATION global LAI products over four BigFoot sites in North America. Remote Sens. Environ. 2007, 109, 81–94.

38. Weiss, M.; Baret, F.; Garrigues, S.; Lacaze, R. LAI and fAPAR CYCLOPES global products derived from VEGETATION. Part 2: Validation and comparison with MODIS collection 4 products. Remote Sens. Environ. 2007, 110, 317–331.

39. Garrigues, S.; Lacaze, R.; Baret, F.; Morisette, J.T.; Weiss, M.; Nickeson, J.E.; Fernandes, R.; Plummer, S.; Shabanov, N.V.; Myneni, R.B.; et al. Validation and intercomparison of global Leaf Area Index products derived from remote sensing data. J. Geophys. Res. 2008, doi: 10.1029/2003GB002199.

40. Ganguly, S.; Samanta, A.; Schull, M.A.; Shabanov, N.V.; Milesi, C.; Nemani, R.R.; Knyazikhin, Y.; Myneni, R.B. Generating vegetation leaf area index Earth system data record from multiple sensors. Part 2: Implementation, analysis and validation. Remote Sens. Environ. 2008, 112, 4318–4332.

41. Lee, T.E.; Miller, S.D.; Turk, F.J.; Schueler, C.; Julian, R.; Deyo, S.; Dills, P.; Wang, S. The NPOESS VIIRS day/night visible sensor. Bull. Am. Meteorol. Soc. 2006, 87, 191–199.

42. Lee, T.F.; Miller, S.D.; Schueler, C.; Miller, S. NASA MODIS previews NPOESS VIIRS capabilities. Weather Forecast. 2006, 21, 649–655.

43. Miller, S.D.; Schmidt, C.C.; Schmit, T.J.; Hillger, D.W. A case for natural colour imagery from geostationary satellites, and an approximation for the GOES-R ABI. Int. J. Remote Sens. 2012, 33, 3999–4028.

44. Wang, Q.; Wu, C.; Li, Q.; Li, J. Chinese HJ-1A/B satellites and data characteristics. Sci. China Ser. D 2010, 53, 51–57.

45. Shimada, M.; Tadono, T.; Rosenqvist, A. Advanced Land Observing Satellite (ALOS) and monitoring global environmental change. Proc. IEEE 2010, 98, 780–799.

46. Tucker, C.J.; Pinzon, J.E.; Brown, M.E.; Slayback, D.A.; Pak, E.W.; Mahoney, R.; Vermote, E.F.; El Saleous, N. An extended AVHRR 8-km NDVI dataset compatible with MODIS and SPOT vegetation NDVI data. Int. J. Remote Sens. 2005, 26, 4485–4498.

47. Yuan, H.; Dai, Y.; Xiao, Z.; Ji, D.; Shangguan, W. Reprocessing the MODIS Leaf Area Index products for land surface and climate modelling. Remote Sens. Environ. 2011, 115, 1171–1187.

48. Samanta, A.; Costa, M.H.; Nunes, E.L.; Vieira, S.A.; Xu, L.; Myneni, R.B. Comment on “Drought-induced reduction in global terrestrial net primary production from 2000 through 2009”. Science 2011, 333, 1093–1093.

49. NASA. Goddard Space Flight Center. Available online: https://www.wendangku.net/doc/0014374291.html,/ (accessed on 15 January 2013).

50. Friedl, M.A.; McIver, D.K.; Hodges, J.C.F.; Zhang, X.Y.; Muchoney, D.; Strahler, A.H.; Woodcock, C.E.; Gopal, S.; Schneider, A.; Cooper, A.; et al. Global land cover mapping from MODIS: Algorithms and early results. Remote Sens. Environ. 2002, 83, 287–302.

51. Lotsch, A.; Tian, Y.; Friedl, M.A.; Myneni, R.B. Land cover mapping in support of LAI and FPAR retrievals from EOS-MODIS and MISR: Classification methods and sensitivities to errors. Int. J. Remote Sens. 2003, 24, 1997–2016.

52. Fang, H.; Li, W.; Myneni, R.B. The impact of potential land cover misclassification on MODIS Leaf Area Index (LAI) estimation: A statistical perspective. Remote Sens. 2013, 5, 830–844.

53. Heermann, P.D.; Khazenie, N. Classification of multispectral remote sensing data using a back-propagation neural network. IEEE Trans. Geosci. Remote Sens. 1992, 30, 81–88.

54. Privette, J.L.; Myneni, R.B.; Knyazikhin, Y.; Mukelabai, M.; Roberts, G.; Tian, Y.; Wang, Y.; Leblanc, S.G. Early spatial and temporal validation of MODIS LAI product in the Southern Africa Kalahari. Remote Sens. Environ. 2002, 83, 232–243.

55. Tian, Y.; Woodcock, C.E.; Wang, Y.; Privette, J.L.; Shabanov, N.V.; Zhou, L.; Zhang, Y.; Buermann, W.; Dong, J.; Veikkanen, B.; et al. Multiscale analysis and validation of the MODIS LAI product: I. Uncertainty assessment. Remote Sens. Environ. 2002, 83, 414–430.

56. Tian, Y.; Woodcock, C.E.; Wang, Y.; Privette, J.L.; Shabanov, N.V.; Zhou, L.; Zhang, Y.; Buermann, W.; Dong, J.; Veikkanen, B.; et al. Multiscale analysis and validation of the MODIS LAI product: II. Sampling strategy. Remote Sens. Environ. 2002, 83, 431–441.

57. Shabanov, N.V.; Wang, Y.; Buermann, W.; Dong, J.; Hoffman, S.; Smith, G.R.; Tian, Y.; Knyazikhin, Y.; Myneni, R.B. Effect of foliage spatial heterogeneity in the MODIS LAI and FPAR algorithm over broadleaf forests. Remote Sens. Environ. 2003, 85, 410–423.

58. Wang, Y.; Woodcock, C.E.; Buermann, W.; Stenberg, P.; Voipio, P.; Smolander, H.; Häme, T.; Tian, Y.; Hu, J.; Knyazikhin, Y.; Myneni, R.B. Evaluation of the MODIS LAI algorithm at a coniferous forest site in Finland. Remote Sens. Environ. 2004, 91, 114–127.

59. Tan, B.; Hu, J.N.; Huang, D.; Yang, W.Z.; Zhang, P.; Shabanov, N.V.; Knyazikhin, Y.; Nemani, R.R.; Myneni, R.B. Assessment of the broadleaf crops leaf area index product from the Terra MODIS instrument. Agric. For. Meteorol. 2005, 135, 124–134.

60. Cohen, W.B.; Maiersperger, T.K.; Turner, D.P.; Ritts, W.D.; Pflugmacher, D.; Kennedy, R.E.; Kirschbaum, A.; Running, S.W.; Costa, M.; Gower, S.T. MODIS land cover and LAI collection 4 product quality across nine sites in the western hemisphere. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1843–1857.

61. Huang, D.; Yang, W.Z.; Tan, B.; Rautiainen, M.; Zhang, P.; Hu, J.N.; Shabanov, N.V.; Linder, S.; Knyazikhin, Y.; Myneni, R.B. The importance of measurement errors for deriving accurate reference leaf area index maps for validation of moderate-resolution satellite LAI products. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1866–1871.

Matlab笔记——数据预处理——剔除异常值及平滑处理

012. 数据预处理(1)——剔除异常值及平滑处理测量数据在其采集与传输过程中,由于环境干扰或人为因素有可能造成个别数据不切合实际或丢失,这种数据称为异常值。为了恢复数据的客观真实性以便将来得到更好的分析结果,有必要先对原始数据(1)剔除异常值; 另外,无论是人工观测的数据还是由数据采集系统获取的数据,都不可避免叠加上“噪声”干扰(反映在曲线图形上就是一些“毛刺和尖峰”)。为了提高数据的质量,必须对数据进行(2)平滑处理(去噪声干扰); (一)剔除异常值。 注:若是有空缺值,或导入Matlab数据显示为“NaN”(非数),需要①忽略整条空缺值数据,或者②填上空缺值。 填空缺值的方法,通常有两种:A. 使用样本平均值填充;B. 使用判定树或贝叶斯分类等方法推导最可能的值填充(略)。 一、基本思想: 规定一个置信水平,确定一个置信限度,凡是超过该限度的误差,就认为它是异常值,从而予以剔除。

二、常用方法:拉依达方法、肖维勒方法、一阶差分法。 注意:这些方法都是假设数据依正态分布为前提的。 1. 拉依达方法(非等置信概率) 如果某测量值与平均值之差大于标准偏差的三倍,则予以剔除。 3x i x x S -> 其中,11n i i x x n ==∑为样本均值,1 2 2 11()1n x i i S x x n =?? ??? =--∑为样本的标准偏差。 注:适合大样本数据,建议测量次数≥50次。 代码实例(略)。 2. 肖维勒方法(等置信概率) 在 n 次测量结果中,如果某误差可能出现的次数小于半次时,就予以剔除。 这实质上是规定了置信概率为1-1/2n ,根据这一置信概率,可计算出肖维勒系数,也可从表中查出,当要求不很严格时,还可按下列近似公式计算: 10.4ln()n n ω=+

图销售分析”的多维数据集模型的设计共8页word资料

数据仓库与数据挖掘 实验报告 姓名:岩羊先生 班级:数技2011 学号:XXXXXX 实验日期:2013年11月14日 目录 实验.............................................. 错误!未定义书签。 【实验目的】............................... 错误!未定义书签。 1、熟悉SQLservermanager studio和VisualStudio2008软件功能 和操作特点; ................................ 错误!未定义书签。 2、了解SQLservermanager studio和VisualStudio2008软件的各 选项面板和操作方法; ........................ 错误!未定义书签。 3、熟练掌握SQLserver manager studio和VisualStudio2008工 作流程。................................... 错误!未定义书签。 【实验内容】............................... 错误!未定义书签。 1.打开SQLserver manager studio软件,逐一操作各选项,熟悉

软件功能; (4) 2.根据给出的数据库模型“出版社销售图书Pubs”优化结构,新建立数据库并导出; (4) 3.打开VisualStudio2008,导入已有数据库、或新建数据文件,设计一个“图书销售分析”的多维数据集模型。并使用各种输出节点,熟悉数据输入输出。 (4) 【实验环境】............................... 错误!未定义书签。【实验步骤】............................... 错误!未定义书签。 1.打开 SQL Server manager studio; (5) 2.附加备份的数据库文件pubs_DW_Data.MDF和pubs_DW_Log.LDF 并且做出优化; (5) 3.修改数据库属性; (5) 4.建立数据仓库所需的数据库bb(导出); (5) 5. 创建新的分析服务项目; (5) 6. 新建数据源(本地服务器输入“.”) (5) 7.建立多维数据集 (6) 8.处理多维数据集,得出模型: (6) 9.模型实例: (6) 【实验中的困难及解决办法】................. 错误!未定义书签。问题1:SQLserver中数据库的到导出. (6)

利用SPSS 19.0剔除异常值

如何利用SPSS 19.0剔除数据中的异常值(Outliers) 一般数组应遵循正态分布,但一列数组中有可能会出现异常值,从而影响数据的方差和统计结果,因此挡在SPSS中输入数据后,首先要检查数据中是否存在异常值。方法如下: 1.选择想要观察的数据,此处我们选择normal 列中的数据进行查看 2.进入菜单栏中“分析”→“描述统计”→“探索” 3.将“normal”数组放入因变量列表中

4.点击“探索”窗口中的“统计量”,点掉“描述性”,选择“界外值”和“百分位数” 5.点击“探索”窗口中“绘制”,选择“直方图”,去掉“茎叶图” 6.选择结束后点击“探索”窗口“确定”查看结果: (1)百分位数图:

(2)以50%左右两个百分位数(即四分位数25和75下方的加权平均值)的加权平均值计算最高和最低临界值,使用计算公式如下: Upper=Q3+(2.2*(Q3-Q1)) Lower=Q1-(2.2*(Q3-Q1)) 此处Q3=26.0281, Q1=17.8396 计算后,Upper=44.0428,Lower=-0.1751 (3)查看“极值”表格: 极值 案例号值 normal 最高 1 20 29.30 2 22 29.30 3 2 4 29.30 4 46 29.30 5 47 29.30a 最低 1 81 16.82 2 78 16.82 3 75 16.82 4 57 16.82 5 54 16.82b a. 上限值表中仅显示一部分具有值 29.30 的案例。 b. 下限值表中仅显示一部分具有值 16.82 的案例。 如果有最高值查过Upper,或最低值小于Lower值,则被视为Outliers, 即异常值。由图中看,此列数组并无异常值

实验1_建立多维数据集

实验1 建立多维数据集 实验目的 通过使用SQL Server建立多维数据集,使学生理解和掌握建立多维数据集的一般过程和方法。 实验内容 1、建立FoodMart多维数据集 实验条件 1.操作系统:Windows XP SP2 2.SQL Server 2000 实验要求: 1、按照实验步骤中练习建立FOODMART多维数据集。 实验步骤 第一步, 建立系统数据源连接 1.单击“开始”按钮,指向“设置”,单击“控制面板”,然后双击“管理工具”,再双击“数据源(ODBC)”。 1.在“系统DSN”选项卡上单击“添加”按钮。 2.选择“Microsoft Access 驱动程序(*.mdb)”,然后单击“完成”按钮。 3.在“数据源名”框中,输入“教程”,然后在“数据库”下,单击“选择”。 4.在“选择数据库”对话框中,浏览到“C:\Program Files\Microsoft Analysis Services\Samples”,然后单击“FoodMart 2000.mdb”。单击“确定”按钮。 5.在“ODBC Microsoft Access 安装”对话框中单击“确定”按钮。 6.在“ODBC 数据源管理器”对话框中单击“确定”按钮。 第二步, 启动Analysis Manager

单击“开始”按钮,依次指向“程序”、“Microsoft SQL Server”和“Analysis Services”,然后单击“Analysis Manager”。 第三步,建立数据库和数据源 1.在Analysis Manager 树视图中展开“Analysis Servers”。 2.单击服务器名称,即可建立与Analysis Servers 的连接。 3.右击服务器名称,然后单击“新建数据库”命令。 4.在“数据库”对话框中的“数据库名称”框中,输入“教程”,然后单击“确定”按钮。 5.在Analysis Manager 树窗格中展开服务器,然后展开刚才创建的“教程”数据库。 6.在Analysis Manager 树窗格中,右击“教程”数据库下的“数据源”文件夹,然后单击“新数据源” 命令。 7.在“数据链接属性”对话框中,单击“提供者”选项卡,然后单击“Microsoft OLE DB Provider for ODBC Drivers”。

SAS软件对数据集一些简单操作

SAS软件对数据集一些简单操作Libname AA 'd:\SAS'; Data AA.feng; Input a b c; cards; 3 4 56 64 43 34 累加 DATA A; INPUT X Y @@; S+X; CARDS; 3 5 7 9 20 21 ; PROC PRINT; RUN; ; run; DATA D1; INFILE ‘C:FIT.TXT' INPUT NUM $ 1-4 SEX $ 5 H 6-9 W 10-11; RUN; 建立数据集求均值 data a; input name$sex$math chinese@@; cards; 张三男82 96 刘四女81 98 王五男90 92 黄六女92 92 ; proc print data=a; proc means data=a mean; var math chinese; run; 保留列 data b; set a; keep name math; run; 丢弃列 data b; set b;

drop name; run; 条件选择 data c; set a; if math>90 and chinese>90; run; 把超过九十分改为90分data aa; set a; if chinese>90 then chinese=90; run; 筛选行 data aaa ; set a(firstobs=2 obs=3); run; 拆分男女 data a1 a2; set a; select(sex); when('男')output a1; when('女')output a2; otherwise put sex='wrong'; end; drop sex; run; 合并 data new; set a1(in=male) a2(in=female); if male=1 then sex=''; if female=1 then sex=''; run; 纵向合并Set 横向合并merge 重命名rename 改标志label 排序语句 proc sort data=a out=b; by sex;

☆☆【】异常值的剔除--肖维勒法则

一、线性方程的异常值剔除——肖维勒准则,适用于小样本和线性分析 1、用spss方法计算出残差和标准值,具体步骤如下: 步骤1:选择菜单“【分析】—>【回归】—>【线性】”,打开Linear Regression 对话框。将变量住房支出y移入Dependent列表框中,将年收入x移入Independents 列表框中。在Method 框中选择Enter 选项,表示所选自变量全部进入回归模型。 步骤2:单击Statistics 按钮,如图在Statistics 子对话框。该对话框中设置要输出的统计量。这里选中估计、模型拟合度复选框。 ?估计:输出有关回归系数的统计量,包括回归系数、回归系数的标准差、标准化的回归系数、t 统计量及其对应的p值等。 ?置信区间:输出每个回归系数的95%的置信度估计区间。 ?协方差矩阵:输出解释变量的相关系数矩阵和协差阵。 ?模型拟合度:输出可决系数、调整的可决系数、回归方程的标准误差 回归方程F检验的方差分析 步骤3:单击绘制按钮,在Plots子对话框中的标准化残差图选项栏中选中正态概率图复选框,以便对残差的正态性进行分析。 步骤4:单击保存按钮,在Save 子对话框中残差选项栏中选中未标准化复选框,这样可以在数据文件中生成一个变量名尾res_1 的残差变量,以便对残差进行进一步分析。 其余保持Spss 默认选项。在主对话框中单击ok按钮,执行线性回归命令。 结果输出与分析 散点图(判断随机扰动项是否存在异方差,根据散点图,若随着解释变量x的增大,被解释变量的波动幅度明显增大,说明随机扰动项可能存在比较严重的异方差问题,应该利用加权最小二乘法等方法对模型进行修正)、相关系数表Correlations(皮尔逊相关系数,双尾检验概率p值尾<0.05,则变量之间显著相关,在此前提下进一步进行回归分析,建立一元线性

二、创建SAS数据集(学生)

二、创建SAS数据集 本课内容: 1.用编写SAS程序的方法建立数据集 2.用“菜单”工具导入SAS外部环境建立的数据(.dbf和excel ) 3.非编程方式建立SAS数据集 前面说过,SAS语言是一种专用的数据管理、分析语言,它提供了很强的数据操作能力。这些能力表现在它可以轻易地读入任意复杂格式的输入数据,并可以对输入的数据进行计算、子集选择、更新、合并、拆分等操作。另外,SAS 系统还提供了用来访问其它数据库系统的接口,访问各种微机用数据库文件(如dBase、FoxPro、Excel )的接口及向导等。但是对于SAS系统来说,无论何种类型的数据文件,都需要转换为SAS数据集的形式才能被系统使用,只有SAS数据集才能被系统识别和使用。用SAS 语言直接或间接产生数据集的方式很多,本课程只介绍以下几种常用的方法。 一、 用编写SAS程序的方法建立数据集 1.用INPUT 语句和CARDS语句在程序中输入数据 在数据步中输入原始数据,要使用INPUT 语句来指定输入的变量和格式,用CARDS 语句输入数据的值,数据输入完毕后要以一个分号结束,分号单独占一行(从CARDS到分号之间的行我们称为数据块)。 ①INPUT 语句的自由格式: 以每一个列作为每个观测的变量(系统默认),变量之间用空格分开。变量如果是字符型的需要在变量名后面加一个$符号。 产生数据集常用SAS语句: DATA [数据集名]; INPUT [变量名]; CARDS; 数据块 ; RUN

例2.1: data c9901; input code name$ sex$ math chinese; cards; 1 李明 男 9 2 98 2 张红艺 女 89 106 3 王思明 男 86 90 4 张聪 男 98 109 5 刘颍 女 80 110 ; proc print;run; 以上程序运行后生成的数据集有五个观测,五个变量,每行数据的各变量之间用空格分隔。为输入这些数据,INPUT 语句中依次列出了五个变量名,并在字符型变量NAME 和SEX 后加了$符。程序提交运行后生成一个名为c9901的SAS临时数据集。 如果要将生成的数据集放入永久逻辑库,可以使用SASUSER,也可使用预先设定的自定义逻辑库名,然后修改data语句中的数据集名,将其改为两水平命名,把数据集保存到指定的永久库中。 注意:在SAS工作中一旦要与逻辑库发生联系,无论是放置数据集还是从逻辑库中调用某个已经存在的数据集,数据集的名称要采用两水平命名(即逻辑库名+数据集名称)。例如:现在要将c9901放到sasuser库中,程序的data语句要写:data sasuser.c9901;运行后 c9901放入sasuser中,如果要将建立的数据集放入自定义永久库中时,逻辑库名替换为自定义符号。 使用自由格式输入数据有一些限制条件: 1)数据块中的每行为一个观测,各数据值之间用空格分隔; 2)无论是字符型还是数值型缺失数据都必须用小数点表示; 3)字符型数据长度不能超过8个字符,中间不允许有空白; 有特殊格式的数据需要用有格式输入,即在变量名后加格式名。其中最常见的是用来输入日期。数据中的日期输入方法经常是多种多样的,比如1998 年10 月9 日可以写成“1998-10-9”,“19981009”,“9/10/98”等等,为读入这样的日期数据就需要为它指定特殊的日期输入格式。另外,日期数据在SAS 中是按数值存储的,所以如果要显示日期值,也需要为它指定特殊的日期输出格式。

SQL Server 2005 多维数据集创建过程

SQL Server 2005 多维数据集创建过程 一.创建新的Analysis Services项目 1.单击“开始”,指向“所有程序”,再指向Microsoft SQL Server 2005,再单击SQL Server Business Intelligence Development Studio,打开Microsoft Visual Studio 2005开发环境。 2.在Visual Studio的“文件”菜单上,指向“新建”,再单击“项目”。 3.在“新建项目”对话框中,从“项目类型”窗格中选择“商业智能项目”,再在“模板”窗格中选择“Analysis Services项目”。 4.将项目名称更改为Analysis Services Tutorial1,这也将更改解决方案名称,然后单击“确定”。 至此,在同样名为Analysis Services Tutorial1的新解决方案中基于Analysis Services项目模板成功创建了Analysis Services Tutorial1项目。 二.定义新的数据源 1.在Microsoft Visual Studio 2005开发环境中,打开解决方案资源管理器,右键单击“数据源”,然后单击“新建数据源”,将打开数据源向导。

2.在“欢迎使用数据源向导”页上,单击“下一步”。 3.在“选择如何定义连接”页上,单击“新建”。 4.在“提供程序”的下拉列表框中,选中“本机OLE DB\Microsoft OLE DB Provider for SQL Server”,然后单击“确定”。 5.在“服务器名称”文本框中,键入localhost。 6.确保已选中“使用Windows身份验证”。在“选择或输入数据库名称”列表中,选择AdventureWorksDW,然后单击“确定”。 7.在“新建数据源向导”页上,然后单击“下一步”。 8.选择“使用服务帐户”,然后单击“下一步”。 9.在“完成向导”页上,单击“完成”以创建名为Adventure Works DW的新数据源。 10.打开解决方案资源管理器,可以看到“数据源”文件夹中的新数据源。 三.定义一个新的数据源视图 1.在解决方案资源管理器中,右键单击“数据源视图”,再单击“新建数据源视图”。 2.在“欢迎使用数据源视图向导”页中,单击“下一步”。

《SAS数据分析范例》(SAS数据集)

《SAS数据分析范例》数据集 目录 表1 sas.bd1 (3) 表2 sas.bd3 (4) 表3 sas.bd4 (5) 表4 sas.belts (6) 表5 sas.c1d2 (7) 表6 sas.c7d31 (8) 表7 sas.dead0 (9) 表8 sas.dqgy (10) 表9 sas.dqjyjf (11) 表10 sas.dqnlmy3 (12) 表11 sas.dqnlmy (13) 表12 sas.dqrjsr (14) 表13 sas.dqrk (15) 表14 sas.gjxuexiao0 (16) 表15 sas.gnsczzgc (17) 表16 sas.gnsczzs (18) 表17 sas.gr08n01 (19) 表18 sas.iris (20) 表19 sas.jmcxck0 (21) 表20 sas.jmjt052 (22) 表21 sas.jmjt053 (23) 表22 sas.jmjt054 (24) 表23 sas.jmjt055 (25) 表24 sas.jmxfsps (26) 表25 sas.jmxfspzs0 (27) 表26 sas.jmxfzss (28) 表27 sas.jmxfzst (29) 表28 sas.kscj2 (30) 表29 sas.modeclu4 (31) 表30 sas.ms8d1 (32) 表31 sas.nlmyzzs (33) 表32 sas.plates (34) 表33 sas.poverty (35) 表34 sas.rjnycpcl0 (36) 表35 sas.rjsrs (37) 表36 sas.sanmao (38) 表37 sas.sczz1 (39) 表38 sas.sczz06s (40) 表39 sas.sczz (41) 表40 sas.sczzgc1 (42)

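SAS Data Set Operations

Contents
SAS data set operations, March 28, 2014
1. Merging
2. Filtering and modifying
3. Querying

1 Merging data sets:
(1) Vertical merge: adds or combines observations (samples)
(2) Horizontal merge: adds or combines (indicator) variables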

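(1) Vertical merge of data sets: adds or combines observations (samples)
Form:
data merged-data-set;
  set data-set-1 data-set-2;
run;
Example: concatenate the two data sets male and female into one data set named total
data total;
  set male female;
proc print data=total;
run;
/* If male and female have different variable names, total contains the union of their variables, and the values a data set does not supply appear as missing values */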
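The behavior described in the comment (union of variables, with missing values where a data set has no such variable) can be sketched in a few lines of Python. This is an illustration of the idea only, not of how SAS implements SET; the function name and the sample rows are hypothetical, and None stands in for a SAS missing value.

```python
def concatenate(*datasets):
    """Stack rows of several data sets; columns are the union of all column names."""
    columns = []
    for dataset in datasets:
        for row in dataset:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Every output row carries every column; absent values become None (missing).
    return [
        {name: row.get(name) for name in columns}
        for dataset in datasets
        for row in dataset
    ]

if __name__ == "__main__":
    male = [{"name": "Li", "height": 172}]      # hypothetical data
    female = [{"name": "Wang", "weight": 51}]   # hypothetical data
    for row in concatenate(male, female):
        print(row)
```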

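(2) Horizontal merge of data sets: adds or combines (indicator) variables
Form:
data merged-data-set;
  merge data-set-1 data-set-2;
  by common-variable;
run;
Example: merge the two data sets one and two by their common variable pid into the data set total2 (the program below saves the result as total2)

data one;
  input pid sex $ age;
  cards;
101 m 54
105 w 36
102 m 43
104 w 45
;
data two;
  input pid weight height;
  cards;
105 54 163
102 63 174
103 57 173
104 45 156
;

proc sort data=one; /* both data sets must first be sorted by the common variable (pid here) before a horizontal merge */
  by pid; /* sort statement: proc sort data=data-set-to-sort; by sort-variable; the default order is ascending for numbers and alphabetical for letters */
proc sort data=two; /* sort the second data set by the common variable as well */
  by pid;
/* the merge step follows */
data total2; /* name of the merged data set */
  merge one two; /* form: merge data-set-1 data-set-2; */
  /* Note the missing values in the output. When a value is absent from a data line it must be entered as a period; otherwise list input reads on into the next data line and observations are lost or misread */
  by pid;
proc print data=total2;
run;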
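To make the expected result of the match-merge concrete, here is a small Python sketch that reproduces the same outcome for the one and two data sets above. It is an approximation under a stated assumption: each pid occurs at most once per data set, so the full row-by-row semantics of MERGE with repeated BY values are not modeled. The function name match_merge is hypothetical and None plays the role of a SAS missing value.

```python
def match_merge(left, right, key):
    """Combine rows of two data sets that share the same key value.

    Assumes each key value appears at most once in each data set.
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    columns = []
    for row in left + right:
        for name in row:
            if name not in columns:
                columns.append(name)
    merged = []
    # Iterating over sorted keys mirrors the requirement to sort by pid first.
    for value in sorted(left_by_key.keys() | right_by_key.keys()):
        row = dict.fromkeys(columns)           # start with every column missing
        row.update(left_by_key.get(value, {}))
        row.update(right_by_key.get(value, {}))
        merged.append(row)
    return merged

one = [
    {"pid": 101, "sex": "m", "age": 54},
    {"pid": 105, "sex": "w", "age": 36},
    {"pid": 102, "sex": "m", "age": 43},
    {"pid": 104, "sex": "w", "age": 45},
]
two = [
    {"pid": 105, "weight": 54, "height": 163},
    {"pid": 102, "weight": 63, "height": 174},
    {"pid": 103, "weight": 57, "height": 173},
    {"pid": 104, "weight": 45, "height": 156},
]

if __name__ == "__main__":
    for row in match_merge(one, two, "pid"):
        print(row)
```

pid 101 has no weight or height and pid 103 has no sex or age, so those entries come out missing, which is the pattern to look for in the PROC PRINT output.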

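Building a Network Dataset in ArcGIS

1 Requirements for the road centerlines
(1) Roads that cross at grade are split at the intersection; roads that cross at different levels are not split at the intersection.
(2) The endpoints of connected roads must be snapped together. The spatial structure of the lines must be correct; topology rules can be used to find and fix features whose position is wrong.
(3) The layer must contain the fields NAME, LENGTH, Hierarchy and OneWay. These fields make it easier to build the network dataset.
2 Processing the road centerlines
2.1 Building the topology
Note: a topology can only be built under a feature dataset in a geodatabase, so layers in shapefile format must first be imported into a geodatabase.
(1) Open Catalog, create a new Personal Geodatabase in the chosen directory, and double-click to enter it. Right-click in the blank area, choose "New->Feature Dataset", and enter a name, preferably without spaces. Select the same coordinate system as the road centerline data and accept the defaults for the rest;
(2) Double-click to enter the Feature Dataset, right-click in the blank area, and choose "Import->Feature Class (Multiple)…" to open the import dialog. Under Input Features, browse to the road centerline data to import, and click OK. (If the import fails, the path of the road centerline data or of the new Geodatabase may contain spaces or Chinese characters. Copy both to the root of a drive and import again.)
(3) In the Feature Dataset, right-click in the blank area, choose "New->Topology", and build the topology by following the wizard (the screenshots of the original are not available here);
(4) Open ArcMap, click the Add Data button, and add the new topology together with the road centerlines to the map window. Choose Editor->Start Editing and fix the features according to the reported errors.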

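Introduction to SAS and SAS Data Sets

The SAS System
邓 伟, 2013.11, wdeng@https://www.wendangku.net/doc/0014374291.html,

Introduction to the SAS System
The SAS System is a large, integrated, modular software package for data analysis and decision support.
Its early name was Statistical Analysis Software. It has grown from statistical analysis software into a large integrated application system covering business intelligence (BI) and analytics and data mining (DM).
The SAS System is a large integrated information system for decision support.
The SAS System carries out four main data-centered tasks: data access, data management, data presentation, and data analysis.

SAS history
SAS was founded in 1976. It is the world's largest privately held software company (prepackaged software) and one of the world's ten largest independent software vendors.
1966: North Carolina State University, Jim Barr and Jim Goodnight
1972: SAS72 released for use by universities
1976: company founded
  SAS Institute Inc. established
  First SUGI (SAS Users Group International) conference held
  Base SAS software released
  Partnership with IBM established
1985: the first PC DOS version of the SAS System (Base SAS and SAS/RTERM software) is a success
1986: SAS/IML and SAS/STAT software for personal computers released
1992:
  Decision support extended to guided data analysis, clinical trial analysis and reporting, financial spreadsheets, and English-language queries
  SAS's first vertical-market software released: a clinical review system for the pharmaceutical industry
1995: SAS becomes the only vendor of a truly end-to-end data warehousing solution and launches the Rapid Warehousing Program
1999: the US Food and Drug Administration selects technology developed by SAS as the standard for receiving and archiving electronic data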

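Creating a Geodatabase

1 Geodatabase overview
A geodatabase is a container for spatial and attribute data, organized according to a defined model and set of rules so that geographic feature data can be managed and used more effectively. A geodatabase organizes geographic data as a hierarchy of data objects, which include object classes, feature classes and feature datasets.
An object class is a table that stores non-spatial data. In a geodatabase, an object class is a special class with no spatial characteristics, for example the owner of a parcel of land. A relationship can be defined between "parcel" and "owner".
A feature class is a collection of features with the same geometry type and attributes, that is, a collection of spatial features of the same kind, such as rivers, roads, vegetation, land use, or cables. Feature classes can exist independently or be related to one another. When different feature classes are related, they are organized into one feature dataset.
A feature dataset is a collection of feature classes that share a spatial reference system. There can be many reasons to place different feature classes in one feature dataset, but in general this is considered in three situations:
(1) The feature classes belong to the same theme. For example, for nationwide hydrography data at a given scale, the point, line and polygon feature classes can be organized into one feature dataset.
(2) Feature classes that act as junctions and edges in the same geometric network must be organized into the same feature dataset. For example, a power distribution network contains switches, transformers, cables and so on, each corresponding to a point or line feature class. When the network is modeled, all of them belong in the geometric network model of the distribution network, so these feature classes must be placed in the same feature dataset.
(3) Feature classes that share common geometry, such as land use, hydrography and administrative boundaries. When one feature is moved, the shared part must move with it and the shared-edge relationship must be preserved. In this case, too, the feature classes are placed in the same feature dataset.
Object classes, feature classes and feature datasets are the basic items of a geodatabase. Once these items have been created in the database, data can be loaded into it and the database can be defined further, for example by building indexes, building topology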

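Lesson 3: SAS Data Sets

I. Structure of a SAS data set
A SAS data set is relational and usually has two parts:
● Descriptor portion: contains information about the attributes of the data
● Data portion: contains the data values
The data values are arranged in a matrix-like table, as shown in Figure 3-1.
● The columns of the table are called variables. A variable is similar to a field in other file types;
● The rows of the table are called observations. An observation corresponds to a record.

                 Variable 1   Variable 2   Variable 3   Variable 4
                 Name         Test1        Test2        Test3
Observation 1    Xiaoer       90           86           88
Observation 2    Zhangsan     100          98           89
Observation 3    Lisi         79           76           70
Observation 4    Wangwu       68           71           64
Observation 5    Zhaoliu      100          89           99
Figure 3-1 A SAS data file

II. Forms of a SAS data set
The SAS System has two types of data sets:
● SAS data files
● SAS data views
A SAS data file contains both a descriptor portion and a data portion. A SAS data view has only a descriptor portion and no data portion. It holds only the mapping to other data files or to data from other software, so that all SAS procedures can access that data; the view itself does not contain the data values.
Throughout the SAS language, the term "SAS data set" refers to either of these two forms. In the following example, the PRINT procedure processes the data set aaa.abc in the same way regardless of its form:
PROC PRINT DATA=aaa.abc;

III. Names of SAS data sets
A SAS data set name has three parts, in the following format:
Libref.data-set-name.membertype
● Libref: the logical name of the SAS library
● data-set-name: the name of the SAS data set
● membertype: this part of the name does not have to be given by the user. The member type of a SAS data file is DATA; the member type of a SAS data view is VIEW
For example, in the SAS data set name aaa.abc used above, aaa is the libref and abc is the data set name,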

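Methods for Testing and Removing Outliers in Experimental Data

Contents
Abstract
Keywords
1 Introduction
2 Methods for identifying outliers
  t-test (3S) criterion
  Dixon criterion
  Grubbs criterion
  Outlier test for exponentially distributed data
  PauTa criterion
  Chauvenet criterion
3 Handling outliers in experimental data
4 Conclusion
References

Methods for Testing and Removing Outliers in Experimental Data
Abstract: Experiments inevitably produce some anomalous data. Such data can mask the behavior of the object under study and strongly affect the results of the analysis. Testing for outliers and handling them correctly is a prerequisite for reliable raw data and for accurate means and standard deviations. This paper briefly describes several statistical methods for identifying anomalous measurements and uses the DPS software to test for and remove outliers in experimental data. The method is simple, intuitive and fast, and is suitable for experimenters to use in processing and analyzing experimental data.
Keywords: outlier testing; outlier removal; DPS; measurement data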

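1 Introduction
In experiments, measurement error can make individual data points anomalous, which often introduces a large error into the result. Anomalous data mask the pattern in the experimental data, distort the apparent behavior of the object under study, and lead to wrong conclusions. Correctly analyzing and removing outliers therefore helps improve experimental accuracy.
To identify outliers in experimental data, first examine the raw data records, the operating procedure, the experimental conditions and so on, find the cause of the anomalous value, and then remove it.
Computer-based methods for removing outliers have been reported in detail in the literature [1]. For example, Wang Xin and Wu Xianqiu used Origin to remove outliers from experimental data in linear fitting, and Yan Changshun used a computer to quickly remove "ring values" that contain gross errors. These works apply various statistical criteria for identifying outliers; the relative merits of the criteria are discussed below.
2 Methods for identifying outliers
There are many criteria for identifying outliers. Commonly used ones include the t-test (3S) criterion, the Dixon criterion and the Grubbs criterion. Each is introduced briefly below.
2.1 t-test (3S) criterion
The t-test criterion, also called the Romanovsky criterion, identifies outliers from the range of the actual error distribution under the t distribution. It is reasonable when the number of repeated measurements is small.
Basic idea: first set aside one suspect value, then use the t distribution to test whether the value that was set aside is an outlier.
Let the sample data be $x_1, x_2, x_3, \ldots, x_n$ and suppose $x_j$ is the suspect value. Compute the mean $\bar{x}_{n-1}$ and the standard deviation $s_{n-1}$ of the remaining $n-1$ values:
$$\bar{x}_{n-1} = \frac{1}{n-1}\sum_{i=1,\, i\neq j}^{n} x_i, \qquad s_{n-1} = \sqrt{\frac{1}{n-2}\sum_{i=1,\, i\neq j}^{n}\left(x_i - \bar{x}_{n-1}\right)^2}.$$
Then use the t distribution to decide whether the value $x_j$ that was set aside is an outlier.
If $\left|x_j - \bar{x}_{n-1}\right| > k(n,a)\, s_{n-1}$, then $x_j$ is an outlier and should be removed; otherwise it is a normal value and should be kept. Here $a$ is the significance level, $n$ is the number of data values, and $k(n,a)$ is the test coefficient, which can be looked up in a table.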
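The decision rule of the t-test criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DPS procedure used in the paper: the function name is hypothetical, the coefficient k must be supplied by the caller from a published k(n, a) table, and the value k = 3.0 in the example is a placeholder chosen only to make the sketch runnable, not a tabulated coefficient. The measurements are invented.

```python
from statistics import mean, stdev

def is_outlier_t_criterion(values, j, k):
    """Test values[j] against the mean and spread of the other n-1 values.

    k is the test coefficient k(n, a); it is NOT computed here and must be
    taken from a table for the chosen sample size and significance level.
    """
    rest = values[:j] + values[j + 1:]
    center = mean(rest)
    # stdev divides by len(rest) - 1, which equals n - 2 for the full sample.
    spread = stdev(rest)
    return abs(values[j] - center) > k * spread

if __name__ == "__main__":
    measurements = [10.1, 10.2, 9.9, 10.0, 10.1, 12.5]  # hypothetical data
    placeholder_k = 3.0                                  # placeholder, not a table value
    for index, value in enumerate(measurements):
        flagged = is_outlier_t_criterion(measurements, index, placeholder_k)
        print(value, "outlier" if flagged else "keep")
```

Because the suspect value is left out of the mean and standard deviation, a single large value cannot inflate the spread it is judged against, which is the reason the criterion sets it aside first.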

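SAS Data Set Exercises

Purpose of the lab
This lab practices importing and exporting data sets, creating, dropping and keeping variables, merging and splitting data sets, and sorting, transposing and similar operations.
Learn to create a data set from existing data files and to create and drop variables in an existing data set;
Learn the three basic control flows of SAS program control;
Learn the procedures or statements for correcting, sorting, transposing and standardizing data.
Lab tasks: complete the following exercises.
I. The scores of 12 students in a class in 3 courses are as follows:
Create the data set with a SAS DATA step.
Select the students who failed one course.
Compute each student's average score and assign an overall grade on a five-level scale.
II. Exercises 6 and 7 on page 141 of the textbook.
III. data2_1.sav and data2_2.sav contain the data of one group of subjects (numbered 1-47) on two scales. Merge them and save the result as "量表.sav". data2_3.sav contains the scale data of another group of subjects (numbered 48-65). Append these data to "量表.sav" and save.
1) a1, a5, a30, a43, a49 and b2, b6, b19 are reverse-scored. Convert them to forward scoring.
2) a1 to a25 form the first dimension of scale a, a26 to a50 form the second dimension, and scale b has only one dimension. Compute the total score of each of the three dimensions (the sum of all item scores).
3) Sort by the total score of scale b in ascending order and create another variable (group): assign "1" with the label "high group" to the ten highest scores on scale b, "3" with the label "low group" to the ten lowest, and "2" with the label "middle group" to the rest.
4) If any dimension total is missing, replace it with the mean of that dimension.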

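Handling Outliers

In the Data menu there is Define Variable Properties. Move the variable into the box on the right and click Continue. The new window lists every value the variable takes in the sample. To define a value as an outlier, tick the corresponding Missing box. These values are then excluded when the data are processed and do not take part in the analysis.
Use a boxplot to find the outliers. Then use the condition button under Select Cases in the Data menu to exclude the values that are greater than or equal to the smallest high outlier or less than or equal to the largest low outlier; the outliers are then removed automatically.
SPSS can draw a box plot from its graphing functions. A box plot consists of a rectangular box and several line segments. For a batch of data it is generally drawn in the following steps:
First, draw the axis. Its unit matches the unit of the data, its origin is slightly below the minimum, and its length is slightly more than the range of the data.
Second, draw a rectangular box whose two ends correspond to the lower and upper quartiles of the data (Q1 and Q3). Inside the box, draw a line segment at the median (Xm) as the median line.
Third, at Q3+1.5IQR and Q1-1.5IQR (IQR is the interquartile range) draw two segments like the median line. These are the cutoff points for outliers and are called the inner fences. At Q3+3IQR and Q1-3IQR draw two more segments, called the outer fences. Data points outside the inner fences are outliers: those between the inner and outer fences are mild outliers, and those beyond the outer fences are extreme outliers.
Fourth, from each end of the box draw a line outward to the farthest point that is not an outlier. This shows the range of the normal values in the data.
Fifth, mark mild outliers with "〇" and extreme outliers with "*". Points with the same value are marked side by side at the same position, and points with different values at different positions. The box plot is now complete. Box plots drawn by statistical software generally do not show the inner and outer fences.
I usually use the following methods:
1. Under Analyze, Descriptive Statistics, Frequencies, draw a histogram. The values with the lowest frequency may be outliers, but this also depends on how far they lie from the other cases.
2. Under Analyze, Descriptive Statistics, Explore, use the Plots option (stem-and-leaf plot and boxplot) and check how many box lengths a case lies from the edge of the box (upper or lower). "○" marks cases between 1.5 and 3 box lengths away (outliers), and "*" marks cases more than 3 box lengths away (extreme outliers).
3. Under Analyze, Descriptive Statistics, Descriptives, tick "Save standardized values as variables", select the variables and click OK. A new variable of Z scores is created. A case whose absolute Z score exceeds 2 is a candidate outlier (a cutoff of 3 is also commonly used).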
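The inner-fence and outer-fence rule from the box plot steps can be written down in a short Python sketch. This is an illustration of the rule, not of SPSS itself: the function name is hypothetical, the sample data are invented, and the quartiles come from Python's statistics.quantiles with the "inclusive" method, which is an assumption because SPSS and other packages compute quartiles (hinges) slightly differently.

```python
from statistics import quantiles

def classify_outliers(data):
    """Split data into mild and extreme outliers using the 1.5*IQR and 3*IQR fences."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # inner fences
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # outer fences
    mild, extreme = [], []
    for x in data:
        if x < outer_low or x > outer_high:
            extreme.append(x)        # beyond the outer fences
        elif x < inner_low or x > inner_high:
            mild.append(x)           # between inner and outer fences
    return mild, extreme

if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 100]  # hypothetical data
    mild, extreme = classify_outliers(sample)
    print("mild outliers:", mild)
    print("extreme outliers:", extreme)
```

With this sample the quartiles are 3.5 and 8.5, so the inner fences are -4 and 16 and the outer fences are -11.5 and 23.5.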

R in Action (《R语言实战》), Chapter 2: Creating a Dataset (code examples)

#---- Introduction to R ----#
options()                  # show the current option settings
options(digits=4)          # print numbers with 4 significant digits
install.packages("gclus")  # install a package
installed.packages()       # list the packages already installed
library("gclus")           # load a package

#---- Creating a matrix ----#
cells<-c(1,16,24,68)
rnames<-c("R1","R2")  # row names
cnames<-c("C1","C2")  # column names
mymatrix<-matrix(cells,nrow=2,ncol=2,byrow=TRUE,dimnames=list(rnames,cnames))
mymatrix

#---- Matrix subscripts ----#
x<-matrix(1:10,nrow=2,byrow=TRUE)
x
x[1,c(3,5)]  # elements 3 and 5 of row 1
x[,2]        # extract the second column of the matrix
x[7]         # a single subscript indexes the elements of the matrix one by one
x<-matrix(1:10,nrow=2,byrow=FALSE)
x
x[7]         # a single subscript counts down the columns first (column-major), whether the matrix was filled byrow or bycol

#---- Creating an array ----#
dim1<-c("A1","A2")
dim2<-c("B1","B2","B3")
dim3<-c("C1","C2","C3","C4")
z<-array(1:24,c(2,3,4),dimnames=list(dim1,dim2,dim3))  # array() has no byrow argument
z

#---- Creating a data frame ----#
#-- A data frame is created with the function data.frame():
#-- mydata<-data.frame(col1,col2,col3,……)
#-- The column vectors col1, col2, col3,… can be of any type (character, numeric, logical)
#-- Each column has a single mode, but columns of different modes can be combined into one data frame
patientID<-c(1,2,3,4)
age<-c(23,36,29,53)
diabetes<-c("Type1","Type2","Type1","Type1")
status<-c("Poor","Improved","Excellent","Poor")
patientdata<-data.frame(patientID,age,diabetes,status)
patientdata

#---- Three ways to select elements of a data frame ----#
patientdata[1:2]
patientdata[c("diabetes","status")]
patientdata$age

#---- Building a contingency table with table ----#
table(patientdata$diabetes,patientdata$status)
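The remark that x[7] counts down the columns regardless of how the matrix was filled can be checked with a small Python model of R's behavior. This is a sketch for illustration only: r_matrix and single_index are hypothetical helper names, and the model covers just the fill order and the 1-based single-subscript lookup.

```python
def r_matrix(values, nrow, ncol, byrow=False):
    """Return the matrix as a list of rows, filled the way R's matrix() fills it."""
    if byrow:
        return [values[r * ncol:(r + 1) * ncol] for r in range(nrow)]
    # Default fill is column by column.
    return [[values[c * nrow + r] for c in range(ncol)] for r in range(nrow)]

def single_index(rows, k):
    """Model R's x[k]: 1-based position counted down the columns (column-major)."""
    nrow = len(rows)
    col, row = divmod(k - 1, nrow)
    return rows[row][col]

if __name__ == "__main__":
    values = list(range(1, 11))
    by_row = r_matrix(values, nrow=2, ncol=5, byrow=True)
    by_col = r_matrix(values, nrow=2, ncol=5, byrow=False)
    print(by_row, single_index(by_row, 7))
    print(by_col, single_index(by_col, 7))
```

In the row-filled matrix the seventh position down the columns is row 1, column 4, which holds 4; in the column-filled matrix the same position holds 7.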

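Multidimensional Data Organization and Analysis

Multidimensional Data Organization and Analysis
Prepared on 22 November 2020

Kunming University of Science and Technology, Faculty of Information Engineering and Automation
Student Lab Report (2016-2017 academic year, second semester)
I. Objectives
1. Understand the basic concepts of dimension (table), member and hierarchy (granularity) and the relationships among them;
2. Understand the basic principles and workflow of creating a cube;
3. Understand and master the basic process and methods of OLAP analysis;
4. Learn to use basic MDX statements.
II. Tasks
1. Based on the subway data warehouse built in the previous lab, build a cube of the subway company's revenue.
2. Use the dimension browser to query and edit the multidimensional data.
3. Perform slice, dice, pivot and drill operations on the cube.
4. Use MDX statements to slice the cube.
Note: the Analysis Services tutorial can be used as a guide for building the cube. The time and station dimensions must use hierarchies.
Using the lab software provided by the laboratory and the instructor, complete the required work carefully, record faithfully the problems encountered and how they were solved, and draw the multidimensional data organization model and its OLAP operations for the lab case. After the lab, write a lab report based on the results.
III. Principles and basic technical roadmap (block diagram or program flowchart)
Describe the basic concepts of online analytical processing (MOLAP, ROLAP, slice, dice, pivot, drill and so on).
1. MOLAP: an OLAP implementation based on multidimensional data organization. Data are stored in multidimensional arrays.
Characteristics: both the detail data and the aggregated data are stored in the cube, trading space for speed. Queries are fast, but building the cube takes a great deal of time and space.
2. ROLAP: an OLAP implementation based on a relational database. The multidimensional structure is divided into a fact table and dimension tables.
Characteristics: the detail data stay in the fact table of the relational database, and the aggregated data are also stored in the relational database. This approach has the lowest query performance and is not recommended.
3. Slice: a selection on one dimension of a given data cube. In a three-dimensional cube the result of a slice is a two-dimensional plane of data.
4. Dice: a selection on two or more dimensions of a given data cube. The result of a dice is a sub-cube.
5. Pivot (rotate): changes the orientation of the dimensions, that is, rearranges where the dimensions are placed in the table (for example, swapping rows and columns).
6. Drill: changes the level of a dimension and so the granularity of the analysis. It includes drill-down and drill-up.
IV. Methods and steps (or: program code or procedure)
1. The cube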
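The slice, dice and drill operations defined in section III can be sketched on a tiny in-memory fact table. This Python sketch is only a conceptual illustration, not Analysis Services or MDX: the fact rows, dimension names and function names are all hypothetical, and the revenue figures are invented for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: two time levels (year, quarter), one station dimension, one measure.
FACTS = [
    {"year": 2016, "quarter": "Q1", "station": "A", "revenue": 100},
    {"year": 2016, "quarter": "Q1", "station": "B", "revenue": 80},
    {"year": 2016, "quarter": "Q2", "station": "A", "revenue": 110},
    {"year": 2016, "quarter": "Q2", "station": "B", "revenue": 90},
    {"year": 2017, "quarter": "Q1", "station": "A", "revenue": 120},
    {"year": 2017, "quarter": "Q1", "station": "B", "revenue": 95},
]

def slice_cube(facts, dimension, value):
    """Slice: fix one dimension to a single member."""
    return [f for f in facts if f[dimension] == value]

def dice_cube(facts, **allowed):
    """Dice: restrict two or more dimensions to sets of members."""
    return [f for f in facts if all(f[d] in members for d, members in allowed.items())]

def aggregate(facts, dimensions, measure="revenue"):
    """Sum the measure at the granularity given by `dimensions`.

    Fewer or coarser dimensions correspond to drilling up, more or finer ones
    to drilling down; reordering `dimensions` plays the role of a pivot.
    """
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dimensions)] += f[measure]
    return dict(totals)

if __name__ == "__main__":
    print(aggregate(FACTS, ["year"]))                        # drilled up to year
    print(aggregate(FACTS, ["year", "quarter"]))             # drilled down to quarter
    print(aggregate(slice_cube(FACTS, "station", "A"), ["year"]))
    print(dice_cube(FACTS, year={2016}, station={"A"}))
```

Computing every aggregate on demand from the detail rows, as this sketch does, is closer in spirit to ROLAP; a MOLAP engine would store the aggregated cells in advance.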
