
Generalized Method of Moments (GMM)

Econ 681, Winter 2008

Department of Economics, Concordia University

Let $E[g(y, x, \theta)] = 0$ be a population moment condition implied by economic theory. For example, consider the consumption-based capital asset pricing model (C-CAPM) of Hansen and Singleton (1982), in which a representative agent maximizes expected utility

$$\sum_{t=1}^{\infty} \beta^t E[U(C_t) \mid I_0]$$

subject to a budget constraint

$$A_{t+1} = (A_t + Y_t - C_t) R_{t+1},$$

where $C_t$ is consumption, $A_t$ is an asset with gross return $R_t$, $Y_t$ is income and $I_t$ is the information set at time $t$. It is easy to show that the Euler equation is given by

$$E[\beta U'(C_{t+1}) R_{t+1} \mid I_t] = U'(C_t). \tag{1}$$

Assume further a constant relative risk aversion (CRRA) utility function

$$U(C_t) = \frac{C_t^{1-\gamma} - 1}{1 - \gamma}.$$

After substituting for $U'(C_t) = C_t^{-\gamma}$ into (1) and rearranging, we have

E" C t+1C t R t+1 1 I t#=0:

Let $x_t \in I_t$, $y_{t+1} = \left(\frac{C_{t+1}}{C_t}, R_{t+1}\right)$, $\theta = (\beta, \gamma)'$ and $h(y_{t+1}, \theta) = \beta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} R_{t+1} - 1$. Then, the conditional moment restriction implied by the model is

$$E[h(y_{t+1}, \theta) \mid x_t] = 0. \tag{2}$$

By the law of iterated expectations, we obtain the unconditional moment restriction

$$E[h(y_{t+1}, \theta)\, x_t] = 0.$$

Thus, (2) can be rewritten as $E[g(y, x, \theta)] = 0$, where $g(y, x, \theta) = h(y_{t+1}, \theta)\, x_t$ and $x_t = \left(1, \frac{C_t}{C_{t-1}}, \frac{C_{t-1}}{C_{t-2}}, \ldots, R_t, R_{t-1}, \ldots\right)'$. In general, the function $h(y_{t+1}, \theta)$ could be a vector. For example,

$$h(y_{t+1}, \theta) = \beta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} R_{t+1} - \mathbf{1},$$

where $R_{t+1}$ is a vector of asset returns and $\mathbf{1}$ is a vector of ones. Then, $g(y, x, \theta) = h(y_{t+1}, \theta) \otimes x_t$ is an $m \times 1$ vector ($m = \dim(h) \cdot \dim(x_t)$), where $\otimes$ is the Kronecker product.
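As a concrete illustration, here is a minimal Python sketch of the C-CAPM moment function for the scalar-return case. The function name, argument names and array layout are our own choices for exposition, not part of the original notes; the data would be user-supplied or simulated.

```python
import numpy as np

def ccapm_moments(theta, cons_growth, returns, instruments):
    """Sample moments g_n(theta) for the C-CAPM example (scalar return case).

    theta:       (beta, gamma) = discount factor and relative risk aversion.
    cons_growth: (n,) array of consumption growth C_{t+1}/C_t.
    returns:     (n,) array of gross returns R_{t+1}.
    instruments: (n, q) array of variables x_t known at time t.
    Returns the (q,) vector (1/n) * sum_t h(y_{t+1}, theta) * x_t.
    """
    beta, gamma = theta
    h = beta * cons_growth ** (-gamma) * returns - 1.0  # h(y_{t+1}, theta)
    return (h[:, None] * instruments).mean(axis=0)      # sample analogue of E[h x_t]
```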

Given the population moment conditions, the generalized method of moments (GMM) estimator¹ (Hansen, 1982) minimizes a quadratic form in the sample counterparts of these moment conditions. The quadratic form is constructed using a positive definite matrix and has the form

$$Q_n(\theta) = g_n(\theta)' W_n g_n(\theta),$$

where $g_n(\theta) = \frac{1}{n}\sum_{i=1}^n g(y_i, x_i, \theta)$ is an $m \times 1$ vector of moment conditions, $\theta$ is a $k \times 1$ vector of unknown parameters from $\Theta$, $g(y_i, x_i, \theta) \equiv g_i(\theta)$ is a given function $g: \mathbb{R}^k \to \mathbb{R}^m$ with $m \geq k$, and $W_n$ is a positive definite weight matrix of dimension $m \times m$.

Then, the GMM estimator is given by

$$\hat{\theta} = \arg\min_{\theta \in \Theta} Q_n(\theta).$$

GMM estimation is semi-parametric in the sense that it does not fully specify the density from which the sample is obtained. Therefore, it imposes weaker assumptions than parametric methods such as MLE.
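The estimator itself is just a numerical minimization of $Q_n(\theta)$. A minimal sketch, assuming a user-supplied moment function (for instance `ccapm_moments` above) and a given weight matrix:

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, moments, W):
    """Q_n(theta) = g_n(theta)' W_n g_n(theta)."""
    g = moments(theta)               # (m,) vector of sample moments
    return g @ W @ g

def gmm_estimate(moments, theta0, W):
    """Minimize Q_n over theta for a given moment function and weight matrix."""
    res = minimize(gmm_objective, theta0, args=(moments, W), method="BFGS")
    return res.x
```

For the C-CAPM moments, a typical (illustrative) call would be `gmm_estimate(lambda t: ccapm_moments(t, dc, R, x), np.array([0.95, 2.0]), np.eye(q))`, with `dc`, `R`, `x` and `q` taken from the data at hand.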

Just-Identified Case ($m = k$). When $m = k$, the choice of $W_n$ is irrelevant. Since $Q_n(\theta)$ is a quadratic form, $Q_n(\theta) \geq 0$ and $Q_n(\theta) = 0$ if and only if $g_n(\theta) = 0$. Therefore, in the just-identified case, the method of moments estimator solves the equation $g_n(\hat{\theta}) = 0$.

Several popular estimation methods can be considered special cases of the GMM method for different choices of estimating functions $\frac{1}{n}\sum_{i=1}^n g_i(\theta) = 0$:

Example 1. The sample average is a method of moments estimator, $\hat{\theta} = \bar{x}_n$, if $g_i(\theta) = x_i - \theta$.

Example 2. In the maximum likelihood framework, $\hat{\theta} = \text{MLE}$ if $g_i(\theta) = \frac{\partial}{\partial \theta} \log f(x_i \mid \theta)$, i.e. $g_n(\theta)$ is the score function and $g_n(\theta) = 0$ is the likelihood first-order condition.

Example 3. In the least squares problem, $\hat{\theta} = \text{OLS}$ if $g_i(\theta) = x_i(y_i - x_i'\theta)$.

Example 4. The simple instrumental variables estimator is a method of moments estimator, $\hat{\theta} = \text{IV (2SLS)}$, with $g_i(\theta) = z_i(y_i - x_i'\theta)$. To see this, consider the model

$$y_i = x_i'\theta + e_i$$
$$x_i = z_i'\delta + v_i,$$

in which $E[e_i \mid x_i] \neq 0$ but $E[e_i \mid z_i] = 0$, $\dim(x_i) = \dim(z_i)$ and $\delta \neq 0$. By the law of iterated expectations, $E[e_i z_i] = 0$ and the IV estimator solves

$$\frac{1}{n}\sum_{i=1}^n z_i\left(y_i - x_i'\hat{\theta}\right) = 0$$

or, in matrix notation,

$$\frac{1}{n}\left(Z'Y - Z'X\hat{\theta}\right) = 0.$$

Hence, the simple IV estimator is given by

$$\hat{\theta} = \left(Z'X\right)^{-1} Z'Y.$$
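In code, the just-identified IV estimator is a single linear solve. A sketch, assuming `y`, `X`, `Z` are NumPy arrays with $\dim(x_i) = \dim(z_i)$ (names are our own):

```python
import numpy as np

def simple_iv(y, X, Z):
    """Just-identified IV: solves Z'(y - X b) = 0, i.e. b = (Z'X)^{-1} Z'y.

    y: (n,) outcome, X: (n, k) regressors, Z: (n, k) instruments.
    """
    return np.linalg.solve(Z.T @ X, Z.T @ y)
```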

¹For an up-to-date comprehensive review of GMM, see Hall (2005) and the October 2002 issue of the Journal of Business and Economic Statistics.

Example 5. For the $\tau$-quantile estimator, $g_i(\theta) = x_i\, h_\tau(y_i - x_i'\theta)$, where $h_\tau(u) = \tau\, I\{u > 0\} - (1 - \tau)\, I\{u \leq 0\}$, which simplifies to $g_i(\theta) = x_i\, \text{sgn}(y_i - x_i'\theta)$ for $\tau = 1/2$ (the LAD estimator).
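A sketch of this moment function (array names are our own; since the moments are non-smooth in $\theta$, a derivative-free optimizer would be needed to solve the resulting estimating equations):

```python
import numpy as np

def quantile_moments(theta, y, X, tau=0.5):
    """Sample moments for the tau-quantile estimator:
    g_n = (1/n) sum_i x_i * h_tau(y_i - x_i'theta), with
    h_tau(u) = tau*1{u > 0} - (1 - tau)*1{u <= 0}; tau = 0.5 gives LAD."""
    u = y - X @ theta
    h = tau * (u > 0) - (1 - tau) * (u <= 0)
    return (X * h[:, None]).mean(axis=0)
```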

Over-Identi…ed Case (m >k ).Typically,we have overidenti…cation (number of moment con-ditions m is greater than the number of parameters k )and g n (b )cannot be set exactly to zero.The idea behind the GMM estimator is to …nd b so that g n (b )is close to 0by minimizing the distance of g n ( )from 0using the quadratic form

Q n ( )=g n ( )0W n g n ( ):

Example 6. This example is the same as Example 4, but now $m > k$. In this case, $Q_n(\theta) = \left(Y'Z - \theta'X'Z\right) W_n \left(Z'Y - Z'X\theta\right)$. The first-order conditions are

$$\frac{\partial}{\partial \theta} Q_n(\theta) = -2\, X'Z\, W_n \left(Z'Y - Z'X\theta\right) = 0,$$

where $X'Z$ is a $k \times m$ matrix, $W_n$ is an $m \times m$ matrix, $\left(Z'Y - Z'X\theta\right)$ is an $m \times 1$ vector and $0$ is a $k \times 1$ vector. Hence, the GMM estimator is given by

$$\hat{\theta} = \left(X'Z\, W_n\, Z'X\right)^{-1} X'Z\, W_n\, Z'Y.$$

If $W_n = (Z'Z)^{-1}$, then

$$\hat{\theta} = \left(X'Z (Z'Z)^{-1} Z'X\right)^{-1} X'Z (Z'Z)^{-1} Z'Y = \left(X'P_Z X\right)^{-1} X'P_Z Y,$$

which is the two-stage least squares (2SLS) or IV estimator for overidentified models. Note that the 2SLS estimator is a GMM estimator for the particular choice $W_n = (Z'Z)^{-1}$. We will see below that this is an inefficient estimator in the presence of conditional heteroskedasticity.
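A corresponding sketch for the overidentified case, computing the 2SLS/GMM estimator with $W_n = (Z'Z)^{-1}$ (function name and array layout again our own):

```python
import numpy as np

def tsls(y, X, Z):
    """Overidentified GMM with W_n = (Z'Z)^{-1}: the 2SLS estimator
    b = (X' P_Z X)^{-1} X' P_Z y, where P_Z = Z (Z'Z)^{-1} Z'.
    """
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    XtPZ = X.T @ Z @ ZtZ_inv @ Z.T        # X' P_Z, a (k, n) matrix
    return np.linalg.solve(XtPZ @ X, XtPZ @ y)
```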

Example 7. Non-linear simultaneous equations:

$$y_i = h(x_i, \theta) + e_i,$$

where $E(e_i \mid z_i) = 0$. The moment condition is $g_n(\theta) = \frac{1}{n}\sum_{i=1}^n z_i\left(y_i - h(x_i, \theta)\right)$.

Efficient GMM Estimator. Let $M_n(\theta) = \frac{\partial}{\partial \theta'} g_n(\theta) = \frac{1}{n}\sum_{i=1}^n \frac{\partial}{\partial \theta'} g_i(\theta)$ and suppose that $W_n \xrightarrow{p} W$, $W > 0$. Under some regularity conditions (we will discuss these conditions in detail later),

$$\hat{\theta} \xrightarrow{p} \theta_0, \qquad M_n(\hat{\theta}) \xrightarrow{p} M(\theta_0) \equiv M,$$

and $\sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N(0, \Sigma)$, where $\Sigma = (M'WM)^{-1} M'WVWM (M'WM)^{-1}$ and $V = E\left[g_i(\theta_0)\, g_i(\theta_0)'\right]$.

(i) In the just-identified case ($m = k$), $M$ is square and invertible and $(M'WM)^{-1} = M^{-1}W^{-1}(M')^{-1}$. Then,

$$\begin{aligned}
\Sigma &= M^{-1}W^{-1}(M')^{-1}\, M'WVWM\, M^{-1}W^{-1}(M')^{-1} \\
&= M^{-1}W^{-1}\, WVW\, W^{-1}(M')^{-1} \\
&= M^{-1}V(M')^{-1} \\
&= \left(M'V^{-1}M\right)^{-1}.
\end{aligned}$$

(ii) In the over-identified case ($m > k$), the efficient GMM estimator (the GMM estimator with the smallest variance) sets $W = V^{-1}$. In this case,

$$\begin{aligned}
\Sigma &= \left(M'V^{-1}M\right)^{-1} M'V^{-1}VV^{-1}M \left(M'V^{-1}M\right)^{-1} \\
&= \left(M'V^{-1}M\right)^{-1} M'V^{-1}M \left(M'V^{-1}M\right)^{-1} \\
&= \left(M'V^{-1}M\right)^{-1},
\end{aligned}$$

which is the lowest variance bound. The optimal GMM estimator $\hat{\theta}$ is efficient within the class of estimators which use only the information contained in $E[g_i(\theta)] = 0$, i.e. it is efficient for the given set of moment conditions.

Implementation. Minimizing the criterion function $Q_n(\theta)$ typically involves numerical optimization using standard gradient-type (for instance, quasi-Newton) methods. The efficient GMM sets $W_n = V_n(\theta)^{-1}$, where $V_n(\theta)$ is a consistent estimator of $V = E(g_i g_i')$. The main variants are the following (a code sketch of procedures (i)–(iii) follows the list below).

(i) Two-step GMM (2S-GMM) estimator:

(a) Estimate the model with some arbitrary symmetric and positive definite weight matrix, for example the identity matrix $W_n = I$. This preliminary estimator $\tilde{\theta}$ is consistent but not efficient.

(b) Set $W_n = V_n(\tilde{\theta})^{-1}$ and re-estimate $\theta$ by minimizing $Q_n(\theta) = g_n(\theta)' V_n(\tilde{\theta})^{-1} g_n(\theta)$. The resulting estimator $\hat{\theta}$ is asymptotically efficient.

(ii) Iterated GMM:

(a) Same as step (a) in (i).

(b) Same as step (b) in (i).

(c) Keep iterating until convergence of both $W_n$ and $\hat{\theta}$.

Disadvantages of (i) and (ii):

– Poor finite sample properties: in particular, the two-step GMM could be severely biased in small samples, and the magnitude of the bias increases as the number of instruments increases.

– (i) is not invariant to linear transformations of the moment conditions of the form $\tilde{g}_n(\theta) = A g_n(\theta)$ for some fixed, non-singular, $m \times m$ matrix $A$.

– (ii) is invariant to transformations of the form $\tilde{g}_n(\theta) = A g_n(\theta)$ but not to $\tilde{g}_n(\theta) = A(\theta) g_n(\theta)$, where $A$ depends on the unknown parameter $\theta$.

(iii) Continuously-updated (CU) GMM (Hansen, Heaton and Yaron, 1996). It does not require a preliminary estimate and directly minimizes $Q_n(\theta) = g_n(\theta)' V_n(\theta)^{-1} g_n(\theta)$:

– It is invariant to both types of transformations considered above.

– The first-order conditions of the standard two-step GMM estimator, evaluated at the true values of the parameters, are non-zero. This gives rise to an important source of the bias mentioned above, which is exacerbated as the number of instruments increases. By contrast, the first-order conditions for the continuously-updated GMM are exactly satisfied at the true values of the parameters, i.e. they are centered at zero. As a result, the CU-GMM estimator is approximately median unbiased, but it has a larger variance and fatter tails than the 2S-GMM estimator.

– CU-GMM is numerically much more complicated than (i) and (ii).
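The following sketch illustrates how the three procedures differ only in how and when the weight matrix is updated. It assumes a user-supplied `moment_matrix(theta)` returning the $n \times m$ matrix whose rows are $g_i(\theta)'$; the function names and the fixed iteration count (in place of a formal convergence check) are simplifications of ours.

```python
import numpy as np
from scipy.optimize import minimize

def moment_cov(theta, moment_matrix):
    """V_n(theta) = (1/n) sum_i g_i(theta) g_i(theta)'."""
    G = moment_matrix(theta)                   # (n, m): row i is g_i(theta)'
    return G.T @ G / G.shape[0]

def two_step_gmm(moment_matrix, theta0, iterations=1):
    """Two-step GMM (iterations=1) or iterated GMM (iterations large)."""
    def Q(theta, W):
        g = moment_matrix(theta).mean(axis=0)  # g_n(theta)
        return g @ W @ g
    m = moment_matrix(theta0).shape[1]
    W = np.eye(m)                              # step (a): identity weight matrix
    theta = minimize(Q, theta0, args=(W,), method="BFGS").x
    for _ in range(iterations):                # step (b), possibly iterated
        W = np.linalg.inv(moment_cov(theta, moment_matrix))
        theta = minimize(Q, theta, args=(W,), method="BFGS").x
    return theta

def cu_gmm(moment_matrix, theta0):
    """Continuously-updated GMM: weight matrix recomputed at every theta."""
    def Q(theta):
        g = moment_matrix(theta).mean(axis=0)
        return g @ np.linalg.inv(moment_cov(theta, moment_matrix)) @ g
    return minimize(Q, theta0, method="Nelder-Mead").x
```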

First-Order Conditions for the Two-Step GMM Estimator. The efficient two-step GMM estimator $\hat{\theta}$ solves

$$\underbrace{0}_{k \times 1} = \frac{\partial\, g_n(\hat{\theta})' V_n(\tilde{\theta})^{-1} g_n(\hat{\theta})}{\partial \theta} = 2 \underbrace{\left(\frac{\partial g_n(\hat{\theta})}{\partial \theta'}\right)'}_{k \times m} \underbrace{V_n(\tilde{\theta})^{-1}}_{m \times m} \underbrace{g_n(\hat{\theta})}_{m \times 1}, \tag{3}$$

where

$$\underbrace{\frac{\partial g_n(\theta)}{\partial \theta'}}_{m \times k} = \begin{pmatrix} \frac{\partial g_1}{\partial \theta'} \\ \vdots \\ \frac{\partial g_m}{\partial \theta'} \end{pmatrix} = \begin{pmatrix} \frac{\partial g_1}{\partial \theta_1} & \cdots & \frac{\partial g_1}{\partial \theta_k} \\ \vdots & & \vdots \\ \frac{\partial g_m}{\partial \theta_1} & \cdots & \frac{\partial g_m}{\partial \theta_k} \end{pmatrix}.$$

Equation (3) shows that the 2S-GMM estimator takes a linear combination of the moment conditions and sets it equal to zero. Thus, the 2S-GMM estimator solves $a_n g_n(\hat{\theta}) = 0$, where the weights are $a_n = \left(\frac{\partial g_n(\hat{\theta})}{\partial \theta'}\right)' V_n(\tilde{\theta})^{-1}$. The first-order conditions are typically solved by numerical methods, except in linear models where the moment conditions are linear functions of the parameters (as in Example 6). Note that in the case of the CU-GMM estimator, the weighting matrix also depends on the parameter vector, so the form of the first-order conditions is more complicated.

Empirical Likelihood-Based Methods of Moments. The GMM estimator minimizes the distance of the sample counterparts of the moment conditions $E[g(y, x, \theta)]$ from zero using a quadratic form in $g_n(\theta) = E[g(y, x, \theta) \mid F_n] = \int g(y, x, \theta)\, dF_n$, where $F_n$ denotes the empirical measure of the sample, generated by the unknown distribution function $F$, that places probability

mass of $1/n$ on each data point. Let $P_n$ be another probability measure that assigns multinomial weights $p_1, p_2, \ldots, p_n$ to the observations such that the moment conditions are satisfied exactly: $E[g(y, x, \theta) \mid P_n] = \int g(y, x, \theta)\, dP_n = 0$.

Then, an alternative approach to estimation is to select, from the set of distributions $\Pi$ that satisfy exactly the moment conditions, the probability measure $P_n$ closest to the empirical measure $F_n$, where closeness is defined by the Cressie and Read (1984) power divergence criterion

$$D_\lambda(F_n, P_n) = \frac{2}{\lambda(1+\lambda)} \sum_{i=1}^n p_i\left[(np_i)^{-(1+\lambda)} - 1\right], \tag{4}$$

where $\lambda$ is a fixed scalar parameter that determines the shape of the criterion function. Thus, the estimator is defined as the solution to

$$\min_{P_n \in \Pi,\; \theta \in \Theta} D_\lambda(F_n, P_n) \tag{5}$$

subject to

$$E[g(y, x, \theta) \mid P_n] = \int g(y, x, \theta)\, dP_n = 0. \tag{6}$$

This form of the analogy principle maps the empirical distribution function onto the space of feasible distribution functions and chooses the probability measure that is most likely to have generated the observed data, subject to the moment conditions (Manski, 1988).

The solution to the above constrained optimization problem is a straightforward application of the Lagrange multiplier principle. In particular, if we let $\lambda$ approach $0$, the solution to the problem

$$\min_{p,\; \theta \in \Theta} \; -\frac{1}{n}\sum_{i=1}^n \ln(np_i) \tag{7}$$

subject to

$$\sum_{i=1}^n p_i\, g(y_i, x_i, \theta) = 0 \quad \text{and} \quad \sum_{i=1}^n p_i = 1, \tag{8}$$

is the empirical likelihood (EL) estimator $\hat{\theta}_{EL}$ of Owen (1988, 1990, 1991) and Qin and Lawless (1994) that we discussed earlier. As before, the EL estimator can be obtained as the root of the system of equations

$$\begin{pmatrix} \dfrac{1}{n}\displaystyle\sum_{i=1}^n g(x_i, \hat{\theta}_{EL}) \Big/ \left(1 + \hat{\lambda}' g(x_i, \hat{\theta}_{EL})\right) \\[8pt] \dfrac{1}{n}\displaystyle\sum_{i=1}^n \hat{\lambda}'\, \dfrac{\partial g(x_i, \hat{\theta}_{EL})}{\partial \theta'} \Big/ \left(1 + \hat{\lambda}' g(x_i, \hat{\theta}_{EL})\right) \end{pmatrix} = 0_{(m+k) \times 1},$$

where $\lambda$ is a vector of Lagrange multipliers on the moment conditions. The $(k+m) \times 1$ parameter vector $(\hat{\theta}_{EL}, \hat{\lambda})'$ can then be used to compute the vector of probability weights $p_i = \left[n\left(1 + \hat{\lambda}' g(x_i, \hat{\theta}_{EL})\right)\right]^{-1}$ for $i = 1, 2, \ldots, n$.
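For a fixed $\theta$, the multipliers can be computed by exploiting the concavity of the dual problem in $\lambda$. A minimal sketch, assuming the rows of `G` are the evaluated moments $g(x_i, \theta)'$; the barrier on $1 + \lambda' g_i$ (which keeps the implied weights in $(0, 1]$) is a standard numerical device of ours, not something spelled out in the notes:

```python
import numpy as np
from scipy.optimize import minimize

def el_lambda_and_weights(G):
    """Inner EL problem for a fixed theta.

    G: (n, m) matrix with rows g(x_i, theta)'. Returns the Lagrange
    multiplier lambda and the implied probabilities
    p_i = [n(1 + lambda'g_i)]^{-1}, where lambda maximizes the concave
    dual sum_i log(1 + lambda'g_i).
    """
    n, m = G.shape
    def neg_dual(lam):
        u = 1.0 + G @ lam
        if np.any(u <= 1.0 / n):      # barrier: keep all p_i in (0, 1]
            return np.inf
        return -np.sum(np.log(u))
    lam = minimize(neg_dual, np.zeros(m), method="Nelder-Mead").x
    p = 1.0 / (n * (1.0 + G @ lam))
    return lam, p
```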

Another estimator from this class can be obtained for $\lambda \to -2$ and is given by the solution to

$$\min_{p,\; \theta} \; \frac{1}{n}\sum_{i=1}^n \left(n^2 p_i^2 - 1\right) \tag{9}$$

subject to (8). The Lagrangian for this problem is given by²

$$\frac{1}{2n}\sum_{i=1}^n \left(n^2 p_i^2 - 1\right) - \lambda' \sum_{i=1}^n p_i\, g_i(\theta) - \eta\left(\sum_{i=1}^n p_i - 1\right), \tag{10}$$

²We divide the objective function by 2 for convenience. We know that this type of transformation does not affect the optimal solution.

and its first-order condition with respect to $p_i$ has the form

$$np_i - \lambda' g_i(\theta) - \eta = 0. \tag{11}$$

Taking averages of both sides yields $\eta = 1 - \lambda' g_n(\theta)$, where $g_n(\theta) = \frac{1}{n}\sum_{i=1}^n g_i(\theta)$. Substituting for $\eta$ into (11) and solving for $p_i$ gives

$$p_i = \frac{1 + \lambda'\left[g_i(\theta) - g_n(\theta)\right]}{n}.$$

Plugging this expression for $p_i$ into the first constraint and solving for $\lambda$, we get

$$\lambda = -\left\{\sum_{i=1}^n \left[g_i(\theta) - g_n(\theta)\right]\left[g_i(\theta) - g_n(\theta)\right]'\right\}^{-1}\left[\sum_{i=1}^n g_i(\theta)\right].$$

Substituting for $\lambda$ into the expression for $p_i$ and then plugging this expression into the objective function (9) yields $g_n(\theta)' V_n(\theta)^{-1} g_n(\theta)$, where

$$V_n(\theta) = \frac{1}{n}\sum_{i=1}^n \left[g_i(\theta) - g_n(\theta)\right]\left[g_i(\theta) - g_n(\theta)\right]'.$$

This is the objective function of the CU-GMM estimator, with the only difference that the weighting matrix is computed from the demeaned moment conditions.
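In code, this demeaned-weight version of the CU objective is a small variation on the earlier sketches (again assuming a user-supplied `moment_matrix`):

```python
import numpy as np

def cu_objective_demeaned(theta, moment_matrix):
    """CU-GMM objective with the weighting matrix built from demeaned
    moments, g_n' V_n(theta)^{-1} g_n, as derived from (9)-(11)."""
    G = moment_matrix(theta)              # (n, m)
    g_bar = G.mean(axis=0)                # g_n(theta)
    D = G - g_bar                         # demeaned moments g_i - g_n
    V = D.T @ D / G.shape[0]              # demeaned V_n(theta)
    return g_bar @ np.linalg.solve(V, g_bar)
```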

Taking the first-order conditions of (10) with respect to $\theta$ and rewriting, we define the CU-GMM estimator as the solution to

$$\left[\sum_{i=1}^n \frac{1 + \hat{\lambda}'\left[g_i(\hat{\theta}) - g_n(\hat{\theta})\right]}{n}\, \frac{\partial g_i(\hat{\theta})}{\partial \theta'}\right]' V_n(\hat{\theta})^{-1}\, g_n(\hat{\theta}) = 0_{k \times 1}, \tag{12}$$

where $\hat{\lambda} = -\left\{\frac{1}{n}\sum_{i=1}^n \left[g_i(\hat{\theta}) - g_n(\hat{\theta})\right]\left[g_i(\hat{\theta}) - g_n(\hat{\theta})\right]'\right\}^{-1} g_n(\hat{\theta}) = -V_n(\hat{\theta})^{-1} g_n(\hat{\theta})$. Therefore, similarly to the two-step GMM estimator, the CU-GMM also sets a linear combination of the moment conditions to zero, $a_n g_n(\hat{\theta}) = 0$, but the weights are different and are given by

$$a_n = \left[\sum_{i=1}^n \frac{1 + \hat{\lambda}'\left[g_i(\hat{\theta}) - g_n(\hat{\theta})\right]}{n}\, \frac{\partial g_i(\hat{\theta})}{\partial \theta'}\right]' V_n(\hat{\theta})^{-1}.$$

Hence, the CU-GMM estimator can be interpreted as an optimally weighted GMM estimator.

Heteroskedasticity and Autocorrelation Consistent (HAC) Covariance Estimation. GMM estimation can be used with stationary and ergodic time series data, in which case the moment conditions are possibly serially correlated. To denote that we work with time series data, we

will change the indexing of the moment conditions from $\{g_i(\theta)\}_{i=1}^n$ to $\{g_t(\theta)\}_{t=1}^T$. Recall that for the efficient GMM,

$$\sqrt{T}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N(0, \Sigma), \tag{13}$$

where $\Sigma = \left(M'V^{-1}M\right)^{-1}$, $V = \lim_{T \to \infty} \operatorname{Var}\left(T^{-1/2}\sum_{t=1}^T g_t(\theta_0)\right)$ and $M = E\left[\frac{\partial}{\partial \theta'} g_t(\theta_0)\right]$. A consistent estimator of $M$ can be easily constructed as

$$\hat{M} \equiv M_T(\hat{\theta}) = \frac{1}{T}\sum_{t=1}^T \frac{\partial}{\partial \theta'} g_t(\hat{\theta}) = \frac{\partial}{\partial \theta'} g_T(\hat{\theta}), \tag{14}$$

but the construction of an estimator of $V$ that is robust to heteroskedasticity and serial correlation of unknown form in the moment conditions is more involved.

By definition,

$$V = \lim_{T \to \infty} \operatorname{Var}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^T g_t\right) \tag{15}$$
$$= \lim_{T \to \infty} E\left[\left(\frac{1}{\sqrt{T}}\sum_{t=1}^T \left[g_t - E(g_t)\right]\right)\left(\frac{1}{\sqrt{T}}\sum_{j=1}^T \left[g_j - E(g_j)\right]\right)'\right]$$
$$= \lim_{T \to \infty} E\left(\frac{1}{T}\sum_{t=1}^T\sum_{j=1}^T \left[g_t - E(g_t)\right]\left[g_j - E(g_j)\right]'\right) = \lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^T\sum_{j=1}^T E\left(\left[g_t - E(g_t)\right]\left[g_j - E(g_j)\right]'\right).$$

Denote the autocovariance function of $g_t$ at delay $j$ by $R_T(j) = E\left(\left[g_t - E(g_t)\right]\left[g_{t-j} - E(g_{t-j})\right]'\right)$. The autocovariance matrix is given by

$$\begin{pmatrix} R_T(0) & R_T(1) & \cdots & R_T(T-1) \\ R_T(-1) & R_T(0) & \cdots & \vdots \\ \vdots & & \ddots & \vdots \\ R_T(-T+1) & \cdots & & R_T(0) \end{pmatrix}. \tag{16}$$

The double summation in (15) is equivalent to summing up all elements of matrix (16). Then, using that $R_T(-j) = R_T(j)'$, we obtain

$$V = \lim_{T \to \infty} \frac{1}{T}\left[T R_T(0) + \sum_{j=1}^{T-1}(T - j)\left(R_T(j) + R_T(j)'\right)\right] = \lim_{T \to \infty}\left[R_T(0) + \sum_{j=1}^{T-1}\left(1 - \frac{j}{T}\right)\left(R_T(j) + R_T(j)'\right)\right] = R_T(0) + \sum_{j=1}^{\infty}\left(R_T(j) + R_T(j)'\right).$$

Since $R_T(j)$ can be consistently estimated by

$$\hat{R}_T(j) = \frac{1}{T}\sum_{t=j+1}^T \left[g_t(\hat{\theta}) - \frac{1}{T}\sum_{t=1}^T g_t(\hat{\theta})\right]\left[g_{t-j}(\hat{\theta}) - \frac{1}{T}\sum_{t=1}^T g_t(\hat{\theta})\right]',$$

one natural estimator of $V$ is given by

$$V_T(\hat{\theta}) = \hat{R}_T(0) + \sum_{j=1}^{T-1}\left(\hat{R}_T(j) + \hat{R}_T(j)'\right).$$

But this estimator is inconsistent because it is constructed using the estimates of $T$ covariance functions, whose number increases with the sample size.

One possible solution is to get rid of the autocovariances that provide the least information, i.e. truncate the sum at some lag $m$:

$$V_T(\hat{\theta}) = \hat{R}_T(0) + \sum_{j=1}^m \left(\hat{R}_T(j) + \hat{R}_T(j)'\right).$$

Unfortunately, this matrix $V_T(\hat{\theta})$ is not guaranteed to be positive semi-definite. Newey and West (1987) proposed the estimator

$$V_T(\hat{\theta}) = \hat{R}_T(0) + \sum_{j=1}^m \left(1 - \frac{j}{m+1}\right)\left(\hat{R}_T(j) + \hat{R}_T(j)'\right), \tag{17}$$

which ensures that $V_T(\hat{\theta})$ is positive semi-definite³. Newey and West (1987) show that if $m \to \infty$ and $m^4/T \to 0$, then $V_T(\hat{\theta}) \xrightarrow{p} V$. A popular choice of $m$ is $m = \left(\frac{T}{100}\right)^{1/4}$.

³For instance, if $m = 1$, $V_T(\hat{\theta}) = \hat{R}_T(0) + \frac{1}{2}\left(\hat{R}_T(1) + \hat{R}_T(1)'\right)$, and if $m = 2$, $V_T(\hat{\theta}) = \hat{R}_T(0) + \frac{2}{3}\left(\hat{R}_T(1) + \hat{R}_T(1)'\right) + \frac{1}{3}\left(\hat{R}_T(2) + \hat{R}_T(2)'\right)$.

In general,

$$\hat{V} \equiv V_T(\hat{\theta}) = \hat{R}_T(0) + \sum_{j=1}^{T-1} K(j/m)\left(\hat{R}_T(j) + \hat{R}_T(j)'\right), \tag{18}$$

where $K(j/m)$ is a kernel (weight) function such that $K(j/m) = 0$ if $j > m$, and $m$ is the bandwidth. Kernels that are commonly used in applied work are the Bartlett kernel, the Parzen kernel and the quadratic spectral kernel. The Bartlett kernel is the kernel used in (17) by Newey and West (1987).
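A sketch of the Newey-West estimator (17) with Bartlett weights; `G` is assumed to stack the evaluated moments $g_t(\hat{\theta})'$ row by row, and the function name is our own:

```python
import numpy as np

def newey_west(G, m):
    """Newey-West HAC estimate of V = lim Var(T^{-1/2} sum_t g_t).

    G: (T, dim) matrix with rows g_t(theta_hat)'. m: truncation lag.
    Uses Bartlett weights 1 - j/(m+1), which guarantee positive
    semi-definiteness of the result.
    """
    T = G.shape[0]
    Gc = G - G.mean(axis=0)               # demean: g_t - g_bar
    V = Gc.T @ Gc / T                     # R_hat(0)
    for j in range(1, m + 1):
        Rj = Gc[j:].T @ Gc[:-j] / T       # R_hat(j)
        V += (1.0 - j / (m + 1)) * (Rj + Rj.T)
    return V
```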

Thus, the heteroskedasticity and autocorrelation consistent (HAC) estimator of the variance-covariance matrix of the GMM estimator is given by

$$\hat{\Sigma} = \left(\hat{M}'\hat{V}^{-1}\hat{M}\right)^{-1},$$

where the expressions for $\hat{M}$ and $\hat{V}$ are given in (14) and (18), respectively.
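Putting the pieces together, the final variance computation might look as follows. `jacobian_of_moments` is a hypothetical user-supplied helper returning the $m \times k$ Jacobian $\hat{M}$ from (14); `newey_west` and `G` come from the sketch above.

```python
# Hypothetical assembly of the HAC sandwich: Sigma_hat = (M' V^{-1} M)^{-1}.
T = G.shape[0]
m_lag = int((T / 100) ** 0.25)                # bandwidth rule from the text
V_hat = newey_west(G, m_lag)                  # HAC estimate of V, eq. (18)
M_hat = jacobian_of_moments(theta_hat)        # hypothetical helper, eq. (14)
Sigma_hat = np.linalg.inv(M_hat.T @ np.linalg.inv(V_hat) @ M_hat)
```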

References

[1] Cressie, N. and T. Read (1984), "Multinomial goodness-of-fit tests", Journal of the Royal Statistical Society, Series B, 46, 440-464.

[2] Hall, A. R. (2005), Generalized Method of Moments, Oxford University Press: Oxford.

[3] Hansen, L. P. (1982), "Large sample properties of generalized method of moments estimators", Econometrica, 50, 1029-1054.

[4] Hansen, L. P., Heaton, J. and A. Yaron (1996), "Finite-sample properties of some alternative GMM estimators", Journal of Business and Economic Statistics, 14, 262-280.

[5] Hansen, L. P. and K. J. Singleton (1982), "Generalized instrumental variables estimation of nonlinear rational expectations models", Econometrica, 50, 1269-1286.

[6] Manski, C. F. (1988), Analog Estimation Methods in Econometrics, Chapman and Hall: New York.


[7] Newey, W. K. and K. West (1987), "A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix estimator", Econometrica, 55, 703-708.

[8] Owen, A. (1988), "Empirical likelihood ratio confidence intervals for a single functional", Biometrika, 75, 237-249.

[9] Owen, A. (1990), "Empirical likelihood ratio confidence regions", Annals of Statistics, 18, 90-120.

[10] Owen, A. (1991), "Empirical likelihood for linear models", Annals of Statistics, 19, 1725-1747.

[11] Qin, J. and J. Lawless (1994), "Empirical likelihood and general estimating equations", Annals of Statistics, 22, 300-325.
