multreg
Multiple Regression by Iterative Fitting (EDTTS, Ch. 7)

We can use the same principle with more variables:

(1a) Fit y using x1

(2a) Fit residuals from step (1a) to x2

(1b) Fit residuals from step (2a) to x1

(2b) Fit residuals from step (1b) to x2

(1c) Fit residuals from step (2b) to x1

(2c) Fit residuals from step (1c) to x2

and so on, until the coefficients are effectively zero. When fitting by least squares at each step, the final residuals will be the same as the least squares residuals.
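The alternating scheme above can be sketched in a few lines. Everything below (the simulated data, the `ls_line` helper) is illustrative, not from EDTTS:

```python
import numpy as np

# Hypothetical demo data (not from EDTTS): two correlated predictors.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.3, size=n)

def ls_line(x, v):
    """Least-squares line of v on x: returns (intercept, slope)."""
    slope = np.sum((x - x.mean()) * v) / np.sum((x - x.mean()) ** 2)
    return v.mean() - slope * x.mean(), slope

# Alternate between x1 and x2, refitting the current residuals each time,
# and accumulate the coefficients from every pass.
b0 = b1 = b2 = 0.0
r = y.copy()
for _ in range(200):                 # "slow": many passes may be needed
    a, c = ls_line(x1, r)            # steps (1a), (1b), (1c), ...
    b0, b1, r = b0 + a, b1 + c, r - a - c * x1
    a, c = ls_line(x2, r)            # steps (2a), (2b), (2c), ...
    b0, b2, r = b0 + a, b2 + c, r - a - c * x2

# The accumulated coefficients match one direct least-squares fit.
X = np.column_stack([np.ones(n), x1, x2])
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose([b0, b1, b2], beta_ls))
```

The more correlated x1 and x2 are, the more passes the loop needs, which is exactly the slowness the NB below warns about.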

Final coefficients must be calculated:

Intercept: add the intercept terms from all of the regressions.

For x1: add the coefficients of x1 from (1a), (1b), (1c), ...

For x2: add the coefficients of x2 from (2a), (2b), (2c), ...

NB: This is a very slow way to obtain the least squares fit.

A more direct way of finding this least squares fit that does not require iteration is to adjust each variable for the successive ones, as described in EDTTS (Ch. 7). We use this same "adjusted iterative fitting" when we replace the LS fit by the RR fit.

Multiple Regression by Adjusted Iterative Fitting

"Sweep out" one variable at a time.

Basic idea:

(1) Adjust y for x1 → y·1

(2) Adjust x2 for x1 → x2·1

(3) Adjust y·1 for x2·1

Now y is adjusted for both x1 and x2. Also, we see how much of x1 is in x2.

See how this results in the LS fit:

(1) Fit y to x1 and form residuals:

ŷ = c0 + c1 x1

y·1 = y − c1 x1

c1 = Σ_{i=1}^n (x_i1 − x̄_1) y_i / Σ_{i=1}^n (x_i1 − x̄_1)²

(2) Fit x2 to x1 and form residuals:

x̂2 = d0 + d1 x1

x2·1 = x2 − d1 x1

d1 = Σ_{i=1}^n (x_i1 − x̄_1) x_i2 / Σ_{i=1}^n (x_i1 − x̄_1)²

(3) Fit y·1 to x2·1 and form residuals:

ŷ·1 = c0 + c2 x2·1

y·12 = y·1 − c2 x2·1

c2 = Σ_{i=1}^n (x_{2·1,i} − x̄_{2·1}) y_{·1,i} / Σ_{i=1}^n (x_{2·1,i} − x̄_{2·1})²

The least squares fit is ŷ = b0 + b1 x1 + b2 x2, where

b2 = c2

b1 = c1 − d1 c2

b0 = mean(y_i − b1 x_i1 − b2 x_i2).

Note: If you plot y·12 against x1, the least squares slope of the line will be zero; likewise for x2.
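Steps (1)-(3) and the coefficient assembly can be checked numerically. The data and helper below are made up for illustration (and the sketch also subtracts the fitted intercepts, which does not change the slopes):

```python
import numpy as np

# Hypothetical demo data (not from EDTTS).
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.2, size=n)

def ls_line(x, v):
    """Least-squares line of v on x: returns (intercept, slope)."""
    slope = np.sum((x - x.mean()) * v) / np.sum((x - x.mean()) ** 2)
    return v.mean() - slope * x.mean(), slope

c0, c1 = ls_line(x1, y);    y_1  = y  - c0 - c1 * x1      # (1) adjust y for x1
d0, d1 = ls_line(x1, x2);   x2_1 = x2 - d0 - d1 * x1      # (2) adjust x2 for x1
e0, c2 = ls_line(x2_1, y_1); y_12 = y_1 - e0 - c2 * x2_1  # (3)

b2 = c2
b1 = c1 - d1 * c2
b0 = np.mean(y - b1 * x1 - b2 * x2)

# Assembled coefficients agree with one direct least-squares fit ...
X = np.column_stack([np.ones(n), x1, x2])
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose([b0, b1, b2], beta_ls))
# ... and y_12 has zero LS slope on x1, as the Note states.
print(abs(ls_line(x1, y_12)[1]) < 1e-10)
```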

Everywhere above, replace "LS fit" with "RR line fit". The "Note" no longer applies, so we must add iterative steps:

(4) Plot y·12 against x1. If the slope is not "small", adjust c1 accordingly, forming new c1, y·1, c2, y·12.

(5) Plot y·12 against x2. If the slope is not "small", adjust c2 accordingly and return to (4). Otherwise, go to (6).

(6) Obtain final residuals from step (5);

obtain the final fit either from (data − residuals),

or by assembling all coefficients from all steps,

or from ŷ = b0 + b1 x1 + b2 x2, where

b2 = c2

b1 = c1 − d1 c2

b0 = median(y_i − b1 x_i1 − b2 x_i2).

EDTTS, p. 251: adjusted iterative fitting:

(1) Fit y to x1 = Poverty:

y = 87.54 + 1.72 x1 + y·1

(2) Fit x4 = Infant mortality rate to x1:

x4 = 89.82 + 1.15 x1 + x4·1

(3) Fit y·1 to x4·1:

y·1 = −4.34 − 0.329 x4·1 + y·14

Assemble:

y = 87.54 + 1.72 x1 − 4.34 − 0.329 (x4 − 89.82 − 1.15 x1) + y·14
  = 112.75 + 2.10 x1 − 0.329 x4 + y·14

Note: The correlation between x1 and x4 is high -- 0.57 -- so unadjusted least squares fitting requires nine steps to converge to the least squares solution! (When the correlation is only 0.30, as for x1 and x2, only six steps are needed.) We reduce the need for extensive iteration by adjusting variables as we go along.

Same decomposition as before:

data = fit1 + residual1
residual1 = fit2 + residual2
residual2 = fit3 + residual3

with final decomposition

data = (fit1 + fit2 + fit3) + residual3

Here, fit2 is obtained by fitting residual1 to x2·1, and so on, as variables are used in turn. The results are not the same as fitting residual1 to x2, unless x1 is uncorrelated with x2 (there is no component of x1 in x2).

General procedure, > 2 independent variables:

(1) Fit y to x1 →

y = c10 + c1 x1 + y·1

(2) Fit x2 to x1 →

x2 = d20 + d21 x1 + x2·1

(1) + (2) → (3):

(3) Fit y·1 to x2·1 →

y·1 = c20 + c2 x2·1 + y·12

(4) Fit x3 to x1 and x2·1 →

x3 = d30 + d31 x1 + d32 x2·1 + x3·12

(3) + (4) → (5):

(5) Fit y·12 to x3·12 →

y·12 = c30 + c3 x3·12 + y·123

(6) Fit x4 to x1, x2·1, x3·12 →

x4 = d40 + d41 x1 + d42 x2·1 + d43 x3·12 + x4·123

(5) + (6) → (7):

(7) Fit y·123 to x4·123 →

y·123 = c40 + c4 x4·123 + y·1234

(8) Fit x5 to x1, x2·1, x3·12, x4·123 → (etc.)

Assemble coefficients (for intercepts, x1, x2, etc.)
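The general sweep can be sketched as a loop: adjust each new x_k for the already-adjusted predecessors, then sweep it out of y. The data and helper are hypothetical:

```python
import numpy as np

# Hypothetical demo data: four correlated predictors.
rng = np.random.default_rng(2)
n, p = 80, 4
X = rng.normal(size=(n, p)) @ (np.eye(p) + 0.4)
y = 1.0 + X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.3, size=n)

def ls_fit(Z, v):
    """Fitted values of v regressed on the columns of Z plus an intercept."""
    A = np.hstack([np.ones((len(v), 1)), Z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return A @ coef

adjusted = np.empty((n, 0))     # will hold x1, x2.1, x3.12, ...
ry = y.copy()                   # becomes y.1, y.12, y.123, ... in turn
for k in range(p):
    xk_adj = X[:, k] - ls_fit(adjusted, X[:, k])  # adjust x_k for predecessors
    ry = ry - ls_fit(xk_adj[:, None], ry)         # sweep x_k out of y
    adjusted = np.hstack([adjusted, xk_adj[:, None]])

# The final residuals equal those of one full least-squares fit.
A = np.hstack([np.ones((n, 1)), X])
res_ls = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose(ry, res_ls))
```

Each adjusted variable is orthogonal to its predecessors, which is why sweeping them out one at a time lands on the full LS residuals without iteration.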

Strategies for removing variables:

(1) Plot y versus x_k, in turn; choose that x_k for which the relationship looks strongest. Note: some plots may suggest transforming x_k first; e.g., 1/income, log(rate).

(2) Regress (rank of y_i) on (rank of x_ik), and choose that x_k for which the slope coefficient is largest (the use of ranks reduces the impact of outliers on the LS regression -- but there could be many ties for the "most significant" if there are only a few data points and several variables have monotonic relationships with y).
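Strategy (2) can be sketched as follows; the data are hypothetical, and since ranks of x and y here have equal variance, the rank slope coincides with the Spearman correlation:

```python
import numpy as np

# Hypothetical demo data: y depends strongly on the second column only.
rng = np.random.default_rng(3)
n = 40
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def ranks(v):
    """0-based ranks; ties are not handled (fine for continuous data)."""
    return v.argsort().argsort().astype(float)

def rank_slope(x, v):
    """LS slope of rank(v) on rank(x)."""
    rx, rv = ranks(x), ranks(v)
    return np.sum((rx - rx.mean()) * (rv - rv.mean())) / np.sum((rx - rx.mean()) ** 2)

slopes = [rank_slope(X[:, k], y) for k in range(X.shape[1])]
best = int(np.argmax(np.abs(slopes)))   # the strongest-looking predictor
print(best)
```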

Some important facts to remember:

(1) The fit depends on what variables are in the equation. Omission of a variable can make drastic changes in the fit -- it can even reverse the sign of coefficients (cf. the fit of y·some to x_k·some).

(2) The LS coefficient b_k can be obtained by regressing y·(all but k) on x_k·(all but k). So a plot of y·(all but k) against x_k·(all but k) is more informative than y against x_k.
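Fact (2) can be verified directly; the data below are hypothetical, with k = 2 and "all but k" being just x1:

```python
import numpy as np

# Hypothetical demo data with correlated predictors.
rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.3, size=n)

def ls_resid(Z, v):
    """Residual of v after an LS fit on the columns of Z plus an intercept."""
    A = np.hstack([np.ones((len(v), 1)), Z])
    return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]

# Adjust y and x2 for "all but k" (here just x1), then regress residuals.
y_adj  = ls_resid(x1[:, None], y)
x2_adj = ls_resid(x1[:, None], x2)
slope = np.sum(x2_adj * y_adj) / np.sum(x2_adj ** 2)

# That slope is exactly b2 from the full multiple regression.
A = np.hstack([np.ones((n, 1)), x1[:, None], x2[:, None]])
b = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose(slope, b[2]))
```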

Robust Regression: "Iteratively Reweighted LS"

y = Xβ + ε = fit + rough

β̂ = (X′X)^(-1) X′y, a linear function of y = (y1, ..., yn)′

Weighted least squares:

β̂_W = (X′WX)^(-1) X′Wy

Weight matrix W = diag(w1, ..., wn):

w_i = BIG if residual is small (point i is "consistent" with the "model")

w_i = little if residual is large (point i is "inconsistent" with the "model")

Deciding on BIG and little residuals:

ŝ_r = measure of spread of residuals (e.g., ...)

Call the residual BIG if |resid| > 6 ŝ_r. Then

w_i = [1 − (resid/(6 ŝ_r))²]²   if |resid| < 6 ŝ_r     (*)
    = 0                          otherwise
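A sketch of the weight function (*); the text leaves the spread measure open, so the MAD of the residuals is assumed here as one common choice:

```python
import numpy as np

def bisquare_weights(resid, c=6.0):
    """Bisquare weights per (*): w = [1 - (r/(c*s))^2]^2 if |r| < c*s, else 0.
    The spread s is taken as the MAD of the residuals -- an assumption;
    the handout only says "measure of spread"."""
    s = np.median(np.abs(resid - np.median(resid)))
    u = resid / (c * s)
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)

r = np.array([0.0, 0.5, -1.0, 2.0, 50.0])   # last point is a gross outlier
w = bisquare_weights(r)
print(w[0], w[-1])   # weight 1 at zero residual, weight 0 for the outlier
```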

Iteratively reweighted LS:

1. Fit OLS → β̂(0) → r(0) = y − X β̂(0)

2. Calculate W(1)[i, i] using r(0)_i as resid in (*)

3. Fit β̂ via WLS (weighted LS) → β̂(1) → r(1) = y − X β̂(1)

4. Calculate W(2)[i, i] using r(1)_i as resid in (*)

5. Fit β̂ via WLS using W(2) → β̂(2) → r(2) = y − X β̂(2)

Iterate until ‖β̂(k+1) − β̂(k)‖ < τ (tolerance).
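The loop above can be sketched end to end; the line data and the MAD spread estimate are assumptions made for illustration:

```python
import numpy as np

# Hypothetical demo data: a straight line with five gross outliers.
rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)
y[:5] += 15.0

def bisquare(r, c=6.0):
    """Weights per (*), using the MAD of the residuals as the spread."""
    s = np.median(np.abs(r - np.median(r)))
    u = r / (c * s)
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)

beta = np.linalg.lstsq(X, y, rcond=None)[0]     # step 1: OLS start (not robust)
for _ in range(50):
    w = bisquare(y - X @ beta)                  # steps 2, 4, ...: new weights
    Xw = X * w[:, None]                         # rows of X scaled by w_i
    beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)   # WLS: (X'WX)^-1 X'Wy
    if np.max(np.abs(beta_new - beta)) < 1e-8:  # |beta(k+1) - beta(k)| < tau
        beta = beta_new
        break
    beta = beta_new

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
# The reweighted fit sits much closer to the true line than OLS does.
print(np.max(np.abs(beta - beta_true)) < np.max(np.abs(beta_ols - beta_true)))
```

The first pass downweights the outliers to zero, and later passes essentially refit on the clean points only.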

Notes:

1. Starting with OLS is NOT robust, but it's easy.

2. Eqn (*) is called the bisquare weight function:

w_i = 1 when residual = 0
w_i = 0 when |residual| > 6 ŝ_r

and it falls smoothly to zero in between.

3. The value "6" is a "tuning parameter"; commonly 4 ≤ c ≤ 6.

4. Any weight function that smoothly falls to zero yields a robust estimate of β. Biweight regression is especially efficient.

5. Easy to do computationally. See rlm (package MASS):

rlm(formula, data, weights, ..., subset, na.action,
    method = c("M", "MM", "model.frame"), ...)

rlm(x, y, weights, w = rep(1, nrow(x)), init = "ls",
    psi = psi.huber, method = c("M", "MM"), ...)

method: currently either M-estimation or MM-estimation or (for the 'formula' method only) find the model frame. MM-estimation is M-estimation with Tukey's biweight initialized by a specific S-estimator. See 'Details'.

Psi functions are supplied for the Huber, Hampel and Tukey bisquare proposals as 'psi.huber', 'psi.hampel' and 'psi.bisquare'. Selecting 'method = "MM"' selects a specific set of options which ensures that the estimator has a high breakdown point. The initial set of coefficients and the final scale are selected by an S-estimator with 'k0 = 1.548'; this gives (for n >> p) breakdown point 0.5. The final estimator is an M-estimator with Tukey's biweight and fixed scale that will inherit this breakdown point provided 'c > k0'; this is true for the default value of 'c' that corresponds to 95% relative efficiency at the normal. Case weights are not supported for 'method = "MM"'.
