Bootstrap and Jackknife Calculations in R

Version 6 April 2004

These notes work through a simple example to show how one can program R to do both jackknife and bootstrap sampling. We start with bootstrapping.

Bootstrap Calculations

R has a number of nice features for easy calculation of bootstrap estimates and confidence intervals. To see how to use these features, consider the following 25 observations:

8.26   6.33   10.4   5.27   5.35   5.61   6.12   6.19   5.2
7.01   8.74   7.78   7.02   6      6.5    5.8    5.12   7.41
6.52   6.21   12.28  5.6    5.38   6.6    8.74




Suppose we wish to estimate the coefficient of variation, $CV = \sqrt{\mathrm{Var}}/\bar{x}$. Let's do this with a bootstrap estimator.

First, let's put the data into a vector, which we will call x,

> x <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74, 7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38, 6.6, 8.74)

Now let's define a function in R, which we will call CV, to compute the coefficient of variation,

> CV <- function(x) sqrt(var(x))/mean(x)

So, let's compute the CV,

> CV(x)

which returns approximately 0.252.
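As a quick cross-check on the function, the sketch below recomputes the CV with R's built-in sd function, which is the square root of var. The name cv_alt is introduced here only for illustration.

```r
# Data vector and CV function as defined in these notes
x <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74,
       7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38,
       6.6, 8.74)
CV <- function(x) sqrt(var(x)) / mean(x)

# sd(x) is sqrt(var(x)), so this should agree with CV(x)
cv_alt <- sd(x) / mean(x)
all.equal(CV(x), cv_alt)
```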



To generate a single bootstrap sample from this data vector, we use the command

> sample(x, replace=TRUE)

which generates a bootstrap sample of the data vector x by sampling with replacement. Hence, to compute the CV using a single bootstrap sample,

> CV(sample(x, replace=TRUE))

The particular value that R returns for you will be different, as the sample is random.

Some other useful commands:

> sum(x) returns the sum of the elements in x

> mean(x) returns the mean of the elements in x

> var(x) returns the sample variance, i.e., $\sum_i (x_i - \bar{x})^2/(n-1)$

> length(x) returns the number of items in x (i.e., the sample size n)

Note that the sum command is fairly general; for example,

> sum((x-mean(x))^2)

returns $\sum_i (x_i - \bar{x})^2$.
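To tie the commands above together, here is a minimal sketch that rebuilds the sample variance from sum, mean, and length and compares it with var. The names n, ss, and var_by_hand are illustrative only.

```r
x <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74,
       7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38,
       6.6, 8.74)

n  <- length(x)               # sample size
ss <- sum((x - mean(x))^2)    # sum of squared deviations from the mean
var_by_hand <- ss / (n - 1)   # sample variance, dividing by n - 1

all.equal(var_by_hand, var(x))
```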

So, let's now generate 1000 bootstrap samples. We first need to specify a vector of real values of length 1000, which we will call boot,

> boot <- numeric(1000)

We now generate 1000 samples, and assign the CV for bootstrap sample i as the ith element in the vector boot, using a for loop,

> for (i in 1:1000) boot[i] <- CV(sample(x, replace=TRUE))
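Because the samples are random, each run gives different values. If a repeatable run is wanted, one option (not part of the original notes) is to fix the random seed with set.seed before resampling. The sketch below also uses replicate as a compact alternative to the explicit for loop; the seed value 1 is arbitrary.

```r
x <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74,
       7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38,
       6.6, 8.74)
CV <- function(x) sqrt(var(x)) / mean(x)

set.seed(1)  # arbitrary seed, chosen only so the run can be repeated
# replicate evaluates the expression 1000 times and collects the results
boot <- replicate(1000, CV(sample(x, replace = TRUE)))

length(boot)
```

Re-running from set.seed(1) onward reproduces the same vector boot.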

The mean and variance of this collection of bootstrap samples are easily obtained using the mean and var commands (again, your values may differ),

> mean(boot)

> var(boot)

A plot of the histogram of these values follows using

> hist(boot)

Likewise, the value corresponding to the (say) upper 97.5% follows using

> quantile(boot, 0.975)

while the value corresponding to the lower 2.5% follows from

> quantile(boot, 0.025)


Recall from the notes that the estimate of the bias is given by the difference between the mean of the bootstrap values and the initial estimate,

> bias <- mean(boot) - CV(x)

and a bootstrap-corrected estimate of the CV is just the original estimate minus the bias,

> CV(x) - bias


Assuming normality, the approximate 95% confidence interval (adjusting for the bias) has lower and upper values of

> CV(x) - bias - 1.96*sqrt(var(boot))

> CV(x) - bias + 1.96*sqrt(var(boot))


Efron's confidence limits (Equation 11 in the resampling notes) have upper and lower values of

> quantile(boot, 0.975)

> quantile(boot, 0.025)

while Hall's confidence limits (Equation 12) have upper and lower values of

> 2*CV(x) - quantile(boot, 0.025)

> 2*CV(x) - quantile(boot, 0.975)
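The three bootstrap intervals can be collected in one place. The following is a minimal sketch, assuming a fixed seed for repeatability; the names est, q, and intervals are illustrative and do not appear in the notes.

```r
x <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74,
       7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38,
       6.6, 8.74)
CV <- function(x) sqrt(var(x)) / mean(x)

set.seed(1)  # arbitrary seed (an assumption, for repeatability only)
boot <- replicate(1000, CV(sample(x, replace = TRUE)))

est  <- CV(x)                 # original estimate
bias <- mean(boot) - est      # bootstrap estimate of the bias
# Both quantiles in one call; names = FALSE drops the "2.5%" style labels
q <- quantile(boot, c(0.025, 0.975), names = FALSE)

intervals <- rbind(
  normal = est - bias + c(-1, 1) * 1.96 * sqrt(var(boot)),  # bias-adjusted
  efron  = q,                                               # percentile limits
  hall   = 2 * est - rev(q)                                 # reflected limits
)
colnames(intervals) <- c("lower", "upper")
intervals
```

Note that Hall's lower limit uses the upper quantile and vice versa, which is why the quantile vector is reversed with rev.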





Jackknife Calculations

We now turn to jackknifing the sample. Recall from the randomization notes that this involves two steps. First, we generate a jackknife sample which has value $x_i$ removed and then compute the $i$th partial estimate of the test statistic using this sample,

$\hat{\theta}_i = \hat{\theta}(x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n)$


We then turn this $i$th partial estimate into the $i$th pseudovalue $\hat{\theta}^*_i$ using (Equation 5c in the randomization notes)

$\hat{\theta}^*_i = n\hat{\theta} - (n-1)\hat{\theta}_i$

where $\hat{\theta}$ is the estimate using the full data.
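As a sanity check on the pseudovalue formula, consider the sample mean as the statistic (an illustrative choice, not the statistic used in these notes). The partial estimate is then the mean of the remaining $n-1$ points, and the pseudovalue reduces to the removed observation itself:

```latex
\hat{\theta}^*_i
  = n\bar{x} - (n-1)\,\frac{1}{n-1}\sum_{j \neq i} x_j
  = \sum_{j} x_j - \sum_{j \neq i} x_j
  = x_i
```

For the mean, then, the average of the pseudovalues is simply $\bar{x}$. For a nonlinear statistic such as the CV, the pseudovalues generally differ from the data.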

Let's see how to code this in R using the previous vector x of data, with our test statistic again being the coefficient of variation (and hence our function CV previously defined). We first focus on generating the ith partial estimate and ith pseudovalue. We need to take the original data vector x and turn it into a vector (which we denote jack) of length n-1 as follows. First, we need to specify to R that we are creating the jackknife sample vector of the n-1 sampled points,

> jack <- numeric(length(x)-1)

As before, we will use the command length(x) in place of n. We also need to specify to R that we will be generating a vector pseudo of the n pseudovalues,

> pseudo <- numeric(length(x))


Next, we need to fill in the elements of the jack sample vector as follows. For j < i, the jth element of jack is the jth element of x, while for j > i, the (j-1)th element of jack is the jth element of x. We can state all this using a logical if..else statement within a for loop,

> for (j in 1:length(x)) if (j < i) jack[j] <- x[j] else if (j > i) jack[j-1] <- x[j]

We can then compute the ith pseudovalue (for the CV) as follows:

> pseudo[i] <- length(x)*CV(x) - (length(x)-1)*CV(jack)


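Finally, we top this all off by looping through the n possible i values, giving the final code as

> jack <- numeric(length(x)-1)
> pseudo <- numeric(length(x))
> for (i in 1:length(x))
{ for (j in 1:length(x))
  { if (j < i) jack[j] <- x[j] else if (j > i) jack[j-1] <- x[j] }
  pseudo[i] <- length(x)*CV(x) - (length(x)-1)*CV(jack) }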
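The inner loop over j can be avoided. A minimal sketch, using R's negative indexing (x[-i] is x with its ith element dropped) as an alternative to the element-by-element copy described in the notes:

```r
x <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74,
       7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38,
       6.6, 8.74)
CV <- function(x) sqrt(var(x)) / mean(x)

n <- length(x)
pseudo <- numeric(n)
for (i in 1:n) {
  jack <- x[-i]                                # jackknife sample without x[i]
  pseudo[i] <- n * CV(x) - (n - 1) * CV(jack)  # ith pseudovalue
}
```

This should give the same pseudovalues as the two-loop version, with less bookkeeping.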



Note the use of the braces ({, }) to delimit the appropriate elements in each loop. The mean and variance of the pseudovalues are easily found using

> mean(pseudo)

> var(pseudo)





Likewise, a histogram of the pseudovalues is generated using

> hist(pseudo)


Recall that the mean of the pseudovalues is the jackknife estimator, while var(pseudo)/n is the variance of this estimator,

> var(pseudo)/length(x)



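An approximate 95% confidence interval is given by

$\mathrm{mean(pseudo)} \pm t_{0.975,\,n-1}\sqrt{\mathrm{var(pseudo)}/n}$

Using R, the upper and lower limits become

> mean(pseudo) + qt(0.975, length(x)-1)*sqrt(var(pseudo)/length(x))

> mean(pseudo) - qt(0.975, length(x)-1)*sqrt(var(pseudo)/length(x))

giving the approximate 95% jackknife confidence interval as 0.150 to 0.372.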
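The jackknife steps can be wrapped into a single function. The sketch below is one possible arrangement, not the notes' own code: the function name jackknife_ci and its arguments stat and level are hypothetical, and it uses negative indexing and sapply in place of the explicit loops.

```r
x <- c(8.26, 6.33, 10.4, 5.27, 5.35, 5.61, 6.12, 6.19, 5.2, 7.01, 8.74,
       7.78, 7.02, 6, 6.5, 5.8, 5.12, 7.41, 6.52, 6.21, 12.28, 5.6, 5.38,
       6.6, 8.74)
CV <- function(x) sqrt(var(x)) / mean(x)

# Hypothetical helper: jackknife estimate and t-based confidence interval
jackknife_ci <- function(x, stat, level = 0.95) {
  n <- length(x)
  full <- stat(x)                        # estimate from the full data
  # Pseudovalues: n * full estimate - (n - 1) * partial estimate
  pseudo <- sapply(seq_len(n), function(i) n * full - (n - 1) * stat(x[-i]))
  est <- mean(pseudo)                    # jackknife estimator
  se <- sqrt(var(pseudo) / n)            # its standard error
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)
  c(estimate = est, lower = est - tcrit * se, upper = est + tcrit * se)
}

res <- jackknife_ci(x, CV)
res
```

Passing a different statistic, for example jackknife_ci(x, mean), reuses the same machinery without changing the function.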

Here's a summary of the various estimated values, variances, and confidence intervals:

                         Estimated CV   Variance   95% interval
Original Estimate        0.252
Jackknife                0.262          0.0029     0.150-0.373
Bootstrap
Bootstrap (normality)                              0.178-0.351
Bootstrap (Efron)                                  0.153-0.318
Bootstrap (Hall)



