Deep Sparse Representation for Robust Image Registration

Yeqing Li*¹, Chen Chen*¹, Fei Yang², and Junzhou Huang¹

¹University of Texas at Arlington

²Facebook Inc.

Abstract

The definition of the similarity measure is an essential component in image registration. In this paper, we propose a novel similarity measure for registration of two or more images. The proposed method is motivated by the observation that the optimally registered images can be deeply sparsified in the gradient domain and frequency domain, with the separation of a sparse tensor of errors. One of the key advantages of the proposed similarity measure is its robustness to severe intensity distortions, which widely exist in medical images, remotely sensed images and natural photos due to differences in acquisition modalities or illumination conditions. Two efficient algorithms are proposed to solve the batch image registration and pair registration problems in a unified framework. We validate our method on extensive challenging datasets. The experimental results demonstrate the robustness, accuracy and efficiency of our method over 9 traditional and state-of-the-art algorithms on synthetic images and a wide range of real-world applications.

1. Introduction

Image registration is a fundamental task in image processing and computer vision [29, 23, 20]. It aims to align two or more images into the same coordinate system, so that these images can be processed or compared. Accuracy and robustness are two of the most important metrics for evaluating a registration method. It has been shown that a mean geometric distortion of only 0.3 pixel results in a noticeable effect on a pixel-to-pixel image fusion process [3]. Robustness is defined as the ability to get close to the accurate result over different trials under diverse conditions. Based on the features used in registration, existing methods can be classified into feature-based registration (e.g., [28, 16, 15]) and pixel-based registration (e.g., [10, 6, 26, 25]). Feature-based methods rely on landmarks extracted from the images. However, extracting reliable features is still an open problem and an active topic of research [20]. In this paper, we are interested in image registration that directly uses the pixel values. In addition, we wish to successfully register images from a variety of applications at subpixel-level accuracy, as precisely as possible.

*Indicates equal contributions. Corresponding author: Junzhou Huang. Email: jzhuang@uta.edu. This work was partially supported by NSF IIS-1423056, CMMI-1434401, CNS-1405985.

One key component of image registration is the energy function used to measure (dis)similarity. The optimized similarity should lead to the correct spatial alignment. However, finding a reliable similarity measure is quite challenging due to the unpredictable variations of the input images. In many real-world applications, the images to be registered may be acquired at different times and locations, under various illumination conditions and occlusions, or by different acquisition modalities. As a result, the intensity fields of the images may vary significantly. For instance, slow-varying intensity bias fields often exist in brain magnetic resonance images [22]; remotely sensed images may even have inverse contrast for the same land objects, as multiple sensors have different sensitivities to the wavelength spectrum [24]. Unfortunately, many existing pixel-based similarity measures are not robust to these intensity variations, e.g., the widely used sum-of-squared-difference (SSD) [23].

Recently, sparsity-inducing similarity measures have repeatedly been successful in overcoming such registration difficulties [17, 19, 27, 9]. In RASL [19] (robust alignment by sparse and low-rank decomposition), the images are vectorized to form a data matrix. The transformations are estimated to seek a low-rank and sparse representation of the aligned images. Two online alignment methods, ORIA [27] (online robust image alignment) and t-GRASTA [9] (transformed Grassmannian robust adaptive subspace tracking algorithm), are proposed to improve the scalability of RASL.

All of these methods assume that the large errors among the images are sparse (e.g., caused by shadows or partial occlusions) and separable. However, as we will show later, many real-world images contain severe spatially-varying intensity distortions. These intensity variations are not sparse and are therefore difficult to separate with these methods. As a result, the above measures may fail to find the correct alignment and thus are less robust in these challenging tasks.


The residual complexity (RC) [17] is one of the best measures for registering two images corrupted by severe intensity distortion [8]; it uses the discrete cosine transform (DCT) to sparsify the residual of the two images. For a batch of images, RC has to register them pair-by-pair and the solution may be sub-optimal. In addition, a DCT and an inverse DCT are required in each iteration, which slows down the overall speed of registration. Finally, although RC is robust to intensity distortions, the ability of RC to handle partial occlusions is unknown.

Unlike previous works that vectorize each image into a vector [19, 27, 9], we arrange the input images into a 3D tensor to keep their spatial structure. With this arrangement, the optimally registered image tensor can be deeply sparsified into a sparse frequency tensor and a sparse error tensor (see Fig. 1 for more details). Severe intensity distortions and partial occlusions are sparsified and separated out in the first and second layers, while any misalignment makes the frequency tensor (third layer) less sparse. We propose a novel similarity measure based on such a deep sparse representation of natural images. Compared with the low-rank similarity measure, which requires a batch of input images, the proposed similarity measure still works even when there are only two input images. An efficient algorithm based on the Augmented Lagrange Multiplier (ALM) method is proposed for the batch mode, while a gradient descent method with backtracking is presented to solve the pair registration problem. Both algorithms have very low computational complexity in each iteration. We compare our method with 9 traditional and state-of-the-art algorithms on a wide range of natural image datasets, including medical images, remotely sensed images and photos. Extensive results demonstrate that our method is more robust to different types of intensity variations and always achieves higher sub-pixel accuracy than all the tested methods.

2. Image registration via deep sparse representation

In this paper, we use bold letters to denote multi-dimensional data. For example, $x$ denotes a vector, $X$ denotes a matrix and $\mathcal{X}$ is a 3D or third-order tensor. $\mathcal{X}(i,j,t)$ denotes the entry in the $i$-th row, $j$-th column and $t$-th slice. $\mathcal{X}(:,:,t)$ denotes the whole $t$-th slice, which is therefore a matrix. The $\ell_1$ norm is the summation of the absolute values of all entries, and applies to vectors, matrices and tensors.
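To make this notation concrete, the following minimal NumPy sketch (ours, not from the paper; all names are placeholders) builds a small image tensor, indexes an entry and a slice, and evaluates the $\ell_1$ norm exactly as defined above.

```python
import numpy as np

# A toy 3D tensor X of shape (w, h, N): N images of size w x h stacked along
# the third mode, mirroring the X(i, j, t) notation above.
w, h, N = 4, 5, 3
X = np.random.randn(w, h, N)

entry   = X[0, 1, 2]    # X(1, 2, 3) in the paper's 1-based indexing
slice_t = X[:, :, 2]    # X(:, :, 3): the whole third slice, a w x h matrix

# The l1 norm used throughout: the sum of absolute values of all entries,
# defined identically for vectors, matrices and tensors.
l1_norm = np.abs(X).sum()
print(slice_t.shape, l1_norm)
```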

2.1. Batch mode

We introduce our deep sparsity architecture in the inverse order for easy understanding. Suppose we have a batch of grayscale images $I_1, I_2, \ldots, I_N \in \mathbb{R}^{w \times h}$ to be registered, where $N$ denotes the total number of images.

Figure 1. Deep sparse representation of the optimally registered images. First we sparsify the image tensor into the gradient tensor (1st layer). The sparse error tensor is then separated out in the 2nd layer. The gradient tensor with repetitive patterns is sparsified in the frequency domain. Finally we obtain an extremely sparse frequency tensor (composed of Fourier coefficients) in the 3rd layer.

First, we consider the simplest case where all the input images are identical and perturbed by a set of transformations $\tau = \{\tau_1, \tau_2, \ldots, \tau_N\}$.

We arrange the input images into a 3D tensor $\mathcal{D} \in \mathbb{R}^{w \times h \times N}$, with

$$\mathcal{D}(:,:,t) = I_t, \quad t = 1, 2, \ldots, N. \tag{1}$$

After removing the transformation perturbations, the slices show repetitive patterns. Such periodic signals are extremely sparse in the frequency domain. Ideally, the Fourier coefficients from the second slice to the last slice should all be zeros. We can minimize the $\ell_1$ norm of the Fourier coefficients to seek the optimal transformations:

$$\min_{\mathcal{A},\,\tau} \|\mathcal{F}_N \mathcal{A}\|_1, \quad \text{s.t.} \quad \mathcal{D} \circ \tau = \mathcal{A}, \tag{2}$$

where $\mathcal{F}_N$ denotes the Fourier transform along the third direction.
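To illustrate the objective in (2), here is a hedged NumPy sketch (not the authors' code) that stacks a batch of images into a tensor and evaluates $\|\mathcal{F}_N \mathcal{A}\|_1$ with a 1D FFT along the third mode. The warp is stubbed out with an integer np.roll translation purely so the example is self-contained.

```python
import numpy as np

def batch_frequency_sparsity(images):
    """|| F_N A ||_1: l1 norm of the Fourier coefficients taken along the image
    index (third mode), for a batch of N images of identical shape (w, h)."""
    A = np.stack(images, axis=2)   # image tensor of shape (w, h, N)
    F = np.fft.fft(A, axis=2)      # F_N A: FFT along the third direction
    return np.abs(F).sum()

# Placeholder "warp": an integer translation via np.roll, for illustration only.
base = np.random.rand(64, 64)
aligned    = [base, base, base]                       # identical (registered) slices
misaligned = [base, np.roll(base, 3, axis=1), base]   # one slice shifted by 3 pixels

# Identical slices put all their energy into the zero-frequency slice, so the
# aligned stack yields a strictly smaller l1 value than the misaligned one.
print(batch_frequency_sparsity(aligned) < batch_frequency_sparsity(misaligned))  # True
```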

The above model can hardly be used in practical cases, due to the corruptions and partial occlusions in the images. As in previous work [19], we assume the noise is negligible in magnitude compared to the error caused by occlusions. Let $\mathcal{E}$ be the error tensor. We can separate it from the image tensor if it is sparse enough. Similarly, we use the $\ell_1$ norm to induce sparseness:

$$\min_{\mathcal{A},\,\mathcal{E},\,\tau} \|\mathcal{F}_N \mathcal{A}\|_1 + \lambda \|\mathcal{E}\|_1, \quad \text{s.t.} \quad \mathcal{D} \circ \tau = \mathcal{A} + \mathcal{E}, \tag{3}$$

where $\lambda > 0$ is a regularization parameter.

Figure 2. A toy registration example with respect to horizontal translation using different similarity measures (SSD [23], RC [17], SAD [23], CC [13], CD2 [6], MS [18], MI [26] and the proposed pair mode). (a) The Lena image (128×128). (b) A toy Lena image under a severe intensity distortion. Blue curves: registration between (a) and (a); red curves: registration between (b) and (a).

The above approach requires that the error $\mathcal{E}$ is sparse. However, in many real-world applications, the images are corrupted with spatially-varying intensity distortions. Existing methods such as RASL [19] and t-GRASTA [9] may fail to separate these non-sparse errors. The last stage of our method comes from the intuition that the locations of the image gradients (edges) should remain almost the same, even under severe intensity distortions. Therefore, we register the images in the gradient domain:

$$\min_{\mathcal{A},\,\mathcal{E},\,\tau} \|\mathcal{F}_N \mathcal{A}\|_1 + \lambda \|\mathcal{E}\|_1, \quad \text{s.t.} \quad \nabla\mathcal{D} \circ \tau = \mathcal{A} + \mathcal{E}, \tag{4}$$

where $\nabla\mathcal{D} = \sqrt{(\nabla_x \mathcal{D})^2 + (\nabla_y \mathcal{D})^2}$ denotes the gradient tensor along the two spatial directions. This is based on the mild assumption that the intensity distortion fields of natural images often change smoothly.
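As a sketch of how the quantities in (4) could be computed (our own illustration; the finite-difference discretization of the gradient is an assumption, and the value of $\lambda$ below borrows the $\lambda = 1/\sqrt{M}$ setting from Section 3), the snippet forms the gradient-magnitude tensor and evaluates the objective for a fixed candidate decomposition.

```python
import numpy as np

def gradient_magnitude_tensor(D):
    """Per-slice gradient magnitude sqrt((grad_x D)^2 + (grad_y D)^2) of an image
    tensor D with shape (w, h, N)."""
    gx, gy = np.gradient(D, axis=(0, 1))  # finite-difference spatial gradients
    return np.sqrt(gx ** 2 + gy ** 2)

def deep_sparse_objective(A, E, lam):
    """Objective of (4) for a candidate decomposition grad(D o tau) = A + E:
    || F_N A ||_1 + lambda * || E ||_1."""
    return np.abs(np.fft.fft(A, axis=2)).sum() + lam * np.abs(E).sum()

# Toy usage: take the whole gradient tensor as A and a zero error tensor E.
D = np.random.rand(32, 32, 5)                 # stand-in for the warped tensor D o tau
G = gradient_magnitude_tensor(D)
print(deep_sparse_objective(G, np.zeros_like(G), lam=1.0 / np.sqrt(32 * 32)))
```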

With this rationale, the input images can be sparsely represented in a three-layer architecture, which is shown in Fig. 1. We call it the deep sparse representation of images. Compared with the existing popular low-rank representation [19], our modeling has two major advantages. First, the low-rank representation treats each image as a 1D signal, while our modeling exploits the spatial prior information (piece-wise smoothness) of natural images. Second, when the number of input images is not sufficient to form a low-rank matrix, our method is still effective. Next, we will demonstrate how our method registers only two input images.

2.2. Pair mode

For registering a pair of images, our model can be simplified and the registration can be accelerated. After a two-point discrete Fourier transform (DFT), the first entry is the sum and the second entry is the difference. The difference term is much sparser than the sum term when the two images have been registered, so we can discard the sum term to seek a sparser representation. Let $I_1$ be the reference image, and $I_2$ be the source image to be registered. Problem (4) can then be simplified to

$$\min_{A_1,\,A_2,\,E,\,\tau} \|A_1 - A_2\|_1 + \lambda \|E\|_1, \quad \text{s.t.} \quad \nabla I_1 = A_1, \;\; \nabla I_2 \circ \tau = A_2 + E. \tag{5}$$

Both $\ell_1$ norms in (5) imply the same property, i.e., the sparseness of the residual image $E$. Therefore, we can further simplify the above energy function:

$$\min_{\tau} \|\nabla I_1 - \nabla I_2 \circ \tau\|_1. \tag{6}$$

It is interesting that (6) is equivalent to minimizing the total variation (TV) of the residual image. The TV has been successfully utilized in many image reconstruction [12, 11] and non-rigid registration [14] problems.
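A small, self-contained sketch of the pair-mode measure (6) follows: the $\ell_1$ norm of the residual between the two gradient-magnitude images. The warp is again stubbed out with an integer np.roll translation for illustration only; a practical implementation would interpolate under an affine or projective $\tau$.

```python
import numpy as np

def grad_mag(I):
    """Gradient-magnitude image sqrt(Ix^2 + Iy^2)."""
    gx, gy = np.gradient(I)
    return np.sqrt(gx ** 2 + gy ** 2)

def pair_similarity(I1, I2_warped):
    """Energy (6): || grad(I1) - grad(I2 o tau) ||_1; smaller means better aligned."""
    return np.abs(grad_mag(I1) - grad_mag(I2_warped)).sum()

# Toy check: a smoothly varying multiplicative intensity field changes the pixel
# values but barely moves the edges, so the aligned pair should typically score
# lower than a shifted one.
I1 = np.random.rand(64, 64)
yy, xx = np.mgrid[0:64, 0:64] / 64.0
I2 = I1 * (0.5 + 0.5 * xx)                          # intensity-distorted copy of I1
print(pair_similarity(I1, I2))                      # aligned, distorted
print(pair_similarity(I1, np.roll(I2, 5, axis=1)))  # misaligned, distorted
```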

We compare the proposed similarity measure with SSD [23], RC [17], the sum of absolute differences (SAD) [23], the correlation coefficient (CC) [13], CD2 [6], MS [18] and mutual information (MI) [26] on a toy example. The Lena image is registered with itself with respect to horizontal translations. The blue curves in Fig. 2 show the responses of the different measures, all of which can find the optimal alignment at the zero translation. After adding intensity distortions and rescaling, the appearance of the source image shown in Fig. 2(b) is not consistent with that of the original Lena image. The results denoted by the red curves show that only RC and the proposed pair mode can handle this intensity distortion, while the other methods fail.

3. Algorithms

3.1. Batch mode

Problem (4) is difficult to solve directly due to the non-linearity of the transformations $\tau$. We use a local first-order Taylor approximation for each image:

$$\nabla I_t \circ (\tau_t + \Delta\tau_t) \approx \nabla I_t \circ \tau_t + J_t \ast \Delta\tau_t \tag{7}$$

for $t = 1, 2, \ldots, N$, where $J_t = \frac{\partial}{\partial \zeta}(\nabla I_t \circ \zeta)\big|_{\zeta=\tau_t} \in \mathbb{R}^{w \times h \times p}$ when $\tau_t$ is defined by $p$ parameters. The tensor-vector product in the last term is defined as follows:

Definition 1. Tensor-Vector Product. The product of a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and a vector $b \in \mathbb{R}^{n_3}$ is a matrix $C \in \mathbb{R}^{n_1 \times n_2}$. It is given by $C = \mathcal{A} \ast b$, where $C(i,j) = \sum_{t=1}^{n_3} \mathcal{A}(i,j,t)\, b(t)$ for $i = 1, 2, \ldots, n_1$ and $j = 1, 2, \ldots, n_2$.

Based on this, the batch mode (4) can be rewritten as:

$$\min_{\mathcal{A},\,\mathcal{E},\,\Delta\tau} \|\mathcal{F}_N \mathcal{A}\|_1 + \lambda \|\mathcal{E}\|_1, \quad \text{s.t.} \quad \nabla\mathcal{D} \circ \tau + \mathcal{J} \ast \Delta\tau = \mathcal{A} + \mathcal{E}. \tag{8}$$

This constrained problem can be solved by the augmented Lagrange multiplier (ALM) algorithm [19, 4]. The augmented Lagrangian problem is to iteratively update $\mathcal{A}$, $\mathcal{E}$, $\Delta\tau$ and $\mathcal{Y}$ by

$$(\mathcal{A}^{k+1}, \mathcal{E}^{k+1}, \Delta\tau^{k+1}) = \arg\min_{\mathcal{A},\,\mathcal{E},\,\Delta\tau} L(\mathcal{A}, \mathcal{E}, \Delta\tau, \mathcal{Y}^k), \qquad \mathcal{Y}^{k+1} = \mathcal{Y}^k + \mu_k\, h(\mathcal{A}^{k}, \mathcal{E}^{k}, \Delta\tau^{k}), \tag{9}$$

where k is the iteration counter and

$$L(\mathcal{A}, \mathcal{E}, \Delta\tau, \mathcal{Y}) = \langle \mathcal{Y},\, h(\mathcal{A}, \mathcal{E}, \Delta\tau) \rangle + \|\mathcal{F}_N \mathcal{A}\|_1 + \lambda \|\mathcal{E}\|_1 + \frac{\mu}{2} \|h(\mathcal{A}, \mathcal{E}, \Delta\tau)\|_F^2, \tag{10}$$

where the inner product of two tensors is the sum of all their element-wise products and

$$h(\mathcal{A}, \mathcal{E}, \Delta\tau) = \nabla\mathcal{D} \circ \tau + \mathcal{J} \ast \Delta\tau - \mathcal{A} - \mathcal{E}. \tag{11}$$

A common strategy for solving (9) is to minimize the function with respect to one unknown at a time. Each of the subproblems has a closed-form solution:

$$\begin{aligned}
\mathcal{A}^{k+1} &= \mathcal{T}_{1/\mu_k}\!\left(\nabla\mathcal{D}\circ\tau + \mathcal{J}\ast\Delta\tau + \tfrac{1}{\mu_k}\mathcal{Y}^k - \mathcal{E}^k\right),\\
\mathcal{E}^{k+1} &= \mathcal{T}_{\lambda/\mu_k}\!\left(\nabla\mathcal{D}\circ\tau + \mathcal{J}\ast\Delta\tau + \tfrac{1}{\mu_k}\mathcal{Y}^k - \mathcal{A}^{k+1}\right),\\
\Delta\tau_t^{k+1} &= J_t^T \ast \left(\mathcal{A}^{k+1}(:,:,t) + \mathcal{E}^{k+1}(:,:,t) - \nabla\mathcal{D}(:,:,t)\circ\tau_t - \tfrac{1}{\mu_k}\mathcal{Y}^k(:,:,t)\right), \quad t = 1, 2, \ldots, N,
\end{aligned} \tag{12}$$

where $\mathcal{T}_{\alpha}(\cdot)$ denotes the soft-thresholding operation with threshold value $\alpha$. In the third equation of (12), we use the tensor-matrix product and tensor transpose defined as follows:

Definition 2. Tensor-Matrix Product. The product of a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and a matrix $B \in \mathbb{R}^{n_2 \times n_3}$ is a vector $c \in \mathbb{R}^{n_1}$. It is given by $c = \mathcal{A} \ast B$, where $c(i) = \sum_{j=1}^{n_2} \sum_{t=1}^{n_3} \mathcal{A}(i,j,t)\, B(j,t)$ for $i = 1, 2, \ldots, n_1$.

Definition 3. Tensor Transpose. The transpose of a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is the tensor $\mathcal{A}^T \in \mathbb{R}^{n_3 \times n_1 \times n_2}$.

The registration algorithm for the batch mode is summarized in Algorithm 1. Let $M = w \times h$ be the number of pixels of each image. We set $\lambda = 1/\sqrt{M}$ and $\mu_k = 1.25^k \mu_0$ in the experiments, where $\mu_0 = 1.25/\|\nabla\mathcal{D}\|_2$. For the inner loop, applying the fast Fourier transform (FFT) costs $O(N \log N)$. All the other steps cost $O(MN)$. Therefore, the total computational complexity of our method is $O(N \log N + MN)$, which is significantly faster than the $O(N^2 M)$ of the SVD decomposition in RASL (if $M \gg N$).
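For readers who want to map Definitions 1-3 onto array code, the sketch below (ours, not from the paper) expresses the tensor-vector product, the tensor-matrix product and the tensor transpose with np.einsum and checks the shapes on Jacobian-sized arrays.

```python
import numpy as np

def tensor_vector_product(A, b):
    """Definition 1: A in R^{n1 x n2 x n3}, b in R^{n3} -> C in R^{n1 x n2},
    with C(i, j) = sum_t A(i, j, t) * b(t)."""
    return np.einsum('ijt,t->ij', A, b)

def tensor_matrix_product(A, B):
    """Definition 2: A in R^{n1 x n2 x n3}, B in R^{n2 x n3} -> c in R^{n1},
    with c(i) = sum_j sum_t A(i, j, t) * B(j, t)."""
    return np.einsum('ijt,jt->i', A, B)

def tensor_transpose(A):
    """Definition 3: R^{n1 x n2 x n3} -> R^{n3 x n1 x n2} (third mode moved to the front)."""
    return np.transpose(A, (2, 0, 1))

# Shape check with Jacobian-sized arrays (w x h x p), as used in (7) and (12).
w, h, p = 8, 8, 6
J = np.random.randn(w, h, p)
dtau = np.random.randn(p)
R = np.random.randn(w, h)
print(tensor_vector_product(J, dtau).shape)                 # (8, 8)
print(tensor_matrix_product(tensor_transpose(J), R).shape)  # (6,)
```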

Algorithm 1. Image registration via DSR - batch mode
Input: images $I_1, I_2, \ldots, I_N$, initial transformations $\tau_1, \tau_2, \ldots, \tau_N$, regularization parameter $\lambda$.
repeat
1) Compute $J_t = \frac{\partial}{\partial \zeta}(\nabla I_t \circ \zeta)\big|_{\zeta=\tau_t}$, $t = 1, 2, \ldots, N$;
2) Warp and normalize the gradient images: $\nabla\mathcal{D} \circ \tau = \big[\frac{\nabla I_1 \circ \tau_1}{\|\nabla I_1 \circ \tau_1\|_F}; \ldots; \frac{\nabla I_N \circ \tau_N}{\|\nabla I_N \circ \tau_N\|_F}\big]$;
3) Use (12) to iteratively solve the ALM minimization problem: $(\mathcal{A}^\star, \mathcal{E}^\star, \Delta\tau^\star) = \arg\min L(\mathcal{A}, \mathcal{E}, \Delta\tau, \mathcal{Y})$;
4) Update the transformations: $\tau = \tau + \Delta\tau^\star$;
until stop criterion
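The inner loop in step 3 consists of the closed-form updates (12). Below is a hedged NumPy sketch of one inner pass, written to follow (12) as reconstructed above; the data layout (Jacobians stacked into a four-way array) and all function names are our own assumptions, not the paper's code.

```python
import numpy as np

def soft_threshold(X, alpha):
    """Element-wise soft thresholding T_alpha(X) = sign(X) * max(|X| - alpha, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - alpha, 0.0)

def alm_inner_step(G, J, dtau, A, E, Y, mu, lam):
    """One pass of the closed-form updates in (12).
    G:    warped, normalized gradient tensor grad(D) o tau, shape (w, h, N)
    J:    Jacobians stacked into shape (w, h, p, N)
    dtau: current parameter increments, shape (p, N)
    Y:    Lagrange multiplier tensor, shape (w, h, N)"""
    # Linearized data term G + J * dtau, slice by slice (tensor-vector product, Def. 1).
    L = G + np.einsum('ijpt,pt->ijt', J, dtau)
    A_new = soft_threshold(L + Y / mu - E, 1.0 / mu)
    E_new = soft_threshold(L + Y / mu - A_new, lam / mu)
    # dtau_t = J_t^T * (A_t + E_t - G_t - Y_t / mu): a tensor-matrix product (Def. 2) per slice.
    dtau_new = np.einsum('ijpt,ijt->pt', J, A_new + E_new - G - Y / mu)
    return A_new, E_new, dtau_new
```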

3.2. Pair mode

Similar to the batch mode, we have

$$\nabla I_2 \circ (\tau + \Delta\tau) \approx \nabla I_2 \circ \tau + J \ast \Delta\tau, \tag{13}$$

where $J \in \mathbb{R}^{w \times h \times p}$ denotes the Jacobian. Thus, the pair mode (6) amounts to minimizing the following energy function with respect to $\Delta\tau$:

$$E(\Delta\tau) = \|\nabla I_1 - \nabla I_2 \circ \tau - J \ast \Delta\tau\|_1. \tag{14}$$

The $\ell_1$ norm in (14) is not smooth. We can use a tight approximation of the absolute value, $|x| \approx \sqrt{x^2 + \epsilon}$, where $\epsilon$ is a small constant (e.g., $10^{-10}$). Let $r = \nabla I_1 - \nabla I_2 \circ \tau - J \ast \Delta\tau$; we can then obtain the gradient of the energy function by the chain rule:

$$\nabla E(\Delta\tau) = J^T \ast \frac{r}{\sqrt{r \odot r + \epsilon}}, \tag{15}$$

where $\odot$ denotes the Hadamard product. Note that the division in (15) is element-wise.

Gradient descent with backtracking is used to minimize the energy function (14), which is summarized in Algorithm 2. We set the initial step size $\mu_0 = 1$ and $\eta = 0.8$. The computational complexity of each iteration is $O(M)$, which is much faster than the $O(M \log M)$ of RC when the fast cosine transform (FCT) is applied [17]. As in the batch mode, we use normalized images to rule out trivial solutions. We use a coarse-to-fine hierarchical registration architecture for both the batch mode and the pair mode.

Algorithm 2. Image registration via DSR - pair mode
Input: $I_1$, $I_2$, $\eta < 1$, $\tau$, $\mu_0$.
repeat
1) Warp and normalize $I_2$ with $\tau$;
2) $\mu = \mu_0$;
3) Compute $\Delta\tau = -\mu \nabla E(0)$;
4) If $E(\Delta\tau) > E(0)$, set $\mu = \eta\mu$ and go back to 3);
5) Update the transformation: $\tau = \tau + \Delta\tau$;
until stop criterion
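Finally, a compact sketch of Algorithm 2 with the smoothed gradient of (15) written out. To stay self-contained it restricts $\tau$ to 2D translations and warps with scipy.ndimage.shift; the parameterization, the Jacobian construction and all names are our own assumptions, and a practical implementation would add normalization and the coarse-to-fine pyramid described above.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

EPS = 1e-10  # small constant in the smooth approximation |x| ~ sqrt(x^2 + eps)

def grad_mag(I):
    """Gradient-magnitude image sqrt(Ix^2 + Iy^2)."""
    gx, gy = np.gradient(I)
    return np.sqrt(gx ** 2 + gy ** 2)

def energy(r):
    """Smoothed l1 energy of the residual r, cf. (14)."""
    return np.sqrt(r * r + EPS).sum()

def register_pair(I1, I2, n_iter=50, mu0=1.0, eta=0.8):
    """Toy DSR pair mode over 2D translations (p = 2), following Algorithm 2."""
    G1 = grad_mag(I1)
    tau = np.zeros(2)
    for _ in range(n_iter):
        G2 = grad_mag(nd_shift(I2, tau))      # warp I2 with the current tau
        r = G1 - G2
        # For this shift parameterization, dG2/dtau_k = -(spatial derivative of G2);
        # together with dr/dG2 = -1 the signs cancel, so the energy gradient of (15)
        # is J^T * (r / sqrt(r o r + eps)) with J the spatial derivatives of G2.
        J = np.stack(np.gradient(G2), axis=2)  # shape (w, h, 2)
        g = np.einsum('ijp,ij->p', J, r / np.sqrt(r * r + EPS))
        mu, dtau = mu0, -mu0 * g
        # Backtracking: shrink the step until the energy decreases (with a floor on mu).
        while energy(G1 - grad_mag(nd_shift(I2, tau + dtau))) > energy(r) and mu > 1e-3:
            mu *= eta
            dtau = -mu * g
        tau = tau + dtau
    return tau

# Usage: recover a known translation of a smooth blob.
yy, xx = np.mgrid[0:64, 0:64]
I1 = np.exp(-((xx - 32.0) ** 2 + (yy - 28.0) ** 2) / (2 * 8.0 ** 2))
I2 = nd_shift(I1, (-3.0, 2.0))
print(register_pair(I1, I2))  # should land roughly near (3, -2)
```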

4. Experimental results

In this section, we validate our method on a wide range of applications. We compare our batch mode with RASL [19] and t-GRASTA [9], and our pair mode with RC [17] and SSD [23]. One of the most important advantages of our method is its robustness and accuracy on natural images under spatially-varying intensity distortions. As shown in [17] and Fig. 2, SAD [23], CC [13], CD2 [6], MS [18] and MI [26] easily fail in such cases, so we do not include them in the following experiments. All experiments are conducted on a desktop computer with an Intel i7-3770 CPU and 12 GB of RAM.

4.1. Batch image registration

To evaluate the performance of our batch mode, we use a popular database of naturally captured images [1]. We choose the four datasets with the largest lighting variations: "NUTS", "MOVI", "FRUITS" and "TOY". These datasets are very challenging to register, as they have up to 20 different lighting conditions and are occluded by varying shadows. Random translations in both directions, drawn from a uniform distribution over a range of 10 pixels, are applied to the four datasets.

After registration on the "NUTS" dataset, the two components recovered by each algorithm are shown in Fig. 3. RASL [19] and t-GRASTA [9] fail to separate the shadows and large errors, while our method successfully finds the deep sparse representation of the optimally registered images.

Figure 3. Batch image registration on the NUTS dataset. (a) The low-rank component by RASL. (b) The sparse errors by RASL. (c) The subspace representation by t-GRASTA. (d) The sparse errors by t-GRASTA. (e) The visualization of A by our method. (f) The sparse error E by our method.

Figure 4. Registration results on the "NUTS" dataset. (a) The average image of the perturbed images. (b) The average image by RASL. (c) The average image by t-GRASTA. (d) The average image by our method.

The average of the perturbed images and the registered results are shown in Fig. 4, where the average image produced by the proposed method has significantly sharper edges than those produced by the two existing methods. The quantitative comparisons on the four datasets over 20 random runs are listed in Table 1. The overall average errors of our method are consistently lower than those of RASL and t-GRASTA. More importantly, only our method always achieves subpixel accuracy. For 20 images of size 128×128 pixels, the registration time is around 7 seconds for both RASL and our method, while t-GRASTA costs around 27 seconds. RASL should be much slower on larger datasets due to the higher complexity of SVD, although we did not test this.

Dataset   RASL            t-GRASTA        Proposed
NUTS      0.670 / 2.443   1.153 / 3.842   0.061 / 0.488
MOVI      0.029 / 0.097   0.568 / 2.965   0.007 / 0.024
FRUITS    0.050 / 0.107   1.094 / 4.495   0.031 / 0.076
TOY       0.105 / 0.373   0.405 / 2.395   0.038 / 0.076

Table 1. The mean/max registration errors (in pixels) of RASL, t-GRASTA and our method on the four lighting datasets. The first image is fixed to evaluate the errors.

We evaluate these three methods on the Multi-PIE face database [7]. This database contains 20 images of each subject captured under different illumination conditions. We add random artificial rotations in a range of 10° and translations in a range of 10 pixels to the first 100 subjects from Session 1. As the optimal alignment is not unique (e.g., all images shifted by 1 pixel), we compare the standard deviation (STD) of the transformations after registration. Ideally, the STD should be zero when all the perturbations have been exactly removed. Fig. 5 shows the average registration results over 20 runs for each subject. Our method is more accurate than RASL and t-GRASTA for almost every subject.

Figure 5. (a) An example input of the Multi-PIE image database. (b) The STD (in degrees) of rotations after registration. (c) The STD (in pixels) of X-translation after registration. (d) The STD (in pixels) of Y-translation after registration.

4.2. Pair image registration

4.2.1 Simulations

For quantitative comparisons, we evaluate SSD, RC and the proposed method on the Lena image with random intensity distortions (Fig. 2) and random affine transformations (with a similar range to the previous settings). The number of Gaussian intensity fields K ranges from 1 to 6. The reference image without intensity distortions is used as the ground truth. The root-mean-square error (RMSE) is used as the metric for error evaluation of both image intensities and transformations. We run this experiment 50 times and the results are plotted in Fig. 6. It can be observed that the proposed method is consistently more accurate than SSD and RC under different intensity distortions.

Figure 6. Registration performance comparisons with random transformation perturbations and random intensity distortions (SSD, RC and the proposed method, labeled DTV in the plot legends). (a) Intensity RMSE on the Lena image. (b) Transformation (affine) RMSE on the Lena image.

4.2.2 Multisensor remotely sensed image registration

Multisensor image registration is a key preprocessing operation in remote sensing, e.g., for image fusion [5] and change detection. The same land objects may be acquired at different times, under various illumination conditions and by different sensors. Therefore, it is very possible that the input images have significant dissimilarity in terms of intensity values. Here, we register a panchromatic image to a multispectral image acquired by the IKONOS multispectral imaging satellite [21]; the two have been pre-registered at their capture resolutions. The multispectral image has four bands (blue, green, red and near-infrared) with 4-meter resolution (Fig. 7(a)). The panchromatic image has 1-meter resolution (Fig. 7(b)). The different image resolutions make this problem more difficult. From the difference image in Fig. 7(c), we can observe that there is misalignment in the northwest direction.

We compare our method with SSD [23] and RC [17], and the results are shown in Fig. 7(d)-(f). It is assumed that the true transformation is a pure translation. Although we do not have the ground truth, it can be clearly observed from the difference image that our method reduces the misalignment. In contrast, SSD and RC are not able to find better alignments than the pre-registration.

Figure 7. Registration of a multispectral image and a panchromatic image. (a) The reference image. (b) The source image. (c) The difference image before registration. (d) The difference image by SSD. (e) The difference image by RC. (f) The difference image by our method. Visible misalignments are highlighted by the yellow circles. Best viewed in a ×2 sized color PDF file.

Figure 8. Registration of an aerial photograph and a digital orthophoto. From left to right, the images are: the reference image, the source image, the overlay by MATLAB, the overlay by RC, and the overlay by our method. The second row shows the zoomed-in areas of streets A and B. Best viewed in a ×2 sized color PDF file.

We register an aerial photograph to a digital orthophoto.

The reference image is the orthorectified MassGIS georegistered orthophoto [2]. The source image is a digital aerial photograph, which does not have any particular alignment or registration with respect to the earth. The input images and the results are shown in Fig. 8. MATLAB uses manually selected control points for registration, while RC and our registration are automatic. At first glance, all the methods obtain registrations of good quality. A closer look shows that our method has higher accuracy than the others. In the source image, two lanes can be clearly observed in streets A and B. After registration and composition, street B in the result by MATLAB and street A in the result by RC are blurry due to misalignment. Our method is robust to the local mismatches of vehicles.

5. Conclusion and discussion

In this paper, we have proposed a novel similarity measure for robust and accurate image registration. It is motivated by the deep sparse representation of the optimally registered images. The benefit of the proposed method is threefold: (1) compared with existing approaches, it can handle severe intensity distortions and partial occlusions simultaneously; (2) it can be used for registration of two images or a batch of images, with various types of transformations; (3) its low computational complexity makes it scalable to large datasets. We have conducted extensive experiments to test our method on multiple challenging datasets. The promising results demonstrate the robustness and accuracy of our method over the state-of-the-art batch registration and pair registration methods, respectively. We also show that our method can be used to reduce the registration errors in many real-world applications.

Due to the local linearization in the optimization, our method, as well as all the compared methods, cannot handle large transformations. However, this is not a big issue for many real-world applications. For example, remotely sensed images can be coarsely georegistered by their geographical coordinates. For images with large transformations, we can use an FFT-based algorithm [25] to coarsely register the images and then apply our method as a refinement. Therefore, we did not test the maximum amount of transformation that our method can handle. So far, the proposed method can only be used for offline registration. How to extend this method to an online mode is an interesting topic for future research.

References

[1] http://www.robots.ox.ac.uk/~vgg/research/affine/.
[2] http://www.mathworks.com/help/images/register-an-aerial-photograph-to-a-digital-orthophoto.html.
[3] P. Blanc, L. Wald, T. Ranchin, et al. Importance and effect of co-registration quality in an example of pixel to pixel fusion process. In 2nd International Conference "Fusion of Earth Data: merging point measurements, raster maps and remotely sensed images", pages 67–74, 1998.
[4] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):11, 2011.
[5] C. Chen, Y. Li, W. Liu, and J. Huang. Image fusion with local spectral consistency and dynamic gradient sparsity. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2760–2765, 2014.
[6] B. Cohen and I. Dinstein. New maximum likelihood motion estimation schemes for noisy ultrasound images. Pattern Recognition, 35(2):455–463, 2002.
[7] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. Multi-PIE. Image and Vision Computing, 28(5):807–813, 2010.
[8] V. Hamy, N. Dikaios, S. Punwani, A. Melbourne, A. Latifoltojar, J. Makanyanga, M. Chouhan, E. Helbren, A. Menys, S. Taylor, et al. Respiratory motion correction in dynamic MRI using robust data decomposition registration - application to DCE-MRI. Medical Image Analysis, 18(2):301–313, 2014.
[9] J. He, D. Zhang, L. Balzano, and T. Tao. Iterative Grassmannian optimization for robust image alignment. Image and Vision Computing, 32(10):800–813, 2014.
[10] J. Huang, X. Huang, and D. Metaxas. Simultaneous image transformation and sparse representation recovery. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2008.
[11] J. Huang, S. Zhang, H. Li, and D. Metaxas. Composite splitting algorithms for convex optimization. Computer Vision and Image Understanding, 115(12):1610–1622, 2011.
[12] J. Huang, S. Zhang, and D. Metaxas. Efficient MR image reconstruction for compressed MR imaging. Medical Image Analysis, 15(5):670–679, 2011.
[13] J. Kim and J. A. Fessler. Intensity-based image registration using robust correlation coefficients. IEEE Transactions on Medical Imaging, 23(11):1430–1444, 2004.
[14] Y. Li, C. Chen, J. Zhou, and J. Huang. Robust image registration in the gradient domain. In Proceedings of the International Symposium on Biomedical Imaging (ISBI), 2015.
[15] J. Ma, W. Qiu, J. Zhao, Y. Ma, A. Yuille, and Z. Tu. Robust L2E estimation of transformation for non-rigid registration. IEEE Transactions on Signal Processing, 63(5):1115–1129, 2015.
[16] J. Ma, J. Zhao, J. Tian, A. L. Yuille, and Z. Tu. Robust point matching via vector field consensus. IEEE Transactions on Image Processing, 23(4):1706–1721, 2014.
[17] A. Myronenko and X. Song. Intensity-based image registration by minimizing residual complexity. IEEE Transactions on Medical Imaging, 29(11):1882–1891, 2010.
[18] A. Myronenko, X. Song, and D. J. Sahn. Maximum likelihood motion estimation in 3D echocardiography through non-rigid registration in spherical coordinates. In Functional Imaging and Modeling of the Heart, pages 427–436, 2009.
[19] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2233–2246, 2012.
[20] A. Sotiras, C. Davatzikos, and N. Paragios. Deformable medical image registration: A survey. IEEE Transactions on Medical Imaging, 32(7):1153–1190, 2013.
[21] Space-Imaging. IKONOS scene po-37836. Geoeye IKONOS Scene Data, 2000.
[22] C. Studholme, C. Drapaca, B. Iordanova, and V. Cardenas. Deformation-based mapping of volume change from serial brain MRI in the presence of local tissue contrast change. IEEE Transactions on Medical Imaging, 25(5):626–639, 2006.
[23] R. Szeliski. Image alignment and stitching: A tutorial. Foundations and Trends in Computer Graphics and Vision, 2(1):1–104, 2006.
[24] C. Thomas, T. Ranchin, L. Wald, and J. Chanussot. Synthesis of multispectral images to high spatial resolution: A critical review of fusion methods based on remote sensing physics. IEEE Transactions on Geoscience and Remote Sensing, 46(5):1301–1312, 2008.
[25] G. Tzimiropoulos, V. Argyriou, S. Zafeiriou, and T. Stathaki. Robust FFT-based scale-invariant image registration with image gradients. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(10):1899–1906, 2010.
[26] P. Viola and W. M. Wells III. Alignment by maximization of mutual information. International Journal of Computer Vision, 24(2):137–154, 1997.
[27] Y. Wu, B. Shen, and H. Ling. Online robust image alignment via iterative convex optimization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1808–1814, 2012.
[28] Y. Zheng, E. Daniel, A. A. Hunter III, R. Xiao, J. Gao, H. Li, M. G. Maguire, D. H. Brainard, and J. C. Gee. Landmark matching based retinal image alignment by enforcing sparsity in correspondence matrix. Medical Image Analysis, 18(6):903–913, 2014.
[29] B. Zitova and J. Flusser. Image registration methods: a survey. Image and Vision Computing, 21(11):977–1000, 2003.
