Cross-Scale Cost Aggregation for Stereo Matching

Kang Zhang¹, Yuqiang Fang², Dongbo Min³, Lifeng Sun¹, Shiqiang Yang¹, Shuicheng Yan², Qi Tian⁴

¹ Department of Computer Science, Tsinghua University, Beijing, China
² Department of Electrical and Computer Engineering, National University of Singapore, Singapore
³ Advanced Digital Sciences Center, Singapore
⁴ Department of Computer Science, University of Texas at San Antonio, Texas, USA

https://github.com/rookiepig/CrossScaleStereo

Abstract

Human beings process stereoscopic correspondence across multiple scales. However, this bio-inspiration is ignored by state-of-the-art cost aggregation methods for dense stereo correspondence. In this paper, a generic cross-scale cost aggregation framework is proposed to allow multi-scale interaction in cost aggregation. We first reformulate cost aggregation from a unified optimization perspective and show that different cost aggregation methods essentially differ in the choices of similarity kernels. Then, an inter-scale regularizer is introduced into the optimization objective, and solving this new optimization problem leads to the proposed framework. Since the regularization term is independent of the similarity kernel, various cost aggregation methods can be integrated into the proposed general framework. We show that the cross-scale framework is important as it effectively and efficiently expands state-of-the-art cost aggregation methods and leads to significant improvements when evaluated on the Middlebury, KITTI and New Tsukuba datasets.

1. Introduction

Dense correspondence between two images is a key

Figure 1. Cross-Scale Cost Aggregation. Top-Left: enlarged view of a scan-line subsegment from the Middlebury [23] Teddy stereo pair; Top-Right: cost volumes ({C^s}_{s=0}^S) after cost computation at different scales, where the intensity + gradient cost function is adopted as in [21,33,16]. The horizontal axis x indicates different pixels along the subsegment, and the vertical axis L represents different disparity labels. A red dot indicates the disparity generated by the current cost volume, while a green dot is the ground truth; Bottom-Right: cost volumes after applying different cost aggregation methods at the finest scale (from top to bottom: NL [33], ST [16], BF [36] and GF [21]); Bottom-Left: cost volumes after integrating different methods into our cross-scale cost aggregation framework, where cost volumes at different scales are adopted for aggregation. (Best viewed in color.)

problem in computer vision [12]. Adding the constraint that the two images are a stereo pair of the same scene, the dense correspondence problem degenerates into the stereo matching problem [23]. A stereo matching algorithm generally takes four steps: cost computation, cost (support) aggregation, disparity computation and disparity refinement [23]. In cost computation, a 3D cost volume (also known as the disparity space image [23]) is generated by computing matching costs for each pixel at all possible disparity levels. In cost aggregation, the costs are aggregated, enforcing piecewise constancy of disparity, over the support region of each pixel. Then, disparity for each pixel is computed with local or global optimization methods and refined by various post-processing methods in the last two steps respectively. Among these steps, the quality of cost aggregation has a significant impact on the success of stereo algorithms. It is a key ingredient for state-of-the-art local algorithms [36,21,33,16] and a primary building block for some top-performing global algorithms [34,31]. Therefore, in this paper, we mainly concentrate on cost aggregation. Most cost aggregation methods can be viewed as joint filtering over the cost volume [21]. Actually, even simple linear image filters such as the box or Gaussian filter can be used for cost aggregation, but as isotropic diffusion filters, they tend to blur the depth boundaries [23]. Thus, a number of edge-preserving filters such as the bilateral filter [28] and


arXiv:1403.0316v1 [cs.CV] 3 Mar 2014

guided image filter [7] were introduced for cost aggregation. Yoon and Kweon [36] adopted the bilateral filter into cost aggregation, which generated appealing disparity maps on the Middlebury dataset [23]. However, their method is computationally expensive, since a large kernel size (e.g. 35×35) is typically used for the sake of high disparity accuracy. To address the computational limitation of the bilateral filter, Rhemann et al. [21] introduced the guided image filter into cost aggregation, whose computational complexity is independent of the kernel size. Recently, Yang [33] proposed a non-local cost aggregation method, which extends the kernel size to the entire image. By computing a minimum spanning tree (MST) over the image graph, the non-local cost aggregation can be performed extremely fast. Mei et al. [16] followed the non-local cost aggregation idea and showed that by enforcing the disparity consistency using a segment tree instead of an MST, better disparity maps can be achieved than [33].

All these state-of-the-art cost aggregation methods have made great contributions to stereo vision. A common property of these methods is that costs are aggregated at the finest scale of the input stereo images. However, human beings generally process stereoscopic correspondence across multiple scales [17,15,14]. According to [14], information at coarse and fine scales is processed interactively in the correspondence search of the human stereo vision system. Thus, from this bio-inspiration, it is reasonable that costs should be aggregated across multiple scales rather than only at the finest scale as done in conventional methods (Figure 1).

In this paper, a general cross-scale cost aggregation framework is proposed. Firstly, inspired by the formulation of image filters in [18], we show that various cost aggregation methods can be formulated uniformly as weighted least square (WLS) optimization problems. Then, from this unified optimization perspective, by adding a Generalized Tikhonov regularizer into the WLS optimization objective, we enforce the consistency of the cost volume among neighboring scales, i.e. inter-scale consistency. The new optimization objective with inter-scale regularization is convex and can be solved analytically. As the intra-scale consistency of the cost volume is still maintained by conventional cost aggregation methods, many of them can be integrated into our framework to generate a more robust cost volume and a better disparity map. Figure 1 shows the effect of the proposed framework. Slices of the cost volumes of four representative cost aggregation methods, including the non-local method [33] (NL), the segment tree method [16] (ST), the bilateral filter method [36] (BF) and the guided filter method [21] (GF), are visualized. We use red dots to denote disparities generated by local winner-take-all (WTA) optimization in each cost volume and green dots to denote ground truth disparities. It can be seen that more robust cost volumes and more accurate disparities are produced by adopting cross-scale cost aggregation. Extensive experiments on the Middlebury [23], KITTI [4] and New Tsukuba [20] datasets also reveal that better disparity maps can be obtained using cross-scale cost aggregation. In summary, the contributions of this paper are threefold:

• A unified WLS formulation of various cost aggregation methods from an optimization perspective.

• A novel and effective cross-scale cost aggregation framework.

• Quantitative evaluation of representative cost aggregation methods on three datasets.

The remainder of this paper is organized as follows. In Section 2, we summarize the related work. The WLS formulation for cost aggregation is given in Section 3. Our inter-scale regularization is described in Section 4. Then we detail the implementation of our framework in Section 5. Finally, experimental results and analyses are presented in Section 6 and concluding remarks are made in Section 7.

2. Related Work

Recent surveys [9,29] give sufficient comparison and analysis of various cost aggregation methods. We refer readers to these surveys for an overview of different cost aggregation methods; here we focus on stereo matching methods involving multi-scale information, which are very relevant to our idea but have substantial differences.

Early researchers of stereo vision adopted the coarse-to-fine (CTF) strategy for stereo matching [15]. A disparity map was first estimated at a coarse resolution, and the coarser disparity was used to reduce the search space when calculating finer disparity. This CTF (hierarchical) strategy has been widely used in global stereo methods such as dynamic programming [30], semi-global matching [25], and belief propagation [3,34] for the purpose of accelerating convergence and avoiding unexpected local minima. Not only global methods but also local methods adopt the CTF strategy. Unlike global stereo methods, the main purpose of adopting the CTF strategy in local stereo methods is to reduce the search space [35,11,10] or to take advantage of multi-scale related image representations [26,27]. However, there is one exception among local CTF approaches. Min and Sohn [19] modeled cost aggregation by anisotropic diffusion and solved the proposed variational model efficiently by a multi-scale approach. The motivation of their model, denoising the cost volume, is very similar to ours, but our method enforces the inter-scale consistency of cost volumes by regularization.

Overall, most CTF approaches share a similar property. They explicitly or implicitly model the disparity evolution process in the scale space [27], i.e. disparity consistency across multiple scales. Different from previous CTF methods, our method models the evolution of the cost volume in

Figure 2. The flowchart of cross-scale cost aggregation: {Ĉ^s}_{s=0}^S is obtained by utilizing a set of input cost volumes {C^s}_{s=0}^S together. Corresponding variables {i^s}_{s=0}^S, {j^s}_{s=0}^S and {l^s}_{s=0}^S are visualized. The blue arrows represent intra-scale consistency (commonly used in conventional cost aggregation approaches), while the green dashed arrows denote inter-scale consistency. (Best viewed in color.)

the scale space, i.e. cost volume consistency across multiple scales. From an optimization perspective, CTF approaches narrow down the solution space, while our method does not alter the solution space but adds inter-scale regularization into the optimization objective. Thus, incorporating a multi-scale prior by regularization is the originality of our approach. Another point worth mentioning is that local CTF approaches perform no better than state-of-the-art cost aggregation methods [10,11], while our method can significantly improve those cost aggregation methods [21,33,16].

3. Cost Aggregation as Optimization

In this section, we show that cost aggregation can be formulated as a weighted least square optimization problem. Under this formulation, different choices of similarity kernels [18] in the optimization objective lead to different cost aggregation methods.

Firstly, the cost computation step is formulated as a function f : R^{W×H×3} × R^{W×H×3} → R^{W×H×L}, where W, H are the width and height of the input images, 3 represents the color channels and L denotes the number of disparity levels. Thus, for a stereo color pair I, I′ ∈ R^{W×H×3}, by applying cost computation:

C = f(I, I′),   (1)

we can get the cost volume C ∈ R^{W×H×L}, which represents matching costs for each pixel at all possible disparity levels. For a single pixel i = (x_i, y_i), where x_i, y_i are pixel locations, its cost at disparity level l can be denoted as a scalar C(i, l). Various methods can be used to compute the cost volume. For example, the intensity + gradient cost function [21,33,16] can be formulated as:

C(i, l) = (1 − α) · min(‖I(i) − I′(i_l)‖, τ_1) + α · min(‖∇_x I(i) − ∇_x I′(i_l)‖, τ_2).   (2)

Here I(i) denotes the color vector of pixel i. ∇_x is the grayscale gradient in the x direction. i_l is the corresponding pixel of i with disparity l, i.e. i_l = (x_i − l, y_i). α balances the color and gradient terms, and τ_1, τ_2 are truncation values. The cost volume C is typically very noisy (Figure 1). Inspired by the WLS formulation of the denoising problem [18], cost aggregation can be formulated with the noisy input C as:

C̃(i, l) = argmin_z (1/Z_i) Σ_{j∈N_i} K(i, j) ‖z − C(j, l)‖²,   (3)

where N_i defines a neighboring system of i. K(i, j) is the similarity kernel [18], which measures the similarity between pixels i and j, and C̃ is the (denoised) cost volume. Z_i = Σ_{j∈N_i} K(i, j) is a normalization constant. The solution of this WLS problem is:

C̃(i, l) = (1/Z_i) Σ_{j∈N_i} K(i, j) C(j, l).   (4)

Thus, like image filters [18], a cost aggregation method corresponds to a particular instance of the similarity kernel. For example, the BF method [36] adopts the spatial and photometric distances between two pixels to measure the similarity, which is the same as the kernel function used in the bilateral filter [28]. Rhemann et al. [21] (GF) adopt the kernel defined in the guided filter [7], whose computational complexity is independent of the kernel size. The NL method [33] defines a kernel based on a geodesic distance between two pixels in a tree structure. This approach was further enhanced by making use of color segments, called the segment-tree (ST) approach [16]. A major difference between filter-based [36,21] and tree-based [33,16] aggregation approaches is the action scope of the similarity kernel, i.e. N_i in Equation (4). In filter-based methods, N_i is a local window centered at i, but in tree-based methods, N_i is the whole image. Figure 1 visualizes the effect of different action scopes. The filter-based methods hold some local similarity after the cost aggregation, while tree-based methods tend to produce hard edges between different regions in the cost volume.
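As a concrete illustration of Equation (4), the sketch below aggregates a toy cost volume with the simplest similarity kernel, K(i, j) = 1 for every j in a square window, i.e. the box-filter baseline (BOX) evaluated later in the paper. The window radius and array shapes are illustrative choices, not the paper's settings; the edge-preserving methods (BF, GF, NL, ST) would only replace this uniform kernel with their own K.

```python
import numpy as np

def box_aggregate(C, radius=3):
    """Cost aggregation as joint filtering (Equation (4)) with a uniform
    kernel: C~(i,l) is the average of C(j,l) over a (2r+1)x(2r+1) window
    N_i around i. Edge padding keeps the normalization Z_i constant."""
    H, W, L = C.shape
    out = np.empty_like(C, dtype=float)
    for l in range(L):
        padded = np.pad(C[:, :, l], radius, mode='edge')
        acc = np.zeros((H, W))
        # Sum over the neighboring system N_i.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                acc += padded[radius + dy:radius + dy + H,
                              radius + dx:radius + dx + W]
        out[:, :, l] = acc / (2 * radius + 1) ** 2   # divide by Z_i
    return out
```

A constant cost slice is left unchanged by the aggregation (the kernel weights sum to 1), while a noisy slice is smoothed, which is exactly the denoising view of Equation (3).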

Having shown that representative cost aggregation methods can be formulated within a unified framework, let us recheck the cost volume slices in Figure 1. The slice, coming from the Teddy stereo pair in the Middlebury dataset [24], consists of three typical scenarios: low-texture, high-texture and near-textureless regions (from left to right). The four state-of-the-art cost aggregation methods all perform very well in the high-texture area, but most of them fail in either the low-texture or the near-textureless region. For yielding highly accurate correspondence in those low-texture and near-textureless regions, the correspondence search should be performed at the coarse scale [17]. However, under the formulation of Equation (3), costs are always aggregated at the finest scale, making it impossible to adaptively utilize information from multiple scales. Hence, we need to reformulate the WLS optimization objective from the scale space perspective.

4. Cross-Scale Cost Aggregation Framework

It is straightforward to show that directly using Equation (3) to tackle multi-scale cost volumes is equivalent to performing cost aggregation at each scale separately. Firstly, we add a superscript s to C, denoting cost volumes at different scales of a stereo pair, as C^s, where s ∈ {0, 1, ..., S} is the scale parameter. C^0 represents the cost volume at the finest scale. The multi-scale cost volume C^s is computed using the images downsampled by a factor of η^s. Note that this approach also reduces the search range of the disparity. The multi-scale version of Equation (3) can be expressed as:

ṽ = argmin_{{z^s}_{s=0}^S} Σ_{s=0}^S (1/Z^s_{i^s}) Σ_{j^s∈N_{i^s}} K(i^s, j^s) ‖z^s − C^s(j^s, l^s)‖².   (5)

Here, Z^s_{i^s} = Σ_{j^s∈N_{i^s}} K(i^s, j^s) is a normalization constant. {i^s}_{s=0}^S and {l^s}_{s=0}^S denote a sequence of corresponding variables at each scale (Figure 2), i.e. i^{s+1} = i^s/η and l^{s+1} = l^s/η. N_{i^s} is a set of neighboring pixels at the s-th scale. In our work, the size of N_{i^s} remains the same for all scales, meaning that a larger amount of smoothing is enforced at coarser scales. We use the vector ṽ = [C̃^0(i^0, l^0), C̃^1(i^1, l^1), ..., C̃^S(i^S, l^S)]^T with S+1 components to denote the aggregated cost at each scale. The solution of Equation (5) is obtained by performing cost aggregation at each scale independently, as follows:

∀s: C̃^s(i^s, l^s) = (1/Z^s_{i^s}) Σ_{j^s∈N_{i^s}} K(i^s, j^s) C^s(j^s, l^s).   (6)

Previous CTF approaches typically reduce the disparity search space at the current scale by using a disparity map estimated from the cost volume at the coarser scale, often provoking the loss of small disparity details. Alternatively, we directly enforce inter-scale consistency on the cost volume by adding a Generalized Tikhonov regularizer into Equation (5), leading to the following optimization objective:

v̂ = argmin_{{z^s}_{s=0}^S} ( Σ_{s=0}^S (1/Z^s_{i^s}) Σ_{j^s∈N_{i^s}} K(i^s, j^s) ‖z^s − C^s(j^s, l^s)‖² + λ Σ_{s=1}^S ‖z^s − z^{s−1}‖² ),   (7)

where λ is a constant parameter controlling the strength of the regularization. Similar to ṽ, the vector v̂ = [Ĉ^0(i^0, l^0), Ĉ^1(i^1, l^1), ..., Ĉ^S(i^S, l^S)]^T also has S+1 components, denoting the cost at each scale. The above optimization problem is convex, so we can obtain the solution by finding the stationary point of the optimization objective. Let F({z^s}_{s=0}^S) represent the optimization objective in Equation (7). For s ∈ {1, 2, ..., S−1}, the partial derivative of F with respect to z^s is:

∂F/∂z^s = (2/Z^s_{i^s}) Σ_{j^s∈N_{i^s}} K(i^s, j^s)(z^s − C^s(j^s, l^s)) + 2λ(z^s − z^{s−1}) − 2λ(z^{s+1} − z^s)
        = 2(−λ z^{s−1} + (1 + 2λ) z^s − λ z^{s+1} − C̃^s(i^s, l^s)).   (8)

Setting ∂F/∂z^s = 0, we get:

−λ z^{s−1} + (1 + 2λ) z^s − λ z^{s+1} = C̃^s(i^s, l^s).   (9)

It is easy to obtain similar equations for s = 0 and s = S. Thus, we have S+1 linear equations in total, which can be expressed concisely as:

A v̂ = ṽ.   (10)

The matrix A is an (S+1)×(S+1) tridiagonal constant matrix, which can be easily derived from Equation (9). A is strictly diagonally dominant (each diagonal entry, 1+2λ in the interior rows and 1+λ in the boundary rows, exceeds the sum of the off-diagonal magnitudes in its row), so its inverse always exists. Thus,

v̂ = A^{−1} ṽ.   (11)
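The structure of A is easy to make concrete. The sketch below (Python/NumPy for illustration; the experiments in this paper use C++ implementations) assembles A for a given S and λ from Equation (9) and its boundary cases, and checks Equation (11). The boundary diagonal 1+λ follows from dropping the missing neighbor term at s = 0 and s = S.

```python
import numpy as np

def build_A(S, lam):
    """Tridiagonal matrix of Equation (9): interior row s encodes
    -lam*z^{s-1} + (1+2*lam)*z^s - lam*z^{s+1} = C~^s; the boundary
    rows (s = 0 and s = S) lose one neighbor term, so their diagonal
    entry is 1+lam instead of 1+2*lam."""
    n = S + 1
    A = (np.diag(np.full(n, 1.0 + 2.0 * lam))
         + np.diag(np.full(n - 1, -lam), k=1)
         + np.diag(np.full(n - 1, -lam), k=-1))
    A[0, 0] = A[-1, -1] = 1.0 + lam
    return A

A = build_A(S=4, lam=0.3)            # five scales, Middlebury setting
v_tilde = np.random.rand(5)          # per-scale aggregated costs (Eq. (6))
v_hat = np.linalg.solve(A, v_tilde)  # Equation (11): v^ = A^{-1} v~
# Each row of A sums to 1, so a cost that is constant across scales
# is a fixed point of the inter-scale regularization.
assert np.allclose(A @ np.ones(5), np.ones(5))
assert np.allclose(A @ v_hat, v_tilde)
```

Solving the tridiagonal system with `np.linalg.solve` stands in for multiplying by the pre-computed A⁻¹; both give the same v̂.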

The final cost volume is obtained through the adaptive combination of the results of cost aggregation performed at different scales. Such adaptive combination enables multi-scale interaction of the cost aggregation in the context of optimization.

Finally, we use an example to show the effect of inter-scale regularization in Figure 3. In this example, without cross-scale cost aggregation, there are similar local minima in the cost vector, yielding an erroneous disparity. Information from the finest scale alone is not enough, but when inter-scale regularization is adopted, useful information from coarse scales reshapes the cost vector, generating a disparity closer to the ground truth.

5. Implementation and Complexity

To build cost volumes for different scales (Figure 2), we need to extract stereo image pairs at different scales. In our implementation, we choose the Gaussian pyramid [2], which is a classical representation in scale space theory. The Gaussian pyramid is obtained by successive smoothing and subsampling (η = 2). One advantage of this representation is that the image size decreases exponentially as the scale level increases, which reduces the computational cost of cost aggregation at coarser scales exponentially.
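A minimal sketch of this pyramid construction: successive smoothing followed by subsampling with η = 2. The 5-tap binomial kernel [1,4,6,4,1]/16, a classical approximation of a Gaussian, is an illustrative assumption; any Gaussian-like low-pass filter serves.

```python
import numpy as np

def gaussian_pyramid(img, S):
    """Build S+1 pyramid levels of a grayscale image: level 0 is the
    input; each next level is smoothed (separable 5-tap binomial kernel,
    edge-padded) and subsampled by eta = 2."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    levels = [img.astype(float)]
    for _ in range(S):
        cur = levels[-1]
        # Horizontal pass: convolve each row with k.
        pad = np.pad(cur, 2, mode='edge')
        rows = sum(k[t] * pad[2:-2, t:t + cur.shape[1]] for t in range(5))
        # Vertical pass, then keep every second row and column.
        padc = np.pad(rows, ((2, 2), (0, 0)), mode='edge')
        smooth = sum(k[t] * padc[t:t + cur.shape[0], :] for t in range(5))
        levels.append(smooth[::2, ::2])
    return levels
```

Since the kernel weights sum to 1, a constant image stays constant at every level, and each level has roughly half the width and height of the previous one.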

Figure 3. The effect of inter-scale regularization. On the right side, we visualize two cost vectors of a single pixel (pixel location (295, 49)) of the Teddy stereo pair; the horizontal axis is the disparity level. The blue line denotes the cost vector computed by the NL [33] method. The green line is the cost vector after applying cross-scale cost aggregation (S+NL). The red cross represents the minimal cost location for each cost vector and the vertical dashed line denotes the ground truth disparity. On the left side, image and disparity patches centered on this pixel are shown. (Best viewed in color.)

Algorithm 1: Cross-Scale Cost Aggregation
Input: stereo color images I, I′.
1. Build Gaussian pyramids I^s, I′^s, s ∈ {0, 1, ..., S}.
2. Generate the initial cost volume C^s for each scale by cost computation according to Equation (1).
3. Aggregate costs at each scale separately according to Equation (6) to get the cost volume C̃^s.
4. Aggregate costs across multiple scales according to Equation (11) to get the final cost volume Ĉ^s.
Output: robust cost volume Ĉ^0.

The basic workflow of the cross-scale cost aggregation

is shown in Algorithm 1, where any existing cost aggregation method can be utilized in Step 3. The computational complexity of our algorithm increases only by a small constant factor compared to conventional cost aggregation methods. Specifically, let us denote the computational complexity of conventional cost aggregation methods as O(mWHL), where m differs between cost aggregation methods. The number of pixels and the number of disparities at scale s are WH/4^s and L/2^s respectively. Thus the computational complexity of Step 3 increases at most by 1/7 compared to conventional cost aggregation methods, as explained below:

Σ_{s=0}^S mWHL/8^s ≤ lim_{S→∞} Σ_{s=0}^S mWHL/8^s = (8/7) mWHL.   (12)
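The bound in Equation (12) is a plain geometric series; a quick numeric check with the S = 4 used in our experiments confirms that the extra work stays below the 1/7 limit:

```python
# Per-scale work relative to single-scale aggregation is (1/8)^s at
# scale s, since pixels shrink by 4^s and disparities by 2^s (Eq. (12)).
S = 4
ratio = sum((1.0 / 8.0) ** s for s in range(S + 1))
assert ratio < 8.0 / 7.0   # partial sums never exceed the limit 8/7
print(round(ratio, 6))     # prints 1.142822, i.e. under 1/7 extra work
```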

Step 4 involves the inversion of the matrix A with a size of (S+1)×(S+1), but A is a spatially invariant matrix with each row consisting of at most three nonzero elements, and thus its inverse can be pre-computed. Also, in Equation (11), only the cost volume at the finest scale, Ĉ^0(i^0, l^0), is used to yield the final disparity map, and thus we need to compute only

Ĉ^0(i^0, l^0) = Σ_{s=0}^S A^{−1}(0, s) C̃^s(i^s, l^s),   (13)

not the full v̂ = A^{−1} ṽ. This cost aggregation across multiple scales requires only a small amount of extra computational load. In the following section, we will analyze the runtime efficiency of our method in more detail.

6. Experimental Results and Analysis

In this section, we use the Middlebury [23], KITTI [4] and New Tsukuba [20] datasets to validate that integrating state-of-the-art cost aggregation methods, such as BF [36], GF [21], NL [33] and ST [16], into our framework yields significant performance improvements. Furthermore, we also implement the simple box filter aggregation method (named BOX, window size 7×7) to serve as a baseline, which also becomes very powerful when integrated into our framework. For NL and ST, we directly use the C++ code provided by the authors¹,², and thus all the parameter settings are identical to those used in their implementations. For GF, we implemented our own C++ code by referring to the author-provided software (implemented in MATLAB³) in order to process high-resolution images from the KITTI and New Tsukuba datasets efficiently. For BF, we implemented the asymmetric version as suggested by [9]. The local WTA strategy is adopted to generate a disparity map. In order to compare different cost aggregation methods fairly, no disparity refinement technique is employed unless explicitly stated. S is set to 4, i.e. five scales in total are used in our framework. For the regularization parameter λ, we set it to 0.3 for the Middlebury dataset, while setting it to 1.0 on the KITTI and New Tsukuba datasets for more regularization, considering that these two datasets contain a large portion of textureless regions.

6.1. Middlebury Dataset

The Middlebury benchmark [24] is a de facto standard for comparing existing stereo matching algorithms. In the benchmark [24], four stereo pairs (Tsukuba, Venus, Teddy, Cones) are used to rank more than 100 stereo matching algorithms. In our experiment, we adopt these four stereo pairs. In addition, we use the 'Middlebury 2005' [22] (6 stereo pairs) and 'Middlebury 2006' [8] (21 stereo pairs) datasets, which involve more complex scenes. Thus, we have 31 stereo pairs in total, denoted as M31. It is worth mentioning that during our experiments, all local cost aggregation methods perform rather badly (the error rate of the non-occlusion (non-occ) area is more than 20%) on 4 stereo pairs from the Middlebury 2006 dataset, i.e. Midd1, Midd2, Monopoly and Plastic. A common property of these 4 stereo pairs is that they all contain large textureless regions, making local stereo methods fragile. In order to alleviate bias towards these four stereo pairs, we exclude them from M31 to generate another collection of stereo pairs, which we call M27. We report statistics on both M31 and M27 (Table 1). We adopt the intensity + gradient cost function in Equation (2), which is widely used in state-of-the-art cost aggregation methods [21,16,33].

¹ …hk/~qiyang/publications/cvpr-12/code/
² …/resource/page/segment-tree.html
³ https://www.ims.tuwien.ac.at/publications/tuw-202088

Figure 4. Disparity maps of Teddy for all cost aggregation methods (with no disparity refinement techniques). The non-occ error rate is shown in each subtitle: (a) BOX (14.23%), (b) NL (8.60%), (c) ST (9.78%), (d) BF (10.24%), (e) GF (8.25%), (f) S+BOX (11.18%), (g) S+NL (5.74%), (h) S+ST (6.22%), (i) S+BF (8.17%), (j) S+GF (6.99%). Red pixels indicate erroneous pixels, where the absolute disparity error is larger than 1. (Best viewed in color.)

In Table 1, we show the average error rates of the non-occ region for different cost aggregation methods on both the M31 and M27 datasets. We use the prefix 'S+' to denote the integration of existing cost aggregation methods into the cross-scale cost aggregation framework. Avg Non-occ is the average percentage of bad matching pixels in non-occ regions, where the absolute disparity error is larger than 1. The results are encouraging: all cost aggregation methods see an improvement when using cross-scale cost aggregation, and even the simple BOX method becomes very powerful (comparable to the state of the art on M27) when using cross-scale cost aggregation. Disparity maps of the Teddy stereo pair for all these methods are shown in Figure 4, while the others are shown in the supplementary material due to space limits.

Furthermore, to follow the standard evaluation metric of the Middlebury benchmark [24], we show each cost aggregation method's rank on the website (as of October 2013) in Table 1. Avg Rank and Avg Err indicate the average rank and error rate measured using the Tsukuba, Venus, Teddy and Cones images [24]. Here each method is combined with the state-of-the-art disparity refinement technique from [33] (for ST [16], we list its original rank reported in the Middlebury benchmark [24], since the same results were not reproduced using the authors' C++ code). The rank also validates the effectiveness of our framework.

Method   | Non-occ M31 (%) | Non-occ M27 (%) | Avg Rank | Avg Err (%) | Time (s)
BOX      | 15.45           | 10.7            | 59.6     | 6.2         | 0.11
S+BOX    | 13.09           | 8.55            | 51.9     | 5.93        | 0.15
NL [33]  | 12.22           | 9.44            | 41.2     | 5.48        | 0.29
S+NL     | 11.49           | 8.73            | 39.4     | 5.2         | 0.37
ST [16]  | 11.52           | 8.95            | 31.6     | 5.35        | 0.2
S+ST     | 10.51           | 8.07            | 27.9     | 4.97        | 0.29
BF [36]  | 12.26           | 8.77            | 48.1     | 5.89        | 60.53
S+BF     | 10.95           | 8.04            | 40.7     | 5.56        | 70.62
GF [21]  | 10.5            | 6.84            | 40.5     | 5.64        | 1.16
S+GF     | 9.39            | 6.20            | 37.7     | 5.51        | 1.32

Table 1. Quantitative evaluation of cost aggregation methods on the Middlebury dataset. The prefix 'S+' denotes our cross-scale cost aggregation framework. For the rank part (columns 4 and 5), the disparity results were refined with the same disparity refinement technique [33].

We also report the running time for the Tsukuba stereo pair on a PC with a 2.83 GHz CPU and 8 GB of memory. As mentioned before, the computational overhead is relatively small. To be specific, it consists of the cost aggregation of C̃^s (s ∈ {0, 1, ..., S}) and the computation of Equation (13).

6.2. KITTI Dataset

The KITTI dataset [4] contains 194 training image pairs and 195 test image pairs for evaluating stereo matching algorithms. For the KITTI dataset, image pairs are captured under real-world illumination conditions and almost all image pairs have a large portion of textureless regions, e.g. walls and roads [4]. In our experiment, we use all 194 training image pairs, for which ground truth disparity maps are available. The evaluation metric is the same as the KITTI benchmark [5], with an error threshold of 3. Besides,

Method   | Out-Noc | Out-All | Avg-Noc  | Avg-All
BOX      | 22.51%  | 24.28%  | 12.18 px | 12.95 px
S+BOX    | 12.06%  | 14.07%  | 3.54 px  | 4.57 px
NL [33]  | 24.69%  | 26.38%  | 4.36 px  | 5.54 px
S+NL     | 25.41%  | 27.08%  | 4.00 px  | 5.20 px
ST [16]  | 24.09%  | 25.81%  | 4.31 px  | 5.47 px
S+ST     | 24.51%  | 26.22%  | 3.82 px  | 5.02 px
GF [21]  | 12.50%  | 14.51%  | 4.64 px  | 5.69 px
S+GF     | 9.66%   | 11.73%  | 2.19 px  | 3.36 px

Table 2. Quantitative comparison of cost aggregation methods on the KITTI dataset. Out-Noc: percentage of erroneous pixels in non-occluded areas; Out-All: percentage of erroneous pixels in total; Avg-Noc: average disparity error in non-occluded areas; Avg-All: average disparity error in total.

Method   | Out-Noc | Out-All | Avg-Noc | Avg-All
BOX      | 31.08%  | 37.70%  | 7.37 px | 10.72 px
S+BOX    | 18.82%  | 26.50%  | 3.92 px | 7.44 px
NL [33]  | 21.88%  | 26.72%  | 4.12 px | 6.40 px
S+NL     | 19.84%  | 24.50%  | 3.65 px | 5.73 px
ST [16]  | 21.68%  | 27.07%  | 4.33 px | 7.02 px
S+ST     | 18.99%  | 24.16%  | 3.60 px | 5.96 px
GF [21]  | 23.42%  | 30.34%  | 6.35 px | 9.86 px
S+GF     | 14.40%  | 21.78%  | 3.10 px | 6.38 px

Table 3. Quantitative comparison of cost aggregation methods on the New Tsukuba dataset.

since BF is too slow for high-resolution images (requiring more than one hour to process one stereo pair), we omit BF from the evaluation.

Considering the illumination variation in the KITTI dataset, we adopt the Census Transform [37], which has proved to be powerful for robust optical flow computation [6]. We show the performance of different methods when integrated into cross-scale cost aggregation in Table 2. Some interesting points are worth noting. Firstly, for BOX and GF, there are significant improvements when using cross-scale cost aggregation. Again, as on the Middlebury dataset, the simple BOX method becomes very powerful with cross-scale cost aggregation. However, for S+NL and S+ST, performance is almost the same as without cross-scale cost aggregation, and even worse than that of S+BOX. This may be due to the non-local property of tree-based cost aggregation methods. For textureless slanted planes, e.g. roads, tree-based methods tend to overuse the piecewise constancy assumption and may generate erroneous fronto-parallel planes. Thus, even though cross-scale cost aggregation is adopted, errors in textureless slanted planes are not fully addressed. Disparity maps for all methods are presented in the supplementary material, which also validates our analysis.
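The Census Transform encodes each pixel by the signs of comparisons against its neighbors, which makes it invariant to any monotonically increasing illumination change, the property that matters on KITTI. Below is a sketch of a 3×3 census on grayscale images; the window size and the bit-packing order are illustrative choices, not the paper's settings. The matching cost between two census images is then the Hamming distance of the bit strings.

```python
import numpy as np

def census_transform(img, r=1):
    """Census transform with a (2r+1)x(2r+1) window: each pixel becomes
    a bit string recording, for every neighbor, whether that neighbor
    is darker than the center pixel."""
    H, W = img.shape
    pad = np.pad(img, r, mode='edge')
    bits = np.zeros((H, W), dtype=np.int64)
    b = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = pad[r + dy:r + dy + H, r + dx:r + dx + W]
            bits |= (neigh < img).astype(np.int64) << b
            b += 1
    return bits

def census_cost(c1, c2):
    """Matching cost: Hamming distance between census codes
    (popcount of the XOR)."""
    x = c1 ^ c2
    d = np.zeros_like(x)
    while np.any(x):
        d += x & 1
        x >>= 1
    return d
```

Because only the ordering of intensities matters, `census_transform(img)` and `census_transform(g(img))` coincide for any strictly increasing g, e.g. a camera gain change.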

Figure 5. The effect of varying the inter-scale regularization parameter for different methods (S+BOX, S+NL, S+ST, S+BF, S+GF). The vertical axis is the average non-occ error (%) and the horizontal axis is the regularization parameter.

6.3. New Tsukuba Dataset

The New Tsukuba dataset [20] contains 1800 stereo pairs with ground truth disparity maps. These pairs form a one-minute photorealistic stereo video, generated by moving a stereo camera through a computer-generated 3D scene. Besides, there are 4 different illumination conditions: Daylight, Fluorescent, Lamps and Flashlight. In our experiments, we use the Daylight scene, which has a challenging, realistic illumination condition [20]. Since neighboring frames usually share similar scenes, we sample the 1800 frames once per second to get a subset of 60 stereo pairs, which saves evaluation time. We test both the intensity + gradient and Census Transform cost functions; the intensity + gradient cost function gives better results on this dataset. The disparity range of this dataset is the same as that of the KITTI dataset, i.e. 256 disparity levels, making BF [36] too slow, so we omit BF from the evaluation.

Table 3 shows evaluation results for different cost aggregation methods on the New Tsukuba dataset. We use the same evaluation metric as the KITTI benchmark [5] (the error threshold is 3). Again, all cost aggregation methods see an improvement when using cross-scale cost aggregation.

6.4. Regularization Parameter Study

The key parameter in Equation (7) is the regularization parameter λ. By adjusting this parameter, we can control the strength of inter-scale regularization, as shown in Figure 5. The error rate is evaluated on M31. When λ is set to 0, inter-scale regularization is disabled, which is equivalent to performing cost aggregation at the finest scale. When regularization is introduced, there are improvements for all methods. As λ becomes large, the regularization term dominates the optimization, causing the cost volumes of all scales to become nearly identical. As a result, fine details of the disparity maps are lost and the error rate increases. One may note that choosing a different λ for each cost aggregation method would generate better results, though we use a consistent λ for all methods.

7. Conclusions and Future Work

In this paper, we have proposed a cross-scale cost aggregation framework for stereo matching. This paper is not intended to present a completely new cost aggregation method that yields a highly accurate disparity map. Rather, we investigate the scale-space behavior of various cost aggregation methods. Extensive experiments on three datasets validated the effectiveness of cross-scale cost aggregation: almost all methods saw improvements, and even the simple box filtering method combined with our framework achieved very good performance.

Recently, a new trend in stereo vision is to solve the correspondence problem in a continuous plane-parameter space rather than in a discrete disparity-label space [1, 13, 32]. These methods handle slanted planes very well, and one probable future direction is to investigate the scale-space behavior of these methods.

8. Acknowledgement

This work was supported by XXXXXXXXXXXX.

References

[1] M. Bleyer, C. Rhemann, and C. Rother. PatchMatch stereo - stereo matching with slanted support windows. In BMVC, 2011.
[2] P. J. Burt. Fast filter transform for image processing. CGIP, 1981.
[3] P. Felzenszwalb and D. Huttenlocher. Efficient belief propagation for early vision. In CVPR, 2004.
[4] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012.
[5] A. Geiger, P. Lenz, and R. Urtasun. The KITTI Vision Benchmark Suite. http://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo, 2012.
[6] D. Hafner, O. Demetz, and J. Weickert. Why is the census transform good for robust optic flow computation? In SSVM, 2013.
[7] K. He, J. Sun, and X. Tang. Guided image filtering. In ECCV, 2010.
[8] H. Hirschmuller and D. Scharstein. Evaluation of cost functions for stereo matching. In CVPR, 2007.
[9] A. Hosni, M. Bleyer, and M. Gelautz. Secrets of adaptive support weight techniques for local stereo matching. CVIU, 2013.
[10] W. Hu, K. Zhang, L. Sun, and S. Yang. Comparisons reducing for local stereo matching using hierarchical structure. In ICME, 2013.
[11] Y.-H. Jen, E. Dunn, P. Fite-Georgel, and J.-M. Frahm. Adaptive scale selection for hierarchical stereo. In BMVC, 2011.
[12] C. Liu, J. Yuen, and A. Torralba. SIFT flow: dense correspondence across scenes and its applications. TPAMI, 2011.
[13] J. Lu, H. Yang, D. Min, and M. N. Do. Patch match filter: efficient edge-aware filtering meets randomized search for fast correspondence field estimation. In CVPR, 2013.
[14] H. A. Mallot, S. Gillner, and P. A. Arndt. Is correspondence search in human stereo vision a coarse-to-fine process? Biological Cybernetics, 1996.
[15] D. Marr and T. Poggio. A computational theory of human stereo vision. Proceedings of the Royal Society of London. Series B. Biological Sciences, 1979.
[16] X. Mei, X. Sun, W. Dong, H. Wang, and X. Zhang. Segment-tree based cost aggregation for stereo matching. In CVPR, 2013.
[17] M. D. Menz and R. D. Freeman. Stereoscopic depth processing in the visual cortex: a coarse-to-fine mechanism. Nature Neuroscience, 2003.
[18] P. Milanfar. A tour of modern image filtering: new insights and methods, both practical and theoretical. IEEE Signal Processing Magazine, 2013.
[19] D. Min and K. Sohn. Cost aggregation and occlusion handling with WLS in stereo matching. TIP, 2008.
[20] M. Peris, A. Maki, S. Martull, Y. Ohkawa, and K. Fukui. Towards a simulation driven stereo vision system. In ICPR, 2012.
[21] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz. Fast cost-volume filtering for visual correspondence and beyond. In CVPR, 2011.
[22] D. Scharstein and C. Pal. Learning conditional random fields for stereo. In CVPR, 2007.
[23] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV, 2002.
[24] D. Scharstein and R. Szeliski. Middlebury Stereo Vision Website. http://vision.middlebury.edu/stereo/, 2002.
[25] H. Simon and K. Reinhard. Evaluation of a new coarse-to-fine strategy for fast semi-global stereo matching. Advances in Image and Video Technology, 2012.
[26] M. Sizintsev. Hierarchical stereo with thin structures and transparency. In CRV, 2008.
[27] L. Tang, M. K. Garvin, K. Lee, W. L. M. Alward, Y. H. Kwon, and M. D. Abràmoff. Robust multiscale stereo matching from fundus images with radiometric differences. TPAMI, 2011.
[28] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In ICCV, 1998.
[29] F. Tombari, S. Mattoccia, L. Di Stefano, and E. Addimanda. Classification and evaluation of cost aggregation methods for stereo correspondence. In CVPR, 2008.
[30] G. Van Meerbergen, M. Vergauwen, M. Pollefeys, and L. Van Gool. A hierarchical symmetric stereo algorithm using dynamic programming. IJCV, 2002.
[31] Z.-F. Wang and Z.-G. Zheng. A region based stereo matching algorithm using cooperative optimization. In CVPR, 2008.
[32] K. Yamaguchi, D. McAllester, and R. Urtasun. Robust monocular epipolar flow estimation. In CVPR, 2013.
[33] Q. Yang. A non-local cost aggregation method for stereo matching. In CVPR, 2012.
[34] Q. Yang, L. Wang, R. Yang, H. Stewénius, and D. Nistér. Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling. TPAMI, 2009.
[35] R. Yang and M. Pollefeys. Multi-resolution real-time stereo on commodity graphics hardware. In CVPR, 2003.
[36] K.-J. Yoon and I. S. Kweon. Adaptive support-weight approach for correspondence search. TPAMI, 2006.
[37] R. Zabih and J. Woodfill. Non-parametric local transforms for computing visual correspondence. In ECCV, 1994.