Towards Unified Depth and Semantic Prediction from a Single Image
Peng Wang1  Xiaohui Shen2  Zhe Lin2  Scott Cohen2  Brian Price2  Alan Yuille1
1University of California, Los Angeles   2Adobe Research

Abstract

Depth estimation and semantic segmentation are two fundamental problems in image understanding. While the two tasks are strongly correlated and mutually beneficial, they are usually solved separately or sequentially. Motivated by the complementary properties of the two tasks, we propose a unified framework for joint depth and semantic prediction. Given an image, we first use a trained Convolutional Neural Network (CNN) to jointly predict a global layout composed of pixel-wise depth values and semantic labels. By allowing for interactions between the depth and semantic information, the joint network provides more accurate depth prediction than a state-of-the-art CNN trained solely for depth prediction [6]. To further obtain fine-level details, the image is decomposed into local segments for region-level depth and semantic prediction under the guidance of the global layout. Utilizing the pixel-wise global prediction and region-wise local prediction, we formulate the inference problem in a two-layer Hierarchical Conditional Random Field (HCRF) to produce the final depth and semantic map. As demonstrated in the experiments, our approach effectively leverages the advantages of both tasks and provides state-of-the-art results.

1. Introduction

Depth estimation and semantic segmentation from a single image are two fundamental yet challenging tasks in computer vision. While they address different aspects of scene understanding, there exist strong consistencies among the semantic and geometric properties of image regions. When the information from one task is available, it provides valuable prior knowledge to guide the other one.

In the depth estimation literature, semantic information has long been used as a high-level guidance [14, 15, 23, 11, 29]. Certain semantic classes have strong geometric implications. For example, the ground is usually a horizontal plane in a canonical view, while building facades are mostly vertical surfaces [14]. However, these approaches either assume the semantic labels are known [29], or perform semantic segmentation to generate the semantic labels [23]. Since the two tasks are performed sequentially, the errors in the predicted semantic labels are inevitably propagated to the depth results. On the other hand, in semantic segmentation, with the increasing availability of RGBD data from additional depth sensors, many methods use depth as another channel to regularize the segmentation [28, 31, 12] and have achieved much better performance than using RGB images alone.

Since the two tasks are mutually beneficial, extensive investigations have been done towards jointly solving them in videos [2, 8, 19, 34], in which 3D information can be easily obtained through structure from motion. However, the efforts in jointly tackling the two problems from a single image are preliminary [21], mostly because the inference of both tasks is more ill-posed in a single image. It is not trivial to formulate the joint inference problem so that the two tasks can benefit each other. This paper is another step in this direction. Unlike previous approaches [21], in which the consistency between the semantic and geometric properties is limited to local segments or objects, we propose a unified framework that incorporates both global context from the whole image and local predictions from regions, through which the consistency between depth and semantic information is automatically learned via joint training.

Fig. 1 illustrates the framework of our approach. We formulate the joint inference problem in a two-layer Hierarchical Conditional Random Field (HCRF). The unary potentials in the bottom layer are pixel-wise depth values and semantic labels, which are predicted by a Convolutional Neural Network (CNN) trained globally on the whole image, while the unary potentials in the upper layer are region-wise depth and semantic maps, which come from another CNN-based regressor trained on local regions. The output of the global CNN, though coarse, provides very accurate global scale and semantic guidance, while the local regressor gives more details in depth and semantic boundaries. The mutual interactions between depth and semantic information are captured through the joint training of the CNNs, and are further enforced in the joint inference of the HCRF.

We evaluated our method on the NYU v2 dataset [31] on both depth estimation and semantic segmentation. By inference using our joint global CNN, the depth prediction improves over the depth-only CNN by an average 8% relative gain, and also outperforms the state-of-the-art. After incorporating local predictions, the final depth maps produced by the HCRF are significantly improved in terms of visual quality, with much clearer structures and boundaries. Meanwhile, in semantic segmentation, we further show that our joint approach outperforms R-CNN [10], currently known to be the most effective method for semantic segmentation, by 10% relatively in average IOU.

Figure 1. Framework of our approach for joint depth and semantic prediction. As described in Sec. 1, given an image, we obtain region-wise and pixel-wise potentials from a regional and a global CNN respectively. The final results are jointly inferred through the Hierarchical CRF. We keep the color legend consistent throughout the paper.

To sum up, the contribution of this paper is three-fold:

1. We propose a unified framework for joint depth and semantic prediction from a single image. The consistency of the two tasks is learned through joint training, and enforced in different stages throughout the framework to boost the performance of both tasks.

2. We formulate the problem in a two-layer HCRF to enforce synergy between global and local predictions, where the global layouts are used to guide the local predictions and reduce local ambiguities, while the local results provide detailed region structures and boundaries.

3. Through extensive evaluation, we demonstrate that jointly addressing the two problems in our framework benefits both tasks, and achieves the state-of-the-art.

1.1. Related work

The literature on depth estimation and semantic segmentation is very rich when considering them as two independent tasks. Interestingly, though developed separately, the techniques used to solve the two tasks are quite similar. MRF-based approaches are common choices in semantic segmentation [1, 36], while they have also been explored in depth prediction [30, 14]. Data-driven approaches based on non-parametric transfer are another popular trend in both scene parsing [5, 32, 33, 35] and depth estimation [17, 24]. Recently, CNNs have shown their effectiveness in both tasks. In [6], a two-level CNN is learned to directly predict the depth maps, which significantly outperforms the previous state-of-the-arts. Similar progress has also been achieved in semantic segmentation [3, 7, 10, 4]. Inspired by these works, we also use CNNs to train our model for joint global and local prediction.

Noticing the correlations between the two problems, some methods try to use the information from one task to regularize the other. Nevertheless, the interaction between the depth and semantic information is mostly a one-way channel in previous work. Several methods try to get better semantic segmentation results given RGB-D data [28, 31, 12, 13], while others take the predicted semantic labels to estimate depth [23, 15]. However, in order to solve one problem, these methods rely on either ground-truth data or an independent solution to the other problem. Their results are therefore heavily limited by the availability of the ground-truth data or the quality of the previous step.

While promising, the joint inference of these two tasks to enforce consistency between them is an under-explored direction in the literature. In [11], the consistency between the geometric and semantic properties of segments is built, in which each semantic segment is also predicted to be one of three geometric classes: horizontal, vertical, and sky. However, such a geometric classification is still too coarse to produce an accurate depth map, and too loose to constrain the semantic prediction. Moreover, the consistency between the two components is limited to local regions. Ladicky et al. [21] jointly train a canonical classifier considering both the loss from semantic and depth labels of the objects. However, they use local regions with hand-crafted features for prediction, which is only able to generate very coarse depth and semantic maps, with many local prediction distortions over large backgrounds. Unlike these methods, we capture the mutual information through joint training in a unified framework, which captures more synergy between semantic and depth prediction. In addition, through a global-to-local strategy, we achieve long-range context to generate globally reasonable results while maintaining segment boundary information. Finally, our trained CNNs provide robust estimation under the large appearance variation of images and segments. As a result, our model achieves better results both quantitatively and qualitatively.

2. Formulation

As shown in previous image segmentation work [1, 20], semantic inference should consider both short-range pixel-wise interactions and high-order context. Similarly, the consistency in depth and semantic prediction should also be enforced both globally and locally. To this end, instead of a standard pixel-wise Conditional Random Field, we propose a two-layer Hierarchical Conditional Random Field (HCRF) [1, 20] to formulate the joint depth and semantic prediction problem.

As shown in Fig. 1, our HCRF is composed of two layers of nodes and edges. In the bottom layer, the nodes are the pixels in the image I. For each pixel i ∈ I, we would like to predict its depth value d_i and semantic label l_i. We use x_i = {d_i, l_i} to denote the inference output at pixel i. Meanwhile, in the upper layer, we decompose image I into local segments, and use the segments to represent the nodes. Similarly, we would like to infer the depth and semantic labels y_s = {d_s, l_s} for each segment s ∈ S, where S denotes the set of segments after decomposition. We use R_s to denote all the pixels inside segment s, and use X_s to denote the predicted labels of R_s. Accordingly, there are three kinds of edges in the HCRF: the pair-wise edges between neighboring pixels, the edges between neighboring segments, and the edges connecting R_s and s. Given such a model, the energy for minimization is formulated as:

$$\min_X \sum_{i\in I}\psi_i(x_i) + \lambda_{ie}\sum_{i,j\in I}\psi_{i,j}(x_i,x_j) + \lambda_y \min_Y \Big[\sum_{s\in S}\psi_s(X_s,y_s) + \lambda_{ce}\sum_{s,t\in S}\psi_{s,t}(y_s,y_t)\Big], \quad (1)$$

where ψ_i(x_i) is the pixel-level unary potential in the bottom layer, ψ_{i,j}(x_i, x_j) is the pair-wise edge potential between pixels, and ψ_{s,t}(y_s, y_t) is the edge potential between segments in the upper layer. λ_ie, λ_ce and λ_y are balancing parameters. In addition, the cross-layer potential term ψ_s(X_s, y_s) can be further decomposed as:

$$\psi_s(X_s,y_s) = \phi_s(y_s) + \sum_{i\in R_s}\phi_s(y_s,x_i), \quad (2)$$

where φ_s(y_s) is the unary potential of segments in the upper layer, and φ_s(y_s, x_i) is the edge potential between segment s and the pixel i inside segment s.
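To make the structure of Eqns. (1) and (2) concrete, the following minimal Python sketch assembles the total HCRF energy from precomputed potential tables; the dictionary-based interface and variable names are our own illustration, not the paper's implementation.

```python
def hcrf_energy(unary_pix, pairs_pix, unary_seg, pairs_seg, cross,
                lam_ie, lam_y, lam_ce):
    # Eq. (1) with the cross-layer term expanded as in Eq. (2):
    #   unary_pix[i]      = psi_i(x_i)        pixel-level unary
    #   pairs_pix[(i,j)]  = psi_ij(x_i, x_j)  neighboring-pixel edges
    #   unary_seg[s]      = phi_s(y_s)        segment-level unary
    #   cross[(s,i)]      = phi_s(y_s, x_i)   segment-to-pixel edges
    #   pairs_seg[(s,t)]  = psi_st(y_s, y_t)  neighboring-segment edges
    pixel_layer = sum(unary_pix.values()) + lam_ie * sum(pairs_pix.values())
    segment_layer = (sum(unary_seg.values()) + sum(cross.values())
                     + lam_ce * sum(pairs_seg.values()))
    return pixel_layer + lam_y * segment_layer
```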

In our model, the potential terms introduced in Eqn. (1) and Eqn. (2) are defined as follows:

Unary potentials. As illustrated in Fig. 1, the pixel-level potential ψ_i(x_i) is provided by a CNN trained globally on the whole image, which jointly predicts pixel-wise depth values and probabilities of semantic labels. The details of the global CNN training and prediction will be introduced in Section 3. Similarly, the segment-level potential φ_s(y_s) in Eqn. (2) is generated by a CNN-based regressor trained on local regions, with details described in Section 4.

Edge potentials. For pixel-wise edge potentials, we only consider neighboring pixels, and define

$$\psi_{i,j}(x_i,x_j) = \mathbf{1}\{l_i \neq l_j\}\big(\exp(-\mathrm{edge}(i,j)) + e_d(d_i,d_j)\big), \quad (3)$$

where 1{l_i ≠ l_j} is a switching function which enables the penalty only when the semantic labels of i and j are different. edge(·) is the output of a semantic edge detection method [22], and e_d(d_i, d_j) = exp(−[‖d_i − d_j‖₁ − t_d]_+), where [x]_+ = max{x, 0} is the hinge function. This term generally enforces pairwise smoothness, except when there is a strong semantic edge or a possible depth discontinuity between i and j. The definition of e_d(d_i, d_j) gives credit for assigning different labels when the depth difference is greater than a threshold t_d.
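As a concrete reading of Eq. (3), here is a minimal numpy sketch of the pixel-wise edge potential; the scalar interface is illustrative, and the default t_d = 0.2 m follows the setting reported in Sec. 6.

```python
import numpy as np

def hinge(x):
    # [x]_+ = max(x, 0)
    return np.maximum(x, 0.0)

def e_d(d_i, d_j, t_d=0.2):
    # Depth-compatibility term from Eq. (3): close to 1 while the depth gap
    # is below t_d (meters), decaying once it exceeds the threshold.
    return np.exp(-hinge(np.abs(d_i - d_j) - t_d))

def pixel_edge_potential(l_i, l_j, d_i, d_j, edge_ij, t_d=0.2):
    # Eq. (3): a label change is penalized in proportion to how weak the
    # semantic edge is (exp(-edge)) and how small the depth discontinuity is.
    if l_i == l_j:
        return 0.0
    return np.exp(-edge_ij) + e_d(d_i, d_j, t_d)
```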

For the segment-wise edge potentials, we only consider neighboring segments as well. For each segment s, we calculate the mean and variance of the pixel RGB values inside the segment to get its local appearance feature f_s. Meanwhile, for each pair of neighboring segments s and t, we calculate the geodesic distance dist_g(s, t) between them based on the semantic edge map produced by [22]. We then get the appearance-based distance between two segments:

$$\mathrm{dist}_a(s,t) = \mathrm{dist}_g(s,t) + \lambda_a \|f_s - f_t\|, \quad (4)$$

where λ_a is a balancing weight. We also define the depth-based distance dist_d(s, t) between the two segments to be the average pixel-wise depth difference within the overlapping boundary areas of the two segments. The edge potential ψ_{s,t}(y_s, y_t) between two segments is then defined as:

$$\psi_{s,t}(y_s,y_t) = \mathbf{1}\{l_s \neq l_t\}\big(\exp(-\mathrm{dist}_a(s,t)) + e_d(s,t)\big) + w(l_s,l_t)\,\mathrm{dist}_d(s,t), \quad (5)$$

where e_d(s, t) = exp(−[dist_d(s, t) − t_d]_+) has a similar functionality as in Eqn. (3), allowing different semantic labels if the depth change between the two segments is large. w(l_s, l_t) is a smoothness weight matrix learned from the data, in which a higher value of w(l_s, l_t) requires higher depth smoothness between segments s, t when their semantic labels l_s, l_t are consistent, and vice versa.
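The segment-level counterpart, Eqs. (4) and (5), can be sketched the same way; here w is assumed to be a small label-indexed numpy matrix, and the defaults λ_a = 0.1 and t_d = 0.2 follow Sec. 6.

```python
import numpy as np

def segment_edge_potential(l_s, l_t, dist_g, f_s, f_t, dist_d, w,
                           lam_a=0.1, t_d=0.2):
    # Eq. (4): appearance distance = geodesic (edge-based) distance plus a
    # weighted difference of the mean/variance color features.
    dist_a = dist_g + lam_a * np.linalg.norm(f_s - f_t)
    # Eq. (5): label-change penalty discounted by appearance and depth
    # discontinuities, plus a depth-smoothness term weighted by the learned
    # label-pair matrix w.
    psi = w[l_s, l_t] * dist_d
    if l_s != l_t:
        psi += np.exp(-dist_a) + np.exp(-max(dist_d - t_d, 0.0))
    return psi
```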

For the cross-layer edge potentials φ_s(y_s, x_i) between the segments and the pixels, we simply enforce consistency when the pixels are inside the segment, and have no constraints if the pixels do not belong to the segment.

Given the above definitions, we see that the pixel-level unary potentials encode the coarse global layout, while the segment-level unary potentials focus on local region details. The edge potentials incorporate the consistency between the depth and semantic labels. Therefore, through joint inference, our model is able to better exploit the interactions between global and local predictions, as well as between depth and semantic information. We will describe the inference procedure in detail in Section 5.

Figure 2. An example of the global network output. Middle: depth map. Right: semantic probability map.

3. Joint Global Depth and Semantic Prediction

In this section, we describe how we train a CNN with the whole image as input to predict pixel-wise depth and semantic maps, which are used as the pixel-level unary potentials in our HCRF model.

CNNs have shown their effectiveness in predicting not only discrete class labels [18] but also structured continuous maps. In [6], with the use of ground-truth depth data, a CNN is trained to directly predict a depth map using the whole image as input, which achieves global context. Inspired by this work, we extend it to a CNN that directly predicts pixel-wise depth values jointly with semantic labels from the whole image.

We follow the CNN structure in [6] in the earlier layers. However, in addition to the depth nodes in the final layer, we further introduce semantic nodes to predict the semantic labels. Formally, our loss function during network training is composed of two parts:

$$\mathrm{loss}(X, X^*) = \frac{1}{n}\sum_{i=1}^{n}(\log d_i - \log d_i^*)^2 + \lambda_l\Big(-\frac{1}{n}\sum_{i=1}^{n}\log P(l_i^*)\Big), \quad P(l_i^*) = \frac{\exp(z_{i,l_i^*})}{\sum_{l_i}\exp(z_{i,l_i})}, \quad (6)$$

where d_i and l_i are the predicted depth values and semantic labels, while d_i^* and l_i^* are the ground truth. z_{i,l_i} is the output of the semantic node corresponding to pixel i.
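A minimal numpy version of the joint loss in Eq. (6), operating on flattened per-pixel predictions; the softmax over the semantic node outputs z is computed with the usual max-shift for numerical stability, and the default λ_l = 0.05 follows Sec. 6.

```python
import numpy as np

def joint_global_loss(d_pred, d_gt, z, l_gt, lam_l=0.05):
    # Eq. (6): L2 loss in log-depth plus a softmax cross-entropy over the
    # semantic node outputs z (shape: n_pixels x n_classes).
    depth_term = np.mean((np.log(d_pred) - np.log(d_gt)) ** 2)
    z = z - z.max(axis=1, keepdims=True)              # stabilize the softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    sem_term = -np.mean(log_p[np.arange(len(l_gt)), l_gt])
    return depth_term + lam_l * sem_term
```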

Since training data with ground-truth semantic labels are very limited compared with raw RGB-D data, we first train the network to only predict depth values using RGB-D training data (i.e., we drop the semantic nodes in the final layer), and then fine-tune the network with the added semantic nodes using the RGB-D data with available semantic labels. Once trained, given an input image, the network predicts a depth map and a probability map of each pixel belonging to a semantic label. Since it is trained globally, the predicted maps are quite coarse (Fig. 2). Nevertheless, they provide very accurate global scale and semantic layout, which helps avoid prediction errors caused by local appearance ambiguities. Moreover, as will be shown in the experiments, the joint-prediction network after fine-tuning provides more accurate depth maps than the network trained to predict depth alone, which demonstrates that semantic information can regularize the CNN in a way that benefits depth prediction.

We use $\bar d_i$ to denote the depth value at pixel i predicted by the global CNN, and use P(l_i) to denote the predicted probabilities of semantic labels at pixel i. The pixel-wise unary term in Eqn. (1) can be written as:

$$\psi_i(x_i) = -\log P(l_i) + \lambda_i \|d_i - \bar d_i\|_1. \quad (7)$$
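The resulting pixel-wise unary of Eq. (7), sketched for a single pixel and one candidate state x_i = {d_i, l_i}; the default λ_i = 10 follows Sec. 6.

```python
import numpy as np

def pixel_unary(p_label, d_i, d_global, lam_i=10.0):
    # Eq. (7): negative log-probability of the candidate label plus an L1
    # deviation of the candidate depth from the global CNN prediction.
    return -np.log(p_label) + lam_i * np.abs(d_i - d_global)
```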

4. Joint Local Depth and Semantic Prediction

While the depth and semantic maps predicted by the global CNN accurately capture the scene layout, they still lack details in local regions. Therefore, in order to recover scene structures and object boundaries, we decompose the image into segments by over-segmentation [25], and predict the semantic label and depth map for each segment. The predicted results are then used as the segment-level unary potentials in our HCRF to complement the global results.

The training and prediction of depth and semantic labels in local segments are not as straightforward as in the global inference. First, we need to find a proper way to represent the depth and semantic labels inside the segment, i.e., y_s = {d_s, l_s} in Sec. 2. For semantic labels, we use the majority of the pixel-wise semantic labels to represent the segment label l_s, which is a generally valid assumption. However, for depth, it is too coarse to use a single depth value to represent d_s. Meanwhile, when cropping the local segment out of the image, the global scale information is lost, and it is difficult to tell its absolute depth values by looking at the segment alone. A more feasible task is to predict a relative depth trend inside the segment. Therefore, we transform the absolute depth map of the segment into a normalized relative depth map by subtracting the absolute depth value at the segment center d_c and re-scaling it. Given the normalized depth map, the depth value at the center d_c and the scale change sc, we can exactly recover the absolute depth value of each pixel in the segment as d_i = d_n · sc + d_c, where d_n are the relative depth values in the normalized depth map. Therefore, in the local prediction stage, we estimate the normalized depth map of the segment, while [d_c, sc] are two unknown variables that we infer in the HCRF.
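One consistent reading of this normalization, matching the stated recovery formula d_i = d_n · sc + d_c exactly (under this reading the relative values lie in a bounded range around zero rather than literally in [0, 1]):

```python
import numpy as np

def normalize_segment_depth(depth, center_idx):
    # Subtract the absolute depth at the segment center and divide by the
    # depth range; (d_c, sc) are kept so absolute depth is exactly recoverable.
    d_c = depth.flat[center_idx]
    sc = depth.max() - depth.min()
    d_n = (depth - d_c) / sc
    return d_n, d_c, sc

def recover_segment_depth(d_n, d_c, sc):
    # Sec. 4: d_i = d_n * sc + d_c.
    return d_n * sc + d_c
```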

4.1. Normalized Joint Templates

Even if we normalize the depth map of the segment, it is still difficult to train a regressor from the image to the map. This is because the depth of a local segment is highly ambiguous when judged solely from its local appearance.

Figure 3. Examples of joint semantic and depth templates for local segments. The normalized depth maps in each row are associated with their corresponding semantic labels.

Nevertheless, the patterns in the depth maps of local segments are less diverse, e.g., often resembling a plane or a corner, and therefore can be captured by a limited number of templates. Thus, we formulate the local depth estimation problem as predicting a composition from a set of normalized templates, which largely constrains the learning space.

To generate the templates, we use both the semantic and depth ground truth to ensure consistency. To avoid the dominance of segments from a large semantic class, we first cluster the segments according to their semantic labels. For the segments with the same semantic label, we cluster their normalized depth maps using the L1 distance metric to generate a set of templates. Fig. 3 illustrates a subset of our joint templates, which provide meaningful patterns such as a plane, a corner or a curved surface.
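A sketch of the per-class template generation; scikit-learn's KMeans (squared Euclidean) stands in here for the paper's L1-metric clustering, so this is an approximation of the procedure rather than a reproduction. The cluster counts per class would be the [40, 40, 40, 60, 60] reported in Sec. 6.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_joint_templates(depth_maps, labels, clusters_per_class):
    # Sec. 4.1: cluster normalized depth maps separately within each semantic
    # class so large classes cannot dominate the template set.
    # depth_maps: (N, H*W) flattened normalized maps; labels: (N,) class ids.
    templates = []
    for c, k in clusters_per_class.items():
        maps_c = depth_maps[labels == c]
        km = KMeans(n_clusters=k, n_init=10).fit(maps_c)
        templates += [(c, center) for center in km.cluster_centers_]
    return templates
```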

4.2. Joint Template Regression

Given a segment s and a set of templates T_j, we would like to learn the affinities of this segment to the templates. The affinity during training is defined as:

$$a(s,T_j) = \mathbf{1}\{l_s = l_{T_j}\}\, S_d(s,T_j) / \max_k S_d(s,T_k), \quad S_d(s,T_j) = \exp(-\|d_s - d_{T_j}\|_1), \quad (8)$$

where d denotes the values in the normalized depth maps. Intuitively, when the semantic labels are different, the affinity of the segment to the template is zero; otherwise it is determined by the similarity of their normalized depth maps.
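The training affinities of Eq. (8) then reduce to a few lines; templates is assumed to be a list of (label, normalized depth map) pairs.

```python
import numpy as np

def template_affinities(d_s, l_s, templates):
    # Eq. (8): S_d is the exponentiated negative L1 distance between
    # normalized depth maps; the affinity is zeroed for templates whose
    # semantic label disagrees with the segment label.
    s_d = np.array([np.exp(-np.abs(d_s - d_T).sum()) for _, d_T in templates])
    same = np.array([l_s == l_T for l_T, _ in templates], dtype=float)
    return same * s_d / s_d.max()
```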

We use a CNN as the local training model as well, which takes the warped bounding box of the segment as input, with the loss function defined as the sum of sigmoid cross-entropy losses over the affinities, i.e.:

$$l(a_s, a_s^*) = -\frac{1}{N_t}\sum_{i=1}^{N_t}\big(a_i \log a_i^* + (1 - a_i)\log(1 - a_i^*)\big),$$

where a_s = [a_1, ..., a_{N_t}] are the affinities of segment s to all the templates. Based on this loss, our local CNN is learned by fine-tuning the global CNN of Section 3. After regression, we choose the top N (N = 2 in experiments) templates with the highest affinities and aggregate their normalized depth values as well as their semantic labels to the segment, with the affinities as weights. The averaged results are the prediction of the normalized depth map and the probability of semantic labels of that segment.
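The test-time aggregation of the top-N templates (N = 2 in the paper's experiments) can be sketched as follows; the returned label probabilities are a dictionary built from the aggregated affinity weights.

```python
import numpy as np

def aggregate_top_templates(affinities, templates, n_top=2):
    # Take the N templates with the highest affinities and average their
    # normalized depth maps and labels with the affinities as weights.
    idx = np.argsort(affinities)[-n_top:]
    w = affinities[idx] / affinities[idx].sum()
    depth = sum(wi * templates[i][1] for wi, i in zip(w, idx))
    label_prob = {}
    for wi, i in zip(w, idx):
        label_prob[templates[i][0]] = label_prob.get(templates[i][0], 0.0) + wi
    return depth, label_prob
```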

The depth and semantic ambiguity caused by local segment appearance is still a problem in template regression. Therefore, we use three techniques to further reduce the ambiguity. First, the output of the global CNN in Sec. 3 gives us a very good global layout to regularize the local prediction. Second, masking out the background outside a segment, as in R-CNN [10], can reduce confusion when two segments share the same bounding box. Therefore, for a segment, we take the fc6-layer output of the local CNN both from its bounding box and from the masked region, and concatenate it with the global prediction within the corresponding bounding box to form our feature vector. We train a Support Vector Regressor (SVR) on this feature to predict a segment's affinities to the templates. Third, the ambiguity of prediction decreases when the segments are larger. Therefore, instead of performing the regression on small segments produced by over-segmentation, we cluster them to generate multi-scale large segments (30, 50, and 100 segments in three scales respectively). When a small segment s is covered by a larger segment s_L, we can map the depth and semantic predictions of s_L back to segment s. The final depth and semantic prediction of s is a weighted average of the results from the multiple s_L covering s. The details of generating larger segments are in our supplementary material.

Fig. 4 gives two predicted examples from the learned model. We can see that our depth prediction is robust to image variations and does not depend on particular structures, and has the potential to overcome the difficulties met by traditional line and vanishing point detection methods [30].

Given the normalized depth map, as mentioned earlier, we can represent the depth values in the segment using two parameters: the center depth d_c and the scale factor sc. We hereby define the segment-level unary potential φ_s(y_s) as:

$$\phi_s(y_s) = -\log P(l_s) + \lambda_d\big(\|d_c - d_{gc}\|_1 + \|sc - sc_g\|_1\big), \quad (9)$$

where P(l_s) is the predicted probability of semantic labels on segment s, d_{gc} is the absolute depth from the global depth prediction at the segment center, and sc_g is the depth scale from the global prediction within the segment's bounding box. Intuitively, we want d_c and sc to be close to the ones predicted by the global CNN, which can also be regarded as the message passed from the pixel-level potential. Once d_c and sc are inferred, we can combine them with the normalized depth map to get the absolute depth of each pixel in the segment, which can be used to calculate the edge potentials between the segments in Eqn. (5), as well as to enforce the consistency between the global pixel-wise prediction and the local segment-level prediction.

Figure 4. Illustration of local prediction results from two difficult segments (located in the red box). Our prediction is robust to complex scenarios and even blurred cases.

5. Joint HCRF Inference

For inference over the joint HCRF, direct inference over the joint space of semantic labels and depth through loopy belief propagation (LBP) [26] takes a long time to converge. We instead adopt a more efficient alternating optimization strategy, minimizing over one set of variables while fixing the other.

Semantic inference given the depth. Given the estimated depth, we first perform LBP to infer the semantic labels at the segment level, and then pass the predictions of the local segments to their covering pixels. We then infer the labels at the pixel level, which can be solved through MAP.

Depth inference given the semantic labels. Similarly, we first infer the depth variables at the segment level, namely the center depth d_c and the scale factor sc. Inference over continuous depth variables is impractical for LBP. Thus, we quantize the center depth d_c of a segment to a set of discrete offsets (in our experiments, 20 uniformly distributed values within the range [−r_d, r_d]) from the respective value predicted by the global model, and the scale sc to a shift of the respective global scale (10 intervals within [−r_s, r_s], with the values then truncated to the range [min_sc, max_sc]). Theoretically, our quantization follows the same spirit as particle belief propagation [27].

In our experiments, our global predictions are already very good. Therefore, we use the global prediction as our initialization, and perform one iteration by first estimating semantic labels and then predicting depth. This already produces state-of-the-art results, and more iterations bring very little improvement in our experiments. To further accelerate the algorithm, we use graph cuts to efficiently solve the pixel-wise semantic labeling. In pixel-level depth inference, we find that the smoothness term makes little difference in the final solution. Thus, the depth inference reduces to a linear combination of the global prediction and the local prediction given the weight λ_y in Eqn. (1), which is easy to learn through maximum likelihood using the ground-truth depth.
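A sketch of the discretization described above; r_d = 0.5 and [min_sc, max_sc] = [0.05, 0.5] follow Sec. 6, while the default scale range r_s = 0.25 is our reading of the value reported there (as r_c), so it is passed explicitly.

```python
import numpy as np

def candidate_states(d_c_global, sc_global, r_d=0.5, r_s=0.25,
                     sc_bounds=(0.05, 0.5), n_d=20, n_s=10):
    # Sec. 5: discretize the continuous segment variables around the global
    # prediction so LBP runs over a finite state space -- n_d uniform
    # center-depth offsets in [-r_d, r_d] and n_s scale shifts in
    # [-r_s, r_s], truncated to [min_sc, max_sc].
    d_c = d_c_global + np.linspace(-r_d, r_d, n_d)
    sc = np.clip(sc_global + np.linspace(-r_s, r_s, n_s), *sc_bounds)
    return d_c, sc
```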

6. Experiments

Data. We evaluate our method on the NYU v2 dataset [31], which contains images taken by a Kinect camera in 464 indoor scenes. We use the official train/test split, using 249 scenes for training the global depth prediction. After evening out the distribution (1,200 images per scene), the total number of depth images is 200K. The joint depth and semantic label set contains 1,449 images, partitioned into 795 training images and 654 testing images. Due to the limited amount of data, we also use images from the NYU v1 dataset that do not overlap with the 654 testing images for training. There are 894 annotated semantic labels in the dataset. In order to better ensure consistency between depth maps and semantic labels with limited data, we mapped the semantic labels into 5 categories conveying strong geometric properties, i.e., {Ground, Vertical, Ceiling, Furniture, Objects}. Fig. 5 illustrates our mapped labels. When training the global CNN, we perform data augmentation similar to the method in [6], which gives us 2 million depth images for training.

Figure 5. We map the detailed semantic classes in (a) to five main semantic classes in (b).

Implementation details. The structure of the global CNN is the same as the one in [6], and the resolution of the semantic output is 20×26, yielding 3120 additional output nodes. We use Caffe [16] for our network implementation. For inference over our graphical model, we use the LBP tool provided by Meltzer¹.

For the parameters balancing the unary and edge potentials in Eqn. (1), λ_y = 4, which is learned through maximum likelihood as stated in Sec. 5, and λ_ie = 3, λ_ce = 2, which are learned through cross-validation. For the parameters balancing the semantic and depth terms, we adjust them to make their numerical ranges comparable. Specifically, λ_l = 0.05 in Eqn. (6), λ_i = λ_d = 10 in Eqn. (7) and Eqn. (9), and λ_a = 0.1 in Eqn. (4). The threshold t_d is set to 0.2 m.

In Sec. 4.1, when clustering the templates, the numbers of templates for the five semantic classes are [40, 40, 40, 60, 60] respectively. We keep C = 0.3 when learning the SVR. To balance the different features in the SVR, we normalize each feature with the L2 norm, concatenate all the features, and weight each type of feature based on its relative feature length, i.e., w_i = Σ_j L_j / L_i, where L_i is the length of feature type i.
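A sketch of this length-based feature balancing; feature_blocks is assumed to be a list of 1-D numpy arrays, one per feature type.

```python
import numpy as np

def weight_features(feature_blocks):
    # L2-normalize each feature block, then weight it inversely to its
    # length, w_i = sum_j L_j / L_i, so long blocks do not dominate the SVR.
    lengths = np.array([len(f) for f in feature_blocks], dtype=float)
    weights = lengths.sum() / lengths
    blocks = [w * f / np.linalg.norm(f)
              for w, f in zip(weights, feature_blocks)]
    return np.concatenate(blocks)
```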

¹ http://www.cs.huji.ac.il/~talyam/inference.html

Figure 7. Qualitative visualization of the two-level depth prediction (columns: image, global depth, joint depth).

To infer the depth in Sec. 5, we set r_d = 0.5 m, r_s = 0.25 m and [min_sc, max_sc] = [0.05 m, 0.5 m]. In addition, the learned weight matrix w(l_s, l_t) in Eqn. (5) is included in the supplementary material. With our Matlab implementation, it takes about 4 days to learn our models, and the testing time of our algorithm is around 40 s for a 480×640 image on a desktop with a 3.4 GHz processor and a K-40 GPU.

6.1. Quantitative results

Depth estimation. To evaluate the depth prediction, we adopt the metrics used in previous work [24, 6] to measure different aspects of the depth results. Formally, given the predicted absolute depth $d_x$ of a pixel and the ground truth $d_x^*$, the evaluation metrics are:

(1) Abs relative difference (Rel): $\frac{1}{N}\sum_x |d_x - d_x^*| / d_x^*$;
(2) Square relative difference (Rel(sqr)): $\frac{1}{N}\sum_x |d_x - d_x^*|^2 / d_x^*$;
(3) Average log10 error: $\frac{1}{N}\sum_x |\log_{10} d_x - \log_{10} d_x^*|$;
(4) RMSE (linear): $\sqrt{\frac{1}{N}\sum_x |d_x - d_x^*|^2}$;
(5) RMSE (log): $\sqrt{\frac{1}{N}\sum_x |\log d_x - \log d_x^*|^2}$;
(6) Threshold: the percentage of $d_x$ such that $\max(d_x/d_x^*,\, d_x^*/d_x) = \delta$ is below a given threshold.
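These metrics are straightforward to compute; below is a minimal numpy sketch over flattened arrays of valid predicted and ground-truth depths.

```python
import numpy as np

def depth_metrics(d, d_gt):
    # Metrics (1)-(6) over flattened arrays of predicted / ground-truth depth.
    rel = np.mean(np.abs(d - d_gt) / d_gt)
    rel_sqr = np.mean(np.abs(d - d_gt) ** 2 / d_gt)
    log10 = np.mean(np.abs(np.log10(d) - np.log10(d_gt)))
    rmse = np.sqrt(np.mean((d - d_gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(d) - np.log(d_gt)) ** 2))
    ratio = np.maximum(d / d_gt, d_gt / d)
    deltas = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]
    return rel, rel_sqr, log10, rmse, rmse_log, deltas
```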

We compare our results with the five most recent methods, i.e., Make3D [30], Depth Transfer [17], DC Depth [24], Canonical Depth [21] and Depth CNN [6]. We follow the test setting exactly as in Depth CNN² [6].

Tab. 1 shows the quantitative results of all the algorithms. Our final algorithm, i.e., Joint HCRF, outperforms the state-of-the-art Depth CNN [6] by a noticeable margin. The results of our Global Depth CNN are comparable to those produced by [6]. We think the difference is mostly because we use a geometry-preserving cropping for data augmentation (described in our supplementary material), yielding improvements on the Rel and RMSE metrics. However, we did not use the scale-invariance loss or pre-train on ImageNet as in [6], which might lead to the drop in the δ metric. By fine-tuning the network to jointly predict depth and semantic labels, the joint global CNN is better than the depth-only CNN in 7 out of 8 metrics. This shows that the semantic labels regularize the depth prediction through the CNN training, which benefits the depth estimation. By enforcing global and local consistency in our joint HCRF, although the quantitative results are only slightly better than the global joint CNN, we show in Fig. 7 and Fig. 3 of our supplementary material that it provides a significant improvement in visual quality in both semantic segmentation and depth estimation. The results from the HCRF have sharper transitions at surface boundaries and align with local details. The same phenomenon is also mentioned in [6]. Thus, a better metric to measure visual quality is worth investigating in future work.

² For the results of Make3D, Canonical Depth and Depth CNN, we copy the results reported in [6]. However, we found that the setting of DC Depth differs in terms of the evaluated image size; thus, we asked the authors for their results for a fair comparison. For Depth Transfer, we downloaded their code and re-trained the model to generate all the results.

Figure 8. Examples showing the intuitions behind joint prediction. (a) Global depth vs. joint global depth. (b) Semantic-only vs. Joint HCRF. (c) Depth-only vs. Joint HCRF.

Semantic prediction. To evaluate the semantic segmentation, we use both the popular Intersection Over Union (IOU) and the pixel accuracy percentage as evaluation metrics. We take the state-of-the-art segmentation method R-CNN [10] for comparison to show the effectiveness of our joint prediction. To obtain the R-CNN results, we use the authors' code³ and follow exactly the same training strategy for segmentation as stated in their paper. For a fair comparison, we apply our trained model for region-wise features, and apply the same CRF as we did for local superpixels, without considering the depth information.

Tab. 2 shows the comparison results, and our joint estimation provides the best performance. As shown in the second row, adding only the semantic guidance from the global CNN improves the performance by about 2.5%, which shows the benefit of the interaction between global guidance and local prediction. By adding depth information into the framework, the accuracy is further improved, which demonstrates the complementarity of the depth and semantic information. We also tried to use a globally jointly-trained CNN to directly predict the semantic labels. However, such a global prediction only achieves 30.5% in mean IOU, which is considerably lower than the results of our HCRF. The segmentation from the joint global CNN is very blurry, while the HCRF provides much clearer boundaries.

³ https://github.com/rbgirshick/rcnn

Figure 6. Qualitative comparison with other approaches (columns: image, DC Depth [24], DCNN [6], our depth, depth ground truth, our semantic labels, semantic ground truth). Depth maps are normalized by their respective max depth (best viewed in color).

Criteria                | Rel   | Rel(sqr) | Log10 | RMSE(linear) | RMSE(log) | δ<1.25 | δ<1.25² | δ<1.25³
Make3D [30]             | 0.349 | 0.492    | -     | 1.214        | 0.409     | 0.447  | 0.745   | 0.897
Depth Transfer [17]     | 0.350 | 0.539    | 0.134 | 1.1          | 0.378     | 0.460  | 0.742   | 0.893
DC Depth [24]           | 0.335 | 0.442    | 0.127 | 1.06         | 0.362     | 0.475  | 0.770   | 0.911
Canonical Depth [21]    | -     | -        | -     | -            | -         | 0.542  | 0.829   | 0.940
Depth CNN Coarse [6]    | 0.228 | 0.223    | -     | 0.871        | 0.283     | 0.618  | 0.891   | 0.969
Depth CNN Fine [6]      | 0.215 | 0.212    | -     | 0.907        | 0.285     | 0.611  | 0.887   | 0.971
Global CNN, Depth only  | 0.207 | 0.216    | 0.104 | 0.823        | 0.284     | 0.550  | 0.861   | 0.969
Global CNN, Joint       | 0.226 | 0.208    | 0.095 | 0.750        | 0.266     | 0.593  | 0.889   | 0.976
Joint HCRF              | 0.220 | 0.210    | 0.094 | 0.745        | 0.262     | 0.605  | 0.890   | 0.970

Table 1. Quantitative comparison between our method and other state-of-the-art baselines on the NYU v2 dataset. Lower is better for the first five metrics; higher is better for the three δ thresholds.

Method         | Ground | Vertical | Ceiling | Furniture | Object | Mean IOU | Pix acc.
R-CNN [10] CRF | 57.837 | 64.062   | 16.513  | 17.8      | 45.536 | 40.349   | 68.312
Semantic HCRF  | 61.840 | 66.344   | 15.977  | 26.291    | 43.121 | 42.715   | 69.351
Joint HCRF     | 63.791 | 66.154   | 20.033  | 25.399    | 45.624 | 44.200   | 70.287

Table 2. Quantitative comparison between our method and R-CNN [10] on the segmentation task of the NYU v2 dataset (per-class IOU, mean IOU, and pixel accuracy, in %).

6.2. Qualitative results

In Fig. 6, we further visually show the depth comparison results between our method, DC Depth [24] and DCNN [6], and the segmentation compared with the ground truth. In Fig. 6, we can see that DC Depth uses small local segments, which suffer from local distortions due to the lack of global cues. DCNN does not have the constraint from semantics, so the prediction may be negatively influenced by appearance variation, e.g., the refrigerator in the second image, and the reflection on the ground at the right-bottom of the third image. In our case, our approach jointly considers both the global prediction and local details, and leverages the benefits of both depth and semantic prediction, and therefore achieves depth changes more consistent with the ground truth.

In Fig. 7, we show that compared with the global depth output, the joint output provides more detailed structures in the scene, yielding visually more satisfying results. In addition, in Fig. 8, we illustrate the intuition behind the joint information of depth and semantic labels by removing one from the model and testing the other. In Fig. 8(a), for global prediction, adding the semantic constraint fixes the distortion of the depth CNN prediction because of the smoothness constraint enforced by the "vertical" label. In Fig. 8(b), by considering local depth change and depth discontinuity, the model is able to handle the appearance confusion in semantic segmentation. In Fig. 8(c), for fine-level depth estimation, adding semantic segments makes the depth map better aligned with object boundaries.

7. Conclusion

We propose a unified approach to jointly estimate depth and semantic labels from a single image. We formulate the problem in a hierarchical CRF which embeds the potentials from a global CNN and a local regional CNN. Through joint inference, our algorithm achieves promising results in both depth and semantic estimation. In future work, we will extend it to outdoor scenarios such as the KITTI dataset [9].

Acknowledgements. This work is supported by NSF CCF-1317376 and ARO 62250-CS, and was partially done when the first author was an intern at Adobe.

References

[1] X. Boix, J. M. Gonfaus, J. van de Weijer, A. D. Bagdanov, J. S. Gual, and J. Gonzalez. Harmony potentials - fusing global and local scale for semantic image segmentation. IJCV, pages 83-102, 2012.
[2] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla. Segmentation and recognition using structure from motion point clouds. In ECCV (1), pages 44-57, 2008.
[3] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In ECCV (7), pages 430-443, 2012.
[4] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In ICLR, 2015.
[5] D. Eigen and R. Fergus. Nonparametric image parsing using adaptive neighbor sets. In CVPR, pages 2799-2806, 2012.
[6] D. Eigen, C. Puhrsch, and R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In NIPS, 2014.
[7] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. TPAMI, pages 1915-1929, 2013.
[8] A. Flint, D. W. Murray, and I. Reid. Manhattan scene understanding using monocular, stereo, and 3d features. In ICCV, pages 2228-2235, 2011.
[9] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, pages 3354-3361, 2012.
[10] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[11] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In ICCV, pages 1-8, 2009.
[12] S. Gupta, R. Girshick, P. Arbelaez, and J. Malik. Learning rich features from RGB-D images for object detection and segmentation. In ECCV, 2014.
[13] C. Hane, C. Zach, A. Cohen, R. Angst, and M. Pollefeys. Joint 3d scene reconstruction and class segmentation. In CVPR, pages 97-104, 2013.
[14] D. Hoiem, A. A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, pages 151-172, 2007.
[15] D. Hoiem, A. A. Efros, and M. Hebert. Recovering occlusion boundaries from an image. IJCV, pages 328-346, 2011.
[16] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[17] K. Karsch, C. Liu, and S. B. Kang. DepthTransfer: Depth extraction from video using non-parametric sampling. TPAMI, 2014.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106-1114, 2012.
[19] A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg. Joint semantic segmentation and 3d reconstruction from monocular video. In ECCV, 2014.
[20] L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. Associative hierarchical CRFs for object class image segmentation. In ICCV, pages 739-746, 2009.
[21] L. Ladicky, J. Shi, and M. Pollefeys. Pulling things out of perspective. In CVPR, June 2014.
[22] M. Leordeanu, R. Sukthankar, and C. Sminchisescu. Efficient closed-form solution to generalized boundary detection. In ECCV (4), pages 516-529, 2012.
[23] B. Liu, S. Gould, and D. Koller. Single image depth estimation from predicted semantic labels. In CVPR, pages 1253-1260, 2010.
[24] M. Liu, M. Salzmann, and X. He. Discrete-continuous depth estimation from a single image. In CVPR, June 2014.
[25] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa. Entropy rate superpixel segmentation. In CVPR, pages 2097-2104, 2011.
[26] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy belief propagation for approximate inference: An empirical study. CoRR, 2013.
[27] J. Peng, T. Hazan, D. McAllester, and R. Urtasun. Convex max-product algorithms for continuous MRFs with applications to protein folding. In ICML, 2011.
[28] X. Ren, L. Bo, and D. Fox. RGB-(D) scene labeling: Features and algorithms. In CVPR, pages 2759-2766, 2012.
[29] B. C. Russell and A. Torralba. Building a database of 3d scenes from user annotations. In CVPR, 2009.
[30] A. Saxena, M. Sun, and A. Y. Ng. Make3D: Learning 3d scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell., pages 824-840, 2009.
[31] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV (5), pages 746-760, 2012.
[32] G. Singh and J. Kosecka. Nonparametric scene parsing with adaptive feature relevance and semantic context. In CVPR, pages 3151-3157, 2013.
[33] J. Tighe and S. Lazebnik. SuperParsing: Scalable nonparametric image parsing with superpixels. IJCV, pages 329-349, 2013.
[34] K. Yamaguchi, D. McAllester, and R. Urtasun. Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In ECCV, 2014.
[35] J. Yang, B. Price, S. Cohen, and M.-H. Yang. Context driven scene parsing with attention to rare classes. In CVPR, 2014.
[36] S. Zheng, M.-M. Cheng, J. Warrell, P. Sturgess, V. Vineet, C. Rother, and P. H. S. Torr. Dense semantic image segmentation with objects and attributes. In CVPR, 2014.
