
FDDB: A Benchmark for Face Detection in Unconstrained Settings

Vidit Jain

University of Massachusetts Amherst, Amherst, MA 01003

vidit@cs.umass.edu

Erik Learned-Miller

University of Massachusetts Amherst, Amherst, MA 01003

elm@cs.umass.edu

Abstract

Despite the maturity of face detection research, it remains difficult to compare different algorithms for face detection. This is partly due to the lack of common evaluation schemes. Also, existing data sets for evaluating face detection algorithms do not capture some aspects of face appearances that are manifested in real-world scenarios. In this work, we address both of these issues. We present a new data set of face images with more faces and more accurate annotations for face regions than in previous data sets. We also propose two rigorous and precise methods for evaluating the performance of face detection algorithms. We report results of several standard algorithms on the new benchmark.

1. Introduction

Face detection has been a core problem in computer vision for more than a decade. Not only has there been substantial progress in research, but many techniques for face detection have also made their way into commercial products such as digital cameras. Despite this maturity, algorithms for face detection remain difficult to compare, and are somewhat brittle to the specific conditions under which they are applied. One difficulty in comparing different face detection algorithms is the lack of enough detail to reproduce the published results. Ideally, algorithms should be published with sufficient detail to replicate the reported performance, or with an executable binary. However, in the absence of these alternatives, it is important to establish better benchmarks of performance.

For a data set to be useful for evaluating face detection, the locations of all faces in these images need to be annotated. Sung et al. [24] built one such data set. Although this data set included images from a wide range of sources, including scanned newspapers, all of the faces appearing in these images were upright and frontal. Later, Rowley et al. [18] created a similar data set with images that included faces with in-plane rotation. Schneiderman et al. [20, 21] combined these two data sets with an additional collection of profile face images; the resulting collection is commonly known as the MIT+CMU data set. Since this collection contains only grayscale images, it is not applicable for evaluating face detection systems that employ color information as well [6]. Some of the subsequent face detection data sets included color images, but they also had several shortcomings. For instance, the GENKI data set [25] includes color images that show a range of head poses (yaw, pitch ±45°, roll ±20°), but every image in this collection contains exactly one face. Similarly, the Kodak [13], UCD [23], and VT-AAST [1] data sets included images of faces with occlusions, but the small sizes of these data sets limit their utility in creating effective benchmarks for face detection algorithms.

One contribution of this work is the creation of a new data set that addresses the above-mentioned issues. Our data set includes:

• 2845 images with a total of 5171 faces;

• a wide range of difficulties, including occlusions, difficult poses, and low-resolution and out-of-focus faces;

• the specification of face regions as elliptical regions; and

• both grayscale and color images.

Another limitation of the existing benchmarks is the lack of a specification for evaluating the output of an algorithm on a collection of images. In particular, as noted by Yang et al. [28], the reported performance measures depend on the definition of a "correct" detection result. The definition of correctness can be subtle. For example, how should we score an algorithm which provides two detections, each of which covers exactly 50% of a face region in an image? Since the evaluation process varies across the published results, a comparison of different algorithms remains difficult. We address this issue by presenting a new evaluation scheme with the following components:

• An algorithm to find correspondences between a face detector's output regions and the annotated face regions.

• Two separate rigorous and precise methods for evaluating any algorithm's performance on the data set. These two methods are intended for different applications.

• Source code for implementing these procedures.

We hope that our new data set, the proposed evaluation scheme, and the publicly available evaluation software will make it easier to precisely compare the performance of algorithms, which will further prompt researchers to work on more difficult versions of the face detection problem.

The report is organized as follows. In Section 2, we discuss the challenges associated with comparing different face detection approaches. In Section 3, we outline the construction of our data set. Next, in Section 4, we describe a semi-automatic approach for removing duplicate images in a data set. In Section 5, we present the details of the annotation process, and finally, in Section 6, we present our evaluation scheme.

2. Comparing face detection approaches

Based on the range of acceptable head poses, face detection approaches can be categorized as

• single pose: the head is assumed to be in a single, upright pose (frontal [24, 18, 26] or profile [21]);

• rotation-invariant: in-plane rotations of the head are allowed [8, 19];

• multi-view: out-of-plane rotations are binned into a pre-determined set of views [7, 9, 12];

• pose-invariant: no restrictions on the orientation of the head [16, 22].

Moving forward from previous comparisons [28] of approaches that focus on limited head orientations, we intend to evaluate different approaches for the most general, i.e., the pose-invariant, face detection task.

One challenge in comparing face detection systems is the lack of agreement on the desired output. In particular, while many approaches specify image regions – e.g., rectangular regions [26] or image patches with arbitrary shape [17] – as hypotheses for face regions, others identify the locations of various facial landmarks such as the eyes [27]. Still others give an estimate of head pose [16] as well.

The scope of this work is limited to the evaluation of region-based output alone (although we intend to follow this report in the near future with a similar evaluation of 3D pose estimation algorithms). To this end, we annotate each face region with an ellipse of arbitrary size, shape, and orientation, showing the approximate face region for each face in the image. Compared to the traditional rectangular annotation of faces, ellipses are generally a better fit to face regions and still maintain a simple parametric shape to describe the face. We discuss the details of the annotation process in Section 5. Note that our data set is amenable to additional annotations, including facial landmarks and head pose information, which would be beneficial for benchmarking the next generation of face detection algorithms.

Next we discuss the origins and construction of our database.

3. FDDB: Face Detection Data Set and Benchmark

Berg et al. [2] created a data set that contains images and associated captions extracted from news articles (see Figure 1). The images in this collection display large variation in pose, lighting, background, and appearance. Some of these variations in face appearance are due to factors such as motion, occlusions, and facial expressions, which are characteristic of the unconstrained setting for image acquisition. The annotated faces in this data set were selected based on the output of an automatic face detector. An evaluation of face detection algorithms on the existing set of annotated faces would favor the approaches with outputs highly correlated with this base detection algorithm. This property of the existing annotations makes them unsuitable for evaluating different approaches for face detection. The richness of the images included in this collection, however, motivated us to build an index of all of the faces present in a subset of images from this collection. We believe that benchmarking face detection algorithms on this data set will provide good estimates of their expected performance in unconstrained settings.

3.1. Construction of the data set

Figure 2. Outline of the labeling process. Semi-automatic approaches are developed for both of these steps.

Figure 1. Example images from Berg et al.'s data set.

The images in Berg et al.'s data set were collected from the Yahoo! news website,¹ which accumulates news articles from different sources. Although different news organizations may cover a news event independently of each other, they often share photographs from common sources such as the Associated Press or Reuters. The published photographs, however, may not be digitally identical to each other because they are often modified (e.g., cropped or contrast-corrected) before publication. This process has led to the presence of multiple copies of near-duplicate images in Berg et al.'s data set. Note that the presence of such near-duplicate images is limited to a few data collection domains, such as news photos and images on the internet, and is not a characteristic of most practical face detection application scenarios. For example, it is uncommon to find near-duplicate images in a personal photo collection. Thus, an evaluation of face detection algorithms on a data set with multiple copies of near-duplicate images may not generalize well across domains. For this reason, we decided to identify and remove as many near duplicates from our collection as possible. We now present the details of the duplicate detection.

¹ http://news.yahoo.com

4. Near-duplicate detection

We selected a total of 3527 images (based on the chronological ordering) from the image-caption pairs of Berg et al. [2]. Examining pairs for possible duplicates in this collection in the naïve fashion would require approximately 12.5 million annotations. An alternative arrangement would be to display a set of images and manually identify groups of images in this set, where images in a single group are near-duplicates of each other. Due to the large number of images in our collection, it is unclear how to display all the images simultaneously to enable this manual identification of near-duplicates in this fashion.

Identification of near-duplicate images has been studied for web search [3, 4, 5]. However, in the web search domain, scalability issues are often more important than the detection of all near-duplicate images in the collection. Since we are interested in discovering all of the near-duplicates in our data set, these approaches are not directly applicable to our task. Zhang et al. [29] presented a more computationally intensive approach based on stochastic attribute relational graph (ARG) matching. Their approach was shown to perform well on a related problem of detecting near-identical frames in news video databases. These ARGs represent the compositional parts and part-relations of image scenes over several interest points detected in an image. To compute a matching score between the ARGs constructed for two different images, a generative model for the graph transformation process is employed. This approach has been observed to achieve high recall of near-duplicates, which makes it appropriate for detecting similar images in our data set.

Figure 3. Near-duplicate images. (Positive) The first two images differ from each other slightly in resolution and in the color and intensity distributions, but the pose and expression of the faces are identical, suggesting that they were derived from a single photograph. (Negative) In the last two images, since the pose is different, we do not consider them as near-identical images.

As with most automatic approaches for duplicate detection, this approach has a trade-off between false positives and false negatives. To restrict the number of false positives while maintaining a high true positive rate, we follow an iterative approach (outlined in Algorithm 1) that alternates between clustering and manual inspection of the clusters. We cluster (steps 3-5 of Algorithm 1) using a spectral graph-clustering approach [15]. Then, we manually label each non-singleton cluster from the preceding step as either uniform, meaning that it contains images that are all near duplicates of each other, or non-uniform, meaning that at least one pair of images in the cluster are not near duplicates of each other. Finally, we replace each uniform cluster with one of the images belonging to it.

For the clustering step, in particular, we construct a fully-connected undirected graph G over all the images in the collection, where the ARG-matching scores are used as weights for the edges between each pair of images. Following the spectral graph-clustering approach [15], we compute the (unnormalized) Laplacian L_G of graph G as

    L_G = diag(d) - W_G,    (1)

where d is the vector of degrees of all the nodes in G, and W_G is the adjacency matrix of G. A projection of the graph G into a subspace spanned by the top few eigenvectors of L_G provides an effective distance metric between all pairs of nodes (images, in our case). We perform mean-shift clustering with a narrow kernel in this projected space to obtain clusters of images.

Algorithm 1: Identifying near-duplicate images in a collection

1: Construct a graph G = {V, E}, where V is the set of images, and E are all pairwise edges with weights given by the ARG matching scores.
2: repeat
3:   Compute the Laplacian of G, L_G.
4:   Use the top m eigenvectors of L_G to project each image onto R^m.
5:   Cluster the projected data points using mean-shift clustering with a small-width kernel.
6:   Manually label each cluster as either uniform or non-uniform.
7:   Collapse the uniform clusters onto their centroids, and update G.
8: until none of the clusters can be collapsed.
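To make steps 3-5 concrete, the following sketch (in Python, with NumPy and scikit-learn as assumed dependencies) builds the Laplacian from a matrix of pairwise ARG-matching scores, projects the images onto a few of its eigenvectors, and clusters the projections with mean-shift. The function name, the number of eigenvectors m, and the kernel bandwidth are illustrative placeholders, not values from the original implementation.

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_images(scores, m=5, bandwidth=0.1):
    """One clustering pass (steps 3-5 of Algorithm 1).

    scores: symmetric (n x n) array of pairwise ARG-matching scores,
    used as edge weights of the fully connected image graph G.
    Returns one cluster label per image.
    """
    W = np.asarray(scores, dtype=float)
    np.fill_diagonal(W, 0.0)              # no self-edges
    d = W.sum(axis=1)                     # node degrees
    L = np.diag(d) - W                    # unnormalized Laplacian L_G = diag(d) - W_G

    # Project each image onto R^m using m eigenvectors of L_G. Conventional
    # spectral clustering uses the eigenvectors of the smallest eigenvalues;
    # the paper's "top few eigenvectors" leaves the exact choice unspecified.
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues returned in ascending order
    projection = eigvecs[:, :m]

    # Mean-shift clustering with a narrow kernel in the projected space.
    return MeanShift(bandwidth=bandwidth).fit(projection).labels_
```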

Using this procedure, we were able to arrange the images according to their mutual similarities. Annotators were asked to identify clusters in which all images were derived from the same source. Each of these clusters was replaced by a single exemplar from the cluster. In this process we manually discovered 103 uniform clusters over seven iterations, with 682 images that were near-duplicates. Additional manual inspections were performed to find an additional three cases of duplication.

Next we describe our annotation of face regions.

5. Annotating face regions

As a preliminary annotation, we drew bounding boxes around all the faces in 2845 images. From this set of annotations, all of the face regions with height or width less than 20 pixels were excluded, resulting in a total of 5171 face annotations in our collection.

Figure 4. Challenges in face labeling. For some image regions, deciding whether or not the region represents a "face" can be challenging. Several factors, such as low resolution (green, solid), occlusion (blue, dashed), and pose of the head (red, dotted), may make this determination ambiguous.

For several image regions, the decision of labeling them as face regions or non-face regions remains ambiguous due to factors such as low resolution, occlusion, and head pose (e.g., see Figure 4). One possible approach for handling these ambiguities would be to compute a quantitative measure of the "quality" of the face regions, and reject the image regions with a value below a pre-determined threshold. We were not able, however, to construct a satisfactory set of objective criteria for making this determination. For example, it is difficult to characterize the spatial resolution needed to consider an image patch a face. Similarly, for occluded face regions, while a threshold based on the fraction of the face pixels visible could be used as a criterion, it can be argued that some parts of the face (e.g., the eyes) are more informative than other parts. Also, note that for the current set of images, all of the regions with faces looking away from the camera have been labeled as non-face regions. In other words, a region is labeled as a face only if the angle between the nose (specified as pointing radially outward, perpendicular to the head) and the ray from the camera to the person's head is less than 90 degrees. Estimating this angle precisely from an image is difficult.

Due to the lack of an objective criterion for including (or excluding) a face region, we resort to human judgments for this decision. Since a single human decision for determining the label for some image regions is likely to be inconsistent, we used an approach based on the agreement statistics among multiple human annotators. All of these face regions were presented to different people through a web interface to obtain multiple independent decisions about the validity of these image regions as face regions. The annotators were instructed to reject face regions for which neither of the two eyes (or glasses) was visible in the image. They were also requested to reject a face region if they were unable to (qualitatively) estimate its position, size, or orientation. The guidelines provided to the annotators are described in Appendix A.

5.1. Elliptical Face Regions

Figure 5. Shape of a human head. The shape of a human head (left) can be approximated as the union of two ellipsoids (right). We refer to these as the vertical and horizontal ellipsoids.

As shown in Figure 5,² the shape of a human head can be approximated using two three-dimensional ellipsoids. We call these ellipsoids the vertical and horizontal ellipsoids. Since the horizontal ellipsoid provides little information about the features of the face region, we estimate a 2D ellipse for the orthographic projection of the hypothesized vertical ellipsoid in the image plane. We believe that the resulting representation of a face region as an ellipse provides a more accurate specification than a bounding box without introducing any additional parameters.

We specified each face region using an ellipse parameterized by the location of its center, the lengths of its major and minor axes, and its orientation. Since a 2D orthographic projection of the human face is often not elliptical, fitting an ellipse around the face regions in an image is challenging. To make consistent annotations for all the faces in our data set, the human annotators were instructed to follow the guidelines shown in Figure 6. The details of our specifications are included in Appendix A. Figure 7 shows some sample annotations. The next step is to produce a consistent and reasonable evaluation criterion.

² Reproduced with permission from Dimitar Nikolov, Lead Animator, Haemimont Games.

Figure 6. Guidelines for drawing ellipses around face regions. The extreme points of the major axis of the ellipse are respectively matched to the chin and the topmost point of the hypothetical vertical ellipsoid used for approximating the human head (see Figure 5). Note that this ellipse does not include the ears. Also, for a non-frontal face, at least one of the lateral extremes (left or right) of this ellipse is matched to the boundary between the face region and the corresponding (left or right) ear.

Figure 7. Sample annotations. The two red ellipses specify the location of the two faces present in this image. Note that for a non-frontal face (right), the ellipse traces the boundary between the face and the visible ear. As a result, the elliptical region includes pixels that are not a part of the face.

6. Evaluation

To establish an evaluation criterion for detection algorithms, we first specify some assumptions we make about their outputs. We assume that

• A detection corresponds to a contiguous image region.

• Any post-processing required to merge overlapping or similar detections has already been done.

• Each detection corresponds to exactly one entire face, no more, no less. In other words, a detection cannot be considered to detect two faces at once, and two detections cannot be used together to detect a single face.

We further argue that if an algorithm detects multiple disjoint parts of a face as separate detections, only one of them should contribute towards a positive detection and the remaining detections should be considered as false positives.

To represent the degree of match between a detection d_i and an annotated region l_j, we employ the commonly used ratio of intersected area to joined area:

    S(d_i, l_j) = (area(d_i) ∩ area(l_j)) / (area(d_i) ∪ area(l_j)).    (2)
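A minimal sketch of Equation 2, assuming both regions have been rasterized to boolean pixel masks over the same image grid (one straightforward, though not the only, way to evaluate this ratio for an ellipse and a rectangle):

```python
import numpy as np

def region_similarity(mask_d, mask_l):
    """Equation 2: area of the intersection over area of the union.

    mask_d, mask_l: boolean arrays of the same shape marking the pixels
    belonging to a detection d_i and an annotated region l_j.
    """
    inter = np.logical_and(mask_d, mask_l).sum()
    union = np.logical_or(mask_d, mask_l).sum()
    return inter / union if union > 0 else 0.0
```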

To specify a more accurate annotation for the image regions corresponding to human faces than is obtained with the commonly used rectangular regions, we define an elliptical region around the pixels corresponding to these faces. While this representation is not as accurate as a pixel-level annotation, it is a clear improvement over the rectangular annotations in existing data sets.

To facilitate manual labeling, we start with an automated guess about face locations. To estimate the elliptical boundary for a face region, we first apply a skin classifier to the image pixels that uses their hue and saturation values. Next, the holes in the resulting face region are filled using a flood-fill implementation in MATLAB. Finally, a moments-based fit is performed on this region to obtain the parameters of the desired ellipse. The parameters of all of these ellipses are manually verified and adjusted in the final stage.
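A rough sketch of the moments-based fit mentioned above: given a filled binary face mask, the first-order moments give the ellipse center, and the eigen-decomposition of the second-order central moments gives the axis half-lengths and orientation. The skin classifier is not reproduced here, and the 2*sqrt(eigenvalue) scaling assumes a uniform-density elliptical region; this is an illustrative reconstruction in Python, not the authors' MATLAB code.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def ellipse_from_mask(skin_mask):
    """Moments-based ellipse fit to a binary face-region mask.

    Returns (r_a, r_b, theta, c_x, c_y): half-lengths of the major and
    minor axes, orientation of the major axis (radians, w.r.t. the
    horizontal axis), and the ellipse center.
    """
    mask = binary_fill_holes(skin_mask)             # fill holes (eyes, mouth, ...)
    ys, xs = np.nonzero(mask)

    c_x, c_y = xs.mean(), ys.mean()                 # first-order moments -> center
    cov = np.cov(np.stack([xs - c_x, ys - c_y]))    # second-order central moments
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues

    # A uniform ellipse with these second moments has half-axes 2*sqrt(eigenvalue).
    r_b, r_a = 2.0 * np.sqrt(eigvals)
    major = eigvecs[:, 1]                           # direction of the major axis
    theta = np.arctan2(major[1], major[0])
    return r_a, r_b, theta, c_x, c_y
```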

6.1. Matching detections and annotations

A major remaining question is how to establish a correspondence between a set of detections and a set of annotations. While for very good results on a given image this problem is easy, it can be subtle and tricky for large numbers of false positives or multiple overlapping detections (see Figure 8 for an example). Below, we formulate this problem of matching annotations and detections as finding a maximum weighted matching in a bipartite graph (as shown in Figure 9).

Figure 8. Matching detections and annotations. In this image, the ellipses specify the face annotations and the five rectangles denote a face detector's output. Note that the second face from the left has two detections overlapping with it. We require a valid matching to accept only one of these detections as the true match, and to consider the other detection as a false positive. Also, note that the third face from the left has no detection overlapping with it, so no detection should be matched with this face. The blue rectangles denote the true positives and the yellow rectangles denote the false positives in the desired matching.

Figure 9. Maximum weight matching in a bipartite graph. We make an injective (one-to-one) mapping from the set of detected image regions d_i to the set of image regions l_i annotated as face regions. The property of the resulting mapping is that it maximizes the cumulative similarity score for all the detected image regions.

Let L be the set of annotated face regions (or labels) and D be the set of detections. We construct a graph G with the set of nodes V = L ∪ D. Each detection node d_i is connected to each label l_j ∈ L with an edge weight w_ij equal to the score computed in Equation 2. For each detection d_i ∈ D, we further introduce a node n_i to correspond to the case when this detection d_i has no matching face region in L.

A matching of detections to face regions in this graph corresponds to the selection of a set of edges M ⊆ E. In the desired matching of nodes, we want every detection to be matched to at most one labeled face region, and every labeled face region to be matched to at most one detection. Note that the nodes n_k have a degree equal to one, so they can be connected to at most one detection through M as well. Mathematically, the desired matching M maximizes the cumulative matching score while satisfying the following constraints:

    ∀ d ∈ D, ∃ l ∈ L ∪ N : d →_M l,    (3)
    ∀ l ∈ L, ∄ d, d′ ∈ D, d ≠ d′ : (d →_M l) ∧ (d′ →_M l),    (4)

where N denotes the set of "no-match" nodes n_i, and d →_M l denotes that the edge (d, l) belongs to M. The determination of the maximum weight matching in a weighted bipartite graph has an equivalent dual formulation as finding the solution of the minimum weighted (vertex) cover problem on a related graph. This dual formulation is exploited by the Hungarian algorithm [11] to obtain the solution for the former problem. For a given image, we employ this method to determine the matching between detections and ground-truth annotations. The resulting similarity score is used for evaluating the performance of the detection algorithm on this image.
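As an illustration, this matching can be computed with an off-the-shelf assignment solver. The sketch below uses SciPy's linear_sum_assignment (a Hungarian-style algorithm) on the similarity matrix from Equation 2, augmented with one zero-weight column per detection to play the role of the no-match nodes n_i; this is a reformulation for illustration only, not the released C++ evaluation code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(scores):
    """Match detections to annotated face regions.

    scores: (num_detections x num_labels) array, where scores[i, j] is the
    similarity S(d_i, l_j) from Equation 2.
    Returns a dict mapping detection index -> (label index or None, score).
    """
    n_det, n_lab = scores.shape
    # Augment with one zero-weight "no-match" node n_i per detection.
    augmented = np.hstack([scores, np.zeros((n_det, n_det))])
    # linear_sum_assignment minimizes cost, so negate to maximize total score.
    rows, cols = linear_sum_assignment(-augmented)

    matching = {}
    for i, j in zip(rows, cols):
        if j < n_lab and scores[i, j] > 0:
            matching[i] = (j, scores[i, j])      # matched to an annotated face
        else:
            matching[i] = (None, 0.0)            # false positive (no-match node)
    return matching
```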

6.2. Evaluation metrics

Let d_i and v_i denote the i-th detection and the corresponding matching node in the matching M obtained by the algorithm described in Section 6.1, respectively. We propose the following two metrics for specifying the score y_i for this detection:

• Discrete score (DS): y_i = δ_{S(d_i, v_i) > 0.5}.

• Continuous score (CS): y_i = S(d_i, v_i).
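A minimal sketch of both scores, assuming the detection-to-annotation matching has already been computed (for instance, by the matching sketch in Section 6.1):

```python
def detection_scores(matching, threshold=0.5):
    """Compute the discrete (DS) and continuous (CS) scores per detection.

    matching: dict mapping detection index -> (matched label or None,
    similarity score S(d_i, v_i)), as returned by match_detections().
    """
    discrete, continuous = {}, {}
    for i, (label, s) in matching.items():
        discrete[i] = 1.0 if s > threshold else 0.0   # DS: delta(S(d_i, v_i) > 0.5)
        continuous[i] = s                             # CS: S(d_i, v_i)
    return discrete, continuous
```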

For both of these choices of scoring the detections, we recommend analyzing the Receiver Operating Characteristic (ROC) curves to compare the performance of different approaches on this data set. Although comparing the area under the ROC curve is equivalent to a non-parametric statistical hypothesis test (the Wilcoxon signed-rank test), it is plausible that the cumulative performance of none of the compared approaches is better than the rest with statistical significance. Furthermore, it is likely that for some range of performance one approach could outperform another, whereas the relative comparison is reversed for a different range. For instance, one detection algorithm might be able to maintain a high level of precision for low recall values, but the precision drops sharply after a point. This trend may suggest that this detector would be useful for application domains such as biometrics-based access controls, which may require high precision values but can tolerate low recall levels. The same detector may not be useful in a setting (e.g., surveillance) that requires the retrieval of all the faces in an image or scene. Hence, the analysis of the entire range of the ROC curves should be done to determine the strengths of different approaches.

7. Experimental Setup

For an accurate and useful comparison of different approaches, we recommend a distinction based on the training data used for estimating their parameters. In particular, we propose the following experiments:

EXP-1: 10-fold cross-validation

For this experiment, a 10-fold cross-validation is performed using a fixed partitioning of the data set into ten folds.³ The cumulative performance is reported as the average curve of the ten ROC curves, each of which is obtained using a different fold as the validation set.

EXP-2: Unrestricted training

For this experiment, data outside the FDDB data set is permitted to be included in the training set. The above-mentioned ten folds of the data set are separately used as validation sets to obtain ten different ROC curves. The cumulative performance is reported as the average curve of these ten ROC curves.
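One plausible reading of "the average curve of the ten ROC curves" is to interpolate each fold's curve onto a common grid of false-positive counts and average the true-positive rates, as sketched below; the exact averaging used for the official curves is defined by the released evaluation code, so treat this as an assumption.

```python
import numpy as np

def average_roc(fold_curves, num_points=200):
    """Average per-fold ROC curves over a common grid of false-positive counts.

    fold_curves: list of (false_positives, true_positive_rate) array pairs,
    one pair per fold, each sorted by increasing false-positive count.
    """
    max_fp = max(fp.max() for fp, _ in fold_curves)
    grid = np.linspace(0, max_fp, num_points)
    tprs = [np.interp(grid, fp, tpr) for fp, tpr in fold_curves]
    return grid, np.mean(tprs, axis=0)
```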

8. Benchmark

For a proper use of our data set, we provide the implementation (C++ source code) of the algorithms for matching detections and annotations (Section 6.1) and for computing the resulting scores (Section 6.2) to generate the performance curves at http://vis-www.cs.umass.edu/fddb/results.html. To use our software, the user needs to create a file containing a list of the detections output by their detector. The format of this input file is described in Appendix B.

In Figure 10, we present the results for the following approaches under the above-mentioned EXP-2 experimental setting:

• Viola-Jones detector [26] – we used the OpenCV⁴ implementation of this approach. We set the scale-factor and minimum-number-of-neighbors parameters to 1.2 and 0, respectively.

• Mikolajczyk's face detector [14]⁵ – we set the parameter for the minimum distance between eyes in a detected face to 5 pixels.

• Kienzle et al.'s [10] face detection library (fdlib⁶).

³ The ten folds used in the proposed experiments are available at http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
⁴ http://sourceforge.net/projects/opencvlibrary/
⁵ http://www.robots.ox.ac.uk/~vgg/research/affine/face_detectors.html
⁶ http://www.kyb.mpg.de/bs/people/kienzle/fdlib/fdlib.htm

Figure 10. FDDB baselines. (a) ROC curves based on the discrete score (DS); (b) ROC curves based on the continuous score (CS). These are the ROC curves for different face detection algorithms. Both of these scores (DS and CS) are described in Section 6.2, whereas the implementation details of these algorithms are included in Section 8.

As seen in Figure 10, the number of false positives obtained from all of these face detection systems increases rapidly as the true positive rate increases. Note that the performance of all of these systems on the new benchmark is much worse than on the previous benchmarks, where they obtain fewer than 100 false positives at a true positive rate of 0.9. Also note that although our data set includes images of frontal and non-frontal faces, the above experiments are limited to approaches that were developed for frontal face detection. This limitation is due to the unavailability of a public implementation of a multi-pose or pose-invariant face detection system. Nevertheless, the new benchmark includes more challenging examples of face appearances than the previous benchmarks. We hope that our benchmark will further prompt researchers to explore new research directions in face detection.

Acknowledgements

We thank Allen Hanson, Andras Ferencz, Jacqueline Feild, and Gary Huang for useful discussions and suggestions. This work was supported by the National Science Foundation under CAREER award IIS-0546666. Any opinions, findings, and conclusions or recommendations expressed in this material are the authors' and do not necessarily reflect those of the sponsor.

References

[1] A. S. Abdallah, M. A. El-Nasr, and A. L. Abbott. A new color image database for benchmarking of automatic face detection and human skin segmentation techniques (to appear). In International Conference on Machine Learning and Pattern Recognition, 2007.

[2] T. L. Berg, A. C. Berg, J. Edwards, M. Maire, R. White, Y. W. Teh, E. Learned-Miller, and D. A. Forsyth. Names and faces in the news. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 848-854, 2004.

[3] O. Chum, J. Philbin, M. Isard, and A. Zisserman. Scalable near identical image and shot detection. In ACM International Conference on Image and Video Retrieval, pages 549-556, New York, NY, USA, 2007. ACM.

[4] O. Chum, J. Philbin, and A. Zisserman. Near duplicate image detection: min-hash and tf-idf weighting. In British Machine Vision Conference, 2008.

[5] J. J. Foo, J. Zobel, R. Sinha, and S. M. M. Tahaghoghi. Detection of near-duplicate images for web search. In ACM International Conference on Image and Video Retrieval, pages 557-564, New York, NY, USA, 2007. ACM.

[6] R.-L. Hsu, M. Abdel-Mottaleb, and A. Jain. Face detection in color images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):696-706, May 2002.

[7] C. Huang, H. Ai, Y. Li, and S. Lao. High-performance rotation invariant multiview face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4):671-686, 2007.

[8] B. H. Jeon, S. U. Lee, and K. M. Lee. Rotation invariant face detection using a model-based clustering algorithm. In IEEE International Conference on Multimedia and Expo, volume 2, pages 1149-1152, 2000.

[9] M. J. Jones and P. A. Viola. Fast multi-view face detection. Technical Report TR2003-96, Mitsubishi Electric Research Laboratories, August 2003.

[10] W. Kienzle, G. H. Bakır, M. O. Franz, and B. Schölkopf. Face detection - efficient and rank deficient. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems, pages 673-680, Cambridge, MA, 2005. MIT Press.

[11] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97, 1955.

[12] S. Z. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, and H. Shum. Statistical learning of multi-view face detection. In European Conference on Computer Vision, pages 67-81, London, UK, 2002. Springer-Verlag.

[13] A. Loui, C. Judice, and S. Liu. An image database for benchmarking of automatic face detection and recognition algorithms. In IEEE International Conference on Image Processing, volume 1, pages 146-150, Oct 1998.

[14] K. Mikolajczyk, C. Schmid, and A. Zisserman. Human detection based on a probabilistic assembly of robust part detectors. In European Conference on Computer Vision, pages 69-82, 2004.

[15] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849-856. MIT Press, 2001.

[16] M. Osadchy, Y. LeCun, and M. L. Miller. Synergistic face detection and pose estimation with energy-based models. Journal of Machine Learning Research, 8:1197-1215, 2007.

[17] J. Rihan, P. Kohli, and P. Torr. OBJCUT for face detection. In Indian Conference on Computer Vision, Graphics and Image Processing, pages 576-584, 2006.

[18] H. A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23-38, January 1998.

[19] H. A. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In IEEE Conference on Computer Vision and Pattern Recognition, page 38, Washington, DC, USA, 1998. IEEE Computer Society.

[20] H. Schneiderman and T. Kanade. Probabilistic modeling of local appearance and spatial relationships for object recognition. In IEEE Conference on Computer Vision and Pattern Recognition, page 45, Washington, DC, USA, 1998. IEEE Computer Society.

[21] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 746-751, 2000.

[22] M. Seshadrinathan and J. Ben-Arie. Pose invariant face detection. In 4th EURASIP Conference focused on Video/Image Processing and Multimedia Communications, volume 1, pages 405-410, July 2003.

[23] P. Sharma and R. Reilly. A colour face image database for benchmarking of automatic face detection algorithms. In EURASIP Conference focused on Video/Image Processing and Multimedia Communications, volume 1, pages 423-428, July 2003.

[24] K.-K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):39-51, 1998.

[25] http://mplab.ucsd.edu. The MPLab GENKI Database, GENKI-4K Subset.

[26] P. A. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137-154, May 2004.

[27] P. Wang and Q. Ji. Multi-view face and eye detection using discriminant features. Computer Vision and Image Understanding, 105(2):99-111, 2007.

[28] M.-H. Yang, D. J. Kriegman, and N. Ahuja. Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):34-58, 2002.

[29] D.-Q. Zhang and S.-F. Chang. Detecting image near-duplicate by stochastic attributed relational graph matching with learning. In ACM International Conference on Multimedia, pages 877-884, 2004.

A. Guidelines for annotating faces using ellipses

To ensure consistency across multiple human annotators, we developed a set of instructions (shown in Figure 11). These instructions specify how to use facial landmarks to fit an ellipse depending on the pose of the head. Figure 12 presents an illustration of the resulting ellipses on line drawings of a human head. The annotators were further instructed to follow a combination of these guidelines to fit ellipses to faces with complex head poses.

The illustrations shown in Figure 12 use faces with neutral expressions. The presence of some expressions, such as laughter, often changes the shape of the face significantly. Moreover, even bearing a neutral expression, some faces have shapes markedly different from the average face shape used in these illustrations. Such faces (e.g., faces with a square jaw or a double chin) are difficult to approximate using ellipses. To annotate faces with such complexities, the annotators were instructed to refer to the following guidelines:

• Facial expression. Since the distance from the eyes to the chin in a face with a facial expression is not necessarily equal to the distance between the eyes and the top of the head (an assumption made for the ideal head), the eyes do not need to be aligned to the minor axis for this face.

• Double chin. For faces with a double chin, the average of the two chins is considered as the lowest point of the face, and is matched to the bottom extreme of the major axis of the ellipse.

• Square jaw. For a face with a square jaw, the ellipse traces the boundary between the face and the ears, while some part of the jaw may be excluded from the ellipse.

Figure 11. Procedure for drawing ellipses around an average face region. The annotators were instructed to follow this flowchart to draw ellipses around the face regions. The annotation steps are a little different for different poses. Here, we present the steps for three canonical poses: frontal, profile, and tilted back/front. The annotators were instructed to use a combination of these steps for labeling faces with derived, intermediate head poses. For instance, to label a head facing slightly towards its right and tilted back, a combination of the steps corresponding to the profile and tilted-back poses is used.

Figure 12. Illustrations of ellipse labeling on line drawings of a human head. The black curves show the boundaries of a human head in frontal (left), profile (center), and tilted-back (right) poses. The red ellipses illustrate the desired annotations as per the procedure shown in Figure 11. Note that these head shapes are approximations to an average human head, and the shape of an actual human head may deviate from this mean shape. The shape of a human head may also be affected by factors such as emotions. The guidelines on annotating face regions influenced by these factors are specified in Appendix A.

• Hair. Ignore the hair and fit the ellipse around the hypothetical bald head.

• Occlusion. Hypothesize the full face behind the occluding object, and match all of the visible features.

Figure 13. Illustrations of labeling for complex face appearances. These images show example annotations for human heads with shapes different from an average human head due to the presence of facial expression, double chin, square jaw, hair-do, and occlusion, respectively.

Figure 13 shows some example annotations for complex face shapes.

B. Data formats

The original set of images can be downloaded as originalPics.tar.gz from http://tamaraberg.com/faceDataset/. Uncompressing this tar file organizes the images as originalPics/year/month/day/big/*.jpg. The ten folds described in the EXP-1 experiments (Section 7) are available at http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz. Uncompressing the FDDB-folds.tgz file creates a directory FDDB-folds, which contains files with names FDDB-fold-xx.txt and FDDB-fold-xx-ellipseList.txt, where xx = {01, 02, ..., 10} represents the fold index.

Each line in the FDDB-fold-xx.txt file specifies a path to an image in the above-mentioned data set. For instance, the entry 2002/07/19/big/img130 corresponds to originalPics/2002/07/19/big/img130.jpg. The corresponding annotations are included in the file FDDB-fold-xx-ellipseList.txt. These annotations are specified according to the format shown in Table 1. Each of the annotated face regions is represented as an elliptical region, which is denoted by a 6-tuple

    (r_a, r_b, θ, c_x, c_y, 1),    (5)

where r_a and r_b refer to the half-lengths of the major and minor axes; θ is the angle of the major axis with the horizontal axis; and c_x and c_y are the x and y coordinates of the center of this ellipse.

    ...
    name of the i-th image
    number of faces in the i-th image = m
    face f_1
    face f_2
    ...
    face f_m
    ...

Table 1. Format used for the specification of annotations and detections.

The detection output should also follow the format described in Table 1. The representation of each detected face region, however, can be either a rectangle or an ellipse. The exact specification for these two types of representations is as follows:

• Rectangular regions

Each face region is represented as a 5-tuple

    (x, y, w, h, s),    (6)

where x, y are the coordinates of the top-left corner; w and h are the width and height; and s ∈ (-∞, ∞) is the confidence score associated with the detection of this rectangular region.

• Elliptical regions

Each face region is represented as a 6-tuple

    (r_a, r_b, θ, c_x, c_y, s),    (7)

where r_a and r_b refer to the half-lengths of the major and minor axes; θ is the angle of the major axis with the horizontal axis; c_x and c_y are the x and y coordinates of the center; and s ∈ (-∞, ∞) is the confidence score associated with the detection of this elliptical region.

Note that the order of images in the output file is expected to be the same as the order in the file annotatedList.txt.
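For reference, a small parser for the FDDB-fold-xx-ellipseList.txt layout described above (image name, face count, then one face entry per line) might look like the sketch below; the function name and the returned structure are illustrative choices, not part of the official tools.

```python
def read_ellipse_list(path):
    """Parse an FDDB-fold-xx-ellipseList.txt file.

    Returns a dict mapping each image name (e.g. 2002/07/19/big/img130)
    to a list of ellipses, each given as (r_a, r_b, theta, c_x, c_y, score).
    """
    annotations = {}
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    i = 0
    while i < len(lines):
        image_name = lines[i]                 # path relative to originalPics, no extension
        num_faces = int(lines[i + 1])         # number of faces in this image
        faces = []
        for k in range(num_faces):
            fields = lines[i + 2 + k].split()
            faces.append(tuple(float(v) for v in fields[:6]))
        annotations[image_name] = faces
        i += 2 + num_faces
    return annotations
```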
