Microsoft COCO: Common Objects in Context

Tsung-Yi Lin Michael Maire Serge Belongie Lubomir Bourdev Ross Girshick James Hays

Pietro Perona Deva Ramanan

C. Lawrence Zitnick  Piotr Dollár

Abstract—We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4 year old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.


1 INTRODUCTION

One of the primary goals of computer vision is the understanding of visual scenes. Scene understanding involves numerous tasks including recognizing what objects are present, localizing the objects in 2D and 3D, determining the objects' and scene's attributes, characterizing relationships between objects and providing a semantic description of the scene. The current object classification and detection datasets [1], [2], [3], [4] help us explore the first challenges related to scene understanding. For instance the ImageNet dataset [1], which contains an unprecedented number of images, has recently enabled breakthroughs in both object classification and detection research [5], [6], [7]. The community has also created datasets containing object attributes [8], scene attributes [9], keypoints [10], and 3D scene information [11]. This leads us to the obvious question: what datasets will best continue our advance towards our ultimate goal of scene understanding?

We introduce a new large-scale dataset that addresses three core research problems in scene understanding: detecting non-iconic views (or non-canonical perspectives [12]) of objects, contextual reasoning between objects and the precise 2D localization of objects. For many categories of objects, there exists an iconic view. For example, when performing a web-based image search for the object category "bike," the top-ranked retrieved examples appear in profile, unobstructed near the center of a neatly composed photo. We posit that current recognition systems perform fairly well on iconic views, but struggle to recognize objects otherwise – in the background, partially occluded, amid clutter [13] – reflecting the composition of actual everyday scenes.

• T.-Y. Lin and S. Belongie are with Cornell NYC Tech and the Cornell Computer Science Department.
• M. Maire is with the Toyota Technological Institute at Chicago.
• L. Bourdev and P. Dollár are with Facebook AI Research. The majority of this work was performed while P. Dollár was with Microsoft Research.
• R. Girshick and C. L. Zitnick are with Microsoft Research, Redmond.
• J. Hays is with Brown University.
• P. Perona is with the California Institute of Technology.
• D. Ramanan is with the University of California at Irvine.

Fig. 1: While previous object recognition datasets have focused on (a) image classification, (b) object bounding box localization or (c) semantic pixel-level segmentation, we focus on (d) segmenting individual object instances. We introduce a large, richly-annotated dataset comprised of images depicting complex everyday scenes of common objects in their natural context.

We verify this experimentally; when evaluated on everyday scenes, models trained on our data perform better than those trained with prior datasets. A challenge is finding natural images that contain multiple objects. The identity of many objects can only be resolved using context, due to small size or ambiguous appearance in the image. To push research in contextual reasoning, images depicting scenes [3] rather than objects in isolation are necessary. Finally, we argue that detailed spatial understanding of object layout will be a core component of scene analysis. An object's spatial location can be defined coarsely using a bounding box [2] or with a precise pixel-level segmentation [14], [15], [16]. As we demonstrate, to measure either kind of localization performance it is essential for the dataset to have every instance of every object category labeled and fully segmented. Our dataset is unique in its annotation of instance-level segmentation masks, Fig. 1.

To create a large-scale dataset that accomplishes these three goals we employed a novel pipeline for gathering data with extensive use of Amazon Mechanical Turk. First and most importantly, we harvested a large set of images containing contextual relationships and non-iconic object views. We accomplished this using a surprisingly simple yet effective technique that queries for pairs of objects in conjunction with images retrieved via scene-based queries [17], [3]. Next, each image was labeled as containing particular object categories using a hierarchical labeling approach [18]. For each category found, the individual instances were labeled, verified, and finally segmented. Given the inherent ambiguity of labeling, each of these stages has numerous tradeoffs that we explored in detail.

The Microsoft Common Objects in COntext (MS COCO) dataset contains 91 common object categories with 82 of them having more than 5,000 labeled instances, Fig. 6. In total the dataset has 2,500,000 labeled instances in 328,000 images. In contrast to the popular ImageNet dataset [1], COCO has fewer categories but more instances per category. This can aid in learning detailed object models capable of precise 2D localization. The dataset is also significantly larger in number of instances per category than the PASCAL VOC [2] and SUN [3] datasets. Additionally, a critical distinction between our dataset and others is the number of labeled instances per image which may aid in learning contextual information, Fig. 5. MS COCO contains considerably more object instances per image (7.7) as compared to ImageNet (3.0) and PASCAL (2.3). In contrast, the SUN dataset, which contains significant contextual information, has over 17 objects and "stuff" per image but considerably fewer object instances overall.

An abridged version of this work appeared in [19].

2 RELATED WORK

Throughout the history of computer vision research datasets have played a critical role. They not only provide a means to train and evaluate algorithms, they drive research in new and more challenging directions. The creation of ground truth stereo and optical flow datasets [20], [21] helped stimulate a flood of interest in these areas. The early evolution of object recognition datasets [22], [23], [24] facilitated the direct comparison of hundreds of image recognition algorithms while simultaneously pushing the field towards more complex problems. Recently, the ImageNet dataset [1] containing millions of images has enabled breakthroughs in both object classification and detection research using a new class of deep learning algorithms [5], [6], [7].

Datasets related to object recognition can be roughly split into three groups: those that primarily address object classification, object detection and semantic scene labeling. We address each in turn.

Image Classification The task of object classification requires binary labels indicating whether objects are present in an image; see Fig. 1(a). Early datasets of this type comprised images containing a single object with blank backgrounds, such as the MNIST handwritten digits [25] or COIL household objects [26]. Caltech 101 [22] and Caltech 256 [23] marked the transition to more realistic object images retrieved from the internet while also increasing the number of object categories to 101 and 256, respectively. Popular datasets in the machine learning community due to the larger number of training examples, CIFAR-10 and CIFAR-100 [27] offered 10 and 100 categories from a dataset of tiny 32×32 images [28]. While these datasets contained up to 60,000 images and hundreds of categories, they still only captured a small fraction of our visual world.

Recently, ImageNet [1] made a striking departure from the incremental increase in dataset sizes. They proposed the creation of a dataset containing 22k categories with 500-1000 images each. Unlike previous datasets containing entry-level categories [29], such as "dog" or "chair," like [28], ImageNet used the WordNet hierarchy [30] to obtain both entry-level and fine-grained [31] categories. Currently, the ImageNet dataset contains over 14 million labeled images and has enabled significant advances in image classification [5], [6], [7].

Object detection Detecting an object entails both stating that an object belonging to a specified class is present, and localizing it in the image. The location of an object is typically represented by a bounding box, Fig. 1(b). Early algorithms focused on face detection [32] using various ad hoc datasets. Later, more realistic and challenging face detection datasets were created [33]. Another popular challenge is the detection of pedestrians for which several datasets have been created [24], [4]. The Caltech Pedestrian Dataset [4] contains 350,000 labeled instances with bounding boxes.

For the detection of basic object categories, a multi-year effort from 2005 to 2012 was devoted to the creation and maintenance of a series of benchmark datasets that were widely adopted. The PASCAL VOC [2] datasets contained 20 object categories spread over 11,000 images. Over 27,000 object instance bounding boxes were labeled, of which almost 7,000 had detailed segmentations. Recently, a detection challenge has been created from 200 object categories using a subset of 400,000 images from ImageNet [34]. An impressive 350,000 objects have been labeled using bounding boxes.

Since the detection of many objects such as sunglasses, cellphones or chairs is highly dependent on contextual information, it is important that detection datasets contain objects in their natural environments. In our dataset we strive to collect images rich in contextual information. The use of bounding boxes also limits the accuracy with which detection algorithms may be evaluated. We propose the use of fully segmented instances to enable more accurate detector evaluation.

Fig. 2: Example of (a) iconic object images, (b) iconic scene images, and (c) non-iconic images.

Semantic scene labeling The task of labeling semantic objects in a scene requires that each pixel of an image be labeled as belonging to a category, such as sky, chair, floor, street, etc. In contrast to the detection task, individual instances of objects do not need to be segmented, Fig. 1(c). This enables the labeling of objects for which individual instances are hard to define, such as grass, streets, or walls. Datasets exist for both indoor [11] and outdoor [35], [14] scenes. Some datasets also include depth information [11]. Similar to semantic scene labeling, our goal is to measure the pixel-wise accuracy of object labels. However, we also aim to distinguish between individual instances of an object, which requires a solid understanding of each object's extent.

A novel dataset that combines many of the properties of both object detection and semantic scene labeling datasets is the SUN dataset [3] for scene understanding. SUN contains 908 scene categories from the WordNet dictionary [30] with segmented objects. The 3,819 object categories span those common to object detection datasets (person, chair, car) and to semantic scene labeling (wall, sky, floor). Since the dataset was collected by finding images depicting various scene types, the number of instances per object category exhibits the long tail phenomenon. That is, a few categories have a large number of instances (wall: 20,213, window: 16,080, chair: 7,971) while most have a relatively modest number of instances (boat: 349, airplane: 179, floor lamp: 276). In our dataset, we ensure that each object category has a significant number of instances, Fig. 5.

Other vision datasets Datasets have spurred the advancement of numerous fields in computer vision. Some notable datasets include the Middlebury datasets for stereo vision [20], multi-view stereo [36] and optical flow [21]. The Berkeley Segmentation Data Set (BSDS500) [37] has been used extensively to evaluate both segmentation and edge detection algorithms. Datasets have also been created to recognize both scene [9] and object attributes [8], [38]. Indeed, numerous areas of vision have benefited from challenging datasets that helped catalyze progress.

3 IMAGE COLLECTION

We next describe how the object categories and candidate images are selected.

3.1 Common Object Categories

The selection of object categories is a non-trivial exercise. The categories must form a representative set of all categories, be relevant to practical applications and occur with high enough frequency to enable the collection of a large dataset. Other important decisions are whether to include both "thing" and "stuff" categories [39] and whether fine-grained [31], [1] and object-part categories should be included. "Thing" categories include objects for which individual instances may be easily labeled (person, chair, car) whereas "stuff" categories include materials and objects with no clear boundaries (sky, street, grass). Since we are primarily interested in precise localization of object instances, we decided to only include "thing" categories and not "stuff." However, since "stuff" categories can provide significant contextual information, we believe the future labeling of "stuff" categories would be beneficial.

The specificity of object categories can vary significantly. For instance, a dog could be a member of the "mammal", "dog", or "German shepherd" categories. To enable the practical collection of a significant number of instances per category, we chose to limit our dataset to entry-level categories, i.e. category labels that are commonly used by humans when describing objects (dog, chair, person). It is also possible that some object categories may be parts of other object categories. For instance, a face may be part of a person. We anticipate the inclusion of object-part categories (face, hands, wheels) would be beneficial for many real-world applications.

Fig. 3: Our annotation pipeline is split into 3 primary tasks: (a) labeling the categories present in the image (§4.1), (b) locating and marking all instances of the labeled categories (§4.2), and (c) segmenting each object instance (§4.3).

We used several sources to collect entry-level object categories of "things." We first compiled a list of categories by combining categories from PASCAL VOC [2] and a subset of the 1200 most frequently used words that denote visually identifiable objects [40]. To further augment our set of candidate categories, several children ranging in ages from 4 to 8 were asked to name every object they see in indoor and outdoor environments. The final 272 candidates may be found in the appendix. Finally, the co-authors voted on a 1 to 5 scale for each category taking into account how commonly they occur, their usefulness for practical applications, and their diversity relative to other categories. The final selection of categories attempts to pick categories with high votes, while keeping the number of categories per super-category (animals, vehicles, furniture, etc.) balanced. Categories for which obtaining a large number of instances (greater than 5,000) was difficult were also removed. To ensure backwards compatibility all categories from PASCAL VOC [2] are also included. Our final list of 91 proposed categories is in Fig. 5(a).
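The paragraph above describes the selection heuristic only informally; the sketch below is a hypothetical rendering of it in Python. The field names (mean_vote, est_instances, in_pascal, super_category), the per-super-category cap, and the tie-breaking are assumptions for illustration, not the authors' actual procedure.

```python
from collections import Counter

def select_categories(candidates, min_instances=5000, max_per_super=12):
    """Hypothetical sketch of the selection heuristic described above:
    prefer high-vote categories, always keep PASCAL VOC categories,
    drop categories unlikely to yield >= 5,000 instances, and cap each
    super-category so the final set stays balanced."""
    per_super = Counter()
    selected = []
    # Visit candidates from highest to lowest average co-author vote.
    for c in sorted(candidates, key=lambda c: c["mean_vote"], reverse=True):
        keep = c["in_pascal"] or (
            c["est_instances"] >= min_instances
            and per_super[c["super_category"]] < max_per_super
        )
        if keep:
            selected.append(c["name"])
            per_super[c["super_category"]] += 1
    return selected
```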

3.2 Non-iconic Image Collection

Given the list of object categories, our next goal was to collect a set of candidate images. We may roughly group images into three types, Fig. 2: iconic-object images [41], iconic-scene images [3] and non-iconic images. Typical iconic-object images have a single large object in a canonical perspective centered in the image, Fig. 2(a). Iconic-scene images are shot from canonical viewpoints and commonly lack people, Fig. 2(b). Iconic images have the benefit that they may be easily found by directly searching for specific categories using Google or Bing image search. While iconic images generally provide high quality object instances, they can lack important contextual information and non-canonical viewpoints. Our goal was to collect a dataset such that a majority of images are non-iconic, Fig. 2(c). It has been shown that datasets containing more non-iconic images are better at generalizing [42]. We collected non-iconic images using two strategies. First, as popularized by PASCAL VOC [2], we collected images from Flickr, which tends to have fewer iconic images. Flickr contains photos uploaded by amateur photographers with searchable metadata and keywords. Second, we did not search for object categories in isolation. A search for "dog" will tend to return iconic images of large, centered dogs. However, if we searched for pairwise combinations of object categories, such as "dog + car", we found many more non-iconic images. Surprisingly, these images typically do not just contain the two categories specified in the search, but numerous other categories as well. To further supplement our dataset we also searched for scene/object category pairs, see the appendix. We downloaded at most 5 photos taken by a single photographer within a short time window. In the rare cases in which enough images could not be found, we searched for single categories and performed an explicit filtering stage to remove iconic images. The result is a collection of 328,000 images with rich contextual relationships between objects as shown in Figs. 2(c) and 6.
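To make the query strategy concrete, here is a minimal sketch of how pairwise object-object and object-scene search strings could be generated; the tiny category lists are placeholders, not the actual 91 object or 40 scene categories.

```python
from itertools import combinations

# Placeholder subsets for illustration only.
object_categories = ["dog", "car", "chair", "bicycle"]
scene_categories = ["kitchen", "beach", "street"]

def build_search_queries(objects, scenes):
    """Object-object pairs plus object-scene pairs, the two query types
    described above for surfacing non-iconic images on Flickr."""
    pair_queries = [f"{a} {b}" for a, b in combinations(objects, 2)]
    scene_queries = [f"{obj} {scene}" for obj in objects for scene in scenes]
    return pair_queries + scene_queries

print(build_search_queries(object_categories, scene_categories)[:5])
# ['dog car', 'dog chair', 'dog bicycle', 'car chair', 'car bicycle']
```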

4 IMAGE ANNOTATION

We next describe how we annotated our image collection. Due to our desire to label over 2.5 million object instances, the design of a cost efficient yet high quality annotation pipeline was critical. The annotation pipeline is outlined in Fig. 3. For all crowdsourcing tasks we used workers on Amazon's Mechanical Turk (AMT). Our user interfaces are described in detail in the appendix. Note that, since the original version of this work [19], we have taken a number of steps to further improve the quality of the annotations. In particular, we have increased the number of annotators for the category labeling and instance spotting stages to eight. We also added a stage to verify the instance segmentations.

4.1 Category Labeling

The first task in annotating our dataset is determining which object categories are present in each image, Fig. 3(a). Since we have 91 categories and a large number of images, asking workers to answer 91 binary classification questions per image would be prohibitively expensive. Instead, we used a hierarchical approach [18].


Fig. 4: Worker precision and recall for the category labeling task. (a) The union of multiple AMT workers (blue) has better recall than any expert (red). Ground truth was computed using majority vote of the experts. (b) Shows the number of workers (circle size) and average number of jobs per worker (circle color) for each precision/recall range. Most workers have high precision; such workers generally also complete more jobs. For this plot ground truth for each worker is the union of responses from all other AMT workers. See §4.4 for details.

We group the object categories into 11 super-categories (see the appendix). For a given image, a worker was presented with each group of categories in turn and asked to indicate whether any instances exist for that super-category. This greatly reduces the time needed to classify the various categories. For example, a worker may easily determine no animals are present in the image without having to specifically look for cats, dogs, etc. If a worker determines instances from the super-category (animal) are present, for each subordinate category (dog, cat, etc.) present, the worker must drag the category's icon onto the image over one instance of the category. The placement of these icons is critical for the following stage. We emphasize that only a single instance of each category needs to be annotated in this stage. To ensure high recall, 8 workers were asked to label each image. A category is considered present if any worker indicated the category; false positives are handled in subsequent stages. A detailed analysis of performance is presented in §4.4. This stage took ~20k worker hours to complete.
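A minimal sketch of the hierarchical flow described above, with the worker prompts replaced by placeholder callables (worker_says_present, worker_marks_instance); it is meant to show why only 11 super-category questions are needed when an image contains few object types, not to reproduce the actual AMT interface.

```python
def label_categories(image, super_categories, worker_says_present, worker_marks_instance):
    """super_categories: dict mapping super-category -> list of categories.
    Returns one seed icon location per category found in the image."""
    labels = {}
    for super_cat, categories in super_categories.items():
        # One question per super-category instead of one per category.
        if not worker_says_present(image, super_cat):
            continue
        for cat in categories:
            # The worker drags the category icon onto a single instance,
            # or skips the category if it is absent (returns None).
            location = worker_marks_instance(image, cat)
            if location is not None:
                labels[cat] = location
    return labels
```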

4.2 Instance Spotting

In the next stage all instances of the object categories in an image were labeled, Fig. 3(b). In the previous stage each worker labeled one instance of a category, but multiple object instances may exist. Therefore, for each image, a worker was asked to place a cross on top of each instance of a specific category found in the previous stage. To boost recall, the location of the instance found by a worker in the previous stage was shown to the current worker. Such priming helped workers quickly find an initial instance upon first seeing the image. The workers could also use a magnifying glass to find small instances. Each worker was asked to label at most 10 instances of a given category per image. Each image was labeled by 8 workers for a total of ~10k worker hours.

4.3 Instance Segmentation

Our final stage is the laborious task of segmenting each object instance, Fig. 3(c). For this stage we modified the excellent user interface developed by Bell et al. [16] for image segmentation. Our interface asks the worker to segment an object instance specified by a worker in the previous stage. If other instances have already been segmented in the image, those segmentations are shown to the worker. A worker may also indicate there are no object instances of the given category in the image (implying a false positive label from the previous stage) or that all object instances are already segmented.

Segmenting 2,500,000 object instances is an extremely time consuming task requiring over 22 worker hours per 1,000 segmentations. To minimize cost we only had a single worker segment each instance. However, when first completing the task, most workers produced only coarse instance outlines. As a consequence, we required all workers to complete a training task for each object category. The training task required workers to segment an object instance. Workers could not complete the task until their segmentation adequately matched the ground truth. The use of a training task vastly improved the quality of the workers (approximately 1 in 3 workers passed the training stage) and resulting segmentations. Example segmentations may be viewed in Fig. 6.

While the training task filtered out most bad workers, we also performed an explicit verification step on each segmented instance to ensure good quality. Multiple workers (3 to 5) were asked to judge each segmentation and indicate whether it matched the instance well or not. Segmentations of insufficient quality were discarded and the corresponding instances added back to the pool of unsegmented objects. Finally, some approved workers consistently produced poor segmentations; all work obtained from such workers was discarded.

For images containing 10 object instances or fewer of a given category, every instance was individually segmented (note that in some images up to 15 instances were segmented). Occasionally the number of instances is drastically higher; for example, consider a dense crowd of people or a truckload of bananas. In such cases, many instances of the same category may be tightly grouped together and distinguishing individual instances is difficult. After 10-15 instances of a category were segmented in an image, the remaining instances were marked as "crowds" using a single (possibly multi-part) segment. For the purpose of evaluation, areas marked as crowds will be ignored and not affect a detector's score. Details are given in the appendix.
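The exact crowd-handling rule is specified in the appendix; the snippet below is only an illustrative sketch of the evaluation-side idea, that detections falling mostly inside a region marked as a crowd should be ignored rather than penalized. The masks are assumed to be boolean NumPy arrays and the 0.5 overlap threshold is an assumption.

```python
import numpy as np

def crowd_overlap(det_mask, crowd_mask):
    """Fraction of the detection's pixels that fall inside a crowd region."""
    inter = np.logical_and(det_mask, crowd_mask).sum()
    return inter / max(det_mask.sum(), 1)

def counts_against_detector(det_mask, crowd_masks, max_crowd_overlap=0.5):
    """An unmatched detection only counts as a false positive if it does
    not land mostly on a crowd region (illustrative threshold)."""
    return all(crowd_overlap(det_mask, c) < max_crowd_overlap for c in crowd_masks)
```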

4.4 Annotation Performance Analysis

We analyzed crowd worker quality on the category labeling task by comparing to dedicated expert workers, see Fig. 4(a). We compared precision and recall of seven expert workers (co-authors of the paper) with the results obtained by taking the union of one to ten AMT workers. Ground truth was computed using majority vote of the experts. For this task recall is of primary importance as false positives could be removed in later stages. Fig. 4(a) shows that the union of 8 AMT workers, the same number as was used to collect our labels, achieved greater recall than any of the expert workers. Note that worker recall saturates at around 9-10 AMT workers.

Object category presence is often ambiguous. Indeed, as Fig. 4(a) indicates, even dedicated experts often disagree on object presence, e.g. due to inherent ambiguity in the image or disagreement about category definitions. For any unambiguous examples having a probability of over 50% of being annotated, the probability of all 8 annotators missing such a case is at most 0.5^8 ≈ 0.004. Additionally, by observing how recall increased as we added annotators, we estimate that in practice over 99% of all object categories not later rejected as false positives are detected given 8 annotators. Note that a similar analysis may be done for instance spotting in which 8 annotators were also used.
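Spelling out the figure quoted above: if an unambiguous category is annotated by each of the 8 independent workers with probability at least 0.5, then

```latex
P(\text{all 8 annotators miss}) \;\le\; (1 - 0.5)^{8} \;=\; 0.5^{8} \;=\; \tfrac{1}{256} \;\approx\; 0.004 .
```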

Finally, Fig. 4(b) re-examines precision and recall of AMT workers on category labeling on a much larger set of images. The number of workers (circle size) and average number of jobs per worker (circle color) is shown for each precision/recall range. Unlike in Fig. 4(a), we used a leave-one-out evaluation procedure where a category was considered present if any of the remaining workers named the category. Therefore, overall worker precision is substantially higher. Workers who completed the most jobs also have the highest precision; all jobs from workers below the black line were rejected.
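A minimal sketch of the leave-one-out bookkeeping described above for a single image; labels_by_worker is a hypothetical dict from worker id to the set of category names that worker reported.

```python
def leave_one_out_pr(labels_by_worker, worker_id):
    """Precision/recall for one worker, treating the union of the other
    workers' labels as ground truth (as in Fig. 4(b))."""
    mine = labels_by_worker[worker_id]
    others = set().union(*(v for k, v in labels_by_worker.items() if k != worker_id))
    true_pos = len(mine & others)
    precision = true_pos / len(mine) if mine else 1.0
    recall = true_pos / len(others) if others else 1.0
    return precision, recall
```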

4.5 Caption Annotation

We added five written caption descriptions to each image in MS COCO. A full description of the caption statistics and how they were gathered will be provided shortly in a separate publication.

5 DATASET STATISTICS

Next, we analyze the properties of the Microsoft Common Objects in COntext (MS COCO) dataset in comparison to several other popular datasets. These include ImageNet [1], PASCAL VOC 2012 [2], and SUN [3]. Each of these datasets varies significantly in size, list of labeled categories and types of images. ImageNet was created to capture a large number of object categories, many of which are fine-grained. SUN focuses on labeling scene types and the objects that commonly occur in them. Finally, PASCAL VOC's primary application is object detection in natural images. MS COCO is designed for the detection and segmentation of objects occurring in their natural context.

The number of instances per category for all 91 categories is shown in Fig. 5(a). A summary of the datasets showing the number of object categories and the number of instances per category is shown in Fig. 5(d). While MS COCO has fewer categories than ImageNet and SUN, it has more instances per category which we hypothesize will be useful for learning complex models capable of precise localization. In comparison to PASCAL VOC, MS COCO has both more categories and instances.

An important property of our dataset is we strive to find non-iconic images containing objects in their natural context. The amount of contextual information present in an image can be estimated by examining the average number of object categories and instances per image, Fig. 5(b,c). For ImageNet we plot the object detection validation set, since the training data only has a single object labeled. On average our dataset contains 3.5 categories and 7.7 instances per image. In comparison ImageNet and PASCAL VOC both have less than 2 categories and 3 instances per image on average. Another interesting observation is only 10% of the images in MS COCO have only one category per image; in comparison, over 60% of images contain a single object category in ImageNet and PASCAL VOC. As expected, the SUN dataset has the most contextual information since it is scene-based and uses an unrestricted set of categories.

Finally, we analyze the average size of objects in the datasets. Generally smaller objects are harder to recognize and require more contextual reasoning to recognize. As shown in Fig. 5(e), the average size of objects is smaller for both MS COCO and SUN.
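These per-image statistics are easy to reproduce (approximately, since Fig. 5 aggregates the full dataset and treats crowd annotations separately) with the COCO API; the annotation file path below is an assumption about where the 2014 annotations were downloaded.

```python
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2014.json")  # path is an assumption

img_ids = coco.getImgIds()
total_instances = 0
total_categories = 0
for img_id in img_ids:
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    total_instances += len(anns)
    total_categories += len({a["category_id"] for a in anns})

print("instances per image: %.1f" % (total_instances / len(img_ids)))
print("categories per image: %.1f" % (total_categories / len(img_ids)))
```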

6 DATASET SPLITS

To accommodate a faster release schedule, we split the MS COCO dataset into two roughly equal parts. The first half of the dataset was released in 2014, the second half will be released in 2015. The 2014 release contains 82,783 training, 40,504 validation, and 40,775 testing images (approximately 1/2 train, 1/4 val, and 1/4 test). There are nearly 270k segmented people and a total of 886k segmented object instances in the 2014 train+val data alone. The cumulative 2015 release will contain a total of 165,482 train, 81,208 val, and 81,434 test images.
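A quick arithmetic check that the 2014 image counts match the stated 1/2, 1/4, 1/4 proportions:

```python
train, val, test = 82783, 40504, 40775
total = train + val + test            # 164062 images in the 2014 release
print(train / total, val / total, test / total)
# -> roughly 0.505, 0.247, 0.249
```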

Fig. 5: (a) Number of annotated instances per category for MS COCO and PASCAL VOC. (b,c) Number of annotated categories and annotated instances, respectively, per image for MS COCO, ImageNet Detection, PASCAL VOC and SUN (average number of categories and instances are shown in parentheses). (d) Number of categories vs. the number of instances per category for a number of popular object recognition datasets. (e) The distribution of instance sizes for the MS COCO, ImageNet Detection, PASCAL VOC and SUN datasets.

We took care to minimize the chance of near-duplicate images existing across splits by explicitly removing near duplicates (detected with [43]) and grouping images by photographer and date taken.
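One plausible way to implement the grouping is sketched below: images are bucketed by (photographer, capture date) and each bucket is assigned to a single split, so near-duplicate bursts from one photographer cannot straddle train and test. The metadata field names and the hash-based assignment are assumptions for illustration, not the authors' actual code.

```python
import hashlib
from collections import defaultdict

def split_for_group(group_key, fractions=(("train", 0.5), ("val", 0.25), ("test", 0.25))):
    """Deterministically map a (photographer, date) group to one split."""
    h = int(hashlib.md5(repr(group_key).encode()).hexdigest(), 16) % 10000 / 10000.0
    cumulative = 0.0
    for name, frac in fractions:
        cumulative += frac
        if h < cumulative:
            return name
    return fractions[-1][0]

def assign_splits(image_metas):
    """image_metas: iterable of dicts with 'image_id', 'photographer_id', 'date_taken'."""
    splits = defaultdict(list)
    for meta in image_metas:
        key = (meta["photographer_id"], meta["date_taken"])
        splits[split_for_group(key)].append(meta["image_id"])
    return splits
```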

Following established protocol, annotations for train and validation data will be released, but not for test. We are currently finalizing the evaluation server for automatic evaluation on the test set. A full discussion of evaluation metrics will be added once the evaluation server is complete.

Note that we have limited the 2014 release to a subset of 80 categories. We did not collect segmentations for the following 11 categories: hat, shoe, eyeglasses (too many instances), mirror, window, door, street sign (ambiguous and difficult to label), plate, desk (due to confusion with bowl and dining table, respectively) and blender, hair brush (too few instances). We may add segmentations for some of these categories in the cumulative 2015 release.

Fig. 6: Samples of annotated images in the MS COCO dataset.

7 ALGORITHMIC ANALYSIS

Bounding-box detection For the following experiments we take a subset of 55,000 images from our dataset¹ and obtain tight-fitting bounding boxes from the annotated segmentation masks. We evaluate models tested on both MS COCO and PASCAL, see Table 1. We evaluate two different models. DPMv5-P: the latest implementation of [44] (release 5 [45]) trained on PASCAL VOC 2012. DPMv5-C: the same implementation trained on COCO (5000 positive and 10000 negative images). We use the default parameter settings for training COCO models. If we compare the average performance of DPMv5-P on PASCAL VOC and MS COCO, we find that average performance on MS COCO drops by nearly a factor of 2, suggesting that MS COCO does include more difficult (non-iconic) images of objects that are partially occluded, amid clutter, etc.

1. These preliminary experiments were performed before our final split of the dataset into train, val, and test. Baselines on the actual test set will be added once the evaluation server is complete.

                  plane bike  bird  boat  bottle bus   car   cat   chair cow   table dog   horse moto  person plant sheep sofa  train tv    avg.
DPMv5-P (PASCAL)  45.6  49.0  11.0  11.6  27.2   50.5  43.1  23.6  17.2  23.2  10.7  20.5  42.5  44.5  41.3   8.7   29.0  18.7  40.0  34.5  29.6
DPMv5-C (PASCAL)  43.7  50.1  11.8  2.4   21.4   60.1  35.6  16.0  11.4  24.8  5.3   9.4   44.5  41.0  35.8   6.3   28.3  13.3  38.8  36.2  26.8
DPMv5-P (COCO)    35.1  17.9  3.7   2.3   7.0    45.4  18.3  8.6   6.3   17.0  4.8   5.8   35.3  25.4  17.5   4.1   14.5  9.6   31.7  27.9  16.9
DPMv5-C (COCO)    36.9  20.2  5.7   3.5   6.6    50.3  16.1  12.8  4.5   19.0  9.6   4.0   38.2  29.9  15.9   6.7   13.8  10.4  39.2  37.9  19.1

TABLE 1: Top: Detection performance evaluated on PASCAL VOC 2012. DPMv5-P is the performance reported by Girshick et al. in VOC release 5. DPMv5-C uses the same implementation, but is trained with MS COCO. Bottom: Performance evaluated on MS COCO for DPM models trained with PASCAL VOC 2012 (DPMv5-P) and MS COCO (DPMv5-C). For DPMv5-C we used 5000 positive and 10000 negative training examples. While MS COCO is considerably more challenging than PASCAL, use of more training data coupled with more sophisticated approaches [5], [6], [7] should improve performance substantially.

We notice a similar drop in performance for the model trained on MS COCO (DPMv5-C).
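A quick check of the cross-dataset gaps discussed in the next paragraph, using the average AP values from Table 1:

```python
# Average AP from Table 1 (test set: PASCAL VOC 2012 vs. MS COCO).
dpmv5_p = {"pascal": 29.6, "coco": 16.9}
dpmv5_c = {"pascal": 26.8, "coco": 19.1}

print(dpmv5_p["pascal"] - dpmv5_p["coco"])   # ≈ 12.7 AP drop for the PASCAL-trained model
print(dpmv5_c["pascal"] - dpmv5_c["coco"])   # ≈ 7.7 AP drop for the COCO-trained model
```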

The effect on detection performance of training on PASCAL VOC or MS COCO may be analyzed by comparing DPMv5-P and DPMv5-C. They use the same implementation with different sources of training data. Table 1 shows DPMv5-C still outperforms DPMv5-P in 6 out of 20 categories when testing on PASCAL VOC. In some categories (e.g., dog, cat, people), models trained on MS COCO perform worse, while on others (e.g., bus, tv, horse), models trained on our data are better.

Consistent with past observations [46], we find that including difficult (non-iconic) images during training may not always help. Such examples may act as noise and pollute the learned model if the model is not rich enough to capture such appearance variability. Our dataset allows for the exploration of such issues.

Torralba and Efros [42] proposed a metric to measure cross-dataset generalization which computes the 'performance drop' for models that train on one dataset and test on another. The performance difference of the DPMv5-P models across the two datasets is 12.7 AP while the DPMv5-C models only have a 7.7 AP difference. Moreover, overall performance is much lower on MS COCO. These observations support two hypotheses: 1) MS COCO is significantly more difficult than PASCAL VOC and 2) models trained on MS COCO can generalize better to easier datasets such as PASCAL VOC given more training data. To gain insight into the differences between the datasets, see the appendix for visualizations of person and chair examples from the two datasets.

Generating segmentations from detections We now describe a simple method for generating object bounding boxes and segmentation masks, following prior work that produces segmentations from object detections [47], [48], [49], [50]. We learn aspect-specific pixel-level segmentation masks for different categories. These are readily learned by averaging together segmentation masks from aligned training instances. We learn different masks corresponding to the different mixtures in our DPM detector. Sample masks are visualized in Fig. 7.

Fig. 7: We visualize our mixture-specific shape masks. We paste thresholded shape masks on each candidate detection to generate candidate segments.

Fig. 8: Evaluating instance detections with segmentation masks versus bounding boxes. Bounding boxes are a particularly crude approximation for articulated objects; in this case, the majority of the pixels in the (blue) tight-fitting bounding-box do not lie on the object. Our (green) instance-level segmentation masks allow for a more accurate measure of object detection and localization.

Detection evaluated by segmentation Segmentation is a challenging task even assuming a detector reports correct results, as it requires fine localization of object part boundaries. To decouple segmentation evaluation from detection correctness, we benchmark segmentation quality using only correct detections. Specifically, given that the detector reports a correct bounding box, how well does the predicted segmentation of that object match the ground truth segmentation? As criterion for correct detection, we impose the standard requirement that intersection over union between predicted and ground truth boxes is at least 0.5. We then measure the intersection over union of the predicted and ground truth segmentation masks, see Fig. 8. To establish a baseline for our dataset, we project learned DPM part masks onto the image to create segmentation masks. Fig. 9 shows results of this segmentation baseline for the DPM learned on the 20 PASCAL categories and tested on our dataset.
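For reference, a minimal NumPy sketch of the two overlap measures used above: box IoU for deciding whether a detection is correct (threshold 0.5) and mask IoU for scoring the predicted segmentation against the ground truth.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def mask_iou(m1, m2):
    """IoU of two binary masks (boolean arrays of the same shape)."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return float(inter) / union if union > 0 else 0.0

# A detection is considered correct if box_iou >= 0.5; segmentation quality
# is then reported as mask_iou between predicted and ground truth masks.
```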


Fig. 9: A predicted segmentation might not recover the object even when the predicted and ground truth boxes overlap well (left). Sampling from the person category illustrates that predicting segmentations from top-down projection of DPM part masks is difficult even for correct detections (center). Average segmentation overlap measured on MS COCO for the 20 PASCAL VOC categories demonstrates the difficulty of the problem (right).

8 DISCUSSION

We introduced a new dataset for detecting and segmenting objects found in everyday life in their natural environments. Utilizing over 70,000 worker hours, a vast collection of object instances was gathered, annotated and organized to drive the advancement of object detection and segmentation algorithms. Emphasis was placed on finding non-iconic images of objects in natural environments and varied viewpoints. Dataset statistics indicate the images contain rich contextual information with many objects present per image.

There are several promising directions for future annotations on our dataset. We currently only label "things", but labeling "stuff" may also provide significant contextual information that may be useful for detection. Many object detection algorithms benefit from additional annotations, such as the amount an instance is occluded [4] or the location of keypoints on the object [10]. Finally, our dataset could provide a good benchmark for other types of labels, including scene types [3], attributes [9], [8] and full sentence written descriptions [51]. We are actively exploring adding various such annotations. To download and learn more about MS COCO please see the project website². MS COCO will evolve and grow over time; up to date information is available online.

Acknowledgments Funding for all crowd worker tasks was provided by Microsoft. P.P. and D.R. were supported by ONR MURI Grant N00014-10-1-0933. We would like to thank all members of the community who provided valuable feedback throughout the process of defining and collecting the dataset.

2. http://mscoco.org/

APPENDIX OVERVIEW

In the appendix, we provide detailed descriptions of the AMT user interfaces and the full list of 272 candidate categories (from which our final 91 were selected) and 40 scene categories (used for scene-object queries).

APPENDIX I: USER INTERFACES

We describe and visualize our user interfaces for collecting non-iconic images, category labeling, instance spotting, instance segmentation, segmentation verification and finally crowd labeling.

Non-iconic Image Collection Flickr provides a rich image collection associated with text captions. However, captions might be inaccurate and images may be iconic. To construct a high-quality set of non-iconic images, we first collected candidate images by searching for pairs of object categories, or pairs of object and scene categories. We then created an AMT filtering task that allowed users to remove invalid or iconic images from a grid of 128 candidates, Fig. 10. We found the choice of instructions to be crucial, and so provided users with examples of iconic and non-iconic images. Some categories rarely co-occurred with others. In such cases, we collected candidates using only the object category as the search term, but apply a similar filtering step, Fig. 10(b).

Category Labeling Fig. 12(a) shows our interface for category labeling. We designed the labeling task to encourage workers to annotate all categories present in the image. Workers annotate categories by dragging and dropping icons from the bottom category panel onto a corresponding object instance. Only a single instance of each object category needs to be annotated in the image. We group icons by the super-categories from Fig. 11, allowing workers to quickly skip categories that are unlikely to be present.

Fig. 10: User interfaces for non-iconic image collection. (a) Interface for selecting non-iconic images containing pairs of objects. (b) Interface for selecting non-iconic images for categories that rarely co-occurred with others.

Instance Spotting Fig. 12(b) depicts our interface for labeling all instances of a given category. The interface is initialized with a blinking icon specifying a single instance obtained from the previous category-labeling stage. Workers are then asked to spot and click on up to 10 total instances of the given category, placing a single cross anywhere within the region of each instance. In order to spot small objects, we found it crucial to include a "magnifying glass" feature that doubles the resolution of a worker's currently selected region.

Instance Segmentation Fig. 12(c) shows our user interface for instance segmentation. We modified source code from the OpenSurfaces project [16], which defines a single AMT task for segmenting multiple regions of a homogenous material in real scenes. In our case, we define a single task for segmenting a single object instance labeled from the previous annotation stage. To aid the segmentation process, we added a visualization of the object category icon to remind workers of the category to be segmented. Crucially, we also added zoom-in functionality to allow for efficient annotation of small objects and curved boundaries. In the previous annotation stage, to ensure high coverage of all object instances, we used multiple workers to label all instances per image. We would like to segment all such object instances, but instance annotations across different workers may refer to different or redundant instances. To resolve this correspondence ambiguity, we sequentially post AMT segmentation tasks, ignoring instance annotations that are already covered by an existing segmentation mask.

Segmentation Verification Fig. 12(d) shows our user interface for segmentation verification. Due to the time consuming nature of the previous task, each object instance is segmented only once. The purpose of the verification stage is therefore to ensure that each segmented instance from the previous stage is of sufficiently high quality. Workers are shown a grid of 64 segmentations and asked to select poor quality segmentations. Four of the 64 segmentations are known to be bad; a worker must identify 3 of the 4 known bad segmentations to complete the task. Each segmentation is initially shown to 3 annotators. If any of the annotators indicates the segmentation is bad, it is shown to 2 additional workers. At this point, any segmentation that doesn't receive at least 4 of 5 favorable votes is discarded and the corresponding instance added back to the pool of unsegmented objects. Examples of borderline cases that either passed (4/5 votes) or were rejected (3/5 votes) are shown in Fig. 15.

Crowd Labeling Fig. 12(e) shows our user interface for crowd labeling. As discussed, for images containing ten object instances or fewer of a given category, every object instance was individually segmented. In some images, however, the number of instances of a given category is much higher. In such cases crowd labeling provided a more efficient method for annotation. Rather than requiring workers to draw exact polygonal masks around each object instance, we allow workers to "paint" all pixels belonging to the category in question. Crowd labeling is similar to semantic segmentation as object instances are not individually identified. We emphasize that crowd labeling is only necessary for images containing more than ten object instances of a given category.
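A minimal sketch of the verification vote logic just described; the judgments list is assumed to be ordered (first three annotators, then the two extra ones requested only when needed), with True meaning the annotator judged the segmentation acceptable.

```python
def segmentation_accepted(judgments):
    """Returns True if the segmentation survives verification.

    Each segmentation is first shown to 3 annotators; if any of them flags
    it, it is shown to 2 more and must collect at least 4 of 5 favorable
    votes (sketch of the rule in the text, not the production code).
    """
    first_round = judgments[:3]
    if all(first_round):
        return True                    # unanimous first round: accept
    return sum(judgments[:5]) >= 4     # escalated round: need 4/5 favorable
```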

APPENDIX II: OBJECT & SCENE CATEGORIES

Our dataset contains 91 object categories (the 2014 release contains segmentation masks for 80 of these categories). We began with a list of frequent object categories taken from WordNet, LabelMe, SUN and other sources as well as categories derived from a free recall experiment with young children. The authors then voted on the resulting 272 categories with the aim of sampling a diverse and computationally challenging set of categories; see §3 for details. The list in Table 2 enumerates those 272 categories in descending order of votes. As discussed, the final selection of 91 categories attempts to pick categories with high votes, while keeping the number of categories per super-category (animals, vehicles, furniture, etc.) balanced.

As discussed in §3, in addition to using object-object queries to gather non-iconic images, object-scene queries also proved effective. For this task we selected a subset of 40 scene categories from the SUN dataset that frequently co-occurred with object categories of interest. Table 3 enumerates the 40 scene categories (evenly split between indoor and outdoor scenes).

REFERENCES

[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in CVPR, 2009.
[2] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," IJCV, vol. 88, no. 2, pp. 303–338, Jun. 2010.
[3] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba, "SUN database: Large-scale scene recognition from abbey to zoo," in CVPR, 2010.
[4] P. Dollár, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: An evaluation of the state of the art," PAMI, vol. 34, 2012.
[5] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS, 2012.
[6] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in CVPR, 2014.
[7] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," in ICLR, April 2014.
[8] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, "Describing objects by their attributes," in CVPR, 2009.
[9] G. Patterson and J. Hays, "SUN attribute database: Discovering, annotating, and recognizing scene attributes," in CVPR, 2012.
[10] L. Bourdev and J. Malik, "Poselets: Body part detectors trained using 3D human pose annotations," in ICCV, 2009.
[11] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," in ECCV, 2012.
[12] S. Palmer, E. Rosch, and P. Chase, "Canonical perspective and the perception of objects," Attention and Performance IX, vol. 1, p. 4, 1981.
[13] D. Hoiem, Y. Chodpathumwan, and Q. Dai, "Diagnosing error in object detectors," in ECCV, 2012.
[14] G. Brostow, J. Fauqueur, and R. Cipolla, "Semantic object classes in video: A high-definition ground truth database," PRL, vol. 30, no. 2, pp. 88–97, 2009.
[15] B. Russell, A. Torralba, K. Murphy, and W. Freeman, "LabelMe: a database and web-based tool for image annotation," IJCV, vol. 77, no. 1-3, pp. 157–173, 2008.
[16] S. Bell, P. Upchurch, N. Snavely, and K. Bala, "OpenSurfaces: A richly annotated catalog of surface appearance," SIGGRAPH, vol. 32, no. 4, 2013.
[17] V. Ordonez, G. Kulkarni, and T. Berg, "Im2Text: Describing images using 1 million captioned photographs," in NIPS, 2011.
[18] J. Deng, O. Russakovsky, J. Krause, M. Bernstein, A. Berg, and L. Fei-Fei, "Scalable multi-label annotation," in CHI, 2014.
[19] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in ECCV, 2014.
[20] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," IJCV, vol. 47, no. 1-3, pp. 7–42, 2002.
[21] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. Black, and R. Szeliski, "A database and evaluation methodology for optical flow," IJCV, vol. 92, no. 1, pp. 1–31, 2011.
[22] L. Fei-Fei, R. Fergus, and P. Perona, "Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories," in CVPR Workshop of Generative Model Based Vision (WGMBV), 2004.
[23] G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," California Institute of Technology, Tech. Rep. 7694, 2007.
[24] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR, 2005.
[25] Y. LeCun and C. Cortes, "The MNIST database of handwritten digits," 1998. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[26] S. A. Nene, S. K. Nayar, and H. Murase, "Columbia object image library (COIL-20)," Columbia University, Tech. Rep., 1996.
[27] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Computer Science Department, University of Toronto, Tech. Rep., 2009.
[28] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," PAMI, vol. 30, no. 11, pp. 1958–1970, 2008.
[29] V. Ordonez, J. Deng, Y. Choi, A. Berg, and T. Berg, "From large scale image categorization to entry-level categories," in ICCV, 2013.
[30] C. Fellbaum, WordNet: An electronic lexical database. Blackwell Books, 1998.
[31] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona, "Caltech-UCSD Birds 200," Caltech, Tech. Rep. CNS-TR-201, 2010.
[32] E. Hjelmås and B. Low, "Face detection: A survey," CVIU, vol. 83, no. 3, pp. 236–274, 2001.
[33] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild," University of Massachusetts, Amherst, Tech. Rep. 07-49, October 2007.
[34] O. Russakovsky, J. Deng, Z. Huang, A. Berg, and L. Fei-Fei, "Detecting avocados to zucchinis: what have we done, and where are we going?" in ICCV, 2013.
[35] J. Shotton, J. Winn, C. Rother, and A. Criminisi, "TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context," IJCV, vol. 81, no. 1, pp. 2–23, 2009.
[36] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A comparison and evaluation of multi-view stereo reconstruction algorithms," in CVPR, 2006.
[37] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, "Contour detection and hierarchical image segmentation," PAMI, vol. 33, no. 5, pp. 898–916, 2011.
[38] C. H. Lampert, H. Nickisch, and S. Harmeling, "Learning to detect unseen object classes by between-class attribute transfer," in CVPR, 2009.
[39] G. Heitz and D. Koller, "Learning spatial context: Using stuff to find things," in ECCV, 2008.
[40] R. Sitton, Spelling Sourcebook. Egger Publishing, 1996.
[41] T. Berg and A. Berg, "Finding iconic images," in CVPR, 2009.
[42] A. Torralba and A. Efros, "Unbiased look at dataset bias," in CVPR, 2011.
[43] M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid, "Evaluation of GIST descriptors for web-scale image search," in CIVR, 2009.
[44] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," PAMI, vol. 32, no. 9, pp. 1627–1645, 2010.
[45] R. Girshick, P. Felzenszwalb, and D. McAllester, "Discriminatively trained deformable part models, release 5," PAMI, 2012.
[46] X. Zhu, C. Vondrick, D. Ramanan, and C. Fowlkes, "Do we need more training data or better models for object detection?" in BMVC, 2012.
[47] T. Brox, L. Bourdev, S. Maji, and J. Malik, "Object segmentation by alignment of poselet activations to image contours," in CVPR, 2011.
[48] Y. Yang, S. Hallman, D. Ramanan, and C. Fowlkes, "Layered object models for image segmentation," PAMI, vol. 34, no. 9, pp. 1731–1743, 2012.
[49] D. Ramanan, "Using segmentation to verify object hypotheses," in CVPR, 2007.
[50] Q. Dai and D. Hoiem, "Learning to localize detected objects," in CVPR, 2012.
[51] C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier, "Collecting image annotations using Amazon's Mechanical Turk," in NAACL Workshop, 2010.

Fig. 11: Icons of 91 categories in the MS COCO dataset grouped by 11 super-categories. We use these icons in our annotation pipeline to help workers quickly reference the indicated object category.

Fig. 12: User interfaces for collecting instance annotations, see text for details.

(a) PASCAL VOC. (b) MS COCO.

Fig. 13: Random person instances from PASCAL VOC and MS COCO. At most one instance is sampled per image.

person, bicycle, car, motorcycle, bird, cat, dog, horse, sheep, bottle, chair, couch, potted plant, tv, cow, airplane, hat⋆, license plate, bed, laptop, fridge, microwave, sink, oven, toaster, bus, train, mirror⋆, dining table, elephant, banana, bread, toilet, book, boat, plate⋆, cell phone, mouse, remote, clock, face, hand, apple, keyboard, backpack, steering wheel, wine glass, chicken, zebra, shoe⋆, eye, mouth, scissors, truck, traffic light, eyeglasses⋆, cup, blender⋆, hair drier, wheel, street sign⋆, umbrella, door⋆, fire hydrant, bowl, teapot, fork, knife, spoon, bear, headlights, window⋆, desk⋆, computer, refrigerator, pizza, squirrel, duck, frisbee, guitar, nose, teddy bear, tie, stop sign, surfboard, sandwich, pen/pencil, kite, orange, toothbrush, printer, pans, head, sports ball, broccoli, suitcase, carrot, chandelier, parking meter, fish, handbag, hot dog, stapler, basketball hoop, donut, vase, baseball bat, baseball glove, giraffe, jacket, skis, snowboard, table lamp, egg, door handle, power outlet, hair, tiger, table, coffee table, skateboard, helicopter, tomato, tree, bunny, pillow, tennis racket, cake, feet, bench, chopping board, washer, lion, monkey, hair brush⋆, light switch, arms, legs, house, cheese, goat, magazine, key, picture frame, cupcake, fan (ceil/floor), frogs, rabbit, owl, scarf, ears, home phone, pig, strawberries, pumpkin, van, kangaroo, rhinoceros, sailboat, deer, playing cards, towel, hyppo, can, dollar bill, doll, soup, meat, window, muffins, tire, necklace, tablet, corn, ladder, pineapple, candle, desktop, carpet, cookie, toy cars, bracelet, bat, balloon, gloves, milk, pants, wheelchair, building, bacon, box, platypus, pancake, cabinet, whale, dryer, torso, lizard, shirt, shorts, pasta, grapes, shark, swan, fingers, towel, side table, gate, beans, flip flops, moon, road/street, fountain, fax machine, bat, hot air balloon, cereal, seahorse, rocket, cabinets, basketball, telephone, movie (disc), football, goose, long sleeve shirt, short sleeve shirt, raft, rooster, copier, radio, fences, goal net, toys, engine, soccer ball, field goal posts, socks, tennis net, seats, elbows, aardvark, dinosaur, unicycle, honey, legos, fly, roof, baseball, mat, ipad, iphone, hoop, hen, back, table cloth, soccer nets, turkey, pajamas, underpants, goldfish, robot, crusher, animal crackers, basketball court, horn, firefly, armpits, nectar, super hero costume, jetpack, robots

TABLE 2: Candidate category list (272). Bold: selected categories (91). Bold⋆: omitted categories in 2014 release (11).

(a) PASCAL VOC. (b) MS COCO.

Fig. 14: Random chair instances from PASCAL VOC and MS COCO. At most one instance is sampled per image.

Fig. 15: Examples of borderline segmentations that passed (top) or were rejected (bottom) in the verification stage.

library, church, office, restaurant, kitchen, living room, bathroom, factory, campus, bedroom, child's room, dining room, auditorium, shop, home, hotel, classroom, cafeteria, hospital room, food court, street, park, beach, river, village, valley, market, harbor, yard, parking lot, lighthouse, railway, playground, swimming pool, forest, gas station, garden, farm, mountain, plaza

TABLE 3: Scene category list.


【整理】无线通讯模块DTU电科院各项认证实验报告

无线模块电科院各项认证实验报告 实验一:外观认证 产品名字,厂家,合格证书等认证。 实验二:通讯认证 通信终端向服务器每秒发送一定字节的数据,服务器能否全部收到,有没有乱码等。 实验三:功耗测试 电源12V输入,待机模式下1.2W,收发数据模式下1.6W。 实验四:电源浪涌实验 实验依据:GB/T 17626.5中规定的严酷等级为4级的浪涌骚扰。 实验方法: 1.差模±2000V,每60S一次,打5次, 2.共模±4000V,每60S一次,打5次。 1.整改前,电源12V进来接通过电流3A保险丝,并联一个300W功率,40V耐压的TVS管,再接一个过3A电流的防反接二极管,最后输入到DC-DC电源芯片。如下图所示。 2.整改方案1,电源12V输入端先并联一个耐压为36V的压敏电阻470KD20JX,再串联一个电流3A的保险丝,再并联一个功率1500W,耐压36V的TVS管,再接过3A电流的防反接二极管,最后输入到DC-DC 电源芯片。如下图所示。 3.整改方案2,电源12V输入端直接并联一个5000W,26耐压的TVS 管,然后串上过3A电流的防反接二极管,最后输入到DC-DC电源芯

片。如下图所示。 浪涌实验结果如下: 整改前:共模±4000V没问题,差模±2000V导致TVS管烧坏击穿,lmr14030芯片击穿,模块不能正常工作。 方案1:共模±4000V没问题,差模±2000V时,压敏电阻,TVS 管,lmr1430都没有问题,但是BL1117-3.3V的LDO芯片击穿,导致单片机烧坏,模块不能正常工作。 方案2:共模±4000V和差模±2000V都实验两台设备,全程实验没有出现烧坏元器件的现象,模块通讯也正常,没有出现程序死机复位等现象。 实验五:电快速瞬变脉冲群实验 实验依据:GB/T 17626.4规定的严酷等级为4级的电快速瞬变干扰。 实验标准:差模±4000V,频率5KHz,持续时间90S 实验结果:导致实验设备掉线,无法正常通讯。 整改方案:电源12V输入端直接并联330KD20JX耐压为26V的压敏电阻,和一个5000W,26耐压的TVS管SMDJ26CA,电源正负极分别与大地之间接一个101 8KV的高压瓷片电容,然后串上过3A电流的防反接二极管,最后输入到DC-DC电源芯片。如下图所示。 整改后实验结果:整改后没有出现死机,掉线等现象,实验合格。实验六:高频干扰实验 实验依据:GB/T 17626.10规定的严酷等级为4级的高频干扰。

DTU、GPRS DTU模块与力控组态软件通信设置说明

DTU(GPRS DTU模块)与力控组态软件 通信设置说明 V1.1

修订文档历史记录 日期版本说明作者2015-8-19 1.1 力控驱动7.0文档整理发布

目录 一、简介 (4) 二、设置GPRS模块(DTU)相关参数: (4) 1、安装设参软件 (4) 2、GPRS模块参数设置 (4) 3、同时设置多个模块 (6) 三、力控组态软件的配置方法 (7) 1、安装驱动 (7) 2、新建I/O设备 (7) 3、数据库组态 (11)

一、简介 设备类型:GPRS DTU 支持网络类型:GPRS 主要功能:通过GPRS实现远端设备与组态软件服务器的通信。GPRS网络覆盖率高、部署成本低,可与各式串口终端设备连接 ,具有内建Watchdog功能,系统永不当机,使用者不需编写程式即可设定通讯格式、传送方式及传送内容。 支持协议:TCP/IP。 二、设置GPRS模块(DTU)相关参数: 使用DATA-6123 GPRS 模块(DTU)作为无线通信设备,实现远端串口设备与服务器端的力控组态软件之间的无线通信,首先必须对模块进行设置,设置方法如下: 1、安装设参软件 请点击随机光盘内的设参软件安装包,将设参软件安装到PC主机。 双击图标进行安装 2、GPRS模块参数设置 将GPRS模块通过串口与电脑主机相连,接通模块电源后打开设参软件,点击“GPRS参数”,在弹出窗口中,“指定方式”选择串口,点击确定,如下所示:

进入参数设置页面,点击“读取”按钮,即可读取当前连接的GPRS模块参数设置信息,如下图所示: 读取设备参数后,即可进行参数设置,首先请设置“基本参数”: 本机号码:即SIM卡号; 传输型号:B型; 系统识别码:为一个任意6位数字,需保持默认设置,不可更改; 监听端口:必须与组态软件配置服务器的端口号一致;

DTU的30种问题的解决办法

DTU的30种问题的解决办法 一、DTU不能进入配置状态: 方法如下: 1、检查DTU的波特率和参数配置软件的波特率是否一致(才茂DTU出厂波特率为57600,最大115200); 2、检查串口线是否连接正常; 二、DTU可以进入配置状态,但AT指令不能写入? 方法如下: 1、检查模块与电路板是否接处良好; 2、模块波特率是否被改成太大了; 三、如何更改波特率? 方法如下: 在对波特率进行修改的时候,(透明模式下)按“s”键进入设置模式,输入AT+IPR?先查询其波特率, 含协议下为38400 , 但后台串口波特率仍为57600 。这时输入你想要更改的波特率,AT+IPR=1200,返回,在超级终端下断开连接,将后台波特率更改为1200,在进入超级终端,直接敲回车,输入AT 指令,这时就可以通过1200 波特率进行通信了。(非透明模式下)直接输入AT+IPR=1200,返回,相同的在超级终端下断开连接,将后台波特率更改为1200,就可以正常使用了。

四、DTU的默认设置是什么? 方法如下: 1、8位数据位/无奇偶校验/1位停止位、波特率57600bps 2、数据传输速率:57600bps 五、如何检查DTU有没有登陆GPRS/CDMA网络? 方法如下: 检测方法为:在AT 命令态下,输入AT+CGATT=1,返回OK,再输入AT+CGATT? 如返回的是1则表示进入GPRS 网络,如返回的是0 则表示还未登入GPRS 网络。CDMA暂无指令进行判断! 六、DTU不能连上中心? 方法如下: 1、先确定才茂通信中心DEMO的设置是否正常(可用模拟DTU来连中心),然后,在查看DTU的配置是否正确,即中心地址,端口等是否设置正确; 2、DTU信号是否正常(AT指令查看),SIM卡是否欠费; 七、DTU和中心DEMO的设置都正常的情况下,DTU还是连不上中心? 方法如下: 首先,我们先确定运行中心DEMO的PC机是否进行端口映射(如运行在DMZ服务器上则不需要端口映射); 再次,确定所映射的端口没有被占用;有没有被防火墙挡住。 八、中心DEMO与DTU通讯正常,但和下位的PLC通讯不上? 方法如下: 1、DTU的参数是否与下位的PLC对应,如波特率,奇偶校验,数据位,停止位。 2、PLC的通讯协议是还正确(可以先与PC机进行通讯)。 3、DTU与下位PLC 串口485电阻不匹配(匹配电阻一般在10几欧到200欧之间。 4、DTU与PLC通讯接口是否连接正确 九、中心DEMO与DTU正常通讯,但经常掉线? 方法如下: 1、DTU的心跳包时间是否设置正确,心跳包时间设置长度与当地的网络有关,一般设置在当地网络允许时间内无数据传输不会掉线之内; 2、DTU的ID号,SIM卡号是不是设置成唯一的,如果重复则会出现不断的掉线重连; 十、每次发送数据产生冗余数据量大小是多少? 方法如下: DTU只有在和中心端建立连接的时候会产生冗余数据,就是把自身的信息发给中心,以后只要在这链路未断的情况下,发送的数据是不会产生; 冗余数据。冗余数据为45个字节,具体格式如下:8位HEXID(4位)+11位电话号码+ 0 + 登入IP地址=41个;

DTU功能简介

DTU功能简介 DTU概述 DTU是数据终端设备(Data Terminal unit)的简写。广义地讲,在进行通信时,传输数据的链路两端负责发送数据信息的模块单元都称之为DTU,在它的作用下对所传信息进行格式转换和数据整理校验。狭义地讲,DTU一般特指无线通讯中的下位GPRS/CDMA发射终端设备。前者是一种模块,而后者则是设备。后面的介绍如果不加特别说明,都是指后者(下位发射终端设备) DTU的核心功能 1)内部集成TCP/IP协议栈 DTU内部封装了PPP拨号协议以及TCP/IP协议栈并且具有嵌入式操作系统,从硬件上,它可看作是嵌入式PC与无线的结合;它具备无线拨号上网以及TCP/IP数据通信的功能。 2)提供串口数据双向转换功能 DTU提供了串行通信接口,包括CM510-21X,CM510-23X,CM510-22X等都属于常用的串行通信方式,而DTU在设计上大都将串口数据设计成“透明转换”的方式,也就是说DTU可以将串口上的原始数据转换成TCP/IP数据包进行传送,而不需要改变原有的数据通信内容。因此,DTU可以和各种使用串口通信的用户设备进行连接,而且不需要对用户设备作改动。 3)支持自动心跳,保持永久在线

无线通信网络的优点之一就是支持无线终端设备永久在线,因此典型的无线DTU在设计上都支持永久在线功能,这就要求DTU包含了上电自动拨号、采用心跳包保持永久在线(当长时间没有数据通信时,移动网关将断开DTU 与中心的连接,心跳包就是DTU与数据中心在连接被断开之前发送一个小数据包,以保持连接不被断开)、支持断线自动重连、自动重拨号等特点。 4)支持参数配置,永久保存 DTU作为一种通信设备,其应用场合十分广泛。在不同的应用中,数据中心的IP地址及端口号,串口的波特率等都是不同的。因此,DTU都应支持参数配置,并且将配置好的参数保存内部的永久存储器件内(一般为FLASH 或EEPROM等)。一旦上电,就自动按照设置好的参数进行工作。 5)支持用户串口参数设置 不同用户设备的串口参数有所不同,DTU连接用户设备的串口时,要根据用户设备串口的实际参数对DTU端进行相应设置,保证用户设备的正常通信和可靠数据传输。 6)DTU (Data Transfer unit)全称数据传输单元,是专门用于将串口数据转换为IP数据或将IP数据转换为串口数据通过无线通信网络进行传送的无线终端设备。 DTU硬件组成 DTU 硬件组成部分主要包括CPU控制模块、无线通讯模块以及电源模块 DTU 优点 组网迅速灵活,建设周期短、成本低; 网络覆盖范围广; 安全保密性能好; 链路支持永远在线、按流量计费、用户使用成本低; DTU工作过程描述 DTU上电后,首先读出内部FLASH中保存的工作参数(包括无线拨号参数,串口波特率,数据中心IP地址等等,事先已经配置好)。 DTU登陆GSM网络,然后进行PPP拨号。拨号成功后,DTU将获得一个由移动随机分配的内部IP地址(一般是10.X.X.X)。也就是说,DTU处于移动内网中,而且其内网IP地址通常是不固定的,随着每次拨号而变化。我们可以理解为无线DTU这时是一个移动内部局域网内的设备,通过移动网关来实现与外部Internet公网的通信。这与局域网内的电脑通过网关访问外部网络的方式相似。 DTU主动发起与数据中心的通信连接,并保持通信连接一直存在。由于DTU处于移动内网,而且IP地址不固定。因此,只能由DTU主动连接数据中心,而不能由数据中心主动连接DTU。这就要求数据中心具备固定的公网IP 地址或固定的域名。数据中心的公网IP地址或固定的域名作为参数存储在DTU内,以便DTU一旦上电拨号成功,就可以主动连接到数据中心。 具体地讲,DTU通过数据中心的IP地址(如果是采用中心域名的话,先通过中心域名解析出中心IP地址)以及端口号等参数,向数据中心发起TCP

浅析宏电新版本DTU配置方法

浅析宏电新版本DTU配置方法 黄锐,蓝天飞,任玮颖 (湖北省十堰市气象局,湖北十堰 442000) 摘要:我省区域自动气象站的数据传输采用的是移动网络GPRS方式,数据传输的核心部件DTU的参数配置正确与否直接影响其通信状态。针对DTU生产厂家对DTU升级后采用了新的配置方法,本文介绍了DTU升级后新的配置方法与原有配置方法的区别,详细介绍了配置软件的的操作步骤,对配置参数时需要注意的选项进行了介绍,并和原有的配置方法进行了对比。 关键词:DTU;配置;方法 引言 深圳市宏电技术股份有限公司生产 DTU(Data Transfer Unit)是一款基于 GPRS/GSM网络的无线 DDN(Digital Data Network)的数据通信产品[1]。其广泛应用在我省的高山自动气象站、区域自动气象站、应急车载便携式自动气象站等设备上,DTU的配置参数使用超级终端工具进行配置。然而,从2015年开始,深圳市宏电技术股份有限公司对DTU进行了升级,新型号为H7118C-V59、WUSH8118,模块参数的配置方法有了改变,不能使用原有的超级终端进行配置,需要用专用的配置软件对通信模块进行参数设置。因此,本文对使用配置软件对新版本通信模块进行参数配置时,配置方法与原有使用超级终端配置DTU的方法进行了区分,对配置参数时的异同之处进行了介绍,对需要着重注意的地方进行了说明,解决因厂家DTU升级后气象技术保障人员对新版本DTU配置方法不熟练、参数配置不正确而导致DTU不上线的问题。 1 宏电DTU版本的查询与配置工具的准备 深圳市宏电技术股份有限公司生产的宏电DTU的版本信息可在DTU背部查询(见图1)。D54、V56等原版本DTU均为2008年、2012年生产,生产日期显示在条形码最下端。新版本DTU V59、WUSH8118均为为2015年后生产,它的配置方法与原版本DTU使用超级终端工具的配置方法不同,新版本的DTU采用专用的DTU配置软件“DTU工具盒”来对模块的参数进行配置。 图1 原版本模块参数(a)、新版本模块参数(b)

相关文档