
Machine Vision and Applications (2013) 24:371–392

DOI 10.1007/s00138-012-0430-8

ORIGINAL PAPER

FPGA-based detection of SIFT interest keypoints

Leonardo Chang · José Hernández-Palancar ·
L. Enrique Sucar · Miguel Arias-Estrada

Received: 11 April 2011 / Revised: 21 December 2011 / Accepted: 23 April 2012 / Published online: 30 May 2012 © Springer-Verlag 2012

Abstract The use of local features in images has become very popular due to their promising results. They have shown significant benefits in a variety of applications such as object recognition, image retrieval, robot navigation, panorama stitching, and others. SIFT is one of the best-performing local feature methods, but among its main disadvantages is its high computational cost. To speed up this algorithm, this work proposes the design and implementation of an efficient FPGA-based hardware architecture for SIFT interest point detection. To take full advantage of the parallelism in this algorithm and to minimize the device area occupied by its hardware implementation, part of the algorithm was reformulated. The main contribution of the proposed hardware architecture, and the main difference from the other architectures reported in the literature, is that as the number of octaves to be processed increases, the occupied device area remains almost constant. The evaluations and experiments on the architecture support this contribution, as well as the accuracy, repeatability, and distinctiveness of the results. The experiments also report the device area occupation and timing constraints of the hardware implementation. The architecture presented in this paper is able to detect interest points in a 320×240 image in 11 ms, which represents a speedup of 250× with respect to a software implementation.

L. Chang (✉) · J. Hernández-Palancar
Advanced Technologies Application Center,
7a #21812 e/218 y 222, Siboney, Playa, CP 12220 Havana, Cuba
e-mail: lchang@ccc.inaoep.mx; lchang@cenatav.co.cu

J. Hernández-Palancar
e-mail: jpalancar@cenatav.co.cu

L. E. Sucar · M. Arias-Estrada
National Institute for Astrophysics, Optics and Electronics,
Luis Enrique Erro No. 1, Sta. María Tonantzintla,
CP 72840 Puebla, Mexico
e-mail: esucar@inaoep.mx

M. Arias-Estrada
e-mail: ariasmo@inaoep.mx

Keywords Local features · SIFT · Keypoint detection · Hardware architecture · FPGA

1 Introduction

In computer vision, it is necessary to extract image features that can be used in applications such as object recognition, image retrieval, robot navigation, panorama stitching, face recognition, and others. These features should be invariant to image variations such as translation, rotation, scale, viewpoint, and illumination. The feature extraction process also needs to be repeatable and accurate, so that the same features are extracted from different images containing the same object, and the features must be distinctive, that is, different features can be distinguished from each other.

In the past decade, significant progress has been achieved in this direction with the development of local invariant features. One of the most popular and widely used local feature methods in this area is the Scale Invariant Feature Transform (SIFT), proposed by Lowe [13]. The features extracted by SIFT are largely invariant to scale, rotation, illumination changes, noise, and small changes in viewpoint. The idea of this method is to first identify significant points in the image and then obtain a discriminative description of each point from its surroundings; these descriptors are then compared using a similarity measure.

One of the main disadvantages of the SIFT algorithm is its high computational cost, which results from the complex iterative processes used to obtain invariance to the aforementioned changes and transformations. For an image of 1,024×768 pixels, a software implementation [24] of the algorithm takes about 3 s to extract an average of 1,200 features on a PC (Intel Pentium 4 CPU at 3.0 GHz, 2 GB RAM).

There are several scenarios and applications that require features to be extracted and compared in real time (approximately 30 frames per second), and even on high-resolution images (more than 2 megapixels). Currently, very few systems running on personal computers achieve such processing rates, and those that do either process low-resolution images or reduce the number of octaves and scales in the scale-space, compromising the robustness of the algorithm. Therefore, an implementation of this algorithm that achieves real-time processing with high repeatability and distinctiveness rates is desirable.

A technique that has been widely used in recent years to accelerate computational tasks is the use of Field-Programmable Gate Arrays (FPGAs). These devices combine the benefits of hardware and software: they implement circuits directly, offering significant advantages in energy, area, and performance over software, and they can be reconfigured simply and at low cost to implement a wide range of tasks.

In this paper, to speed up the extraction of SIFT features, we propose a reformulation of the most computationally expensive phase of this algorithm: the detection of interest keypoints. Based on this reformulation, we propose a parallel algorithm and a hardware architecture for this stage of the SIFT method.

The main contribution of the architecture and the parallel algorithm proposed here is that, as the number of octaves to be processed increases, the occupied device area remains almost constant; only the number of memory blocks needed to store the new octaves and the logic needed to control the interleaving of more octaves grow. This is possible because all octaves for the same scale, regardless of their number, are processed in parallel in the same convolution block. This is relevant because the trend in computer vision is to work with larger images, and the number of octaves is a function of the image size. Therefore, for higher-resolution images (and thus a greater number of octaves), the hardware logic required for processing remains the same. This contribution is supported by experiments on the architecture, which quantify the benefits of interleaving the processing of octaves.

The rest of the paper is organized as follows. In Sect. 2 the SIFT algorithm is described and its interest point detection stage is detailed. Section 3 discusses the works in the literature that speed up SIFT using FPGAs. The proposed reformulations and the parallel algorithm aimed at obtaining the maximum performance of a hardware implementation are presented in Sect. 4. The hardware architecture that implements the algorithm introduced in Sect. 4 is explained in Sect. 5. The tests of the proposed hardware architecture and the main results are discussed in Sect. 6. Finally, Sect. 7 concludes the paper, and future work is presented in Sect. 8.

2 Scale invariant feature transform

Methods based on comparisons of entire images or windows within them are suitable for learning and describing the global structure of objects, but they cannot deal with partial occlusion, sudden changes in pose or viewpoint, or non-rigid objects.

Significant advances have been made in solving these problems with the development of local invariant features. These features allow us to find local structures that are present in different views of the image. They also provide a description of these structures that is largely invariant to image transformations such as translation, rotation, scale, illumination, and viewpoint. A study and comparison of some local feature extraction methods is presented in [22].

The purpose of local features is to provide a representation that allows us to find correspondences between images efficiently and effectively. To satisfy this objective, the feature extractor must meet two important requirements:

• The feature extraction process must be repeatable and accurate, so that the same features of an object are extracted from different images containing that object.
• The features should be distinctive, so that extracted features can be distinguished from each other.

In addition, a sufficient number of features is required to cover the entire object, so that it can be recognized even under partial occlusion.

SIFT, proposed by Lowe [13], is one of the most popular local feature methods. Its descriptor has shown better results than other local descriptors [14]. This method tries to identify structures that are similar in different views of a scene and to describe them by a vector that is independent of image size and orientation.

2.1 SIFT algorithm profiling

As a result of the complex, iterative processes needed to achieve invariance to scale changes and rotation, SIFT feature extraction is a computationally expensive task.

Lowe divided his method into four major computational stages:

1. Scale-space extrema detection
2. Keypoint localization
3. Orientation assignment
4. Keypoint description


Table 1 SIFT algorithm profiling

| Stage | Time (ms) | Percentage (%) |
|---|---|---|
| (1) Scale-space extrema detection | 1,391 | 44.83 |
| (2) Keypoint localization | 97 | 3.13 |
| (3) Orientation assignment | 341 | 10.99 |
| (4) Keypoint description | 1,274 | 41.05 |
| (*) Whole algorithm | 3,103 | 100.0 |

Table 1 shows the execution time of each stage of the SIFT algorithm. These times were obtained for an image of 1,024×768 pixels using the software implementation provided in [24]. The timing was acquired on a PC with an Intel P4 processor at 3.0 GHz and 2 GB of RAM.

As can be seen in Table 1, the total running time was above 3 s. The scale-space extrema detection stage was the most expensive, taking nearly 45% of the total processing time. The high computational cost of this stage is due to the large number of convolutions performed to generate the Difference of Gaussians (DoG) scale-space, which results in a large number of floating-point multiply-accumulate (MAC) operations. The number of MAC operations required to generate the DoG scale-space of an M×N image with O octaves and S scales is given by

$$\omega = \sum_{i=0}^{O-1} \frac{MN}{4^{i}}\, k^{2} S,$$

where k is the Gaussian convolution kernel width.

In addition, this stage performs a large number of comparisons to find local extrema in the DoG scale-space, which are marked as candidate keypoints. The number of comparisons at this stage is roughly given by

$$\sum_{i=0}^{O-1} 26 \cdot \frac{MN}{4^{i}}\,(S-2). \qquad (1)$$

For example, for the configuration used to obtain the above profiling (M = 1,024, N = 768, O = 4, S = 6, and k = 7), the number of MAC operations for the generation of the DoG scale-space is 307,077,120, and the number of comparisons for local extrema detection is 108,625,920.
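Both counts can be reproduced directly from the two formulas above. The sketch below is only a check of that arithmetic; it assumes image dimensions divisible by 2^(O−1), so that octave i has exactly MN/4^i pixels, and the function names are illustrative.

```python
def dog_mac_count(M, N, O, S, k):
    """MAC operations for the DoG scale-space with direct k x k convolutions:
    sum over octaves i of (M*N / 4**i) * k**2 * S."""
    return sum((M * N // 4**i) * k * k * S for i in range(O))


def extrema_comparisons(M, N, O, S):
    """Approximate comparisons for extrema detection, Eq. (1):
    sum over octaves i of 26 * (M*N / 4**i) * (S - 2)."""
    return sum(26 * (M * N // 4**i) * (S - 2) for i in range(O))


# Configuration used for the profiling in Table 1.
print(dog_mac_count(1024, 768, 4, 6, 7))     # 307077120
print(extrema_comparisons(1024, 768, 4, 6))  # 108625920
```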

The keypoint description stage proved to be the second most expensive, with more than 40% of the total processing time. At this stage, a descriptor is generated for each keypoint from the gradient directions and magnitudes of its neighbors. The calculation of the gradient orientation involves trigonometric operations, which are the most computationally expensive operations in the descriptor generation phase. In hardware, producing one such result per clock cycle requires a large amount of device area; other solutions use less silicon area but take several clock cycles [23].

The scale-space extrema detection and keypoint description stages have similar computational costs, but the former has greater potential for parallelism and hardware acceleration. For these reasons, to obtain the highest possible acceleration of the SIFT algorithm by speeding up one of its parts, we focused on the scale-space extrema detection stage.

2.2 Scale-space extrema detection

This work presents an algorithm reformulation and a hardware architecture for the scale-space extrema detection phase of SIFT. This section describes this stage of the algorithm and some of its theoretical foundations in detail.

The scale-space extrema detection stage searches over all scales and image locations to find potential interest points that are invariant to scale and orientation. To do so, the image is convolved with Gaussian filters at different scales, and the differences between adjacent blurred images are computed. Finally, the local maxima and minima of the difference of Gaussians (DoG) across scales are marked as interest points.

For a given image I(x, y), the SIFT detector is constructed from its Gaussian scale-space, L(x, y, σ), which is built from the convolution of I(x, y) with a variable-scale Gaussian:

$$L(x,y,\sigma) = G(x,y,\sigma) * I(x,y),$$

where * is the convolution operator in x and y, and G(x, y, σ) is the Gaussian kernel defined by

$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/2\sigma^{2}}.$$

The Gaussian scale space is created by generating a series of smoothed images at discrete values of σ. Thus, the σ domain is quantized in logarithmic steps arranged in O octaves, where each octave is further subdivided into S sub-levels. The value of σ at a given octave o and sub-level s is given by

$$\sigma(o,s) = \sigma_0\, 2^{\,o + s/S}, \qquad o \in [0,\dots,O-1], \quad s \in [0,\dots,S-1],$$

where σ0 is the base scale level, e.g., σ0 = 1.6. At each successive octave, the data are spatially down-sampled by a factor of two.
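As a quick illustration of this schedule, the sketch below evaluates σ(o, s) with the base scale σ0 = 1.6 mentioned above; S = 6 is only an example value. The formula implies that a hypothetical sub-level s = S of octave o would coincide with sub-level 0 of octave o + 1, which is why each octave spans exactly one doubling of σ.

```python
def sigma(o, s, S, sigma0=1.6):
    """Scale of sub-level s in octave o: sigma0 * 2**(o + s/S)."""
    return sigma0 * 2 ** (o + s / S)


S = 6  # example number of sub-levels per octave
for o in range(3):
    print(f"octave {o}:", [round(sigma(o, s, S), 3) for s in range(S)])
```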

To efficiently detect stable keypoint locations in scale space, Lowe proposed using extrema in the DoG scale-space, D(x, y, σ), computed from the difference of adjacent scales:

$$D(x,y,\sigma(o,s)) = L(x,y,\sigma(o,s+1)) - L(x,y,\sigma(o,s)).$$

In order to detect the local maxima and minima of D(x, y, σ), each pixel in the DoG images is compared with its eight neighbors in the same image, plus the nine corresponding neighbors in each of the two adjacent scales (26 comparisons in total). If the pixel value is larger than all of these neighbors, or smaller than all of them, it is selected as an interest point.
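The DoG construction and the 26-neighbor test can be sketched in plain Python as follows. This is a minimal, unoptimized illustration, not the hardware design: it assumes the blurred images of one octave are already available as equally sized lists of lists, it skips image borders and the first and last DoG levels, and the helper names are hypothetical.

```python
def dog_stack(blurred):
    """DoG images D_s = L_{s+1} - L_s from a list of equally sized blurred images."""
    return [
        [[b1 - b0 for b0, b1 in zip(r0, r1)] for r0, r1 in zip(img0, img1)]
        for img0, img1 in zip(blurred, blurred[1:])
    ]


def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger, or strictly smaller, than all 26 neighbors
    (8 in the same DoG image and 9 in each adjacent DoG image)."""
    v = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return all(v > n for n in neighbors) or all(v < n for n in neighbors)


def find_extrema(dog):
    """Return (s, y, x) for every interior DoG sample that is a local extremum."""
    height, width = len(dog[0]), len(dog[0][0])
    return [
        (s, y, x)
        for s in range(1, len(dog) - 1)
        for y in range(1, height - 1)
        for x in range(1, width - 1)
        if is_extremum(dog, s, y, x)
    ]


# Toy octave: a single bright pixel appears only in the third blurred image.
zero = [[0.0] * 3 for _ in range(3)]
blob = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(find_extrema(dog_stack([zero, zero, blob, zero])))  # [(1, 1, 1)]
```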

3 Related work

In recent years, because of the popularity of SIFT as a local feature method and its high computational cost, which makes it unviable for many real-time applications, several researchers have tried to obtain faster implementations of this algorithm. Some have focused on the use of Graphics Processing Units (GPUs); examples of such works are [10,12,20,21]. Other works in the literature have sped up SIFT through approximations or modifications in software; the most significant examples are [2,8,11]. Also, due to the widespread use and positive results of FPGAs as a means to speed up various computing tasks, researchers have begun to develop FPGA-based systems for real-time extraction of SIFT features. The main works that use these devices to accelerate SIFT are [3,6,15,17–19]. In this section we discuss only this last group of papers, highlighting their advantages and disadvantages and analyzing the type of hardware architecture proposed in each of them, as they are the most relevant publications for the purpose of this paper.

The first work reported in the literature on FPGA-based scale- and orientation-invariant feature extraction was that of Se et al. [19]. To speed up SIFT with respect to software implementations, the authors presented an FPGA implementation of the algorithm using fixed-point arithmetic. This implementation was developed from a software implementation that used floating-point representation. The authors also mentioned that several routines of the software version were modified to make their hardware implementation more efficient. Most of the algorithm was implemented with Xilinx System Generator; the authors suggest that using low-level hardware description languages, such as VHDL or Verilog, would be very costly in terms of development time. However, VHDL was used to implement low-level processes such as Direct Memory Access (DMA) and other memory access routines. In this study they used a Xilinx Virtex II FPGA. The SIFT execution time for a 640×480 image was reduced to 60 ms, compared with the 600 ms required by a Pentium III processor at 700 MHz. Their paper provides only the above details; it gives no information about the modifications to the algorithm or the architecture specifications.

In [15], Pettersson and Petersson presented a partial implementation of SIFT for online stereo calibration. They implemented some of the most expensive parts of SIFT, namely the generation of the DoG scale-space and Sobel filtering, in a Xilinx Virtex II FPGA, while the rest of the algorithm was implemented in software running on a personal computer. The authors propose a pipeline architecture in which convolution blocks are cascaded, both to reduce the errors introduced by a kernel that is very small compared with its standard deviation and to be able to use a fixed-size kernel. A different convolution block is used to obtain each scale image. For the convolution they use the separability property of the Gaussian kernel, and multiplications are replaced by a Look-Up Table (LUT), as described in [1,7]. Although this technique removes the multiplications, it trades accuracy against LUT size, because the LUT depends on the width of the convolution kernel and the number of bits used to represent it. The authors state that their system works at 60 Hz and reduces the feature extraction time by 50–70%, but no information about the resolution of the input image is provided. Moreover, there are no details on the use of FPGA device area, nor any analysis of the replacement of multiplications or of any other architectural aspect that affects the accuracy of the results.

Another FPGA-based partial implementation of SIFT is presented in [6]. In this work, Chati et al. present a hardware/software co-design to detect SIFT keypoints, implementing in hardware the parts with a high degree of parallelism. They propose using a wide array or sliding window to produce all scales at the same time; however, this is only mentioned, and no information about the operation of this method is provided. Chati et al. describe some modifications to the algorithm for hardware operation, but they give no details of the system architecture, device resource usage, silicon area occupation, or other analysis. The device used was a Xilinx Virtex II Pro FPGA, on which the system can process images of 320×240 pixels in 0.8 ms.

The most complete FPGA implementation of SIFT reported to date in the literature is the work of Bonato et al. [3,4]. Their implementation uses a hardware/software co-design strategy: except for the generation of descriptors, which is executed on a NIOS-II soft processor, the remaining stages of SIFT are implemented in hardware. This architecture consists of three hardware blocks: one for the generation of the DoG scale-space, one for the calculation of orientation and magnitude, and one for keypoint localization. The DoG scale-space generation block receives the input image from the camera and sends its result to the other two hardware blocks. In addition, the architecture has a software block that generates the descriptor of each keypoint. The authors argue that descriptor generation is better suited to software because the type of calculation performed at this stage is more feasible on a software processor, it is easier to implement in software than in hardware, and it gives greater flexibility to adapt the descriptor to the final application. The implementation of the DoG scale-space generation block exploits the separability and symmetry of the Gaussian kernel. In addition, four multipliers are saved by normalizing the convolution kernel so that it always takes the value 0 or 1 at its first and last positions, avoiding the multiplication at these points. This has the disadvantage of forcing the use of fixed-point or floating-point values, because for certain values of σ, if the coefficients are normalized in this way and then rounded to integers, all elements take the same value. The proposed system implements 18 Gaussian convolution blocks, one for each scale image, in a configuration of three octaves and six scales. Another modification to save device area is that the DoG images are represented in a 5-bit unsigned format. Using an unsigned format roughly halves the number of detected points, since only local maxima are considered and the minima are ignored. According to the authors, this reduction is not a problem for their application, Simultaneous Localization and Mapping (SLAM), where only a few dozen points are needed, but it could affect other applications. This system, implemented in an Altera Stratix II FPGA with a NIOS-II soft processor running at 100 MHz, requires 33 ms to extract the SIFT features of a 320×240 pixel image; the architecture bottleneck is the descriptor generation performed on the NIOS-II.

In [18], Qiu et al. present an architecture for the generation of the DoG scale-space. This work outperforms [3] and [17] in terms of device resource usage. The system generates the DoG scale-space for input images of 320×240 pixels in 12 ms. To achieve this, it exploits the separability property of the Gaussian kernel, performing separable convolution as in [17]. In addition, it uses the associative property of convolution: a convolution can be expressed as two successive convolutions, where the sum of the squares of the radii of the two smaller kernels equals the square of the radius of the original one (R₀² = r₁² + r₂²). This allows them to split one convolution into two with smaller kernels. According to the authors, the advantages of this technique are the possibility of reusing intermediate results, which saves hardware resources, and the simplifications provided by the order in which the convolutions are performed. Theoretically, this gives them a saving of up to 17.8% in hardware resources. In this architecture, the authors propose using only five convolution blocks which, after seven iterations, generate the DoG scale-space for five octaves and six scales (O = 5, S = 6). The disadvantage of this scheme is that, despite using only five convolution blocks (which saves device area), seven iterations must be completed to obtain the whole DoG scale-space. The authors reported improvements not only in FPGA resource occupation with respect to [3,4] but also in processing time; however, they compared their architecture with the whole system in [3,4] rather than with its DoG scale-space generation stage alone, which in [3,4] is more time-efficient than the architecture proposed by Qiu et al.
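The splitting technique relies on a standard property of Gaussians: convolving two Gaussians yields a Gaussian whose variance is the sum of the two variances. The sketch below checks this numerically for sampled, normalized 1D kernels. Interpreting the "radii" of [18] as standard deviations, and the values σ1 = 1.0 and σ2 = 1.5, are assumptions made purely for illustration.

```python
import math


def gaussian_1d(sigma, radius):
    """Sampled, normalized 1D Gaussian with 2*radius + 1 taps."""
    w = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def convolve_full(a, b):
    """Full 1D convolution; the result has len(a) + len(b) - 1 samples."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


s1, s2 = 1.0, 1.5
r1, r2 = math.ceil(5 * s1), math.ceil(5 * s2)
cascade = convolve_full(gaussian_1d(s1, r1), gaussian_1d(s2, r2))
direct = gaussian_1d(math.hypot(s1, s2), r1 + r2)  # sigma0**2 = s1**2 + s2**2
max_err = max(abs(p - q) for p, q in zip(cascade, direct))
print(f"max abs difference: {max_err:.2e}")
```

The two smaller kernels here have 11 and 17 taps, while the single equivalent kernel needs 27 taps at the same relative truncation, which hints at why cascading can save resources.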

In our work we present a hardware architecture to speed up the detection of interest points (i.e., scale-space extrema) in the SIFT algorithm. The main difference between the architecture proposed here and earlier architectures reported in the literature lies in a more efficient use of FPGA resources through interleaving the processing of octaves, while still obtaining a result every two clock cycles. This implies a considerable speedup over existing software implementations and many of the hardware implementations discussed in this section. Furthermore, the FPGA resource savings of our architecture grow as the number of processed octaves increases. This is a great advantage, since the number of octaves depends on the image size, and the trend in computer vision is to work with increasingly higher-resolution images.

4 A parallel algorithm for scale-space extrema detection

There are usually several methods or algorithms for performing a particular computational task. The final choice typically depends on the application and on the hardware device to be used; in general, the optimal algorithm for an FPGA differs from the optimal algorithm for a general-purpose processor or a sequential computer.

Although the specifications and configuration of FPGA systems look like software programs in high-level languages, they specify hardware, not software. Reformulating an algorithm can often yield a substantial improvement in hardware performance, because a computational technique that is good in software is not necessarily good in hardware [9]. Hardware provides the flexibility to create optimal computational structures that best undertake a given task, as well as to exploit low-level parallelism.

This section describes the proposed parallel algorithm for scale-space extrema detection. It is a reformulation of the algorithm presented by Lowe [13] for this purpose, aimed at obtaining maximum performance in a hardware implementation of this stage. The reformulations primarily focus on taking full advantage of the parallelism in this process while minimizing the occupied device area.


Fig. 1 In 2D convolution, the result at a pixel depends only on a neighborhood around that pixel in the input image of the same size as the convolution window. In this figure, for a convolution window G of size 3×3, the result depends only on a region of equal size in the input image I

4.1 General considerations of the algorithm

To obtain an algorithm that uses FPGA resources more efficiently, we took into account the potential for exploiting data parallelism, the separability property of the Gaussian kernel, and the interleaving of the processing of octaves. This section details these elements, which form the basis of our proposed reformulation of the scale-space extrema detection algorithm.

4.1.1 Exploiting data parallelism

Let I be a two-dimensional image and let G be a convolution mask of odd size k×k; then the convolution of I and G is given by

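$$f(x,y) = \sum_{i=x-\bar{\imath}}^{x+\bar{\imath}} \; \sum_{j=y-\bar{\jmath}}^{y+\bar{\jmath}} I(i,j)\, G(x-i,\, y-j), \qquad (2)$$

where $\bar{\imath} = \bar{\jmath} = \lfloor k/2 \rfloor$ and the kernel G is indexed from $-\lfloor k/2 \rfloor$ to $\lfloor k/2 \rfloor$ in each direction.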

As can be seen in Eq. 2, computing f(x1, y1) requires only a k×k neighborhood centered at (x1, y1). This is also shown graphically in Fig. 1. Similarly, determining whether a point is an interest point requires only a 3×3 neighborhood in its DoG image and in the adjacent images of the DoG scale-space.

These characteristics of 2D convolution and of scale-space extrema detection offer a high potential for data parallelism, specifically of the Single Program, Multiple Data (SPMD) type. As an example of SPMD parallelism in this task, an image can be divided into P partitions with an overlap of k−1 lines, and all partitions can be processed simultaneously by P different processors. This reduces processing time by a factor of up to P, but it also increases the device area by the same factor. Therefore, the right balance between the desired speedup and device area must be found depending on the application.

Fig. 2 A matrix of size M×N is separable if it can be decomposed into two matrices of sizes M×1 and 1×N
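The overlapping-partition scheme for SPMD parallelism can be checked with a small software sketch: the rows are split into P strips that overlap by k−1 lines, each strip is convolved independently (standing in for one of the P processors), and the stacked results match the convolution of the whole image. Only "valid" output pixels are computed, so border handling is deliberately left out, and the function names are hypothetical.

```python
def convolve2d_valid(img, ker):
    """'Valid' 2D convolution: each output pixel depends only on the k x k window under it."""
    k = len(ker)
    height, width = len(img), len(img[0])
    return [
        [
            sum(img[y + i][x + j] * ker[k - 1 - i][k - 1 - j] for i in range(k) for j in range(k))
            for x in range(width - k + 1)
        ]
        for y in range(height - k + 1)
    ]


def convolve_partitioned(img, ker, P):
    """Split the output rows among P 'processors'; each strip carries k-1 extra input lines."""
    k = len(ker)
    out_rows = len(img) - k + 1
    bounds = [p * out_rows // P for p in range(P + 1)]
    result = []
    for start, stop in zip(bounds, bounds[1:]):
        # Output rows [start, stop) need input rows [start, stop + k - 1).
        strip = img[start:stop + k - 1]
        result.extend(convolve2d_valid(strip, ker))
    return result


img = [[(3 * y + 5 * x) % 7 for x in range(10)] for y in range(12)]
ker = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
full = convolve2d_valid(img, ker)
print(all(convolve_partitioned(img, ker, P) == full for P in (1, 2, 3, 4)))  # True
```

Integer pixel values are used so that the partitioned and whole-image results can be compared exactly.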

4.1.2 Exploiting the separability property of the Gaussian kernel

A technique widely used in image processing to reduce the computational complexity of 2D Gaussian filtering is to exploit the separability property of the Gaussian kernel [16]. A 2D filter is separable if it can be decomposed into two 1D filters: a vertical and a horizontal projection (see Fig. 2). The Gaussian filter can be separated as follows:

$$G(x,y,\sigma) = h(x,\sigma) * v(y,\sigma),$$

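where

$$h(x,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^{2}/2\sigma^{2}}, \qquad v(y,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y^{2}/2\sigma^{2}}.$$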

In addition, convolution is associative:

$$I(x,y) * \big(h(x,\sigma) * v(y,\sigma)\big) = \big(I(x,y) * h(x,\sigma)\big) * v(y,\sigma).$$

Therefore, the 2D convolution of an image with a Gaussian filter can be carried out by first convolving the image with h(x, σ) in the horizontal direction and then with v(y, σ) in the vertical direction, or vice versa. A 1D convolution requires k MAC (multiply–accumulate) operations per output value, so the two passes require 2k, compared with the k² MAC operations required by the 2D variant. Therefore, the computational advantage of separable over non-separable convolution is k²/2k = k/2. For a 7×7 convolution window, this technique reduces the number of MAC operations by a factor of 49/14 = 3.5, which could represent a reduction of up to 3.5 times in the device area used for these operations.
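To make the MAC comparison concrete, the sketch below filters a small synthetic image once with a full 7×7 Gaussian (k² MACs per output) and once with a horizontal and a vertical 1D pass (2k MACs per output), and checks that the outputs agree. It assumes σ = 1.6 and "valid" borders for illustration; because the sampled Gaussian is symmetric, correlation and convolution coincide, so no kernel flip is applied.

```python
import math


def gaussian_1d(sigma, k):
    """Sampled, normalized 1D Gaussian with odd length k."""
    r = k // 2
    w = [math.exp(-(t * t) / (2 * sigma * sigma)) for t in range(-r, r + 1)]
    total = sum(w)
    return [v / total for v in w]


def conv2d(img, ker):
    """Direct 'valid' 2D filtering: k*k MACs per output pixel."""
    k = len(ker)
    return [[sum(img[y + i][x + j] * ker[i][j] for i in range(k) for j in range(k))
             for x in range(len(img[0]) - k + 1)]
            for y in range(len(img) - k + 1)]


def conv_separable(img, h, v):
    """Horizontal pass with h, then vertical pass with v: 2k MACs per output pixel."""
    k = len(h)
    rows = [[sum(row[x + j] * h[j] for j in range(k)) for x in range(len(row) - k + 1)]
            for row in img]
    return [[sum(rows[y + i][x] * v[i] for i in range(k)) for x in range(len(rows[0]))]
            for y in range(len(rows) - k + 1)]


k, sigma = 7, 1.6
g = gaussian_1d(sigma, k)
G = [[gi * gj for gj in g] for gi in g]  # G[i][j] = v(i) * h(j)
img = [[(7 * y + 3 * x) % 11 for x in range(16)] for y in range(12)]

a, b = conv2d(img, G), conv_separable(img, g, g)
max_err = max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
print(f"max difference: {max_err:.1e}, MAC ratio k^2/2k = {k * k / (2 * k)}")  # ratio 3.5
```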

4.1.3 Octave interleaving for spatial pyramid processing

After processing each octave, the image is down-sampled by a factor of two by taking every second pixel in each row and column, i.e., $I_o(x,y) = I_{o-1}(2x,2y)$. Halving the image in each dimension reduces the total number of pixels by a factor of four. In hardware, to reduce the amount of data, the sampling rate is reduced by the same factor. If the amount of data is reduced by a factor of four after every octave, the sampling period of octave o is

$$\tau(o) = \tau_0\, 4^{o}, \qquad (3)$$

where τ0 is the sampling period of the first octave. Therefore, after down-sampling there is a large percentage of idle processing time with respect to the processing time of the first octave. This idle time results from the long sampling periods of the last octaves, whose images are small with respect to the original one. The idle processing time for a system of O octaves is given by

$$\sum_{o=0}^{O-1} \left[\tau(o) - 1\right],$$

and it can be identified in Fig. 3a as the rising edges not marked in bold in each of the octaves.
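The relations in this subsection are simple enough to state in code. The sketch below implements the down-sampling rule, Eq. (3), and the idle-time sum literally; it assumes τ0 = 1 (one sample per clock cycle in the first octave) purely for illustration.

```python
def downsample(img):
    """I_o(x, y) = I_{o-1}(2x, 2y): keep every second pixel of every second row."""
    return [row[::2] for row in img[::2]]


def sampling_period(o, tau0=1):
    """Eq. (3): tau(o) = tau0 * 4**o."""
    return tau0 * 4 ** o


def idle_time(O, tau0=1):
    """Idle processing time for O octaves: sum of (tau(o) - 1) over o = 0..O-1."""
    return sum(sampling_period(o, tau0) - 1 for o in range(O))


img = [[x + 10 * y for x in range(8)] for y in range(8)]
half = downsample(img)
print(len(img) * len(img[0]) // (len(half) * len(half[0])))  # 4: pixel count drops by four
print([sampling_period(o) for o in range(4)])                # [1, 4, 16, 64]
print(idle_time(4))                                          # 81
```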

The main contribution of this paper is a scheme for spatial pyramid processing that takes advantage of these periods of inactivity, enabling the calculation of the O octaves of a scale in a single convolution block, no matter how large O is.

The general idea of this approach is to interleave the processing of the O octaves in a single processor. To do so, the sampling period of every octave is doubled, making room in the first-octave processor for the calculation of the remaining octaves. Figure 3b illustrates this idea.

We claim that, using this technique and regardless of the number of octaves, all the octaves for a specific scale can be processed in a single processor (with the required latency), which provides great system scalability and savings in hardware resources.

Proposition 1 Let the first octave occupy every odd clock cycle. Then each remaining octave k (where k = 0 refers to the second octave) will occupy the cycles defined by the following sequence:

$$s_k:\; a_k,\; a_k + \tau_1 4^{k},\; a_k + 2\,\tau_1 4^{k},\; \dots,\; a_k + n\,\tau_1 4^{k},\; \dots \qquad (4)$$

where a_k is the first processing cycle of octave k and τ1 = 8 is the sampling period of the second octave.

This interleaving order ensures that no two octaves will ever request the same processing clock cycle, which allows an infinite number of octaves to be interleaved. Moreover, by letting a_k be the first unused cycle, an optimal interleaving order is obtained.

Proof (by mathematical induction).

Basis: the case k = 0:

$$s_0:\; 2, 10, 18, \dots \qquad a_0 = 2, \quad x \equiv 2 \pmod{8}, \;\; \forall x \in s_0,$$

which is trivially satisfied, because every clock cycle in s_0 is even and every clock cycle occupied by the first octave is odd.

Induction step: consider any k > 0. Assume, as the induction hypothesis, that no two of the octaves up to octave k−1 ever request the same processing clock cycle (i.e., s_i ∩ s_j = ∅ for 0 ≤ i < j ≤ k−1):

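$$\begin{aligned}
s_0 &:\; a_0,\; a_0 + 8\cdot 4^{0},\; a_0 + 2(8\cdot 4^{0}),\; \dots,\; a_0 + n(8\cdot 4^{0}),\; \dots\\
s_1 &:\; a_1,\; a_1 + 8\cdot 4^{1},\; a_1 + 2(8\cdot 4^{1}),\; \dots,\; a_1 + n(8\cdot 4^{1}),\; \dots\\
&\;\;\vdots\\
s_{k-1} &:\; a_{k-1},\; a_{k-1} + 8\cdot 4^{k-1},\; a_{k-1} + 2(8\cdot 4^{k-1}),\; \dots,\; a_{k-1} + n(8\cdot 4^{k-1}),\; \dots
\end{aligned}$$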

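then the following congruences are satisfied:

$$\begin{aligned}
x_0 &\equiv a_0 \pmod{8}, && \forall x_0 \in s_0,\\
x_1 &\equiv a_1 \pmod{8\cdot 4^{1}}, && \forall x_1 \in s_1,\\
&\;\;\vdots\\
x_{k-1} &\equiv a_{k-1} \pmod{8\cdot 4^{k-1}}, && \forall x_{k-1} \in s_{k-1}.
\end{aligned} \qquad (5)$$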

Assume that there exists a_k that is not in any sequence s_i, for all 0 ≤ i ≤ k−1; then a_k does not satisfy any of the congruences in Eq. 5 (this holds in particular when a_k is chosen as the first unused cycle, since then a_k > a_i for every i < k). Since 8·4^k ≡ 0 (mod 8), 8·4^k ≡ 0 (mod 8·4^1), …, 8·4^k ≡ 0 (mod 8·4^{k−1}), by the properties of congruences we have, for every n ≥ 0,

$$\begin{aligned}
a_k + n(8\cdot 4^{k}) &\not\equiv a_0 \pmod{8},\\
a_k + n(8\cdot 4^{k}) &\not\equiv a_1 \pmod{8\cdot 4^{1}},\\
&\;\;\vdots\\
a_k + n(8\cdot 4^{k}) &\not\equiv a_{k-1} \pmod{8\cdot 4^{k-1}}.
\end{aligned}$$

Then,based on the assumption that ?a k such that it is not

in any sequence s i ,?1≤i ≤k ?1we have proved that s 0 s 1 ...

s k =?.Now we have to prove that there is always a possibility to ?nd an a k that is not present in any of the previous sequences.

The total number of clock cycles occupied by an in?nite number of octaves is given by L =

|U |2+|U |2·4+|U |2·42+···+|U |2·4k

+···L =|U |

12+12 1

4 +12 14 2+···+12 14

k

+··· which have the form of the in?nite geometric series ar 0+

ar 1+ar 2,...,+ar k +···which converges to a

1?r if and

only if |r |<1,since a =12and r =1

4,L =

2

3

|U |.As the number of occupied clock cycles is less than the number of available ones (L <|U |)there will always exist a clock cycle a k that is not occupied.This complete the proof of the induction step and thus of the proposition.
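The schedule of Proposition 1 and the occupancy bound can both be checked with a small simulation. The following Python sketch is illustrative only and is not the authors' block M: the function names and the 1-based numbering of cycles and octaves are assumptions. It gives the first octave every odd cycle, then gives octave k (k = 0 being the second octave) the period τ1·4^k = 8·4^k starting at the first cycle still free, raises an error if any cycle is requested twice, and measures the share of busy cycles.

```python
from fractions import Fraction


def interleave_schedule(num_octaves: int, horizon: int, tau1: int = 8) -> list[int]:
    """Simulate the octave interleaving of Proposition 1.

    Returns slots where slots[c] is the octave (1-based, octave 1 = first
    octave) that owns clock cycle c, or 0 if the cycle is idle. Cycles are
    numbered from 1, so slots[0] is unused.
    """
    slots = [0] * (horizon + 1)
    for c in range(1, horizon + 1, 2):
        slots[c] = 1  # the first octave takes every odd cycle
    for k in range(num_octaves - 1):  # k = 0 is the second octave
        period = tau1 * 4 ** k
        # a_k: the first cycle not yet claimed by any previous octave
        a_k = next(c for c in range(1, horizon + 1) if slots[c] == 0)
        for c in range(a_k, horizon + 1, period):
            if slots[c] != 0:
                raise ValueError(f"cycle {c} requested twice")
            slots[c] = k + 2  # store the 1-based octave number
    return slots


def occupied_share(slots: list[int]) -> Fraction:
    """Fraction of the simulated cycles (1..horizon) used by some octave."""
    busy = sum(1 for owner in slots[1:] if owner != 0)
    return Fraction(busy, len(slots) - 1)


slots = interleave_schedule(num_octaves=4, horizon=2048)
print(slots[1:9])             # [1, 2, 1, 3, 1, 4, 1, 0]
print(occupied_share(slots))  # 85/128, i.e. (2/3) * (1 - 4**-4)
```

With O octaves and a horizon that is a multiple of the longest period, the busy share is exactly (2/3)(1 − 4^−O), which approaches but never reaches the L = (2/3)|U| bound of the proof; the horizon must be large enough for a free cycle to exist, otherwise `next` raises `StopIteration`.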

Further details about the hardware architecture that implements this idea are provided in Sect. 5.1.

4.2 Local extrema detection

As detailed in Sect. 2.2, the scale-space is constructed by generating a series of images blurred at discrete values of σ, where its domain is divided into logarithmic intervals organized in O octaves, and each octave is further divided into S sub-levels. Therefore, to obtain a result at a given image location at a certain scale, it is necessary to obtain the value of that same location at the previous scale, and so on. Since a convolution result depends only on a small region, all convolutions can be performed concurrently, with a latency in the input data with respect to the previous scale that is small compared with the size of the image. Similarly, the differences between adjacent scales that form the DoG scale-space are computed concurrently, while the scale images are being generated.

In order to detect local extrema in the DoG scale-space, each pixel in the DoG images is compared with its eight neighbors in the same image, plus the corresponding nine neighbors in each of the two adjacent scales. This implies that the same 3×3 neighborhood at a given scale is used three times: while processing its own scale and while processing each of the two adjacent scales (see the total number of comparisons in Eq. 1). An efficient and equivalent way to obtain local maxima and minima, which allows partial results to be reused, is described below. First, the point-to-point minimum and maximum of every pair of adjacent images is obtained:

$$\mathrm{Min}_1(x,y,o,s) = \min\big(D(x,y,o,s),\, D(x,y,o,s+1)\big), \qquad \mathrm{Max}_1(x,y,o,s) = \max\big(D(x,y,o,s),\, D(x,y,o,s+1)\big).$$

Then the process is repeated on the images obtained in the previous step:

$$\mathrm{Min}_2(x,y,o,s) = \min\big(\mathrm{Min}_1(x,y,o,s),\, \mathrm{Min}_1(x,y,o,s+1)\big), \qquad \mathrm{Max}_2(x,y,o,s) = \max\big(\mathrm{Max}_1(x,y,o,s),\, \mathrm{Max}_1(x,y,o,s+1)\big).$$

With this procedure it is possible to obtain images representing the minimum and maximum values over three adjacent images. To check whether a pixel is an interest point, it is necessary to verify that it is a local minimum of Min2(x,y,o,s) or a local maximum of Max2(x,y,o,s). Figure 4 shows a diagram of this procedure. In addition, it must be checked that the pixel's value equals the corresponding pixel in the middle DoG image D(x,y,o,s+1) and that, despite being a local extremum, it is not equal to its counterpart in either of the adjacent scales. To this end, a flag β(x,y,o,s) is added to Min2(x,y,o,s) and Max2(x,y,o,s) to indicate these conditions.
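A software sketch can make the reuse of partial results explicit. The following pure-Python function is a hypothetical model, not the paper's isExtremum hardware: it takes the DoG images of one octave as nested lists, builds Min1/Max1 once per adjacent pair and Min2/Max2 from adjacent Min1/Max1 images, and then applies the 3×3 test together with a β flag that we interpret as "the centre value differs from its counterparts in both adjacent scales". Skipping border pixels is an assumption of the sketch.

```python
from typing import Callable, List

Image = List[List[float]]
Mask = List[List[bool]]


def pointwise(op: Callable[[float, float], float], a: Image, b: Image) -> Image:
    """Apply a binary op (min or max) pixel by pixel to two equal-size images."""
    return [[op(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def octave_extrema(dog: List[Image]) -> List[Mask]:
    """Strict 3x3x3 extrema of the inner DoG images of one octave.

    Stage 1 computes Min1/Max1 once per adjacent pair; stage 2 combines two
    adjacent stage-1 images, so every Min1/Max1 image is reused by two
    triples. masks[j] refers to dog[j + 1]; border pixels are left False.
    """
    pairs = range(len(dog) - 1)
    min1 = [pointwise(min, dog[i], dog[i + 1]) for i in pairs]
    max1 = [pointwise(max, dog[i], dog[i + 1]) for i in pairs]
    min2 = [pointwise(min, min1[i], min1[i + 1]) for i in range(len(min1) - 1)]
    max2 = [pointwise(max, max1[i], max1[i + 1]) for i in range(len(max1) - 1)]

    rows, cols = len(dog[0]), len(dog[0][0])
    neighbours = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    masks: List[Mask] = []
    for j in range(len(min2)):
        below, centre, above = dog[j], dog[j + 1], dog[j + 2]
        mask = [[False] * cols for _ in range(rows)]
        for y in range(1, rows - 1):
            for x in range(1, cols - 1):
                v = centre[y][x]
                # beta flag (our reading): v differs from both scale counterparts
                beta = v != below[y][x] and v != above[y][x]
                is_min = (beta and v == min2[j][y][x] and
                          all(v < min2[j][y + dy][x + dx] for dy, dx in neighbours))
                is_max = (beta and v == max2[j][y][x] and
                          all(v > max2[j][y + dy][x + dx] for dy, dx in neighbours))
                mask[y][x] = is_min or is_max
        masks.append(mask)
    return masks
```

With strict comparisons this gives the same answer as comparing each pixel directly with its 26 neighbours, because a value that equals Min2 and differs from both scale counterparts is strictly below all three-scale minima around it. As described in Sect. 5.2, the per-scale masks of an octave are then ORed into a single 1-bit output.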

The total number of comparisons with the proposed local extrema detection method is defined, for every octave, by the S−1 image comparisons for the calculation of the first-order extrema, plus the S−2 image comparisons for the second-order extrema, plus the eight comparisons of every pixel in the second-order extrema images against its neighbors and the one needed to check the β flag:

Fig. 4 A pixel (marked with X in D) is selected as a point of interest if it is a local minimum in a 3×3 neighborhood in Min2 (marked with circles). Min2 is the second-order minimum between adjacent scales in the DoG scale-space. The figure applies analogously to the maximum

Fig. 5 The proposed architecture consists of two main parts: one for the generation of the DoG scale-space and the other for the detection of extrema in this space. The first block receives the image and generates the DoG scale-space, which serves as input to the second block, which extracts the points of interest

[Fig. 5 diagram, SIFT Interest Keypoints Detection: Input Image → DoG Scale-Space Generation → DoG (o = 1, …, O; s = 1, …, S−1) → DoG Scale-Space Extrema Detection → Interest Points (o = 1, …, O)]

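$$\sum_{i=0}^{O-1} \frac{MN}{4^i}\big[(S-1) + (S-2) + 9(S-2)\big] = \sum_{i=0}^{O-1} \frac{MN}{4^i}\,(11S - 21) \approx \sum_{i=0}^{O-1} 11\cdot\frac{MN}{4^i}\,(S - 1.9). \qquad (6)$$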

Comparing the total number of comparison operations of the proposed method (Eq. 6) with the comparisons needed by the classical method (Eq. 1), a reduction by at least a factor of two is observed.
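As a quick arithmetic check of Eq. 6, the following sketch (the function name is ours) evaluates it with exact rational arithmetic for the configuration later used in Sect. 6.4 (M = 320, N = 240, O = 3, S = 6). Eq. 1 for the classical method lies outside this section, so the sketch makes no comparison with it.

```python
from fractions import Fraction


def proposed_comparisons(M: int, N: int, O: int, S: int) -> Fraction:
    """Evaluate Eq. 6: sum over octaves of (M*N / 4**i) * [(S-1) + (S-2) + 9(S-2)]."""
    per_pixel = (S - 1) + (S - 2) + 9 * (S - 2)  # equals 11*S - 21
    return sum((Fraction(M * N, 4 ** i) * per_pixel for i in range(O)), Fraction(0))


print(proposed_comparisons(320, 240, 3, 6))  # 4536000
```

The per-pixel factor is exactly 11S − 21 (45 for S = 6); writing it as 11(S − 1.9) in Eq. 6 is a rounding of 21/11 ≈ 1.909.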

5 Proposed hardware architecture for scale-space extrema detection

In the previous section we proposed a reformulation of the scale-space extrema detection phase of the SIFT algorithm presented by Lowe [13]. This reformulation aims to take maximum advantage of the parallelism of the algorithm and to minimize the device area occupied by a hardware implementation. In this section we propose a hardware architecture for the scale-space extrema detection stage of the SIFT method, which implements the parallel algorithm proposed in the previous section.

The proposed architecture uses the elements discussed in Sect. 4, namely the exploitation of data parallelism, the exploitation of the separability property of the Gaussian kernel, and the interleaving of octave processing. These elements contribute to a better use of the device area, since they provide an efficient way to perform this process.

This section describes each of the parts that make up the architecture, illustrated with diagrams, and indicates their relation to the parallel algorithm proposed in the previous section.

For the detection of scale-space extrema, the architecture is divided into two parts: (i) generation of the DoG scale-space and (ii) detection of local extrema in this space (see Fig. 5). The input image is processed by the DoG scale-space generation block, which returns the O·(S−1) images that form the DoG scale-space. These images are passed to the local extrema detection block, which determines which image locations are considered points of interest.

5.1 DoG scale-space generation

In the architectures proposed in [15] and [3], the authors generate the DoG scale-space using one convolution block for each convolution operation and divide the processing by octaves, so O·S convolution blocks are required. In the architecture presented here, we propose to use only S convolution blocks for the O·S convolutions, dividing the processing by scales while keeping the same system performance. This is achieved by interleaving the processing of the octaves, as detailed in Sect. 4.1.3.

A block diagram for the generation of the DoG scale-space is shown in Fig. 6. This diagram shows a system of four octaves and five scales (O = 4, S = 5); the architecture can be generalized to any configuration of these parameters.

Fig. 6 High-level diagram of the architecture for DoG scale-space generation. This diagram shows the cascade connections between the scale processor blocks, each of which processes O octaves. The output of this block is the DoG scale-space, which serves as input to the local extrema detection block

The proposed architecture is mainly composed of Scale Calculation Blocks (SCB). A single SCB block performs O Gaussian filtering operations for a given scale, following the interleaving procedure described in Sect. 4.1.3. Therefore, each SCB block has O input ports and O output ports, one for each octave, where the sampling period of each octave is defined by Eq. 3. SCB blocks are cascaded so that a convolution kernel of fixed size can be used, avoiding convolutions with large kernels. This cascading can be seen in Fig. 6.
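The cascade works because convolving two Gaussians with standard deviations σa and σb yields a Gaussian with σ = sqrt(σa² + σb²), so each SCB only has to add the increment that takes the previous scale to the next one. The sketch below assumes a geometric progression σ_s = σ0·κ^s with σ0 = 1.6 and κ = 2^(1/3) purely for illustration; the actual σ values used by the architecture are not given in this section, and the function name is ours.

```python
import math


def incremental_sigmas(sigma0: float, kappa: float, num_scales: int) -> list[float]:
    """Blur each cascaded stage must add so that stage s reaches sigma0 * kappa**s.

    Relies on sigma_total**2 = sigma_prev**2 + sigma_inc**2 for cascaded Gaussians.
    """
    targets = [sigma0 * kappa ** s for s in range(num_scales)]
    return [math.sqrt(targets[s] ** 2 - targets[s - 1] ** 2) for s in range(1, num_scales)]


sigma0, kappa = 1.6, 2 ** (1 / 3)  # illustrative values, not taken from the paper
incs = incremental_sigmas(sigma0, kappa, num_scales=5)
for s, inc in enumerate(incs, start=1):
    print(f"scale {s}: target sigma {sigma0 * kappa ** s:.3f}, increment {inc:.3f}")
```

Each increment is smaller than the σ it produces, which is why a cascaded stage can get by with a shorter kernel than a single filter applied directly to the input image.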

An SCB block takes advantage of the separability property of the Gaussian kernel for Gaussian filtering, as described in Sect. 4.1.2: it performs the filtering first in the horizontal direction and then in the vertical direction, as can be seen in Fig. 7.
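The equivalence that the SCB relies on can be checked numerically. This pure-Python sketch uses function names of our own; images are nested lists, only the "valid" region without border padding is produced, and whole images are held in memory instead of the line buffers used in hardware. It applies a 1D kernel horizontally and then vertically and compares the result with a direct 2D convolution using the outer-product kernel.

```python
def conv_rows(img: list[list[int]], k: list[int]) -> list[list[int]]:
    """Horizontal 1D pass over every row ('valid' region only, no padding)."""
    r = len(k) // 2
    return [[sum(k[t] * row[x + t - r] for t in range(len(k)))
             for x in range(r, len(row) - r)] for row in img]


def transpose(img: list[list[int]]) -> list[list[int]]:
    return [list(col) for col in zip(*img)]


def separable_conv(img: list[list[int]], k: list[int]) -> list[list[int]]:
    """Horizontal pass followed by a vertical pass (done on the transpose)."""
    return transpose(conv_rows(transpose(conv_rows(img, k)), k))


def direct_conv(img: list[list[int]], k: list[int]) -> list[list[int]]:
    """Reference 2D convolution with the outer-product kernel k[u] * k[t]."""
    r = len(k) // 2
    return [[sum(k[u] * k[t] * img[y + u - r][x + t - r]
                 for u in range(len(k)) for t in range(len(k)))
             for x in range(r, len(img[0]) - r)]
            for y in range(r, len(img) - r)]


img = [[(3 * y + 7 * x) % 11 for x in range(8)] for y in range(6)]
print(separable_conv(img, [1, 4, 6, 4, 1]) == direct_conv(img, [1, 4, 6, 4, 1]))  # True
```

The separable version needs 2k multiplications per output pixel instead of k², which is the saving the horizontal and vertical filtering blocks exploit.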

The internal organization of the horizontal filtering block is detailed in Fig. 8. Each input signal is shifted through k−1 registers, where k is the number of coefficients of the 1D convolution kernel. The k signals corresponding to the O octaves are multiplexed in order to control the processing order of the octaves and achieve the desired interleaving. The operating logic of the multiplexers at an instant t is determined by block M, which implements the interleaving order defined in Proposition 1.

Fig. 7 2D convolution is performed by two consecutive 1D convolutions, first passing through a horizontal filter and then through a vertical one

Fig. 8 Internal structure of a horizontal filtering block. All the octaves are processed in the same block, sharing the same hardware elements for convolution. The operating logic for interleaving the processing is given by block M and by the multiplexers it controls

The structure of the vertical filtering block is the same as that of the horizontal one, with the difference that each buffer stores the last k lines of the image instead of the last k pixels. To store these values, one RAM block is used for each line. Therefore, this part of the design uses k−1 RAM blocks for each octave in each SCB block; hence, the number of RAM blocks used to generate the DoG scale-space is given by

$$\#\mathrm{RAM\_blocks} = (k-1)\cdot O\cdot S.$$


[Fig. 9 diagram, Local Extrema Detection on the DoG Scale-Space: for each of Octaves 1 to 4, the images DoG (o, s = 1) to DoG (o, s = 4) feed one isExtremum block, which outputs Interest Points (o)]

Fig. 9 High-level diagram of the architecture for local extrema detection in the DoG scale-space. Each isExtremum block receives all the DoG images of an octave and determines their local extrema, i.e., the points of interest

A 1D convolution block uses k multipliers and k−1 adders, a total that we call r(k) (i.e., r(k) = 2k − 1). Then, the number of multipliers and adders used by the architecture for building the scale-space is given by

$$\#\mathrm{multipliers\_adders} = 2\cdot r(k)\cdot S.$$

As can be seen, this quantity depends only on the size of the convolution kernel and the number of scales; it is independent of the number of octaves.
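The two resource formulas of this section can be evaluated directly. In this sketch the function name is ours and r(k) is read as the k multipliers plus k − 1 adders of one 1D filter. For the configuration of Table 2 (k = 7, O = 3, S = 6) the RAM count evaluates to 108, which coincides with the number of RAMB16s reported there.

```python
def dog_resources(k: int, O: int, S: int) -> dict[str, int]:
    """Resource counts stated in Sect. 5.1 for the DoG generation stage."""
    r_k = k + (k - 1)  # r(k): k multipliers plus k - 1 adders per 1D filter
    return {
        "ram_blocks": (k - 1) * O * S,      # line buffers of the vertical filters
        "multipliers_adders": 2 * r_k * S,  # horizontal + vertical filter per SCB
    }


print(dog_resources(k=7, O=3, S=6))  # {'ram_blocks': 108, 'multipliers_adders': 156}
```

Changing O alters only the RAM count; the arithmetic resources stay fixed, which is the property claimed above.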

The HSB block in Fig. 6 performs image downsampling. To this end, an addressable shift register and a counter are used. To replace fractional coefficient values with integers, the coefficients of the convolution kernel are multiplied by a constant, and the filtered result is then divided by the same constant. Preferably, this constant should be a power of two, so that the division can be replaced by a simple bit-shift operation.
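A minimal sketch of this integer scheme, assuming a scale factor of 2^8 and σ = 1.6 with k = 7 taps (both σ and the number of fractional bits are our illustrative choices, not values from the paper): the taps are scaled and rounded once, the filter then uses only integer multiply-adds, and a right shift replaces the division.

```python
import math


def quantized_gaussian(sigma: float, k: int, frac_bits: int) -> list[int]:
    """1D Gaussian taps normalized to 1, scaled by 2**frac_bits and rounded."""
    r = k // 2
    g = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-r, r + 1)]
    total = sum(g)
    return [round(v / total * (1 << frac_bits)) for v in g]


def filter_row_int(row: list[int], taps: list[int], frac_bits: int) -> list[int]:
    """Integer-only 1D filtering; '>>' replaces division by 2**frac_bits."""
    r = len(taps) // 2
    return [sum(t * row[x + i - r] for i, t in enumerate(taps)) >> frac_bits
            for x in range(r, len(row) - r)]


taps = quantized_gaussian(sigma=1.6, k=7, frac_bits=8)
print(taps, sum(taps))
print(filter_row_int([100] * 10, taps, frac_bits=8))
```

The shift floors the result; adding 1 << (frac_bits − 1) before shifting would round instead. Rounding the taps is also why their sum can differ slightly from 2^8, which is one source of the approximation errors measured in Sect. 6.1.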

5.2 Local extrema detection on the DoG scale-space

The processing block that detects local extrema receives the DoG scale-space as input. This block implements the algorithm stated for this purpose in Sect. 4.2. A high-level diagram of this block for a system with a DoG scale-space of four octaves and four scales (O = 4, S = 4) is shown in Fig. 9. This architecture can also be generalized to any configuration of these parameters.


Fig. 10 Internal structure of the isExtremum block. First, the block obtains the maxima and minima for every three adjacent images; this is done in two stages in order to reuse intermediate results. The blocks enclosed in dashed lines implement the β flag, which serves as input to the isLocalMin and isLocalMax blocks that determine the points of interest

For each octave, all the DoG images are passed to an isExtremum block, which determines which points are considered of interest. The output of this block is a 1-bit vector indicating, for each point, whether it is regarded as an interest point. The internal structure of an isExtremum block is detailed in Fig. 10.

As explained in Sect. 4.2, the procedure designed to detect local extrema reuses intermediate calculations across more than one scale, which saves device resources. As can be seen in Fig. 10, this process is divided into two stages, in which the minimum and maximum values of the images shared by neighboring triples are reused. The blocks enclosed in dashed lines implement the β flag, which indicates whether each minimum or maximum value is equal to its corresponding pixel in the DoG and is not equal to its counterpart in either of the adjacent scales. The isLocalMin and isLocalMax blocks determine whether each pixel is a local extremum in a 3×3 neighborhood, also taking into account the value of the β flag. If a point is an extremum at any scale, it is considered an interest point, so an OR gate is used before the output.

6 Experimental results

This section details and analyzes the experiments conducted on the proposed architecture for the scale-space extrema detection phase of the SIFT method. The evaluation focused on measuring the reliability and accuracy of the results obtained and on how they affect the repeatability and distinctiveness of the extracted SIFT features. The efficiency of device-area occupation and the speedup obtained with respect to a software implementation are also analyzed. In addition, we compare our results with other related architectures reported in the literature.

To verify the accuracy and reliability of our proposed architecture, a hybrid system was implemented in which the scale-space extrema detection stage is performed by our proposed architecture, while the remaining stages of the algorithm are performed by Vedaldi's software implementation [24], available online. Figure 11 shows a schematic of this hybrid system.

The proposed architecture was modeled and simulated using Xilinx System Generator 10.1 + Simulink. As shown in Fig. 11, the results of the first stage of SIFT obtained from the simulation of the architecture are passed to the Matlab workspace, where the software implementation takes the values necessary to perform the remaining stages of the algorithm. The comparison between the results obtained by this hybrid implementation and by a software implementation [24] provides a basis for determining the quality of the results produced by our architecture.

Fig. 11 Experimentation platform. The SIFT interest point detection stage is performed by our proposed architecture. The results are passed to a software implementation that executes the rest of the stages

[Fig. 11 diagram: Scale-Space Extrema Detection (Our Proposed Hardware Architecture) → Keypoint Localization → Orientation Assignment → Keypoint Description (Vedaldi Software Implementation)]

6.1 Accuracy evaluation

To evaluate the accuracy of our proposed hardware architecture, we compared the results obtained on a set of 38 images by the hybrid implementation with those obtained by Vedaldi's software implementation. Test images were taken from Krystian Mikolajczyk's website.1 These images were captured with the aim of testing local feature extraction methods and were used in [14] to compare state-of-the-art local feature methods. The experiments and analysis presented here focus only on the generation of the DoG scale-space, since for the detection of local extrema in this space the results were identical to those of the software implementation.

As evaluation measure we used the Mean Square Error (MSE). The MSE quantifies the difference between an obtained result and its expected or true value: it is the average of the squared error, where the error is the amount by which the result differs from the true value. In this paper, the MSE is used to measure the difference between the values obtained by the proposed hardware architecture and by a software implementation. The MSE is defined by Eq. 7:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\left[I_{sw}(m,n) - I_{hw}(m,n)\right]^2 \qquad (7)$$

where I_sw(m,n) and I_hw(m,n) are the intensity values of pixel (m,n) in the M×N images generated by the software and the hybrid implementations, respectively.
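Eq. 7 translates directly into code. This sketch (the function name is ours) takes two equal-size images as nested lists, the software result first and the hardware result second, and returns the mean of the squared pixel differences.

```python
def mse(sw: list[list[float]], hw: list[list[float]]) -> float:
    """Eq. 7: mean of squared pixel differences between two M x N images."""
    M, N = len(sw), len(sw[0])
    if len(hw) != M or any(len(row) != N for row in hw):
        raise ValueError("images must have the same size")
    return sum((a - b) ** 2 for ra, rb in zip(sw, hw) for a, b in zip(ra, rb)) / (M * N)


print(mse([[1, 2], [3, 4]], [[1, 2], [3, 6]]))  # 1.0
```

One differing pixel with an error of 2 in a 2×2 image gives 4/4 = 1.0; identical images give 0.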

For these tests we used the scale-spaces generated for a configuration of six octaves and five scales (O = 6, S = 5). Smaller MSE values indicate that the results generated by our architecture are more similar to those obtained by the software implementation (i.e., a higher degree of accuracy). The results of this test are summarized in Fig. 12.

1 http://lear.inrialpes.fr/people/Mikolajczyk/.

Fig. 12 MSE values for each octave and scale. The rounding and approximation errors committed in the convolution process propagate in the order of dependence between the images in the scale-space, and hence the MSE increases in that order

[Fig. 13 diagram: Image → Our Proposed HW Architecture / Software Implementation]

Fig. 13 SIFT features are extracted from the input image using both implementations independently. Then, based on their keypoint correspondences, we calculate the detection error

Each bar in Fig. 12 represents the average MSE of the 38 test images in a specific octave and scale. In this figure, we can see that the error within an octave increases at every scale, and also increases at every octave. As explained in Sect. 2.2, this is because each scale image depends on the previous scale image, and the first image in each octave depends on the penultimate scale image of the previous octave. Therefore, the error in the calculation of each image propagates to the next. These dependencies between the images can be seen in Fig. 6.

The initial error, which then grows over each scale and octave, is caused by the approximations made to the Gaussian convolution kernel in order to substitute arithmetic operations on floating-point numbers with operations on integer values. This process is detailed in Sect. 5.1.

[Fig. 14 plots: per-image keypoint errors over the test set; panels (a, b) Delta x and Delta y (pixels), (c) Delta orientation (degrees), (d) Delta sigma]

Fig. 14 Errors in the extraction of keypoints using our architecture with respect to the software implementation. a, b show the localization errors, c shows the errors in orientation, and d the errors in scale

The MSE provides a quantitative measure of how much the rounding affects the resulting scale-space, but it provides no information about how much these approximations affect the output of the algorithm, i.e., the detected interest points. With this objective in mind, we extracted SIFT keypoints from the same set of images using the hybrid and software implementations described above. For each image, correspondences between the keypoints extracted by both implementations were found. An example is shown in Fig. 13.

Figure 14 shows the average variation in each of the 38 images for the keypoints obtained with both implementations. We measured variations in the coordinates, scale, and gradient orientation of the detected keypoints. The variation in keypoint coordinates did not exceed one pixel on average, although the largest deviations were four pixels. For gradient orientation, the variation per point was also small: the average variation was smaller than 2.5°, and the largest variations were 10.0°. The average variation of σ was 0.2, which also represents a small difference.

6.2 Repeatability and distinctiveness evaluation

The previous section presented results showing the accuracy of the proposed architecture from a more theoretical perspective, focusing on the error in scale-space generation and keypoint detection. This section follows an experimental approach more focused on the use of these features in an application. In a real application, points must be detected with high accuracy, but the keypoints must also be repeatable and distinctive; that is, the same point must be detectable in different views of the scene or object, and it must be distinguishable from the others.

To this end, we checked the correspondences between SIFT keypoints detected by our architecture in different images of the same scene. Figure 15 shows examples of the images used to evaluate the repeatability and distinctiveness of the proposed architecture. We evaluated four different changes in image conditions: changes in viewpoint (Fig. 15a), changes in scale and rotation (Fig. 15b), different JPEG compressions (Fig. 15c), and image blurring (Fig. 15d). In images with viewpoint changes, the camera position varies from a frontal to a lateral position with a deviation of 60°. Images with scale and rotation changes were obtained by varying the camera tilt and optical zoom. Different JPEG compressions were obtained with standard software by modifying the image quality parameter. Blurred images were obtained by varying the focus of the camera. These images were also obtained from Krystian Mikolajczyk's website;2 they were captured specifically to test and compare local descriptors through a similar experimentation.

Fig. 15 Test set. a Changes in viewpoint, b changes in scale and rotation, c variations in JPEG compression, and d variations of blur. For each of these subsets, the first image is taken as the reference image

To measure keypoint repeatability and distinctiveness we use the matches rate, calculated as the ratio between the number of correct matches between two images and the smaller number of points detected in that pair of images:

2 http://lear.inrialpes.fr/people/Mikolajczyk/.

$$\mathrm{matches\_rate}(I, I') = \frac{\#\mathrm{correct\_matches}(I, I')}{\min\left(\#\mathrm{keypoints}(I),\, \#\mathrm{keypoints}(I')\right)}.$$
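The measure is a single ratio, sketched below with a function name of our own. Deciding which matches count as correct is outside the scope of this sketch, and the counts in the usage line are made-up numbers rather than results from the paper.

```python
def matches_rate(correct_matches: int, keypoints_i: int, keypoints_j: int) -> float:
    """Correct matches divided by the smaller keypoint count of the image pair."""
    smaller = min(keypoints_i, keypoints_j)
    if smaller == 0:
        raise ValueError("matches rate is undefined when an image has no keypoints")
    return correct_matches / smaller


# Hypothetical counts, not results from the paper:
print(matches_rate(correct_matches=150, keypoints_i=400, keypoints_j=300))  # 0.5
```

Normalizing by the smaller count means a perfect result of 100% is reachable even when the two images yield different numbers of keypoints.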

Ideally, the proposed architecture should achieve both high matches rates and a high number of matches.

The results of these tests are shown in Fig. 16. For each of the above variations, the measure was calculated between a reference image (the first image in each column of Fig. 15) and the rest of the images in the subset. An ideal response would be a horizontal line at 100%.

As can be seen in Fig. 16, for the first three variations, the matches rates of the proposed architecture are smaller than those of the software implementation, especially in the images with minor variations, with more similar rates in images with greater variations. The number of matches found was always more than 200, which represents a good number of extracted features for many applications.

Fig. 16 Matches rates and number of matches for each subset of variations. As can be seen in a, c, and e, the matches rates of the proposed architecture are slightly smaller than those of the software implementation, particularly in the images with minor variations, with more similar rates in images with greater variations. In g, the opposite situation is observed: although the number of matches is smaller, the matches found were the most repeatable. As can be seen in b, d, f, and h, the number of matches found was always more than 200, which represents a good number of extracted features for many applications. The differences in matches rates and match counts obtained by the hybrid implementation with respect to the software implementation show the impact on repeatability and distinctiveness of the errors discussed in the previous section. However, the drop in these values was not very drastic

In all cases, the lines that describe the results obtained by the proposed architecture have a smaller slope than those obtained for the software implementation, indicating that, despite their lower matches rates in the images with minor variations, the results were more stable over all images. Moreover, for the blurred image set the results showed a positive slope. The differences in matches rates and match counts obtained by the hybrid implementation with respect to the software implementation show the impact on repeatability and distinctiveness of the errors discussed in the previous section. However, the drop in these values was not very drastic.

Fig. 17 Advantage in device area achieved by interleaving the octave processing. a, c Differences between the proposed architecture with and without this technique. b, d Percentage of savings in the use of hardware resources

6.3 Evaluation of efficiency in FPGA resource occupation

Section 4 states that, by introducing octave processing interleaving, all octaves of the same scale can be calculated in the same processing unit, which would give a great advantage in device-area consumption. Later, in Sect. 5, the architecture that implements this idea is proposed, and the FPGA resource savings it implies become more evident. In this section we present several tests to validate this contribution. To this end, the proposed architecture was redesigned with only one significant change, namely the removal of the octave processing interleaving. In this new design, a filtering block is added for each image convolution at each scale and each octave. The resulting design is very similar to that proposed by Bonato et al. [3], and its high-level structure remains the same as the one shown in Fig. 6.

To demonstrate the benefits of octave processing interleaving, we obtained several implementations of both architectures for different configurations of their parameters, where the number of octaves varied between three and seven (O ∈ [3, 7]), while the number of scales (S = 5) and the image dimensions (M = 512, N = 512) were kept constant. These implementations were then synthesized for a Xilinx Virtex II Pro device (XC2VP30-5FF1152) to obtain the amount of device resources occupied by each architecture for different numbers of octaves, and thus a quantitative measure of the device-area advantage achieved by interleaving the octave processing.

Figure 17 summarizes these comparisons. Figures 17a and c show the number of registers and LUTs occupied by each of the architectures. The reduction of FPGA resources introduced by octave processing interleaving can be noticed; moreover, its growth trend is lower, as the line that describes it has a smaller slope. Figures 17b and d show the percentage of registers and LUTs saved by this technique. These values are almost all above 50%, with a noticeable tendency to increase as the number of processed octaves increases.


Table 2 Hardware synthesis results of the proposed architecture for a configuration of M = 320, N = 240, O = 3, S = 6, k = 7 using a Xilinx Virtex II Pro (XC2VP30-5FF1152)

| Logic utilization | Used | Available | Utilization (%) |
|---|---|---|---|
| Number of Slice Flip Flops | 5,676 | 27,392 | 20 |
| Number of 4 input LUTs | 5,554 | 27,392 | 20 |
| Logic distribution | | | |
| Number of occupied Slices | 4,393 | 13,696 | 32 |
| Number of Slices containing only related logic | 4,393 | 4,393 | 100 |
| Number of Slices containing unrelated logic | 0 | 4,393 | 0 |
| Total Number of 4 input LUTs | 6,699 | 27,392 | 24 |
| Number used as logic | 5,154 | | |
| Number used as a route-thru | 1,145 | | |
| Number used as Shift registers | 400 | | |
| Number of bonded IOBs | 153 | 644 | 23 |
| Number of RAMB16s | 108 | 136 | 79 |
| Number of BUFGMUXs | 1 | 16 | 6 |

Table 3 Comparison with related works

| Comparison parameters | Proposed architecture | Bonato et al. [3] | Qiu et al. [17] | Qiu et al. [18] |
|---|---|---|---|---|
| Image size | QVGA | QVGA | VGA | QVGA |
| Max. clock frequency (MHz) | 145.122 | 149.0 | 82.0 | 95.0 |
| Throughput (Mpixels/s) | 72.6 | 149.0 | 5.1 | 15.3 |
| Speed | 900 fps | 1,940 fps | 16 fps | 81 fps |
| Registers | 5,676 | 7,256 | 6,333 | 6,120 |
| LUTs | 5,554 | 15,137 | 5,825 | 5,011 |

6.4 Comparison with related architectures

This section compares the results obtained by our proposed architecture with the related works of Bonato et al. [3] and Qiu et al. [17, 18]. To this end, the proposed architecture was synthesized for a Xilinx Virtex II Pro (XC2VP30-5FF1152) with a configuration as close as possible to that of those works (M = 320, N = 240, O = 3, S = 6, k = 7). The synthesis results for these settings are summarized in Table 2.

After synthesis, it was also determined that the implementation could operate at a maximum frequency of 145.122 MHz. Therefore, since the architecture returns a result every two clock cycles, our system is able to process 72.6 million pixels per second. With this throughput it is possible to process high-definition video (1080×1280 pixels) at a rate of 50 frames per second (fps).
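These figures follow from simple arithmetic, sketched below with a function name of our own. The sketch divides the clock rate by the cycles per pixel and by the pixels per frame; it ignores pipeline latency and any per-frame overhead (an assumption), so it only gives an upper bound. For the proposed architecture it yields about 945 fps for QVGA, compared with the 900 fps reported, and about 52 fps for the 1080×1280 case. At one result per cycle and 149.0 MHz it gives about 1,940 fps, the value listed for Bonato et al. in Table 3.

```python
def frames_per_second(clock_hz: float, cycles_per_pixel: int, width: int, height: int) -> float:
    """Upper bound on frame rate: pixel rate divided by pixels per frame.

    Ignores pipeline latency and any per-frame overhead.
    """
    pixels_per_second = clock_hz / cycles_per_pixel
    return pixels_per_second / (width * height)


f_max = 145.122e6  # maximum clock frequency reported after synthesis
print(f_max / 2 / 1e6)                          # Mpixel/s at one result every two cycles
print(frames_per_second(f_max, 2, 320, 240))    # QVGA
print(frames_per_second(f_max, 2, 1280, 1080))  # the 1080 x 1280 HD case
```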

Table 3 compares these results with those of related architectures for the detection of SIFT interest keypoints reported in the literature (discussed in Sect. 3).

As can be seen in Table 3, the maximum frequency at which our system can work is higher than those of the other works, except for the work of Bonato et al., whose maximum frequency is very similar. This maximum frequency, combined with the fact that our architecture returns a result every two clock cycles, yields a processing speed of 900 fps, well above the architectures of Qiu et al. [17, 18]. The work of Bonato et al. returns one result per clock cycle for this stage of the algorithm, so this part, working separately, can achieve twice the speed of our system, although their overall architecture is limited to 30 fps by another stage of the algorithm. The architecture proposed in this paper achieves half the throughput of the work of Bonato et al. because, to interleave the octave processing, the sampling rate must be halved. However, we sacrifice half of the throughput to obtain an advantage in device area of almost three times, which is the critical factor. Our work exceeds the throughput of the Qiu et al. works several times over. The proposed architecture also consumes less silicon area than the other architectures, except for the number of LUTs compared with the work of [18], where the difference is very small. In addition, as discussed in other sections, the greater the number of octaves processed, the smaller the growth rate of device-area use in our architecture. Therefore, for a larger number of octaves, the hardware resource savings achieved by our architecture will be much larger.

Table 4 Comparison with other known implementations on software and GPU

| Comparison parameters | Proposed architecture | SIFT OpenCV 2.3.1 [5] | SURF OpenCV 2.3.1 [5] | SiftGPU [25] |
|---|---|---|---|---|
| Used hardware | FPGA Xilinx Virtex II Pro | Macbook Intel 2.4 GHz Core2Duo, 4 Gb RAM | Macbook Intel 2.4 GHz Core2Duo, 4 Gb RAM | 8800GTX 768 Mb GPU |
| Image size | QVGA | QVGA | QVGA | QVGA |
| Speed | 900 fps | 8 fps | 19 fps | 153 fps |

6.5 Comparison with other known implementations

This section compares the results obtained by our proposed architecture, implemented in a Xilinx Virtex II Pro (XC2VP30-5FF1152), with other well-known software and GPU-based implementations. We compare our implementation against the implementations of SIFT and SURF in the latest version of OpenCV (2.3.1) [5] and against the GPU-based implementation SiftGPU [25]. The comparison results are shown in Table 4. As can be seen in Table 4, our results also outperform these implementations in speed.

7 Conclusions

In this paper we proposed a hardware architecture for the detection of SIFT interest points. In order to take full advantage of the parallelism of this stage of the algorithm and to minimize the device area occupied by its hardware implementation, part of the algorithm was reformulated. Given the characteristics of the algorithm, we took into account the potential for exploiting data parallelism. To decrease the number of multiply-accumulate operations, we used separable convolution, which is possible thanks to the separability property of the Gaussian kernel. We also introduced octave processing interleaving, which allowed us to perform all convolution operations for a given scale in a single processing unit.

The main contribution of this architecture, and of the algorithm it implements, is that the occupied device area remains almost constant as the number of octaves to be processed increases. This is because all octaves for the same scale, however many there are, are processed in the same convolution block.
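The text does not give the exact order in which octaves share the convolution block. The sketch below therefore assumes one simple slot-assignment scheme to show why halving the sample rate leaves room for every octave. Under this scheme, clock slot t goes to the octave whose index equals the number of trailing 1 bits of t:

- octave 0 takes the even slots (one pixel every two cycles);
- octave 1 takes the slots with t mod 4 = 1;
- octave 2 takes the slots with t mod 8 = 3;
- and so on for higher octaves.

Octave o is assumed to be subsampled by 2 in each dimension relative to octave o - 1. It therefore has a quarter of the pixels but receives half as many slots. The scheme and the function names are hypothetical.

```python
def octave_for_slot(t):
    """Hypothetical scheme: the octave served in clock slot t is the number
    of trailing 1 bits of t (slot 0 -> octave 0, 1 -> 1, 3 -> 2, 7 -> 3, ...)."""
    octave = 0
    while t & 1:
        t >>= 1
        octave += 1
    return octave


def slot_counts(num_octaves, num_slots):
    """Number of clock slots each octave receives over num_slots cycles."""
    counts = [0] * num_octaves
    for t in range(num_slots):
        octave = octave_for_slot(t)
        if octave < num_octaves:  # slots assigned beyond the last octave stay idle
            counts[octave] += 1
    return counts


def schedule_fits(width, height, num_octaves):
    """True if, with octave 0 at one pixel per two cycles, every octave gets
    at least one slot per pixel within the same frame time."""
    frame_slots = 2 * width * height
    counts = slot_counts(num_octaves, frame_slots)
    for octave in range(num_octaves):
        # Assumed 2x subsampling in each dimension per octave.
        pixels = (width >> octave) * (height >> octave)
        if counts[octave] < pixels:
            return False
    return True


if __name__ == "__main__":
    print(slot_counts(3, 16))          # [8, 4, 2]
    print(schedule_fits(320, 240, 3))  # True
```

Under this assumption, octave 0 fills exactly half of the slots. All higher octaves together need less than (1/2)(1/4 + 1/16 + ...) = 1/6 of them. Adding an octave therefore only claims otherwise idle slots of the same convolution block, which is consistent with the near-constant device area described above.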

The experiments and evaluations of the architecture first checked how closely its results matched those of a software implementation. Low error rates were reported in the generation of the Gaussian scale space, as were average errors below one pixel in the location of interest points. A series of tests was also conducted to verify the variation in repeatability and distinctiveness of the SIFT features detected by our architecture. We took into account several image variations, such as viewpoint, rotation and scale, JPEG compression, and blur. The differences in match rates were small, and a sufficient number of feature correspondences between images was detected.
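One way to measure an average localization error is to pair each keypoint found by the hardware with the nearest keypoint found by the software reference. The sketch below assumes that pairing rule and a matching tolerance of 3 pixels. Neither is taken from the paper, and the function name is hypothetical.

```python
import math


def mean_localization_error(hw_points, sw_points, max_dist=3.0):
    """Mean distance (pixels) from each hardware keypoint to its nearest
    software keypoint. Pairs farther apart than max_dist (an assumed
    tolerance) count as unmatched and are excluded from the mean.
    Returns (mean_error, number_of_matched_points)."""
    distances = []
    for hx, hy in hw_points:
        nearest = min((math.hypot(hx - sx, hy - sy) for sx, sy in sw_points),
                      default=math.inf)
        if nearest <= max_dist:
            distances.append(nearest)
    if not distances:
        return math.nan, 0
    return sum(distances) / len(distances), len(distances)


if __name__ == "__main__":
    hardware = [(10.0, 10.0), (20.5, 30.0), (100.0, 100.0)]
    software = [(10.0, 10.5), (20.0, 30.0)]
    print(mean_localization_error(hardware, software))  # (0.5, 2)
```

This nearest-neighbour pairing costs O(n·m). For large keypoint sets a spatial index would be preferable, but the quadratic form keeps the sketch short.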

We also carried out a series of tests that quantified the benefits of interleaving the octave processing. These tests showed savings in device area above 50%, with the savings increasing as more octaves are processed. Finally, we compared the results obtained by our architecture with other architectures reported in the literature for the detection of SIFT interest points. The proposed architecture showed better timing and device-area efficiency than the others in almost all parameters. The architecture presented in this work detects SIFT interest points in an image at a rate of one pixel every two clock cycles. Implemented on a Xilinx Virtex II Pro FPGA, with a configuration of three octaves and six scales and a clock constraint of 145 MHz, it processes a 320×240 image in 1.1 ms (900 fps). This represents a speedup of 250x (two orders of magnitude) with respect to Vedaldi's software implementation.
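The reported frame rate follows from the pixel rate and the clock constraint. The sketch below recomputes it under the assumption that the frame time is simply the pixel count times the cycles per pixel, divided by the clock frequency. It ignores pipeline latency and any I/O overhead, and the function name is illustrative.

```python
def frame_timing(width, height, cycles_per_pixel, clock_hz):
    """Return (frame time in seconds, frames per second), assuming one result
    every `cycles_per_pixel` clock cycles and no other overhead."""
    cycles = width * height * cycles_per_pixel
    seconds = cycles / clock_hz
    return seconds, 1.0 / seconds


if __name__ == "__main__":
    t, fps = frame_timing(320, 240, cycles_per_pixel=2, clock_hz=145e6)
    print(f"{t * 1e3:.2f} ms, {fps:.0f} fps")  # 1.06 ms, 944 fps
```

The idealized 1.06 ms (about 944 fps) is slightly better than the 1.1 ms (900 fps) quoted in the text, which presumably accounts for overheads this model omits. Setting `cycles_per_pixel=1` halves the frame time. That corresponds to the one-result-per-cycle rate of the Bonato et al. detection stage discussed alongside Table 3.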

8 Future work

Based on the results obtained in this paper, several ideas for future work arise. First, the remaining stages of SIFT could be implemented in hardware. This applies in particular to descriptor generation, the second most computationally expensive stage, and would yield a greater overall speedup of the algorithm. It is also worth exploring hardware acceleration of other SIFT variants. These variants were designed to speed up the original algorithm, either through approximations or by replacing operations with equivalents of lower computational cost.

Acknowledgments This work was supported in part by CONACYT grant No. 103878. L. Chang was supported in part by CONACYT scholarship No. 240251.
