This is a preprint of an article submitted for consideration in the INTERNATIONAL JOURNAL OF ELECTRONICS. Copyright © 2007 Taylor and Francis; INTERNATIONAL JOURNAL OF ELECTRONICS is available online at:

https://www.wendangku.net/doc/8316305976.html,/

Small-Scale Reconfigurability for Improved Performance and Double-Precision in Graphics Hardware

Kevin Dale*, Jeremy W. Sheaffer*, Vinu Vijay Kumar*, David P. Luebke†, Greg Humphreys*, and Kevin Skadron*

(submitted for review on November 30, 2006)

We explore the application of Small-Scale Reconfigurability (SSR) to graphics hardware. SSR is an architectural technique wherein functionality common to multiple subunits is reused rather than replicated, yielding high-performance reconfigurable hardware with reduced area requirements (Vijay Kumar and Lach 2003). We show that SSR can be used effectively in programmable graphics architectures to allow double-precision computation without affecting the performance of single-precision calculations and to increase fragment shader performance with a minimal impact on chip area.

1 Introduction

Every hardware system makes a tradeoff between performance and flexibility. At one end of the spectrum, general purpose processors provide maximum flexibility at the expense of performance, area, power consumption, and price. Custom ASICs are the other extreme, providing maximum performance at a minimum cost, albeit for only a very narrow set of applications.

Modern graphics hardware requires both high performance and flexibility, placing it somewhere between these two extremes. Traditional intermediate hardware solutions like FPGAs are inappropriate for graphics processors because of their large size and low performance relative to their fixed-logic counterparts. Small-scale reconfigurability (SSR) provides an attractive compromise; systems that use SSR components can approach the high speed and small size of ASICs while providing some specialized configurability (Vijay Kumar and Lach 2003). In this paper, we explore the applicability of SSR to programmable graphics hardware.

The simplest example of a reconfigurable component is two fully functional components connected with a multiplexer (see Fig. 1). Although these two components are disjoint, in typical usage they will contain substantially similar redundant substructures, which is precisely the situation in which SSR performs best. Rather than replicate all of the redundant structure, one can instead reuse common substructure, and do so at a fine granularity within a single component (Vijay Kumar and Lach 2003, Chiricescu et al. 2002).

Figure 1. A naïve implementation of reconfigurable hardware can be built by simply multiplexing between two distinct, unmodified units (a, naïve reconfigurable hardware), but a more efficient design would reuse common substructure to avoid replication (b, a more efficient solution).

* University of Virginia, {kdale,jws9c,vv6v,humper,skadron}@https://www.wendangku.net/doc/8316305976.html,

† NVIDIA Research, david@https://www.wendangku.net/doc/8316305976.html,

A common SSR unit is the morphable multiplier. These multiplier-adders can be reconfigured into a multiplier or an adder in a single cycle. When used to create fixed-point units, morphable multipliers yield a nearly 17% reduction in total area when compared to the sum of the sizes of their constituent parts (Chiricescu et al. 2002).

Graphics processors, like specialized multimedia processors and DSPs, are a particularly suitable target for SSR due to their vector-processor-like operations. When the same operation is performed repeatedly in SIMD fashion, reconfiguration and its associated overhead are infrequently needed, and any cost can be amortized over many instructions. Furthermore, SSR-based components typically have lower static power requirements because less hardware goes unused.

2 Related Work

Dynamically reconfigurable hardware has been a popular topic in recent computer architecture literature, especially in the FPGA and reconfigurable computing communities. The configurability of these systems serves myriad design goals, among them improved performance, power, area, and fault tolerance characteristics.

Even et al. (1997) describe a dual mode IEEE multiplier: a pipelined unit capable of producing one double-precision or two single-precision multiplications every clock cycle with a three cycle latency. The authors argue that the reuse of substructure yields a cheap device that performs well for both precisions. They further claim that the single precision mode is particularly useful for SIMD applications, like graphics, because it is conducive to systems on which the same operation is regularly repeated on large numbers of data points.

Guerra et al. (1998) explore built-in-self-repair (BISR) and its application to fault tolerance, manufacturability, and application-specific programmable processor design. Previous work in the area of dynamic repair had made use of specialized redundant units to replace damaged units; their paper describes the synthesis of more general units that can replace any of several units on a chip when damage is detected. The authors coin the term HBISR (heterogeneous BISR) for the technique.

A morphable multiplier is a device capable of performing either a fixed point multiply or add using the same hardware structure (Chiricescu et al. 2002). Morphable multipliers require less area than the sum of the area needed for a separate multiplier and adder (in fact, they require only slightly more than a multiplier alone), while imposing negligible performance penalties.

Metrics like area, performance, and power are easily quantified, but it is less obvious how to measure the increasingly important metric of hardware flexibility. Compton and Hauck have defined a testing method and quantification metric for flexibility of reconfigurable hardware (Compton and Hauck 2004). Other examples of relevant research in reconfigurable hardware include Kim et al. (1997) and Chiou et al. (2005).

The work in this paper makes use of Brook (Buck et al. 2004), a stream-based programming language which allows the programmer to write general-purpose applications for a GPU without worrying about the sometimes byzantine details of GPU programming. Our experiments all use Chromium (Humphreys et al. 2002) to intercept and analyze streams of graphics commands made by real applications. The primary advantage of using Chromium is that we ensure that our workloads are not contrived. Although we use Brook and Chromium without modification, we have enhanced the Qsilver graphics architectural simulator (Sheaffer et al. 2004, 2005) to model the necessary aspects of the fragment pipeline. A detailed description of our modifications to Qsilver and our experimental setup is presented in Sect. 3 and Sect. 4.

Figure 2. Baseline fragment units used for comparison: (a) fragment processor, (b) stage 1, (c) stage 2. Stage 2 can take up to three 4-channel operands, one of which directly feeds the ADD units and whose data path is represented here by dashed lines. Note the additional data paths that cascade the ADD units; these allow for a single-pass dot product (Seifert 2004).

3 Simulation Setup

3.1 The Qsilver Simulator

Qsilver is a simulation framework for graphics architectures that can simulate low-level GPU activity for any existing OpenGL application (Sheaffer et al. 2004). Qsilver uses Chromium (Humphreys et al. 2002) to intercept and transform an OpenGL application's API calls and create an annotated trace that encapsulates geometry, timing, and state information. This trace serves as input to the Qsilver simulator core, which performs an accurate timing simulation of the graphics hardware and produces detailed statistics.

Qsilver is configured at runtime with a description of its pipeline. In these experiments we simulate an NV4x-like architecture, with a pipeline configuration similar to that of NVIDIA's 6800GT, so we configure Qsilver to model a system with 6 vertex pipelines and 16 fragment pipelines. The fragments are tiled in blocks of 2×2, so we effectively have 4 tile pipelines, each of which can process 4 fragments simultaneously. NV4x GPUs use a similar tiled configuration in the fragment engine (Kilgariff and Fernando 2005).

To account for modifications to the fragment pipeline, we enhanced Qsilver to track fragment shader activity. Our modified Qsilver simulator stores a per-triangle identifier which uniquely specifies which, if any, fragment shader was bound when that triangle was being rendered. We also store the text of the fragment shaders so that they can be analyzed by the Qsilver simulator core.

3.2 Baseline Architecture

Both of the following experiments hold fixed the graphics pipeline described above and focus on the programmable path of the fragment engine. While the NV4x vertex engine follows a MIMD architecture, its fragment engine is truly SIMD in nature. Additionally, in many modern games the majority of fragments are shaded by fragment programs (see Fig. 3), so we focus our efforts on the programmable path in the fragment engine.

Our baseline fragment pipeline, depicted in Fig. 2, is similar to that found in NV4x GPUs.¹ A single fragment unit contains two stages; four-channel fragments (RGBA) reach stage 1 from either the rasterizer or fragment pipeline loopback. Stage 2 can execute instructions in parallel with stage 1 in dual-issue mode as well as sequentially, taking its operands from the output of stage 1. Crossbars route operands to the appropriate functional units, and Special Function Units (SFUs) are used to perform special scalar operations like reciprocal square root. The fragment units can also operate in co-issue mode, whereby a single 4-channel data path functions as two distinct data paths, with independent instructions executing in parallel, on the same unit, across these two data paths (e.g., a 3-vector and a scalar, or two 2-vectors) (Kilgariff and Fernando 2005).

NVShaderPerf,² a utility that displays shader scheduling information for NVIDIA hardware, is used to schedule programs for our baseline architecture. While NV4x GPUs have dedicated hardware for performing common half-precision operations in parallel with full-precision operations, none of the fragment programs tested included any half-precision operations. However, to be sure of a legitimate comparison of performance along the full-precision path, NVShaderPerf is configured to schedule programs for our NV4x-like architecture using the full-precision path only.

¹ Based on those details that have been made available to the public or indirectly obtained via patents and extensive benchmark tests (see Kilgariff and Fernando 2005, Seifert 2004, for additional details).
² Unified compiler version 77.80.

3.3 Benchmarks

For benchmarking, we use the recent game Doom III (see Fig. 3), as well as four demo programs included with the Brook distribution. We use Chromium to intercept the fragment programs used in each benchmark and to generate traces for simulation under Qsilver. Each benchmark, along with its fragment programs, is summarized below.

(i) doom3, a representative 50-frame demo from the game. Includes a shader for general per-pixel lighting and a special effects shader.
(ii) bitonic sort, a parallel sorting network. Includes a main sorting kernel and a simple pass-thru kernel.
(iii) image proc (25,25), an image convolution shader. Includes a main convolution kernel and a pass-thru kernel.
(iv) particle cloth (5,10,15), a cloth simulation. Includes six kernels that implement the simulation.
(v) volume division (100), a volume isosurface extractor. Includes ten kernels for various stages of the extraction.

Figure 3. Screen captures from the doom3 benchmark. On the left (a), the color of each pixel is modulated to indicate which fragment program generated it. The right image (b) is the unmodified rendering from the game. Notice that the majority of pixels are generated by programmable fragment shaders.

4 Experiments and Results

In this section, we describe two experiments we performed to validate our hypothesis that using SSR components in a modern GPU architecture can benefit certain applications. We show improved performance across a set of test applications with only a minimal impact on GPU die area and also demonstrate that double-precision floating point capabilities can be added to the fragment pipeline without affecting the performance of single-precision applications.

4.1 Increased Throughput

We first compared the simulated performance of the NV4x-like fragment pipeline to that of an SSR fragment pipeline architecture, whose fragment units are depicted in Fig. 4.

Figure 4. Proposed SSR fragment units for the first experiment: (a) stage 1, (b) stage 2. FAC modules are Flexible Arithmetic Units, and they replace each of the ADD and MUL units in our baseline architecture.

4.1.1 Target SSR architecture. The fragment units in our target SSR architecture are similar to those in the baseline architecture; however, we replace both the multipliers and adders in stages 1 and 2 with single-precision Flexible Arithmetic Units (FACs). An FAC can be very quickly reconfigured to perform either a multiplication or an addition and uses only slightly more gates than a multiplier. With current technology, these FACs can produce a result every cycle and can be reconfigured between cycles, assuming a 400 MHz clock and a two-stage pipeline (Vijay Kumar and Lach 2003). Finally, in the first set of FACs in our SSR architecture, we duplicate the accumulate data paths from the baseline architecture's ADD units. These data paths require a trivial amount of additional area overhead.

In addition to supporting all the existing functionality of our baseline units, the modified SSR units provide new scheduling opportunities beyond those of the baseline. First, the baseline fragment pipe is only capable of performing a single full-precision 4-vector addition per pass in stage 2 (Seifert 2004), while the SSR pipeline is capable of performing three in one pass: one in stage 1 and two chained additions in stage 2 (see Fig. 5a). Moreover, there is more freedom to schedule dot product and multiply-accumulate operations, both of which are extremely common in fragment programs. For example, the SSR pipeline can execute a 32-bit 3-channel dot product (DP3) and dependent scalar-vector multiplication, e.g., the expression (a · b)c, in a single pass by computing the per-channel multiply of a and b in stage 1, accumulating the channel products to obtain a · b in the first set of FACs in stage 2, and performing a scalar-vector multiply in its second set of FACs (Fig. 5b). Extending this scheduling approach to co-issue configurations is straightforward.
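The single-pass schedule for (a · b)c described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the `FAC` class, its `configure`/`run` methods, and the function `dp3_then_scale` are hypothetical names introduced here, not part of Qsilver or of any NVIDIA tool, and the model ignores timing, crossbars, and co-issue.

```python
from typing import List, Sequence


class FAC:
    """Hypothetical behavioural stand-in for one Flexible Arithmetic Unit."""

    def __init__(self) -> None:
        self.mode = "mul"

    def configure(self, mode: str) -> None:
        # An FAC is reconfigured between cycles to multiply or to add.
        if mode not in ("mul", "add"):
            raise ValueError(f"unsupported FAC mode: {mode}")
        self.mode = mode

    def run(self, x: float, y: float) -> float:
        return x * y if self.mode == "mul" else x + y


def make_set(mode: str) -> List[FAC]:
    """Build one set of four FACs, all configured to the same mode."""
    facs = [FAC() for _ in range(4)]
    for fac in facs:
        fac.configure(mode)
    return facs


def dp3_then_scale(a: Sequence[float], b: Sequence[float],
                   c: Sequence[float]) -> List[float]:
    """Compute (a . b) * c in one pass through three sets of FACs."""
    # Stage 1: per-channel multiply of a and b.
    stage1 = make_set("mul")
    products = [stage1[i].run(a[i], b[i]) for i in range(3)]

    # Stage 2, first set: cascaded additions accumulate the channel products.
    stage2_first = make_set("add")
    dot = stage2_first[0].run(products[0], products[1])
    dot = stage2_first[1].run(dot, products[2])

    # Stage 2, second set: scalar-vector multiply (c has at most 4 channels).
    stage2_second = make_set("mul")
    return [stage2_second[i].run(dot, c[i]) for i in range(len(c))]


result = dp3_then_scale((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (1.0, 0.0, 2.0))
print(result)
```

In the baseline, the fixed MUL and ADD units would not allow the final scalar-vector multiply to follow the accumulation in the same pass; in this model, the same three FAC sets could instead all be configured to "add" to chain three vector additions.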

Figure 5. Two example configurations that provide additional scheduling opportunities for the SSR fragment pipeline: (a) single-pass configuration for multiple 4-vector additions; (b) single-pass configuration to compute (a · b)c.

4.1.2 Shader analysis. Considering the additional scheduling opportunities provided by the SSR architecture, as well as the known scheduling constraints of NV4x GPUs, we hand-scheduled each fragment program for our SSR fragment engine. As was done for the fragment program schedules on the baseline architecture, we limited program schedules for the SSR architecture to the full-precision path as well.

With 16 fragment pipelines and a 400 MHz clock, both architectures have a maximum throughput of 6.4 GP/s (gigapixels per second). This assumes a 1-cycle texture lookup. Results for the benchmarks and their constituent shaders are given in Table 1. For most of the shaders, the SSR architecture provides an improvement over the baseline. The amount of improvement of course varies from shader to shader, depending upon the mix of instructions and corresponding scheduling advantages for the SSR architecture. Frequent dot product and multiply-accumulate instructions across the benchmarks led to the performance improvements for the SSR architecture seen in Table 1.
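The throughput figures follow directly from the pipeline count, the clock rate, and the per-pixel cycle count. The sketch below is a minimal check of that arithmetic, assuming each of the 16 pipelines retires one pixel every `cycles_per_pixel` cycles; the function and constant names are illustrative only.

```python
PIPELINES = 16    # fragment pipelines in the simulated NV4x-like GPU
CLOCK_MHZ = 400   # clock rate assumed in the paper


def throughput_mps(cycles_per_pixel: int) -> float:
    """Pixel throughput in megapixels per second (MP/s)."""
    return PIPELINES * CLOCK_MHZ / cycles_per_pixel


# A 1-cycle shader reaches the peak rate of 6400 MP/s.
print(throughput_mps(1))

# doom3 shader 0 takes 12 cycles on the baseline and 11 with SSR (Table 1).
baseline = throughput_mps(12)
ssr = throughput_mps(11)
print(round(baseline, 2), round(ssr, 2), round(ssr / baseline, 3))
```

Because throughput is inversely proportional to cycle count, the speedup of a shader is simply the ratio of baseline cycles to SSR cycles.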

Table 1. Per-pixel execution time and pixel throughput, in megapixels per second (MP/s), for the baseline and target fragment units, assuming a 1-cycle texture lookup.

                            Execution time (cycles)    Throughput (MP/s)
Benchmark         Shader    Baseline    SSR            Baseline    SSR
bitonic sort      0         15          14             426.67      457.14
                  1         1           1              6400.00     6400.00
doom3             0         12          11             533.33      581.82
                  1         7           7              914.29      914.29
image proc        0         28          26             228.57      246.15
                  1         1           1              6400.00     6400.00
particle cloth    0         6           6              1066.67     1066.67
                  1         16          15             400.00      426.67
                  2         18          17             355.56      376.47
                  3         10          8              640.00      800.00
                  4         11          9              581.82      711.11
                  5         2           2              3200.00     3200.00
volume division   0         2           2              3200.00     3200.00
                  1         1           1              6400.00     6400.00
                  2         22          20             290.91      320.00
                  3         30          26             213.33      246.15
                  4         19          16             336.84      400.00
                  5         57          52             112.28      123.08
                  6         15          13             426.67      492.31
                  7         13          13             492.31      492.31
                  8         56          52             114.29      123.08
                  9         1           1              6400.00     6400.00

4.1.3 Full pipeline simulation. There are a number of possible bottlenecks in the full graphics pipeline, however, that can prevent performance improvements in the fragment engine core from being realized across the full pipeline. First, available memory bandwidth and cache performance in the texture units can prevent the pipeline from achieving maximum pixel throughput. GPUs typically incorporate a high degree of multithreading to hide the latency introduced by texture memory accesses. For example, when a shader unit arrives at a texture memory instruction, the memory access is initialized and the fragment is temporarily banked until the memory operation is complete. A context switch occurs immediately, at which point the shader unit can continue doing useful work on another fragment's thread. In a SIMD environment, the thread pool is sufficiently large to effectively hide memory latency in this manner, provided that there is also sufficient memory bandwidth and a large ratio of arithmetic instructions to memory instructions. While texture access patterns vary across the benchmarks included here, all of their shaders contain a large percentage of arithmetic instructions, enabling effective latency-hiding. Qsilver employs a simple probabilistic cache model that can reasonably capture this phenomenon (Sheaffer et al. 2004) in the full pipeline simulation.

The vertex processor can also be a system bottleneck, particularly for scenes with a large number of extremely small (in screen-space) polygons. However, for many GPGPU applications, and for all of those included in the suite of benchmarks here, the vertex processor is relatively inactive. A typical GPGPU application only renders screen-filling quadrilaterals; this makes for an extremely low polygon-to-fragment ratio and a nearly-idle vertex processor. Qsilver's queue-based simulation architecture models the graphics pipeline at a resolution sufficient to capture the relative amounts of activity between the fragment and vertex engines. This is accomplished by aggregating architectural performance counter data across the course of the simulation. Counters for pre-transform-and-lighting vertex queues, as well as those for fragment creation, are among Qsilver's counters. Full queues block earlier stages in the pipeline and so imply vertex- or fragment-bound behavior, respectively (Sheaffer et al. 2004). Fig. 6 was generated from these Qsilver counters and indicates that, like the Brook benchmarks, the Doom III benchmark is also predominantly fragment-bound.

Figure 6. Traces of pre-transform-and-lighting vertex queue writes and fragments created during a single frame of doom3, generated from Qsilver counter data (horizontal axis: tens of thousands of cycles; vertical axis: normalized functional unit activity; series: vertex processor and fragment processor). Fragment activity is at its maximum for a substantial portion of the frame, while vertex activity reaches its peak only once. This frame is representative of all frames across the doom3 benchmark, indicating that the benchmark is predominantly fragment-bound.

For doom3, there is also the possibility that the fixed-function path in the fragment engine is used almost exclusively, given that the benchmark only uses two shaders across the entire 50-frame trace. However, the game performs per-pixel lighting with a fragment shader for most objects in the game, using OpenGL fixed-function lighting rarely. The 1-cycle improvement in this heavily-used shader should likely show a significant performance increase over the entire pipeline as well.

Fig. 7 shows performance results from full-pipeline simulations under Qsilver, using the fragment program schedules discussed in the previous section. For all benchmarks, speedup over the entire graphics pipeline for the SSR architecture does indeed correspond with the fragment processor performance results in Table 1. This indicates that all benchmarks are sufficiently fragment processor-bound for the performance increase in the fragment units to translate to a corresponding improvement over the full pipeline.

Figure 7. Speedup for the target SSR architecture over the baseline for a full pipeline simulation under Qsilver (horizontal axis: benchmark; vertical axis: full pipeline speedup).

Equally important, based on conservative inverter-equivalent gate count estimates,¹ each FAC requires 12,338 gates, only 710 more than a single-precision multiplier (11,628 gates). Replacing each adder (7,782 gates) with an FAC requires 4,556 additional gates. This additionally requires the small overhead of a multiplexer to configure the FACs. Given these gate estimates, with 16 fragment pipelines, the cost of our proposed use of SSR is 382,464 gates, which is less than 0.2% of the total area of NVIDIA's 6800GT (an estimated 222 million transistors (Medvedev and Budankov 2004)).
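The total of 382,464 gates can be reproduced from the per-unit estimates. The sketch below assumes, based on the description of Fig. 2, that each fragment pipeline contains eight multipliers (four per stage) and four adders (in stage 2); these unit counts and all names in the code are assumptions made for illustration, and the multiplexer overhead is ignored as in the quoted total.

```python
FAC_GATES = 12_338   # Flexible Arithmetic Unit
MUL_GATES = 11_628   # single-precision multiplier
ADD_GATES = 7_782    # single-precision adder

MULS_PER_PIPE = 8    # assumed: 4 in stage 1 + 4 in stage 2 (Fig. 2)
ADDS_PER_PIPE = 4    # assumed: 4 in stage 2 (Fig. 2)
PIPES = 16           # fragment pipelines

GPU_TRANSISTORS = 222_000_000  # estimate for the 6800GT quoted in the text


def ssr_overhead_gates() -> int:
    """Extra gates needed to replace every MUL and ADD unit with an FAC."""
    per_mul = FAC_GATES - MUL_GATES   # 710 extra gates per multiplier
    per_add = FAC_GATES - ADD_GATES   # 4,556 extra gates per adder
    per_pipe = MULS_PER_PIPE * per_mul + ADDS_PER_PIPE * per_add
    return PIPES * per_pipe


total = ssr_overhead_gates()
# The paper compares the gate count directly with the transistor estimate.
print(total, f"{100 * total / GPU_TRANSISTORS:.2f}%")
```

Under these assumptions the per-pipeline overhead is 23,904 gates, most of which comes from upgrading the four adders rather than the eight multipliers.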

¹ All area estimates are given in terms of inverter-equivalent gate area unless otherwise specified.

4.2 Dual-Mode IEEE Adders and Multipliers

The GPGPU and scientific computing communities would like to have the ability to perform double-precision calculations on the GPU. Unfortunately for them, the gaming industry drives the graphics hardware industry, and games do not currently require double-precision. We present a method here that can satisfy the demands of the scientific community without compromising the performance of the single-precision path so crucial to video game performance.

A dual-mode floating point unit is a small-scale reconfigurable unit capable of performing two simultaneous single-precision operations or one double-precision operation. Dual-mode units can be fully pipelined to produce results every cycle. Like other SSR units, dual-mode multipliers and adders require internal multiplexers for path selection. Additionally, they require a rounding unit capable of flexible rounding modes. The total additional structure for this modification is insignificant (Even et al. 1997). These units are also capable of operating at the modest 400 MHz clock speed of our baseline architecture.
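The externally visible behaviour of a dual-mode multiplier can be sketched in software: the same 64-bit operand words are interpreted either as one double-precision value or as two packed single-precision values. This is a behavioural illustration only, not the hardware design of Even et al. (1997); the function names and the little-endian packing layout are assumptions chosen for the example.

```python
import struct
from typing import Tuple


def pack_singles(lo: float, hi: float) -> int:
    """Pack two single-precision values into one 64-bit word."""
    return int.from_bytes(struct.pack("<ff", lo, hi), "little")


def unpack_singles(word: int) -> Tuple[float, float]:
    return struct.unpack("<ff", word.to_bytes(8, "little"))


def pack_double(x: float) -> int:
    """Store one double-precision value in a 64-bit word."""
    return int.from_bytes(struct.pack("<d", x), "little")


def unpack_double(word: int) -> float:
    return struct.unpack("<d", word.to_bytes(8, "little"))[0]


def dual_mode_multiply(a: int, b: int, double_mode: bool) -> int:
    """Multiply two 64-bit operand words in the selected mode."""
    if double_mode:
        # One double-precision product per operation.
        return pack_double(unpack_double(a) * unpack_double(b))
    # Two independent single-precision products per operation.
    a_lo, a_hi = unpack_singles(a)
    b_lo, b_hi = unpack_singles(b)
    return pack_singles(a_lo * b_lo, a_hi * b_hi)


singles = dual_mode_multiply(pack_singles(1.5, 2.0), pack_singles(2.0, 4.0), False)
double = dual_mode_multiply(pack_double(1.5), pack_double(2.0), True)
print(unpack_singles(singles), unpack_double(double))
```

The mode flag plays the role of the internal multiplexers mentioned above: it selects how the shared datapath is partitioned, which is why a unit in double-precision mode delivers half as many results per cycle as the same unit in single-precision mode.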

4.2.1 Target SSR architecture. We simulate a pipeline in Qsilver that uses dual-mode multipliers and adders in the fragment engine, where we replace pairs of single-precision FPUs in the baseline architecture with a single corresponding dual-mode FPU. This effectively gives us an 8-wide double-precision fragment engine with approximately half the throughput of the single-precision configuration. Double-precision configuration also requires that we retask pairs of 32-bit registers as single 64-bit registers. With half as many fragment pipelines, each double-precision pipe has the same number of available 64-bit registers as each single-precision pipe has 32-bit registers (four 32-bit registers per fragment in the case of NV4x GPUs (Kilgariff and Fernando 2005)). By a similar argument, the bandwidth requirements for the memory and register bus systems in 8-wide double-precision mode should not exceed those of the original 16-wide single-precision configuration.

We have conservative area estimates for a double-precision adder and multiplier of 13,456 gates and 37,056 gates, respectively. The real overhead here comes from replacing each pair of single-precision FPUs with one dual-mode FPU, at an approximate cost of 815,744 gates over the entire fragment engine, or 0.4% of the 6800GT's total area. Note that we have modified only the multiplication and addition units, so additional precision is not available for specialized operations such as logarithms or square roots. Although many scientific applications would benefit greatly from high precision addition and multiplication alone, a full double-precision arithmetic engine would be ideal. Dual-mode reciprocal, square-root, logarithm, and other specialized units are a topic for future exploration.
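The 815,744-gate figure is consistent with a simple accounting in which each dual-mode unit is costed as one double-precision unit and replaces two single-precision units. The sketch below makes that accounting explicit; the per-pipeline unit counts (eight multipliers, four adders) are read from the description of Fig. 2, the single-precision gate counts are those given in Sect. 4.1.3, and the costing rule and all names are assumptions for illustration.

```python
SP_MUL_GATES = 11_628   # single-precision multiplier (Sect. 4.1.3)
SP_ADD_GATES = 7_782    # single-precision adder (Sect. 4.1.3)
DP_MUL_GATES = 37_056   # double-precision multiplier estimate
DP_ADD_GATES = 13_456   # double-precision adder estimate

MULS_PER_PIPE = 8       # assumed from Fig. 2
ADDS_PER_PIPE = 4       # assumed from Fig. 2
PIPES = 16

GPU_TRANSISTORS = 222_000_000  # 6800GT estimate quoted in the text


def dual_mode_overhead_gates() -> int:
    """Extra gates when each pair of single units becomes one dual-mode unit."""
    mul_pairs = MULS_PER_PIPE // 2
    add_pairs = ADDS_PER_PIPE // 2
    per_pipe = (mul_pairs * (DP_MUL_GATES - 2 * SP_MUL_GATES)
                + add_pairs * (DP_ADD_GATES - 2 * SP_ADD_GATES))
    return PIPES * per_pipe


total = dual_mode_overhead_gates()
print(total, f"{100 * total / GPU_TRANSISTORS:.2f}%")
```

Under this accounting the overhead comes entirely from the multipliers: a double-precision adder is estimated to be smaller than two single-precision adders, so the adder term is slightly negative.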

Table 2. Single- and double-precision GPGPU computations using SSR. Each application comes with the Brook distribution. The 32-bit cycles row shows the GPU cycle count for our NV4x-like architecture. Note that these timings are identical whether we are using a dual-mode unit configured in single-precision mode or a dedicated single-precision unit. The 64-bit cycles row shows the cycles required for double-precision after reconfiguration. As expected, none of the programs takes more than twice as long with double-precision than with single-precision.

Benchmark            bitonic sort    image proc    particle cloth    volume division
32-bit cycles        620             1,252         19,504            254,923,418
64-bit cycles        1,177           2,445         38,959            509,846,783
32→64-bit speedup    0.527           0.512         0.501             0.500

4.2.2 Full pipeline simulation. To validate our SSR-based graphics architecture capable of both single- and double-precision, we traced the four Brook benchmarks through Qsilver. Results are summarized in Table 2. This table lists the cycle counts for each application in both single- and double-precision modes. Note that the double-precision calculations never require more than twice as long as the corresponding single-precision calculation. Because the timing results are identical for dual-mode units configured in single-precision mode and dedicated single-precision units, we have shown that by using SSR we can add double-precision addition and multiplication to the graphics pipeline with only a modest increase in gate count and without affecting the performance of the commonly-used single-precision path.
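The claim that double precision never costs more than a factor of two can be checked directly from the cycle counts in Table 2. The sketch below is a minimal verification; the dictionary layout and variable names are illustrative, and the numbers are copied from the table.

```python
# (32-bit cycles, 64-bit cycles) per benchmark, from Table 2.
CYCLES = {
    "bitonic sort": (620, 1_177),
    "image proc": (1_252, 2_445),
    "particle cloth": (19_504, 38_959),
    "volume division": (254_923_418, 509_846_783),
}

# Speedup of 64-bit relative to 32-bit, as reported in the last table row.
speedups = {name: c32 / c64 for name, (c32, c64) in CYCLES.items()}

# Slowdown factor when switching from single to double precision.
slowdowns = {name: c64 / c32 for name, (c32, c64) in CYCLES.items()}

for name in CYCLES:
    print(f"{name}: speedup {speedups[name]:.3f}, slowdown {slowdowns[name]:.3f}x")

# No benchmark takes more than twice as long in double precision.
print(all(factor <= 2.0 for factor in slowdowns.values()))
```

The shorter benchmarks fall noticeably below the factor of two, which is consistent with part of their run time being spent outside the halved arithmetic path.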

5 Conclusions

We have extended Qsilver to record information on fragment program state in its annotated trace. Our modified Qsilver core then uses this new information, along with fragment program listings and timing information, to model the programmable fragment engine of an NV4x-like architecture. With this framework in place, we have demonstrated the applicability of Small-Scale Reconfigurability to graphics architectures. We have shown that it is possible to increase the throughput of the fragment engine with only a small increase in die area. In addition, we have demonstrated that dual-mode multipliers and adders can provide double-precision in the fragment engine to support scientific computing in the GPGPU community with no detriment to the gamers who drive the market. The vector-like operations performed on GPUs make them a particularly good target for such techniques, since the need for reconfiguration is rare in SIMD environments, and since the cost of reconfiguration is amortized over many operations.

6 Future Work

The fragment engine is one of many elements of the graphics pipeline. Applications of SSR will likely yield similar performance improvements in other units as well. Another area of exploration that is likely to be fruitful for SSR is power consumption. Whenever portions of a chip are unused, they use no dynamic power, but they leak static power. By their very nature, SSR components are rarely idle, and should therefore leak a minimum of static power. Power leakage is currently a major issue with GPUs, and reducing leakage becomes crucial as continuing improvements in chip manufacturing technology exacerbate this problem (Sheaffer et al. 2004).

7 Acknowledgments

We would like to thank John Lach for his input on SSR and Peter Djeu for his collaboration on Chromium extensions. This work was funded by NSF grants CCF-0429765, CCR-0306404, and CCF-0205324.

REFERENCES

Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., and Hanrahan, P. (2004), ‘Brook for GPUs: Stream computing on graphics hardware’, ACM Transactions on Graphics.

Chiou, L.-Y., Bhunia, S. and Roy, K. (2005), ‘Synthesis of application-specific highly efficient multi-mode cores for embedded systems’, ACM Transactions on Embedded Computing Systems.

Chiricescu, S., Schuette, M., Glinton, R. and Schmit, H. (2002), Morphable multipliers, in ‘Proceedings of the International Conference on Field Programmable Logic and Applications’.

Compton, K. and Hauck, S. (2004), Flexibility measurement of domain-specific reconfigurable hardware, in ‘Proceedings of the ACM/SIGDA Symposium on Field-programmable Gate Arrays’.

Even, G., Mueller, S. M. and Seidel, P.-M. (1997), A dual mode IEEE multiplier, in ‘Proceedings of the International Conference on Innovative Systems in Silicon’.

Guerra, L. M., Potkonjak, M. and Rabaey, J. M. (1998), ‘Behavioral-level synthesis of heterogeneous BISR reconfigurable ASIC’s’, IEEE Transactions on VLSI.

Humphreys, G., Houston, M., Ng, R., Ahern, S., Frank, R., Kirchner, P. and Klosowski, J. T. (2002), ‘Chromium: A stream processing framework for interactive graphics on clusters of workstations’, ACM Transactions on Graphics 21(3), 693–702.

Kilgariff, E. and Fernando, R. (2005), The GeForce 6 Series GPU Architecture, Addison-Wesley Pub Co, pp. 471–491.

Kim, K., Karri, R. and Potkonjak, M. (1997), Synthesis of application specific programmable processors, in ‘Proceedings of Design Automation’.

Medvedev, A. and Budankov, K. (2004), ‘NVIDIA GeForce 6800 Ultra (NV40)’. http: //https://www.wendangku.net/doc/8316305976.html,/articles2/gffx/nv40-part1-a.html.

Seifert, A. (2004), ‘NV40 technology explained’. https://www.wendangku.net/doc/8316305976.html,/artikel/ nv40pipeline/index3e.php.

Sheaffer, J. W., Luebke, D. P. and Skadron, K. (2004), A flexible simulation framework for graphics architectures, in ‘Proceedings of SIGGRAPH/Eurographics Workshop on Graphics Hardware’.

Sheaffer, J. W., Skadron, K. and Luebke, D. P. (2005), Studying thermal management for graphics-processor architectures, in ‘Proceedings of 2005 IEEE International Symposium on Performance Analysis of Systems and Software’.

Vijay Kumar, V. and Lach, J. (2003), Designing, scheduling, and allocating flexible arithmetic components, in ‘Proceedings of the International Conference on Field Programmable Logic and Applications’.


高中语文文言文曹操《短歌行(对酒当歌)》原文、翻译、赏析

曹操《短歌行【对酒当歌】》原文、翻译、赏析译文 原文 面对美酒应该高歌,人生短促日月如梭。对酒当歌,人生几何? 好比晨露转瞬即逝,失去的时日实在太多!譬如朝露,去日苦多。 席上歌声激昂慷慨,忧郁长久填满心窝。慨当以慷,忧思难忘。 靠什么来排解忧闷?唯有狂饮方可解脱。何以解忧?唯有杜康。 那穿着青领(周代学士的服装)的学子哟,你们令我朝夕思慕。青青子衿,悠悠我心。 正是因为你们的缘故,我一直低唱着《子衿》歌。但为君故,沉吟至今。 阳光下鹿群呦呦欢鸣,悠然自得啃食在绿坡。呦呦鹿鸣,食野之苹。 一旦四方贤才光临舍下,我将奏瑟吹笙宴请宾客。我有嘉宾,鼓瑟吹笙。 当空悬挂的皓月哟,你运转着,永不停止;明明如月,何时可掇? 我久蓄于怀的忧愤哟,突然喷涌而出汇成长河。忧从中来,不可断绝。 远方宾客踏着田间小路,一个个屈驾前来探望我。越陌度阡,枉用相存。 彼此久别重逢谈心宴饮,争着将往日的情谊诉说。契阔谈讌,心念旧恩。 明月升起,星星闪烁,一群寻巢乌鹊向南飞去。月明星稀,乌鹊南飞。 绕树飞了三周却没敛绕树三匝,何枝

翅,哪里才有它们栖身之 所? 可依? 高山不辞土石才见巍 峨,大海不弃涓流才见壮阔。(比喻用人要“唯才是举”,多多益善。)山不厌高,水不厌深。 只有像周公那样礼待贤 才(周公见到贤才,吐出口 中正在咀嚼的食物,马上接 待。《史记》载周公自谓: “一沐三握发,一饭三吐哺, 犹恐失天下之贤。”),才 能使天下人心都归向我。 周公吐哺,天 赏析 曹操是汉末杰出的政治家、军事家和文学家,他雅好诗章,好作乐府歌辞,今存诗22首,全是乐府诗。曹操的乐府诗多描写他本人的政治主张和统一天下的雄心壮志。如他的《短歌行》,充分表达了诗人求贤若渴以及统一天下的壮志。 《短歌行》是政治性很强的诗作,主要是为曹操当时所实行的政治路线和政策策略服务的,但是作者将政治内容和意义完全熔铸在浓郁的抒情意境之中,全诗充分发挥了诗歌创作的特长,准确而巧妙地运用了比兴手法,寓理于情,以情感人。诗歌无论在思想内容还是在艺术上都取得了极高的成就,语言质朴,立意深远,气势充沛。这首带有建安时代"志深比长""梗概多气"的时代特色的《短歌行》,读后不觉思接千载,荡气回肠,受到强烈的感染。 对酒当歌,人生几何? 譬如朝露,去日苦多。 慨当以慷,幽思难忘。 何以解忧,唯有杜康。 青青子衿,悠悠我心。 但为君故,沈吟至今。 呦呦鹿鸣,食野之苹。 我有嘉宾,鼓瑟吹笙。 明明如月,何时可掇? 忧从中来,不可断绝。 越陌度阡,枉用相存。 契阔谈,心念旧恩。 月明星稀,乌鹊南飞, 绕树三匝,何枝可依? 山不厌高,海不厌深, 周公吐哺,天下归心。 《短歌行》是汉乐府的旧题,属于《相和歌?平调曲》。这就是说它本来是一个乐曲的名称,这种乐曲怎么唱法,现在当然是不知道了。但乐府《相和歌?平调曲》中除了《短歌行》还有《长歌行》,唐代吴兢《乐府古题要解》引证古诗“长歌正激烈”,魏文帝曹丕《燕歌行》“短歌微吟不能长”和晋代傅玄《艳歌行》“咄来长歌续短歌”等句,认为“长歌”、“短

外国文学名著鉴赏期末论文

外国文学名著鉴赏期末论文院—系:数学学院 科目:外国文学名著鉴赏(期末论文)班级: 08级数学与应用数学A班 姓名:沈铁 学号: 200805050149 上课时段:周五晚十、十一节课

奋斗了,才有出路 ——读《鲁宾逊漂游记》有感小说《鲁宾逊漂游记》一直深受人们的喜爱。读完这篇小说,使我对人生应该有自己的一个奋斗历程而受益匪浅。当一个人已经处于绝境的时候,还能够满怀信心的去面对和挑战生活,实在是一种可贵的精神。他使我认识到,人无论何时何地,不管遇到多大的困难,都不能被困难所吓倒,我们要勇敢的面对困难,克服困难,始终保持一种积极向上、乐观的心态去面对。在当今社会只有努力去奋斗,才会有自己的出路! 其实现在的很多人都是那些遇到困难就退缩,不敢勇敢的去面对它。不仅如此,现在很多人都是独生子女,很多家长视子女为掌上明珠,不要说是冒险了,就连小小的家务活也不让孩子做,天天总是说:“我的小宝贝啊,你读好书就行了,其它的爸爸妈妈做就可以了。”读书固然重要,但生活中的小事也不能忽略。想一想,在荒无人烟的孤岛上,如果你连家务活都不会做,你能在那里生存吗?读完这部著作后,我不禁反问自己:“如果我像书中的鲁宾逊那样在大海遭到风暴,我能向他那样与风暴搏斗,最后逃离荒岛得救吗?恐怕我早已经被大海所淹没;如果我漂流到孤岛,能活几天?我又能干些什么?我会劈柴吗?会打猎做饭吗?我连洗洗自己的衣服还笨手笨脚的。”我们应该学习鲁宾逊这种不怕困难的精神,无论何时何地都有坚持地活下去,哪怕只有一线希望也要坚持到底,决不能放弃!我们要像鲁宾

逊那样有志气、有毅力、爱劳动,凭自己的双手创造财富,创造奇迹,取得最后的胜利。这样的例子在我们的生活中屡见不鲜。 《史记》的作者司马迁含冤入狱,可它依然在狱中完成《史记》一书,他之所以能完成此书,靠的也是他心中那顽强的毅力,永不放弃的不断努力的精神。著名作家爱迪生从小就生活在一个贫困的家庭中,可是他从小就表现出了科学方面的天赋。长大后爱迪生着力于电灯的发明与研究,他经过了九百多次的失败,可它依然没有放弃,不断努力,最后终于在第一千次实验中取得了成功。 鲁宾逊在岛上生活了二十八年,他面对了各种各样的困难和挫折,克服了许多常人无法想象的困难,自己动手,丰衣足食,以惊人的毅力,顽强的活了下来。他自从大船失事后,找了一些木材,在岛上盖了一间房屋,为防止野兽,还在房子周围打了木桩,来到荒岛,面对着的首要的就是吃的问题,船上的东西吃完以后,鲁宾逊开始打猎,有时可能会饿肚子,一是他决定播种,几年后他终于可以吃到自己的劳动成果,其实学习也是这样,也有这样一个循序渐进的过程,现在的社会,竞争无处不在,我们要懂得只有付出才会有收获,要勇于付出,在战胜困难的同时不断取得好成绩。要知道只有付出,才会有收获。鲁宾逊在失败后总结教训,终于成果;磨粮食没有石磨,他就用木头代替;没有筛子,就用围巾。鲁宾逊在荒岛上解决了自己的生存难题,面对人生挫折,鲁宾逊的所作所为充分显示了他坚毅的性格和顽强的精神。同样我们在学习上也可以做一些创新,养成一种创新精神,把鲁宾逊在荒岛,不畏艰险,不怕失败挫折,艰苦奋斗的精

常用介词用法(for to with of)

For的用法 1. 表示“当作、作为”。如: I like some bread and milk for breakfast. 我喜欢把面包和牛奶作为早餐。 What will we have for supper? 我们晚餐吃什么? 2. 表示理由或原因,意为“因为、由于”。如: Thank you for helping me with my English. 谢谢你帮我学习英语。 3. 表示动作的对象或接受者,意为“给……”、“对…… (而言)”。如: Let me pick it up for you. 让我为你捡起来。 Watching TV too much is bad for your health. 看电视太多有害于你的健康。 4. 表示时间、距离,意为“计、达”。如: I usually do the running for an hour in the morning. 我早晨通常跑步一小时。 We will stay there for two days. 我们将在那里逗留两天。 5. 表示去向、目的,意为“向、往、取、买”等。如: Let’s go for a walk. 我们出去散步吧。 I came here for my schoolbag.我来这儿取书包。 I paid twenty yuan for the dictionary. 我花了20元买这本词典。 6. 表示所属关系或用途,意为“为、适于……的”。如: It’s time for school. 到上学的时间了。 Here is a letter for you. 这儿有你的一封信。 7. 表示“支持、赞成”。如: Are you for this plan or against it? 你是支持还是反对这个计划? 8. 用于一些固定搭配中。如: Who are you waiting for? 你在等谁? For example, Mr Green is a kind teacher. 比如,格林先生是一位心地善良的老师。 尽管for 的用法较多,但记住常用的几个就可以了。 to的用法: 一:表示相对,针对 be strange (common, new, familiar, peculiar) to This injection will make you immune to infection. 二:表示对比,比较 1:以-ior结尾的形容词,后接介词to表示比较,如:superior ,inferior,prior,senior,junior 2: 一些本身就含有比较或比拟意思的形容词,如equal,similar,equivalent,analogous A is similar to B in many ways.

曹操《短歌行》其二翻译及赏析

曹操《短歌行》其二翻译及赏析 引导语:曹操(155—220),字孟德,小名阿瞒,《短歌行 二首》 是曹操以乐府古题创作的两首诗, 第一首诗表达了作者求贤若渴的心 态,第二首诗主要是曹操向内外臣僚及天下表明心迹。 短歌行 其二 曹操 周西伯昌,怀此圣德。 三分天下,而有其二。 修奉贡献,臣节不隆。 崇侯谗之,是以拘系。 后见赦原,赐之斧钺,得使征伐。 为仲尼所称,达及德行, 犹奉事殷,论叙其美。 齐桓之功,为霸之首。 九合诸侯,一匡天下。 一匡天下,不以兵车。 正而不谲,其德传称。 孔子所叹,并称夷吾,民受其恩。 赐与庙胙,命无下拜。 小白不敢尔,天威在颜咫尺。 晋文亦霸,躬奉天王。 受赐圭瓒,钜鬯彤弓, 卢弓矢千,虎贲三百人。 威服诸侯,师之所尊。 八方闻之,名亚齐桓。 翻译 姬昌受封为西伯,具有神智和美德。殷朝土地为三份,他有其中两分。 整治贡品来进奉,不失臣子的职责。只因为崇侯进谗言,而受冤拘禁。 后因为送礼而赦免, 受赐斧钺征伐的权利。 他被孔丘称赞, 品德高尚地位显。 始终臣服殷朝帝王,美名后世流传遍。齐桓公拥周建立功业,存亡继绝为霸 首。

聚合诸侯捍卫中原,匡正天下功业千秋。号令诸侯以匡周室,主要靠的不是 武力。 行为磊落不欺诈,美德流传于身后。孔子赞美齐桓公,也称赞管仲。 百姓深受恩惠,天子赐肉与桓公,命其无拜来接受。桓公称小白不敢,天子 威严就在咫尺前。 晋文公继承来称霸,亲身尊奉周天王。周天子赏赐丰厚,仪式隆重。 接受玉器和美酒,弓矢武士三百名。晋文公声望镇诸侯,从其风者受尊重。 威名八方全传遍,名声仅次于齐桓公。佯称周王巡狩,招其天子到河阳,因 此大众议论纷纷。 赏析 《短歌行》 (“周西伯昌”)主要是曹操向内外臣僚及天下表明心 迹,当他翦灭群凶之际,功高震主之时,正所谓“君子终日乾乾,夕惕若 厉”者,但东吴孙权却瞅准时机竟上表大说天命而称臣,意在促曹操代汉 而使其失去“挟天子以令诸侯”之号召, 故曹操机敏地认识到“ 是儿欲据吾著炉上郁!”故曹操运筹谋略而赋此《短歌行 ·周西伯 昌》。 西伯姬昌在纣朝三分天下有其二的大好形势下, 犹能奉事殷纣, 故孔子盛称 “周之德, 其可谓至德也已矣。 ”但纣王亲信崇侯虎仍不免在纣王前 还要谗毁文王,并拘系于羑里。曹操举此史实,意在表明自己正在克心效法先圣 西伯姬昌,并肯定他的所作所为,谨慎惕惧,向来无愧于献帝之所赏。 并大谈西伯姬昌、齐桓公、晋文公皆曾受命“专使征伐”。而当 今天下时势与当年的西伯、齐桓、晋文之际颇相类似,天子如命他“专使 征伐”以讨不臣,乃英明之举。但他亦效西伯之德,重齐桓之功,戒晋文 之诈。然故作谦恭之辞耳,又谁知岂无更讨封赏之意乎 ?不然建安十八年(公元 213 年)五月献帝下诏曰《册魏公九锡文》,其文曰“朕闻先王并建明德, 胙之以土,分之以民,崇其宠章,备其礼物,所以藩卫王室、左右厥世也。其在 周成,管、蔡不静,惩难念功,乃使邵康公赐齐太公履,东至于海,西至于河, 南至于穆陵,北至于无棣,五侯九伯,实得征之。 世祚太师,以表东海。爰及襄王,亦有楚人不供王职,又命晋文登为侯伯, 锡以二辂、虎贲、斧钺、禾巨 鬯、弓矢,大启南阳,世作盟主。故周室之不坏, 系二国是赖。”又“今以冀州之河东、河内、魏郡、赵国、中山、常 山,巨鹿、安平、甘陵、平原凡十郡,封君为魏公。锡君玄土,苴以白茅,爰契 尔龟。”又“加君九锡,其敬听朕命。” 观汉献帝下诏《册魏公九锡文》全篇,尽叙其功,以为其功高于伊、周,而 其奖却低于齐、晋,故赐爵赐土,又加九锡,奖励空前。但曹操被奖愈高,心内 愈忧。故曹操在曾早在五十六岁写的《让县自明本志令》中谓“或者人见 孤强盛, 又性不信天命之事, 恐私心相评, 言有不逊之志, 妄相忖度, 每用耿耿。

2008年浙师大《外国文学名著鉴赏》期末考试答案

(一)文学常识 一、古希腊罗马 1.(1)宙斯(罗马神话称为朱庇特),希腊神话中最高的天神,掌管雷电云雨,是人和神的主宰。 (2)阿波罗,希腊神话中宙斯的儿子,主管光明、青春、音乐、诗歌等,常以手持弓箭的少年形象出现。 (3)雅典那,希腊神话中的智慧女神,雅典城邦的保护神。 (4)潘多拉,希腊神话中的第一个女人,貌美性诈。私自打开了宙斯送她的一只盒子,里面装的疾病、疯狂、罪恶、嫉妒等祸患,一齐飞出,只有希望留在盒底,人间因此充满灾难。“潘多拉的盒子”成为“祸灾的来源”的同义语。 (5)普罗米修斯,希腊神话中造福人间的神。盗取天火带到人间,并传授给人类多种手艺,触怒宙斯,被锁在高加索山崖,受神鹰啄食,是一个反抗强暴、不惜为人类牺牲一切的英雄。 (6)斯芬克司,希腊神话中的狮身女怪。常叫过路行人猜谜,猜不出即将行人杀害;后因谜底被俄底浦斯道破,即自杀。后常喻“谜”一样的人物。与埃及狮身人面像同名。 2.荷马,古希腊盲诗人。主要作品有《伊利亚特》和《奥德赛》,被称为荷马史诗。《伊利亚特》叙述十年特洛伊战争。《奥德赛》写特洛伊战争结束后,希腊英雄奥德赛历险回乡的故事。马克思称赞它“显示出永久的魅力”。 3.埃斯库罗斯,古希腊悲剧之父,代表作《被缚的普罗米修斯》。6.阿里斯托芬,古希腊“喜剧之父”代表作《阿卡奈人》。 4.索福克勒斯,古希腊重要悲剧作家,代表作《俄狄浦斯王》。5.欧里庇得斯,古希腊重要悲剧作家,代表作《美狄亚》。 二、中世纪文学 但丁,意大利人,伟大诗人,文艺复兴的先驱。恩格斯称他是“中世纪的最后一位诗人,同时又是新时代的最初一位诗人”。主要作品有叙事长诗《神曲》,由地狱、炼狱、天堂三部分组成。《神曲》以幻想形式,写但丁迷路,被人导引神游三界。在地狱中见到贪官污吏等受着惩罚,在净界中见到贪色贪财等较轻罪人,在天堂里见到殉道者等高贵的灵魂。 三、文艺复兴时期 1.薄迦丘意大利人短篇小说家,著有《十日谈》拉伯雷,法国人,著《巨人传》塞万提斯,西班牙人,著《堂?吉诃德》。 2.莎士比亚,16-17世纪文艺复兴时期英国伟大的剧作家和诗人,主要作品有四大悲剧——《哈姆雷特》、《奥赛罗》《麦克白》、《李尔王》,另有悲剧《罗密欧与朱丽叶》等,喜剧有《威尼斯商人》《第十二夜》《皆大欢喜》等,历史剧有《理查二世》、《亨利四世》等。马克思称之为“人类最伟大的戏剧天才”。 四、17世纪古典主义 9.笛福,17-18世纪英国著名小说家,被誉为“英国和欧洲小说之父”,主要作品《鲁滨逊漂流记》,是英国第一部现实主义长篇小说。10.弥尔顿,17世纪英国诗人,代表作:长诗《失乐园》,《失乐园》,表现了资产阶级清教徒的革命理想和英雄气概。 25.拉伯雷,16世纪法国作家,代表作:长篇小说《巨人传》。 26.莫里哀,法国17世纪古典主义文学最重要的作家,法国古典主义喜剧的创建者,主要作品为《伪君子》《悭吝人》(主人公叫阿巴公)等喜剧。 五、18世纪启蒙运动 1)歌德,德国文学最高成就的代表者。主要作品有书信体小说《少年维特之烦恼》,诗剧《浮士德》。 11.斯威夫特,18世纪英国作家,代表作:《格列佛游记》,以荒诞的情节讽刺了英国现实。 12.亨利·菲尔丁,18世纪英国作家,代表作:《汤姆·琼斯》。 六、19世纪浪漫主义 (1拜伦, 19世纪初期英国伟大的浪漫主义诗人,代表作为诗体小说《唐璜》通过青年贵族唐璜的种种经历,抨击欧洲反动的封建势力。《恰尔德。哈洛尔游记》 (2雨果,伟大作家,欧洲19世纪浪漫主义文学最卓越的代表。主要作品有长篇小说《巴黎圣母院》、《悲惨世界》、《笑面人》、《九三年》等。《悲惨世界》写的是失业短工冉阿让因偷吃一片面包被抓进监狱,后改名换姓,当上企业主和市长,但终不能摆脱迫害的故事。《巴黎圣母院》 弃儿伽西莫多,在一个偶然的场合被副主教克洛德.孚罗洛收养为义子,长大后有让他当上了巴黎圣母院的敲钟人。他虽然十分丑陋而且有多种残疾,心灵却异常高尚纯洁。 长年流浪街头的波希米亚姑娘拉.爱斯梅拉达,能歌善舞,天真貌美而心地淳厚。青年贫诗人尔比埃尔.甘果瓦偶然同她相遇,并在一个更偶然的场合成了她名义上的丈夫。很有名望的副教主本来一向专心于"圣职",忽然有一天欣赏到波希米亚姑娘的歌舞,忧千方百计要把她据为己有,对她进行了种种威胁甚至陷害,同时还为此不惜玩弄卑鄙手段,去欺骗利用他的义子伽西莫多和学生甘果瓦。眼看无论如何也实现不了占有爱斯梅拉达的罪恶企图,最后竟亲手把那可爱的少女送上了绞刑架。 
另一方面,伽西莫多私下也爱慕着波希米亚姑娘。她遭到陷害,被伽西莫多巧计救出,在圣母院一间密室里避难,敲钟人用十分纯朴和真诚的感情去安慰她,保护她。当她再次处于危急中时,敲钟人为了援助她,表现出非凡的英勇和机智。而当他无意中发现自己的"义父"和"恩人"远望着高挂在绞刑架上的波希米亚姑娘而发出恶魔般的狞笑时,伽西莫多立即对那个伪善者下了最后的判决,亲手把克洛德.孚罗洛从高耸入云的钟塔上推下,使他摔的粉身碎骨。 (3司汤达,批判现实主义作家。代表作《红与黑》,写的是不满封建制度的平民青年于连,千方百计向上爬,最终被送上断头台的故事。“红”是将军服色,指“入军界”的道路;“黑”是主教服色,指当神父、主教的道路。 14.雪莱,19世纪积极浪漫主义诗人,欧洲文学史上最早歌颂空想社会主义的诗人之一,主要作品为诗剧《解放了的普罗米修斯》,抒情诗《西风颂》等。 15.托马斯·哈代,19世纪英国作家,代表作:长篇小说《德伯家的苔丝》。 16.萨克雷,19世纪英国作家,代表作:《名利场》 17.盖斯凯尔夫人,19世纪英国作家,代表作:《玛丽·巴顿》。 18.夏洛蒂?勃朗特,19世纪英国女作家,代表作:长篇小说《简?爱》19艾米丽?勃朗特,19世纪英国女作家,夏洛蒂?勃朗特之妹,代表作:长篇小说《呼啸山庄》。 20.狄更斯,19世纪英国批判现实主义文学的重要代表,主要作品为长篇小说《大卫?科波菲尔》、《艰难时世》《双城记》《雾都孤儿》。21.柯南道尔,19世纪英国著名侦探小说家,代表作品侦探小说集《福尔摩斯探案》是世界上最著名的侦探小说。 七、19世纪现实主义 1、巴尔扎克,19世纪上半叶法国和欧洲批判现实主义文学的杰出代表。主要作品有《人间喜剧》,包括《高老头》、《欧也妮·葛朗台》、《贝姨》、《邦斯舅舅》等。《人间喜剧》是世界文学中规模最宏伟的创作之一,也是人类思维劳动最辉煌的成果之一。马克思称其“提供了一部法国社会特别是巴黎上流社会的卓越的现实主义历史”。

单片机C延时时间怎样计算

C程序中可使用不同类型的变量来进行延时设计。经实验测试,使用unsigned char类型具有比unsigned int更优化的代码,在使用时 应该使用unsigned char作为延时变量。以某晶振为12MHz的单片 机为例,晶振为12M H z即一个机器周期为1u s。一. 500ms延时子程序 程序: void delay500ms(void) { unsigned char i,j,k; for(i=15;i>0;i--) for(j=202;j>0;j--) for(k=81;k>0;k--); } 计算分析: 程序共有三层循环 一层循环n:R5*2 = 81*2 = 162us DJNZ 2us 二层循环m:R6*(n+3) = 202*165 = 33330us DJNZ 2us + R5赋值 1us = 3us 三层循环: R7*(m+3) = 15*33333 = 499995us DJNZ 2us + R6赋值 1us = 3us

循环外: 5us 子程序调用 2us + 子程序返回2us + R7赋值 1us = 5us 延时总时间 = 三层循环 + 循环外 = 499995+5 = 500000us =500ms 计算公式:延时时间=[(2*R5+3)*R6+3]*R7+5 二. 200ms延时子程序 程序: void delay200ms(void) { unsigned char i,j,k; for(i=5;i>0;i--) for(j=132;j>0;j--) for(k=150;k>0;k--); } 三. 10ms延时子程序 程序: void delay10ms(void) { unsigned char i,j,k; for(i=5;i>0;i--) for(j=4;j>0;j--) for(k=248;k>0;k--);

for和to区别

1.表示各种“目的”,用for (1)What do you study English for 你为什么要学英语? (2)went to france for holiday. 她到法国度假去了。 (3)These books are written for pupils. 这些书是为学生些的。 (4)hope for the best, prepare for the worst. 作最好的打算,作最坏的准备。 2.“对于”用for (1)She has a liking for painting. 她爱好绘画。 (2)She had a natural gift for teaching. 她对教学有天赋/ 3.表示“赞成、同情”,用for (1)Are you for the idea or against it 你是支持还是反对这个想法? (2)He expresses sympathy for the common people.. 他表现了对普通老百姓的同情。 (3)I felt deeply sorry for my friend who was very ill. 4. 表示“因为,由于”(常有较活译法),用for (1)Thank you for coming. 谢谢你来。

(2)France is famous for its wines. 法国因酒而出名。 5.当事人对某事的主观看法,“对于(某人),对…来说”,(多和形容词连用),用介词to,不用for. (1)He said that money was not important to him. 他说钱对他并不重要。 (2)To her it was rather unusual. 对她来说这是相当不寻常的。 (3)They are cruel to animals. 他们对动物很残忍。 6.和fit, good, bad, useful, suitable 等形容词连用,表示“适宜,适合”,用for。(1)Some training will make them fit for the job. 经过一段训练,他们会胜任这项工作的。 (2)Exercises are good for health. 锻炼有益于健康。 (3)Smoking and drinking are bad for health. 抽烟喝酒对健康有害。 (4)You are not suited for the kind of work you are doing. 7. 表示不定式逻辑上的主语,可以用在主语、表语、状语、定语中。 (1)It would be best for you to write to him. (2) The simple thing is for him to resign at once.

外国名著赏析论文

题目:浅析从简爱到女性的尊严和爱 学院工商学院 专业新闻学3 学号 姓名闫万里 学科外国文学名着赏析 [摘要] 十九世纪中期,英国伟大的女性存在主义先驱,着名作家夏洛蒂勃朗特创作出了她的代表作--《简爱》,当时轰动了整个文坛,它是一部具有浓厚浪漫主义色彩的现实主义小说,被认为是作者"诗意的生平"的写照。它在问世后的一百多年里,它始终保持着历史不败的艺术感染力。直到现在它的影响还继续存在。在作品的序幕、发展、高潮和结尾中,女主人公的叛逆、自由、平等、自尊、纯洁的个性都是各个重点章节的主旨,而这些主旨则在女主人公的爱情观中被展露的淋漓尽致,它们如同乌云上方的星汉,灼灼闪耀着光芒,照亮着后来的女性者们追求爱情的道路。? [关键词] 自尊个性独特新女性主义自由独立平等 《简爱》是一部带有自转色彩的长篇小说,它阐释了这样一个主题:人的价值=尊严+爱。从小就成长在一个充满暴力的环境中的简爱,经历了同龄人没有的遭遇。她要面对的是舅妈的毫无人性的虐待,表兄的凶暴专横和表姐的傲慢冷漠,尽管她尽力想“竭力赢得别人的好感”,但是事实告诉她这都是白费力气的,因此她发出了“不公平啊!--不公平!”的近乎绝望的呼喊。不公平的生长环境,使得简爱从小就向往平等、自由和爱,这些愿望在她后来的成长过程中表现无疑,

譬如在她的爱情观中的种种体现。? 1.桑菲尔德府? 谭波儿小姐因为出嫁,离开了洛伍德学校,同时也离开了简爱,这使简爱感觉到了“一种稳定的感觉,一切使我觉得洛伍德学校有点像我家的联想,全都随着她消失了”,她意识到:真正的世界是广阔的,一个充满希望和忧虑、激动和兴奋的变化纷呈的天地,正等待着敢于闯入、甘冒风险寻求人生真谛的人们。意识形态的转变促使着简爱走向更广阔的社会,接受社会的挑战,尽管她才只有十八岁。于是,简爱来到了桑菲尔德府,当了一名在当时地位不高的家庭教师。?桑菲尔德府使简爱感受到“这儿有想象中的完美无缺的家庭安乐气氛”,事实证明了她的预感的正确性,。从和简爱相见、相识到相爱的过程当中,简爱的那种叛逆精神、自强自尊的品质深深地征服了罗切斯特,而罗切斯特的优雅风度和渊博知识同样也征服了简爱。最初开始,简爱一直以为罗切斯特会娶高贵漂亮的英格拉姆为妻,她在和罗切斯特谈到婚姻时,曾经义正言辞的对罗切斯特说:“你以为因为我穷,低微,不美,矮小,就没有灵魂了吗?你想错了!我跟你一样有灵魂—也同样有一颗心!我现在不是凭着肉体凡胎跟你说话,而是我的心灵在和你的心灵说话,就好像我们都已经离开人世,两人平等地站在上帝面前—因为我们本来就是平等的。”这充分表现出简爱的叛逆,她这种维护妇女独立人格、主张婚姻独立自主以及男女平等的主张可以看成是他对整个人类社会自由平等的向往追求,罗切斯特正是爱上了她这样的独特个性,同时他也同样重复道:我们本来就是平等的。罗切斯特自始自终爱的是简爱的心灵—有着意志的力量,美德和纯洁的心灵,正是基于如此,简爱才真正的爱着罗切斯特。因为爱情是来不得半点虚假的,一方为另一方付出了真情的爱,假如得到对方的是虚情假意,那么这份爱

51单片机延时时间计算和延时程序设计

一、关于单片机周期的几个概念 ●时钟周期 时钟周期也称为振荡周期,定义为时钟脉冲的倒数(可以这样来理解,时钟周期就是单片机外接晶振的倒数,例如12MHz的晶振,它的时间周期就是1/12 us),是计算机中最基本的、最小的时间单位。 在一个时钟周期内,CPU仅完成一个最基本的动作。 ●机器周期 完成一个基本操作所需要的时间称为机器周期。 以51为例,晶振12M,时钟周期(晶振周期)就是(1/12)μs,一个机器周期包 执行一条指令所需要的时间,一般由若干个机器周期组成。指令不同,所需的机器周期也不同。 对于一些简单的的单字节指令,在取指令周期中,指令取出到指令寄存器后,立即译码执行,不再需要其它的机器周期。对于一些比较复杂的指令,例如转移指令、乘法指令,则需要两个或者两个以上的机器周期。 1.指令含义 DJNZ:减1条件转移指令 这是一组把减1与条件转移两种功能结合在一起的指令,共2条。 DJNZ Rn,rel ;Rn←(Rn)-1 ;若(Rn)=0,则PC←(PC)+2 ;顺序执行 ;若(Rn)≠0,则PC←(PC)+2+rel,转移到rel所在位置DJNZ direct,rel ;direct←(direct)-1 ;若(direct)= 0,则PC←(PC)+3;顺序执行 ;若(direct)≠0,则PC←(PC)+3+rel,转移到rel 所在位置 2.DJNZ Rn,rel指令详解 例:

MOV R7,#5 DEL:DJNZ R7,DEL; rel在本例中指标号DEL 1.单层循环 由上例可知,当Rn赋值为几,循环就执行几次,上例执行5次,因此本例执行的机器周期个数=1(MOV R7,#5)+2(DJNZ R7,DEL)×5=11,以12MHz的晶振为例,执行时间(延时时间)=机器周期个数×1μs=11μs,当设定立即数为0时,循环程序最多执行256次,即延时时间最多256μs。 2.双层循环 1)格式: DELL:MOV R7,#bb DELL1:MOV R6,#aa DELL2:DJNZ R6,DELL2; rel在本句中指标号DELL2 DJNZ R7,DELL1; rel在本句中指标号DELL1 注意:循环的格式,写错很容易变成死循环,格式中的Rn和标号可随意指定。 2)执行过程

双宾语tofor的用法

1. 两者都可以引出间接宾语,但要根据不同的动词分别选用介词to 或for: (1) 在give, pass, hand, lend, send, tell, bring, show, pay, read, return, write, offer, teach, throw 等之后接介词to。 如: 请把那本字典递给我。 正:Please hand me that dictionary. 正:Please hand that dictionary to me. 她去年教我们的音乐。 正:She taught us music last year. 正:She taught music to us last year. (2) 在buy, make, get, order, cook, sing, fetch, play, find, paint, choose,prepare, spare 等之后用介词for 。如: 他为我们唱了首英语歌。 正:He sang us an English song. 正:He sang an English song for us. 请帮我把钥匙找到。 正:Please find me the keys. 正:Please find the keys for me. 能耽搁你几分钟吗(即你能为我抽出几分钟吗)? 正:Can you spare me a few minutes? 正:Can you spare a few minutes for me? 注:有的动词由于搭配和含义的不同,用介词to 或for 都是可能的。如: do sb a favou r do a favour for sb 帮某人的忙 do sb harnn= do harm to sb 对某人有害

短歌行赏析介绍

短歌行赏析介绍 说道曹操, 大家一定就联想到三国那些烽火狼烟岁月吧。 但是曹操其实也是 一位文学 大家,今天就来分享《短歌行 》赏析。 《短歌行》短歌行》是汉乐府旧题,属于《相和歌辞·平调曲》。这就是说 它本来是一个乐曲名称。最初古辞已经失传。乐府里收集同名诗有 24 首,最早 是曹操这首。 这种乐曲怎么唱法, 现在当然是不知道。 但乐府 《相和歌·平调曲》 中除《短歌行》还有《长歌行》,唐代吴兢《乐府古题要解》引证古诗 “长歌正激烈”, 魏文帝曹丕 《燕歌行》 “短歌微吟不能长”和晋代傅玄 《艳 歌行》 “咄来长歌续短歌”等句, 认为“长歌”、 “短歌”是指“歌声有长短”。 我们现在也就只能根据这一点点材料来理解《短歌行》音乐特点。《短歌行》这 个乐曲,原来当然也有相应歌辞,就是“乐府古辞”,但这古辞已经失传。现在 所能见到最早《短歌行》就是曹操所作拟乐府《短歌行》。所谓“拟乐府”就是 运用乐府旧曲来补作新词,曹操传世《短歌行》共有两首,这里要介绍是其中第 一首。 这首《短歌行》主题非常明确,就是作者希望有大量人才来为自己所用。曹 操在其政治活动中,为扩大他在庶族地主中统治基础,打击反动世袭豪强势力, 曾大力强调“唯才是举”,为此而先后发布“求贤令”、“举士令”、“求逸才 令”等;而《短歌行》实际上就是一曲“求贤歌”、又正因为运用诗歌 形式,含有丰富抒情成分,所以就能起到独特感染作用,有力地宣传他所坚 持主张,配合他所颁发政令。 《短歌行》原来有“六解”(即六个乐段),按照诗意分为四节来读。 “对酒当歌,人生几何?譬如朝露,去日苦多。慨当以慷,忧思难忘。何以 解忧,唯有杜康。” 在这八句中,作者强调他非常发愁,愁得不得。那么愁是什么呢?原来他是 苦于得不到众多“贤才”来同他合作, 一道抓紧时间建功立业。 试想连曹操这样 位高权重人居然在那里为“求贤”而发愁, 那该有多大宣传作用。 假如庶族地主 中真有“贤才”话, 看这些话就不能不大受感动和鼓舞。 他们正苦于找不到出路

外国文学名著赏析

外国文学名著赏析 ——对《哈姆雷特》与《堂吉诃德》人物形象及其现实意义的 分析 班级:学号:姓名: 摘要:《哈姆雷特》和《堂吉诃德》是文艺复兴时期涌现出来的一批比较先进的文学作品,通过对两部文学作品的阅读,我们不难发现两部作品所展现出来的人物形象都反映出了当时不同的社会现实,而且两部文学作品都表现出了浓厚的人文主义色彩,并且在当时也带来了不同的社会作用。堂吉诃德是塞万提斯塑造的一个为了打击骑士文学的文学形象,作者通过对堂吉诃德的描写,生动的说明了骑士文学给世人带来的负面影响,从而给骑士文学以致命的打击。莎士比亚笔下的哈姆雷特则是一个孤独的人文主义者,他以失败告终,哈姆雷特的失败向人们揭示了人文主义时代的悲剧。 Abstract: Hamlet a nd Don Quixote were the advanced literary in the Renaissance Period .After read the two novels, it is not difficult to find that different characters in these novels reflected different social problems. And they all expressed the ideas of humanism. These two images also brought different social functions at that time .Don Quixote is one of which Cervantes’s molds in order to attack the knight literature .According to the description of Don Quixote, vivid explain ed that knight literature brought a serious of bad influence to the human beings, thus gives the knight literature by the fatal attack. Hamlet is a lonely humanism, he is end in failure .His failure has indicated the tragedy which in the Humanism Era. 关键词:人文主义、哈姆雷特、人物形象、艺术特色、堂吉诃德、现实意义、悲剧 Key words:humanism; Hamlet; characters; art features; Don Quixote; realistic significance; tragedy 前言: 莎士比亚和塞万提斯都是文艺复兴时期伟大的戏剧家,作为戏剧艺术的大师,他们的作品都达到了世界文学的巅峰。《哈姆雷特》是莎士比亚最著名的悲剧之一,代表了莎士比亚的艺术成就,剧中莎士比亚塑造的著名人物哈姆雷特也被列入了世界文学的艺术画廊。塞万提斯是文艺复兴时期西班牙伟大的作家,在他创作的作品中,以《堂吉诃德》最为著名,影响也最大,是文艺

相关文档