
A Concise OpenMP Tutorial (by Blaise Barney, Lawrence Livermore National Laboratory)


Goals of OpenMP:

Standardization

Lean and Mean

Ease of Use

Portability

Shared Memory, Thread Based Parallelism:

OpenMP is based upon the existence of multiple threads in the shared memory programming paradigm. A shared memory process consists of multiple threads.

Explicit Parallelism:

OpenMP is an explicit (not automatic) programming model, offering the programmer full control over parallelization.

Fork - Join Model:

OpenMP uses the fork-join model of parallel execution:

All OpenMP programs begin as a single process: the master thread. The master thread executes sequentially until the first parallel region construct is encountered.

FORK: the master thread then creates a team of parallel threads. The statements in the program that are enclosed by the parallel region construct are then executed in parallel among the various team threads.

JOIN: When the team threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.

Compiler Directive Based:

Most OpenMP parallelism is specified through the use of compiler directives which are embedded in C/C++ or Fortran source code.

Nested Parallelism Support:

The API provides for the placement of parallel constructs inside of other parallel constructs.

Implementations may or may not support this feature.

Dynamic Threads:

The API provides for dynamically altering the number of threads which may be used to execute different parallel regions.

Implementations may or may not support this feature.

I/O:

OpenMP specifies nothing about parallel I/O. This is particularly important if multiple threads attempt to write/read from the same file.

If every thread conducts I/O to a different file, the issues are not as significant.

It is entirely up to the programmer to ensure that I/O is conducted correctly within the context of a multi-threaded program.
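One common approach is the per-thread-file pattern sketched below (not part of the original tutorial; the file-naming scheme and messages are illustrative): each thread writes to a file keyed by its own thread number, so no coordination is needed.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        char name[32];
        FILE *fp;

        /* Build a per-thread file name such as "out_0.txt" */
        snprintf(name, sizeof(name), "out_%d.txt", omp_get_thread_num());
        fp = fopen(name, "w");
        if (fp != NULL) {
            fprintf(fp, "Hello from thread %d\n", omp_get_thread_num());
            fclose(fp);
        }
    }
    return 0;
}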

Memory Model: FLUSH Often?

OpenMP provides a "relaxed-consistency" and "temporary" view of thread memory (in their words). In other words, threads can "cache" their data and are not required to maintain exact consistency with real memory all of the time.

When it is critical that all threads view a shared variable identically, the programmer is responsible for ensuring that the variable is FLUSHed by all threads as needed.

More on this later...
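As a hedged illustration (a minimal producer/consumer sketch in the style of the OpenMP specification examples; the variable names are illustrative), the flush directive forces a thread's temporary view of a variable back into agreement with memory:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int data = 0, flag = 0;

    #pragma omp parallel num_threads(2) shared(data, flag)
    {
        if (omp_get_thread_num() == 0) {
            data = 42;
            #pragma omp flush(data)      /* publish data before the flag */
            flag = 1;
            #pragma omp flush(flag)
        } else {
            int seen = 0;
            while (!seen) {
                #pragma omp flush(flag)  /* re-read flag from memory */
                seen = flag;
            }
            #pragma omp flush(data)      /* data is now guaranteed visible */
            printf("consumer saw data = %d\n", data);
        }
        /* A production version would also need atomicity guarantees on
           flag; this sketch focuses on the visibility role of flush. */
    }
    return 0;
}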

Example OpenMP Code Structure

Fortran - General Code Structure

PROGRAM HELLO

INTEGER VAR1, VAR2, VAR3

Serial code

.

.

.

Beginning of parallel section. Fork a team of threads.

Specify variable scoping

!$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)

Parallel section executed by all threads

.

.

.

All threads join master thread and disband

!$OMP END PARALLEL

Resume serial code

.

.

.

END

C / C++ - General Code Structure

#include <omp.h>

int main () {

int var1, var2, var3;

Serial code

.

.

.

Beginning of parallel section. Fork a team of threads.

Specify variable scoping

#pragma omp parallel private(var1, var2) shared(var3)

{

Parallel section executed by all threads

.

.

.

All threads join master thread and disband

}

Resume serial code

.

.

.

}

OpenMP Directives

Fortran Directives Format

Format: (case insensitive)

sentinel    directive-name    [clause ...]

sentinel: All Fortran OpenMP directives must begin with a sentinel. The accepted sentinels depend upon the type of Fortran source. Possible sentinels are: !$OMP, C$OMP, *$OMP

directive-name: A valid OpenMP directive. Must appear after the sentinel and before any clauses.

[clause ...]: Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.

Example:

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(BETA,PI)

Fixed Form Source:

!$OMP, C$OMP and *$OMP are accepted sentinels and must start in column 1.

All Fortran fixed form rules for line length, white space, continuation and comment columns apply for the entire directive line

Initial directive lines must have a space/zero in column 6.

Continuation lines must have a non-space/zero in column 6.

Free Form Source:

!$OMP is the only accepted sentinel. Can appear in any column, but must be preceded by white space only.

All Fortran free form rules for line length, white space, continuation and comment columns apply for the entire directive line

Initial directive lines must have a space after the sentinel.

Continuation lines must have an ampersand as the last non-blank character in a line. The following line must begin with a sentinel and then the continuation directives.

General Rules:

Comments cannot appear on the same line as a directive

Only one directive-name may be specified per directive

Fortran compilers which are OpenMP enabled generally include a command line option which instructs the compiler to activate and interpret all OpenMP directives.

Several Fortran OpenMP directives come in pairs and have the form shown below. The "end" directive is optional but advised for readability.

!$OMP directive

    [ structured block of code ]

!$OMP end directive

OpenMP Directives

C / C++ Directives Format

Format:

#pragma omp    directive-name    [clause, ...]    newline

#pragma omp: Required for all OpenMP C/C++ directives.

directive-name: A valid OpenMP directive. Must appear after the pragma and before any clauses.

[clause, ...]: Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.

newline: Required. Precedes the structured block which is enclosed by this directive.

Example:

#pragma omp parallel default(shared) private(beta,pi)

General Rules:

Case sensitive

Directives follow conventions of the C/C++ standards for compiler directives

Only one directive-name may be specified per directive

Each directive applies to at most one succeeding statement, which must be a structured block.

Long directive lines can be "continued" on succeeding lines by escaping the newline character with a backslash ("\") at the end of a directive line.
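For example (a minimal sketch; the particular clauses chosen here are arbitrary), a long directive can be continued like this:

#include <omp.h>
#include <stdio.h>

#define N 8

int main(void)
{
    int i;
    double a[N];
    for (i = 0; i < N; i++)
        a[i] = i;

    /* One long directive "continued" onto a second line: the backslash
       escapes the newline, so the compiler sees a single directive. */
    #pragma omp parallel for schedule(static) \
                default(none) shared(a) private(i)
    for (i = 0; i < N; i++)
        a[i] = a[i] * 2.0;

    printf("a[N-1] = %g\n", a[N - 1]);
    return 0;
}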

OpenMP Directives

Directive Scoping

Do we do this now...or do it later? Oh well, let's get it over with early...

Static (Lexical) Extent:

The code textually enclosed between the beginning and the end of a structured block following a directive.

The static extent of a directive does not span multiple routines or code files.

Orphaned Directive:

An OpenMP directive that appears independently from another enclosing directive is said to be an orphaned directive. It exists outside of another directive's static (lexical) extent. Will span routines and possibly code files.

Dynamic Extent:

The dynamic extent of a directive includes both its static (lexical) extent and the extents of its orphaned directives.

Example:

      PROGRAM TEST
      ...
!$OMP PARALLEL
      ...
!$OMP DO
      DO I=...
        ...
        CALL SUB1
        ...
      ENDDO
      ...
      CALL SUB2
      ...
!$OMP END PARALLEL
      END

      SUBROUTINE SUB1
      ...
!$OMP CRITICAL
      ...
!$OMP END CRITICAL
      END

      SUBROUTINE SUB2
      ...
!$OMP SECTIONS
      ...
!$OMP END SECTIONS
      ...
      END

STATIC EXTENT: The DO directive occurs within an enclosing parallel region.

ORPHANED DIRECTIVES: The CRITICAL and SECTIONS directives occur outside an enclosing parallel region.

DYNAMIC EXTENT: The CRITICAL and SECTIONS directives occur within the dynamic extent of the DO and PARALLEL directives.
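A C analog of the orphaning idea (a sketch, not from the original tutorial; the function and variable names are illustrative): the critical construct below is orphaned because it sits outside the lexical extent of any parallel construct, yet it binds to whatever parallel region is active when the function is called.

#include <omp.h>
#include <stdio.h>

static int total = 0;

void do_update(int amount)
{
    /* Orphaned directive: executes inside the dynamic extent of the
       parallel region in main(), or serially if called outside one. */
    #pragma omp critical
    total += amount;
}

int main(void)
{
    #pragma omp parallel
    do_update(1);            /* called from the region's dynamic extent */

    printf("total = %d\n", total);
    return 0;
}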

Why Is This Important?

OpenMP Directives

PARALLEL Region Construct

Notes:

When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the team. The master is a member of that team and has thread number 0 within that team.

Starting from the beginning of this parallel region, the code is duplicated and all threads will execute that code.

There is an implied barrier at the end of a parallel section. Only the master thread continues execution past this point.

If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined.

How Many Threads?

The number of threads in a parallel region is determined by the following factors, in order of precedence:

1. Evaluation of the IF clause
2. Setting of the NUM_THREADS clause
3. Use of the omp_set_num_threads() library function
4. Setting of the OMP_NUM_THREADS environment variable
5. Implementation default - usually the number of CPUs on a node, though it could be dynamic (see next bullet).

Threads are numbered from 0 (master thread) to N-1.
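A short C sketch of these precedence rules (the thread counts are arbitrary, and the runtime may grant fewer threads than requested):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int n = 1000;

    omp_set_num_threads(4);                 /* 3rd in precedence */

    /* The num_threads clause (2nd in precedence) overrides the call
       above; if (n > 100) is true here, so the region really forks. */
    #pragma omp parallel if (n > 100) num_threads(2)
    {
        #pragma omp single
        printf("team size = %d\n", omp_get_num_threads());
        /* Typically prints 2, assuming the implementation grants it. */
    }
    return 0;
}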

Dynamic Threads:

Use the omp_get_dynamic() library function to determine if dynamic threads are enabled.

If supported, the two methods available for enabling dynamic threads are:

1. The omp_set_dynamic() library routine
2. Setting of the OMP_DYNAMIC environment variable to TRUE
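A minimal sketch of the query/enable calls named above (whether the runtime actually adjusts team sizes is implementation dependent):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    printf("dynamic adjustment is %s\n",
           omp_get_dynamic() ? "enabled" : "disabled");

    omp_set_dynamic(1);  /* same effect as OMP_DYNAMIC=TRUE, if supported */

    /* With dynamic threads on, the runtime may give this region
       fewer than the 8 threads requested. */
    #pragma omp parallel num_threads(8)
    {
        #pragma omp single
        printf("team size chosen by runtime = %d\n", omp_get_num_threads());
    }
    return 0;
}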

Nested Parallel Regions:

Use the omp_get_nested() library function to determine if nested parallel regions are enabled.

The two methods available for enabling nested parallel regions (if supported) are:

1. The omp_set_nested() library routine
2. Setting of the OMP_NESTED environment variable to TRUE

If not supported, a parallel region nested within another parallel region results in the creation of a new team, consisting of one thread, by default.
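A hedged sketch of enabling and observing nesting (if nesting is unsupported, each inner region will simply be a team of one thread):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_nested(1);   /* same effect as OMP_NESTED=TRUE, if supported */

    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();

        /* Inner region: a new team per outer thread if nesting is
           supported, otherwise a team of one. */
        #pragma omp parallel num_threads(2)
        printf("outer thread %d / inner thread %d\n",
               outer, omp_get_thread_num());
    }
    return 0;
}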

Clauses:

Fortran - Parallel Region Example

(The Fortran listing did not survive in this copy; it is reconstructed below to mirror the C / C++ version that follows.)

      PROGRAM HELLO

      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM

! Fork a team of threads with each thread having a private TID variable
!$OMP PARALLEL PRIVATE(TID)

! Obtain and print thread id
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread = ', TID

! Only master thread does this
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
      END IF

! All threads join master thread and disband
!$OMP END PARALLEL

      END

C / C++ - Parallel Region Example

#include <omp.h>
#include <stdio.h>

int main () {

int nthreads, tid;

/* Fork a team of threads with each thread having a private tid variable */
#pragma omp parallel private(tid)

{

/* Obtain and print thread id */

tid = omp_get_thread_num();

printf("Hello World from thread = %d\n", tid);

/* Only master thread does this */

if (tid == 0)

{

nthreads = omp_get_num_threads();

printf("Number of threads = %d\n", nthreads);

}

} /* All threads join master thread and terminate */

}

OpenMP Directives

Work-Sharing Constructs

A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it.

Work-sharing constructs do not launch new threads

There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of a work-sharing construct.

Types of Work-Sharing Constructs:

NOTE: The Fortran workshare construct is not shown here, but is discussed later.

DO / for - shares iterations of a loop across the team. Represents a type of "data parallelism".

SECTIONS - breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".

SINGLE - serializes a section of code.

Restrictions:

A work-sharing construct must be enclosed dynamically within a parallel region in order for the directive to execute in parallel.

Work-sharing constructs must be encountered by all members of a team or none at all

Successive work-sharing constructs must be encountered in the same order by all members of a team

OpenMP Directives

Work-Sharing Constructs - DO / for Directive

Purpose:

The DO / for directive specifies that the iterations of the loop immediately following it must be executed in parallel by the team. This assumes a parallel region has already been initiated; otherwise it executes in serial on a single processor.

Format:

Fortran:

!$OMP DO [clause ...]
         SCHEDULE (type [,chunk])
         ORDERED
         PRIVATE (list)
         FIRSTPRIVATE (list)
         LASTPRIVATE (list)
         SHARED (list)
         REDUCTION (operator | intrinsic : list)
         COLLAPSE (n)

      do_loop

!$OMP END DO [ NOWAIT ]

C/C++:

#pragma omp for [clause ...] newline
         schedule (type [,chunk])
         ordered
         private (list)
         firstprivate (list)
         lastprivate (list)
         shared (list)
         reduction (operator: list)
         collapse (n)
         nowait

      for_loop

Clauses:

SCHEDULE: Describes how iterations of the loop are divided among the threads in the team. The default schedule is implementation dependent.

STATIC: Loop iterations are divided into pieces of size chunk and then statically assigned to threads. If chunk is not specified, the iterations are evenly (if possible) divided contiguously among the threads.

DYNAMIC: Loop iterations are divided into pieces of size chunk, and dynamically scheduled among the threads; when a thread finishes one chunk, it is dynamically assigned another. The default chunk size is 1.

GUIDED: For a chunk size of 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads, decreasing to 1. For a chunk size with value k (greater than 1), the size of each chunk is determined in the same way with the restriction that the chunks do not contain fewer than k iterations (except for the last chunk to be assigned, which may have fewer than k iterations). The default chunk size is 1.

RUNTIME: The scheduling decision is deferred until runtime by the environment variable OMP_SCHEDULE. It is illegal to specify a chunk size for this clause.

AUTO: The scheduling decision is delegated to the compiler and/or runtime system.
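A small sketch that makes the schedule visible (output order and the thread-to-iteration mapping will vary from run to run):

#include <omp.h>
#include <stdio.h>

#define N 16

int main(void)
{
    int i;

    /* Each thread grabs 4 iterations at a time; compare with
       schedule(static), which hands out fixed contiguous blocks. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (i = 0; i < N; i++)
        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());

    return 0;
}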

Restrictions:

Fortran - DO Directive Example

(The program header and declarations, lost in this copy, are restored below to match the C version that follows.)

      PROGRAM VEC_ADD_DO

      INTEGER N, CHUNKSIZE, CHUNK, I
      PARAMETER (N=1000)
      PARAMETER (CHUNKSIZE=100)
      REAL A(N), B(N), C(N)

! Some initializations
      DO I = 1, N
        A(I) = I * 1.0
        B(I) = A(I)
      ENDDO
      CHUNK = CHUNKSIZE

!$OMP PARALLEL SHARED(A,B,C,CHUNK) PRIVATE(I)

!$OMP DO SCHEDULE(DYNAMIC,CHUNK)
      DO I = 1, N
        C(I) = A(I) + B(I)
      ENDDO
!$OMP END DO NOWAIT

!$OMP END PARALLEL

      END

C / C++ - for Directive Example

#include <omp.h>
#include <stdio.h>

#define CHUNKSIZE 100

#define N 1000

int main ()

{

int i, chunk;

float a[N], b[N], c[N];

/* Some initializations */

for (i=0; i < N; i++)

a[i] = b[i] = i * 1.0;

chunk = CHUNKSIZE;

#pragma omp parallel shared(a,b,c,chunk) private(i)
  {

#pragma omp for schedule(dynamic,chunk) nowait
  for (i=0; i < N; i++)

c[i] = a[i] + b[i];

} /* end of parallel section */

}

OpenMP Directives

Work-Sharing Constructs - SECTIONS Directive

Purpose:

The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of code are to be divided among the threads in the team.

Independent SECTION directives are nested within a SECTIONS directive. Each SECTION is executed once by a thread in the team. Different sections may be executed by different threads. It is possible for a thread to execute more than one section if it is quick enough and the implementation permits this.

Format:

Fortran:

!$OMP SECTIONS [clause ...]
         PRIVATE (list)
         FIRSTPRIVATE (list)
         LASTPRIVATE (list)
         REDUCTION (operator | intrinsic : list)

!$OMP SECTION

      block

!$OMP SECTION

      block

!$OMP END SECTIONS [ NOWAIT ]

C/C++:

#pragma omp sections [clause ...] newline
         private (list)
         firstprivate (list)
         lastprivate (list)
         reduction (operator: list)
         nowait
  {

  #pragma omp section newline

      structured_block

  #pragma omp section newline

      structured_block

  }

Clauses:
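The worked SECTIONS example is missing from this copy of the tutorial; the sketch below, modeled on the tutorial's other array examples, gives each of two independent computations to its own section:

#include <omp.h>
#include <stdio.h>

#define N 1000

int main(void)
{
    int i;
    float a[N], b[N], c[N], d[N];

    /* Some initializations */
    for (i = 0; i < N; i++) {
        a[i] = i * 1.5;
        b[i] = i + 22.35;
    }

    #pragma omp parallel shared(a, b, c, d) private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section       /* one thread computes the sums */
            for (i = 0; i < N; i++)
                c[i] = a[i] + b[i];

            #pragma omp section       /* another computes the products */
            for (i = 0; i < N; i++)
                d[i] = a[i] * b[i];
        }
    }   /* implied barrier at the end of the parallel region */

    printf("c[0]=%g d[0]=%g\n", c[0], d[0]);
    return 0;
}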
