A “Hands-on” Introduction To OpenMP


A "Hands-on" Introduction to OpenMP*

Tim Mattson, Principal Engineer, Intel Corporation
Larry Meadows, Principal Engineer, Intel Corporation (eadows@intel.com)

* The name "OpenMP" is the property of the OpenMP Architecture Review Board.

Preliminaries: Part 1

Disclosures:
- The views expressed in this tutorial are those of the people delivering the tutorial.
- We are not speaking for our employers.
- We are not speaking for the OpenMP ARB.

This is a new tutorial for us: help us improve it by telling us how you would make this tutorial better.

Preliminaries: Part 2

Our plan for the day: active learning! We will mix short lectures with short exercises. You will use your laptop for the exercises; that way you'll have an OpenMP environment to take home so you can keep learning on your own.

Please follow these simple rules:
- Do the exercises we assign and then change things around and experiment. Embrace active learning!
- Don't cheat: do not look at the solutions before you complete an exercise, even if you get really frustrated.

Our Plan for the Day

Topic | Exercise | Concepts
I. OMP Intro | Install sw, hello world | Parallel regions
II. Creating threads | Pi_spmd_simple | Parallel, default data environment, runtime library calls
III. Synchronization | Pi_spmd_final | False sharing, critical, atomic
IV. Parallel loops | Pi_loop | For, reduction
V. Odds and ends | No exercise | Single, master, runtime libraries, environment variables, synchronization, etc.
VI. Data environment | Pi_mc | Data environment details, modular software, threadprivate
VII. Worksharing and schedule | Linked list, matmul | For, schedules, sections
VIII. Memory model | Producer consumer | Point-to-point synch with flush
IX. OpenMP 3 and tasks | Linked list | Tasks and other OpenMP 3 features

Outline
- Introduction to OpenMP
- Creating Threads
- Synchronization
- Parallel Loops
- Synchronize single, masters and stuff
- Data environment
- Schedule your for and sections
- Memory model
- OpenMP 3.0 and Tasks

OpenMP* Overview

OpenMP: An API for Writing Multithreaded Applications
- A set of compiler directives and library routines for parallel application programmers
- Greatly simplifies writing multi-threaded (MT) programs in Fortran, C and C++
- Standardizes 20 years of SMP practice

(The slide background shows a collage of example directives and calls: C$OMP FLUSH, C$OMP THREADPRIVATE(/ABC/), #pragma omp critical, CALL OMP_SET_NUM_THREADS(10), C$OMP parallel do shared(a, b, c), call omp_test_lock(jlok), call OMP_INIT_LOCK(ilok), C$OMP ATOMIC, C$OMP MASTER, C$OMP SINGLE PRIVATE(X), setenv OMP_SCHEDULE "dynamic", C$OMP ORDERED, C$OMP PARALLEL REDUCTION (+: A, B), C$OMP SECTIONS, #pragma omp parallel for lastprivate(A, B), !$OMP BARRIER, C$OMP PARALLEL DO ORDERED PRIVATE (A, B, C), C$OMP PARALLEL COPYIN(/blk/), C$OMP DO lastprivate(XX), Nthrds = OMP_GET_NUM_PROCS(), omp_set_lock(lck).)

OpenMP Basic Defs: Solution Stack

(The slide shows a layer diagram.)
- User layer: the end user runs the application.
- Prog. layer: the application is built from OpenMP directives, the OpenMP library, and environment variables.
- System layer: the OpenMP runtime library sits on OS/system support for shared memory and threading.
- HW: processors Proc1, Proc2, Proc3, ... ProcN over a shared address space.

OpenMP Core Syntax

Most of the constructs in OpenMP are compiler directives:

    #pragma omp construct [clause [clause] ...]

Example:

    #pragma omp parallel num_threads(4)

Function prototypes and types are in the file:

    #include <omp.h>

Most OpenMP* constructs apply to a "structured block".
- Structured block: a block of one or more statements with one point of entry at the top and one point of exit at the bottom.
- It's OK to have an exit() within the structured block.

Exercise 1, Part A: Hello World
Verify that your environment works.

Write a program that prints "hello world".

    void main()
    {
        int ID = 0;
        printf(" hello(%d) ", ID);
        printf(" world(%d) \n", ID);
    }

Exercise 1, Part B: Hello World
Verify that your OpenMP environment works.

Write a multithreaded program that prints "hello world".

    #include "omp.h"
    void main()
    {
        #pragma omp parallel
        {
            int ID = 0;
            printf(" hello(%d) ", ID);
            printf(" world(%d) \n", ID);
        }
    }

Switches for compiling and linking: gcc -fopenmp, pgi -mp, intel (Windows) /Qopenmp.

Exercise 1: Solution
A multi-threaded "Hello world" program.

Write a multithreaded program where each thread prints "hello world".

    #include "omp.h"
    void main()
    {
        #pragma omp parallel
        {
            int ID = omp_get_thread_num();
            printf(" hello(%d) ", ID);
            printf(" world(%d) \n", ID);
        }
    }

- #pragma omp parallel opens a parallel region with the default number of threads.
- omp_get_thread_num() is a runtime library function that returns a thread ID.
- The closing brace ends the parallel region.

Sample output (the threads interleave): hello(1) hello(0) world(1) world(0) ...

OpenMP Overview: How do threads interact?

- OpenMP is a multi-threading, shared address model. Threads communicate by sharing variables.
- Unintended sharing of data causes race conditions. Race condition: when the program's outcome changes as the threads are scheduled differently.
- To control race conditions, use synchronization to protect data conflicts.
- Synchronization is expensive, so change how data is accessed to minimize the need for synchronization.


OpenMP Programming Model: Fork-Join Parallelism

- The master thread spawns a team of threads as needed.
- Parallelism is added incrementally until performance goals are met: i.e., the sequential program evolves into a parallel program.

(The slide shows the master thread running the sequential parts and forking teams of threads for the parallel regions, including a nested parallel region.)

Thread Creation: Parallel Regions

You create threads in OpenMP* with the parallel construct. For example, to create a 4-thread parallel region:

    double A[1000];
    omp_set_num_threads(4);              /* runtime function to request a certain number of threads */
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();   /* runtime function returning a thread ID */
        pooh(ID, A);
    }

- Each thread executes a copy of the code within the structured block.
- Each thread calls pooh(ID, A) for ID = 0 to 3.

Thread Creation: Parallel Regions

The same 4-thread parallel region can be requested with a clause instead of the runtime call:

    double A[1000];
    #pragma omp parallel num_threads(4)  /* clause to request a certain number of threads */
    {
        int ID = omp_get_thread_num();   /* runtime function returning a thread ID */
        pooh(ID, A);
    }

- Each thread executes a copy of the code within the structured block.
- Each thread calls pooh(ID, A) for ID = 0 to 3.

Thread Creation: Parallel Regions Example

Each thread executes the same code redundantly:

    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }
    printf("all done\n");

A single copy of A is shared between all threads. The four threads call pooh(0,A), pooh(1,A), pooh(2,A) and pooh(3,A) concurrently, then all threads wait here for all threads to finish before proceeding (i.e., a barrier), after which the master thread prints "all done".

Exercises 2 to 4: Numerical Integration

Mathematically, we know that:

    ∫ from 0 to 1 of 4.0/(1+x²) dx = π

We can approximate the integral as a sum of rectangles:

    Σ (i = 0 to N) F(x_i) Δx ≈ π

where F(x) = 4.0/(1+x²) and each rectangle has width Δx and height F(x_i) at the middle of interval i.

Exercises 2 to 4: Serial PI Program

    static long num_steps = 100000;
    double step;
    void main ()
    {
        int i; double x, pi, sum = 0.0;
        step = 1.0/(double) num_steps;
        for (i = 0; i < num_steps; i++){
            x = (i+0.5)*step;
            sum = sum + 4.0/(1.0+x*x);
        }
        pi = step * sum;
    }

Exercise 2

- Create a parallel version of the pi program using a parallel construct.
- Pay close attention to shared versus private variables.
- In addition to a parallel construct, you will need these runtime library routines:
  - int omp_get_num_threads();   number of threads in the team
  - int omp_get_thread_num();    thread ID or rank
  - double omp_get_wtime();      time in seconds since a fixed point in the past


Synchronization

Synchronization is used to impose order constraints and to protect access to shared data.

High level synchronization:
- critical
- atomic
- barrier
- ordered

Low level synchronization (discussed later):
- flush
- locks (both simple and nested)

Synchronization: critical

Mutual exclusion: only one thread at a time can enter a critical region.

    float res;
    #pragma omp parallel
    {
        float B; int i, id, nthrds;
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
        for (i = id; i < niters; i += nthrds) {
            B = big_job(i);
            #pragma omp critical
            consume(B, res);
        }
    }

Threads wait their turn: only one at a time calls consume().

Synchronization: atomic

Atomic provides mutual exclusion but only applies to the update of a memory location (the update of X in the following example).

    #pragma omp parallel
    {
        double tmp, B;
        B = DOIT();
        tmp = big_ugly(B);
        #pragma omp atomic
        X += tmp;
    }

Atomic only protects the read/update of X.

Exercise 3

- In exercise 2, you probably used an array to create space for each thread to store its partial sum.
- If array elements happen to share a cache line, this leads to false sharing: non-shared data in the same cache line means each update invalidates the cache line, in essence "sloshing independent data" back and forth between threads.
- Modify your "pi program" from exercise 2 to avoid false sharing due to the sum array.


SPMD vs. worksharing

- A parallel construct by itself creates an SPMD or "Single Program Multiple Data" program, i.e., each thread redundantly executes the same code.
- How do you split up pathways through the code between threads within a team? This is called worksharing:
  - Loop construct
  - Sections/section constructs (discussed later)
  - Single construct (discussed later)
  - Task construct: coming in OpenMP 3.0

The loop worksharing constructs

The loop worksharing construct splits up loop iterations among the threads in a team:

    #pragma omp parallel
    {
        #pragma omp for
        for (I = 0; I < N; I++) {
            NEAT_STUFF(I);
        }
    }

The loop construct name is "for" in C/C++ and "do" in Fortran. The variable I is made "private" to each thread by default. You could do this explicitly with a "private(I)" clause.

Loop worksharing constructs: a motivating example

Sequential code:

    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

OpenMP parallel region (SPMD style):

    #pragma omp parallel
    {
        int id, i, Nthrds, istart, iend;
        id = omp_get_thread_num();
        Nthrds = omp_get_num_threads();
        istart = id * N / Nthrds;
        iend = (id+1) * N / Nthrds;
        if (id == Nthrds-1) iend = N;
        for (i = istart; i < iend; i++) { a[i] = a[i] + b[i]; }
    }

OpenMP parallel region and a worksharing for construct:

    #pragma omp parallel
    #pragma omp for
    for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

Combined parallel/worksharing construct

OpenMP shortcut: put the "parallel" and the worksharing directive on the same line. These two fragments are equivalent:

    double res[MAX]; int i;
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < MAX; i++) {
            res[i] = huge();
        }
    }

    double res[MAX]; int i;
    #pragma omp parallel for
    for (i = 0; i < MAX; i++) {
        res[i] = huge();
    }

Working with loops

Basic approach:
- Find compute intensive loops.
- Make the loop iterations independent, so they can safely execute in any order without loop-carried dependencies.
- Place the appropriate OpenMP directive and test.

    int i, j, A[MAX];
    j = 5;
    for (i = 0; i < MAX; i++) {
        j += 2;
        A[i] = big(j);
    }

Remove the loop-carried dependence (note: the loop index i is private by default):

    int i, A[MAX];
    #pragma omp parallel for
    for (i = 0; i < MAX; i++) {
        int j = 5 + 2*(i+1);   /* j is incremented before use in the serial loop */
        A[i] = big(j);
    }

Reduction

How do we handle this case?

    double ave = 0.0, A[MAX]; int i;
    for (i = 0; i < MAX; i++) {
        ave += A[i];
    }
    ave = ave/MAX;

- We are combining values into a single accumulation variable (ave); there is a true dependence between loop iterations that can't be trivially removed.
- This is a very common situation; it is called a "reduction".
- Support for reduction operations is included in most parallel programming environments.

Reduction

OpenMP reduction clause:

    reduction (op : list)

Inside a parallel or a work-sharing construct:
- A local copy of each list variable is made and initialized depending on the "op" (e.g. 0 for "+").
- The compiler finds standard reduction expressions containing "op" and uses them to update the local copy.
- Local copies are reduced into a single value and combined with the original global value.

The variables in "list" must be shared in the enclosing parallel region.

    double ave = 0.0, A[MAX]; int i;
    #pragma omp parallel for reduction (+:ave)
    for (i = 0; i < MAX; i++) {
        ave += A[i];
    }
    ave = ave/MAX;

OpenMP: Reduction operands/initial-values

Many different associative operands can be used with reduction; initial values are the ones that make sense mathematically.

C/C++ and Fortran:
    Operator | Initial value
    +        | 0
    *        | 1
    -        | 0

C/C++ only:
    Operator | Initial value
    &        | ~0
    |        | 0
    ^        | 0
    &&       | 1
    ||       | 0

Fortran only:
    Operator | Initial value
    .AND.    | .true.
    .OR.     | .false.
    .NEQV.   | .false.
    .IEOR.   | 0
    .IOR.    | 0
    .IAND.   | all bits on
    .EQV.    | .true.
    MIN*     | largest pos. number
    MAX*     | most neg. number

Exercise 4

- Go back to the serial pi program and parallelize it with a loop construct.
- Your goal is to minimize the number of changes made to the serial program.


Synchronization: barrier

Barrier: each thread waits until all threads arrive.

    #pragma omp parallel shared (A, B, C) private(id)
    {
        id = omp_get_thread_num();
        A[id] = big_calc1(id);
    #pragma omp barrier                       /* explicit barrier */
    #pragma omp for
        for (i = 0; i < N; i++) { C[i] = big_calc3(i, A); }
                                              /* implicit barrier at the end of the for construct */
    #pragma omp for nowait
        for (i = 0; i < N; i++) { B[i] = big_calc2(C, i); }
                                              /* nowait: no implicit barrier */
        A[id] = big_calc4(id);
    }                                         /* implicit barrier at the end of the parallel region */

Master construct

- The master construct denotes a structured block that is only executed by the master thread.
- The other threads just skip it (no synchronization is implied).

    #pragma omp parallel
    {
        do_many_things();
    #pragma omp master
        { exchange_boundaries(); }
    #pragma omp barrier
        do_many_other_things();
    }

Single worksharing construct

- The single construct denotes a block of code that is executed by only one thread (not necessarily the master thread).
- A barrier is implied at the end of the single block (you can remove the barrier with a nowait clause).

    #pragma omp parallel
    {
        do_many_things();
    #pragma omp single
        { exchange_boundaries(); }
        do_many_other_things();
    }

Synchronization: ordered

The ordered region executes in sequential order.

    #pragma omp parallel private (tmp)
    #pragma omp for ordered reduction(+:res)
    for (I = 0; I < N; I++) {
        tmp = NEAT_STUFF(I);
    #pragma omp ordered
        res += consum(tmp);
    }

Synchronization: lock routines

Simple lock routines: a simple lock is available if it is unset.
- omp_init_lock(), omp_set_lock(), omp_unset_lock(), omp_test_lock(), omp_destroy_lock()

Nested locks: a nested lock is available if it is unset, or if it is set but owned by the thread executing the nested lock function.
- omp_init_nest_lock(), omp_set_nest_lock(), omp_unset_nest_lock(), omp_test_nest_lock(), omp_destroy_nest_lock()

A lock implies a memory fence (a "flush") of all thread-visible variables.

Note: a thread always accesses the most recent copy of the lock, so you don't need to use a flush on the lock variable.

Synchronization: simple locks

Protect resources with locks.

    omp_lock_t lck;
    omp_init_lock(&lck);
    #pragma omp parallel private (tmp, id)
    {
        id = omp_get_thread_num();
        tmp = do_lots_of_work(id);
        omp_set_lock(&lck);          /* wait here for your turn */
        printf("%d %d", id, tmp);
        omp_unset_lock(&lck);        /* release the lock so the next thread gets a turn */
    }
    omp_destroy_lock(&lck);          /* free up storage when done */

Runtime library routines

Runtime environment routines:
- Modify/check the number of threads: omp_set_num_threads(), omp_get_num_threads(), omp_get_thread_num(), omp_get_max_threads()
- Are we in an active parallel region? omp_in_parallel()
- Do you want the system to dynamically vary the number of threads from one parallel construct to another? omp_set_dynamic(), omp_get_dynamic()
- How many processors in the system? omp_get_num_procs()

Plus a few less commonly used routines.

Runtime library routines

To use a known, fixed number of threads in a program: (1) tell the system that you don't want dynamic adjustment of the number of threads, (2) set the number of threads, then (3) save the number you got.

    #include <omp.h>
    void main()
    {
        int num_threads;
        omp_set_dynamic( 0 );                         /* disable dynamic adjustment of the number of threads */
        omp_set_num_threads( omp_get_num_procs() );   /* request as many threads as you have processors */
    #pragma omp parallel
        {
            int id = omp_get_thread_num();
    #pragma omp single                                /* protect this op since memory stores are not atomic */
            num_threads = omp_get_num_threads();
            do_lots_of_stuff(id);
        }
    }

Even in this case, the system may give you fewer threads than requested. If the precise number of threads matters, test for it and set up your program accordingly.

Environment variables

- Set the default number of threads to use: OMP_NUM_THREADS int_literal
- Control how "omp for schedule(RUNTIME)" loop iterations are scheduled: OMP_SCHEDULE "schedule[, chunk_size]"

Plus several less commonly used environment variables.


Data environment: default storage attributes

Shared memory programming model: most variables are shared by default.

Global variables are SHARED among threads:
- Fortran: COMMON blocks, SAVE variables, MODULE variables
- C: file scope variables, static
- Both: dynamically allocated memory (ALLOCATE, malloc, new)

But not everything is shared:
- Stack variables in subprograms (Fortran) or functions (C) called from parallel regions are PRIVATE.
- Automatic variables within a statement block are PRIVATE.

Data sharing: examples

    extern double A[10];
    void work(int *index) {
        double temp[10];
        static int count;
        ...
    }

    double A[10];
    int main() {
        int index[10];
    #pragma omp parallel
        work(index);
        printf("%d\n", index[0]);
    }

A, index and count are shared by all threads; temp is local to each thread.

* Third party trademarks and names are the property of their respective owner.

Data sharing: changing storage attributes

One can selectively change storage attributes for constructs using the following clauses*:
- SHARED
- PRIVATE
- FIRSTPRIVATE

The final value of a private inside a parallel loop can be transmitted to the shared variable outside the loop with:
- LASTPRIVATE

The default attributes can be overridden with:
- DEFAULT (PRIVATE | SHARED | NONE)    (DEFAULT(PRIVATE) is Fortran only)

* All the clauses on this page apply to the OpenMP construct, NOT to the entire region. All data clauses apply to parallel constructs and worksharing constructs except "shared", which only applies to parallel constructs.

Data sharing: private clause

private(var) creates a new local copy of var for each thread.
- The value is uninitialized.
- In OpenMP 2.5 the value of the shared variable is undefined after the region.

    void wrong() {
        int tmp = 0;
    #pragma omp for private(tmp)
        for (int j = 0; j < 1000; ++j)
            tmp += j;                  /* tmp was not initialized */
        printf("%d\n", tmp);           /* tmp: 0 in 3.0, unspecified in 2.5 */
    }

Data sharing: private clause. When is the original variable valid?

- The original variable's value is unspecified in OpenMP 2.5.
- In OpenMP 3.0, if it is referenced outside of the construct, implementations may reference the original variable or a copy: a dangerous programming practice!

    int tmp;
    void danger() {
        tmp = 0;
    #pragma omp parallel private(tmp)
        work();
        printf("%d\n", tmp);    /* tmp has an unspecified value here */
    }

    extern int tmp;
    void work() {
        tmp = 5;                /* unspecified which copy of tmp this updates */
    }

Data sharing: firstprivate clause

Firstprivate is a special case of private: it initializes each private copy with the corresponding value from the master thread.

    void useless() {
        int tmp = 0;
    #pragma omp for firstprivate(tmp)
        for (int j = 0; j < 1000; ++j)
            tmp += j;                  /* each thread gets its own tmp with an initial value of 0 */
        printf("%d\n", tmp);           /* tmp: 0 in 3.0, unspecified in 2.5 */
    }

Data sharing: lastprivate clause

Lastprivate passes the value of a private from the last iteration to a global variable.

    void closer() {
        int tmp = 0;
    #pragma omp parallel for firstprivate(tmp) \
        lastprivate(tmp)
        for (int j = 0; j < 1000; ++j)
            tmp += j;                  /* each thread gets its own tmp with an initial value of 0 */
        printf("%d\n", tmp);           /* tmp is defined as its value at the "last sequential" iteration (i.e., for j = 999) */
    }

Data sharing: a data environment test

Consider this example of PRIVATE and FIRSTPRIVATE, with variables A, B, and C all equal to 1:

    #pragma omp parallel private(B) firstprivate(C)

Are A, B, C local to each thread or shared inside the parallel region? What are their initial values inside, and their values after, the parallel region?

Inside this parallel region:
- "A" is shared by all threads; it equals 1.
- "B" and "C" are local to each thread.
  - B's initial value is undefined.
  - C's initial value equals 1.

Outside this parallel region:
- The values of "B" and "C" are unspecified in OpenMP 2.5, and in OpenMP 3.0 if referenced in the region but outside the construct.

Data sharing: default clause

- Note that the default storage attribute is DEFAULT(SHARED), so there is no need to use it. Exception: #pragma omp task.
- To change the default: DEFAULT(PRIVATE) makes each variable in the construct private as if specified in a private clause; it mostly saves typing.
- DEFAULT(NONE): no default for variables in static extent. You must list the storage attribute for each variable in static extent. Good programming practice!

Only the Fortran API supports default(private). C/C++ only has default(shared) or default(none).

Data sharing: default clause example

These two code fragments are equivalent:

    itotal = 1000
    C$OMP PARALLEL PRIVATE(np, each)
          np = omp_get_num_threads()
          each = itotal/np
          ...
    C$OMP END PARALLEL

    itotal = 1000
    C$OMP PARALLEL DEFAULT(PRIVATE) SHARED(itotal)
          np = omp_get_num_threads()
          each = itotal/np
          ...
    C$OMP END PARALLEL

Data sharing: tasks (OpenMP 3.0)

- The default for tasks is usually firstprivate, because the task may not be executed until later (and variables may have gone out of scope).
- Variables that are shared in all constructs starting from the innermost enclosing parallel construct are shared, because the barrier guarantees task completion.

    #pragma omp parallel shared(A) private(B)
    {
        ...
    #pragma omp task
        {
            int C;
            compute(A, B, C);   /* A is shared, B is firstprivate, C is private */
        }
    }

Data sharing: threadprivate

Makes global data private to a thread:
- Fortran: COMMON blocks
- C: file scope and static variables, static class members

Different from making them PRIVATE:
- With PRIVATE, global variables are masked.
- THREADPRIVATE preserves global scope within each thread.

Threadprivate variables can be initialized using COPYIN or at time of definition (using language-defined initialization capabilities).

A threadprivate example (C)

Use threadprivate to create a counter for each thread.

    int counter = 0;
    #pragma omp threadprivate(counter)

    int increment_counter()
    {
        counter++;
        return (counter);
    }

Data copying: copyin

You initialize threadprivate data using a copyin clause.

          parameter (N=1000)
          common/buf/A(N)
    !$OMP THREADPRIVATE(/buf/)

          ! Initialize the A array
          call init_data(N,A)

    !$OMP PARALLEL COPYIN(A)

          ! Now each thread sees the threadprivate array A initialized
          ! to the global value set in the subroutine init_data()

    !$OMP END PARALLEL

          end

Data copying: copyprivate

Used with a single region to broadcast values of privates from one member of a team to the rest of the team.

    #include <omp.h>
    void input_parameters(int, int);   /* fetch values of input parameters */
    void do_work(int, int);

    void main()
    {
        int Nsize, choice;
    #pragma omp parallel private (Nsize, choice)
        {
    #pragma omp single copyprivate (Nsize, choice)
            input_parameters(Nsize, choice);
            do_work(Nsize, choice);
        }
    }

Exercise 5: Monte Carlo calculations
Using random numbers to solve tough problems

- Sample a problem domain to estimate areas, compute probabilities, find optimal values, etc.
- Example: computing π with a digital dart board.

Throw darts at a circle of radius r inscribed in a square with sides of length 2*r. The chance of a dart falling in the circle is proportional to the ratio of areas:

    A_c = r² * π
    A_s = (2*r) * (2*r) = 4 * r²
    P = A_c/A_s = π/4

Compute π by randomly choosing points and counting the fraction that falls in the circle:

    N = 10      π ≈ 2.8
    N = 100     π ≈ 3.16
    N = 1000    π ≈ 3.148

Exercise 5

We provide three files for this exercise:
- pi_mc.c: the Monte Carlo method pi program
- random.c: a simple random number generator
- random.h: include file for the random number generator

Create a parallel version of this program without changing the interfaces to functions in random.c. This is an exercise in modular software: why should a user of your parallel random number generator have to know any details of the generator or make any changes to how the generator is called?

Extra credit:
- Make the random number generator threadsafe.
- Make your random number generator numerically correct (non-overlapping sequences of pseudo-random numbers).


Sections worksharing construct

The sections worksharing construct gives a different structured block to each thread.

    #pragma omp parallel
    {
    #pragma omp sections
        {
    #pragma omp section
            X_calculation();
    #pragma omp section
            y_calculation();
    #pragma omp section
            z_calculation();
        }
    }

By default, there is a barrier at the end of the "omp sections". Use the "nowait" clause to turn off the barrier.

Loop worksharing constructs: the schedule clause

The schedule clause affects how loop iterations are mapped onto threads.
- schedule(static [,chunk]): deal out blocks of iterations of size "chunk" to each thread.
- schedule(dynamic [,chunk]): each thread grabs "chunk" iterations off a queue until all iterations have been handled.
- schedule(guided [,chunk]): threads dynamically grab blocks of iterations. The size of the block starts large and shrinks down to size "chunk" as the calculation proceeds.
- schedule(runtime): schedule and chunk size taken from the OMP_SCHEDULE environment variable (or the runtime library for OpenMP 3.0).

Loop worksharing constructs: the schedule clause

Schedule clause | When to use
STATIC          | Pre-determined and predictable by the programmer
DYNAMIC         | Unpredictable, highly variable work per iteration
GUIDED          | Special case of dynamic to reduce scheduling overhead

STATIC does the least work at run-time (scheduling is done at compile-time); DYNAMIC and GUIDED do the most work at run-time (complex scheduling logic is used at run-time).

Exercise 6: hard

Consider the program linked.c: it traverses a linked list, computing a sequence of Fibonacci numbers at each node.
- Parallelize this program using constructs defined in OpenMP 2.5 (loop worksharing constructs).
- Once you have a correct program, optimize it.

Exercise 6: easy

- Parallelize the matrix multiplication program in the file matmul.c.
- Can you optimize the program by playing with how the loops are scheduled?


OpenMP memory model

- OpenMP supports a shared memory model.
- All threads share an address space.

