Patterns For Parallel Programming


Patterns for Parallel Programming
Tim Mattson (Intel)
Kurt Keutzer (UCB EECS)

UCB's Par Lab: Research Overview
Easy to write correct software that runs efficiently on manycore.
[Par Lab software stack, from applications down to hardware: applications (Personal Health, Image Retrieval, Hearing/Music, Speech, Parallel Browser); Design Pattern Language; Composition & Coordination Language (C&CL) and its compiler in the productivity layer; efficiency language compilers, communication & synchronization primitives, and schedulers in the efficiency layer; correctness tools including debugging with replay; OS libraries & services over a legacy OS and hypervisor; Intel Multicore/GPGPU and RAMP Manycore hardware.]

Our goal: use the patterns to guide us to the right frameworks
1. Domain Experts: end-user, application programs.
2. Domain-literate programming gurus (roughly 1% of the population): application patterns & frameworks.
3. Parallel programming gurus (1-10% of programmers): parallel programming frameworks.
The hope is for Domain Experts to create parallel code with little or no understanding of parallel programming.
Leave hardcore "bare metal" efficiency-layer programming to the parallel programming experts.

But this is a course on parallel programming languages. I have something much more important to talk to you about than patterns. Besides, Kurt Keutzer or I can always come back and talk about patterns any time you want.

Programmability and the Parallel Programming Problem
Tim Mattson (Intel)

Computing with 100s or 1000s of cores is not new. Intel has been doing it for decades.
[Timeline, roughly 1985-2000: Intel Scientific founded; iPSC/1 shipped; iPSC/2 shipped; iPSC/860 shipped, wins the Gordon Bell prize; Delta shipped, fastest computer in the world; Paragon shipped, breaks Delta records; ASCI Option Red, world's first TFLOP; ASCI Red upgrade regains the title as the "world's fastest computer". Source: Intel]

The result: membership in the "Dead Architecture Society" . parallel computing is toxic!
[Chart, roughly 1980-2000, of parallel architectures that died out. Distributed Memory MIMD: Intel SSD, BBN, IBM, Myrias, workstation/PC clusters. SIMD: Masspar, Thinking Machines, ICL/DAP, Goodyear. Other: Multiflow, FPS, KSR, Denelcor HEP, Tera/MTA - now Cray. Any product names on this slide are the property of their owners.]

What went wrong? Software.
- Parallel systems are useless without parallel software.
- Can we generate parallel software automatically? NO!!! After years of trying, we know it just doesn't work.
- Our only hope is to get programmers to create parallel software. But after 25 years of research, we are no closer to solving the parallel programming problem. Only a tiny fraction of programmers write parallel code.
- Will the "if you build it, they will come" principle apply? Many hope so, but that implies that people didn't really try hard enough over the last 25 years. Does that really make sense?

All you need is a good Parallel Programming Language, right?
Parallel programming environments in the 90's: a wall of well over a hundred names, including (among many others) ABCPL, ACE, ACT++, Active Messages, ARTS, Athapascan-0b, Aurora, Automap, bb_threads, Blaze, BSP, BlockComm, C*, "C* in C", C**, CarlOS, Cashmere, C4, CC++, Charlotte, Charm, Charm++, Cthreads, CUMULVS, DAGGER, DAPPLE, Data Parallel C, DC++, DCE++, DDD, DICE, DIPC, DOLIB, DOME, DOSMOS, DRL, DSM-Threads, Ease, Filaments, FM, FLASH, HORUS, HPC++, IMPACT, ISIS, JAVAR, JADE, Lucid, Maisie, Manifold, Mentat, Legion, Modula-2*, Multipol, MPI, MPC++, Munin, Nano-Threads, NESL, NetClasses++, Nexus, Nimrod, NOW, Objective Linda, Occam, Omega, OpenMP, Orca, OOF90, P++, P3L, p4-Linda, Pablo, PADE, PADRE, Panda, Papers, AFAPI, Para++, Paradigm, Parafrase2, Paralation, Parallel-C++, Parallaxis, ParC, ParLib++, ParLin, Parmacs, Parti, pC++, POOMA, POOL-T, PRESTO, P-RIO, Prospero, Proteus, QPC++, PVM, PSI, PSDM, Quake, Quark, Quick Threads, Sage++, SCANDAL, SAM, SuperPascal, Telegraphos, TCGMSG, Threads.h++, TreadMarks, TRAPPER, uC++, UNITY, UC, V, ViC*, Visifold V-NUS, VPE, Win32 threads, WinPar, WWWinda, XENOOPS, XPC, Zounds, and ZPL.
Third party names are the property of their owners.

All you need is a good Parallel Programming Language, right?
[The same wall of 90's parallel programming environments as on the previous slide.]
Before creating any new languages, maybe we should figure out why parallel programming language research has been so unproductive.
Maybe part of the problem is how we compare programming languages.
Third party names are the property of their owners.

Comparing programming languages: Ct and OpenMP

The OpenMP version of a sparse matrix-vector product over 3x3 blocks (smvp_csr_double_3x3mat) runs to 172 lines of hand-written code: the nonzeros are strip-mined across threads, partial products are combined through a multi-phase segmented reduction over shared scratch arrays, and the threads coordinate through repeated "#pragma omp barrier" constructs.

The nested data parallel (Ct) version is 6 lines:

    VEC_double sparseMatrixVectorProduct(VEC_double A, VEC_int rowindex,
                                         VEC_int cols, VEC_double v)
    {
        VEC expv = distribute(v, cols);
        VEC product = A * expv;
        return multiReduceSum(product, rowindex);
    }

6 lines of code with better performance & scalability, versus 172 lines of code. Nested data parallelism enables safe and scalable composition of software modules.

What conceptually does this comparison tell us? Anything? Isn't this just marketing-speak disguised as reasoned analysis?
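For contrast with both versions above, here is a minimal, hedged sketch of a plain row-parallel OpenMP sparse matrix-vector product over a CSR matrix. It is deliberately naive (it does not fuse phases or balance nonzeros across threads the way the 172-line version does), and the names (spmv_csr, row_ptr, col_ind, vals) are illustrative, not from the slide:

    #include <omp.h>

    /* y = A*v for a CSR matrix with nrows rows.
     * row_ptr has nrows+1 entries; col_ind/vals hold the nonzeros.
     * Parallelism is one row per iteration; load balance suffers if the
     * nonzeros per row vary widely, which is exactly the case the
     * fused/segmented-reduction version on the slide is built for.     */
    void spmv_csr(int nrows, const int *row_ptr, const int *col_ind,
                  const double *vals, const double *v, double *y)
    {
        int i, j;
        #pragma omp parallel for private(j) schedule(static)
        for (i = 0; i < nrows; i++) {
            double sum = 0.0;
            for (j = row_ptr[i]; j < row_ptr[i+1]; j++)
                sum += vals[j] * v[col_ind[j]];
            y[i] = sum;
        }
    }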

Win32 API vs OpenMP: Which would you rather use?

Win32 Threads:

    #include <windows.h>
    #define NUM_THREADS 2
    HANDLE thread_handles[NUM_THREADS];
    CRITICAL_SECTION hUpdateMutex;
    static long num_steps = 100000;
    double step;
    double global_sum = 0.0;

    void Pi (void *arg)
    {
        int i, start;
        double x, sum = 0.0;

        start = *(int *) arg;
        step = 1.0/(double) num_steps;

        for (i = start; i <= num_steps; i = i + NUM_THREADS){
            x = (i - 0.5)*step;
            sum = sum + 4.0/(1.0 + x*x);
        }
        EnterCriticalSection(&hUpdateMutex);
        global_sum += sum;
        LeaveCriticalSection(&hUpdateMutex);
    }

    void main ()
    {
        double pi; int i;
        DWORD threadID;
        int threadArg[NUM_THREADS];

        for(i = 0; i < NUM_THREADS; i++) threadArg[i] = i + 1;

        InitializeCriticalSection(&hUpdateMutex);

        for (i = 0; i < NUM_THREADS; i++){
            thread_handles[i] = CreateThread(0, 0,
                (LPTHREAD_START_ROUTINE) Pi,
                &threadArg[i], 0, &threadID);
        }

        WaitForMultipleObjects(NUM_THREADS, thread_handles, TRUE, INFINITE);

        pi = global_sum * step;
        printf(" pi is %f \n", pi);
    }

OpenMP:

    #include <omp.h>
    static long num_steps = 100000;
    double step;

    void main ()
    {
        int i;
        double x, pi, sum = 0.0;
        step = 1.0/(double) num_steps;

        #pragma omp parallel for reduction(+:sum) private(x)
        for (i = 1; i <= num_steps; i++){
            x = (i - 0.5)*step;
            sum = sum + 4.0/(1.0 + x*x);
        }
        pi = step * sum;
    }

I'm just as bad . I make the same kind of comparison in my OpenMP talks.

Comparing programming languages/APIs
- If we are going to make progress on the parallel programming problem, we must stop taking pot-shots at each other and stop acting like marketers.
- We need to objectively compare parallel programming languages:
  - Categorize the way people work with parallel programming languages and use those categories to evaluate different languages.
  - Define a common terminology for aspects of programmability.
  - Create a set of useful programmability benchmarks.
- If we want to be "good academics" we need to build new languages based on a clear understanding of past work.
- We need a "theory of programmability" grounded in productive peer review.
- And the first step is to define a human language of programmability.

Towards a human language of programmability
- Algorithms: How do people think about parallel programming?
- A language of programmability
- Programmability metrics/benchmarks

Design patterns were used to help people "think OOP"
A design pattern is:
- A "solution to a problem in a context".
- A structured description of high-quality solutions to recurring problems.
- A quest to encode expertise so all designers can capture that "quality without a name" that distinguishes truly excellent designs.
A pattern language is:
- A structured catalog of design patterns that supports design activities as "webs of connected design patterns".

Design Patterns: A silly example
- Name: Money Pipeline
- Context: You want to get rich and all you have to work with is a C.S. degree and programming skills. How can you use software to get rich?
- Forces: The solution must resolve the forces:
  - It must give the buyer something they believe they need.
  - It can't be too good, or people won't need to buy upgrades.
  - Every good idea is worth stealing -- anticipate competition.
- Solution: Construct a money pipeline:
  - Create SW with enough functionality to do something useful most of the time. This will draw buyers into your money pipeline.
  - Promise new features to thwart competitors.
  - Use bug-fixes and a slow trickle of new features to extract money as you move buyers along the pipeline.

Let's use design patterns to help people "think parallel"
A pattern language for parallel algorithm design, with examples in MPI, OpenMP and Java.
This is our hypothesis for how programmers think about parallel programming.
Now available at a bookstore near you!

CS294 2008 – putting it all together
Our Pattern Language (OPL 1.0): OPL 1.0 brings together PPPL (the Parallel Programming Pattern Language) and the 13 dwarves.

CS294-2009: OPL Version 2.0
Applications

Structural patterns (SW architecture, productivity layer). Choose your high-level structure – what is the structure of my application? Guided expansion:
Pipe-and-filter, Agent and Repository, Process Control, Event based / implicit invocation, Model-view-controller, Bulk iterative, Map reduce, Layered systems, Arbitrary Static Task Graph

Computational patterns (productivity layer). Identify the key computational patterns – what are my key computations? Guided instantiation:
Graphical models, Finite state machines, Dynamic Programming, Backtrack Branch and Bound, Dense Linear Algebra, Sparse Linear Algebra, Unstructured Grids, Structured Grids, N-Body methods, Digital Circuits, Spectral Methods, Monte Carlo, Graph Algorithms, Graph Partitioning

Parallel algorithm strategy (efficiency layer) – what high-level strategy do I use for my parallel algorithm? Guided re-organization:
Task Parallelism, Recursive splitting, Data Parallelism, Pipeline, Geometric Decomposition, Discrete Event

Implementation strategy – how do I express the algorithm in software? Guided programming:
Program structure: SPMD, Strict-data-par, Fork/Join, Actors, Master/worker, Loop Parallelism, BSP, Task Graph
Data structure: Shared Queue, Distributed Array, Shared Data, Shared Hash Table

Concurrent execution (source code / execution) – how do we support the execution of the parallel algorithm? Guided HW mapping:
Advancing program counters: Thread Pool, Speculation, SIMD, data flow, CSP
Coordination: message passing, point-to-point synchronization, collective communication, collective synchronization, mutual exclusion, transactional memory

OPL 2.0: outputs and targets
What is output at each layer of the pattern language?
Outputs, from the top of the pattern language down: a high-level software architecture; the key computational sub-problems; parallel algorithms; parallel software / pseudo-code; task graphs, data-flow graphs, or other execution abstractions.
Primary audiences, from the top down: end users; application programmers; app framework developers; parallel framework developers; HW or system SW architects.

Status of OPL
- Conceptually we have made huge progress. Our overall structure is sound.
- A complete list of patterns with a brief description is in the appendix to this lecture.
- Current drafts of the patterns are available online.
- Structural and computational patterns: many have text and are in decent shape, but often need better examples, figures, and more detailed examples.
- Lower three layers: many can be quickly generated based on content from PPPL; new patterns reflect more detailed analysis since PPPL was finished. Need focus on the needs of efficiency-layer programmers. Much of this will happen during CS294'2009!

OPL Example: the Linear Algebra dwarf
- Reservoir modeling
- Quantum chemistry
- Image processing
- Least squares and other statistics calculations
- Seismic signal processing
- Financial analytics
- Business analytics (operations research)
- Graphics
- ... and more areas than I could hope to list

Linear Algebra applications
Linear algebra applications follow a similar pattern (a sketch of this loop structure follows):
- Loop over the appropriate parameter:
  - Build matrices
  - Operate on matrices
  - Update the result
  - Continue until the termination condition
The program is organized around the matrices (dense or sparse). In scientific computing with a monolithic architecture, these problems use the geometric decomposition and distributed array patterns.
Ideally the library routines used to operate on the matrices dictate the structure of the matrices; the programmer needs to understand these so he or she can build the matrices with the required structure.
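A minimal sketch of that loop structure in C. The matrix contents, the "operate" step, and the termination test are placeholders invented for illustration, not taken from any real application:

    #include <stdio.h>
    #include <math.h>

    #define N 4

    static void build_matrix(double A[N][N], double param)
    {
        /* build the matrix for this parameter value */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                A[i][j] = (i == j) ? 2.0 + param : -1.0;
    }

    static double operate(double A[N][N], const double x[N], double y[N])
    {
        /* operate on the matrix: y = A*x, return the norm of y */
        double norm = 0.0;
        for (int i = 0; i < N; i++) {
            y[i] = 0.0;
            for (int j = 0; j < N; j++)
                y[i] += A[i][j] * x[j];
            norm += y[i] * y[i];
        }
        return sqrt(norm);
    }

    int main(void)
    {
        double A[N][N], x[N] = {1, 1, 1, 1}, y[N];
        double result = 0.0;

        /* loop over the appropriate parameter until the termination condition */
        for (double param = 0.0; param < 10.0; param += 1.0) {
            build_matrix(A, param);
            double norm = operate(A, x, y);
            result += norm;                 /* update the result        */
            if (norm > 100.0)               /* termination condition    */
                break;
        }
        printf("result = %f\n", result);
        return 0;
    }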

What if we used a more rigorous architectural approach?
[Pattern language layers: Structural, Computational, Concurrency strategy, Implementation strategy, Parallel programming building blocks]
Which architectural style is appropriate for linear algebra applications?
Candidate structural patterns: Pipe-and-filter, Agent and Repository, Process Control, Event based / implicit invocation, Model-view-controller, Bulk iterative, Map reduce, Layered systems, Arbitrary Static Task Graph.

Agent & Repository (Blackboard)
Data-centered repository styles: blackboard or database.
Key elements:
- Repository (blackboard): the resulting creation, shared by all agents.
- Agents: intelligent agents that act on the blackboard.
- Controller: orchestrates the agents' access to the repository and the creation of the aggregate result.

The controller
The controller for the agents can use a variety of approaches:
- Master-worker pattern when the updates vary widely and unpredictably.
- Geometric decomposition with a static mapping onto agents.
The blackboard is a "logical structure":
- Even on a shared-memory machine we organize it into blocks to optimize data locality, and hence treat it quite similarly to distributed-memory environments.
- The blackboard may be distributed among the agents: use an owner-computes filter, i.e. spread the blackboard out evenly, and each agent updates the blocks it "owns" (a sketch of this idea follows).
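A minimal sketch, in OpenMP, of the owner-computes idea: the blackboard is a logical array of blocks, the agents are threads, and each agent updates only the blocks it owns. The controller loop, the block layout, and the update itself are placeholders, not code from the slides:

    #include <omp.h>
    #include <stdio.h>

    #define NBLOCKS 16
    #define BLOCK   64

    static double blackboard[NBLOCKS][BLOCK];   /* the logical blackboard */

    static void update_block(double *block, int n, int step)
    {
        for (int i = 0; i < n; i++)
            block[i] += step;        /* stand-in for the real agent work */
    }

    int main(void)
    {
        for (int step = 0; step < 10; step++) {       /* controller loop */
            #pragma omp parallel
            {
                int me = omp_get_thread_num();
                int nagents = omp_get_num_threads();
                /* owner-computes filter: block b belongs to agent b % nagents */
                for (int b = 0; b < NBLOCKS; b++)
                    if (b % nagents == me)
                        update_block(blackboard[b], BLOCK, step);
            }   /* implicit barrier: all agents finish before the next step */
        }
        printf("blackboard[0][0] = %f\n", blackboard[0][0]);
        return 0;
    }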

Case Study: Quantum Chemistry
[Diagram: a GUI with I/O broadcasts events to agents for Geometry Optimization, Single Point Energy, and Reaction Pathway calculations; matrices are held in (shared) memory; a termination event ends the loop.]
Loop over the appropriate parameter; on each agent:
- Build the matrix
- Diagonalize the matrix
- Compute physical quantities from the eigenvalues/eigenvectors
- Exit on the termination condition

Managing concurrent access to matrices
- Matrices are stored according to the needs of the libraries and the target platforms.
- How do we abstract these complexities from the programmer?

Distributed Array pattern
- Name: Distributed Array
- Problem: Arrays often need to be partitioned between multiple UEs. How can we do this so the resulting program is both readable and efficient?
- Forces:
  - Large numbers of small blocks organized to balance load.
  - Able to specialize the organization to different platforms/problems.
  - Understandable indexing to make programming easier.
- Solution:
  - Express the algorithm in blocks.
  - Abstract the indexing inside mapping functions: the programmer works in an index space natural to the domain, and the functions map into the distribution needed for efficient execution (a sketch follows).
  - The text of the pattern defines some of these common mapping functions (which can get quite confusing and in the literature are usually left as "an exercise for the reader").
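A minimal sketch of what "abstract indexing inside mapping functions" can look like for a 1D block-cyclic distributed array. The layout and the function names (global_to_owner, global_to_local) are illustrative, not part of any particular library:

    #include <stdio.h>

    #define NUE   4        /* number of UEs (processes/threads)        */
    #define BLK   8        /* block size of the 1D block-cyclic layout */

    /* which UE owns global index g */
    static int global_to_owner(int g) { return (g / BLK) % NUE; }

    /* offset of global index g inside the owner's local storage */
    static int global_to_local(int g) { return (g / (BLK * NUE)) * BLK + g % BLK; }

    int main(void)
    {
        /* the programmer works in the global index space; the mapping
         * functions hide the distribution needed for efficient execution */
        for (int g = 0; g < 40; g += 7)
            printf("A[%2d] lives on UE %d at local offset %d\n",
                   g, global_to_owner(g), global_to_local(g));
        return 0;
    }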

Global Arrays
- Distributed dense arrays that can be accessed through a shared-data-like style.
- Physically distributed data.
- Single, shared data structure with global indexing: e.g., access A(4,3) rather than buf(7) on task 2.
- Global address space.

Global Array Model of Computations
[Diagram: each process gets a section of the shared object by copying it into local memory, computes/updates locally, and copies the result back to the shared object.]
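A minimal sketch of that get / compute locally / put-back cycle. The ga_get and ga_acc calls below are hypothetical stand-ins written against an ordinary array; they are not the real Global Arrays API:

    #include <string.h>
    #include <stdio.h>

    #define GDIM 100
    static double shared_object[GDIM];   /* stand-in for the shared global array */

    /* get: copy a section [lo,hi) of the shared object into a local buffer */
    static void ga_get(double *local, int lo, int hi)
    { memcpy(local, &shared_object[lo], (hi - lo) * sizeof(double)); }

    /* accumulate a local buffer back into the shared object */
    static void ga_acc(const double *local, int lo, int hi)
    { for (int i = lo; i < hi; i++) shared_object[i] += local[i - lo]; }

    int main(void)
    {
        double local[10];
        ga_get(local, 20, 30);            /* copy to local memory        */
        for (int i = 0; i < 10; i++)      /* compute/update locally      */
            local[i] = 2.0 * i;
        ga_acc(local, 20, 30);            /* copy back to shared object  */
        printf("shared_object[25] = %f\n", shared_object[25]);
        return 0;
    }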

Use of the patterns
- Patterns help us describe expert solutions to parallel programming.
- They give us a language to describe the architecture of parallel software.
- They provide a roadmap to the frameworks we need to support general-purpose programmers.
- And they give us a way to systematically map programming languages onto patterns of parallel algorithms, thereby comparing their range of suitability.

Example: Using patterns to discuss programmability
[Table mapping patterns to OpenMP, OpenCL and MPI, with each cell rated as good mapping, weak mapping, or not suitable. Patterns: Task Parallelism, Recursive splitting (OpenMP: with 3.0), Pipeline, Geometric Decomposition, Discrete Event, SPMD, SIMD, Fork/Join (MPI: with MPI 2.0), Actors, BSP, Loop Parallelism, Master/worker.]

Towards a human language of programmability
- Algorithms: what are the standard algorithms parallel programming experts take for granted?
- A language of programmability
- Programmability metrics/benchmarks

How do we describe human interaction with the programming language?
Thomas Green is a well-known researcher in the "psychology of programming" community. After years of work on formal cognitive models with little to show for it, he concluded:
  "The way forward is not to make strong, simple claims about how cognitive processes work. The way forward is to study the details of how notations convey information."
He proposed a set of "Cognitive Dimensions" as a "discussion framework" for information notations.
Cognitive Dimensions in action:
- First used to analyze visual programming languages.
- Since then, it has been used to analyze a number of information appliances.
- Used by Steven Clarke of Microsoft to analyze C#.

Cognitive dimensions
There are around 13 of them. The 10 most important to parallel programming are:
- Viscosity: how hard is it to introduce small changes?
- Hidden dependencies: does a change in one part of a program cause other parts to change in ways not overtly apparent in the program text?
- Error proneness: how easy is it to make mistakes?
- Progressive evaluation: can you check a program while it is incomplete? Can parallelism be added incrementally?
- Abstraction gradient: how much abstraction is required? How much abstraction is possible?
- Closeness of mapping: how well does the language map onto the problem domain?
- Premature commitment: does the notation constrain the order in which you do things (also known as imposed look-ahead)?
- Consistency: is similar semantics implied by similar syntax? Can you guess one part of the notation given other parts?
- Hard mental operations: does the notation lead you to complex combinations of primitive operations?
- Terseness: how succinct is the language?
For parallel programming, I'll add two more:
- HW visibility: is a useful cost model exposed to the programmer?
- Portability: does the notation assume constraints on the hardware?

Cognitive Dimensions: viscosity
How easy is it to introduce small changes to an existing parallel program?
Low viscosity example: to change how loop iterations are scheduled in OpenMP, just change a single clause:

    #pragma omp parallel for reduction(+:sum) private(x) schedule(dynamic)
    for (i = 1; i <= num_steps; i++){
        x = (i - 0.5)*step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    pi = step * sum;

Cognitive Dimensions: viscosity
How easy is it to introduce small changes to an existing parallel program?
High viscosity example: to change how loop iterations are scheduled in Win32 threads, you must change multiple lines of code. The static round-robin loop (for (i = start; i <= num_steps; i = i + NUM_THREADS)) has to be replaced with an explicit task queue: initialize the queue, have each thread repeatedly fetch the next piece of work, and test a termination condition, in addition to the existing critical section that accumulates global_sum (a hedged sketch of one way to do this follows).
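As an illustration only (not the code from the slide, which used a task queue), here is a minimal sketch of a dynamically scheduled worker function for the Win32 threads pi program, using a shared counter handed out with InterlockedIncrement. The main() that initializes the critical section and creates the threads is assumed to be the one from the earlier listing; the point is that the change touches the whole worker loop rather than a single clause:

    #include <windows.h>
    #define NUM_THREADS 2

    static long num_steps = 100000;
    static double step;
    static double global_sum = 0.0;
    static CRITICAL_SECTION hUpdateMutex;       /* initialized in main() */
    static volatile LONG next_iter = 0;   /* shared counter handing out work */

    void Pi_dynamic(void *arg)
    {
        double x, sum = 0.0;
        LONG i;
        (void)arg;
        step = 1.0 / (double) num_steps;

        /* dynamic schedule: each thread repeatedly claims the next iteration */
        while ((i = InterlockedIncrement(&next_iter)) <= num_steps) {
            x = (i - 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }

        EnterCriticalSection(&hUpdateMutex);
        global_sum += sum;
        LeaveCriticalSection(&hUpdateMutex);
    }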

Cognitive Dimensions: Error Proneness
Does the notation make it easy for the programmer to make a cognitive slip and introduce an error?
Shared address space languages (such as OpenMP) are very error prone: they make it easy to introduce races by unintended sharing of data.

    #include <omp.h>
    static long num_steps = 100000;
    double step;

    void main ()
    {
        int i;
        double x, pi, sum = 0.0;
        step = 1.0/(double) num_steps;

        #pragma omp parallel for reduction(+:sum)
        for (i = 1; i <= num_steps; i++){
            x = (i - 0.5)*step;
            sum = sum + 4.0/(1.0 + x*x);
        }
        pi = step * sum;
    }

By forgetting a simple "private(x)" clause, I've introduced a race condition.

Cognitive Dimensions: Abstraction depth
Does the notation give the programmer the ability to build his/her own abstractions?
Abstraction-rich parallel languages: TBB (Threading Building Blocks) is generic programming and the standard template library meets parallel programming.
- Build abstract containers, and introduce parallelism by using concurrent containers.
- Change how concurrency is executed by changing containers.
Abstraction-poor languages: OpenMP. The programmer has very little support from the notation for building abstractions; very little abstraction is possible.
Abstraction barriers: how much abstraction is required just to get started? TBB has a very high abstraction barrier.

Cognitive Dimensions: Hidden dependencies
Hidden dependencies: you make a change in one location and the effects are seen elsewhere in ways not apparent in the program text.
Abstraction-rich languages increase problems from hidden dependencies: change a member function in a base class, and an object of a derived class changes its behavior.

Cognitive dimensions (recap)
There are around 13 of them. The 10 most important to parallel programming: viscosity, hidden dependencies, error proneness, progressive evaluation, abstraction gradient, closeness of mapping, premature commitment, consistency, hard mental operations, and terseness; plus, for parallel programming, HW visibility and portability (as defined earlier).

Cognitive dimensions of OpenMP and MPI

Viscosity
  OpenMP: Low. Pragmas have minimal semantic weight and are easy to move around.
  MPI: High. Sends/recvs must be paired and data structures explicitly decomposed.

Error proneness
  OpenMP: High. A shared address space makes race conditions hard to detect.
  MPI: Medium-low. Disjoint memory makes races rare and deadlock easy to find; long argument lists are a problem.

HW visibility
  OpenMP: Poor. An abstract API that hides the hardware.
  MPI: Fair to good. The hardware model is implied but usually visible.

Progressive evaluation
  OpenMP: High. Semantically neutral constructs allow incremental parallelism.
  MPI: Low. You rip the program apart to expose distributed data and tasks, and can only test once you put things back together.

Portability
  OpenMP: Poor. Requires systems with a shared address space.
  MPI: Great. Assumes minimal system support.

Comparing programming languages for data parallel

Parallel patterns & programming frameworks (recap)
Domain Experts write end-user, application programs on top of application patterns & frameworks; parallel programming gurus (1-10% of programmers) build the parallel programming frameworks underneath. The hope is for Domain Experts to create parallel code.
