COMP 422, Lecture 4: Decomposition Techniques for Parallel Algorithms - Rice University


COMP 422, Lecture 4: Decomposition Techniques for Parallel Algorithms
(Sections 3.1 & 3.2 of textbook)

Vivek Sarkar
Department of Computer Science, Rice University
vsarkar@rice.edu

17 January 2008

Recap of Lecture 3

Interconnection Networks
- Static (direct) vs. dynamic (indirect) networks
- Metrics: diameter, bisection width, arc connectivity, cost
- Dynamic networks: crossbar, omega, (fat) tree
- Static networks: mesh, torus, hypercube

Cache Coherence
- Invalidate vs. update protocols
- Bus snooping vs. directory-based
- False sharing

Communication Costs
- Store-and-forward vs. cut-through

Lecture 3 Review Question

Consider the following parallel computation:

CPU 0: Update even-numbered rows
  for (j = 0; j < N; j += 2)
    for (k = 0; k < N; k++)
      A[j,k] = f(j,k);

CPU 1: Update odd-numbered rows
  for (j = 1; j < N; j += 2)
    for (k = 0; k < N; k++)
      A[j,k] = g(j,k);

Assume a cache line size of 32B, an invalidate protocol, an element size of 8B for A, and that A's first element is aligned on a cache line boundary. What is the maximum number of invalidate messages that will be sent between CPUs 0 and 1 as a function of N?

Acknowledgments for today's lecture

- Cilk lecture by Charles Leiserson and Bradley Kuszmaul (Lecture 1, Scheduling Theory)
  http://supertech.csail.mit.edu/cilk/lecture-1.pdf
- Slides accompanying the course textbook
  http://www-users.cs.umn.edu/~karypis/parbook/

Outline of Today's Lecture

- Tasks, Dependence Graphs, Scheduling Theory
- Data and Computation Decompositions

Tasks and Dependency Graphs

The first step in developing a parallel algorithm is to decompose the problem into tasks that are candidates for parallel execution.

Task = indivisible sequential unit of computation.

A decomposition can be illustrated in the form of a directed graph, with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a task dependency graph.
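As a concrete illustration (not from the slides), here is a minimal C sketch of a task dependency graph; the Task type, the run() walk, and the query-inspired task names are all illustrative choices, and a real runtime would execute ready tasks in parallel rather than sequentially.

```c
#include <stdio.h>

#define MAX_DEPS 4

/* A task node: an edge u -> v is recorded as u appearing in v's deps[]. */
typedef struct Task {
    const char *name;
    int num_deps;
    struct Task *deps[MAX_DEPS]; /* tasks whose results this task needs */
    int done;                    /* set once the task has executed      */
} Task;

/* Run a task after all of its predecessors have run (a sequential
   topological walk of the dependency graph). */
static void run(Task *t) {
    if (t->done) return;
    for (int i = 0; i < t->num_deps; i++)
        run(t->deps[i]);
    printf("running task: %s\n", t->name);
    t->done = 1;
}

int main(void) {
    /* Fragment of the database-query example that follows: the AND task
       can only run once both selection tasks have produced their tables. */
    Task model = {"MODEL = CIVIC",  0, {0}, 0};
    Task year  = {"YEAR = 2001",    0, {0}, 0};
    Task both  = {"MODEL AND YEAR", 2, {&model, &year}, 0};
    run(&both);
    return 0;
}
```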

Example: Database Query Processing

Consider the execution of the query:

MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")

on the following database:

Example: Database Query Processing

The execution of the query can be divided into subtasks in various ways. Each task can be thought of as generating an intermediate table of entries that satisfy a particular clause.

An Alternate Task Decomposition and Dependency Graph

Note that the same problem can be decomposed into subtasks in other ways as well.

Critical Path Length

- A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other.
- The longest such path determines the shortest time in which the program can be executed in parallel.
- The length of the longest path in a task dependency graph is called the critical path length.
- The ratio of the total amount of work to the critical path length is the average degree of concurrency.

Examples of Critical Path Length

Consider the task dependency graphs of the two database query decompositions:

Decomposition 1: total work = 63, critical path length = 27, average concurrency = 2.33
Decomposition 2: total work = 64, critical path length = 34, average concurrency = 1.88
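Restating the slide's arithmetic as a quick check:

```latex
\text{average degree of concurrency}
  = \frac{\text{total work}}{\text{critical path length}}
  \qquad\Longrightarrow\qquad
  \frac{63}{27} \approx 2.33,
  \qquad
  \frac{64}{34} \approx 1.88
```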

Algorithmic Complexity Measures (Ignoring Communication Overhead)
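The body of this slide is a figure that did not survive transcription. Assuming it follows the acknowledged Cilk scheduling-theory lecture, which the next few slides clearly build on, the measures it introduces are:

```latex
\begin{aligned}
T_P      &:\ \text{execution time on } P \text{ processors} \\
T_1      &:\ \text{work (execution time on one processor)} \\
T_\infty &:\ \text{critical path length (span)} \\
&\text{lower bounds:}\quad T_P \ge T_1/P \quad\text{and}\quad T_P \ge T_\infty \\
&\text{speedup:}\quad T_1/T_P \qquad \text{parallelism:}\quad T_1/T_\infty
\end{aligned}
```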

Upper Bounds on TP*

* Greedy scheduler = no unenforced idleness
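The bound itself appears to have been a figure on the slide. Given that the Socrates slides below use the model TP = T1/P + T∞, the intended result is the classic greedy-scheduling bound (Graham, Brent):

```latex
T_P \;\le\; \frac{T_1}{P} + T_\infty
```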

Performance Bound for Greedy Algorithm

NOTE: the performance bound approaches 1 (optimal) when one of the max terms dominates the other.
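The formula on this slide is also missing; a reconstruction consistent with the NOTE: since any schedule needs TP ≥ max(T1/P, T∞), a greedy schedule is within a factor of two of optimal,

```latex
\frac{T_P}{T_P^{*}} \;\le\; \frac{T_1/P + T_\infty}{\max(T_1/P,\ T_\infty)} \;\le\; 2
```

and the ratio tends to 1 as either T1/P or T∞ dominates the other.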

Case Study: Cilk Chess Programs

- Socrates placed 3rd in the 1994 International Computer Chess Championship running on NCSA's 512-node Connection Machine CM5.
- Socrates 2.0 took 2nd place in the 1995 World Computer Chess Championship running on Sandia National Labs' 1824-node Intel Paragon.
- Cilkchess placed 1st in the 1996 Dutch Open running on a 12-processor Sun Enterprise 5000. It placed 2nd in 1997 and 1998 running on Boston University's 64-processor SGI Origin 2000.
- Cilkchess tied for 3rd in the 1999 WCCC running on NASA's 256-node SGI Origin 2000.

Socrates Normalized Speedup

[Figure: log-log plot of normalized speedup, showing the perfect-speedup line TP = T1/P, the critical-path bound TP = T∞, the model TP = T1/P + T∞, and measured speedup points; axis ticks at 0.01, 0.1, and 1.]

Developing Socrates

- For the competition, Socrates was to run on a 512-processor Connection Machine Model CM5 supercomputer at the University of Illinois.
- The developers had easy access to a similar 32-processor CM5 at MIT.
- One of the developers proposed a change to the program that produced a speedup of over 20% on the MIT machine.
- After a back-of-the-envelope calculation, the proposed "improvement" was rejected!

Socrates Speedup Paradox

Measured on 32 processors: T32 = 65 seconds for the original program, T′32 = 40 seconds for the proposed program. Using the model TP = T1/P + T∞:

Original program: T1 = 2048 seconds, T∞ = 1 second
  T32  = 2048/32  + 1 = 65 seconds
  T512 = 2048/512 + 1 = 5 seconds

Proposed program: T′1 = 1024 seconds, T′∞ = 8 seconds
  T′32  = 1024/32  + 8 = 40 seconds
  T′512 = 1024/512 + 8 = 10 seconds

The proposed change is faster on the 32-processor MIT machine (40 vs. 65 seconds) but slower on the 512-processor competition machine (10 vs. 5 seconds).

Outline of Today's Lecture

- Tasks, Dependence Graphs, Scheduling Theory
- Data and Computation Decompositions

Decomposition Techniques: Patterns for Parallel Algorithms

So how does one decompose a task into various subtasks? While there is no single recipe that works for all problems, we present a set of commonly used techniques that apply to broad classes of problems. These include:

- recursive decomposition
- data decomposition
- exploratory decomposition
- speculative decomposition

Recursive Decomposition

- Generally suited to problems that are solved using the divide-and-conquer strategy.
- A given problem is first decomposed into a set of sub-problems.
- These sub-problems are recursively decomposed further until a desired granularity is reached.

Recursive Decomposition: Example

A classic example of a divide-and-conquer algorithm to which we can apply recursive decomposition is quicksort.

In this example, a task represents the work of partitioning a (sub)array. Note that each subarray represents an independent subtask. This can be repeated recursively.
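A minimal C sketch of this decomposition (not the textbook's code): each call partitions its subarray, and the two recursive calls are independent subtasks. Here they are ordinary serial calls; in Cilk each would be spawned.

```c
#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Partition A[lo..hi] around its last element; return the pivot index. */
static int partition(int A[], int lo, int hi) {
    int pivot = A[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (A[j] < pivot) swap(&A[i++], &A[j]);
    swap(&A[i], &A[hi]);
    return i;
}

static void quicksort(int A[], int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(A, lo, hi); /* the task: one partitioning step   */
    quicksort(A, lo, p - 1);      /* independent subtask 1 (spawnable) */
    quicksort(A, p + 1, hi);      /* independent subtask 2 (spawnable) */
}

int main(void) {
    int A[] = {9, 3, 7, 1, 8, 2};
    quicksort(A, 0, 5);
    for (int i = 0; i < 6; i++) printf("%d ", A[i]);
    printf("\n");
    return 0;
}
```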

Data Decomposition

- Identify the data on which computations are performed.
- Partition the data into sub-units.
- The data partitioning induces one or more decompositions of the computation into tasks, e.g., by using the owner-computes rule.
- The data can be input, output, or intermediate for different computations.

Output Data Decomposition: Example

Consider the problem of multiplying two n x n matrices A and B to yield matrix C. The output matrix C can be partitioned into four submatrices as follows:
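The partitioned form was a figure on the slide; in standard block notation it is:

```latex
\begin{pmatrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2} \end{pmatrix}
=
\begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix}
\begin{pmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{pmatrix},
\qquad
C_{i,j} = A_{i,1} B_{1,j} + A_{i,2} B_{2,j}
```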

Output Data Decomposition: Example

A partitioning of output data does not result in a unique decomposition into tasks. Here are two possible task decompositions for the output data decomposition from the previous slide:
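A minimal C sketch of one such decomposition, in which task (bi, bj) owns output block C[bi][bj] and performs both block products for it; the block_mm helper and the tiny 4x4 problem size are illustrative.

```c
#include <stdio.h>

#define BS 2          /* block size: the matrices split into 2x2 blocks */
#define N  (2 * BS)

/* C[bi][bj] += A[bi][bk] * B[bk][bj], operating on BSxBS blocks. */
static void block_mm(double C[N][N], const double A[N][N],
                     const double B[N][N], int bi, int bj, int bk) {
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++)
            for (int k = 0; k < BS; k++)
                C[bi*BS + i][bj*BS + j] +=
                    A[bi*BS + i][bk*BS + k] * B[bk*BS + k][bj*BS + j];
}

int main(void) {
    double A[N][N], B[N][N], C[N][N] = {{0}};
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 1.0; }

    /* One task per output block: the four (bi, bj) iterations are
       independent and could be assigned to four processes. */
    for (int bi = 0; bi < 2; bi++)
        for (int bj = 0; bj < 2; bj++) {
            block_mm(C, A, B, bi, bj, 0);
            block_mm(C, A, B, bi, bj, 1);
        }

    printf("C[0][0] = %g (expect %d)\n", C[0][0], N);
    return 0;
}
```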

Output Data Decomposition: Example

Consider the problem of counting the instances of given itemsets in a database of transactions. In this case, the output (itemset frequencies) can be partitioned across tasks.

Input Data Decomposition

- Generally applicable if each output can be naturally computed as a function of the input.
- In many cases, this is the only natural decomposition because the output is not known a priori (e.g., the problem of finding the minimum in a list, sorting a given list, etc.).
- A task is associated with each input data partition. Each task performs as much of the computation as it can with its own part of the data; subsequent processing combines these partial results.

Input Data Decomposition: Example

In the database counting example, the input (i.e., the transaction set) can be partitioned. This induces a task decomposition in which each task generates partial counts for all itemsets. These partial counts are subsequently combined into aggregate counts.
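A minimal C sketch of this input decomposition (not the textbook's code): each task scans its own slice of the transactions and produces partial counts for every itemset, and a final step sums them. The tiny data set and the contains() helper are illustrative.

```c
#include <stdio.h>
#include <string.h>

#define NUM_TASKS    2
#define NUM_ITEMSETS 2
#define NUM_TRANS    4

static const char *transactions[NUM_TRANS] = {
    "bread milk", "bread beer", "milk beer", "bread milk beer"
};
static const char *itemsets[NUM_ITEMSETS] = { "bread", "milk" };

/* Does transaction t contain itemset s? (substring match for brevity) */
static int contains(const char *t, const char *s) {
    return strstr(t, s) != NULL;
}

int main(void) {
    int partial[NUM_TASKS][NUM_ITEMSETS] = {{0}};

    /* Each "task" owns a contiguous slice of the input transactions
       and computes partial counts for ALL itemsets. */
    for (int task = 0; task < NUM_TASKS; task++) {
        int lo = task * (NUM_TRANS / NUM_TASKS);
        int hi = lo + (NUM_TRANS / NUM_TASKS);
        for (int t = lo; t < hi; t++)
            for (int i = 0; i < NUM_ITEMSETS; i++)
                if (contains(transactions[t], itemsets[i]))
                    partial[task][i]++;
    }

    /* Combining step: aggregate the per-task partial counts. */
    for (int i = 0; i < NUM_ITEMSETS; i++) {
        int total = 0;
        for (int task = 0; task < NUM_TASKS; task++)
            total += partial[task][i];
        printf("%s: %d\n", itemsets[i], total);
    }
    return 0;
}
```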

Output vs. Input Data Decompositions

From the previous example, the following observations can be made:

- If only the output is decomposed and the database of transactions is replicated across the processes, each task can be accomplished independently with no communication.
- If the input database is also partitioned (for scalability), it induces a computation mapping in which each task computes partial counts, and additional tasks are used to aggregate the counts.

Combining Input and Output Data Decompositions

Often, input and output data decompositions can be combined for a higher degree of concurrency. For the itemset counting example, the transaction set (input) and the itemset counts (output) can both be decomposed as follows:

Intermediate Data Decomposition

- Computation can often be viewed as a sequence of transformations from the input to the output data.
- In these cases, it is sometimes beneficial to use one of the intermediate stages as a basis for decomposition.

Intermediate Data Partitioning: Example

Consider the intermediate submatrices that can be created in dense matrix multiplication.

Intermediate Data Partitioning: Example
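The partitioning itself is in a figure that did not survive transcription; a reconstruction consistent with the textbook's treatment introduces a three-dimensional intermediate array D, splitting the block multiplication into two stages:

```latex
\text{Stage 1:}\quad D_{k,i,j} = A_{i,k}\, B_{k,j} \qquad (i, j, k \in \{1, 2\})
\qquad\qquad
\text{Stage 2:}\quad C_{i,j} = D_{1,i,j} + D_{2,i,j}
```

Stage 1 yields eight independent tasks and Stage 2 four more, compared with four tasks for the pure output decomposition.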

From Data Decompositions to Task Mappings: Owner Computes Rule

- The owner-computes rule generally states that the process assigned a particular data item is responsible for all computation associated with it.
- In the case of input data decomposition, the owner-computes rule implies that all computations that use an input data item are performed by its process.
- In the case of output data decomposition, the owner-computes rule implies that the output is computed by the process to which the output data is assigned.
- Likewise for intermediate data decompositions.

Exploratory Decomposition

- In many cases, the decomposition of the problem goes hand-in-hand with its execution.
- These problems typically involve the exploration (search) of a state space of solutions.
- Problems in this class include a variety of discrete optimization problems (0/1 integer programming, QAP, etc.), theorem proving, game playing, etc.

Exploratory Decomposition: Example

A simple application of exploratory decomposition is the solution of a 15-puzzle (a tile puzzle). We show a sequence of three moves that transforms a given initial state (a) into the desired final state (d).

Of course, the problem of computing the solution, in general, is much more difficult than in this simple example.

Exploratory Decomposition: Example

The state space can be explored by generating various successor states of the current state and viewing them as independent tasks.
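A minimal C sketch of that idea for the 15-puzzle (not from the textbook): generate every state reachable by sliding one tile into the blank, and treat each successor as the root of an independent search task.

```c
#include <stdio.h>

#define SIDE 4

typedef struct { int tile[SIDE][SIDE]; } State;  /* 0 marks the blank */

/* Fill succ[] with every state reachable in one move; return the count.
   Each successor would seed one independent exploration task. */
static int successors(const State *s, State succ[4]) {
    int br = 0, bc = 0, n = 0;
    for (int r = 0; r < SIDE; r++)               /* locate the blank  */
        for (int c = 0; c < SIDE; c++)
            if (s->tile[r][c] == 0) { br = r; bc = c; }
    const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
    for (int d = 0; d < 4; d++) {
        int r = br + dr[d], c = bc + dc[d];
        if (r < 0 || r >= SIDE || c < 0 || c >= SIDE) continue;
        succ[n] = *s;                            /* copy, then slide  */
        succ[n].tile[br][bc] = s->tile[r][c];
        succ[n].tile[r][c] = 0;
        n++;
    }
    return n;
}

int main(void) {
    State s = {{{1, 2, 3, 4}, {5, 6, 7, 8},
                {9, 10, 11, 12}, {13, 14, 15, 0}}};
    State succ[4];
    printf("independent subtasks: %d\n", successors(&s, succ));
    return 0;
}
```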

Exploratory Decomposition: Anomalous Speedups

- In many instances of parallel exploratory decomposition, unfinished tasks can be terminated as soon as the first solution is found.
- This can result in "anomalous" super- or sub-linear speedups relative to serial execution.

Speculative Decomposition

- In some applications, dependencies between tasks are not known a priori.
- For such applications, it is impossible to identify independent tasks up front.
- There are generally two approaches to dealing with such applications: conservative approaches, which identify independent tasks only when they are guaranteed to have no dependencies, and optimistic approaches, which schedule tasks even when they may potentially be erroneous.
- Conservative approaches may yield little concurrency, and optimistic approaches may require a roll-back mechanism in the case of an error.
- Parallel Discrete Event Simulation (Example 3.8) is a motivating example for optimistic approaches.

Hybrid Decompositions

Often, a mix of decomposition techniques is necessary for decomposing a problem.

Summary of Today's Lecture

- Tasks, Dependence Graphs, Scheduling Theory
- Data and Computation Decompositions

Reading List for Next Lecture (Jan 22nd)

- Sections 7.1, 7.2, 7.3, 7.4 of textbook
- Pages 4-17 (Sections 2.1-2.5) of the Cilk Reference Manual (manual-5.4.6.pdf)

