Lecture 1 Introduction To Parallel Programming


Lecture 1: Introduction to Parallel Programming

Course Logistics

Starting point for the course:
http://tinyurl.com/epflpar
(also reachable from http://lamp.epfl.ch/teaching)

Please follow the instructions there to sign up for Coursera, and use the form to which we link to tell us your Coursera email.
- The first part of the course uses classical lectures, but we need your Coursera accounts because grading of assignments is done through the Coursera platform.

Grading:
- 30% project assignments during the semester
- 30% mid-term after Spring break
- 40% final quiz on the last day (Friday) of the semester

Course Outline

Part I: Parallel programming in Scala:
- Week 1: Introduction, basic constructs, analysis
- Week 2: Reductions, operator associativity
- Week 3: Data-parallel programming
- Week 4: Parallel data structures and algorithms

Part II: Data Processing with Spark (https://spark.apache.org):
- Week 5: Spark and futures
- Week 6: Spark in-depth
- Week 7: Data mining and data analysis

Part III: Reactive Programming (7 weeks):
https://www.coursera.org/course/reactive

Lecturers

Part I: Parallel programming in Scala:
- Week 1: Viktor Kuncak
- Week 2: Viktor Kuncak
- Week 3: Aleksandar Prokopec
- Week 4: Aleksandar Prokopec

Part II: Data Processing with Spark (https://spark.apache.org):
- Week 5: Heather Miller
- Week 6: Heather Miller
- Week 7: Heather Miller

Part III: Reactive Programming (7 weeks): Martin Odersky, Roland Kuhn, and Erik Meijer

Course Textbook

Learning Concurrent Programming in Scala, by Aleksandar Prokopec. PACKT Publishing, November 2014.
Covers many of the concepts we teach in the first part of the course, and more (see …/learning-concurrent-programming-scala).

Parallel Programming

Sequential programming: at every point in time, one part of the program is executing.

Parallel programming: multiple parts of the program execute at once.

In both cases, we compute the same functionality.
Benefit: in the parallel case, we wait less until the work is completed.
- It's all about performance.

Parallel Programming: Infrastructure vs Applications

Two separate problems:
- infrastructure: build hardware, libraries, and frameworks for parallel programming
- applications: design your software (algorithms, data structures) in a way that exploits parallelism

For a start, we will look at applications:
- we learn to use parallel programming primitives: parallel, task, parallel collections

Parallelism vs Concurrency (My Interpretation, Informal)

Concurrent programming is a general concept: structure a program into concurrent tasks that communicate, both among each other and with the external environment.
- communication can happen in the middle of task execution
- must ensure safe accesses to shared resources (using e.g. locks)
- relevant even if we have no parallel hardware; the OS and the JVM provide threads even on a single CPU core
- can be used to ensure responsiveness in interactive apps

Parallel programming: speed up computation through parallel hardware resources, without solving all the difficulties of concurrency (use frameworks to keep things simple). We often use this pattern:
- get some part of the problem
- work on the problem in isolation, possibly creating other tasks
- return the result (or store it to a dedicated place)

We Write Programs in Scala

See the Coursera online course Functional Programming Principles in Scala:
https://www.coursera.org/course/progfun

We benefit from an ecosystem of frameworks for parallel and concurrent programming in Scala.

We follow a functional programming style and try to avoid side effects. In practice, we may use some simple side effects for efficiency, but an important property of our programs in any case is:

    If one parallel task writes to a variable (or array entry),
    no other task may read or write this variable at the same time.

Parallel programming primitives help us enforce this goal without relying on difficult general-purpose concurrent programming techniques.

A Minimal Construct for Parallelism

A Small Program: p-norm

First, solve sequentially the following problem; call it sumSegment. Given
- an integer array a,
- a positive double floating point number p,
- two valid indices s ≤ t into the array a,

compute

    $\sum_{i=s}^{t-1} \lfloor |a_i|^p \rfloor$

Assume we have the power function available, defined e.g. as

    def power(x: Int, p: Double): Int =
      math.exp(p * math.log(math.abs(x))).toInt

Write the sumSegment function in Scala.

Solution for sumSegment

    def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int = {
      var i = s; var sum: Int = 0
      while (i < t) {
        sum = sum + power(a(i), p)
        i = i + 1
      }
      sum
    }

Local var-s above are harmless; the code above does the same as:

    def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int = {
      def sumFrom(i: Int, acc: Int): Int = {
        if (i < t) sumFrom(i + 1, acc + power(a(i), p))
        else acc
      }
      sumFrom(s, 0)
    }
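To see the pieces run together, here is a self-contained sketch packaging the power helper and the imperative sumSegment into an object; the cross-check against a direct map/sum over the array is an addition for illustration.

```scala
// Self-contained sketch: the power helper and the imperative sumSegment
// from the slides, packaged into an object so they can be run directly.
object SumSegmentDemo {
  // floor of |x|^p, computed via exp and log as on the earlier slide
  def power(x: Int, p: Double): Int =
    math.exp(p * math.log(math.abs(x))).toInt

  // sum of power(a(i), p) for i in [s, t)
  def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int = {
    var i = s; var sum: Int = 0
    while (i < t) {
      sum = sum + power(a(i), p)
      i = i + 1
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1, -2, 3, 4)
    // same result as summing power over each element directly
    println(sumSegment(a, 2.0, 0, a.length) == a.map(power(_, 2.0)).sum)  // prints true
  }
}
```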

The p-norm of an Array

    def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int

Use it to compute the p-norm (p = 2 gives the Euclidean norm). Let N = a.length:

    $\|a\|_p := \left( \sum_{i=0}^{N-1} \lfloor |a_i|^p \rfloor \right)^{1/p}$

Write an expression for the p-norm using sumSegment:

    def pNorm(a: Array[Int], p: Double): Int =
      power(sumSegment(a, p, 0, a.length), 1/p)

Next, use instead the following alternative formula (split the sum):

    $\|a\|_p := \left( \sum_{i=0}^{\lfloor N/2 \rfloor - 1} \lfloor |a_i|^p \rfloor \;+ \sum_{i=\lfloor N/2 \rfloor}^{N-1} \lfloor |a_i|^p \rfloor \right)^{1/p}$

What is the new expression?

Computing p-norm in Two Parts

Let mid = N/2, and split the indices into part1 = 0 .. mid-1 and part2 = mid .. N-1:

    def pNormTwoPart(a: Array[Int], p: Double): Int = {
      val mid = a.length / 2
      val (part1, part2) = (sumSegment(a, p, 0, mid),
                            sumSegment(a, p, mid, a.length))
      power(part1 + part2, 1/p)
    }

We computed a pair with two invocations, stored it into two values, then summed them. Essentially the same computation.
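Since Int addition is exact and associative, the split-sum formula gives exactly the same value as the direct one. A small sketch (repeating the assumed helpers so it is self-contained) checks this:

```scala
// Equivalence of direct pNorm and the two-part split: both add up the
// same Int terms and apply the same final power call.
object PNormTwoPartDemo {
  def power(x: Int, p: Double): Int =
    math.exp(p * math.log(math.abs(x))).toInt

  def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int = {
    var i = s; var sum = 0
    while (i < t) { sum += power(a(i), p); i += 1 }
    sum
  }

  // direct version: one pass over the whole array
  def pNorm(a: Array[Int], p: Double): Int =
    power(sumSegment(a, p, 0, a.length), 1 / p)

  // split version: two halves, summed afterwards
  def pNormTwoPart(a: Array[Int], p: Double): Int = {
    val mid = a.length / 2
    val (part1, part2) = (sumSegment(a, p, 0, mid),
                          sumSegment(a, p, mid, a.length))
    power(part1 + part2, 1 / p)
  }

  def main(args: Array[String]): Unit = {
    val a = Array(3, -1, 4, 1, 5)
    println(pNorm(a, 2.0) == pNormTwoPart(a, 2.0))  // prints true
  }
}
```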

Computing Two Parts in Parallel

This was two parts sequentially; nothing new here:

    def pNormTwoPart(a: Array[Int], p: Double): Int = {
      val mid = a.length / 2
      val (part1, part2) = (sumSegment(a, p, 0, mid),
                            sumSegment(a, p, mid, a.length))
      power(part1 + part2, 1/p)
    }

Two parts in parallel:

    def pNormTwoPart(a: Array[Int], p: Double): Int = {
      val mid = a.length / 2
      val (part1, part2) = parallel(sumSegment(a, p, 0, mid),
                                    sumSegment(a, p, mid, a.length))
      power(part1 + part2, 1/p)
    }

The only difference: invoke parallel on the pair of computations!

A Minimal Interface for Parallel Programming

    def parallel[A, B](taskA: => A, taskB: => B): (A, B) = { ... }

- The type of the result is the same as the type of its arguments.
- The running time can be the maximum of the two times (instead of their sum):
  - if one task is short, we need to wait for the second
  - if there are no available parallel resources, it executes the tasks sequentially (sum)
- Invoking parallel always carries a significant constant overhead (do not do it if not needed).
- The arguments taskA, taskB are call by name.
  - What would happen if they were call by value (without => before their types)?

The implementation of parallel is not important for now (on the JVM, calls to it mostly reduce to Java threads, which typically reduce to OS threads, which can run on different CPU cores).
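The slide notes that on the JVM, parallel mostly reduces to Java threads. As a rough illustration of what that might look like (a sketch, not the course's actual implementation), the by-name parameters are what prevent taskA and taskB from being evaluated before the threads are set up:

```scala
// Minimal thread-based sketch of parallel (illustrative only). taskB runs
// on a fresh JVM thread while taskA runs on the current one; join() waits
// for taskB and makes its result visible to the current thread.
object ParallelSketch {
  def parallel[A, B](taskA: => A, taskB: => B): (A, B) = {
    var resultB: Option[B] = None
    val t = new Thread {
      override def run(): Unit = { resultB = Some(taskB) }
    }
    t.start()            // begin evaluating taskB on the new thread
    val resultA = taskA  // evaluate taskA on the current thread
    t.join()             // wait until taskB has finished
    (resultA, resultB.get)
  }

  def main(args: Array[String]): Unit = {
    val (x, y) = parallel(1 + 1, "a" + "b")
    println((x, y))  // prints (2,ab)
  }
}
```

Note that if taskA and taskB were by-value parameters, both would be evaluated sequentially at the call site, before parallel even started.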

Can p-norm Use More Parallel Resources When Available?

Two parts in parallel:

    def pNormTwoPart(a: Array[Int], p: Double): Int = {
      val mid = a.length / 2
      val (part1, part2) = parallel(sumSegment(a, p, 0, mid),
                                    sumSegment(a, p, mid, a.length))
      power(part1 + part2, 1/p)
    }

How to do 4 parts in parallel?

    def pNormFourPart(a: Array[Int], p: Double): Int = {
      val mid1 = a.length / 4
      val mid2 = a.length / 2
      val mid3 = 3 * a.length / 4
      val ((part1, part2), (part3, part4)) =
        parallel(parallel(sumSegment(a, p, 0, mid1),
                          sumSegment(a, p, mid1, mid2)),
                 parallel(sumSegment(a, p, mid2, mid3),
                          sumSegment(a, p, mid3, a.length)))
      power(part1 + part2 + part3 + part4, 1/p)
    }
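Because regrouping the four partial sums only reassociates exact Int additions, pNormFourPart must agree with pNormTwoPart. A sketch can check this using a sequential stand-in for parallel (it returns the same values as a real implementation would, just without the speedup; helpers are repeated so the snippet is self-contained):

```scala
// Equivalence check: four-way split vs two-way split. parallel here is a
// sequential stand-in that just builds the pair of results.
object FourPartDemo {
  def parallel[A, B](taskA: => A, taskB: => B): (A, B) = (taskA, taskB)

  def power(x: Int, p: Double): Int =
    math.exp(p * math.log(math.abs(x))).toInt

  def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int = {
    var i = s; var sum = 0
    while (i < t) { sum += power(a(i), p); i += 1 }
    sum
  }

  def pNormTwoPart(a: Array[Int], p: Double): Int = {
    val mid = a.length / 2
    val (part1, part2) = parallel(sumSegment(a, p, 0, mid),
                                  sumSegment(a, p, mid, a.length))
    power(part1 + part2, 1 / p)
  }

  def pNormFourPart(a: Array[Int], p: Double): Int = {
    val mid1 = a.length / 4
    val mid2 = a.length / 2
    val mid3 = 3 * a.length / 4
    val ((part1, part2), (part3, part4)) =
      parallel(parallel(sumSegment(a, p, 0, mid1),
                        sumSegment(a, p, mid1, mid2)),
               parallel(sumSegment(a, p, mid2, mid3),
                        sumSegment(a, p, mid3, a.length)))
    power(part1 + part2 + part3 + part4, 1 / p)
  }

  def main(args: Array[String]): Unit = {
    val a = Array.tabulate(10)(i => i - 5)
    println(pNormTwoPart(a, 2.0) == pNormFourPart(a, 2.0))  // prints true
  }
}
```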

Divide and Conquer Recursive p-norm

    def pNormRec(a: Array[Int], p: Double): Int =
      power(segmentRec(a, p, 0, a.length), 1/p)

    def segmentRec(a: Array[Int], p: Double, s: Int, t: Int): Int = {
      if (t - s < threshold)
        sumSegment(a, p, s, t) // for small segments: faster to do sequentially
      else {
        val mid = s + (t - s)/2
        val (leftSum, rightSum) = parallel(segmentRec(a, p, s, mid),
                                           segmentRec(a, p, mid, t))
        leftSum + rightSum
      }
    }
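The slide's code leaves threshold as a free variable. A runnable sketch can pass it explicitly and plug in a simple thread-based parallel (again an illustration, not the course's implementation); the result must match a plain sequential sumSegment over the whole array:

```scala
// Runnable sketch of the divide-and-conquer p-norm, with threshold as an
// explicit parameter (use threshold >= 2 so the recursion always shrinks).
object SegmentRecDemo {
  // thread-based stand-in for parallel, as sketched earlier
  def parallel[A, B](taskA: => A, taskB: => B): (A, B) = {
    var res: Option[B] = None
    val t = new Thread { override def run(): Unit = { res = Some(taskB) } }
    t.start()
    val a = taskA
    t.join()
    (a, res.get)
  }

  def power(x: Int, p: Double): Int =
    math.exp(p * math.log(math.abs(x))).toInt

  def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int = {
    var i = s; var sum = 0
    while (i < t) { sum += power(a(i), p); i += 1 }
    sum
  }

  def segmentRec(a: Array[Int], p: Double, s: Int, t: Int, threshold: Int): Int =
    if (t - s < threshold)
      sumSegment(a, p, s, t)  // small segment: sequential is faster
    else {
      val mid = s + (t - s) / 2
      val (leftSum, rightSum) = parallel(segmentRec(a, p, s, mid, threshold),
                                         segmentRec(a, p, mid, t, threshold))
      leftSum + rightSum
    }

  def pNormRec(a: Array[Int], p: Double, threshold: Int): Int =
    power(segmentRec(a, p, 0, a.length, threshold), 1 / p)

  def main(args: Array[String]): Unit = {
    val a = Array.tabulate(100)(i => i % 7 - 3)
    println(segmentRec(a, 2.0, 0, a.length, 8) == sumSegment(a, 2.0, 0, a.length))  // prints true
  }
}
```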

Analyzing Parallel Programs: Notions of Work and Span

How long does our computation take?

    parallel(parallel(sumSegment(a, p, 0, mid1),
                      sumSegment(a, p, mid1, mid2)),
             parallel(sumSegment(a, p, mid2, mid3),
                      sumSegment(a, p, mid3, a.length)))

Assume that we have enough parallel threads and memory bandwidth, and that the "costs" S(e) of the different program parts are as follows:
- S(sumSegment(a, p, s, t)) = c1 * (t - s) + c2
- S(parallel(t1, t2)) = cP + max(S(t1), S(t2))

Note: if S(t1) = S(t2) = c, the second equation reduces to cP + c.

For each sumSegment above (with N = a.length):

    S(sumSegment(a, p, s, t)) = c1 * N/4 + c2
    S(parallel(sumSegment, sumSegment)) = cP + c1 * N/4 + c2
    S(parallel(parallel, parallel)) = cP + cP + c1 * N/4 + c2

For large N, this is a win compared to the sequential cost: c1 * N + c2.
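To get a feel for the formulas above, here is a tiny calculation with invented constants (c1, c2, cP are pure assumptions, not measurements): the four-way split beats the sequential estimate once N is large enough to amortize the two levels of cP overhead, and loses for small N.

```scala
// Cost model from the slide, with made-up constants for illustration.
object CostEstimate {
  val c1 = 1.0   // per-element cost in sumSegment (assumed)
  val c2 = 10.0  // fixed cost per sumSegment call (assumed)
  val cP = 100.0 // overhead of one parallel invocation (assumed)

  // sequential: one sumSegment over the whole array
  def seqCost(n: Int): Double = c1 * n + c2

  // four-way split: two nested levels of parallel, each leaf does n/4 work
  def fourWayCost(n: Int): Double = cP + cP + c1 * n / 4 + c2

  def main(args: Array[String]): Unit = {
    println(seqCost(10000))      // prints 10010.0
    println(fourWayCost(10000))  // prints 2710.0
    println(fourWayCost(100) > seqCost(100))  // overhead dominates for small n: prints true
  }
}
```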

Estimating the Cost of Our Programs

You have previously learned how to concisely characterize the behavior of sequential programs using the number of operations they perform as a function of their arguments:
- inserting an integer into a sorted linear list takes time O(n), for a list storing n integers
- inserting an integer into a balanced binary tree takes time O(log n), for a tree storing n integers

