Concurrent Programming (RIO), 8.12.2009, Lesson 12: Moore's Law, Current Research, Course Review (University of Helsinki)


Lecture 12: Current Research, Summary
– Current Research
– Course Review
– [Pat 08] [A-TKS 07] [KCHK 06]
– Topics: Moore's Law, the multiprocessor challenge of the 1980's, the multicore challenge now, course review

Moore's Law
– "The number of transistors that can be inexpensively placed on an integrated circuit is increasing exponentially, doubling approximately every two years" (orig. 18 months)
  – Gordon E. Moore, 1965
– Memory size in chips will double every two years
– The increase in transistor count is also a rough measure of computer processing performance, so processing speed has also doubled roughly every two years
– The "heat barrier" limit for processing speed was reached in 2004
  – Luke Collins, IEE Review

Problem
– Moore's Law will not give us faster processors (anymore)
– But it now gives us more processors on one chip
  – Multicore CPU: chip-level multiprocessor (CMP), (hyper)threads
– Herb Sutter, "A Fundamental Turn Toward Concurrency in SW", Dr. Dobb's Journal, 2005
– Borkar, Dubey, Kahn, et al., "Platform 2015", Intel White Paper

Multi-core: An Inflection Point also in SW Development (Marc Snir)
– Multi-core architectures: an inflection point in mainstream SW development
– Writing parallel SW is hard
  – Mainstream developers (currently) not used to thinking in parallel
  – Mainstream languages (currently) force the use of low-level concurrency features
  – Must have with new systems?
– Navigating through this inflection point requires better concurrency … (snir 2008.pdf)

Moore's Law Reinterpreted (Marc Snir)
– Number of cores per chip doubles every two years, while clock speed decreases
  – Need to utilize systems with hundreds or thousands of cores
  – Need to handle systems with millions (billions?) of concurrent threads
  – Need to emphasize scalability, not best performance for a fixed number of cores
  – Need to be able to easily replace inter-chip parallelism with intra-chip parallelism

The Multicore Challenge
– Heat barrier: a dead-end for chip speed
– So, try multicore chips now
  – Shared memory multiprocessor (on one chip)
– Similar multiprocessor HW was tried before (not at chip level)
  – Convex, Floating Point Systems, INMOS, Thinking Machines, nCUBE, Kendall Square Research, MasPar, Encore, Sequent, …
  – All failed – Patterson's "Dead Parallel Computer Society"
– John Hennessy: "… when we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. I would be panicked if I were in industry."
  (http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=445&page=4)
– Challenge: how to use multicore effectively
– Answer: industry/government funded research?
  – Scale: a Manhattan Project?

MCC 1983 - (2000)
– Re: the Japanese 5th Generation Project
  – Ministry of International Trade and Industry (MITI), Japan, 1982, 10 years, 850M
  – A new type of computer to run Artificial Intelligence applications
– US national-level response to the Japanese MITI project
  – Massively Parallel Processing (MPP): how to build and use such machines
– Microelectronics and Computer Technology Corporation (MCC), Austin, Texas
  – 12 companies in 1983: Control Data, DEC, Harris, RCA, Sperry-Univac, NCR, Honeywell, National Semiconductor, Advanced Micro Devices, Motorola, …
  – More companies later on: Microsoft, Boeing, GE, Lockheed, Martin Marietta, Westinghouse, 3M, Rockwell, Kodak, Nokia 1997, DoD, …
  – Some 450 researchers in 1986
  – Budget 50-100M per year, for 20 years?

MCC 1983 - (2000) (contd)
– CEO: Admiral Bobby Ray Inman (ex NSA #1, ex CIA #2)
  – Well connected; new law to avoid antitrust problems, 1984
– Main research areas (programs)
  – Software technology, semiconductor packaging, VLSI computer-aided design, parallel processing, database management, human interfaces, and artificial intelligence / knowledge-based systems
– Parallel Processing Program
  – "What type of MPP computer to build and how to use it?"
  – VP Peter Patton 1983-86, VP Stephen Lundstrom 1986-87
  – Some 30 researchers for 3 years, lots of resources
  – Lots of published (and proprietary) research papers
  – Reviews every 3 months, for knowledge transfer from MCC to the shareholding companies
  – Could not formulate its goal properly; started to disintegrate in 1987

MPP Challenge in the early 1980's
– Shared memory multiprocessors
  – With cache coherence
  – A memory bus can handle some 20 processors
– More processors bring new problems
  – Interconnect networks: all-to-all, butterfly, cube-connected cycles, hypercube, …
  – Multi-level memory, varying memory access times: local, shared, node, network, …
– Operating systems
  – Shared memory OS, distributed OS?
– Compiler technology – ouch!
– Applications tailored for just one system

The Multicore Challenge (contd)
– The Multicore Challenge
  – How to use multicore / shared-memory multiprocessors effectively?
  – Answer: industry/government funded research for many years (?)
– The Challenge Challenge
  – Is the Multicore Challenge the right challenge to take? It might fail (again)
  – What other challenges would better suit our need for ever faster computers?
– The Real Challenge (?)
  – How to get computational speed to double every two years (or every 18 months) for all computations, just like before?
  – Currently there is no answer
  – What if there is no answer, ever? Is that OK?
  – Get computational speed to double every two years for some computations? This is doable, but is it enough? Which "some"?

Needs for Multicore Challenge
– How to synchronize processes in multiprocessor systems?
– How to make it easy to obtain parallel performance?
– Scalable solutions (to many, and ever more, processors)
– Avoid deadlocks
– Data consistency in error situations (abort support)
– Current programming languages force low-level solutions
– Amdahl's Law: the proportion of serial code gives an upper limit on speedup (sketch at the end of this page)

    Speedup = 1 / ((1 - P) + P/N)

  where P = proportion of parallel code and N = number of processors
  – With 5% serial code, the maximum speedup is 20, no matter how many processors are used
  – The serial code proportion (1 - P) should be almost zero. How?

US Multicore Challenge Projects
– Target
  – 1000-core (or more) processors
  – Parallel algorithms, development environments, and runtime systems that scale to 1000s of hardware threads
  – Need new OS & HW architectures
– What exactly is needed? That is the problem!
  – What type of MPP computer to build and how to use it?
– Tools
  – FPGA-based simulators to test out work
  – Field-Programmable Gate Array: reprogram HW to test new HW ideas
– Funding problem – another challenge?
  – Defense Advanced Research Projects Agency (DARPA) funding receding in 2000-2008
  – Would have been needed big-time

John Hennessy (Jan 2007)
– "When we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced."
– "I would be panicked if I were in industry."
  (http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=445&page=3)

US Projects 2008
– Stanford University: Pervasive Parallelism Lab
– University of California at Berkeley: Parallel Computing Lab
– University of Illinois at Urbana-Champaign: The Universal Parallel Computing Research Center
– The Multicore Association: many companies share results

Stanford: Pervasive Parallelism Lab
– Stanford 2008, 6M for 3 years, 9 faculty + 30 grad students
– Nvidia, Sun Microsystems, Advanced Micro Devices, Hewlett-Packard, IBM, Intel
– William Dally (chairman, Stanford CS dept)
– Goal: the Parallel Computing Platform for 2012
  – Make parallel programming practical for the masses
– Applications, algorithms, programming models, runtime systems, programming and SW systems
– Architectures for scalable parallelism
  – Stream computing, transactional memory
  – 10,000s of HW threads
– Enable use of parallelism beyond traditional scientific computing
– New ideas for high-level concurrency abstractions
– New ideas for hardware support for new paradigms
– Parallel computing as a core component of CS education
– Build a real, full system (PPL.pdf)
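To make the Amdahl's Law bound on the "Needs for Multicore Challenge" slide above concrete, here is a minimal Java sketch (my illustration, not part of the original slides; the class and variable names are my own) that evaluates the formula for a few processor counts:

    // Amdahl's Law: speedup with parallel fraction P on N processors.
    public class Amdahl {
        static double speedup(double p, int n) {
            return 1.0 / ((1.0 - p) + p / n);
        }

        public static void main(String[] args) {
            double p = 0.95; // 5% serial code, as on the slide
            for (int n : new int[] {10, 100, 1000, 1000000}) {
                System.out.printf("N = %7d -> speedup = %.1f%n", n, speedup(p, n));
            }
            // As N grows, speedup approaches 1 / (1 - P) = 20: the 5% of
            // serial code caps the speedup no matter how many cores we add.
        }
    }

With 5% serial code the speedup is about 6.9 on 10 processors and 16.8 on 100, and it can never exceed 20.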

Universal Parallel Computing Research Centers (UPCRCs)
– Funding: Intel & Microsoft
  – Dan Reed (Microsoft, Extreme Computing Group)
– Univ. of California at Berkeley: Parallel Computing Lab
  – David A. Patterson
– Univ. of Illinois at Urbana-Champaign: The Universal Parallel Computing Research Center
  – Marc Snir & Wen-mei Hwu

Berkeley: David A. Patterson (Aug 2008)
– "Knowing what we know today, if we could go back in time we would have launched a Manhattan Project to bring together the best minds in applications, software architecture, programming languages and compilers, libraries, testing and correctness, operating systems, hardware architecture, and chip design to tackle this parallel challenge."
– "We need the US Government to return to its historic role to bring the many more minds on these important problem. To make real progress, we would need a long-term, multi-hundred million dollar per year program."

Berkeley: Parallel Computing Lab
– David A. Patterson, 10 faculty + 40 grad students, 17M for 5 years
– Need new 21st-century applications
  – Medicine, image, music, speech, …
  – Health Care, Speech Recognition, New Music and Audio Technologies, Content-based Image Retrieval, Parallel Browser, puppet-driven games, …
– Computational bottleneck benchmarks
  – 7 basic programming tasks at the heart of most parallel programs?
  – 7 dwarfs; use them to analyse HW/SW designs
– Parallel SW development (55% of faculty)
  – Implement 13 dwarfs as libraries or frameworks
  – Efficiency layer by experts (mutex, deadlock, etc.)
  – Productivity layer by "normal" programmers
  – Create a Composition and Coordination (C&C) language
  – 21st-century code generation
– OS and architecture
  – Very thin …
– Compositional verification and testing

Berkeley: RAMP – build their own academic manycore HW
– Research Accelerator for Multiple Processors
– Berkeley Emulation Engine v3 (BEE3)
– 4 FPGAs/board, 21 boards (84 FPGAs), 12 32-bit RISC cores per FPGA
  – 1008-core RAMP Blue
– Other architectures by FPGA redesign
– RAMPants: 10 faculty at Berkeley, CMU, MIT, Stanford, Texas, Washington
– Create HW & SW for the manycore community

Illinois: The Universal Parallel Computing Research Center (UPCRC)
– Marc Snir (Nov 2008):
  – "It is possible that parallel programming is inherently hard, in which case, indeed the sky is falling."
  – "An alternative view is that, intrinsically, parallel programming is not significantly harder than sequential programming; rather, it is hampered by the lack of adequate languages, tools and architectures."

Illinois (contd)
– Marc Snir & Wen-mei Hwu, 17M for 5 years
– Parallel programming can be (should be?) child's play
  – Simplicity is …
– Moore's Law Reinterpreted
  – Number of cores per chip doubles every two years, while clock speed decreases
  – Need to be able to easily replace inter-chip parallelism with intra-chip parallelism
– Memory wall
  – Most of the area and energy in the chip budget is spent on storing and moving bits
– Reliability and variance
  – MTTF (mean time to failure) per chip does not decrease
  – Hardware is used to mask errors; programmers do not have to handle faults
– Simpler languages, more complex architectures
  – A feast for compiler developers
– What hooks can HW provide to facilitate programming?
  – Sync primitives, debug/performance support
  (snir 2008.pdf)

Illinois (contd)
– Serial semantics, parallel performance model
– Every computer scientist educated to "think parallel"
  – Make parallel programming synonymous with programming
– There is no silver bullet – no single technology solution
– New emphasis on deterministic (repeatable) parallel computation models: focus on producer-consumer or barrier synchronization, not on nondeterministic mutual exclusion (sketch below)
  – Parallel algorithms are designed by programmers, not inferred by compilers

The Multicore Association
– Intel, NSN, Texas Instruments, Plurality, Wind River, PolyCore, Samsung, …
  (http://www.multicore-association.org/home.php)
– Multicore Communications API (MCAPI) working group (wg)
  – Capture the basic elements of communication and synchronization for closely distributed embedded systems
– Multicore Programming Practices (MPP) wg
  – Develop a multicore software programming guide for the industry that will aid in improving consistency and understanding of multicore programming issues
– Multicore Resource Management API (MRAPI) wg
  – Specify essential application-level resource management capabilities needed by multicore applications
– Hypervisor wg
  – Support hypervisor (multiple OS's on the same host) portability and multicore capabilities

Example architectures
– Tesla unified graphics and computing architecture
– Intel Teraflops Research Chip wafer
  – Cool 80-core chip: block matrix operations, 1 TFLOPS at 1.0 V at 110 C
– The Cell processor
– Fast Roadrunner system
  – 12,960 Cells + 6,948 dual-core Opterons (I/O); 116,640 cores in total
  – 1 PFLOPS, 2.3 MW, 90 km of fiber-optic cable, 500 m2
  (http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=532&page=6)
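As a small illustration of the producer-consumer style emphasized on the Illinois slide above (a minimal sketch of mine, not part of the course material; the class names and the poison-value convention are my own), Java's java.util.concurrent provides a bounded blocking queue that hides the low-level mutual exclusion entirely:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ProducerConsumer {
        private static final int END = -1; // poison value marking end of stream

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(4); // bounded buffer

            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < 10; i++) {
                        buffer.put(i);   // blocks while the buffer is full
                    }
                    buffer.put(END);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            Thread consumer = new Thread(() -> {
                try {
                    int item;
                    while ((item = buffer.take()) != END) { // blocks while empty
                        System.out.println("consumed " + item);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            consumer.start();
            producer.join();
            consumer.join();
        }
    }

The programmer never touches a mutex here; all synchronization is expressed as dataflow through the queue, which is exactly the deterministic style the slide advocates.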

Likely Problems with the Multicore Challenge
– Computation + communication + memory use
  – Optimize on overall time/space?
– How to distribute memory and access it
  – Local core memory in cache?
  – Memory hierarchy reworked? Disks disappear?
– How to implement I/O and databases
  – How to distribute disks or other permanent stores
– Avoid communication & I/O bottlenecks
  – Masters & slaves, control hierarchies, …
  – Remember Amdahl's Law
  – Minimize communication and serialization

More Likely Problems with the Multicore Challenge
– Symmetric Multiprocessor (SMP) scales up only to some point (remember the 1980's)
– Must have interconnect networks
  – Between cores in a chip; between chips, boards, and nodes
– Test new processor architectures with FPGAs
– Operating system for new architectures?
– New languages for new architectures?
– Good compilers for new architectures?
– May end up with lots of different architectures
  – Applications run well only on one system?
  – E.g., systems based on the STI Cell vs. the Intel 80-core chip?

Even More Likely Problems with the Multicore Challenge
– Scalable processor architecture needs to be at least 3D?
  – Is 3D enough?
  – Stacked chips 2008
– Real stress to get results fast
  – Single-core dead-end?
  – New architectures designed and built now
  – Important applications built on new architectures
– What happens if all these projects fail?
  – Do we get heterogeneous architectures that are incompatible with each other's code?

Current Research Summary
– Moore's Law and what it means now
– What type of MPP computer to build and how to use it?
  – We have jumped onto the multicore train – is that OK?
– Some projects in universities & industry 2008
  – Too small scale? 10M's, but not 100M's or 1000M's
– No silver bullet?
  – Get computational speed to double every two years for some computations? Is this enough?
– Make parallel programming synonymous with programming?
  – Only experts program the "efficiency layer" in SW?

Concurrent Programming with New Architectures
– Minimize synchronization and communication
  – Amdahl's Law
  – Barrier synchronization is often good (sketch below)
  – Avoid any complex synchronizations that do not scale up
  – Mutual exclusion should not be used in computational work
– Use shared memory when possible
  – Faster than sharing data with messages
– How to partition the problem so that overall solution time is minimized?
  – Distribute computing and data
  – Minimize communication and synchronization
  – Trade computing time for communication time
– Prepare for faults
  – Many components; some will fail; must be prepared
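A minimal sketch of the barrier synchronization recommended above, using Java's standard CyclicBarrier (illustrative only; the slides do not prescribe any particular library, and the class and variable names are my own). Each worker computes one phase, then waits until every worker has reached the barrier before any of them starts the next phase:

    import java.util.concurrent.CyclicBarrier;

    public class BarrierDemo {
        public static void main(String[] args) {
            final int workers = 4;
            // The barrier action runs once per phase, after all workers arrive.
            CyclicBarrier barrier = new CyclicBarrier(workers,
                    () -> System.out.println("--- phase complete ---"));

            for (int id = 0; id < workers; id++) {
                final int myId = id;
                new Thread(() -> {
                    try {
                        for (int phase = 0; phase < 3; phase++) {
                            System.out.println("worker " + myId + " computes phase " + phase);
                            barrier.await(); // block until every worker finishes this phase
                        }
                    } catch (Exception e) { // InterruptedException, BrokenBarrierException
                        Thread.currentThread().interrupt();
                    }
                }).start();
            }
        }
    }

The appeal of this pattern for scalability is that within a phase the workers need no mutual exclusion at all; they synchronize only at the phase boundary.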

Course Review
– Concurrency
  – Problems, atomic statements
  – Critical sections, synchronization, communication
  – What are the problems in writing concurrent programs?
– Disabling interrupts, busy wait, or suspension?
  – When to (not) use? HW vs. SW solution?
  – Shared memory or not? One system or a distributed system?
– Proofs of correctness
  – Functionality
  – Mutex, no deadlock, no starvation
  – How to do the proofs with temporal logic?
  – How to determine what you really are trying to prove?
– Concurrent algorithms

Course Review (contd)
– Deadlock
  – Detection, prevention, avoidance (DDA); Banker's algorithm
– Semaphores
  – Private semaphores, split semaphores, baton passing, implementation, busy-wait semaphores (sketch below)
– Monitors, protected objects
  – Condition variables, signal semantics
– Messages, RPC, channels, rendezvous
– Distributed mutex
  – (Token passing) Ricart-Agrawala, Neilsen-Mizuno
– Current research

Course Review (contd)
– Basic problems and solutions for them
  – Dining philosophers, sleeping barber, bakery
  – Readers-writers, producer-consumer
– Distributed systems
  – Concurrency control mechanisms
  – Solutions for the critical section problem

What Should You Know?
– When to use what method for critical sections (mutex), synchronization, or communication?
– How do you know you have a mutex problem?
– When would you use busy waits, semaphores, monitors, protected objects, RPC, channels, or rendezvous?
– How do you implement XYZ with busy waits, semaphores, monitors, protected objects, RPC, channels, or rendezvous?
– When is some technology not appropriate?
– How do you prove the correctness of (concurrent) programs?

What Should You Know? (contd)
– What OS / programming-language library tools do you have for critical section, synchronization, and communication problems?
– When do you need concurrent/distributed algorithms?
  – If a serial solution is OK, use it!
– What do you need to study to solve your problem?
– What type of tools would you need to solve your problem?
– How does current research apply to me?
  – Do I need to study MPI (Message Passing Interface)?

What Next at TKTL?
– An Introduction to Specification and Verification (Spesifioinnin ja verifioinnin perusteet)
– Operating Systems (Käyttöjärjestelmät)
– Distributed Systems (Hajautetut järjestelmät)
  – Concurrency problems in distributed systems
– Software Design, Java (Ohjelmointitekniikka, Java)
  – Concurrency tools for Java programming
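As a reminder of the semaphore mechanism reviewed above, here is a minimal Java sketch (illustrative only; it uses java.util.concurrent.Semaphore rather than the course's own notation, and the class and variable names are my own) of a binary semaphore guarding a critical section:

    import java.util.concurrent.Semaphore;

    public class MutexDemo {
        private static final Semaphore mutex = new Semaphore(1); // binary semaphore
        private static int counter = 0;                          // shared variable

        public static void main(String[] args) throws InterruptedException {
            Runnable task = () -> {
                for (int i = 0; i < 100000; i++) {
                    mutex.acquireUninterruptibly(); // P(mutex): enter critical section
                    try {
                        counter++;                  // only one thread at a time here
                    } finally {
                        mutex.release();            // V(mutex): leave critical section
                    }
                }
            };
            Thread a = new Thread(task), b = new Thread(task);
            a.start(); b.start();
            a.join(); b.join();
            System.out.println("counter = " + counter); // always 200000
        }
    }

Without the semaphore the unsynchronized counter++ would lose updates; with it, the increment is a proper critical section, mirroring the P/V discipline covered in the course.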

-- The End --
