ELE 475 / COS 475 Computer Architecture Lecture 13 .

Computer Architecture
ELE 475 / COS 475
Slide Deck 12: Multithreading
David Wentzlaff
Department of Electrical Engineering
Princeton University

Agenda
- Multithreading Motivation
- Coarse-Grain Multithreading
- Simultaneous Multithreading

Multithreading
- Difficult to continue to extract instruction-level parallelism (ILP) or data-level parallelism (DLP) from a single sequential thread of control
- Many workloads can make use of thread-level parallelism (TLP)
  – TLP from multiprogramming (run independent sequential jobs)
  – TLP from multithreaded applications (run one job faster using parallel threads)
- Multithreading uses TLP to improve utilization of a single processor

Pipeline Hazards

                  t0  t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12 t13 t14
LW r1, 0(r2)      F   D   X   M   W
LW r5, 12(r1)         F   D   D   D   D   X   M   W
ADDI r5, r5, #12          F   F   F   F   D   D   D   D   X   M   W
SW 12(r1), r5                             F   F   F   F   D   D   D   D

- Each instruction may depend on the one before it
- What is usually done to cope with this?
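
To make the stall pattern above concrete, here is a minimal Python sketch that recomputes it. It assumes a classic in-order F-D-X-M-W pipe with interlocks and no bypassing, where a register written in W becomes readable in D only on the following cycle, and where the instruction behind a stalled one simply repeats F. The `Instr` class and the `schedule`/`render` helpers are illustrative names, not part of the course material.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str              # assembly text, used only for printing
    dest: Optional[str]    # register written in W (None for stores)
    srcs: Tuple[str, ...]  # registers read in D


def schedule(program: List[Instr]) -> List[Tuple[str, int, List[str]]]:
    """Stage timing for an in-order F-D-X-M-W pipe, interlocked, no bypassing.

    Simplifying assumptions for this sketch:
      * a register written in W in cycle t is readable in D from cycle t + 1;
      * an instruction waits in D until all of its sources are readable;
      * the instruction behind a stalled one stays in F (repeated 'F').
    Returns (text, first_cycle, stage_letters) for each instruction.
    """
    write_cycle = {}             # register -> W cycle of its latest producer
    rows = []
    prev_d_start, prev_x = 0, 0  # cycles the previous instr entered D and X
    for ins in program:
        f_start = prev_d_start                  # F frees up when prev enters D
        d_start = max(f_start + 1, prev_x)      # D frees up when prev enters X
        ready = max((write_cycle.get(r, -1) + 1 for r in ins.srcs), default=0)
        d_end = max(d_start, ready)             # last cycle spent in D
        x = d_end + 1
        stages = (["F"] * (d_start - f_start)
                  + ["D"] * (d_end - d_start + 1)
                  + ["X", "M", "W"])
        if ins.dest is not None:
            write_cycle[ins.dest] = x + 2       # W is two cycles after X
        rows.append((ins.text, f_start, stages))
        prev_d_start, prev_x = d_start, x
    return rows


def render(rows) -> str:
    """Print the rows as a pipeline diagram, one 4-character column per cycle."""
    n_cycles = max(start + len(stages) for _, start, stages in rows)
    lines = [" " * 18 + "".join(f"t{c:<3}" for c in range(n_cycles))]
    for text, start, stages in rows:
        lines.append(f"{text:<18}" + "    " * start + "   ".join(stages))
    return "\n".join(lines)


program = [
    Instr("LW r1, 0(r2)", "r1", ("r2",)),
    Instr("LW r5, 12(r1)", "r5", ("r1",)),
    Instr("ADDI r5, r5, #12", "r5", ("r5",)),
    Instr("SW 12(r1), r5", None, ("r1", "r5")),
]
print(render(schedule(program)))
```

Under these assumptions the sketch reproduces the F/D columns of the diagram and also shows the X, M, W stages of the store that the slide cuts off after t13.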

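Multithreading
How can we guarantee no dependencies between instructions in a pipeline?
-- One way is to interleave execution of instructions from different program threads on same pipeline

Interleave 4 threads, T1-T4, on non-bypassed 5-stage pipe

                    t0  t1  t2  t3  t4  t5  t6  t7  t8  t9
T1: LW r1, 0(r2)    F   D   X   M   W
T2: ADD r7, r1, r4      F   D   X   M   W
T3: XORI r5, r4, #12        F   D   X   M   W
T4: SW 0(r7), r5                F   D   X   M   W
T1: LW r5, 12(r1)                   F   D   X   M   W

Each thread issues only every fourth cycle, so the second T1 instruction reads r1 in D at t5, after the first T1 instruction has written it back at t4; no bypassing or interlocks are needed.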
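
A small sketch of the fixed interleave above, assuming the same non-bypassed 5-stage pipe and no same-cycle write-then-read of the register file. It issues one instruction per cycle from the threads in round-robin order and computes how many threads are needed before same-thread instructions are far enough apart that no interlock is ever required. The function names are hypothetical. Each thread has its own register file, so T2's read of r1 does not depend on T1's load.

```python
# Classic 5-stage pipe without bypassing: F=0, D=1, X=2, M=3, W=4
STAGES = ("F", "D", "X", "M", "W")
D_IDX, W_IDX = STAGES.index("D"), STAGES.index("W")


def min_threads_for_hazard_freedom() -> int:
    """Smallest round-robin thread count N such that an instruction never
    reads a register before its own thread's previous instruction wrote it.
    Same-thread instructions are fetched N cycles apart, so the successor
    reads in D at cycle N + D_IDX (relative to the predecessor's F), while
    the value is readable only from cycle W_IDX + 1.  Need N + D_IDX >= W_IDX + 1."""
    return W_IDX + 1 - D_IDX


def fixed_interleave(programs):
    """Round-robin issue: cycle c fetches the next instruction of thread
    c mod N.  Returns a list of (cycle, thread_name, instruction)."""
    names = list(programs)              # dicts keep insertion order
    cursors = {name: 0 for name in names}
    issued, cycle = [], 0
    while any(cursors[n] < len(programs[n]) for n in names):
        name = names[cycle % len(names)]
        if cursors[name] < len(programs[name]):
            issued.append((cycle, name, programs[name][cursors[name]]))
            cursors[name] += 1
        # otherwise this thread's slot would become a pipeline bubble
        cycle += 1
    return issued


programs = {
    "T1": ["LW r1, 0(r2)", "LW r5, 12(r1)"],
    "T2": ["ADD r7, r1, r4"],
    "T3": ["XORI r5, r4, #12"],
    "T4": ["SW 0(r7), r5"],
}
for cycle, thread, text in fixed_interleave(programs):
    print(f"t{cycle}: {thread}: {text}")
print("threads needed:", min_threads_for_hazard_freedom())   # 4
```

If the register file could write in the first half of a cycle and read in the second, the same reasoning would give one fewer thread; the sketch deliberately follows the diagram, where the read happens a full cycle after write-back.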

Simple Multithreaded Pipeline
[Figure: four PCs and four GPR files share a single pipeline; a 2-bit thread-select signal chooses which PC and which GPR file are used]
- Have to carry thread select down pipeline to ensure correct state bits read/written at each pipe stage
- Appears to software (including OS) as multiple, albeit slower, CPUs
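
The thread-select idea can be sketched as a toy Python model, assuming four threads, a fixed round-robin thread select, and a pipeline abstracted to a shift register of latches. What it illustrates is that the thread id travels with each instruction, so write-back updates the register file of the thread that issued the instruction rather than the thread currently being fetched. `Uop`, `MTPipeline`, and `step` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Uop:
    tid: int      # thread-select bits, copied into every pipeline latch
    dest: int     # destination register number
    value: int    # result (computed elsewhere in this toy model)


@dataclass
class MTPipeline:
    n_threads: int = 4
    depth: int = 4   # latches between thread select and write-back
    pcs: List[int] = field(init=False, default_factory=list)
    gprs: List[List[int]] = field(init=False, default_factory=list)
    latches: List[Optional[Uop]] = field(init=False, default_factory=list)

    def __post_init__(self):
        self.pcs = [0] * self.n_threads                        # per-thread PC
        self.gprs = [[0] * 32 for _ in range(self.n_threads)]  # per-thread GPRs
        self.latches = [None] * self.depth

    def step(self, cycle: int, dest: Optional[int] = None, value: int = 0):
        """One clock: retire the oldest uop, then shift in a new one (or a bubble)."""
        oldest = self.latches[-1]
        if oldest is not None:
            # the thread id carried with the uop selects which GPR file to write
            self.gprs[oldest.tid][oldest.dest] = oldest.value
        new = None
        if dest is not None:
            tid = cycle % self.n_threads   # fixed round-robin thread select
            self.pcs[tid] += 4             # only that thread's PC advances
            new = Uop(tid, dest, value)
        self.latches = [new] + self.latches[:-1]


p = MTPipeline()
for c in range(4):                  # each thread writes its own r1
    p.step(c, dest=1, value=10 * (c + 1))
for c in range(4, 4 + p.depth):     # drain the pipe with bubbles
    p.step(c)
print([regs[1] for regs in p.gprs])   # [10, 20, 30, 40]
```

Dropping `tid` from `Uop` and writing to the thread selected at fetch time would send every result to the wrong register file, which is the failure the slide's first bullet guards against.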

Multithreading Costs
- Each thread requires its own user state
  – PC
  – GPRs
- Each thread also needs its own system state
  – virtual memory page table base register
  – exception handling registers
  – other system state
- Other overheads:
  – additional cache/TLB conflicts from competing threads
  – (or add larger cache/TLB capacity)
  – more OS overhead to schedule more threads (where do all these threads come from?)

Thread Scheduling Policies
- Fixed interleave (CDC 6600 PPUs, 1964)
  – Each of N threads executes one instruction every N cycles
  – If thread not ready to go in its slot, insert pipeline bubble
  – Can potentially remove bypassing and interlocking logic
- Software-controlled interleave (TI ASC PPUs, 1971)
  – OS allocates S pipeline slots amongst N threads
  – Hardware performs fixed interleave over S slots, executing whichever thread is in that slot
- Hardware-controlled thread scheduling (HEP, 1982)
  – Hardware keeps track of which threads are ready to go
  – Picks next thread to execute based on hardware priority scheme
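
The three policies can be compared with a small Python sketch. The workload model is an assumption made only for illustration: each thread is a list of "gaps", where `gaps[t][j]` is how many cycles after thread t issues instruction j its next instruction becomes ready (1 means back-to-back). A fixed interleave is a slot table `[0, 1, ..., N-1]`, a software-controlled interleave is any OS-chosen slot table, and the hardware scheduler is modeled as round-robin over ready threads, which is only one possible priority scheme. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

Gaps = List[List[int]]   # gaps[t][j]: cycles until thread t's next instr is ready


def _done(nxt, gaps):
    return all(nxt[t] == len(gaps[t]) for t in range(len(gaps)))


def run_slots(gaps: Gaps, slots: Sequence[int]) -> List[Optional[int]]:
    """Fixed or software-controlled interleave: cycle c belongs to thread
    slots[c % len(slots)]; if that thread is not ready, the slot is a bubble."""
    assert set(range(len(gaps))) <= set(slots), "every thread needs a slot"
    ready, nxt, trace, c = [0] * len(gaps), [0] * len(gaps), [], 0
    while not _done(nxt, gaps):
        t = slots[c % len(slots)]
        if nxt[t] < len(gaps[t]) and ready[t] <= c:
            ready[t] = c + gaps[t][nxt[t]]
            nxt[t] += 1
            trace.append(t)
        else:
            trace.append(None)            # pipeline bubble
        c += 1
    return trace


def run_hw_scheduled(gaps: Gaps) -> List[Optional[int]]:
    """Hardware-controlled scheduling: each cycle, issue from the next ready
    thread in round-robin order after the last thread that issued."""
    n = len(gaps)
    ready, nxt, trace, last, c = [0] * n, [0] * n, [], n - 1, 0
    while not _done(nxt, gaps):
        pick = next((t for t in ((last + k) % n for k in range(1, n + 1))
                     if nxt[t] < len(gaps[t]) and ready[t] <= c), None)
        if pick is not None:
            ready[pick] = c + gaps[pick][nxt[pick]]
            nxt[pick] += 1
            last = pick
        trace.append(pick)                # None means no thread was ready
        c += 1
    return trace


# thread 0: six back-to-back instructions; thread 1: two instructions, 3 cycles apart
gaps = [[1] * 6, [3, 1]]
for name, trace in [("fixed [0,1]", run_slots(gaps, [0, 1])),
                    ("software [0,0,1]", run_slots(gaps, [0, 0, 1])),
                    ("hardware", run_hw_scheduled(gaps))]:
    print(f"{name:17} cycles={len(trace):2} bubbles={trace.count(None)}")
# fixed: 11 cycles, 3 bubbles; software: 8 cycles, 0; hardware: 8 cycles, 0
```

In this made-up workload the fixed interleave wastes the slots of a thread that is waiting or finished, an OS-chosen slot table can recover them if it knows the mix in advance, and the hardware scheduler adapts without being told.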

Coarse-Grain Hardware Multithreading
- Some architectures do not have many low-latency bubbles
- Add support for a few threads to hide occasional cache miss latency
- Swap threads in hardware on cache miss
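
A toy Python model of switch-on-miss multithreading, to show why a few threads can hide occasional miss latency. The model and its parameters are assumptions, not measurements: every thread runs `run_len` cycles and then misses, each miss takes `miss_latency` cycles, switching to a different thread costs `switch_cost` dead cycles (for example to drain and refill the pipeline), and the next thread is the earliest-ready one with ties broken round-robin. `coarse_grain_utilization` is a hypothetical name.

```python
def coarse_grain_utilization(n_threads, run_len, miss_latency, switch_cost,
                             n_runs=10_000):
    """Fraction of cycles spent doing useful work under switch-on-miss."""
    ready_at = [0] * n_threads   # cycle at which each thread's miss returns
    t, busy, last = 0, 0, None
    for _ in range(n_runs):
        # candidate order: round-robin starting after the last thread run
        order = (list(range(n_threads)) if last is None
                 else [(last + k) % n_threads for k in range(1, n_threads + 1)])
        tid = min(order, key=lambda x: max(ready_at[x], t))  # earliest ready
        start = max(t, ready_at[tid])        # sit idle if nobody is ready
        if last is not None and tid != last:
            start += switch_cost             # penalty for switching threads
        t = start + run_len                  # run until the next miss
        busy += run_len
        ready_at[tid] = t + miss_latency     # thread blocks on its miss
        last = tid
    return busy / t


for n in (1, 2, 3, 4):
    u = coarse_grain_utilization(n, run_len=10, miss_latency=20, switch_cost=2)
    print(n, round(u, 3))
```

With these made-up inputs utilization climbs from about 0.33 with one thread to about 0.63 with two, then saturates near 10/12 ≈ 0.83 once enough threads cover the miss latency; beyond that point only a cheaper switch helps.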

Denelcor HEP (Burton Smith, 1982)
[Image: BRL HEP machine]
- First commercial machine to use hardware threading in main CPU
  – 120 threads per processor
  – 10 MHz clock rate
  – Up to 8 processors
  – precursor to Tera MTA / Cray XMT (Multithreaded Architecture)

Tera (Cray) MTA (1990)
- Up to 256 processors
- Up to 128 active threads per processor
- Processors and memory modules populate a sparse 3D torus interconnection fabric
- Flat, shared main memory
  – No data cache
  – Sustains one main memory access per cycle per processor
- GaAs logic in prototype, 1 kW/processor @ 260 MHz
  – Second version CMOS, MTA-2, 50 W/processor
  – New version, XMT, fits into AMD Opteron socket, runs at 500 MHz
[Image credit: Tera Computer Company]

MTA Pipeline
[Figure: MTA pipeline with instruction fetch, issue pool, M/A/C execution units with write-back (W), write pool, memory pool, retry pool, interconnection network, and memory pipeline]
- Every cycle, one VLIW instruction from one active thread is launched into the pipeline
- Instruction pipeline is 21 cycles long
- Memory operations incur 150 cycles of latency
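
A back-of-the-envelope Little's-law estimate (concurrency needed = throughput × latency) gives some intuition for why the MTA provides so many thread contexts. It assumes, as a simplification, that each thread has at most one instruction in flight, so a thread can issue again only after its previous instruction completes; any per-thread lookahead the real machine allows is ignored. The memory-operation fractions below are made-up inputs, and `threads_to_saturate` is a hypothetical name.

```python
import math


def threads_to_saturate(pipe_latency=21, mem_latency=150, mem_fraction=0.0,
                        issues_per_cycle=1.0):
    """Threads needed to issue every cycle, by Little's law.

    Assumes each thread has one instruction in flight: ordinary ops occupy
    the thread for pipe_latency cycles, memory ops for mem_latency cycles."""
    avg_latency = (1 - mem_fraction) * pipe_latency + mem_fraction * mem_latency
    return math.ceil(issues_per_cycle * avg_latency)


for f in (0.0, 0.25, 0.5):
    print(f"memory fraction {f:.2f}: ~{threads_to_saturate(mem_fraction=f)} threads")
# prints 21, 54, and 86 threads respectively
```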

MIT Alewife (1990)
- Modified SPARC chips
  – register windows hold different thread contexts
- Up to four threads per node
- Thread switch on local cache miss
[Image credit: MIT]

Oracle/Sun Niagara processors
- Target is datacenters running web servers and databases, with many concurrent requests
- Provide multiple simple cores, each with multiple hardware threads: reduced energy per operation, though much lower single-thread performance
- Niagara-1 [2004], 8 cores, 4 threads/core
- Niagara-2 [2007], 8 cores, 8 threads/core
- Niagara-3 [2009], 16 cores, 8 threads/core

Oracle/Sun Niagara-3, “Rainbow Falls”, 2009
[Figures from Hot Chips 2009 presentation by Sanjay Patel. Image credit: Oracle/Sun]

Simultaneous Multithreading (SMT) for OOO Superscalars
- Techniques presented so far have all been “vertical” multithreading, where each pipeline stage works on one thread at a time
- SMT uses fine-grain control already present inside an OOO superscalar to allow instructions from multiple threads to enter execution on the same clock cycle. Gives better utilization of machine resources.

Ideal Superscalar Multithreading [Tullsen, Eggers, Levy, UW, 1995]
[Figure: issue slots (issue width) vs. time, filled by instructions from multiple threads]
- Interleave multiple threads to multiple issue slots with no restrictions

For most apps, most execution units lie idle in an OOO superscalar
[Figure: issue-slot utilization for an 8-way superscalar. From Tullsen, Eggers, and Levy, “Simultaneous Multithreading: Maximizing On-chip Parallelism”, ISCA 1995.]

Superscalar Machine Efficiency
[Figure: instruction issue slots (issue width) vs. time for a single-threaded superscalar]
- Completely idle cycle (vertical waste)
- Partially filled cycle, i.e., IPC < 4 (horizontal waste)
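
The two kinds of waste can be counted directly from a per-cycle issue trace. This minimal sketch follows the slide's definitions: a cycle that issues nothing contributes its whole width to vertical waste, and a cycle that issues some but fewer than `width` instructions contributes its empty slots to horizontal waste. The trace and the name `issue_waste` are made up for illustration.

```python
def issue_waste(issued_per_cycle, width):
    """Split unused issue slots into vertical and horizontal waste."""
    assert all(0 <= n <= width for n in issued_per_cycle)
    vertical = sum(width for n in issued_per_cycle if n == 0)
    horizontal = sum(width - n for n in issued_per_cycle if 0 < n < width)
    return {
        "used": sum(issued_per_cycle),
        "vertical_waste": vertical,
        "horizontal_waste": horizontal,
        "total_slots": width * len(issued_per_cycle),
    }


print(issue_waste([2, 0, 4, 1, 0, 3], width=4))
# {'used': 10, 'vertical_waste': 8, 'horizontal_waste': 6, 'total_slots': 24}
```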

Vertical Multithreading
[Figure: issue width vs. time, with a second thread interleaved cycle-by-cycle]
- Partially filled cycle, i.e., IPC < 4 (horizontal waste)
- What is the effect of cycle-by-cycle interleaving?
  – removes vertical waste, but leaves some horizontal waste

Chip Multiprocessing (CMP)
[Figure: issue width vs. time, with the issue width split across multiple processors]
- What is the effect of splitting into multiple processors?
  – reduces horizontal waste,
  – leaves some vertical waste, and
  – puts upper limit on peak throughput of each thread.

OOO Simultaneous Multithreading [Tullsen, Eggers, Emer, Levy, Stamm, Lo, DEC/UW, 1996]
- Add multiple contexts and fetch engines and allow instructions fetched from different threads to issue simultaneously
- Utilize wide out-of-order superscalar processor issue queue to find instructions to issue from multiple threads
- OOO instruction window already has most of the circuitry required to schedule from multiple threads
- Any single thread can utilize whole machine

SMT adaptation to parallelism type
- For regions with high thread-level parallelism (TLP), the entire machine width is shared by all threads
- For regions with low thread-level parallelism (TLP), the entire machine width is available for instruction-level parallelism (ILP)
[Figures: issue width vs. time for each case]
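
A crude Python sketch comparing how the organizations on the preceding slides fill issue slots. It assumes, purely for illustration, that each thread offers a fixed number of independent ready instructions per cycle, that unused ILP is not carried over to later cycles, and that shared-resource contention is ignored. The fine-grained model idealizes cycle-by-cycle interleaving by giving each cycle to the thread with the most ready instructions; the CMP model splits the same total width into equal narrower cores. All names and numbers are hypothetical.

```python
from typing import List

Trace = List[int]   # ready (independent) instructions a thread offers each cycle


def superscalar(threads: List[Trace], width: int) -> int:
    """Single-threaded machine: only threads[0] runs."""
    return sum(min(width, r) for r in threads[0])


def fine_grained(threads: List[Trace], width: int) -> int:
    """Vertical multithreading: all slots in a cycle go to one thread
    (idealized as the thread with the most ready instructions)."""
    return sum(min(width, max(cyc)) for cyc in zip(*threads))


def chip_mp(threads: List[Trace], width: int, n_cores: int) -> int:
    """CMP: the same total width split into n_cores narrower cores,
    thread i pinned to core i."""
    core_width = width // n_cores
    return sum(min(core_width, r) for t in threads[:n_cores] for r in t)


def smt(threads: List[Trace], width: int) -> int:
    """SMT: any thread may fill any free issue slot in a cycle."""
    return sum(min(width, sum(cyc)) for cyc in zip(*threads))


t0 = [3, 0, 1, 2]
t1 = [1, 2, 0, 3]
print("slots available:", 4 * len(t0))                  # 16
print("superscalar    :", superscalar([t0, t1], 4))    # 6
print("fine-grained   :", fine_grained([t0, t1], 4))   # 9
print("CMP (2 x 2)    :", chip_mp([t0, t1], 4, 2))     # 10
print("SMT            :", smt([t0, t1], 4))            # 11
print("one thread, SMT vs CMP:", smt([t0], 4), chip_mp([t0], 4, 2))   # 6 5
```

The last line mirrors the adaptation slide: with a single thread, SMT behaves like the full-width superscalar, while the CMP caps that thread at one narrow core.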

Power 4
[Figure: POWER4 core pipeline. Image credit: IBM, courtesy of International Business Machines]
[POWER 4 system microarchitecture, Tendler et al., IBM J. Res. & Dev., Jan 2002]
Power 5
[Figure: POWER5 core pipeline, annotated with 2 fetch (PC), 2 initial decodes, and 2 commits (architected register sets). Image credit: IBM, courtesy of International Business Machines]
[POWER 5 system microarchitecture, Sinharoy et al., IBM J. Res. & Dev., Jul/Sept 2005]

Power 5 data flow
[Figure: POWER5 data flow. Image credits: Carsten Schulz; IBM, courtesy of International Business Machines]
[POWER 5 system microarchitecture, Sinharoy et al., IBM J. Res. & Dev., Jul/Sept 2005]
- Why only 2 threads? With 4, one of the shared resources (physical registers, cache, memory bandwidth) would be prone to bottleneck

Changes in Power 5 to support SMT
- Increased associativity of L1 instruction cache and the instruction address translation buffers
- Added per-thread load and store queues
- Increased size of the L2 (1.92 vs. 1.44 MB) and L3 caches
- Added separate instruction prefetch and buffering per thread
- Increased the number of virtual registers from 152 to 240
- Increased the size of several issue queues
- The Power5 core is about 24% larger than the Power4 core because of the addition of SMT support

Pentium-4 Hyperthreading (2002)
- First commercial SMT design (2-way SMT)
  – Hyperthreading = SMT
- Logical processors share nearly all resources of the physical processor
  – Caches, execution units, branch predictors
- Die area overhead of hyperthreading ~5%
- When one logical processor is stalled, the other can make progress
  – No logical processor can use all entries in queues when two threads are active
- A processor running only one active software thread runs at approximately the same speed with or without hyperthreading
- Hyperthreading dropped on OOO P6-based follow-ons to Pentium-4 (Pentium-M, Core Duo, Core 2 Duo), until revived with Nehalem generation machines in 2008
- Intel Atom (in-order x86 core) has two-way vertical multithreading
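
The queue bullet can be illustrated with a small Python sketch of a shared queue whose entries are capped per logical processor. It assumes a simple static equal split among the active logical processors; the per-structure policies of the real Pentium 4 are not modeled, and `PartitionedQueue` is a hypothetical name. The sketch shows both halves of the claim: a stalled thread cannot take every entry, so the other thread keeps making progress, and a lone thread regains the full queue.

```python
class PartitionedQueue:
    """Shared queue with `entries` slots.  Assumption for this sketch: while
    k logical processors are active, each may hold at most entries // k."""

    def __init__(self, entries: int, active_threads: int):
        self.entries = entries
        self.active_threads = active_threads
        self.used = [0] * active_threads   # entries held by each thread

    def limit(self) -> int:
        return self.entries // self.active_threads

    def try_insert(self, tid: int) -> bool:
        if self.used[tid] < self.limit():
            self.used[tid] += 1
            return True
        return False   # this thread's share is full; only it must stall


q = PartitionedQueue(entries=8, active_threads=2)
stalled = sum(q.try_insert(0) for _ in range(8))   # thread 0 stuck on a miss
other = sum(q.try_insert(1) for _ in range(8))     # thread 1 still gets space
print(stalled, other)                              # 4 4

solo = PartitionedQueue(entries=8, active_threads=1)
print(sum(solo.try_insert(0) for _ in range(8)))   # 8
```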

Initial Performance of SMT
- Pentium 4 Extreme SMT yields 1.01 speedup for SPECint rate benchmark and 1.07 for SPECfp rate
  – Pentium 4 is dual-threaded SMT
  – SPECRate requires that each SPEC benchmark be run against a vendor-selected number of copies of the same benchmark
- Running on Pentium 4, each of 26 SPEC benchmarks paired with every other (26² runs): speed-ups from 0.90 to 1.58; average was 1.20
- Power 5, 8-processor server: 1.23 faster for SPECint rate with SMT, 1.16 faster for SPECfp rate
- Power 5 running 2 copies of each app: speedup between 0.89 and 1.41
  – Most gained some
  – Floating-point apps had most cache conflicts and least gains

Icount Choosing Policy
Fetch from the thread with the fewest instructions in flight.
Why does this enhance throughput?
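
A minimal Python sketch of the ICOUNT fetch choice. In the 1996 SMT work cited earlier, the count covers instructions in the decode, rename, and instruction-queue stages, and the stated rationale is that favoring the thread with the fewest such instructions keeps any one thread from filling the issue queue, favors threads whose instructions move through quickly, and keeps a more even mix of threads for the issue logic to choose from. The `HwThread` record, the `can_fetch` flag (for example, false while a thread waits on an I-cache miss), and the lowest-id tie-break are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HwThread:
    tid: int
    in_flight: int           # instrs in decode, rename, and issue queues
    can_fetch: bool = True   # e.g. False while waiting on an I-cache miss


def icount_pick(threads: List[HwThread]) -> Optional[int]:
    """ICOUNT: fetch from the eligible thread with the fewest in-flight
    instructions; ties go to the lowest thread id (an assumption)."""
    eligible = [t for t in threads if t.can_fetch]
    if not eligible:
        return None
    return min(eligible, key=lambda t: (t.in_flight, t.tid)).tid


threads = [
    HwThread(0, in_flight=12),                   # clogging the queues
    HwThread(1, in_flight=3),
    HwThread(2, in_flight=2, can_fetch=False),   # blocked this cycle
    HwThread(3, in_flight=7),
]
print(icount_pick(threads))   # 1
```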

Summary: Multithreaded Categories
[Figure: issue slots vs. time (processor cycle) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading; slots are shaded by Thread 1 through Thread 5, or marked as idle slots]

Acknowledgements
- These slides contain material developed and copyright by:
  – Arvind (MIT)
  – Krste Asanovic (MIT/UCB)
  – Joel Emer (Intel/MIT)
  – James Hoe (CMU)
  – John Kubiatowicz (UCB)
  – David Patterson (UCB)
  – Christopher Batten (Cornell)
- MIT material derived from course 6.823
- UCB material derived from course CS252 & CS152
- Cornell material derived from course ECE 4750
