Mathematical Modeling Of Many-cores - Technion


Mathematical modeling of many-cores
Ran Ginosar
Technion, Israel
September 2013

Outline
- Many-core architectures
- Mathematical model
- Open questions

Define many-cores
Many-core is:
- a single chip
- with many (how many?) cores and on-chip memory
- running one (parallel) program at a time, solving one problem
- an accelerator
Many-core is NOT:
- a “normal” multi-core
- running an OS
Contending many-core architectures:
- Shared memory (the Plural architecture, XMT)
- Tiled (Tilera, Godson-T)
- Clustered (Rigel)
- GPU (Nvidia)
- SIMD
- Associative Processor
Contending programming models

Five many-core architectures

Shared Memory Manycore
[Figure: cores (P), each with its own L1 cache, connected through a core-to-memory network to a multi-bank shared memory, with I/O at the chip edges.]
Example: 64 cores; 64 L1 caches, 16 kB x 64 = 1 MB; shared memory in many banks, 1 MB x 256 = 256 MB.

Tiled Manycore
[Figure: grid of tiles, each holding a core (P), an L1 cache, and an L2 cache, connected by mesh NoCs, with I/O at the chip edges.]
Example: 64 tiles; 64 L1 caches, 16 kB x 64 = 1 MB; 4 MB L2 x 64 = 256 MB; mesh NoCs; directory: all L2s = L3.

GPU
[Figure: example GPU architecture.]

[Figure: 256 cores (P) with memory banks, 1 MB x 256 = 256 MB, and I/O at the chip edges.]

Associative Processor
[Figure: control unit and I/O around a combined memory and processing array.]
Example: combined memory & processing; 128 MB (each bit twice larger than SRAM); each bit also computes.

3D

- Add 3D logic & 3D memory
- The HMC industry already makes the first step
- 100,000 TSV vertical interconnects

Store & Compute
- 1 Tbyte / chip in 2020
  - Combined DRAM & NVM
- Many many-cores
- NoC & 3D NoC
- Must be low power & cold: 0.1 W
THIS WILL CHANGE MANY-CORE ARCHITECTURE

- Many cubes in a rack: 1000 cubes, 100 W
- Many racks in a supercomputer: less than 1 MW
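The rack and supercomputer figures follow from simple arithmetic. The check below is a sketch that assumes the 0.1 W budget on the previous slide applies to each cube and that the 1 MW limit counts only the cubes (no cooling or network overhead); both are assumptions, not statements from the slides.

```latex
\begin{align*}
P_{\text{rack}} &= 1000 \text{ cubes} \times 0.1\,\text{W/cube} = 100\,\text{W} \\
N_{\text{racks}} &< \frac{1\,\text{MW}}{100\,\text{W/rack}} = 10^{4} \text{ racks}
\end{align*}
```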

Modeling Many-core Architectures

A first many-core research question
- Given fixed area, into how many processor cores should we divide it?
- Other good questions (not dealt with here):
  - Given fixed power, how many cores? Which cores?
  - Given fixed energy, how many cores? Which cores?
  - Given target performance, how many? Which?
- Analysis can be based on Pollack’s rule

The history at the basis of Pollack’s analysis
[Figure: processors P1 to P5 plotted against technology generations G1 to G5. Arrow legend: shrink/scaling; new architecture, same process.]
Q: On red arrows, how much more performance for how much more area?

Pollack’s rule for processors: Area or Power vs. Performance
- Pollack (& Borkar & Ronen, Micro 1999) observed many years of Intel architecture
- In each Intel technology node, they compared:
  - Old μArch (shrink from previous node)
  - New μArch (faster clock and/or higher IPC)
- They noted:
  - New μArch used 2-3X larger area
  - New μArch achieved 1.5-1.7X higher performance
    - Resulting from both higher frequency and higher IPC
  - They did not consider power increase
    - Who thought about power in 1999?
- Observation: Performance $\propto \sqrt{\text{area}}$
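As a quick sanity check, the square-root observation can be compared against the area and performance ranges quoted on this slide. This only restates the slide's numbers; no other data is assumed.

```latex
\begin{align*}
\text{area} \times 2 &\;\Rightarrow\; \text{performance} \times \sqrt{2} \approx 1.41 \\
\text{area} \times 3 &\;\Rightarrow\; \text{performance} \times \sqrt{3} \approx 1.73
\end{align*}
```

The predicted range of roughly 1.4X to 1.7X is close to the observed 1.5-1.7X gain.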

Performance = IPC × Frequency
Experience shows: for higher performance, both IPC and frequency must be increased.
[Figure: average IPC (SPECInt92 / MHz) versus frequency (MHz), with PowerPC and Pentium data points and a “speed demons” region. Source: Diep, Nelson & Shen, ISCA 1995.]

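The many-core fixed-total-area model
- Assume fixed chip area (typically 300-500 mm²)
- Split chip area: $A = A_{cores} + A_{mem}$
  - The split (memory size) affects the on-chip hit rate
  - $A_{mem}$ may be further split into $A_{L1} + A_{L2}$
- Divide $A_{cores}$ into $m$ cores. How many?
  - Area of each core: $a = A_{cores}/m$. Thus, $m \propto 1/a$
- [Pollack’s]: core area determines core performance. Select IPC and frequency $f$ so that:
  - Performance (core) $= \mathrm{IPC} \cdot f \propto \sqrt{a}$. Thus, $a \propto \mathrm{IPC}^2 f^2$ and $m \propto 1/(\mathrm{IPC}^2 f^2)$
  - Power (core) $\propto a \cdot f \propto \mathrm{IPC}^2 f^3$
- Assume perfect parallelism (at least as an upper bound)
  - Performance (m cores) $\propto \mathrm{IPC} \cdot f \cdot m \propto \dfrac{\mathrm{IPC} \cdot f}{\mathrm{IPC}^2 f^2} = \dfrac{1}{\mathrm{IPC} \cdot f}$
  - Power (m cores) $\propto a \cdot f \cdot m \propto \dfrac{\mathrm{IPC}^2 f^3}{\mathrm{IPC}^2 f^2} = f$
- Summary: Performance $\propto \dfrac{1}{\mathrm{IPC} \cdot f} \propto \sqrt{m}$, Power $\propto f \propto \dfrac{1}{\mathrm{IPC} \sqrt{m}}$, $m \propto \dfrac{1}{\mathrm{IPC}^2 f^2}$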
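The proportionalities above can be explored numerically. The following is a minimal sketch, not the author's code: it sets every proportionality constant to 1, so all results are relative to a reference chip with IPC = 1 and f = 1. The names `ChipPoint` and `fixed_area_model` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChipPoint:
    cores: float        # m, relative number of cores
    performance: float  # relative performance of all m cores
    power: float        # relative compute power of all m cores


def fixed_area_model(ipc: float, f: float) -> ChipPoint:
    """Evaluate the fixed-total-area model for one (IPC, f) choice.

    Assumption: all proportionality constants are 1, so the outputs are
    only meaningful as ratios between design points.
    """
    core_area = (ipc * f) ** 2   # Pollack: IPC*f ~ sqrt(a), so a ~ IPC^2 f^2
    m = 1.0 / core_area          # fixed A_cores: m ~ 1/a
    perf = ipc * f * m           # perfect parallelism: m cores, each IPC*f
    power = core_area * f * m    # each core burns ~ a*f
    return ChipPoint(cores=m, performance=perf, power=power)


if __name__ == "__main__":
    for ipc, f in [(1.0, 1.0), (1.0, 0.5), (2.0, 0.5)]:
        p = fixed_area_model(ipc, f)
        print(f"IPC={ipc} f={f}: m={p.cores:g} "
              f"perf={p.performance:g} power={p.power:g}")
```

Halving the frequency at the same IPC quadruples the core count, doubles performance, and halves power, which matches Performance ∝ √m and Power ∝ f in the summary line.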

[Plot] Performance (core) $= \mathrm{IPC} \cdot f$

[Plot] $a \propto \mathrm{IPC}^2 f^2$. For each IPC curve, $a \propto f^2$.

[Plot] $m \propto \dfrac{1}{\mathrm{IPC}^2 f^2}$. For each IPC curve, $m \propto \dfrac{1}{f^2}$.

[Plot] Performance $\propto \dfrac{1}{f} \propto \sqrt{m}$; Power $\propto f \propto \dfrac{1}{\sqrt{m}}$

$\dfrac{\text{Performance}}{\text{Power}} \propto \dfrac{1/f}{f} = \dfrac{1}{f^2} \propto \dfrac{\sqrt{m}}{1/\sqrt{m}} = m$
Analysis of the results so far:
- Slower frequency and lower IPC give higher performance and lower power
- Thanks to Pollack’s square rule
But this changes when we also consider memory power.

Now add memory
- So far, only computing power
  - Including power to access local cache/memory in each core
- But we also need to access not-so-local shared memory
  - Access rate to memory: once every $r_m$ instructions
    - E.g. about every 20 instructions
  - Assume using only on-chip memory
- Need to add memory access power to the computing power
  - Relative energy: assume access is 10x higher than exec.
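To get a feel for the size of this term, the two example numbers on the slide can be combined into an average memory energy per executed instruction. This is a sketch under the assumption that each access costs a fixed 10 times the execution energy of one instruction, written here as $E_{exec}$ (a symbol introduced for illustration).

```latex
\begin{align*}
\bar{E}_{mem} &= \frac{10\,E_{exec}}{r_m} = \frac{10\,E_{exec}}{20} = 0.5\,E_{exec}
\end{align*}
```

Under these example values, shared-memory access adds about half of the execution energy to every instruction on average.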

[Figure: model curves in terms of $f$ and $m$ with memory access power included.]

Does the model apply to different architectures?

Shared memory versus Tiled architectures

Architecture | Shared memory | Tiled
Local memory | L1 in each core | L1 & L2 in each core
Global (on-chip) memory | Shared memory | L2 of other cores
Core-to-global-memory network | Dedicated cores-to-memories, e.g. MIN | Indirect via other cores/routers, e.g. mesh
Access rates (strongly depends on app; examples) | 1/20 to L1; 1/1,000 to shared mem | 1/20 to L1; 1/1,000 to L2; 1/50,000 to others
Access time (cycles) | 2 to L1; 10 to shared memory | 2 to L1; 10 to L2; 100 to others
Access energy (relative to one register instruction) | 2x to L1; 20x to shared mem | 2x to L1; 5x to L2; 100x to others
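The example rates, times, and energies in the table can be combined into an average memory cost per instruction for each architecture. The sketch below assumes that each access rate means accesses per executed instruction, that costs at different levels simply add, and that no access overlaps with computation. The names `SHARED`, `TILED`, and `per_instruction_cost` are hypothetical.

```python
# Each level maps to: (accesses per instruction,
#                      energy per access relative to one register instruction,
#                      cycles per access).
# Values are the example numbers from the table above.
SHARED = {
    "L1": (1 / 20, 2, 2),
    "shared mem": (1 / 1000, 20, 10),
}
TILED = {
    "L1": (1 / 20, 2, 2),
    "L2": (1 / 1000, 5, 10),
    "others": (1 / 50000, 100, 100),
}


def per_instruction_cost(levels):
    """Return (relative energy, cycles) spent on memory per instruction.

    Assumption: costs add linearly and nothing overlaps, so this is a
    rough upper estimate rather than a timing model.
    """
    energy = sum(rate * e for rate, e, _ in levels.values())
    cycles = sum(rate * t for rate, _, t in levels.values())
    return energy, cycles


if __name__ == "__main__":
    for name, levels in [("shared memory", SHARED), ("tiled", TILED)]:
        energy, cycles = per_instruction_cost(levels)
        print(f"{name}: energy={energy:.3f}x, cycles={cycles:.3f}")
```

With these example numbers the two architectures land close together, and the L1 term dominates in both, so the comparison is sensitive mainly to the application-dependent access rates.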

Shared memory & Tiled architectures versus others
The table from the previous slide is repeated with three added columns, GPU, SIMD, and AP, whose cells are left unfilled.

Summary of the model
- Considering only cores, the fixed-total-area model implies: for highest performance and lowest power, use
  - smallest / weakest cores (lowest IPC)
  - lowest frequency
- Adding on-chip access to memory leads to a different conclusion: for lowest power and highest performance/power ratio, use
  - strongest cores (high IPC)
  - but stay with lowest frequency
  - Lower frequency → lower access rate to global memory
- How does this apply to other architectures?
