Memory Scaling Is Dead, Long Live Memory Scaling

Memory Scaling is Dead, Long Live Memory Scaling ("Le Memoire Scaling est mort, vive le Memoire Scaling!")
Moinuddin K. Qureshi, ECE, Georgia Tech
At Yale's "Mid-Career" Celebration, University of Texas at Austin, Sept 19, 2014

The Gap in Memory Hierarchy
[Figure: typical access latency in processor cycles (at 4 GHz), on a log scale from roughly 2^1 to 2^23: L1 (SRAM) and eDRAM at the low end, then DRAM, then a gap labeled "Flash?", then HDD at the high end.]
Misses in main memory (page faults) degrade performance severely. The main memory system must scale to maintain performance growth.

The Memory Capacity Gap
Trends: core count doubling every 2 years; DRAM DIMM capacity doubling every 3 years [Lim, ISCA'09].
Memory capacity per core is expected to drop by 30% every two years.

Challenges for DRAM: The Scaling Wall
DRAM does not scale well to small feature sizes (sub-1x nm). Increasing error rates can render DRAM scaling infeasible.

Two Roads Diverged
The challenges of DRAM lead to two approaches:
– Architectural support for DRAM scaling and for reducing refresh overheads
– Finding an alternative technology that avoids the problems of DRAM
It is important to investigate both approaches.

Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: Reduce Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary

Reasons for DRAM Faults
Unreliability of ultra-thin dielectric material. In addition, DRAM cell failures also arise from:
– Permanently leaky cells
– Mechanically unstable cells (capacitor tilting towards ground)
– Broken links in the DRAM array
[Figure: DRAM cell capacitors illustrating charge leakage in a permanently leaky cell, a mechanically unstable cell, and broken links in the array.]
Permanent faults for future DRAMs are expected to be much higher.

Row and Column Sparing
DRAM chips (organized into rows and columns) have spare rows and columns, and laser fuses enable the spares.
[Figure: a DRAM chip before and after row/column sparing, showing faults, deactivated rows and columns, and the spare rows and columns that replace them.]
An entire row or column must be sacrificed for a few faulty cells, so row and column sparing incurs large area overheads.

Commodity ECC-DIMM
Commodity ECC DIMMs provide SECDED at an 8-byte granularity (72,64), mainly used for soft-error protection.
For hard errors there is a high chance of two errors landing in the same word (the birthday paradox). An 8GB DIMM has about 1 billion words, so the expected number of errors before some word holds a double error is about 1.25*sqrt(N), roughly 40K errors, i.e. about 0.5 ppm.
SECDED alone is not enough at high error rates (and what about soft errors?).
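As a sanity check on the birthday-paradox arithmetic, the short Python sketch below reproduces the roughly 40K-error, half-ppm estimate. It assumes 1 billion 8-byte data words per 8GB DIMM (as on the slide) and uses the standard sqrt(pi*N/2) approximation for the expected number of random faults before two share a word; the constants are back-of-the-envelope, not taken from the talk.

    import math

    # Expected number of randomly placed hard faults before two of them land in
    # the same word (birthday paradox): about sqrt(pi*N/2) ~= 1.25*sqrt(N).
    N_WORDS = 10**9                 # 8GB DIMM, one (72,64) SECDED word per 8 data bytes
    DATA_BITS_PER_WORD = 64

    faults_to_first_double = math.sqrt(math.pi * N_WORDS / 2)
    ber_ppm = faults_to_first_double / (N_WORDS * DATA_BITS_PER_WORD) * 1e6

    print(f"faults until some word has two errors: ~{faults_to_first_double:,.0f}")  # ~40 K
    print(f"equivalent bit error rate: ~{ber_ppm:.1f} ppm")                          # ~0.6 ppm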

Dissecting Fault Probabilities
At a bit error rate of 10^-4 (100 ppm) for an 8GB DIMM (1 billion 8-byte words):

  Faulty bits per word (8B)   Probability    Words in 8GB
  0                           99.3%          0.99 billion
  1                           0.7%           7.7 million
  2                           26 x 10^-6     28 K
  3                           62 x 10^-9     67
  4                           ~10^-10        0.1

Most faulty words have a 1-bit error. The skew in fault probability can be leveraged for low-cost resilience: tolerate high error rates with a commodity ECC DIMM while retaining soft-error resilience.
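The table is essentially a binomial calculation. The sketch below is an illustration rather than the talk's own derivation: it assumes independent bit faults and evaluates the binomial over the 72 stored bits of a (72,64) ECC word, which comes close to the numbers above; using 64 data bits instead shifts the values only slightly.

    from math import comb

    BER = 1e-4           # bit error rate of 100 ppm, as on the slide
    BITS_PER_WORD = 72   # assumption: all 72 stored bits of a (72,64) SECDED word
    N_WORDS = 10**9      # ~1 billion words in an 8GB DIMM

    def p_exactly_k_faulty(k: int) -> float:
        """Binomial probability that a word has exactly k faulty bits."""
        return comb(BITS_PER_WORD, k) * BER**k * (1 - BER)**(BITS_PER_WORD - k)

    for k in range(5):
        p = p_exactly_k_faulty(k)
        print(f"{k} faulty bits: p = {p:.2e}, expected words = {p * N_WORDS:,.1f}")
    # roughly: 0 -> 99.3% (~0.99 billion), 1 -> 0.7% (~7 million),
    #          2 -> ~2.6e-5 (~26 K), 3 -> ~6e-8 (~60), 4 -> ~1e-10 (~0.1)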

ArchShield: Overview
Inspired by Solid State Drives (SSDs), which tolerate high bit-error rates. Faulty-cell information is exposed to the architecture layer via runtime testing, and ArchShield stores the error-mitigation information (a fault map and a replication area) in main memory.
– Most words will be error-free.
– A 1-bit error is handled with SECDED (the fault map is cached).
– A multi-bit error is handled with replication.
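A minimal sketch of that read-path dispatch, under assumptions of mine rather than the paper's actual structures: the fault map is modeled as a dictionary, and read_from_memory and secded_correct are hypothetical helpers standing in for the memory access and the ECC logic.

    # Hypothetical illustration of ArchShield's read path: consult the (cached)
    # fault map, then decide how a correct word is obtained.
    NO_ERROR, ONE_BIT_ERROR, MULTI_BIT_ERROR = 0, 1, 2

    fault_map = {}      # word address -> fault class (absent means NO_ERROR)
    replica_of = {}     # word address -> address of its replica in the replication area

    def read_word(addr, read_from_memory, secded_correct):
        fault_class = fault_map.get(addr, NO_ERROR)
        if fault_class == MULTI_BIT_ERROR:
            # Decommissioned word: only the replica (in a fault-free region) is used.
            return read_from_memory(replica_of[addr])
        data = read_from_memory(addr)
        if fault_class == ONE_BIT_ERROR:
            # SECDED corrects the known hard error; the replica kept for this word
            # is what preserves protection against an additional soft error.
            data = secded_correct(data)
        return data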

ArchShield: Yield-Aware Design
When the DIMM is configured, runtime testing is performed and each 8B word is classified into one of three types:
– No error: replication is not needed; SECDED can correct a soft error.
– 1-bit error: SECDED can correct the hard error; replication is needed to cover a soft error.
– Multi-bit error: the word gets decommissioned and only the replica is used.
(The classification of faulty words can be stored on the hard drive for future use.)
ArchShield tolerates a 100 ppm fault rate with 1% slowdown and 4% capacity loss.
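Continuing the sketch above, the configuration-time classification might look like the following; count_hard_faults and allocate_replica are hypothetical stand-ins for the runtime memory tester and the replication-area allocator.

    def classify_words(word_addresses, count_hard_faults, allocate_replica):
        """Runtime testing at configuration time: populate fault map and replicas."""
        for addr in word_addresses:
            faults = count_hard_faults(addr)           # hypothetical per-word memory test
            if faults == 0:
                continue                               # no error: nothing to record
            replica_of[addr] = allocate_replica(addr)  # both faulty classes get a replica
            if faults == 1:
                fault_map[addr] = ONE_BIT_ERROR        # SECDED covers the hard error
            else:
                fault_map[addr] = MULTI_BIT_ERROR      # word decommissioned, replica used
        # The resulting classification can be saved to disk so testing is not redone.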

Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: Reduce Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary

Emerging Technology to Aid Scaling
Phase Change Memory (PCM): scalable to sub-10nm. A resistive memory: high resistance (0), low resistance (1).
Advantages: scalable, has MLC capability, non-volatile (no leakage).
PCM is attractive for designing scalable memory systems. But...

Challenges for PCM
Key problems:
1. Higher read latency (compared to DRAM)
2. Limited write endurance (10–100 million writes per cell)
3. Writes are much slower and power hungry
Simply replacing DRAM with PCM causes high read latency, high power, and high energy consumption. How do we design a scalable PCM system without these disadvantages?

Hybrid Memory: Best of DRAM and PCM
[Figure: processor with a DRAM buffer (and its tag store) in front of a PCM main memory and a PCM write queue, backed by Flash or HDD.]
Hybrid memory system:
1. DRAM as a cache to tolerate PCM read/write latency and write bandwidth
2. PCM as main memory to provide large capacity at good cost/power
3. Write filtering techniques to reduce wasteful writes to PCM
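A toy model of that organization, purely illustrative and not the evaluated design: a small LRU DRAM buffer in front of a PCM backing store, with write filtering done by writing PCM only on dirty evictions.

    from collections import OrderedDict

    class HybridMemory:
        """Toy DRAM-buffer-in-front-of-PCM model illustrating write filtering."""

        def __init__(self, dram_lines):
            self.dram_lines = dram_lines
            self.dram = OrderedDict()   # line address -> (data, dirty); kept in LRU order
            self.pcm = {}               # large-capacity backing store
            self.pcm_writes = 0         # writes that actually reach PCM

        def write(self, addr, data):
            self._install(addr, data, dirty=True)     # writes are absorbed by DRAM

        def read(self, addr):
            if addr in self.dram:
                self.dram.move_to_end(addr)           # hit: refresh LRU position
                return self.dram[addr][0]
            data = self.pcm.get(addr)                 # miss: fetch from PCM
            self._install(addr, data, dirty=False)
            return data

        def _install(self, addr, data, dirty):
            if addr in self.dram:
                dirty = dirty or self.dram[addr][1]   # keep the line dirty if it was
            self.dram[addr] = (data, dirty)
            self.dram.move_to_end(addr)
            if len(self.dram) > self.dram_lines:
                victim, (vdata, vdirty) = self.dram.popitem(last=False)
                if vdirty:                            # write filtering: only dirty
                    self.pcm[victim] = vdata          # evictions reach PCM
                    self.pcm_writes += 1

Repeated writes to a hot line are coalesced in the DRAM buffer, so PCM sees one write per eviction rather than one per store, which is where the write energy and endurance savings come from.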

Latency, Energy, Power: Lowered
[Figure: normalized execution time for db1, db2, qsort, bsearch, kmeans, gauss, daxpy, vdotp, and their gmean, comparing 8GB DRAM, 32GB PCM, 32GB DRAM, and a 32GB PCM + 1GB DRAM hybrid.]
Hybrid memory provides performance similar to iso-capacity DRAM, and also avoids the energy/power overheads from frequent writes.

Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: Reduce Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary

Workload Adaptive Systems
Different policies work well for different workloads:
1. No single replacement policy works well for all workloads
2. Or the prefetch algorithm
3. Or the memory scheduling algorithm
4. Or the coherence algorithm
5. Or any other policy (write allocate / no allocate?)
Unfortunately, systems are designed to cater to the average case (a policy that works well enough for all workloads). Ideally, each workload would get the policy that works best for it.

Adaptive Tuning via Runtime Testing
Say we want to select between two policies, P0 and P1. Divide the cache sets into three groups:
– Dedicated P0 sets
– Dedicated P1 sets
– Follower sets (use the winner of P0 vs. P1)
A single n-bit saturating counter performs the selection (Set Dueling): a miss in a P0 set increments the counter, a miss in a P1 set decrements it, and the counter's MSB decides the policy for the follower sets (MSB 0: use P0; MSB 1: use P1). The dedicated sets monitor, the counter chooses, and the followers apply.
Adaptive tuning can allow dynamic policy selection at low cost.
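A minimal sketch of the set-dueling selector; the 10-bit counter width and the 1-in-32 sampling of dedicated sets are illustrative choices, not figures from the talk.

    class SetDuelingSelector:
        """Single saturating counter choosing between policies P0 and P1."""

        def __init__(self, n_bits=10, sample_every=32):
            self.max_val = (1 << n_bits) - 1
            self.counter = self.max_val // 2          # start near the midpoint (unbiased)
            self.sample_every = sample_every          # 1 in 32 sets dedicated to each policy

        def set_type(self, set_index):
            if set_index % self.sample_every == 0:
                return "P0"                           # dedicated P0 set
            if set_index % self.sample_every == 1:
                return "P1"                           # dedicated P1 set
            return "follower"

        def on_miss(self, set_index):
            kind = self.set_type(set_index)
            if kind == "P0":                          # misses in P0 sets: counter++
                self.counter = min(self.counter + 1, self.max_val)
            elif kind == "P1":                        # misses in P1 sets: counter--
                self.counter = max(self.counter - 1, 0)

        def follower_policy(self):
            # The counter's MSB decides the followers' policy: 0 -> P0, 1 -> P1.
            return "P1" if self.counter >= (self.max_val + 1) // 2 else "P0"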

Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: Reduce Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary

Challenges for Computer Architects
The end of technology scaling, frequency scaling, Moore's Law, ...?
How do we address these challenges? The solution for all computer architecture problems is:
– Yield awareness
– Hybrid memory: latency, energy, and power reduction for PCM
– Workload adaptive systems: low-cost "Adaptivity Through Testing"

Happy 75th, Yale!
