CSDA 2/e, Chapter 7: Memory System Design


Chapter 7: Memory System Design

- Introduction
- RAM structure: cells and chips
- Memory boards and modules
- Two-level memory hierarchy
- The cache
- Virtual memory
- The memory as a subsystem of the computer

Computer Systems Design and Architecture, Second Edition. © 2004 Prentice Hall

Introduction

So far, we've treated memory as an array of words limited in size only by the number of address bits. Life is seldom so easy. Real-world issues arise: cost, speed, size, power consumption, volatility, etc. What other issues can you think of that will influence memory design?

In This Chapter we will cover:

Memory components:
- RAM memory cells and cell arrays
- Static RAM: more expensive, but less complex
- Tree and matrix decoders: needed for large RAM chips
- Dynamic RAM: less expensive, but needs "refreshing"
- Chip organization
- Timing
- Commercial RAM products: SDRAM and DDR RAM
- ROM: read-only memory

Memory boards:
- Arrays of chips give more addresses and/or wider words
- 2-D and 3-D chip arrays

Memory modules:
- Large systems can benefit by partitioning memory for separate access by system components
- Fast access to multiple words

In This Chapter we will also cover:

The memory hierarchy: from fast and expensive to slow and cheap
- Example: registers, cache, main memory, disk
- At first, consider just two adjacent levels in the hierarchy

The cache: high speed and expensive
- Kinds: direct mapped, associative, set associative

Virtual memory: makes the hierarchy transparent
- Translate the address from the CPU's logical address to the physical address where the information is actually stored
- Memory management: how to move information back and forth
- Multiprogramming: what to do while we wait
- The "TLB" helps in speeding the address translation process

We will discuss temporal and spatial locality as the basis for the success of cache and virtual memory techniques, and consider the memory as a subsystem overall.

Fig. 7.1 The CPU–Main Memory Interface

Sequence of events:

Read:
1. CPU loads MAR, issues Read, and REQUEST.
2. Main memory transmits words to MDR.
3. Main memory asserts COMPLETE.

Write:
1. CPU loads MAR and MDR, asserts Write, and REQUEST.
2. The value in MDR is written into the address in MAR.
3. Main memory asserts COMPLETE.

The CPU–Main Memory Interface, cont'd.

Additional points:
- If b < w, main memory must make w/b b-bit transfers.
- Some CPUs allow reading and writing of word sizes < w.
  Example: Intel 8088: m = 20, w = 16, s = b = 8. Both 8- and 16-bit values can be read and written.
- If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted.
- Some systems use separate R and W lines, and omit REQUEST.
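The w/b transfer count above can be sketched in a few lines of Python (the function name is ours, for illustration only):

```python
def transfers_per_word(w, b):
    # w: CPU word size in bits; b: data bus width in bits.
    # Sketch assumes the word size is a multiple of the bus width.
    assert w % b == 0, "word size assumed to be a multiple of bus width"
    return w // b

# The Intel 8088 from the example: w = 16, b = 8 -> two 8-bit transfers.
print(transfers_per_word(16, 8))  # 2
```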

Table 7.1 Some Memory Properties

Symbol   Definition                           Intel 8088   Intel 8086   IBM/Moto. 601
w        CPU word size                        16 bits      16 bits      64 bits
m        Bits in a logical memory address     20 bits      20 bits      32 bits
s        Bits in smallest addressable unit    8            8            8
b        Data bus size                        8            16           64
2^m      Memory word capacity, s-sized words  2^20         2^20         2^32
2^m x s  Memory bit capacity                  2^20 x 8     2^20 x 8     2^32 x 8

Big-Endian and Little-Endian Storage

When data types having a word size larger than the smallest addressable unit are stored in memory, the question arises: is the least significant part of the word stored at the lowest address (little-endian, little end first), or is the most significant part of the word stored at the lowest address (big-endian, big end first)?

Example: the hexadecimal 16-bit number ABCDH, stored at address 0:

Little-endian:  address 0 holds CD (lsb), address 1 holds AB (msb)
Big-endian:     address 0 holds AB (msb), address 1 holds CD (lsb)
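The two byte orders for ABCDH can be checked directly with Python's built-in `int.to_bytes`, a minimal sketch of the layout above:

```python
value = 0xABCD  # the 16-bit example value from the slide

# byteorder="little": least significant byte at the lowest address.
little = value.to_bytes(2, byteorder="little")
# byteorder="big": most significant byte at the lowest address.
big = value.to_bytes(2, byteorder="big")

print([hex(b) for b in little])  # ['0xcd', '0xab'] -> CD at address 0
print([hex(b) for b in big])     # ['0xab', '0xcd'] -> AB at address 0
```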

Table 7.2 Memory Performance Parameters

Symbol          Definition         Units       Meaning
ta              Access time        time        Time to access a memory word
tc              Cycle time         time        Time from start of access to start of next access
k               Block size         words       Number of words per block
b               Bandwidth          words/time  Word transmission rate
tl              Latency            time        Time to access the first word of a sequence of words
tbl = tl + k/b  Block access time  time        Time to access an entire block of words

(Information is often stored and moved in blocks at the cache and disk level.)
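The block access time formula tbl = tl + k/b from the table can be sketched as follows; the numbers are illustrative only, not from the text:

```python
def block_access_time(t_l, k, b):
    """tbl = tl + k/b from Table 7.2.

    t_l: latency (time to first word)
    k:   words per block
    b:   bandwidth in words per time unit
    """
    return t_l + k / b

# Illustrative: 50 ns latency, 16-word block, 1 word/ns bandwidth.
print(block_access_time(50.0, 16, 1.0))  # 66.0 ns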

Table 7.3 The Memory Hierarchy: Cost and Performance

                 CPU                Cache              Main Memory   Disk Memory   Tape Memory
Access           Random             Random             Random        Direct        Sequential
Capacity, bytes  64-1024            8KB-8MB            64MB-2GB      8GB           1TB
Latency          .4-10 ns           .4-20 ns           10-50 ns      10 ms         10 ms-10 s
Block size       1 word             16 words           16 words      4KB           4KB
Bandwidth        System clock rate  System clock rate  10-4000 MB/s  50 MB/s       1 MB/s
Cost/MB†         High               $10                $.25          $0.002        $0.01

†As of 2003-4. They go out of date immediately.

Fig. 7.3 Memory Cells: a Conceptual View

Regardless of the technology, all RAM memory cells must provide these four functions: Select, DataIn, DataOut, and R/W.

This "static" RAM cell is unrealistic. We will discuss more practical designs later.

Fig. 7.4 An 8-bit Register as a 1-D RAM Array

The entire register is selected with one select line, and uses one R/W line. The data bus is bidirectional, and buffered. (Why?)

Fig. 7.5 A 4x8 2-D Memory Cell Array

A 2-4 line decoder selects one of the four 8-bit arrays from a 2-bit address. R/W is common to all cells. The 8-bit data bus is bidirectional and buffered.

Fig. 7.6 A 64Kx1-bit Static RAM (SRAM) Chip

A square array fits the IC design paradigm. Selecting rows separately from columns means only 256 x 2 = 512 circuit elements instead of 65,536 circuit elements!

CS (Chip Select) allows chips in arrays to be selected individually.

This chip requires 21 pins including power and ground, and so will fit in a 22-pin package.
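The 512-versus-65,536 saving comes from addressing the 64K words as a 256 x 256 square: one set of row lines plus one set of column lines, instead of one output line per word. A minimal arithmetic sketch:

```python
import math

# Square-array organization of a 64K x 1 SRAM (Fig. 7.6).
words = 64 * 1024
side = int(math.isqrt(words))    # 256 rows and 256 columns

linear_elements = words          # one select line per word: 65536
square_elements = side + side    # 256 row lines + 256 column lines = 512

print(side, linear_elements, square_elements)  # 256 65536 512
```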

Fig. 7.7 A 16Kx4 SRAM Chip

There is little difference between this chip and the previous one, except that there are four 64-to-1 multiplexers instead of one 256-to-1 multiplexer.

This chip requires 24 pins including power and ground, and so will require a 24-pin package. Package size and pin count can dominate chip cost.

Fig. 7.8 Matrix and Tree Decoders

Two-level decoders are limited in size because of gate fan-in. Most technologies limit fan-in to 8. When decoders must be built with fan-in > 8, additional levels of gates are required. Tree and matrix decoders are two ways to design decoders with large fan-in:
- a 3-to-8 line tree decoder constructed from 2-input gates
- a 4-to-16 line matrix decoder constructed from 2-input gates

Fig. 7.9 A 6-Transistor Static RAM Cell

This is a more practical design than the 8-gate design shown earlier. A value is read by precharging the bit lines to a value halfway between a 0 and a 1, while asserting the word line. This allows the latch to drive the bit lines to the value stored in the latch.

Fig. 7.10 Static RAM Read Timing

Access time from address: the time required for the RAM array to decode the address and provide the value to the data bus.

Fig. 7.11 Static RAM Write Timing

Write time: the time the data must be held valid in order to decode the address and store the value in the memory cells.

Fig. 7.12 A Dynamic RAM (DRAM) Cell

The capacitor will discharge in 4-15 ms. Refresh the capacitor by reading (sensing) the value on the bit line, amplifying it, and writing it back to the capacitor.

Write: place the value on the bit line and assert the word line.
Read: precharge the bit line, assert the word line, and sense the value on the bit line with the sense amp.

This need to refresh the storage cells of dynamic RAM chips complicates DRAM system design.

Fig. 7.13 DRAM Chip Organization

Addresses are time-multiplexed on the address bus using RAS and CAS as strobes for rows and columns. CAS is normally used as the CS function.

Notice the pin counts:
- Without address multiplexing: 27 pins including power and ground.
- With address multiplexing: 17 pins including power and ground.

Figs. 7.14, 7.15 DRAM Read and Write Cycles

Typical DRAM read operation: the row address is strobed by RAS, then the column address by CAS; after the access time tA the data appears, and the bit lines are precharged before the next access can begin, giving cycle time tC.

Typical DRAM write operation: row and column addresses are strobed as for a read, R/W is asserted, and the data must satisfy tDHR, the data hold time from RAS.

Notice that it is the bit-line precharge operation that causes the difference between access time (tA) and cycle time (tC).

DRAM Refresh and Row Access

- Refresh is usually accomplished by a "RAS-only" cycle. The row address is placed on the address lines and RAS is asserted. This refreshes the entire row. CAS is not asserted. The absence of a CAS phase signals the chip that a row refresh is requested, and thus no data is placed on the external data lines.
- Many chips use "CAS before RAS" to signal a refresh. The chip has an internal counter, and whenever CAS is asserted before RAS, it is a signal to refresh the row pointed to by the counter, and to increment the counter.
- Most DRAM vendors also supply one-chip DRAM controllers that encapsulate the refresh and other functions.
- Page mode, nibble mode, and static column mode allow rapid access to the entire row that has been read into the column latches.
- Video RAMs (VRAMs) clock an entire row into a shift register where it can be rapidly read out, bit by bit, for display.

Fig. 7.16 A CMOS ROM Chip

(2-D CMOS ROM chip: the address lines drive a row decoder, and CS enables the outputs.)

Table 7.4 Kinds of ROM

ROM Type         Cost              Programmability    Time to program     Time to erase
Mask programmed  Very inexpensive  At the factory     Weeks (turnaround)  N/A
PROM             Inexpensive       Once, by end user  Seconds             N/A
EPROM            Moderate          Many times         Seconds             20 minutes
Flash EPROM      Expensive         Many times         100 us              1 s, large block
EEPROM           Very expensive    Many times         100 us              10 ms, byte

Memory Boards and Modules

- There is a need for memories that are larger and wider than a single chip.
- Chips can be organized into "boards." Boards may not be actual, physical boards, but may consist of structured chip arrays present on the motherboard.
- A board or collection of boards makes up a memory module.
- Memory modules:
  - satisfy the processor-main memory interface requirements
  - may have DRAM refresh capability
  - may expand the total main memory capacity
  - may be interleaved to provide faster access to blocks of words

Fig. 7.17 General Structure of a Memory Chip

This is a slightly different view of the memory chip than before: m address lines, s bidirectional data lines, R/W, and multiple chip selects (CS). Multiple chip selects ease the assembly of chips into chip arrays; the combined select is usually provided by an external AND gate.

Fig. 7.18 Word Assembly from Narrow Chips

All chips have common CS, R/W, and Address lines. p chips expand the word size from s bits to p x s bits.

Fig. 7.19 Increasing the Number of Words by a Factor of 2^k

The additional k address bits are used to select one of 2^k chips, each of which has 2^m words. Word size remains at s bits.
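The address split in Fig. 7.19 can be sketched with a little bit arithmetic; the function name and the example field widths are ours, for illustration only:

```python
def split_address(addr, m, k):
    """Split an (m + k)-bit address as in Fig. 7.19.

    The top k bits select one of 2**k chips; the low m bits
    address a word within the selected chip.
    """
    chip = addr >> m                  # high k bits: chip number
    within = addr & ((1 << m) - 1)    # low m bits: word within chip
    return chip, within

# Illustrative: m = 16 (64K-word chips), k = 2 (4 chips).
print(split_address(0x21234, 16, 2))  # (2, 0x1234)
```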

Fig. 7.20 Chip Matrix Using Two Chip Selects

This scheme simplifies the decoding from the use of one (q + k)-bit decoder to one q-bit and one k-bit decoder. Multiple chip-select lines are used to replace the last level of gates in this matrix decoder scheme.

Fig. 7.21 A 3-D DRAM Array

- CAS is used to enable the top decoder in the decoder tree.
- Use one 2-D array for each bit, with each 2-D array on a separate board.

Fig. 7.22 A Memory Module Interface

Must provide:
- Read and Write signals.
- Ready: memory is ready to accept commands.
- Address, to be sent with the Read/Write command.
- Data, sent with Write or available upon Read when Ready is asserted.
- Module Select, needed when there is more than one module.

The control signal generator: for SRAM, it just strobes data on Read and provides Ready on Read/Write. For DRAM, it also provides CAS, RAS, and R/W, multiplexes the address, generates refresh signals, and provides Ready.

Fig. 7.23 DRAM Module with Refresh Control

(The module adds a refresh counter and refresh clock/control logic alongside the address and data registers; board and chip selection and the multiplexed address lines drive the dynamic RAM array.)

Fig. 7.24 Two Kinds of Memory Module Organization

Memory modules are used to allow access to more than one word simultaneously.
- Scheme (a) supports filling a cache line.
- Scheme (b) allows multiple processes or processors to access memory at once.

Fig. 7.25 Timing of Multiple Modules on a Bus

If the time to transmit information over the bus, tb, is < the module cycle time, tc, it is possible to time-multiplex information transmission to several modules.

Example: store one word of each cache line in a separate module. The main memory address is split into a word field and a module number field; this provides successive words in successive modules.

Timing: while module 0 performs its read, the bus can carry the address and data for a write to module 3; module 0's data return then follows on the bus.

With interleaving of 2^k modules, and tb < tc/2^k, it is possible to get a 2^k-fold increase in memory bandwidth, provided memory requests are pipelined. DMA satisfies this requirement.
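Low-order interleaving as described above can be sketched in Python: with 2^k modules, the low k address bits select the module, so successive word addresses land in successive modules and their accesses can overlap.

```python
def module_of(word_addr, k):
    # Low-order interleaving: the low k bits of the word address
    # give the module number (2**k modules total).
    return word_addr % (1 << k)

k = 2  # 4 modules, as an illustrative choice
print([module_of(a, k) for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Successive addresses cycling through the modules is exactly what lets a pipelined sequence of requests keep all 2^k modules busy at once.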

Memory System Performance

Breaking the memory access process into steps:

For all accesses:
- transmission of the address to memory
- transmission of control information to memory (R/W, Request, etc.)
- decoding of the address by memory

For a read:
- return of the data from memory
- transmission of a completion signal

For a write:
- transmission of the data to memory (usually simultaneous with the address)
- storage of the data into memory cells
- transmission of a completion signal

The next slide shows the access process in more detail.

Fig. 7.26 Static and Dynamic RAM Timing

"Hidden refresh" cycle: a normal cycle would exclude the pending refresh step.

Example SRAM Timings (using unrealistically long times)

Approximate values for static RAM read timing:
- Address bus driver turn-on time: 40 ns
- Bus propagation and bus skew: 10 ns
- Board select decode time: 20 ns
- Time to propagate select to another board: 30 ns
- Chip select: 20 ns

Propagation time for address and command to reach the chip: 120 ns.

- On-chip memory read access time: 80 ns
- Delay from chip to memory board data bus: 30 ns
- Bus driver and propagation delay (as before): 50 ns

Total memory read access time: 280 ns.

Moral: 70 ns chips do not necessarily provide 70 ns access time!
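The delay chain above can be summed directly; the point of the sketch is that the chip's own 80 ns access time is only a fraction of the total the CPU sees:

```python
# Delays (ns) from the SRAM timing example, on the way to the chip:
outbound = {
    "address bus drivers": 40,
    "bus propagation and skew": 10,
    "board select decode": 20,
    "select to other board": 30,
    "chip select": 20,
}
# ...and from the chip back to the CPU:
inbound = {
    "on-chip read access": 80,
    "chip to board data bus": 30,
    "bus driver and propagation": 50,
}

to_chip = sum(outbound.values())
total = to_chip + sum(inbound.values())
print(to_chip, total)  # 120 280
```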

Considering Any Two Adjacent Levels of the Memory Hierarchy

Some definitions:
- Temporal locality: the property of most programs that if a given memory location is referenced, it is likely to be referenced again, "soon."
- Spatial locality: if a given memory location is referenced, those locations near it numerically are likely to be referenced "soon."
- Working set: the set of memory locations referenced over a fixed period of time, or in a time window.

Notice that temporal and spatial locality both work to assure that the contents of the working set change only slowly over execution time.

Defining the primary and secondary levels: the CPU sees the faster, smaller primary level; behind it lies the slower, larger secondary level — two adjacent levels in the hierarchy.

Figure 7.28 Temporal and Spatial Locality Example

Consider the C for loop:

for (i = 0; i < n; i++)
    A[i] = 0;
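The loop's spatial locality can be made concrete with a small Python sketch: assuming 4-byte array elements and 16-byte cache blocks (illustrative sizes, not from the text), the n sequential references touch only about n/4 distinct blocks, while i and n are reused on every iteration (temporal locality).

```python
def blocks_touched(n, elem_size=4, block_size=16):
    # Byte address of each A[i], assuming A starts at address 0.
    addresses = [i * elem_size for i in range(n)]
    # Distinct cache blocks those addresses fall into.
    return len({a // block_size for a in addresses})

print(blocks_touched(16))  # 16 references, but only 4 distinct blocks
```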

Primary and Secondary Levels of the Memory Hierarchy

Speed between levels is defined by latency, the time to access the first word, and bandwidth, the number of words per second transmitted between levels.

Typical latencies: cache, a few clocks; disk, about 100,000 clocks.

- The item of commerce between any two levels is the block. Blocks may (and will) differ in size at different levels in the hierarchy. Example: cache block size, 16-64 bytes; disk block size, 1-4 KB.
- As the working set changes, blocks are moved back and forth through the hierarchy to satisfy memory access requests.
- A complication: addresses will differ depending on the level. The primary address is the address of a value in the primary level; the secondary address is the address of a value in the secondary level.

Primary and Secondary Address Examples

- Main memory address: an unsigned integer.
- Disk address: track number, sector number, and offset of the word in the sector.

Fig. 7.29 Addressing and Accessing a 2-Level Hierarchy

The computer system, in hardware or software, must perform any address translation that is required.

There are two ways of forming the address: segmentation and paging. Paging is more common. Sometimes the two are used together, one "on top of" the other. More about address translation and paging later.

Fig. 7.30 Primary Address Formation

Hits and Misses; Paging; Block Placement

- Hit: the word was found at the level from which it was requested.
- Miss: the word was not found at the level from which it was requested. (A miss results in a request for the block containing the word from the next higher level in the hierarchy.)
- Hit ratio (or hit rate): h = number of hits / total number of references.
- Miss ratio: 1 - h.
- tp = primary memory access time; ts = secondary memory access time.
- Access time: ta = h * tp + (1 - h) * ts.
- Page: commonly, a disk block.
- Page fault: synonymous with a miss.
- Demand paging: pages are moved from disk to main memory only when a word in the page is requested.
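The access time formula ta = h*tp + (1-h)*ts can be sketched as follows; the numbers are illustrative only, not from the text:

```python
def effective_access_time(h, t_p, t_s):
    """ta = h*tp + (1 - h)*ts from the hit/miss definitions above.

    h:   hit ratio (0..1)
    t_p: primary memory access time
    t_s: secondary memory access time
    """
    return h * t_p + (1 - h) * t_s

# Illustrative: 10 ns primary, 100 ns secondary, 95% hit ratio.
print(effective_access_time(0.95, 10.0, 100.0))  # 14.5 ns
```

Note how sensitive the result is to the miss ratio: even a 5% miss rate pulls the average well away from the 10 ns primary time.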
