Ambit - Carnegie Mellon University


Ambit
In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology
Vivek Seshadri
Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, Todd C. Mowry

Executive Summary
Problem: Bulk bitwise operations
- present in many applications, e.g., databases, search filters
- existing systems are memory bandwidth limited
Our Proposal: Ambit
- perform bulk bitwise operations completely inside DRAM
- bulk bitwise AND/OR: simultaneous activation of three rows
- bulk bitwise NOT: inverters already in sense amplifiers
- less than 1% area overhead over existing DRAM chips
Results compared to state-of-the-art baseline
- average across seven bulk bitwise operations: 32X performance improvement, 35X energy reduction
- 3X-7X performance for real-world data-intensive applications

Bulk Bitwise Operations
- BitWeaving (database queries) [1]
- Bitmap indices (database indexing)
- BitFunnel (web search) [2]
- Set operations
- DNA sequence mapping
- Encryption algorithms
[1] Li and Patel, BitWeaving, SIGMOD 2013.
[2] Goodwin et al., BitFunnel, SIGIR 2017.

Today, DRAM is just a storage device!
[Figure: Processor (CPU, GPU, FPGA) connected to DRAM over the channel, with Write and Read arrows between them.]
Throughput of bulk bitwise operations limited by available memory bandwidth

Our Approach
[Figure: Processor (CPU, GPU, FPGA or PiM) connected to DRAM over the channel.]
Use analog operation of DRAM to perform bitwise operations completely inside memory!

Outline of the talk
1. DRAM Background
2. Ambit-AND-OR: Bitwise AND/OR in DRAM
3. Ambit-NOT: Bitwise NOT in DRAM
4. Ambit Implementation
5. Applications and Evaluation

Inside a DRAM Chip
[Figure labels: 2D array of DRAM cells; sense amplifiers; 8KB.]

DRAM Cell
[Figure labels: bitline; sense amp; enable.]

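DRAM Cell Operation
1. Initially the wordline and the sense amp enable are 0, and the bitline is at ½VDD.
2. Raise wordline: the access transistor connects the cell to the bitline.
3. The cell loses charge to the bitline, causing a deviation in bitline voltage (½VDD + δ).
4. Enable sense amp: the bitline is driven to VDD and the cell regains its charge.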
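The deviation δ in the steps above comes from charge sharing between the cell capacitor and the bitline. The following is a minimal sketch of that idea, assuming an ideal charge-sharing model; the function names and the capacitance values (22 fF cell, 85 fF bitline) are illustrative assumptions and do not come from the slides.

```python
def bitline_after_sharing(cell_charged, vdd=1.0, cell_cap=22e-15, bitline_cap=85e-15):
    """Bitline voltage after the wordline is raised (ideal charge sharing).

    cell_cap and bitline_cap are assumed, illustrative values.
    """
    v_cell = vdd if cell_charged else 0.0
    v_bitline = vdd / 2  # bitline starts at 1/2 VDD
    total_charge = cell_cap * v_cell + bitline_cap * v_bitline
    return total_charge / (cell_cap + bitline_cap)


def sense(v_bitline, vdd=1.0):
    """Sense amplifier: amplify the deviation from 1/2 VDD to a full rail."""
    return vdd if v_bitline > vdd / 2 else 0.0


v_one = bitline_after_sharing(True)    # slightly above 1/2 VDD
v_zero = bitline_after_sharing(False)  # slightly below 1/2 VDD
```

In this model a charged cell pushes the bitline a little above ½VDD and an empty cell pulls it a little below, and the sense amplifier turns either deviation into a full VDD or 0.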


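Triple-Row Activation: Majority Function
1. The bitline starts at ½VDD.
2. Activate all three rows: the three cells share charge with the bitline, which deviates to ½VDD + δ when at least two of the cells hold 1.
3. Enable sense amp: the bitline is driven to VDD, and all three cells end up holding the majority value.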

Bitwise AND/OR Using Triple-Row Activation
[Figure: cells A, B, and C on one bitline connected to a sense amp; bitline at VDD.]
Output = AB + BC + CA = C(A OR B) + (NOT C)(A AND B)
Control the value of C to perform bitwise OR or bitwise AND of A and B: C = 1 gives A OR B, and C = 0 gives A AND B.
38X improvement in raw throughput and 44X reduction in energy consumption for bulk bitwise AND/OR operations
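The majority identity on this slide can be checked in software. The sketch below models each DRAM row as a Python integer used as a bitvector; the function name `majority` and the 8-bit width are illustrative assumptions.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority of three equal-width bitvectors: AB + BC + CA."""
    return (a & b) | (b & c) | (c & a)


WIDTH = 8                    # assumed row width for the example
ALL_ONES = (1 << WIDTH) - 1  # control row C with every bit set to 1
a, b = 0b11001010, 0b10100110

or_result = majority(a, b, ALL_ONES)  # C = 1 selects A OR B
and_result = majority(a, b, 0)        # C = 0 selects A AND B

# Exhaustive single-bit check of AB + BC + CA == C(A OR B) + (NOT C)(A AND B)
identity_holds = all(
    majority(x, y, z) == ((z & (x | y)) | ((1 - z) & (x & y)))
    for x in (0, 1) for y in (0, 1) for z in (0, 1)
)
```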

Potential Concerns with Triple-Row Activation
1. With three cells, bitline deviation may not be enough
2. Process variation: all cells are not equal
SPICE simulations put these concerns to rest (Section 6 in paper).
3. Cells leak charge
4. Memory controller may have to send three addresses
5. Source data gets destroyed
Address these challenges through implementation (next slide).

Bulk Bitwise AND/OR in DRAM
Statically reserve three designated rows t1, t2, and t3.
Result row = row A AND/OR row B
1. Copy (RowClone) data of row A to row t1
2. Copy (RowClone) data of row B to row t2
3. Initialize (RowClone) data of row t3 to 0/1
4. Activate rows t1/t2/t3 simultaneously
5. Copy (RowClone) data of row t1/t2/t3 to Result row
Use RowClone (MICRO 2013) to perform copy and initialization operations completely in DRAM!
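The five steps can be mimicked with a toy software model. This is only a sketch under stated assumptions: `ToySubarray`, `rowclone`, `triple_row_activate`, and the row names `C0`/`C1` are hypothetical, rows are Python integers, and step 3 is modeled as a copy from a pre-initialized all-0 or all-1 row.

```python
class ToySubarray:
    """Hypothetical stand-in for one DRAM subarray; each row is a Python int."""

    def __init__(self, row_bits=16):
        self.mask = (1 << row_bits) - 1
        # Assumed pre-initialized control rows: all zeros and all ones.
        self.rows = {"C0": 0, "C1": self.mask}

    def rowclone(self, src, dst):
        # Models an in-DRAM row-to-row copy.
        self.rows[dst] = self.rows[src]

    def triple_row_activate(self, r1, r2, r3):
        a, b, c = self.rows[r1], self.rows[r2], self.rows[r3]
        maj = (a & b) | (b & c) | (c & a)
        # All three activated rows are overwritten with the majority value.
        self.rows[r1] = self.rows[r2] = self.rows[r3] = maj


def bulk_and_or(sub, row_a, row_b, result, op):
    sub.rowclone(row_a, "t1")                         # step 1
    sub.rowclone(row_b, "t2")                         # step 2
    sub.rowclone("C1" if op == "or" else "C0", "t3")  # step 3
    sub.triple_row_activate("t1", "t2", "t3")         # step 4
    sub.rowclone("t1", result)                        # step 5


sub = ToySubarray()
sub.rows["A"] = 0b1100110011110000
sub.rows["B"] = 0b1010101000111100
bulk_and_or(sub, "A", "B", "R_and", "and")
bulk_and_or(sub, "A", "B", "R_or", "or")
```

Because only the copies in t1, t2, and t3 are activated together, rows A and B keep their original contents in this model, which is the point of copying into the designated rows first.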


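Negation Using the Sense Amplifier
Can we copy the negated value from bitline to a DRAM cell?
[Figure: sense amp with enable signal, connected to the bitline on one side and the complementary bitline on the other.]

Negation Using the Sense Amplifier
[Figure: dual contact cell with a regular wordline and a negation wordline, attached to the sense amp between the bitline and the complementary bitline.]

Negation Using the Sense Amplifier
1. Activate source: the bitline moves from ½VDD to ½VDD + δ.
2. Enable sense amp: the bitline is driven to VDD and the complementary bitline to 0.
3. Activate negation wordline: the dual contact cell is connected to the complementary bitline and stores the negated value.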
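The negation sequence can be sketched in a few lines. The model below is an assumption-laden illustration: `sense_amplifier` and `negate_row` are hypothetical names, each cell is a single 0/1 value, and the sense amplifier is reduced to producing two complementary outputs.

```python
def sense_amplifier(cell_value: int):
    """Return (bitline, complementary bitline) after sensing; the two are always opposite."""
    return (1, 0) if cell_value == 1 else (0, 1)


def negate_row(source_bits):
    dual_contact_row = []  # hypothetical row of dual contact cells
    for bit in source_bits:
        # Steps 1 and 2: activate the source row and enable the sense amp.
        _bitline, bitline_bar = sense_amplifier(bit)
        # Step 3: the negation wordline connects the cell to the complementary side.
        dual_contact_row.append(bitline_bar)
    return dual_contact_row


negated = negate_row([1, 0, 0, 1, 1])
```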

Ambit vs. DDR3: Performance and Energy
[Figure: bar chart of performance improvement and energy reduction (y-axis 0 to 70) for not, and/or, nand/nor, xor/xnor, and mean. Mean: 32X performance improvement, 35X energy reduction.]


Ambit – Implementation
[Figure: subarray organization. Labels: regular data rows; regular row decoder; pre-initialized rows (0 and 1); designated (temporary) rows for triple activation; dual contact cells; bitwise decoder; sense amplifiers.]

Integrating Ambit with the System
1. PCIe device
- Similar to other accelerators (e.g., GPU)
2. System memory bus
- Ambit uses the same DRAM command/address interface
Pros and cons discussed in paper (Section 5.4)


Real-world Applications
Methodology (Gem5 simulator)
- Processor: x86, 4 GHz, out-of-order, 64-entry instruction queue
- L1 cache: 32 KB D-cache and 32 KB I-cache, LRU policy
- L2 cache: 2 MB, LRU policy
- Memory controller: FR-FCFS, 8 KB row size
- Main memory: DDR4-2400, 1 channel, 1 rank, 8 banks
Workloads
- Database bitmap indices
- BitWeaving: column scans using bulk bitwise operations
- Set operations: comparing bitvectors with red-black trees

Bitmap Indices: Performance
[Figure: execution time of query (y-axis 0 to 120) for Baseline and Ambit, with w = 2, 3, 4 for each of n = 8m and n = 16m. Ambit speedups across the six configurations range from 5.4X to 6.6X.]
Consistent reduction in execution time: 6X on average.
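Bitmap-index queries reduce to bulk bitwise operations over long bitvectors, which is why they benefit from Ambit. The sketch below is a hedged illustration and not the evaluated workload: it assumes one bitmap per week in which bit i is set when user i was active, and the names `weekly_activity` and `active_every_week` and the tiny 6-user bitmaps are made up for the example.

```python
from functools import reduce


def active_every_week(weekly_bitmaps):
    """AND the per-week bitmaps; a surviving bit marks a user active in every week."""
    return reduce(lambda acc, week: acc & week, weekly_bitmaps)


# Three weeks of activity for six users (illustrative data).
weekly_activity = [0b101101, 0b111100, 0b101110]

always_active = active_every_week(weekly_activity)
user_count = bin(always_active).count("1")  # population count of the result
```

On a conventional system every bitmap crosses the memory channel to be ANDed in the processor; with in-DRAM AND only the final bitvector has to be read out for counting.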

Speedup offered by Ambit for BitWeaving
Query: select count(*) where c1 field c2
[Figure: speedup offered by Ambit (y-axis 0 to 14) versus number of bits for each column value (x-axis 4 to 32), with one series per number of rows in the database table: 1m, 2m, 4m, 8m. Annotated speedup: 12X.]

Other Details and Results in Paper
Detailed implementation of Ambit
- Changes to DRAM chips
- Optimizations to improve performance
- Error correction codes (open problem)
Detailed SPICE simulation analysis
Comparison to 3D-stacked DRAM
Other applications
- Set operations
- BitFunnel: Web search document filtering
- Masked initialization
- Cryptography
- DNA sequence mapping

Conclusion
Problem: Bulk bitwise operations
- present in many applications, e.g., databases, search filters
- existing systems are memory bandwidth limited
Our Proposal: Ambit
- perform bulk bitwise operations completely inside DRAM
- bulk bitwise AND/OR: simultaneous activation of three rows
- bulk bitwise NOT: inverters already in sense amplifiers
- less than 1% area overhead over existing DRAM chips
Results compared to state-of-the-art baseline
- average across seven bulk bitwise operations: 32X performance improvement, 35X energy reduction
- 3X-7X performance for real-world data-intensive applications


Backup Slides

Data movement consumes high energy
Source: Bill Dally Keynote, “Challenge for Future Computer Systems,” HIPEAC 2015.

RowClone: In-DRAM Bulk Data Copy (MICRO 2013)
1. Activate source: the bitline moves from ½VDD to ½VDD + δ.
2. Enable sense amp: the bitline is driven to VDD.
3. Activate destination: data gets copied from source to destination.

Today, DRAM is just a storage device!
[Figure: Processor connected to DRAM over the channel, with Write (64B) and Read (64B) arrows between them.]
Can we do more with DRAM?

Ambit – Implementation
[Figure: subarray organization. Labels: regular data rows; regular row decoder; pre-initialized rows (0 and 1); temporary rows for triple activate; dual contact cells; bitwise decoder; reserved address 0; sense amplifiers.]

Summary of operations
[Figure label: sense amplifiers.]
1. Copy
2. AND
3. OR
4. NOT

Ambit Throughput
[Figure: bar chart of throughput (GOps/sec, log-scale y-axis from 1 to 4096) for Skylake, GTX 745, HMC 2.0, Ambit, and Ambit-3D, across not, and/or, nand/nor, xor/xnor, and mean.]

Error Correction Code
Need ECC that is homomorphic over bitwise operations
- ECC(A and B) = ECC(A) and ECC(B)
- ECC(A or B) = ECC(A) or ECC(B)
- ECC(not A) = not ECC(A)
Triple Modular Redundancy
- trivially satisfies the above condition
- 2X capacity overhead
- Better performance and energy efficiency
- Lower overall cost
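The homomorphism condition and the claim that triple modular redundancy satisfies it can be checked with a small model. This is a sketch under assumptions: the code word is simply three copies of the data held as a Python tuple, decoding is a per-bit majority vote, and all function names are hypothetical.

```python
MASK = 0xFF  # assumed 8-bit data words for the example


def tmr_encode(word: int):
    """Triple modular redundancy: store three copies of the word."""
    return (word, word, word)


def tmr_decode(code):
    """Per-bit majority vote across the three copies."""
    x, y, z = code
    return (x & y) | (y & z) | (z & x)


def code_and(p, q):
    return tuple(a & b for a, b in zip(p, q))


def code_or(p, q):
    return tuple(a | b for a, b in zip(p, q))


def code_not(p):
    return tuple(~a & MASK for a in p)


a, b = 0b11001010, 0b10100110
and_ok = code_and(tmr_encode(a), tmr_encode(b)) == tmr_encode(a & b)
or_ok = code_or(tmr_encode(a), tmr_encode(b)) == tmr_encode(a | b)
not_ok = code_not(tmr_encode(a)) == tmr_encode(~a & MASK)

# A single flipped bit in one copy is corrected by the vote.
corrupted = (a ^ 0b00010000, a, a)
recovered = tmr_decode(corrupted)
```

Because each copy is operated on independently and identically, applying a bitwise operation to the code words gives the code word of the result, which is the property the slide asks for.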
