An Energy Scalable Computational Array for Energy Harvesting Sensor Signal Processing
Rajeevan Amirtharajah
University of California, Davis

Energy Scavenging Wireless Sensor
Extend sensor node lifetime beyond battery limitation
  – Scavenging energy from light, heat, and vibrations
Cope with the variability of the harvested power
  – Energy scalable approximate signal processing
E-Workshop - February 22, 2007

Sensor Data Processing Subsystem
Bridge Sensor to RF
Microcontroller
  – Sensor calibration
  – DSP configuration
  – High active power
  – Low duty cycle
DSP Coprocessor
  – Continuous sensor data processing (e.g., event detection)
  – High duty cycle
  – Ultra low active power
[Figure: block diagram with labels SWNT or SiNW (x4), A/D Converter, Microcontroller, DSP Coprocessor, ctrl, data]

Outline
Introduction
Serial Computation and Energy Scalability
Energy Scalable Computational Array Based on Distributed Arithmetic
On-Chip Low Power PWM Interconnect
Conclusions and Future Work

Extending Sensor Node Lifetime
[Figure: Input, Controller, Config 0 / Config 1 / Config 2, Output; plot labels Energy, Power, Quality]
Controller chooses appropriate configuration based on input, available energy, desired output quality
Power awareness leads to 60-200% battery lifetime improvement (Bhardwaj TVLSI01)
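The controller's selection step could look roughly like the sketch below. This is an illustration only, not the controller from the talk: the `Config` fields, the three operating points, and all energy and quality numbers are hypothetical, and the dependence on the input itself is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    energy_per_op: float  # hypothetical cost, arbitrary units
    quality: float        # hypothetical output quality in [0, 1]


# Hypothetical operating points; the slide gives no numbers.
CONFIGS = [
    Config("Config 0", energy_per_op=1.0, quality=0.60),
    Config("Config 1", energy_per_op=2.5, quality=0.85),
    Config("Config 2", energy_per_op=6.0, quality=0.98),
]


def choose_config(available_energy, desired_quality, configs=CONFIGS):
    """Pick the cheapest affordable config that meets the quality target.

    If no affordable config reaches the target, fall back to the best
    affordable one. Return None if nothing is affordable.
    """
    affordable = [c for c in configs if c.energy_per_op <= available_energy]
    if not affordable:
        return None
    good_enough = [c for c in affordable if c.quality >= desired_quality]
    if good_enough:
        return min(good_enough, key=lambda c: c.energy_per_op)
    return max(affordable, key=lambda c: c.quality)
```

With plenty of energy the controller picks the cheapest configuration that is good enough. When energy is scarce it degrades quality rather than stopping.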

Power Tradeoffs of Bit Serial Arithmetic
[Figure: serial adder and parallel adder schematics with inputs Ai, Bi, carries Cin, Cout, and sums Si]
Serial approach uses less area and less interconnect capacitance; fewer devices imply less leakage
Must clock serial implementation at higher frequency for same throughput, possibly at higher supply voltage
Tradeoff between dynamic power and static power
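A behavioral sketch of the serial adder makes the cycle cost concrete: one full-adder cell is reused for every bit, so a `width`-bit addition takes `width` clock cycles. The function name and the unsigned, LSB-first convention are assumptions made for this illustration.

```python
def serial_add(a, b, width):
    """Add two unsigned integers one bit per clock cycle, LSB first.

    Returns (sum modulo 2**width, carry out, cycles used).
    """
    carry = 0
    total = 0
    cycles = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry                      # full-adder sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))  # full-adder carry out
        total |= s << i
        cycles += 1
    return total, carry, cycles
```

A parallel adder would produce the same result in one cycle using `width` full-adder cells, which is the area and leakage cost the slide refers to.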

Serial vs. Parallel Multiplier Power
[Figure: Power vs. Throughput for Multipliers. Power (W), 1.E-07 to 1.E-04, against Throughput (Hz), 1.E+03 to 1.E+07, for serial and parallel multipliers at 130nm, 100nm, and 70nm]
At low throughputs, lower leakage of smaller serial implementation results in less total power


Simple Distributed Arithmetic
Lookup table-based implementation of vector dot product
Bit-serial, word-parallel
# cycles = input data bit width
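The textbook form of distributed arithmetic can be sketched as follows. The function names, the two's complement input convention, and the LSB-first order are assumptions for illustration; the slide's own datapath is not shown in the transcript.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of the coefficients whose address bit is set."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_dot(coeffs, xs, width):
    """Dot product of coeffs with two's complement inputs of `width` bits.

    One LUT access per input bit, so the cycle count equals `width`.
    """
    lut = build_lut(coeffs)
    mask = (1 << width) - 1
    acc = 0
    for j in range(width):
        # Address = bit j of every input word (bit-serial, word-parallel).
        addr = 0
        for i, x in enumerate(xs):
            addr |= (((x & mask) >> j) & 1) << i
        term = lut[addr] * (1 << j)
        # The sign bit carries negative weight in two's complement.
        acc += -term if j == width - 1 else term
    return acc
```

No multiplier appears anywhere: each cycle is one table lookup, one shift, and one add or subtract.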

Reduced Memory Distributed Arithmetic
Recoding address bits reduces LUT size by 1/2
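The transcript does not say which recoding the slide uses. One well-known way to halve a DA lookup table is offset binary coding, which treats each input bit as +1 or -1 so that the table becomes antisymmetric under address inversion. The sketch below implements that swapped-in technique under the same assumptions as before (two's complement inputs, hypothetical function names); it may differ from the author's scheme.

```python
def build_half_lut(coeffs):
    """Store D(addr) = sum(+A_i if bit i is set else -A_i) for addresses
    whose top bit is 0. The other half follows from D(~addr) = -D(addr)."""
    k = len(coeffs)
    return [sum(c if (addr >> i) & 1 else -c for i, c in enumerate(coeffs))
            for addr in range(1 << (k - 1))]


def obc_da_dot(coeffs, xs, width):
    """Dot product using a half-size LUT and offset binary coding."""
    k = len(coeffs)
    half = build_half_lut(coeffs)
    addr_mask = (1 << k) - 1
    word_mask = (1 << width) - 1
    acc = 0
    for j in range(width):
        addr = 0
        for i, x in enumerate(xs):
            addr |= (((x & word_mask) >> j) & 1) << i
        if addr >> (k - 1):
            # Top address bit set: invert the address and negate the entry.
            value = -half[~addr & addr_mask]
        else:
            value = half[addr]
        if j == width - 1:
            value = -value  # the sign bit has negative weight
        acc += value * (1 << j)
    # acc equals sum(A_i * (2 * x_i + 1)); remove the offset and halve.
    return (acc - sum(coeffs)) // 2
```

For four coefficients the table holds 8 entries instead of 16, at the cost of a conditional inversion on the address and a constant correction at the end.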

Enhanced DA Unit for Signal Processing

Scalability Enhanced DA Operations
Operation                # of Cycles   Algorithm
vector dot product       1-16          distributed arithmetic
serial multiply          2-32          shift add
division                 21-36         nonrestoring algorithm: subtract shift
square root              12-26         add subtract shift (Li, ICCD97)
piecewise linear (log)   21-36         lookup table linear approx.
add/sub                  1             parallel addition
power                    min. 40       serial multiply add
complex multiply         64-144        shift add subtract swap
complex add/sub          2             add subtract
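The table above names a nonrestoring algorithm alongside division and square root. As a reminder of how nonrestoring division works with only shifts, adds, and subtracts, here is a minimal sketch for unsigned operands. The function name and the `width` parameter are assumptions, and the loop does not model the cycle counts listed in the table.

```python
def nonrestoring_divide(dividend, divisor, width):
    """Unsigned nonrestoring division, one quotient bit per iteration.

    Returns (quotient, remainder) for a `width`-bit dividend.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    r = 0  # signed partial remainder
    q = 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        # Shift in the next dividend bit, then subtract or add the divisor
        # depending on the sign of the previous partial remainder.
        if r >= 0:
            r = 2 * r + bit - divisor
        else:
            r = 2 * r + bit + divisor
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += divisor  # single correction step at the end
    return q, r
```

Unlike restoring division, a negative partial remainder is never undone inside the loop. The next iteration adds instead of subtracting, and only the final remainder may need one correction.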

Multiported Register File Shift Memory
Bitlines in both directions enable serial and parallel loads and bit shifts for addressing LUT
Provides bit-level energy scalability with less area and power than flip-flop implementation

Multiported Register File Cell
[Figure: cell schematic with ports XDI, XDO, XDOE, YDI, YDO]
X ports read and write rows
Y ports read and write columns (addresses LUT)
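The row and column access pattern can be modeled with a small bit array. This is a behavioral toy only: the class name, method names, and the bit ordering are hypothetical, and nothing here reflects the actual cell circuit.

```python
class ShiftMemory:
    """Toy model of a bit array with row (X) and column (Y) access."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.bits = [[0] * cols for _ in range(rows)]

    def write_row(self, r, word):
        """X port: store a `cols`-bit word in row r (bit 0 in column 0)."""
        for c in range(self.cols):
            self.bits[r][c] = (word >> c) & 1

    def read_row(self, r):
        """X port: read row r back as a word."""
        return sum(bit << c for c, bit in enumerate(self.bits[r]))

    def read_column(self, c):
        """Y port: gather bit c of every row into one LUT address."""
        return sum(self.bits[r][c] << r for r in range(self.rows))
```

Writing input words as rows and reading bit columns yields, in sequence, exactly the addresses a distributed arithmetic LUT needs, without shifting every word through flip-flops.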

Input Data Shifter Power Scaling
[Figure: Power vs. Length. Power (µW), 0.1 to 1000.0, against Length (Bits), 0 to 18, for Registers and Memory]
Memory implementation is lower power for all lengths but 1 bit
Decreases more slowly than register due to serialized column access

Long DA Filter Implementation
[Figure: coefficient groups (A0-A3), (A4-A7), (A8-A11), (A12-A15) feeding DA Unit 0, DA Unit 1, DA Unit 2, DA Unit 3, with output y]
Single LUT implementation grows exponentially in filter length
Long filters implemented with multiple stages of addition
Flowgraph mapped onto array of enhanced DA units
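The partitioning idea can be sketched as a set of small DA units whose partial sums are combined by stages of addition. The function names, the pairwise adder tree, and the requirement that the tap count be a multiple of `taps_per_unit` are assumptions of this sketch.

```python
def da_unit(coeffs, xs, width):
    """One DA unit: LUT-based dot product over a short coefficient group."""
    lut = [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
           for addr in range(1 << len(coeffs))]
    acc = 0
    for j in range(width):
        addr = sum(((x >> j) & 1) << i for i, x in enumerate(xs))
        # Bit `width - 1` is the two's complement sign bit.
        weight = -(1 << j) if j == width - 1 else (1 << j)
        acc += weight * lut[addr]
    return acc


def long_filter(coeffs, xs, width, taps_per_unit=4):
    """Split a long dot product across DA units, then add the partials."""
    partials = [da_unit(coeffs[i:i + taps_per_unit],
                        xs[i:i + taps_per_unit], width)
                for i in range(0, len(coeffs), taps_per_unit)]
    while len(partials) > 1:  # stages of pairwise addition
        partials = [sum(partials[i:i + 2])
                    for i in range(0, len(partials), 2)]
    return partials[0]


def lut_entries(n_taps, taps_per_unit):
    """LUT entries for one big LUT vs. one small LUT per unit."""
    return 1 << n_taps, (n_taps // taps_per_unit) * (1 << taps_per_unit)
```

For 16 taps, a single LUT needs 2^16 = 65536 entries, while four 4-tap units need 4 x 16 = 64 entries in total.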

Energy Scalable DA Array Architecture
DA tile functional unit performs energy scalable computation for a set of linear/nonlinear functions
Low power island-style reconfigurable interconnect permits the direct realization of DSP flowgraphs
Switch boxes and connection boxes implemented with full transmission gates
[Figure label: DA Tile]

Energy Scalable Distributed Arithmetic
Goal: Enable power vs. performance tradeoffs for low power signal processing
Explore different tradeoff mechanisms with minimum overhead in area, leakage power
  – Variable input data bit width
  – Variable number of filter taps
  – Variable data bit width in LUTs
  – Variable number of iteration cycles
Evaluate impact of array granularity on efficiency of implementation (area and power)
Major issue is mapping applications for low power
  – Minimize interconnect distance
  – Efficient data handshaking
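The first mechanism, variable input data bit width, can be illustrated by keeping only the most significant bits of each input. Since a bit-serial DA unit spends one cycle per input bit, fewer bits means fewer cycles at the price of a bounded error. The function name and the truncation scheme are assumptions of this sketch, not the design from the talk.

```python
def truncated_dot(coeffs, xs, width, bits_used):
    """Approximate dot product keeping the top `bits_used` input bits.

    Returns (approximate result, cycles), assuming one cycle per bit.
    """
    if not 1 <= bits_used <= width:
        raise ValueError("bits_used must be between 1 and width")
    drop = width - bits_used
    # Arithmetic shift right then left clears the dropped LSBs.
    quantized = [(x >> drop) << drop for x in xs]
    approx = sum(c * q for c, q in zip(coeffs, quantized))
    return approx, bits_used


def error_bound(coeffs, width, bits_used):
    """Worst-case absolute error introduced by dropping the LSBs."""
    drop = width - bits_used
    return sum(abs(c) for c in coeffs) * ((1 << drop) - 1)
```

Each dropped bit halves the precision of the inputs and removes one cycle, which gives the controller a fine-grained quality versus energy knob.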

Dataflow Synchronization

Power Scalable FIR Filter Results
[Figure: Power (mW), 0.00 to 2.50, and Recognition (%), 84 to 98, against Input Bit Width, 0 to 18]
Simulated power and projected recognition performance for biomedical event detection application

Scaling DA Power vs. MAC Power
[Figure: Power (W), 1.E-05 to 1.E-01, against Filter Taps (N), 0 to 256. Series: SRAM-based DA power; Register-based DA power; MAC1 w/ single-ported SRAM memory; MAC1 w/ multiported SRAM memory; MAC2 w/ single-ported SRAM memory; MAC2 w/ multiported SRAM memory]


Low Power Interconnect Design
Interconnect power must be minimized
  – Coarse-grained reconfigurable array has high logic to wire ratio
  – Low swing signaling not as effective due to overhead of generating additional supply
  – Attempt to minimize switched capacitance instead through wire spacing, bus activity

Interconnect Power vs. Bus Width

Serial Interconnect Power vs. Spacing

Edge Position Signaling
[Figure: timing diagram with labels Symbol Width, T1, T2, n1δ1, n2δ2, δ1, δ2]
Basic Idea: encode multiple bits per transition by modulating edge timing
  – Pulse Position Modulation (PPM)
  – Pulse Width Modulation (PWM)
Reduces worst case power consumption over binary signaling
What is circuit implementation area and power overhead?

Pulse Position – Pulse Width Modulator
Pulse Position Modulation
  – Encode one bit by choosing one of two leading edge positions
Pulse Width Modulation
  – Encode two bits by delaying leading edge through digitally controlled delay line
All digital circuits, no static power dissipation

PPM – PWM Demodulator
Pulse Position Demodulation
  – Compare leading edge to local clock reference
Pulse Width Demodulation
  – Compare trailing edge to three width reference edges generated from leading edge by delay line
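The modulator and demodulator together carry three bits per pulse: one in the leading edge position and two in the pulse width. A timing-level sketch is shown below. All time parameters (`t0`, `pos_step`, `width_step`, `min_width`), the bit packing, and the assumption that the trailing edge is a delayed copy of the leading edge are hypothetical choices for illustration; the real circuits use delay lines rather than arithmetic.

```python
def modulate(symbol, t0=0.0, pos_step=1.0, width_step=1.0, min_width=1.0):
    """Map a 3-bit symbol to (leading_edge, trailing_edge) times."""
    if not 0 <= symbol < 8:
        raise ValueError("symbol must fit in 3 bits")
    pos_bit = symbol >> 2       # PPM: one of two leading edge positions
    width_bits = symbol & 0b11  # PWM: one of four pulse widths
    lead = t0 + pos_bit * pos_step
    trail = lead + min_width + width_bits * width_step
    return lead, trail


def demodulate(lead, trail, t0=0.0, pos_step=1.0, width_step=1.0,
               min_width=1.0):
    """Recover the 3-bit symbol from the two edge times."""
    # PPM: compare the leading edge to a local clock reference placed
    # halfway between the two legal positions.
    pos_bit = 1 if lead > t0 + pos_step / 2 else 0
    # PWM: three reference edges derived from the leading edge split the
    # four legal widths. Count how many the trailing edge has passed.
    refs = [lead + min_width + (n + 0.5) * width_step for n in range(3)]
    width_bits = sum(1 for r in refs if trail > r)
    return (pos_bit << 2) | width_bits
```

Because the width references are generated from the received leading edge, the width decision tolerates any common timing offset that moves both edges together.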

Digitally Controlled Delay Element
[Figure: delay element schematic with inputs labeled CAL2, CAL1, CAL0]

Width Demodulation Waveforms
[Figure: waveforms in Volts, -0.1 to 0.8, against Time (ns), 340 to 390]

PPM - PWM Signaling Breakeven Length
[Figure: Wire Length (mm), 0 to 20, against Technology Node (m), 0.25u down to 16n, for 1Mbps and 10Mbps]


Conclusions
Power scalable signal processing extends lifetime of energy scavenging wireless sensor nodes
Serial arithmetic techniques offer power advantages at low throughputs and bit-level energy scalability
Reconfigurable arrays can exploit inherent parallelism in DSP applications to enable architecture-level energy scalability
Pulse-based signaling can lower worst case power for long interconnects
Microarchitecture, circuit design, and interconnect must be optimized to implement scalability while minimizing overhead

Test Chip Design
Semicustom DA Tile Implementation
Full custom test chip implementation incorporating SRAM latch-based design and low power interconnect test circuits

Silicon Nanowire Interconnect Circuits
Courtesy M. S. Islam, UC Davis
Coordinate with Thrust 1 (Xue & Stan)
New project to explore circuits and interconnect-centric architectures for exploiting silicon nanowires
Year 1: characterize and model nanowire interconnect and explore potential advantages for energy scalable array

Acknowledgments
Liping Guo
Mackenzie Scott
Bicky Zhou
Zulfiqar Ansari
Jamie Collier
Prof. M. Saif Islam
FCRP Interconnect Focus Center
Xilinx University Program and Xilinx Research Labs
