IEEE Transactions On Multimedia - Cornell University


Lightweight Arithmetic for Mobile Multimedia Devices
  Tsuhan Chen
  Carnegie Mellon University
  tsuhan@cmu.edu
  Thanks to Fang Fang and Rob Rutenbar

IEEE Transactions on Multimedia
  EDICS
    – Signal Processing for Multimedia Applications
    – Components and Technologies for Multimedia Systems
    – Human Factor, Interface and Interaction
    – Multimedia Databases and File Systems
    – Multimedia Communication and Networking
    – System Integration
    – Applications
    – Standards and Related Issues

Multimedia Applications on Mobile Devices
  Multimedia Processing
    – More and more applications are ported from PCs to mobile devices
    – Floating-point, computationally intensive
  Multimedia System Development
    – Media designers use 32-64 bit floats in C for algorithms
    – ASIC designers use 10-20 bit fixed-point units in hardware
    – Serious design disconnect

Fixed-Point vs. Floating-Point
  Fixed point (integer and fraction fields)
    (-) Limited dynamic range and precision
    (+) Small, less power consumption
    (-) From SW to HW: time consuming and error-prone
  Floating-point (exp and fraction fields)
    (+) Wide dynamic range & high precision
    (-) Big, power intensive
    (+) Easy translation from SW to HW
  How about making floating-point lightweight? Don't use more than necessary.

What Does "Lightweight" Mean?
  Fewer bits in the exp and fraction fields? Actually it's more than this.
  IEEE Standard FP covers
    – Formats and ops for ordinary numbers
    – Very small nums: denormals
    – Delicate rounding modes
  We can work on each dimension

Lightweight Floating-Point Arithmetic
  Lightweight FP arithmetic is a middle-ground solution between fixed-point and floating-point
    – Better numerical features than fixed-point
    – Less complicated than IEEE FP
    – Acceptable energy consumption
    – Easy to prototype algorithms with
    – Easy to implement into hardware

Software to Hardware Cycles
  [Diagram: the lightweight FP operators are provided both as an arithmetic C++ class for software and as synthesizable Verilog for chip hardware]

Design Flow Comparison
  Lightweight FP design (SW lib and HW lib available)
    C FP -> Lightweight FP -> SW simulation -> Pass? -> HW design
  Fixed-point design
    C FP -> Fixed-point -> SW tuning -> Pass? -> HW design

IEEE Standard vs. Lightweight IP
  IEEE FP Standard
    32 / 64 bits
      – 8 / 11 bits exponent
      – 23 / 52 bits mantissa
      – 1 sign bit
    Specs normal numbers as well as special values (infinity), edge cases (INF - INF), etc.
  Lightweight Arithmetic IP
    Fewer bits
      – Fewer bits of fraction -> less numerical precision
      – Fewer bits of exponent -> less dynamic range
    Which of the special cases/numbers should be supported?

IEEE Floats vs. CMUfloats
  IEEE floats
    – FP formats and ops for ordinary numbers
    – Very small nums: denormals
    – Delicate rounding modes
  CMUfloats
    – Customizable format providing variable dynamic range and precision: fraction width [1, 23], exponent width [1, 8]
    – On-off switch for denormalization
    – Multiple choices for rounding mode: real rounding / jamming / truncation
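To make the precision and dynamic-range trade-off concrete, the sketch below computes what a given exponent/fraction split buys. It is not part of the CMUfloat library: `FormatInfo` and `describe_format` are hypothetical names, and the layout is assumed to follow IEEE conventions (bias of 2^(E-1) - 1, hidden leading 1, all-ones exponent reserved for infinity/NaN, no denormals). Under those assumptions a 1-bit exponent leaves no normal numbers, so the sketch requires at least 2 exponent bits.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not the CMUfloat API. Assumes an IEEE-style layout:
// bias = 2^(E-1) - 1, hidden leading 1, all-ones exponent reserved for
// infinity/NaN, and no denormals.
struct FormatInfo {
    int total_bits;     // 1 sign bit + exponent bits + fraction bits
    int bias;           // exponent bias
    double max_normal;  // largest finite magnitude
    double min_normal;  // smallest normalized magnitude
    double epsilon;     // spacing between 1.0 and the next larger value
};

inline FormatInfo describe_format(int exp_bits, int frac_bits) {
    assert(exp_bits >= 2 && exp_bits <= 8);
    assert(frac_bits >= 1 && frac_bits <= 23);
    FormatInfo f;
    f.total_bits = 1 + exp_bits + frac_bits;
    f.bias = (1 << (exp_bits - 1)) - 1;
    const int max_exp = ((1 << exp_bits) - 2) - f.bias;  // largest unbiased exponent
    const int min_exp = 1 - f.bias;                      // smallest unbiased exponent
    f.epsilon = std::ldexp(1.0, -frac_bits);
    f.max_normal = std::ldexp(2.0 - f.epsilon, max_exp);
    f.min_normal = std::ldexp(1.0, min_exp);
    return f;
}
```

For example, `describe_format(5, 8)` describes a 14-bit format (5 exponent + 8 fraction + 1 sign), and `describe_format(8, 23)` reproduces the IEEE single-precision parameters.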

Rounding in CMUfloat
  We support not only IEEE rounding, but also two "quick & dirty" modes
  IEEE rounding (real rounding)
    – Achieves best results, but requires costly hardware
  Truncation
    – Bits beyond the allowed precision are killed
    – What most ASIC hardware designers do, for efficiency
  Jamming
    – Simple logic, no adders
    – Invented in 1940s, better than truncate, similar HW
  [Figure: for each mode, the bits b2 b1 b0 at the edge of the allowed precision and the final, rounded result]
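The three rounding modes can be sketched on a plain integer significand. This is an illustration only, not the CMUfloat implementation: the function names are hypothetical, and jamming is assumed here to OR all dropped bits into the new least significant bit (the slides may examine only a few guard bits).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not the CMUfloat code. `sig` is an unsigned
// significand and `drop` is the number of low-order bits to remove.

// Truncation: simply discard the low bits.
inline uint32_t round_truncate(uint32_t sig, int drop) {
    assert(drop >= 1 && drop <= 31);
    return sig >> drop;
}

// Jamming (assumed variant): truncate, then force the new LSB to 1
// if any dropped bit was set. Needs only an OR, no adder.
inline uint32_t round_jam(uint32_t sig, int drop) {
    assert(drop >= 1 && drop <= 31);
    const uint32_t dropped = sig & ((1u << drop) - 1u);
    return (sig >> drop) | (dropped != 0u ? 1u : 0u);
}

// Round to nearest, ties to even. Needs an incrementer, and the result
// can carry out by one bit, so a caller would have to renormalize.
inline uint32_t round_nearest_even(uint32_t sig, int drop) {
    assert(drop >= 1 && drop <= 31);
    uint32_t kept = sig >> drop;
    const uint32_t dropped = sig & ((1u << drop) - 1u);
    const uint32_t half = 1u << (drop - 1);
    if (dropped > half || (dropped == half && (kept & 1u) != 0u)) {
        kept += 1u;
    }
    return kept;
}
```

With the significand 1.00011 (binary, stored as 0x23) and three bits dropped, truncation and round-to-nearest both give 1.00, while jamming gives 1.01.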

C++ Cmufloat Library
  Supported operators
    – Arithmetic and comparison operators (including *, / and !=) between a Cmufloat and a Cmufloat, double, float, int or short
  Other supported C++ features
    – Pointer:           Cmufloat * a;
    – Reference:         Cmufloat & a;
    – Array:             Cmufloat a[10][10];
    – Argument passing:  func ( Cmufloat a )
    – I/O stream:        cout << a;

Software Library: Advantages
  Transparent mechanism to embed 'Cmufloat' in the algorithm
    – The overall structure of the source code can be preserved
    – Minimal effort in translating standard FP to lightweight FP

    Cmufloat<14,5> a = 0.5;   // 14 bit fraction and 5 bit exponent
    Cmufloat b = 1.5;         // Default Cmufloat is IEEE float
    Cmufloat<18,6> c[2];      // Define an array
    float fa;
    c[0] = a + b;
    fa = a * b;               // Assign the result to float
    c[1] = fa + b;            // Operation between float and Cmufloat

Software Library: Advantages (Cont.)
  Arithmetic operators are implemented by bit-level manipulation: more precise
  Our approach
    – Emulates the hardware implementation exactly
  Previous approach

    Add(b, c) {
      a' = b + c;       // Built-in FP operator
      a = round(a');    // Round to limited bit-width
    }

Summary: Features Supported
  Bit widths
    – Variable from 2 bits (1 sign + 1 exp + 0 man) to 32 bits (IEEE std)
  Rounding
    – Use jamming (1.00011 rounds to 1.01)
    – Experiments show jamming is nearly as good as full IEEE rounding, always superior to truncation, yet same complexity as truncation
  Denormalized numbers
    – Not supported: our experiments on video/audio codecs suggest that denormal numbers do not improve the performance
  Exceptions
    – Support only the exceptional values for infinity, zero and NaN
    – Helps make the smaller FP sizes more robust
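The "previous approach" pseudocode (compute with the built-in FP operator, then round to the limited bit-width) can be written out as follows. This is a sketch of the approach the slides contrast against, not of the Cmufloat library itself; `limit_fraction` and `add_then_round` are hypothetical names, truncation is assumed for the rounding step, and the exponent range is not limited.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: keep only `frac_bits` fraction bits (plus the hidden
// leading 1) of x by truncation. The exponent range is left unlimited.
inline double limit_fraction(double x, int frac_bits) {
    assert(frac_bits >= 0 && frac_bits <= 52);
    if (x == 0.0 || !std::isfinite(x)) return x;
    int e = 0;
    const double m = std::frexp(x, &e);  // x = m * 2^e with 0.5 <= |m| < 1
    // Shift so the hidden bit and the kept fraction bits form the integer part.
    const double scaled = std::ldexp(m, frac_bits + 1);
    return std::ldexp(std::trunc(scaled), e - (frac_bits + 1));
}

// "Previous approach": add in full precision first, shorten the result after.
inline double add_then_round(double b, double c, int frac_bits) {
    const double exact = b + c;               // built-in FP operator
    return limit_fraction(exact, frac_bits);  // round to limited bit-width
}
```

Because the sum is first formed in full double precision, this route does not necessarily reproduce what a narrow hardware adder would compute bit for bit, which is the motivation the slides give for bit-level emulation.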

Hardware Library: ASIC Design Flow
  Verilog to layout flow
  Timing & area analysis
  Power analysis
  Inputs: Verilog design; standard cells; technology library (logical, physical, and timing im...)
  Tools
    – Layout: Cadence Silicon Ensemble
    – Basic blocks: Synopsys DesignWare (integer arithmetic, multiplexors, etc.)
    – Power: Synopsys DesignPower

Lightweight FP Adders/Multipliers
  Features supported
    – Bit widths: variable from 3 bits (1 sign + 1 exp + 1 frac) to 32 bits (IEEE std)
    – Rounding: jamming / truncation
  Design issues
    – Design method
    – Subcomponent structures: core integer adder structure? Core shifter structure? Core integer multiplier structure?
  [Block diagram, adder: sign-out logic, swap, alignment, 2's complement, addition, leading zeros detection, normalization, rounding, special case detection and special case logic (nan, zero, infinity, overflow, underflow)]
  [Block diagram, multiplier: sign-out logic, multiplication, exponent addition with bias, normalization, rounding, special case detection and special case logic (nan, zero, infinity, overflow, underflow)]

Floating Pt Adder
  Blue modules have large area and / or delay
  [Block diagram of the adder: swap, alignment, 2's complement, addition, normalization, rounding, special case detection and logic]

Floating Pt Multiplier
  Blue modules have large area and / or delay
  [Block diagram of the multiplier: multiplication, exponent path with bias, normalization, rounding, special case detection and logic]
  We see that the multiplier has less "overhead" than the adder

Design Examples: Adders
  Design                   Area (um^2), post layout   Delay (ns), post synthesis
  32-bit floating-point    26634                      48.95
  20-bit fixed-point       4866                       2.44
  14-bit floating-point    10096                      25.77

Design Examples: Multipliers
  Design                   Area (um^2), post layout   Delay (ns), post synthesis
  32-bit floating-point    60713                      24.14
  20-bit fixed-point       40738                      22.82
  14-bit floating-point    8851                       15.89

Power Analysis
  IDCT in
    – 32-bit IEEE FP
    – 15-bit radix-16 lightweight FP
    – Fixed-point implementation
        12-bit accuracy for constants
        Widest bit-width is 24 in the whole algorithm (not fine ...
  [Chart: power comparison of the IEEE FP, lightweight FP and fixed-point IDCT implementations]

Encoding/Decoding
  Encoding of media: video camera or uncompressed multimedia file -> encoder -> encoded bitstream (101110010...)
  Playback of media: encoded bitstream -> decoder
  We can choose how accurately we wish to decode the data

Video Codec
  H.261/263, MPEG-1/2/4, and even JPEG
  IDCT requires floating point, and has an IEEE quality spec (1180-1990) that requires comparison against a 64-bit IEEE double implementation
  [Diagram: video codec with 8-bit and 9-bit data paths, floating-point IDCT blocks and motion compensation]

Video Quality vs. Bit-width
  Use PSNR (Peak Signal-to-Noise Ratio) to measure perceptual video quality
    – Test video Pi -> proposed codec with CMUfloat -> Qi
    – Measure PSNR (Noise = Qi – Pi)
  CMUfloat can go very small: 14 bits (5 exponent + 8 fraction + 1 sign bits = 14 total bits)
  [Plot: PSNR (dB) versus fraction width (bits); yellow points show where PSNR decreases by 0.2 dB from the asymptotic value]
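The PSNR measurement above can be sketched as follows, using the usual definition for 8-bit samples, PSNR = 10 log10(255^2 / MSE), with the noise taken as Qi - Pi. The function name `psnr_db` and the flat byte-vector frame layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between a reference frame P and a decoded frame Q, both given
// as flat vectors of 8-bit samples (layout assumed for this sketch).
inline double psnr_db(const std::vector<uint8_t>& reference,
                      const std::vector<uint8_t>& decoded) {
    assert(reference.size() == decoded.size() && !reference.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        // Noise sample: Qi - Pi
        const double noise = static_cast<double>(decoded[i]) -
                             static_cast<double>(reference[i]);
        sum_sq += noise * noise;
    }
    const double mse = sum_sq / static_cast<double>(reference.size());
    if (mse == 0.0) {
        return std::numeric_limits<double>::infinity();  // identical frames
    }
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

In an experiment like the one described, this would be evaluated once per fraction width to trace out the PSNR curve.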

Rounding Modes
  Compare 3 rounding modes using IDCT video streams
  Jamming is nearly as good as real rounding in precision, but as simple as truncation
  [Plot: comparison of rounding methods, PSNR (dB) versus fraction width (bits) from 10 to 17]

Video Demo
  IEEE double vs. variable-precision CMUfloats
    – Decoded with 64-bit "double" IDCT
    – Decoded with 14-bit "lightweight" IDCT
    – Decoded with 11-bit "lightweight" IDCT

Hardware Reduction Using Lightweight FP
  Comparison in Area/Delay/Power
    – 32-bit IEEE FP IDCT / 14-bit lightweight FP IDCT with jamming rounding / 20-bit fixed-point IDCT
  [Chart: IEEE FP versus the lightweight solution]

Low-Resolution Display
  Media software is commonly done in full precision (32–64 bits)
    – Why do this if the display cannot handle it?
    – On a portable video player, the decoded image is shown on a 2-bit B&W display. This is really inefficient.
  Can't we do better than this, with smarter operators?

Low-Resolution Display (cont.)
  Simplest (dumbest): just encode/decode as usual, let the display "figure it out"
  Better: decode with just enough precision so a quantizer can retrieve the right 2-bit pixel values
  Best: encode and decode with the minimum precision needed so the quantizer gets the right pixels
  [Diagrams: encoder -> bitstream -> decoder chains over the 0-255 pixel range, with a quantizer after the decoder ("better") and quantizers on both the encoder and decoder sides ("best")]
  Results
    – Simplest: needs 20-bit lightweight floats to work
    – Better: needs 16-bit lightweight floats; even just 11 bits looks decent
    – Best: needs just 9-bit floats (4 fraction bits) to work just fine

Video Demo
  Full precision (64 bit)
  Using 23 bits (IEEE 1180 passed)
  Using 11 bits (IEEE 1180 failed)
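The "just enough precision" criterion can be illustrated by checking whether a low-precision decode still lands in the same 2-bit display level as a full-precision decode. The sketch below assumes a uniform quantizer that keeps the top two bits of an 8-bit pixel; the slides do not specify the quantizer, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed uniform quantizer: map an 8-bit pixel (0..255) to a 2-bit
// display level (0..3) by keeping the two most significant bits.
inline uint8_t quantize_2bit(uint8_t pixel) {
    return static_cast<uint8_t>(pixel >> 6);
}

// Count pixels whose displayed level differs between a full-precision
// decode and a low-precision decode of the same frame.
inline std::size_t count_display_mismatches(const std::vector<uint8_t>& full,
                                            const std::vector<uint8_t>& low) {
    assert(full.size() == low.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < full.size(); ++i) {
        if (quantize_2bit(full[i]) != quantize_2bit(low[i])) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A bit-width would count as sufficient for this display when the mismatch count is zero, or small enough to be invisible.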

How About Audio?
  MPEG-1/2 Layer 3 (MP3)
  [Diagram: encoder: .wav -> analysis filter bank (MDCT) -> joint stereo coding -> scale & quantizer -> mux -> MP3 bitstream, guided by a perceptual model; decoder: MP3 bitstream -> demux -> scale & inverse quantizer -> joint stereo decoding -> synthesis filter bank (IMDCT) -> .wav; CMUfloat is used in the codec]
  No standard tests for quality

Audio Quality
  Need to rely on subjective testing on perceptual quality
    – Mean Opinion Score (MOS): from 5 "imperceptible difference" to 1 "really annoying"
  Results
    – 8 subjects, 6-bit exponent and 3 to 7 bit fraction
  [Plot: MOS versus number of fraction bits for Music 1 (Classic) and Music 2 (Pop)]

Conclusion
  Tradeoff between the "lightweight FP" and the "fixed-point"
    – Hardware cost
    – Design time
    – Numerical performance

Ongoing Work: Automatic Design Flow
  Input: standard C FP algorithm

    main() {
      double x, y;
      x = 2*x + y;
    }

  An optimization engine for bit-width (exhaustive bit-width search) uses the CMUfloat C++ class and the FP arithmetic Verilog library
  Output: C++ lightweight FP algorithm with optimal bit-width, and the lightweight FP hardware design

    main() {
      CMUfloat x, y;
      x = 2*x + y;
    }
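An exhaustive bit-width search of the kind named above could look like the sketch below. It is an illustration under several assumptions, not the authors' optimization engine: only the fraction width is searched (1 to 23 bits), the exponent range is left unlimited, the acceptance test is an absolute error tolerance against a double-precision reference, and `quantize`, `kernel` and `find_min_fraction_bits` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical stand-in for a reduced-precision value: keep `frac_bits`
// fraction bits (plus the hidden leading 1), rounding to nearest.
inline double quantize(double x, int frac_bits) {
    assert(frac_bits >= 1 && frac_bits <= 23);
    if (x == 0.0 || !std::isfinite(x)) return x;
    int e = 0;
    const double m = std::frexp(x, &e);  // x = m * 2^e with 0.5 <= |m| < 1
    const double scaled = std::round(std::ldexp(m, frac_bits + 1));
    return std::ldexp(scaled, e - (frac_bits + 1));
}

// The statement from the slide, x = 2*x + y, with the inputs and every
// intermediate result quantized to the candidate fraction width.
inline double kernel(double x, double y, int frac_bits) {
    const double qx = quantize(x, frac_bits);
    const double qy = quantize(y, frac_bits);
    const double doubled = quantize(2.0 * qx, frac_bits);
    return quantize(doubled + qy, frac_bits);
}

// Try fraction widths from narrowest to widest and return the first one
// whose result is within `tolerance` of the reference, or -1 if none is.
inline int find_min_fraction_bits(const std::function<double(int)>& run,
                                  double reference, double tolerance) {
    for (int bits = 1; bits <= 23; ++bits) {
        if (std::fabs(run(bits) - reference) <= tolerance) {
            return bits;
        }
    }
    return -1;
}
```

A caller would wrap the algorithm under test in a lambda, for example `[](int bits) { return kernel(1.09375, 0.5, bits); }`, and compare against the double-precision result 2.6875. For a codec, the tolerance test would be replaced by a quality criterion such as PSNR.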

Recap
  Accomplishments
    – C++ lightweight FP arithmetic library
    – Verilog lightweight FP arithmetic library
    – Extensive experiments on video/audio/speech
  Is the lightweight FP solution universal?
    – No, there is a tradeoff between the fixed-point solution and the lightweight FP solution
  Ongoing work
    – Automatic design flow
  Important for multimedia on low-power mobile devices

Advanced Multimedia Processing Lab
  Please visit us at: http://amp.ece.cmu.edu
