White Paper

Hardware DFP Accelerators:
Reducing Financial Data Center Energy Consumption and TCO

SilMinds LLC
56 Misr-Helwan Rd.
Badr Tower, Fl. 6, Suite 14
Maadi, Helwan 11431
Egypt
www.silminds.com
info@silminds.com
+20 (0)2 2753 0401

Silicon for green computing
White Paper – October 2010

Executive Summary

The continuously increasing business dependence on IT services has resulted in a proportional increase in the number, scale, and server density of data centers. In particular, financial and other monetary applications are characterized by a large share of intense decimal floating point (DFP) computations, which are bound by stringent regulatory precision requirements. These applications typically rely on a software layer to deliver the required floating point precision and to convert between binary hardware and decimal input and output data, resulting in excessive processing delay.

Coprocessor-based decimal floating point hardware acceleration is a flexible and inexpensive approach that enables direct processing in decimal format. An overall DFP computation time speedup close to 20 to 30 times can be achieved, depending on the application profile and the accelerator's dimensioning. With up to 90% of many financial applications' time spent performing DFP arithmetic, the savings could be very significant. The savings result from the shorter time needed to complete the DFP workload on the hardware accelerator compared with the time needed to complete it in software on the server CPU.

Thanks to the DFP arithmetic speedup, total cost of ownership can be cut at both the capital and the operational expenditure level. A proportional cut in consumed energy is equally achieved, which helps in complying with regulatory requirements on harmful gas emissions. These reachable reductions imply relaxed hardware dimensioning, infrastructure, and real estate, as well as lower operational costs related to energy consumption and service management. The result is more economical data center operation, opening doors to perhaps previously unforeseen business opportunities.

Contents

Financial data centers: requirements & opportunities
Enhancing application (DFP) performance
Hardware DFP acceleration
Application centric DFP acceleration
Energy reduction evaluation
TCO reduction evaluation
Conclusion
References
Financial data centers: requirements & opportunities

Business and government dependence on networked IT applications is growing at an unprecedented rate. IT service providers, whether external (commercial) or internal to a specific organization, are shaping up as autonomous operational entities conforming to provider-customer service management standards. These trends have resulted in a substantial global increase in the number, scale, and server density of data centers, clearly indicating a need to manage regulatory compliance, capital and operational cost, energy spending, and application performance.

Regulations (such as the Sarbanes-Oxley Act in the U.S.) impose audit requirements to ensure corporations' financial integrity; these include, among several other aspects, financial accuracy. Centers running financial and monetary applications, such as banks, insurance companies, stock and currency exchanges, and telecoms billing departments, are subject to stringent accuracy requirements, most relevantly in relation to decimal (natural) number fractions.

Financial applications are indeed characterized by intense Decimal Floating Point (DFP) calculations that may consume more than 90% of the overall data processing capacity. Since binary number processing is well known for its inherent inability to represent decimal fractions precisely, software application components have been developed to process numbers in decimal format. This has typically resulted in a performance penalty in terms of processing time and server utilization.

To address the DFP accuracy requirements while maintaining feasible processing performance, hardware DFP coprocessors have been developed and implemented. Beyond resolving the accuracy issue and alleviating the decimal processing and I/O bottlenecks, these coprocessors can be optimized to shorten the execution time of DFP operations.
This economical performance improvement has created opportunities for data center operators to deliver higher "accuracy compliant" processing throughput and thus improve the effective utilization of their servers.

Higher server DFP throughput, achieved through hardware computation speedup, implies a reduction in the number of servers needed to carry the same workload without compromising performance and dependability. The higher the percentage of DFP arithmetic operations within the financial application, the better the chances of cutting down the number of servers, saving on both energy consumption and total cost of ownership (TCO).
Achieving a suitable DFP execution speedup therefore creates better business opportunities for financial and telecoms billing data centers and supports the global effort to reduce the environmentally harmful gas emissions of coal, oil, and gas operated power stations. From the perspective of server software and hardware vendors, the performance gains of hardware acceleration enhance the economic competitiveness of their products. This is particularly true when accelerators can be customized to match an application's known or benchmarked profile of DFP arithmetic operations.

This paper provides a comprehensive quantitative analysis demonstrating the potential energy consumption and TCO savings made possible by the computation speedup of DFP hardware accelerators. The analysis highlights the threefold benefit of deploying accelerators: accuracy compliance, performance gains, and cost reduction.

Enhancing application (DFP) performance

Improving application performance is a continual development process that involves innovation at the hardware, software, and server systems levels. No single approach is known to deliver a given level of performance gain across such a wide range of applications.

Software performance improvement mainly relies on more efficient algorithms, middleware, and backend implementations. Hardware performance is typically enhanced by operating at higher "realizable" clock rates and by deploying more processor parallelism and pipelining. Server-level performance has been mainly associated with virtualization and multiprocessing.
The following treatment of the performance enhancement approaches is set in the context of supporting DFP while improving processing performance.

Software solutions

DFP support has been introduced through a number of software packages such as Java's BigDecimal, IBM's C/C++ decNumber, the SAP NetWeaver DECFLOAT data type, and the Intel DFP Math library (Anderson, 2009; Hartman, 2007). The main purpose of these packages is to handle numbers in decimal format in order to achieve the required DFP precision. Integrating built-in decimal data types into business and financial applications offers a flexible, generic solution to the problem of binary floating point inaccuracy. However, software DFP is still much slower than binary floating point computation and may cause an unavoidable application slowdown.
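The inaccuracy that these software packages address can be illustrated with a minimal sketch. Python's standard `decimal` module is used here only as a stand-in for the packages named above (it is not one of them), and the rate and unit count are made-up illustrative values.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floating point cannot represent 0.10 exactly, so adding it
# ten times does not give exactly 1.0.
binary_total = sum([0.10] * 10)

# Decimal arithmetic keeps the exact decimal value.
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))

binary_is_exact = (binary_total == 1.0)
decimal_is_exact = (decimal_total == Decimal("1.00"))

# Hypothetical billing step: rate * units, rounded to whole cents
# with an explicitly chosen rounding mode.
charge = Decimal("0.0013") * Decimal("1234")
charge_cents = charge.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print("binary sum exact: ", binary_is_exact)
print("decimal sum exact:", decimal_is_exact)
print("charge in cents:  ", charge_cents)
```

The explicit rounding mode matters in this context because financial rules usually prescribe how a result is rounded, not only how many digits it keeps.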
Server-level solutions

Systems architecture techniques such as virtualization, multiprocessing, and server consolidation serve a number of generic data center computational and operational issues. They may or may not support DFP, and they address a wider range of objectives than focused, application-tuned DFP co-processing and acceleration solutions.

Server efficiency may be loosely defined in our context as the relative amount of data that can be processed by one or more applications while maintaining speed and accuracy requirements. From a systems perspective, effective server utilization can be improved by one (or both) of the following approaches: 1) a higher degree of processor sharing (virtualization); and/or 2) economy of scale (multiprocessing, consolidation).

Virtualization is the process of running multiple virtual machines (guest operating systems) over the same server platform (host hardware and operating system). Virtualization improves server utilization when the server is initially underutilized, something that can be measured by monitoring the activity of system and application processes. There are drivers for implementing virtualization other than improving server utilization, such as business continuity, prototyping and testing, and security. Virtualization could significantly improve the processing efficiency of a server that is not heavily loaded, but it is not optimized to achieve that alone.

Recently, IBM released its enterprise class server System z9, powered by the novel Power6 processor, which incorporates DFP acceleration in compliance with the IEEE 754-2008 standard. System z9 is a high-end multiprocessor, packed with hardware enhancements targeting a wide range of objectives and applications (Duale, 2007). A power benchmarking study of the system, driven by a Java workload, showed that its power usage scaled very efficiently both when the workload increased on the same number of processors and when the number of processors increased from 4-8 to 12-16 units. The metric IBM used for this study was the percentage of transactions per busy kW, which decreased by 10% (Stahl, 2007).

The study made no mention of how much processor-level DFP contributed to reducing the power used, and it did not target specific applications or computational patterns. Its focus was to demonstrate that general purpose multiprocessing boosts performance with a fair impact on the average per-processor energy consumption.

Hardware solutions

Well known approaches to improving general purpose hardware efficiency involve parallelism and pipelining at different hardware levels, caching performance enhancement, alleviating I/O bottlenecks, and/or deploying special purpose coprocessors (such as the abundant graphics and video coprocessors).

Special purpose DFP coprocessors may work with any of the above approaches, adding the value of satisfying floating point precision requirements. A server system whose performance and/or utilization is enhanced through any of these approaches would still reap the full benefits of specialized, configurable, application-specific coprocessor accelerators. This is especially true given the ease and flexibility of deploying FPGA-based accelerators, which have a standard PCIe interface.

Hardware DFP acceleration

Hardware DFP acceleration not only speeds up financial and other monetary applications, it also allows flexible and configurable sizing to match the application's profile of DFP arithmetic operations, thereby optimizing performance and energy reduction by realizing the speedup where it is most needed.

All monetary applications require precision that is hindered by binary floating point number representation. The latter is handled by conversion and manipulation software, which increases the data center's IT workload and consequently uses more servers and consumes more electric power. DFP precision has been the main motive driving the development of decimal computer arithmetic, which was standardized in IEEE 754-2008 and implemented in both software and hardware, as summarized in the previous section.

DFP arithmetic co-processing contributes to reducing energy consumption and overall capital and operational cost mainly through the acceleration speedup, which enables a reduction in the number of servers.
There are also a few more specific factors:

- The energy consumption of an FPGA-based hardware DFP accelerator is lower than that of the processor performing an equivalent operation.
- Eliminating the software layer needed to map between the application's "natural" decimal number representation and the processor's "inherent" binary representation shortens execution time and improves server utilization.
- Re-configurability of FPGA-based accelerators makes it possible to size the arithmetic coprocessor core units to match the application's known profile of DFP arithmetic operations.

Main operations in financial applications

A study by IBM using an industry recognized decimal Telco billing benchmark showed that over 86% of execution time is spent in just four basic decimal operations: addition, multiplication, rescaling, and number packing. Core financial computations show a comparable percentage of time spent in decimal operations (Schulte, 2008).

Further, benchmarks performed by Michael Schulte's group at the University of Wisconsin (Wang et al., 2009) reported the following percentages of execution time for a number of financial and telecoms billing applications (numbers are rounded to the nearest whole percent):
- Banking: 75% of execution time is spent in DFP; with 43% spent in division and multiplication, and the rest in addition or other adder-supported operations
- Euro conversion: 93% of execution time is spent in DFP; with 63% spent in division and multiplication, and the rest in addition or other adder-supported operations
- Risk assessment: 85% of execution time is spent in DFP; with 27% spent in division and multiplication, and the rest in addition or other adder-supported operations
- Telco benchmark: 79% of execution time is spent in DFP; with 28% spent in multiplication, and the rest in addition or other adder-supported operations

The percentages reported above indicate the following:

- These four DFP-intensive financial applications spend on average 83% of their execution time in DFP, with a standard deviation of less than 8 percentage points. Applications do approach, and may exceed, the 90% DFP mark.
- There is no generic acceleration solution for all DFP-intensive financial applications. Some are heavy on multiplication and division (or on one of the two operation types). Addition, and other operations that can be supported by a hardware decimal adder, account for one to two thirds of the overall DFP execution time.

The above demonstrates that a reconfigurable implementation of a hardware DFP coprocessor accelerator is best suited to optimizing the performance-cost tradeoff.

Application centric DFP acceleration

SilMinds was first to demonstrate that hardware DFP acceleration solutions based on the IEEE 754-2008 standard not only eliminate the floating point accuracy issue but also slash the total cost and the amount of energy consumed in the process, potentially by orders of magnitude.

While a high-end system such as IBM's z9 offers a ready solution with potentially lower overall data center energy consumption, it comes at a significantly high cost and offers little flexibility for adoption into an existing data center's infrastructure.
Hardware DFP accelerators, on the other hand, offer many advantages:

- Easy adoption into existing servers
- Strict DFP precision with multiple rounding modes
- Boosted DFP computational speed
- Overall energy conservation
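The summary statistics quoted for the four application profiles earlier in this section can be re-derived with a short sketch. It assumes that the division/multiplication percentages are shares of total execution time rather than of DFP time; the source wording leaves this open, so that reading is an assumption, as are the variable names.

```python
from statistics import mean, pstdev, stdev

# Percent of total execution time spent in DFP, as quoted in the text
dfp_share = {
    "Banking": 75,
    "Euro conversion": 93,
    "Risk assessment": 85,
    "Telco benchmark": 79,
}

# Percent spent in multiplication/division (assumed: of total execution time)
muldiv_share = {
    "Banking": 43,
    "Euro conversion": 63,
    "Risk assessment": 27,
    "Telco benchmark": 28,
}

values = list(dfp_share.values())
average = mean(values)
sample_sd = stdev(values)       # n - 1 in the denominator
population_sd = pstdev(values)  # n in the denominator

# Fraction of DFP time left for addition and adder-supported operations
adder_fraction = {
    app: (dfp_share[app] - muldiv_share[app]) / dfp_share[app]
    for app in dfp_share
}

print(f"average DFP share: {average}%")
print(f"sample / population std dev: {sample_sd:.2f} / {population_sd:.2f}")
for app, frac in adder_fraction.items():
    print(f"{app}: adder-supported fraction of DFP time = {frac:.2f}")
```

Under that reading, the adder-supported fraction falls between roughly one third and two thirds for every application, which is what motivates sizing the adder and multiplier/divider units per application.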
FPGA-based DFP accelerators

Hardware decimal arithmetic can be realized at the processor, coprocessor, or accelerator card level. Several architectures, such as Intel x86, IBM PowerPC, HP PA-RISC, and Motorola 68x, provided limited support, for fixed point only. More recently, the IBM System z9 processor has incorporated instruction-level DFP support.

SilMinds has pioneered the technology and was first to produce DFP intellectual property (IP) cores and FPGA-based accelerator solutions for either in-socket or PCIe plug-in interfaces.

Industry trends favor FPGA-based accelerator products over ASIC integration because their flexible nature allows degrees of freedom for customizing, sizing, and testing DFP functional units to meet the demands of the target application. Quoting industry analyst Clive Maxfield's article "Binary Coded Decimal (BCD) 101 - Part 1," EE Times, 2007:

"The decimal-encoded formats and arithmetic described in the IEEE standard for floating point arithmetic is shipped in IBM's Power6 processor. But the most interesting arena may well be the creation of FPGA-based decimal arithmetic coprocessors to serve the financial and commercial markets."

Maxfield added later in his article:

"A standard for decimal arithmetic is available; also a variety of hardware platforms and a selection of robust tools are available. The only thing missing is a library of decimal arithmetic IP cores that can be accessed by the various design and development tools."

The FPGA solution can easily accommodate prototyping, field testing, and deployment of custom configured and dimensioned DFP core units, based on the application's decimal computing profile. FPGA units run at a much lower clock rate than the main microprocessor and consume less energy than the processor would for an equivalent DFP workload.
The lower accelerator clock rate, especially relative to the main processor clock, does not imply an execution bottleneck, thanks to effective core unit pipelining and potential parallelism.

SilMinds FPGA-based accelerator benchmarks

This section describes a benchmark conducted by SilMinds to assess the potential speedup factor and power reduction that can be achieved by adopting hardware DFP accelerators.
Telco Benchmark

The Telco benchmark was run on a Linux machine based on a 64-bit, 2.8 GHz AMD Athlon II X2 with 2 GB of RAM. The PCIe-based DFP hardware accelerator card was installed in the machine's PCIe slot. The benchmark was run in both the normal and the hardware accelerated modes. The results showed that the DFP computation time speedup due to hardware acceleration is on the order of 80 times compared to a software decimal solution (an overall runtime speedup of about 6 times). The speedup factor, shown in Figure 1, depends on the application workload, the accelerator design, and the server hardware configuration.

This benchmark illustrates the enormous application speedup promised by hardware DFP accelerators and its dependence on the application's decimal computation profile. FPGA-based accelerators offer an excellent means of testing and dimensioning coprocessor designs to maximize the speedup and/or the energy reduction benefit.

Figure 1: Computation speedup thanks to deploying SilMinds FPGA accelerator
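The relation between the two quoted speedups (about 80 times on DFP computation, about 6 times overall) can be sketched with Amdahl's law. The paper does not name this model; it is used here as a standard first approximation, under the assumption that only the DFP share of the runtime is accelerated and that transfer overhead to the card is ignored. The function names are illustrative.

```python
def overall_speedup(dfp_fraction: float, dfp_speedup: float) -> float:
    """Amdahl's law: only the DFP share of the runtime is accelerated."""
    return 1.0 / ((1.0 - dfp_fraction) + dfp_fraction / dfp_speedup)


def implied_dfp_fraction(overall: float, dfp_speedup: float) -> float:
    """Invert Amdahl's law: DFP share implied by the two speedup figures."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / dfp_speedup)


# Figures quoted for the Telco benchmark run
fraction = implied_dfp_fraction(overall=6.0, dfp_speedup=80.0)

# Consistency check: feeding the fraction back reproduces the overall figure
check = overall_speedup(fraction, 80.0)

# Upper bound on the overall speedup if the DFP part took no time at all
ceiling = 1.0 / (1.0 - fraction)

print(f"implied DFP share of runtime: {fraction:.1%}")
print(f"overall speedup recomputed:   {check:.2f}x")
print(f"limit for infinite DFP speed: {ceiling:.2f}x")
```

Under these assumptions the two figures imply that roughly 84% of the unaccelerated runtime was DFP work, which sits within the range of DFP shares reported for the application profiles. The ceiling also shows why the overall gain depends so strongly on the application's DFP share: once DFP is 80 times faster, the non-DFP remainder dominates the runtime.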
Energy reduction evaluation

A large financial data center may comprise tens or hundreds of server racks, each containing between 20 and 128 servers depending on the server packaging configuration (2U, 1U, or blade). 2U and 1U servers each have their own enclosure with power supply and fan. Multiple blade server cards are typically packaged within a common enclosure, sharing components such as power supply, fans, and network cards. The industry trend is leaning towards higher density server packaging (Patterson, 2007).

The energy consumption issue goes beyond what electronic components need in order to function. Much of the energy consumed is turned into heat, requiring further localized, enclosure/rack-wide, and center-wide cooling. The energy consumed by cooling and power supply losses is equivalent to the amount of energy that is effectively used for IT functions, about half of the total energy (Rasmussen, 2010).

Figure 2 belo