

Leading Zero Anticipation and Detection -- A Comparison of Methods

Martin S. Schmookler¹ and Kevin J. Nowka²
¹IBM Server Development and ²IBM Austin Research Laboratory
Austin, Texas USA
martins@austin.ibm.com, nowka@austin.ibm.com

Abstract

Design of the leading zero anticipator (LZA) or detector (LZD) is pivotal to the normalization of results for addition and fused multiplication-addition in high performance floating point processors. This paper formalizes the analysis and describes some alternative organizations and implementations from the known art. It shows how choices made in the design are often dependent on the overall design of the addition unit, on how subtraction is handled when the exponents are the same, and on how it detects and corrects for the possible one-bit error of the LZA.

1. Introduction

Leading zero anticipators predict the position of the most significant bit of the result of a floating point addition directly from the inputs to the adder. This determination of the leading digit position is performed in parallel with the addition step so as to enable the normalization shift to start as soon as the addition completes.

Many different solutions to the problem of designing an LZA have appeared in publications and patents. They have varying degrees of complexity, and some operate only on restricted cases. This paper describes what appear to be the simplest solutions for both the general and the restricted cases. It also includes a design that has not been previously published except in a patent [1], but which is used in several commercial processors.

The typical LZA consists of the generation of a string of bits having approximately the same number of leading zeros as the sum output. An LZD is then employed to encode the result. Several methods of designing the LZD are available, and the best choice often depends on the adder design and on how the string of bits is created.

LZDs are frequently used in fixed point arithmetic units also. A Count Leading Zeros (CLZ) instruction is often part of the fixed point instruction set, and the counting of leading digits of the divisor may be needed for some fixed point divide algorithms. Techniques that are known for speeding up LZDs can also be used for the encoding of an LZA. Therefore, this paper includes brief descriptions of two methods for efficiently obtaining a leading zero count.

The LZA can also detect the cases when the result of addition is all zeros. This too is a function which is useful in both fixed point and floating point units. Therefore, some discussion of zero result predictors is included as well.

The earliest description of an LZA known to the authors is by Kershaw, et al [2][3], which shows a Manchester carry adder circuit with a second precharged circuit used for the detection of the leftmost significant digit. It works for both leading zeros when the result of a subtraction is positive, and for leading ones when the result is negative. This basic algorithm is also used in the T9000 Transputer described by Knowles [4].

An LZA described by Hokenek and Montoye [5] also handles the general case of leading ones or zeros. Because this method is more complex and slower than efficient implementations of the Kershaw method, we do not describe this design in detail in this paper.

Since then, Britton et al [6] and Suzuki et al [7] have shown that a much simpler circuit can be used when one can assume that the subtraction result will be positive. Further simplification is obtained when one also assumes that the exponents differ by one, as shown by [8] and [9].

Most of the LZAs which are described are inexact. They only examine the inputs from left to right, ignoring a possible carry from the right for each bit that it predicts to be part of the leading string of zeros. Several papers [10][11][12] have also been published describing exact LZAs which do take into account carries from the right, but they generally result in excessive complexity and delay. However, one exact LZA [13] is described briefly because it is simple, and has delay comparable to that of the adder.

0-7695-1150-3/01 $10.00 © 2001 IEEE
Authorized licensed use limited to: University of Maryland Baltimore Cty. Downloaded on November 4, 2009 at 14:38 from IEEE Xplore. Restrictions apply.

An alternative to the exact LZA is to compute an error indicator in parallel with the LZA computation. The Kershaw LZA includes a circuit which uses the carries from the right to generate a single error signal for the LZA which can be used to adjust the controls to the last stage of a multistage normalizing shifter. The circuit is relatively simple, and the error signal can be developed in parallel with the earlier stages of the normalizer. Thus, when one includes the circuits for the error signal and adjustment of the shift controls, the result is an exact LZA. The principal concepts for calculating this error signal are also included in this paper.

The remainder of this paper describes the methods for detecting leading digits, encoding a count of the leading digits, detecting a zero-value result, and correcting the error in the inexact LZAs. We describe generalized leading digit detection and detail optimizations possible for restricted cases.

2. General leading digit detection and anticipation

For an arbitrary binary number, k bits of leading zeros can be represented as the string of digits $0^k 1 x^*$, where the superscript represents k instances of the digit 0, x is either zero or one, and * indicates zero or more instances of the digit x. Likewise, k bits of leading ones can be represented as $1^k 0 x^*$. Leading zero detection thus involves a determination of the position of the first non-zero digit, or equivalently the first transition from a zero digit i to a one digit i+1. Leading one detection involves the location of the first transition from a one digit to an adjacent zero digit.

In most of the literature, the term leading zeros refers to a starting string of zeros prior to the first one, while leading ones refers to a starting string of ones prior to the first zero. However, there may be some confusion since several papers also use the term leading one predictor for determining the first one after a starting string of zeros. Therefore, in this paper, we avoid use of that term.

Leading zeros occur when the result of a subtraction is positive, and leading ones occur when the result is negative. LZAs make use of the propagate (T), generate (G), and kill (Z) functions for each bit position of the adder inputs A and B after swapping, alignment, and inversion have taken place. These functions are defined as:

$$T = A \oplus B, \qquad G = AB, \qquad Z = \overline{A}\,\overline{B}$$

Leading zeros occur when the starting sequence has the pattern T*GZ*. If there are n bit positions before the first mismatch, then the sum will have either (n-1) or n leading zeros. Similarly, the number of leading ones can be found when the starting sequence is T*ZG*.

For addition or subtraction with 2's complement signed numbers, leading zeros may also occur with a starting sequence of Z*, and leading ones may occur with a starting sequence of G*.

A starting sequence of Z* may also occur in effective addition of floating point denormalized operands. If an LZA is to be used for both effective addition and subtraction, then it would be useful to prefix the sequence with a T for subtraction and a Z for addition. Also, we can append a low order Z for an input carry of zero, and a low order G for an input carry of one, as sometimes needed for subtraction.

2.1. Detection of first leading digit -- general case

Kershaw, et al [2][3] recognized that each digit can be evaluated to determine if this digit can possibly be the first leading digit by examination of this digit and its two neighbors, one to the left and one to the right. Knowles [4] formalized the solution by providing a truth table for setting an indicator $f_i$. If the bits are numbered such that bit 0 is the most significant, then the indicator is equal to one when:

[Equation (1) is missing from the source text.]

If the indicator is set in position i and no other digit of greater significance has its indicator set, then the leading digit is either i or i+1.

Essentially the same result appears in a recent paper by Bruguera and Lang [14].

2.2. Separate detection of leading zeros and ones

If the detection of leading zeros and ones is done separately, then the indicators only need to examine bits i and i+1. For the leading zeros case, the indicator, $f_i^{zeros}$, is equal to one when

$$f_i^{zeros} = T_i \oplus \overline{Z_{i+1}}, \quad i \ge 0 \qquad (2)$$

If the indicator is set in position i and no other digit of greater significance has its indicator set, then the leading digit is either i or i+1.

Likewise for the leading ones case, the indicator, $f_i^{ones}$, is equal to one when

$$f_i^{ones} = T_i \oplus \overline{G_{i+1}}, \quad i \ge 0 \qquad (3)$$

If the indicator is set in position i and no other digit of greater significance has its indicator set, then the leading digit is either i or i+1.
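As an illustration of the separate indicators, the following Python sketch computes the T, G and Z strings and then the leading zeros and leading ones indicators. It assumes the indicators of equ. (2) and (3) read as T_i XOR NOT Z_(i+1) and T_i XOR NOT G_(i+1), with the low-order position extended by a G for an input carry of one or a Z for an input carry of zero. The function names (`tgz`, `indicators`, `first_set`) are hypothetical and do not come from the paper.

```python
def tgz(a, b, n):
    """Per-bit propagate (T), generate (G), kill (Z); index 0 is the MSB."""
    T, G, Z = [], [], []
    for i in range(n):
        ai = (a >> (n - 1 - i)) & 1
        bi = (b >> (n - 1 - i)) & 1
        T.append(ai ^ bi)
        G.append(ai & bi)
        Z.append((ai | bi) ^ 1)
    return T, G, Z


def indicators(a, b, n, carry_in):
    """Return (f_zeros, f_ones) for adder inputs a, b and the given carry in."""
    T, G, Z = tgz(a, b, n)
    # Low-order extension (assumption stated in the lead-in):
    # a G position for carry-in 1, a Z position for carry-in 0.
    G.append(carry_in)
    Z.append(carry_in ^ 1)
    f_zeros = [T[i] ^ (Z[i + 1] ^ 1) for i in range(n)]
    f_ones = [T[i] ^ (G[i + 1] ^ 1) for i in range(n)]
    return f_zeros, f_ones


def first_set(f):
    """Index of the most significant set indicator, or None if none is set."""
    return f.index(1) if 1 in f else None


# Example: 8-bit subtraction 100 - 97 = 3, computed as a + ~b + 1.
n = 8
a, b = 100, 97
not_b = ~b & ((1 << n) - 1)
f_zeros, f_ones = indicators(a, not_b, n, carry_in=1)
predicted = first_set(f_zeros)
actual = n - (a - b).bit_length()  # true leading zero count of the difference
print(predicted, actual)  # 5 6: the inexact prediction is one position early
```

In this example the first set indicator is at position 5 while the leading one of the difference is at position 6, which is the one-bit uncertainty ("either i or i+1") that the text describes.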

The indicators defined in equ. (2) and (3) are used in the LZA by Schmookler and Mikan [1]. In that design, the indicators are ORed from the left to create two monotonic strings of zeros followed by ones. The two strings are then ANDed together bit-wise to create a single monotonic string whose first one predicts the bit position of the most significant bit.

2.3. Detection of first leading digit -- restricted cases

The indicators defined in equ. (2) and (3) can be simplified further when the detection is restricted to only leading zeros or leading ones. For example, when the circuit for detection of leading zeros does not need to consider cases where leading ones might result, then the leading zero indicator can be simplified to

[Equation (4) is missing from the source text.]

as shown by Suzuki et al [7]. In that paper, a comparison of the operands is performed to ensure that only the smaller operand is complemented during subtraction. Other designs where this could be applied would be where separate adders are provided for use when the exponents are equal. One adder calculates A-B and the other calculates B-A, and the result from the adder producing a carry out is selected. Each adder then needs only a leading zero detector using indicators of the form shown above.

An LZA based on equ. (4) is used in another recent paper by Bruguera and Lang [15].

Another variation appears in a patent by Britton et al [6]. In this design, separate leading zero and leading one detectors are used, and the adder output carry selects between them. The leading zero detector uses indicators defined as in equ. (4), and the leading one detector uses indicators as defined in equ. (5) shown below:

[Equation (5) is missing from the source text.]

Further simplification results for a case which is even more restricted, as described in [8] and [9]. Some floating point adders provide separate dataflow paths for "far" and "near" cases. The far path is used for either effective addition or for subtraction of operands whose exponents differ by more than one. No LZA is needed for the far path. In the near path, separate LZAs are used for subtraction of operands whose exponents are equal and for subtraction of operands whose exponents differ by one. This allows the detection of the number of leading zeros to start in parallel with swapping, aligning and inverting the operands. When the exponents differ by one, the presumed smaller operand is shifted right one place and then inverted. Since the operands must be normalized, the function in the first bit must be G, and therefore the number of leading zeros is determined by the number of following bit positions that are Zs. Therefore, the leading zero indicator in each following bit position is $f_i^{zeros} = \overline{Z_{i+1}}$.

2.4. An exact LZA

A conceptually simple exact LZA described in [13] is integrated with the adder. To handle both positive and negative results, two separate adders are used, one assuming the first operand is larger, the other assuming the second operand is larger. The output carry from the first adder is used to select the result which is positive. Each adder includes its own LZA which is also selected.

Since each adder may assume that its result is positive, its LZA only needs to consider leading zeros. The adder design uses carry select, so that for each group of bits, two sets of conditional sums are generated, one set assuming input carry of zero, the other assuming input carry of one. The internal carries then select the appropriate sums as they are evaluated. With each group of conditional sums, a conditional count of leading zeros is determined for the group. These conditional counts are then also selected by the internal carries. This description is a simplification of the actual design, which must also take into account the hierarchy of the adder and also generate the high order bits of the leading zero count from larger groups of bits.

2.5. Comparing cost and delay

In this section, the LZA described by equ. (1) is referred to as Kershaw (the earliest reference), the LZA described by equ. (2) and (3) is referred to as Schmookler, and the LZA described by equ. (4) and (5) is referred to as Britton.

Only Kershaw and Schmookler cover both leading zeros and ones, without using any carry signals from the adder. From the equations, it is apparent that Kershaw would have one or two more gate levels of delay for just the indicators, and a few more total gates as well. However, Schmookler then requires separate ORing of signals from the left for leading zeros and for leading ones, so the costs are more comparable. Then, Schmookler also requires the resulting strings to be ANDed, so the total delays may also be about the same.

Now comparing Kershaw with Britton, although the indicator circuits are much smaller and faster with Britton, both the ORing and encoding of the shift signals must be duplicated for the two cases, before the adder carry signal is available for selection. Therefore, the cost of Britton may actually be slightly greater, and its delay is dependent on the speed of the adder. In the actual circuit implementations that are shown, Britton shows several enhancements for reducing the delay. Both use precharged chains of NFET pass gates for propagating the leading zero signal from the high order bits to the lower order bits, to accomplish the ORing. However, Britton uses a regenerative feedback circuit in each bit position to help pull the chain low. Britton also illustrates how a wide word can be broken up into smaller chains which operate in parallel to provide some lookahead.

When one only needs to consider leading zeros, it is apparent that the LZA used by Suzuki would provide lower cost and less delay than Kershaw.

3. Encoding count of leading digits

There are two basically distinct methods of obtaining an encoded count of the leading digits. One method includes the creation of a monotonic string of zeros followed by ones. The other method uses a hierarchical tree structure.

3.1. Leading digit counting through monotonic string production

The restriction that no other digit of greater significance has an active indicator imposes a priority encoding function on the anticipator. The priority encoding involves the generation of the ORing of all indicator bits of greater significance. The Boolean inverse of this value is ANDed with the indicator to signify that the position i contains the first leading digit:

$$F_i = \bigvee_{j=0}^{i} f_j \qquad (6)$$

$$L_i = f_i\,\overline{F_{i-1}} \qquad (7)$$

This ORing function creates a monotonic string in which the digit i represents the ORing of all indicators of equal or greater significance. Once this string is created, the indicator $f_i$ is ANDed with the inverse of the monotonic string in position i-1 to determine the position which is within one digit of the most significant digit of the result.

The creation of the monotonic string can be accomplished through the use of Manchester carry techniques [2][3] or through hierarchical or look-ahead techniques [4].

The monotonic string method is used in several LZAs. The Power and Power2 processors employ the well-known LZA designed by Hokenek and Montoye [5]. Five separate monotonic strings are generated, including strings for leading ones, leading zeros and the case where all bit positions are Ts. These strings are ones followed by zeros, where the first zero indicates the location of the most significant bit position. Therefore, they are bit-wise ORed together to obtain a single string. The Power3 processor, and also several PowerPC processors such as some which are used in the Power Macintosh, use the LZA by Schmookler [1], which creates two monotonic strings as we previously described in section 2.2. Thus, the creation of monotonic strings in both of these designs is essential to combining the several strings into a single string.

In Kershaw [2][3], generating the count from a monotonic string is dictated by the use of precharged chains. One small circuit integrates the adder and LZA functions together. It uses a boot-strapped Manchester carry chain for the adder, which propagates the carries from right to left, and it uses a similar precharged chain to propagate the $F_i$ signal from left to right under control of the local propagate signals to generate the monotonic $F_i$ string and the 1-of-32 coded string L. The carry signals and the $L_i$ are also used to create an error signal at each position, $e_i$. The OR of these $e_i$ signals indicates that a 1-bit correction is needed in both the shifter and the exponent. The creation of an error signal in this way also required generation of the monotonic string.

In the T9000 described by Knowles [4], the LZA is logically similar to that of Kershaw, but with more standard lookahead techniques similar to the carry skip techniques used in their adder. The use of ORing to create the monotonic strings is due to its simplicity.

In order to get the leading zero count, either simple AND-OR functions of the $F_i$ signals or simple ORing of particular $L_i$ signals from equ. (6) and (7) permit easy and fast encoding of the count. For example, for an eight-bit sum, the shift amount which is determined by the binary encoding of the location of the leading significant digit can be formed by:

$$SA_0 = \overline{F_3}\,F_7 = L_4 \lor L_5 \lor L_6 \lor L_7$$

$$SA_1 = \overline{F_1}\,F_3 \lor \overline{F_5}\,F_7 = L_2 \lor L_3 \lor L_6 \lor L_7$$

$$SA_2 = \overline{F_0}\,F_1 \lor \overline{F_2}\,F_3 \lor \overline{F_4}\,F_5 \lor \overline{F_6}\,F_7 = L_1 \lor L_3 \lor L_5 \lor L_7 \qquad (8)$$
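The priority encoding and the eight-bit shift amount described above can be sketched in Python as follows. This is a minimal sketch, assuming the monotonic string is the running OR of the indicators from the most significant position, that the one-hot string marks the first set indicator, and that SA_0 is the most significant bit of the shift amount. The function names are hypothetical.

```python
from itertools import accumulate
from operator import or_


def monotonic_string(f):
    """F[i] = f[0] | ... | f[i]: a string of zeros followed by ones."""
    return list(accumulate(f, or_))


def one_hot(f):
    """L[i] = f[i] AND NOT F[i-1]: marks only the most significant set indicator."""
    F = monotonic_string(f)
    # Position 0 has no more significant neighbor, so L[0] = f[0].
    return [f[0]] + [f[i] & (F[i - 1] ^ 1) for i in range(1, len(f))]


def shift_amount_8(f):
    """Binary-encode the position of the first set indicator of an 8-entry list."""
    L = one_hot(f)
    sa0 = L[4] | L[5] | L[6] | L[7]  # weight 4
    sa1 = L[2] | L[3] | L[6] | L[7]  # weight 2
    sa2 = L[1] | L[3] | L[5] | L[7]  # weight 1
    return (sa0 << 2) | (sa1 << 1) | sa2


f = [0, 0, 0, 1, 0, 1, 1, 0]       # indicator string, index 0 = MSB
print(monotonic_string(f))         # [0, 0, 0, 1, 1, 1, 1, 1]
print(one_hot(f))                  # [0, 0, 0, 1, 0, 0, 0, 0]
print(shift_amount_8(f))           # 3
```

Because each shift amount bit is a plain OR of one-hot signals, no priority logic is needed after the monotonic string has been formed, which is the appeal of this encoding.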

3.2. Leading digit counting with tree structure

The other well-known method for LZC design consists of a tree structure. For example, the string of n inputs may first be partitioned into n/2 pairs of adjacent bits. For each pair, a 2-bit leading zero count is generated, and the high order bit also indicates when both bits are zeros. At the next level, adjacent pairs are combined, a mux circuit selects the count from one of the pairs, and a new high order bit is appended to the count which also indicates that both pairs are all zeros. This scheme is continued for log2(n) levels. Some speedup can be obtained by detecting larger groups of all zeros and using larger multi-way muxes. This type of binary tree structure is described more fully for a leading z […]

[…] arithmetic. For subtraction, however, since $G_n = 1$, the only sequence that can produce a zero result is $T^n$, which corresponds to both inputs being identical prior to inverting one of them. Both of these cases are handled properly by the use of […]. It should also be pointed out that although the Vassiliades method also lends itself to leading zero detection, the Weinberger method does not. Nevertheless, it was the only known solution for many years.

For floating point, if a full LZA is used, then […] provides an attractive way to determine a zero result. Otherwise, since the T*GZ* sequence cannot produce leading zeros for effective addition, a simpler circuit may be chosen. For effective addition, a zero result can only occur when both operands are zeros; therefore, ORing the $\overline{Z_i}$ signals would detect a non-zero result. For effective subtraction, both operands must be identical, so ORing the $\overline{T_i}$ signals would detect a non-zero result.
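The tree structure of section 3.2 can be sketched recursively in Python. This is only an illustrative software model under stated assumptions: the input length is a power of two, each node returns an all-zero flag together with a count, and the flag of the left half plays the role of the new high order bit that steers the mux. The names `lzc_tree` and `count_leading_zeros` are hypothetical.

```python
def lzc_tree(bits):
    """Return (all_zero, count) for a list of bits whose length is a power of two.

    count is meaningful only when all_zero is 0.
    """
    n = len(bits)
    if n == 1:
        return bits[0] ^ 1, 0
    half = n // 2
    zl, cl = lzc_tree(bits[:half])   # more significant half
    zr, cr = lzc_tree(bits[half:])   # less significant half
    # Mux: use the right half's count only when the left half is all zeros.
    low = cr if zl else cl
    # The left half's all-zero flag becomes the new high order count bit.
    count = zl * half + low
    return zl & zr, count


def count_leading_zeros(bits):
    """Leading zero count in 0..len(bits); len(bits) means the input is all zeros."""
    all_zero, count = lzc_tree(bits)
    return len(bits) if all_zero else count


print(count_leading_zeros([0, 0, 0, 1, 0, 1, 1, 0]))  # 3
print(count_leading_zeros([0] * 8))                   # 8
```

The all-zero flag returned at the root is a by-product of the tree, so the same structure can double as a zero detector for the string it encodes.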

