CHiRP: Control-Flow History Reuse Prediction

2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

CHiRP: Control-Flow History Reuse Prediction

Samira Mirbagher-Ajorpaz, Computer Science and Engineering, Texas A&M University, College Station, USA, samiramir@tamu.edu
Elba Garza, Computer Science and Engineering, Texas A&M University, College Station, USA, elba@tamu.edu
Gilles Pokam, Intel Labs, Santa Clara, USA, gilles.a.pokam@intel.com
Daniel A. Jiménez, Computer Science and Engineering, Texas A&M University, College Station, USA, djimenez@acm.org

Abstract—Translation Lookaside Buffers (TLBs) play a critical role in hardware-supported memory virtualization. To speed up address translation and reduce costly page table walks, TLBs cache a small number of recently-used virtual-to-physical address translations. TLBs must make the best use of their limited capacities. Thus, TLB entries with low potential for reuse should be replaced by more useful entries. This paper contributes to an aspect of TLB management that has received little attention in the literature: replacement policy. We show how predictive replacement policies can be tailored toward TLBs to reduce miss rates and improve overall performance.

We begin by applying recently proposed predictive cache replacement policies to the TLB. We show these policies do not work well without considering specific TLB behavior. Next, we introduce a novel TLB-focused predictive policy, Control-flow History Reuse Prediction (CHiRP). This policy uses a history signature and replacement algorithm that correlate to known TLB behavior, outperforming other policies.

For a 1024-entry 8-way set-associative L2 TLB with a 4KB page size, we show that CHiRP reduces misses per 1000 instructions (MPKI) by an average 28.21% over the least-recently-used (LRU) policy, outperforming Static Re-reference Interval Prediction (SRRIP) [1], Global History Reuse Policy (GHRP) [2], and SHiP [3], which reduce MPKI by an average of 10.36%, 9.03%, and 0.88%, respectively.

Index Terms—Translation Lookaside Buffers, Replacement Policies, Paging, Microarchitectures

[Fig. 1. Comparing predictive policy efficiency with a heat map shows CHiRP maintains more live TLB entries compared to other policies when analyzed on 870 different benchmarks. A lighter color block indicates higher TLB efficiency, while darker denotes lower efficiency.]

I. INTRODUCTION

Virtual-to-physical address translation is expensive [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. Translation lookaside buffers (TLBs) help minimize the need for costly page table walks by caching recently retrieved virtual-to-physical address mappings [16], [17].

Recent studies by Google [18], asmDB [19], and Facebook [20] confirm that modern deeply pipelined speculative OoO CPUs face increasing challenges associated with TLB performance. For example, server workloads show growing code footprints and working set sizes [18], [21], [22], [23], placing tremendous pressure on caches and TLBs [24]. The caches and TLBs of future systems will need to improve at a similar rate to maintain performance.

Unfortunately, TLBs are limited in size, and thus reach, due to power, timing, and area constraints [25]. The TLB lies on the critical path to accessing memory. Thus, increasing L2 TLB sizes to reduce TLB misses is difficult because larger TLBs incur higher access latencies [26].

Meanwhile, TLB misses are a first-order concern in terms of their negative impact on performance.
Recent studies [27], [28], [29] indicate that many programs can spend hundreds of extra cycles conducting address translations that do not hit in the TLBs. This is despite the fact that the Skylake architecture includes special MMU/paging structure caches (PSCs) to lessen the page walk penalty [30]. One study [27] finds that L2 TLB miss costs range from 16.3 cycles for Sandy Bridge in 2011 up to 212 cycles for Skylake in 2015, 272 cycles for Broadwell Xeon in 2016, and 230 cycles for Coffee Lake in 2017. Such overhead is likely to be exacerbated in the future¹ given that modern computing platforms can now be supplied with terabytes, and even petabytes, of main memory [32], [33], all while various memory-intensive workloads are rapidly emerging [18], [19], [20], [28].

¹The new generation of Intel processors, Sunny Cove [31], introduces 5-level radix page tables.

Translation overheads exceeding 100 cycles have also been reported in prior work [13], [14]. Address translation latencies due to TLB misses represent between 20% and 50% of system run-times today [9], [10], [13], [14], [34], [35], [36], [37], [38], [39], [40], [41] and consume a substantial share of processor energy [4], [5], [6], [7], [11], [15], [42].

Peng et al. conduct a thorough study of the TLB behavior of Java applications [43], reporting 230-cycle TLB miss latencies and indicating that TLB miss overhead accounts for 5.5% to 19% of total execution time. Their study finds that five out of seven benchmarks exhibit similar TLB overhead.

These concerns motivate us to investigate mechanisms to improve TLB performance that do not require increasing TLB sizes. Similar efforts to improve TLB performance have included using varied page sizes and superpages [24], [44], [45], [46], [47], [48] as well as prefetching [36], [49], [50], [51].

Fortunately, TLBs' organization makes them amenable to predictive replacement policies. TLBs are organized as tagged set-associative SRAM arrays, much like cache memories. Predictive replacement policies have been well explored and have been shown to perform well in data caches [3], [52], [53], [54], which depend on spatial and temporal locality of data accesses to maintain useful entries. Access patterns to TLBs are similar to cache accesses at a larger granularity. Thus, it is reasonable to apply previous work on cache replacement and management to TLBs.

TLB replacement policy has received little attention in the literature. Recent work [14], [34], [36], [37], [38], [55], [56], [57], [58] advocates using an LRU replacement policy for all levels of TLBs. Other prior work focuses either on reducing the cost of a page table walk upon a TLB miss [10], [34], [49], [50], [51] or on reducing the TLB miss rate by extending the size of the TLB [26]. In this paper, we suggest tackling the fundamental problem of the TLB's insufficient capacity by improving its replacement policy.

Our work builds on prior predictive replacement policies geared toward the last-level cache (LLC), such as static re-reference interval prediction (SRRIP) [1], signature-based hit prediction (SHiP) [53], and Global History Reuse Prediction (GHRP) [2], to extract key insights for the TLB. We propose a novel mechanism, Control-flow History Reuse Prediction (CHiRP), that provides superior prediction accuracy and performance by better correlating to TLB reuse behavior.

We begin with predictive policies adapted from the cache replacement literature, in particular the last-level cache (LLC), and show that they are not a good fit for TLBs. We show that features used by these schemes do not correlate well to TLB reuse, resulting in negligible performance gains. Moreover, LLC-focused prediction policies are designed with less stringent cycle time requirements and can tolerate several accesses to their prediction tables. TLBs, on the other hand, have tighter timing requirements for TLB access. Based on this and other insights, we introduce a policy that efficiently indexes prediction tables using a novel signature specifically designed to correlate to TLB behavior.
We focus on the L2 TLB, as L2 TLB misses account for most of the cycles spent in the TLB miss handler [41].

This paper makes the following contributions:

1) A first study and exploration of TLB replacement policies by implementing and adapting policies from previous work on data caches and branch target buffers to the TLB.

2) An intuition on why previous predictive replacement policies may or may not be as effective on TLBs. We evaluate the impact of various optimizations on adapted predictive replacement policies over a large suite of industry-sourced traces.

3) A new predictive replacement policy, Control-flow History Reuse Prediction (CHiRP). This policy indexes prediction tables using a signature specially designed to correlate with TLB behavior. It reduces L2 TLB misses by 28.21% on average over LRU, resulting in significant speedup. For example, for a page walk latency of 150 cycles, CHiRP yields a geometric mean speedup of 4.8%.

II. BACKGROUND

Processor performance is affected by the TLB in two ways: the number of TLB misses and the TLB miss penalty in cycles. While other solutions have mainly focused on reducing the TLB miss penalty, very little work has focused on directly reducing the number of misses in the TLB. There have been a handful of papers on prefetching into the TLB [36], [49], [50]. However, to the best of our knowledge, no previous work has proposed a predictive replacement policy specifically for the TLB. Rather, recent work employs LRU or Random replacement policies [14], [34], [36], [37], [38], [55], [56], [57], [58], [59]. We advocate using a predictive replacement policy that relies on a variety of program features to guide TLB entry replacement, improving performance without needing to increase the TLB's size.

Recent work in cache and BTB replacement shows that reuse prediction can significantly reduce misses and improve performance [2], [3], [53], [54], [60], [61], [62], [63]. Predictive replacement policies attempt to predict whether a cached item will be used again before it is evicted. If not, then it is a prime candidate for eviction. This idea is superior to LRU replacement, in which a block with no near-term reuse must migrate all the way down the recency stack before being replaced. However, a highly accurate predictive replacement policy for one cache-like structure may not work for another cache-like structure. For example, Mirbagher et al. [2] show that while PC-based policies such as SDBP [3] and SHiP [53] reduce the number of dead blocks in the LLC, they are detrimental to instruction caches and BTBs. We find the same applies to TLBs.
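
To make the reuse-prediction loop concrete, the sketch below shows the minimal interface such a policy adds on top of a baseline replacement scheme. The interface and all names are ours, not taken from any of the cited designs.

    #include <cstdint>

    // Minimal sketch of a reuse ("dead entry") predictor's role in
    // replacement. Illustrative only; not the paper's hardware design.
    struct ReusePredictor {
        // Asked on a fill or hit: is this entry unlikely to be reused
        // before eviction (i.e., "dead")?
        virtual bool predictDead(uint64_t signature) = 0;
        // Called once the outcome is known (entry was reused, or was
        // evicted without reuse), so internal counters can be trained.
        virtual void train(uint64_t signature, bool wasDead) = 0;
        virtual ~ReusePredictor() = default;
    };

    // Replacement then prefers entries predicted dead over the LRU
    // fallback, so a dead entry need not migrate down the recency stack.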

There are three main challenges in designing a predictive replacement policy. The first is finding the microarchitectural features that correlate with reuse for a particular cache-like structure. These features vary across structures such as the TLB and data caches, and even across applications [2], [3], [53], [54], [60], [61], [63]. The second is building an efficient signature by combining the identified correlating features. The features are combined to reduce their hardware storage budget and prediction time. The third is designing a fast, low-cost prediction algorithm to use this signature. The latter is particularly important for the TLB as it lies on the critical path to a memory access.

Once we identified highly correlating features of TLB entry reuse, we adapted previous algorithms to propose a novel, low-cost algorithm specifically tailored for reuse prediction in L2 TLBs. Previous work on LLC reuse prediction that uses prediction tables has used multiple features hashed to multiple indices [3] or a signature [54], [63] to combine several predictions into one. Because the TLB is on the critical path to accessing memory, we reduce accesses to a single table with a signature combining several features, as the most latency-sensitive approach.

We explore using predictive cache replacement policies such as static re-reference interval prediction (SRRIP) [1], signature-based hit prediction (SHiP) [53], and Global History Reuse Prediction (GHRP) [2] for the TLB, and propose a new mechanism, Control-flow History Reuse Prediction (CHiRP), to better guide TLB entry replacement.

A. Static Re-Reference Interval Prediction

SRRIP [1] predicts which blocks will be referenced again (i.e. re-referenced) in the cache. Each block has a 2-bit re-reference prediction value (RRPV) placing the block into one of four categories ranging from near-immediate re-reference to distant re-reference. A first prediction is made on block placement and revised when a block is reused or replaced. Blocks with a distant re-reference prediction are evicted. If there are none, the RRPV for each block in the set is incremented until there is at least one eviction candidate. We adapt SRRIP to work with TLB entries instead of cache blocks.
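
As a concrete illustration, the following is a minimal sketch of the SRRIP victim-selection and update rules described above, applied to one TLB set. The data layout and helper names are our own, not the paper's implementation.

    #include <cstdint>
    #include <vector>

    constexpr uint8_t RRPV_MAX = 3;  // 2-bit RRPV; 3 = distant re-reference

    struct TlbEntry { uint64_t tag; uint8_t rrpv; };

    // Victim selection: evict an entry predicted for distant re-reference;
    // if none exists, age the whole set and retry (assumes a non-empty set).
    size_t srripVictim(std::vector<TlbEntry>& set) {
        for (;;) {
            for (size_t i = 0; i < set.size(); ++i)
                if (set[i].rrpv == RRPV_MAX) return i;
            for (auto& e : set) ++e.rrpv;  // increment until a candidate appears
        }
    }

    // On placement, predict a "long" re-reference interval (RRPV_MAX - 1);
    // on a hit, revise to near-immediate re-reference (RRPV = 0).
    void srripInsert(TlbEntry& e, uint64_t tag) { e.tag = tag; e.rrpv = RRPV_MAX - 1; }
    void srripHit(TlbEntry& e) { e.rrpv = 0; }
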
B. PC-Based Dead Block Predictors

In sampling-based Dead Block Prediction (SDBP) [3], a predictor learns the pattern of accesses and evictions from a small number of sets kept in a structure called the sampler. When a load or store accesses the LLC, the address (PC) of that instruction is hashed to index prediction tables. Counters read from the tables are summed and thresholded to predict whether the block is dead. In the original SDBP paper, blocks are predicted on each access [3]. Signature-based Hit Prediction (SHiP) improves on this idea by using the prediction only for placement in an RRIP-replaced cache, reducing the number of predictions and significantly improving performance.

However, sampling is not suitable for structures indexed by instruction addresses, such as the BTB and instruction cache [2]. Sampling works for data caches because the behavior of a memory access instruction, represented by its PC, generalizes over the entire cache. Instruction streams do not allow set sampling to generalize the behavior of accesses to such structures, since the PC itself forms the index into the structure.

We find that sampling also does not work well for second-level TLBs. The reason is the coarser granularity of TLB entries versus cache blocks. A PC accesses different data addresses that are in the L2 TLB, which might lead one to believe sampling should generalize across the TLB. However, in the LLC, one sampled set may map to many cache sets all accessed by the same PC, which allows behavior to be generalized across sets. On the other hand, in the L2 TLB, one PC accesses data that are mapped to far fewer TLB entries than cache blocks. Spatial locality for data accessed by a single PC does not extend beyond a few TLB entries, so generalization fails.

Because of this failure, in this work we evaluate SHiP with the same general algorithm, but with bits of the PC kept as metadata in each TLB entry, which is equivalent to keeping a sampler the same size as the structure. We consider SHiP to be the best cache replacement policy from previous work that would be implementable under the tight timing requirements of the TLB access critical path.

C. Global History Reuse Prediction

Global History Reuse Prediction (GHRP) [2] is the state-of-the-art predictive replacement policy for BTB and i-cache replacement. We adapt GHRP for TLB replacement. GHRP has a structure similar to SHiP, but the signature used to index the prediction tables is specifically designed for instruction streams. Like a branch predictor, it uses the global history of conditional branch outcomes [64] as well as lower-order bits from branch addresses to form an index into a table of counters that keep track of reuse behavior.

D. Offline Learning

We use insights from neural networks to design a new hand-crafted feature that represents a program's control-flow history compactly and that can be used with a much simpler linear learning model. Offline training has been used for designing replacement policies in the past, through genetic algorithms by Jiménez et al. [65] and LSTMs by Shi et al. [66]. Their work shows how insights from offline training can improve a learning model for online prediction in the LLC. We use ADALINE (ADAptive LINear Element) [67], [68] to find insights for a TLB replacement policy.

ADALINE uses a vector of weights that records correlations between an input vector and a target value. It can be used to classify inputs into one of two classes. ADALINE computes the weighted sum of the input patterns x(n):

    y(n) = w^T(n) x(n) - θ

ADALINE weights are updated after the desired outcome d(n) of the predicted event is known. If the prediction was correct then the weights remain unchanged. Otherwise, the inputs are used to update the corresponding weights:

    w(n+1) = w(n) + μ[d(n) - y(n)] x(n)

where μ is the learning-rate parameter and the difference d(n) - y(n) is the error signal.
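
The update rule above maps directly to a few lines of code. The following is a minimal ADALINE sketch matching the two formulas in this subsection; the feature encoding and all names are ours.

    #include <vector>

    // Minimal ADALINE: y(n) = w^T(n) x(n) - theta,
    // w(n+1) = w(n) + mu * (d(n) - y(n)) * x(n).
    // Inputs might be +/-1 features (e.g., history bits); encoding is ours.
    struct Adaline {
        std::vector<double> w;   // one weight per input feature
        double theta = 0.0;      // threshold
        double mu = 0.01;        // learning rate

        explicit Adaline(size_t nInputs) : w(nInputs, 0.0) {}

        double output(const std::vector<double>& x) const {
            double sum = -theta;
            for (size_t i = 0; i < w.size(); ++i) sum += w[i] * x[i];
            return sum;
        }
        // Train on one example with desired outcome d (e.g., +1 = dead,
        // -1 = live); no change occurs when the error signal is zero.
        void train(const std::vector<double>& x, double d) {
            double err = d - output(x);      // error signal d(n) - y(n)
            for (size_t i = 0; i < w.size(); ++i)
                w[i] += mu * err * x[i];     // LMS weight update
        }
    };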

E. CHiRP

We explored adapting predictive cache replacement policies to the TLB and observed that features that correlate well to cache reuse behavior may not necessarily correlate well to TLB reuse behavior. In contrast to a cache access, a TLB access is of coarser granularity, with many PCs that map to the same TLB entry. Furthermore, depending on the context, each such PC may result in an eviction or a reuse of the same TLB entry. We find that predicting a TLB entry's reuse requires multiple features that we compose into a single signature for better prediction accuracy and overhead reduction.

III. THE REUSE PREDICTION PROBLEM IN TLB & OUR SOLUTION

We find that predictive policies for the LLC, instruction cache, and BTB do not apply well to L2 TLBs, and describe the main reasons why in this section.

We simulated 870 workloads from a variety of categories provided publicly by Qualcomm [69] to prevent overfitting to one type of workload. The full details of our simulation methodology can be found in Section V.

We first applied signature-based hit prediction (SHiP) [53], which was shown to be useful in the LLC. SHiP uses only the address (PC) of the most recent instruction. However, our results show that a solely PC-based reuse prediction does not perform much better than LRU, giving a reduction in MPKI of only 0.88%.

We investigated whether aliasing was the cause of the observed mispredictions, but found that even with an unlimited prediction table size (i.e. no aliasing), SHiP is not able to detect dead entries in the TLB, giving a reduction in MPKI of only 0.63%. Since prediction table size was not the source of the mispredictions, we investigated further by limiting the prediction to only a subset of the TLB sets and using LRU for the rest. This technique also only slightly improves accuracy, reducing MPKI by 1.28%, leading to the following observation:

Observation 1: The inaccuracy in previous predictive policies for the TLB is not due to conflicts among multiple sets but rather within the sets themselves.

We find that a TLB entry may experience many hits from one or more PCs that map to the same entry before it is eventually evicted. This is because a larger range of unique addresses maps to the same entry in the TLB compared to accesses to a block in a cache. Indeed, there is a nearly two order-of-magnitude difference between a 4KB page and a 64B block. Therefore, we obtain our second observation:

Observation 2: The coarse-grained nature of TLB accesses results in increased aliasing in previous predictive policies, which causes the prediction counters to saturate too quickly, rendering the predictor ineffective.

From Observation 2 we posit that in order to dissipate this noise, we need to slow down the rate at which the prediction counters are updated. We do this by limiting updates only to hits to a TLB set different from the one last accessed. We call this method Selective Hit Update. Selective Hit Update improves accuracy, reducing average MPKI by 5.85%.
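
A minimal sketch of this filtering rule follows, assuming the predictor only needs to remember the index of the last-accessed set; the structure and naming are ours, not the paper's.

    #include <cstdint>

    // Selective Hit Update: train the prediction counters on a hit only
    // when the hit goes to a different TLB set than the previous access,
    // slowing counter saturation under heavy intra-set aliasing.
    struct SelectiveHitUpdate {
        int32_t lastSet = -1;   // set index of the most recent TLB access

        // Returns true if this hit is allowed to update the prediction table.
        bool shouldTrain(int32_t set) {
            bool train = (set != lastSet);
            lastSet = set;
            return train;
        }
    };
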
Previous work [2], [66] has shown that a longer history of past PCs benefits predictive replacement policies in the LLC and i-cache. Figure 2 shows our results conducting a similar study for the TLB. Here, we analyze varying PC history lengths from 4 to 40 and their resulting speedups. We find that the benefit of using a longer global PC history for TLB reuse prediction diminishes beyond a length of 15. This contrasts with prior work on predictive policies for the LLC, which shows benefits from global PC history lengths of 60 or more. This is likely due to the coarse-grain nature of TLB accesses, which may limit the global history window from capturing enough information pertaining to TLB reuse. To improve on this, we augment the global PC history with branch path history information, resulting in a useful history length greater than 30 (Figure 2). Hence, our third observation is as follows:

Observation 3: TLB reuse prediction does not benefit from a global PC history of length 15 or more. However, by combining branch path history into a prediction signature, CHiRP can take advantage of a PC history length of 30 or more.

[Fig. 2. Speedup does not increase for global PC history lengths beyond 15. However, by combining branch history into the signature, CHiRP can benefit from history lengths longer than 30.]

Branch history is effective because L2 TLB accesses come from both data and instructions in the first-level TLBs. Conditional branch histories can reflect the data accesses when global path history does not. Branch path history can also reveal high-level program semantics that contribute to TLB misses.

A. PC Bits Carry Uneven Weights

Previous work [65], [66] shows that certain features of program behavior are important to predicting reuse of a block in the LLC. We come to the same conclusion with regard to TLBs, recognizing that some bits of the PC carry more weight than others in reuse prediction. To show this for the case of TLBs, we use the weights of a trained ADALINE neural network to score the bits of the PCs that we incorporate into the global history. The idea is based on the principle that the weights of the input nodes corresponding to less important features are expected to be smaller in trained ADALINE networks. The incorporation of appropriate regularization terms in the ADALINE update function encourages such weights to converge to zero and ultimately be eliminated.

[Fig. 3. Each row represents an offline-trained ADALINE weight vector for one benchmark. The x-axis shows the PC bit used as input. The white boxes show that reuse prediction in TLB entries is strongly correlated with bits 2 and 3 of the PC.]

Figure 3 shows that the two lower-order bits of a PC address (bits 2 and 3) contain important information, as indicated by their higher-weight color values. Thus, passing these bits on to the signature function yields a high chance of preserving information to reduce aliasing. In our proposed CHiRP policy, described in Section IV, we keep these two correlated bits in the global path history.

B. Modeling Efficient Signatures

Aliasing in the prediction table is harder to solve in the TLB than in caches. With TLB reuse prediction, far too many PCs map to the same TLB entry, i.e., 64 times more than to a cache block. The problem of aliasing is exacerbated further with large-footprint applications.

If a counter in the prediction table changes direction frequently due to aliasing, the same problem will only be exacerbated with a smaller table size. To achieve high reuse prediction accuracy with a smaller table size, we have to solve aliasing first.

This problem can be addressed by coordinating how the input bits are transformed by designing a succinct signature. We found that employing shifting and scaling techniques as described by Lecun and Hinton [70], [71] improves prediction accuracy.

We accomplished this by injecting and shifting leading zeros into specific bit positions of different components of the signature, including the global path history, conditional branch history, indirect branch history, and the shifted PC of the access (Section IV). Doing this both shifts the individual PCs and scales the less salient history bits down to make them less visible to the learning process, allowing the prediction table to converge to an accurate counter value with 3 times fewer entries than GHRP.

The above techniques of shifting and scaling the signature bits are simple to implement in hardware and provide a significant reduction in TLB MPKI. Figure 6 shows that while adding conditional branch path history to the signature alone reduces MPKI by 23.88%, adding two leading zeros in the path history allows the effect of conditional branch history to reduce MPKI by 26.98%.

In the next section we discuss our signature function and the individual effect of the above optimizations.

IV. CONTROL-FLOW HISTORY REUSE PREDICTION ALGORITHM

A. Overview

CHiRP correlates TLB replacement with reuse history. CHiRP uses features that best correlate to reuse behavior and combines them into a signature that is used to uniquely tag each TLB entry (IV-B).

This signature is subsequently used to track the reuse behavior of the associated TLB entry by means of a prediction table indexed by the signature (IV-C). The prediction table is updated on an eviction or a reuse, and the resulting prediction status is written back into the corresponding TLB entry to inform the next TLB replacement operation (IV-D). Figure 4 describes the main components of CHiRP and Figure 5 provides the CHiRP algorithm.

B. CHiRP Signature

CHiRP contributes four features that correlate with reuse behavior. The first is the global path history of PCs. The global path history in CHiRP is 64 bits wide and is updated on each access by shifting the two lower-order bits of the PC into the path history, followed by two zero bits (Figure 5, line 28), as previously discussed in subsections III-A and III-B, respectively. The global path history in CHiRP allows recording the last 16 accesses.

The second and third features are the conditional and unconditional indirect branch address histories, respectively. Each of these histories is 64 bits and is updated by shifting eight bits of the PC, bits [11:4], into the branch history on every conditional (resp. unconditional indirect) branch instruction (Figure 5, line 31), recording the last 8 branch accesses for each type.

The fourth feature is the current PC, shifted right by two bits. The signature is constructed by XOR-ing the global path history with the conditional branch history, the unconditional indirect branch history, and the shifted PC of the access (Figure 5, line 5).

To compute indices into the prediction table, CHiRP computes a 16-bit hash of the constructed signature. For hashing, we first use Robert Jenkins' 64-bit mix function [72]. The mix function enables a single-bit change in the key to influence widely disparate bits in the hash result.
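
A sketch of this signature and index computation is shown below. Register widths and bit fields follow the text and Figure 5; the exact placement of the two injected zero bits reflects our reading of the pseudocode, and the mix shown is a common public-domain 64-bit integer mix standing in for the exact function of [72].

    #include <cstdint>

    // Sketch of the CHiRP signature (Section IV-B); names are ours.
    struct ChirpSignature {
        uint64_t pathHist = 0;      // last 16 accesses, 4 bits each
        uint64_t condBrHist = 0;    // last 8 conditional branch PCs, 8 bits each
        uint64_t uncondBrHist = 0;  // last 8 unconditional indirect branch PCs

        // Every L2 TLB access: shift in PC bits [3:2] followed by two zeros.
        void onAccess(uint64_t pc) {
            pathHist = (pathHist << 4) ^ (pc & 0xC);
        }
        // Every conditional or unconditional indirect branch: shift in PC [11:4].
        void onBranch(uint64_t pc, bool conditional) {
            uint64_t bits = (pc >> 4) & 0xFF;
            if (conditional) condBrHist   = (condBrHist << 8)   ^ bits;
            else             uncondBrHist = (uncondBrHist << 8) ^ bits;
        }
        // Signature = shifted PC XOR the three histories (Fig. 5, line 5).
        uint64_t signature(uint64_t pc) const {
            return (pc >> 2) ^ pathHist ^ condBrHist ^ uncondBrHist;
        }
    };

    // 64-bit integer mix in the spirit of [72] (a public-domain variant),
    // followed by the modulo yielding the 16-bit index (Fig. 5, line 6).
    inline uint16_t tableIndex(uint64_t sig) {
        sig = (~sig) + (sig << 21);
        sig ^= sig >> 24;
        sig = (sig + (sig << 3)) + (sig << 8);
        sig ^= sig >> 14;
        sig = (sig + (sig << 2)) + (sig << 4);
        sig ^= sig >> 28;
        sig += sig << 31;
        return static_cast<uint16_t>(sig & 0xFFFF);  // mod 2^16
    }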

[Fig. 4. CHiRP TLB metadata and prediction table update flow using a signature.]

     1: int predTable[numCounters]
     2: procedure ACCESSTLB(int VA)
     3:    set ← calcSet(VA)
     4:    isMissed ← isTagMatch(VA)
     5:    sign ← (VA >> 2) ⊕ pathHist ⊕ condBrHist ⊕ unCondBrHist
     6:    index ← Hash(sign) mod 2^16
     7:    cntrNew ← predTable[index]
     8:    if isMissed = true then                      // miss
     9:       entry ← victimEntry(set)
    10:       if entry.isDead = false then              // lru
    11:          index ← Hash(entry.signature)
    12:          updatePredTable(index, true)
    13:       entry.firstHit ← true                     // insertion
    14:    else                                         // hit
    15:       entry ← matchedEntry(set, tag)
    16:       if entry.firstHit = true then             // access table
    17:          index ← Hash(entry.signature)
    18:          updatePredTable(index, false)
    19:       entry.dead ← predict(cntrNew, deadThresh)
    20:       entry.firstHit ← false
    21:    entry.signature ← sign
    22:    updateLRUStackPosition()
    23:    updatePathHist(VA, pathHist)
    24:    if instType = conditionalBranch then
    25:       updateBrHist(VA, condBrHist)
    26:    if instType = unConditionalBranch then
    27:       updateBrHist(VA, uncondBrHist)
    28: procedure UPDATEPATHHIST(int VA, int history)
    29:    history ← history << 4
    30:    history ← history ⊕ VA[3:2]
    31: procedure UPDATEBRHIST(int VA, int history)
    32:    history ← history << 8
    33:    history ← history ⊕ VA[11:4]
    34: procedure PREDICT(int counter, int threshold)
    35:    if counter ≥ threshold then return true
    36:    else return false
    37: procedure VICTIMENTRY(Set set)
    38:    for int i ← 1 to associativity do
    39:       entry ← set.entries[i]
    40:       if entry.isDead = true then return entry
           return LRUEntry()
    41: procedure UPDATEPREDTABLE(int index, bool Dead)
    42:    if Dead = true then
    43:       predTable[index]++
    44:    else
    45:       predTable[index]--

[Fig. 5. The CHiRP algorithm.]

We then take the modulo of the table size to generate the prediction table index (Figure 5, line 6).

Note that the signature relies on bits from the branch PC, not conditional branch outcomes or bits from branch targets.

C. CHiRP Prediction Table

CHiRP stores metadata for each L2 TLB entry, consisting of 3 LRU stack position bits, a valid bit, a 16-bit signature, and a prediction bit (see Figure 4, Updating TLB Metadata). CHiRP uses a table of saturating counters to provide a prediction. The table is indexed by a hash function of the signature. The corresponding counter is thresholded, and if the counter exceeds the threshold, the entry is predicted as dead.

D. CHiRP Operations

In contrast to SHiP and GHRP, which require updating the prediction table on each TLB access, the bulk of CHiRP operations occurs off the TLB critical path, with minimal impact on TLB latency. In particular, CHiRP updates its prediction table on a TLB miss only if the selected victim is LRU (i.e. no dead entry is found).

The operations pertaining to a TLB miss involve (1) selecting a victim, (2) updating the victim's reuse history in the prediction table if the victim is LRU, and (3) updating the prediction metadata for the new TLB entry.

a) Victim selection: On a TLB miss, CHiRP first attempts to select a victim among the entries predicted as dead. If no such entry is found, CHiRP evicts the LRU entry (Figure 5, line 37).

b) Prediction table update: Because CHiRP updates its prediction table only if the victim is LRU (Figure 5, lines 10-12), evicting the LRU entry effectively makes it a dead candidate the next time around. This is why the prediction table must be updated in this case. The signature of the victim entry is used to index the prediction table, and the corresponding counter is incremented, since the entry was just shown to be dead (Figure 5, line 41).

[...] made to prediction structures. We find that two specific events are sufficient for an accurate training update:

- The first hit of an entry.
- A miss in a set with no dead entry (this leads the algorithm to choose an entry to evict based on LRU).

With this technique, CHiRP reduces the access ratio to the prediction tables by 90% compared to SHiP and GHRP (see Figure 11), which must access their tables on every access to the TLB.

    Component               Size
    Prediction bits         1 bit × 1024 = 128B
    FirstHit bits           1 bit × 1024 = 128B
    Signature bits          16 bits × 1024 = 2KB
    Path history register
    Cond.
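
A compact sketch of this miss path, combining victim selection with the LRU-only table update and the first-hit training event described above, might look as follows. The entry layout, helper names, and the hash stand-in are ours.

    #include <cstdint>
    #include <vector>

    struct Entry {
        uint16_t signature = 0;
        bool     dead = false;       // prediction bit
        bool     firstHit = false;
        uint8_t  lruPos = 0;         // 0 = MRU, assoc-1 = LRU
    };

    std::vector<uint8_t> predTable(1 << 16, 0);   // saturating counters

    uint16_t hashIndex(uint64_t sig) {            // stand-in for mix-and-mod
        return static_cast<uint16_t>((sig * 0x9E3779B97F4A7C15ull) >> 48);
    }

    Entry& selectVictim(std::vector<Entry>& set) {
        for (auto& e : set)
            if (e.dead) return e;                 // dead entry: evict, no update
        Entry* lru = &set[0];                     // no dead entry: LRU fallback
        for (auto& e : set)
            if (e.lruPos > lru->lruPos) lru = &e;
        uint16_t idx = hashIndex(lru->signature); // victim was never reused, so
        if (predTable[idx] < 255) ++predTable[idx]; // push its counter dead-ward
        return *lru;
    }

    void onFirstHit(Entry& e) {                   // reuse observed: train live-ward
        uint16_t idx = hashIndex(e.signature);
        if (predTable[idx] > 0) --predTable[idx];
        e.firstHit = false;
    }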
