
Understanding the Critical Path in Power State Transition Latencies

Sam (Likun) Xi, Marisabel Guevara‡, Jared Nelson, Patrick Pensabene, and Benjamin C. Lee‡
Systems Architecture Integration Laboratory, Pratt School of Engineering, Duke University
‡ Corresponding authors: mg@cs.duke.edu, benjamin.c.lee@duke.edu

Abstract. Increasing demands on datacenter computing prompt research in energy-efficient warehouse-scale systems. In one approach, server activation policies invoke low-power sleep states, but the power state transition latency must be small to produce effective energy savings. Chrome OS and Arch Linux require 50ms and 650ms, respectively, to enter sleep states. These states consume merely 4-6% of nominal power. By analyzing the critical path, we propose strategies for selecting hardware components and optimizing kernel resume sequences to make datacenter server activation viable. With fast transitions, server activation can provide better performance at lower energy than dynamic voltage and frequency scaling.

Keywords: Transition latency, suspend, sleep, Chrome OS, power consumption, datacenter

I. INTRODUCTION

Datacenter power consumption grew rapidly, by 36%, in the five-year period following 2005. As of 2010, datacenters use 2% of the energy consumed in the United States and 1.3% of the energy consumed around the world [12]. In the USA, this amounted to 77.5 billion kWh consumed [5]. A large portion of this energy is spent powering idle servers performing little useful work. Even at idle, servers typically draw about 60% of peak power. Average datacenter server utilization is 20-30%, which means a huge amount of power is consumed with no gain [10].

There is a large body of research on reducing the energy consumption of datacenters. Transitioning servers into a low-power sleep state during idle is one approach to improving the energy proportionality of large scale systems [6], [10].
The effectiveness of an activation policy depends largely on the latency to transition between active and sleep power states.

What are contributors to power state transition latencies? Prior work identifies opportunities to reduce transition latencies via hardware choices, e.g., mobile instead of server components [13]. In this work, we consider software contributors to power state transition latencies. In particular, we analyze power states in standard and mobile-class operating systems to draw lessons for datacenter servers.

We find that stock Arch Linux running on a high-end custom built desktop has transition latencies of 650ms, which we deem too slow for sleep state power policies. Alternatively, Chrome OS is an operating system that has been optimized for fast boot and resume, and we observe that transition latencies of 50ms are achievable. This sub-100ms latency is instrumental for server activation policies and datacenter energy efficiency.

We evaluate power mode responsiveness in the context of PowerNap, a server activation policy proposed in prior work [10]. By studying two machines and operating systems, we determine that it is feasible to put a server into sleep mode during idle periods and wake it up quickly enough to achieve significant energy savings with minimal impact on performance and response time. Indeed, performance and power trade-offs are often better than those from highly responsive dynamic voltage and frequency scaling.

The rest of this paper is structured as follows. In Section 2, we discuss related work and server sleep states. Sections 3 and 4 describe our experimental methodology and results, respectively. In Section 5, we provide a discussion about the results and their applicability to server sleep states. Finally, we conclude in Section 6.

II. BACKGROUND

The large variation in transition latencies measured in prior work motivates our study. Prior measurements range from milliseconds to hundreds of seconds across desktop or mobile platforms.

Server Activation.
Putting servers into sleep states during activity troughs is a common approach to reducing datacenter power consumption. Building energy-proportional systems at this scale is a crucial part of improving datacenter energy efficiency [3]. An energy-proportional system consumes power in proportion to its workload. Server activation can improve the energy efficiency of a system given the right system features and a power management policy.

Meisner et al. propose PowerNap, a policy for transitioning servers into sleep states [10]. They compare its power-conserving ability over a range of CPU utilizations and transition latencies, finding that sub-100ms latencies are necessary to provide substantial power savings. However, the PowerNap policy optimistically assumes that a 10ms latency is easily achievable; in practice, this is actually quite difficult.

Gandhi et al. propose a variety of power management policies, and characterize the performance-per-watt (PPW) of each over a range of idle power consumption and setup times [6]. This study shows that transition policies can provide improvements in PPW with transition latencies of 20-50s. Alternatively, Agarwal et al. propose an approach to service a small
number of tasks while a server is in a sleep state [2]. A low power processor on the network interface card allows the system to maintain a low power sleep state until the computational power of the main processor is required again.

Server Sleep States. The Advanced Configuration and Power Interface (ACPI) specification defines a standard for an operating system to perform power management of hardware components [1]. Server activation policies use the sleeping state S3, also known as suspend-to-RAM. In S3 sleep, the processor context, cache contents, and chipset context are all lost, but main memory remains powered. We focus on the S3 state, which provides a balance between power savings and resume time as everything but main memory is powered down.

In contrast, the S2 state only powers down the processor; in modern servers, CPUs account for merely 33% of total power usage [4]. On the other hand, the S4 "hibernate" sleep copies DRAM content to the hard disk and the system is completely powered down. Although S4 achieves greater power savings, DRAM data must be restored upon resume, thus incurring an expensive transition latency [10].

Using a power mode incurs a sleep and a resume latency. We focus our analysis on resume latency as this, to a large degree, determines the response time of a datacenter using a power state transition policy. Further, if the resume latency is fast enough, long sleep latencies can be hidden, because if a query arrives when a server is going to sleep, another server can resume and service it. However, for the sake of completeness, sleep latency is briefly discussed.

III. METHODS

Kernel Logging. Linux kernels post version 3.6 can report individual device resume times during power state transitions. We enable this functionality with the command echo 1 > /sys/power/pm_print_times. Suspend-to-RAM was manually initiated by the command echo "mem" > /sys/power/state, and resume was initiated by pressing an appropriate key.
We capture the contents of the kernel ring buffer for several iterations of the sleep and resume sequence. We then analyze the sequences to find individual device resume times and trends of interest.

To measure power, we use a watts up PRO™ watt meter. The meter plugs into the wall outlet and the device's power supply plugs into the meter. For each device, we initiate sleep and wakeup as described above. The meter samples power usage at 1Hz, which is its highest timing resolution. Finally, we retrieve the data from the meter via USB.

Testing Platforms. We study the transition latencies of two systems: a Samsung Chromebook Series 5 550 and a custom built Arch Linux desktop (Table I). The Arch Linux platform provides a baseline for comparing latencies and orderings of kernel components during system resume. We could not test our blade server because the BIOS did not support S3 sleep.

For comparison, we consider the potential for mobile-class operating systems in datacenter servers. Chrome OS is ...

... shows that the primary contributors are CPUs (72ms), noirq (184ms), and additional devices (400ms).

...sary to a server (i.e., audio cards, video ports, etc.), the noirq stage can be optimistically shortened to 100ms. However, an optimized resume latency cannot be less than 300ms because of the SATA links, which are the slowest individual devices.

Figure 3. Desktop power consumption while idling and sleeping. The hardware setup causes the reported sleeping power consumption to appear to be 0W, but S3 sleep actually consumes 5W, shown by the jump in power to 5W before the wake up signal. [Plot: power consumption (W) versus time (s), with the S3 sleep interval and the wakeup point marked.]

Figure 2 shows the histogram of resume start and completion times for each device on Arch Linux. Most devices initiate resume 200ms after the first device and about 20% of devices complete resume at 600ms. Figure 1 demonstrates that most devices complete resume in under 50ms.
Thus, any optimization of sleep state responsiveness must address the few devices with long resume latency.

Finally, power measurements shown in Figure 3 demonstrate that the desktop consumes around 130W when idling in a fully powered state compared to a mere 5W in S3 sleep, 3.8% of nominal.
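
The per-device analysis described in Section III can be sketched in a few lines of Python. This is a minimal illustration, not the scripts used in the study: the log line shape, the sample log, and the helper names (parse_resume_times, slowest, fraction_under) are assumptions made for the example, and the device timings in the sample are invented. The caller is assumed to pass only the resume portion of the kernel ring buffer, since the same kind of line is also printed for suspend callbacks.

```python
import re

# Assumed line shape (hypothetical, modeled on what pm_print_times logs):
#   [  105.300500] call ata1+ returned 0 after 300000 usecs
CALL_RE = re.compile(
    r"call (?P<dev>\S+?)\+ returned (?P<ret>-?\d+) after (?P<usecs>\d+) usecs"
)

# Invented sample data for illustration only; not measurements from the paper.
SAMPLE_LOG = """\
[  105.000100] call 0000:00:1f.2+ returned 0 after 1200 usecs
[  105.300500] call ata1+ returned 0 after 300000 usecs
[  105.301000] call usb1+ returned 0 after 45000 usecs
[  105.310000] call 0000:00:1f.2+ returned 0 after 800 usecs
"""


def parse_resume_times(log_text):
    """Return {device: microseconds}, summed over every matching log line."""
    totals = {}
    for line in log_text.splitlines():
        match = CALL_RE.search(line)
        if match:
            dev = match.group("dev")
            totals[dev] = totals.get(dev, 0) + int(match.group("usecs"))
    return totals


def slowest(totals, n=3):
    """Return the n devices with the largest summed callback time."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def fraction_under(totals, threshold_ms):
    """Fraction of devices whose summed callback time is below threshold_ms."""
    if not totals:
        return 0.0
    under = sum(1 for usecs in totals.values() if usecs < threshold_ms * 1000)
    return under / len(totals)


if __name__ == "__main__":
    totals = parse_resume_times(SAMPLE_LOG)
    print(slowest(totals, n=2))
    print(fraction_under(totals, threshold_ms=50))
```

Sorting by summed time surfaces the few long-latency devices directly, which is the set the text argues any optimization must address. Summed callback time is not the same as wall-clock critical path when devices resume in parallel, so a timeline view would additionally need the log timestamps.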

Figure 4. Resume sequence timeline of the Chromebook. A breakdown of the overall 600ms resume latency shows that the primary contributors are the graphics system (300ms), sound card (500ms), and webcam (450ms). [Timeline rows: Ethernet controller, Graphics, Sound card, Lid switch, USB controller, Power subsystem, Watchdog, Power subsystem, Watchdog, Battery, USB controller, CPU heat monitor, Rate matching hub, Webcam, Keyboard, Trackpad; x-axis: Time (ms), 0 to 600.]

Figure 5. Resume sequence start and finish time distribution on the Chromebook. Most devices begin and finish their resumes within the first 100ms of the overall resume sequence. [Histogram: percent of total devices versus time (s); series: resume start time, resume end time.]

Figure 6. Chromebook power consumption while idling and sleeping. In sleep, only 5% of nominal power is consumed. [Plot: power consumption (W) versus time (s), with the S3 sleep interval and the wake up point marked.]

Chrome OS. The Chromebook suspends all of its devices in 45ms and then its two physical cores at 2ms each for a total of 49ms. The Chromebook uses Intel Celeron CPUs, whose microarchitecture is much simpler than that of the Core i7 used in the desktop, which explains why the Chromebook's sleep latency is so much shorter.

The Chromebook takes about 600ms to resume all of its devices. The timeline of the critical path for the resume sequence is shown in Figure 4. In stark contrast to the desktop, the Chromebook's noirq stage takes only 1.74ms; essentially all of the 600ms is spent on simultaneously resuming the components. We find that the wireless network card takes much longer to resume than the Gigabit Ethernet card (data not shown) typically used in servers.

We can reduce the resume latency to around 50ms with some optimizations. First, note that the sound card (footnote 1) and graphics subsystems take up the majority of the 600ms latency, but they are generally not critical or needed in servers (certain applications may benefit from dedicated multimedia processing hardware). These devices can either be removed entirely or have their resume sequences delayed until needed.

Also, observe that USB host devices take roughly 50-100ms to resume, but servers are unlikely to have peripheral devices attached over USB. Thus, USB hosts can have their resume sequences delayed until they are required (if at all). Other irrelevant components like the webcam, lid switch, trackpad, and keyboard can be stripped out entirely.

With these optimizations, resume latency drops to around 50ms. Kernel logs for the Chromebook did not report any latency for remounting filesystems, which we attribute to the use of an SSD instead of an HDD for local storage. Re-establishing a network connection takes 0.95s, yet servers in a datacenter can avoid requesting a new IP address by the same techniques mentioned for the Arch Linux system.

Figure 5 shows the distribution of all device resume start and completion times on the Chromebook in time bins of 50ms. Nearly 80% of all devices begin resuming within 75ms of the first device and nearly all complete their resume sequences within 50ms. The other 20% of devices begin and finish resume around 300ms, and by examining the complete device resume timeline (not shown), we see that these devices are waiting on the graphics subsystem. If we eliminate such long latency devices, we can either eliminate the devices that are waiting or allow them to begin their resume sequence earlier, thus shortening the overall resume latency.

Finally, power measurements shown in Figure 6 demonstrate that the Chromebook draws on average 10W while idling in a fully powered state, but in S3 sleep, it drops to 0.6-0.7W, which is merely 6% of full power consumption.

Footnote 1: The sound card takes a long time to resume because sleeps are inserted in its resume sequence to prevent it from making popping sounds.

V. ANALYTICAL RESULTS

Let's consider the implications of fast response times. We evaluate a system using PowerNap, a policy for server activation [10]. In this prior work, Meisner et al. conclude that a transition latency on the order of 10ms is required to achieve significant power savings and performance improvements over a system using DVFS. We apply the same methodology to evaluate the Chromebook and Arch Linux desktop as PowerNap systems with 50ms and 650ms latencies, respectively. For comparison, our evaluation includes data for an optimistic 10ms latency.

Meisner et al. use analytical models to approximate performance and power as a function of power state transition latencies and system utilization. The performance model estimates response time with an M/G/1 queue with exceptional first service time, which accounts for the power state transition latency for the first task that arrives after an idle period. The power model simply accounts for the fraction of time in each power state.

We refer the reader to PowerNap for detailed analytical models [10]. With these models, we compute average power and response time for systems that have different transition latencies: Generic PowerNap Tt = 10ms, Chrome OS PowerNap Tt = 50ms, and Linux Desktop PowerNap Tt = 650ms. Response times are relative to a system with no power management policy.

We further compare against dynamic voltage and frequency scaling (DVFS). In comparison to server activation, DVFS is very responsive (e.g., microseconds) but offers limited benefit. Only processor power is modulated and we estimate that processor power accounts for only 33% of the total [4]. With sufficiently fast resume times, server activation policies may offer more attractive performance-power trade-offs than DVFS.

Figure 7. Power scaling comparison between PowerNap and DVFS. The Chrome OS system's power consumption is less than that of DVFS for typical server utilization levels, whereas the Arch Linux system's resume latency is too slow to be of any benefit. [Plot: average power (% max power) versus utilization (% maximum); series: Generic PowerNap system, Tt = 10ms; Chrome OS PowerNap, Tt = 50ms; Linux desktop PowerNap, Tt = 650ms; DVFS, FCPU = 100%.]

Figure 8. Response time scaling comparison between PowerNap and DVFS. The Chrome OS system has faster relative response times than DVFS at all utilizations, but beyond Tt = 50ms, DVFS will provide faster response times at low utilizations. [Plot: relative response time versus utilization (% maximum); series: Generic PowerNap system, Tt = 10ms; Chrome OS with PowerNap, Tt = 50ms; Generic PowerNap system, Tt = 100ms; DVFS, FCPU = 40%.]

In Figure 7, we find that PowerNap with transition latency of Tt = 650ms is highly inefficient, consuming less power than DVFS only for near-idle systems (e.g., utilization below 4%). However, at Tt = 50ms, PowerNap is far more competitive, consuming less power than DVFS even as utilization increases to 35%. Furthermore, if Tt drops to 10ms, then PowerNap's power advantage over DVFS extends up to 45% utilization. Because most datacenter servers
operate under 50% utilization [3], a 10ms latency would be ideal. But we show that a 50ms latency is highly competitive and viable with the optimizations to device and kernel resume sequences.

In terms of performance, Figure 8 indicates that PowerNap will always be slower than DVFS given long resume latencies. Tt = 650ms is not competitive and Tt = 100ms is only competitive when system utilization is above 30% (see footnote 2). As utilization increases, Tt = 100ms converges to DVFS. However, at Tt = 50ms, PowerNap relative response times are always faster than those of DVFS, regardless of utilization.

If we could reduce Tt to 10ms, we could further reduce response times. When utilization is less than 45%, the point at which PowerNap consumes more power than DVFS, response times for Tt = 10ms are 28% to 44% lower than response times for Tt = 50ms. Our results indicate that Chrome OS is a good candidate for implementing PowerNap, whereas generic Arch Linux is not.

These findings motivate further research into optimizing server operating systems for fast transition latencies and specific hardware configurations. Chrome OS is in fact a version of Linux engineered for these targets. As blade servers in a datacenter tend to be identical, from an energy-efficiency perspective it makes sense to optimize their software stacks for specific configurations. Server operating systems exist in many forms, but typically performance takes precedence over energy efficiency in their design. Furthermore, since most datacenters never shut down servers unless absolutely necessary, many servers do not support sleep states in the first place.

Our work shows that optimizing transition latencies in the operating system can significantly reduce power consumption and improve response times for datacenters.
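
To make the shape of the Section V comparison concrete, the following Python sketch computes average power and mean response time for a server that sleeps whenever it is idle and pays a wake-up latency at the start of each busy period. It is a simplified stand-in, not the PowerNap model from [10]: it assumes Poisson arrivals, exponentially distributed service times (the paper's model is M/G/1), a deterministic wake-up latency, zero suspend latency, and full active power while waking. The function names and the arrival rate in the usage lines are hypothetical; the 130W and 5W figures are the desktop measurements reported above.

```python
def powernap_fractions(utilization, arrival_rate, t_wake):
    """Long-run fractions of time spent (busy, waking, asleep).

    Over one renewal cycle the server sleeps for a mean of 1/arrival_rate,
    wakes for t_wake, then works; work conservation fixes the busy
    fraction at the utilization.
    """
    non_busy = 1.0 - utilization
    waking = non_busy * arrival_rate * t_wake / (1.0 + arrival_rate * t_wake)
    asleep = non_busy / (1.0 + arrival_rate * t_wake)
    return utilization, waking, asleep


def average_power(utilization, arrival_rate, t_wake, p_active, p_sleep):
    """Average power, assuming waking draws the same power as active."""
    busy, waking, asleep = powernap_fractions(utilization, arrival_rate, t_wake)
    return (busy + waking) * p_active + asleep * p_sleep


def mean_response_time(utilization, arrival_rate, t_wake):
    """Mean response time of an M/M/1 queue with a fixed setup time t_wake."""
    mean_service = utilization / arrival_rate
    second_moment = 2.0 * mean_service ** 2  # exponential service assumption
    queue_wait = arrival_rate * second_moment / (2.0 * (1.0 - utilization))
    setup_wait = (2.0 * t_wake + arrival_rate * t_wake ** 2) / (
        2.0 * (1.0 + arrival_rate * t_wake)
    )
    return mean_service + queue_wait + setup_wait


if __name__ == "__main__":
    # Hypothetical workload: 30% utilization, 10 requests per second.
    for t_wake in (0.010, 0.050, 0.650):
        power = average_power(0.3, 10.0, t_wake, p_active=130.0, p_sleep=5.0)
        resp = mean_response_time(0.3, 10.0, t_wake)
        print(f"Tt={t_wake * 1000:.0f}ms power={power:.1f}W response={resp * 1000:.1f}ms")
```

With t_wake set to zero the response time reduces to the ordinary M/M/1 value, which is a quick sanity check on the setup term. Both outputs grow with the wake-up latency, which is the qualitative trend behind Figures 7 and 8; the numeric crossover points against DVFS reported in the text come from the full model in [10], not from this sketch.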
This is a first step in studying the impact of mobile computing design choices, such as fixed hardware configurations, low-power sleep states, and fast transition latencies, on server applications.

Many of our conclusions are derived from software-based optimizations, so even though we collect measurements from mobile hardware instead of exclusively desktop/server-class hardware, these conclusions will generalize regardless of the choice of hardware. For instance, Chrome OS, a Linux-based system, demonstrates that it is possible to reduce Linux power state transition latency to 50ms. Even though it runs on mobile-class hardware, there is ongoing research in using mobile-class hardware for servers [8], [9], [13].

Footnote 2: With data from Meisner et al., we extrapolate the 50ms response time curve and reproduce their figures with estimated fits.

VI. CONCLUSION

We present experiments that support the use of server activation policies in datacenters. On Chrome OS, we measure sleep and resume latencies of 50ms each. On an Arch Linux desktop, we observe sleep and resume latencies of 680 and 650ms, respectively. Sleeping systems consume merely 4-6% of the power consumed in an idling but fully powered state.

Using PowerNap, we find that Chrome OS can substantially reduce power consumption and improve performance over an equivalent system using DVFS for dynamic power management, but Arch Linux transition latencies are too slow to provide any appreciable benefits. These results demonstrate that server activation policies for datacenter servers are a promising route towards reducing power with modest performance impact.

ACKNOWLEDGMENTS

This work is supported by NSF grant CCF-1149252 (CAREER) and by STARnet, a Semiconductor Research Corporation program, sponsored by MARCO and DARPA.
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of these sponsors.

We would like to thank David Hendricks, Sameer Nanda, Sonny Rao, Hung-Te Lin, and Liam McLoughlin from the Chrome OS team at Google.

REFERENCES

[1] Advanced Configuration and Power Interface Specification. Rev. 5.0, 6 Dec 2011.
[2] Y. Agarwal, S. Hodges, R. Chandra, J. Scott, P. Bahl, and R. Gupta. "Somniloquy: Augmenting Network Interfaces to Reduce PC Energy Usage", Proc. 6th NSDI, Apr 2009.
[3] L. A. Barroso, U. Hölzle. "The Case for Energy-Proportional Computing", IEEE Computer 40.12 (2007), pp. 33-37.
[4] L. A. Barroso, U. Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan and Claypool Publishers, 2009.
[5] "Electricity Consumption Data, USA". Index Mundi, n.d. Web. 3 Dec 2012.
[6] A. Gandhi and M. Harchol-Balter. "Are sleep states effective in datacenters?", International Green Computing Conference, pp. 1-10, Jun 2012.
[7] Google Inc. "The Chromium Projects: Chromium OS". http://www.chromium.org.
[8] M. Guevara, B. Lubin, B. C. Lee. "Navigating heterogeneous processors with market mechanisms", Proc. 19th HPCA, Feb 2013.
[9] K. Malladi, I. Shaeffer, L. Gopalakrishnan, D. Lo, B. C. Lee, M. Horowitz. "Rethinking DRAM power modes for energy proportionality", Proc. 44th MICRO, Dec 2012.
[10] D. Meisner, B. T. Gold, and T. F. Wenisch. "PowerNap: Eliminating Server Idle Power", Proc. 14th ASPLOS, pp. 205-216, Mar 2009.
[11] D. Meisner, C. M. Sadler, L. A. Barroso, W-D Weber, T. F. Wenisch. "Power Management of Online Data-Intensive Services", Proc. 38th ISCA, pp. 319-330, Jun 2011.
[12] R. Miller. "Report: Data Center Energy Use is Moderating", Data Center Knowledge. N.p., 2011.
[13] V. J. Reddi, B. C. Lee, T. Chilimbi, and K. Vaid. "Mobile Processors for Energy-Efficient Web Search", ACM Transactions on Computer Systems 29.4 (2011).
