Decelerating Suspend And Resume In Operating Systems

Decelerating Suspend and Resume in Operating Systems
XSEL, Purdue ECE
Shuang Zhai, Liwei Guo, Xiangyu Li, and Felix Xiaozhu Lin
02/21/2017

Mobile/IoT devices see many short-lived tasks
- Sleeping for a long time; waking up frequently
  - Smartwatch: 72 times per day
- Each task is short-lived
  - Smartwatch: 10 secs
  - Background task: 1 sec

Suspend/Resume OS Workflow
[Figure: workflow diagram. Suspend path, from CPU ON to CPU OFF: Sync Filesystem, Freeze Tasks, Call IO Drivers. Resume path, from CPU OFF to CPU ON: Call IO Drivers, Thaw Tasks.]

Suspend/Resume Is Expensive
- Slow suspend/resume has long been known on desktops/servers
  - Suspend/resume is mostly slowed down by SATA and USB devices
  - These machines suspend/resume only occasionally
- Much worse on mobile/IoT due to short-lived tasks
  - Suspend/resume takes 500 ms on the Samsung Note 4 smartphone
  - E.g., it consumes 43% of total energy on a sensing benchmark [1]
- Need to understand suspend/resume on mobile/IoT devices
1. M. Lentz, J. Litton, and B. Bhattacharjee. Drowsy power management. SOSP, 2015.

Profiling Suspend/Resume
[Figure: the four profiled platforms: Nexus 5, Samsung Gear, Samsung Note 4, Panda Board]

Suspend/Resume on Mobile SoC Is Slow

Platform   Suspend   Resume
Nexus 5    119 ms    88 ms
Gear       191 ms    159 ms
Note 4     231 ms    316 ms
Panda      262 ms    492 ms

Main Reason: IO Power Transitions Are Slow
[Figure: breakdown of suspend time and resume time on Nexus 5, Gear, Note 4, and Panda; legend entries include sync fs, freeze tasks, and thaw tasks]

Slow IOs Are Various and Diverse
[Figure: top IO devices on Nexus 5, Gear, and Note 4; device labels include serial hsl, mdss, pcie, mmc host, and dwmmc2]

Number of IO drivers for each platform
Platform   IO drivers
Nexus 5    187
Gear       120
Note 4     116

Alternative Solution: Async IO Power Transitions
- Asynchronous PM overlaps power transitions of multiple IO devices
- Key difficulty: dependencies among hundreds of IO devices
  - Subtle and implicit
  - OS may not know them
- Linux kernel community has had a long debate
  - Still very conservative about async PM
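To make the dependency problem concrete, here is a minimal Python sketch of asynchronous power transitions under explicit dependencies. It is an illustration only, not the Linux kernel's mechanism: the device names, the parent/child graph, and the latencies are all hypothetical. Every device starts its transition concurrently, but a parent waits until its children have finished.

```python
import asyncio

# Hypothetical device tree (not from the slides): parent -> children.
# A parent may only power down after all of its children have.
CHILDREN = {
    "usb_hub": ["usb_camera", "usb_modem"],
    "mmc_host": ["sd_card"],
    "usb_camera": [],
    "usb_modem": [],
    "sd_card": [],
}

# Hypothetical power-transition latencies in seconds.
LATENCY_S = {
    "usb_hub": 0.02,
    "mmc_host": 0.03,
    "usb_camera": 0.01,
    "usb_modem": 0.02,
    "sd_card": 0.01,
}


async def suspend_device(dev, done, order):
    # Wait until every child has finished its own transition.
    for child in CHILDREN[dev]:
        await done[child].wait()
    # Stand-in for the driver's slow power transition.
    await asyncio.sleep(LATENCY_S[dev])
    order.append(dev)
    done[dev].set()


async def suspend_all():
    done = {dev: asyncio.Event() for dev in CHILDREN}
    order = []
    # Start all transitions at once; the events enforce the dependencies.
    await asyncio.gather(*(suspend_device(d, done, order) for d in CHILDREN))
    return order


def serial_latency():
    # Total time if devices are suspended one after another.
    return sum(LATENCY_S.values())


def critical_path(dev):
    # Longest dependency chain ending at dev: a lower bound for async PM.
    return LATENCY_S[dev] + max(
        (critical_path(c) for c in CHILDREN[dev]), default=0.0
    )


if __name__ == "__main__":
    print("suspend order:", asyncio.run(suspend_all()))
    print("serial latency (s):", serial_latency())
    print("async bound (s):", max(critical_path(d) for d in CHILDREN))
```

The sketch only works because the graph is fully known. If one edge is missing from `CHILDREN`, a parent can power down before its child, which is the hazard behind the conservative stance described above.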

Our Key Idea
- Objective: offloading suspend/resume to a very weak core
- Hardware support: a weak core (common on mobile SoCs)
- Software support: a baremetal virtual executor on the weak core
[Figure: Offloading suspend/resume via virtualization; labels: Suspend, Resume, Kernel, Virtual Execution, CPU, Weak core, DRAM, IO]

Weak Core on Modern SoCs
- Low power cores already exist on modern SoCs
  - E.g. Apple motion coprocessor (Cortex-M3)
  - Shared memory and IO bus; incoherent cache domain
  - Heterogeneous but similar ISA (ARMv7/8 vs ARMv7-M)
- Weak cores are ideal for executing OS suspend/resume
  - Idle power 3.8 mW vs 30 mW
  - Kernel execution favors weak cores [1]
    - Small code working set
    - Less predictable control flow
1. J. Mogul, J. Mudigonda, N. Binkert, P. Ranganathan, and V. Talwar. Using asymmetric single-ISA CMPs to save energy on operating systems. IEEE Micro, 2008.

Software Challenges
- Objective: execute the kernel suspend/resume path on a weak core, without cache coherence and without a unified ISA
- Manually partitioning mature kernels is infeasible
  - Modern kernels are beasts
    - Windows: 45M SLoC [1]
    - Linux 4.4: 16M SLoC [2]
  - Suspend/resume code is complicated (30k SLoC in Linux)
  - Commodity kernels are rapidly evolving
1. 5532
2. https://www.linuxcounter.net/statistics/kernel

Our Solution
- Launching a virtual machine on the weak core to execute the unmodified kernel binary for the main CPU
- This contrasts with traditional virtualization
  - Host is much more powerful than guest
[Figure: Offloading suspend/resume via virtualization; labels: Suspend, Resume, Kernel, Virtual Execution, CPU, Weak core, DRAM, IO]

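System Overview
[Figure: timelines for a commodity system and for our system. Commodity: the main core is on (CPU ON) during suspend and resume and off (CPU OFF) while suspended. Our system: the main core is off (CPU OFF) while the weak core is on (Weak ON) and runs suspend and resume through binary translation of the unmodified kernel; the weak core is off (Weak OFF) once the main core is back on.]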

Does This Really Work?
- No one would believe binary translation works for us
- We need aggressive optimizations
  - 20x slowdown from the initial implementation
  - Reason: commodity binary translators are generic and conservative
    - Status register is emulated
    - Frequent interrupt checks

Our Key Optimizations
- Exploit ISA similarity (ARMv7 vs ARMv7-M)
- Baremetal stacks
- Relaxed handling of interrupts and exceptions
- Kernel virtual memory

Current Implementation
- Platform: TI OMAP4 SoC
- Trimming down QEMU from 2.6M SLoC to 50.5K SLoC
- 4.5K SLoC new code
- A first-of-its-kind virtualization environment on an embedded core
[Figure: TI OMAP4. Linux 4.4 runs on the dual ARM Cortex-A9 (cache & MMU); the QEMU-based virtual executor runs on the dual ARM Cortex-M3 (cache); both attach to the system-level interconnect, main memory, and IO.]

Microbenchmarks
- Performance metric: # of CPU cycles
- Baseline: native compilation & execution on the main CPU (Cortex-A9)
- Native: native compilation & execution on the weak core (Cortex-M3)
- Translated (unoptimized/optimized): translated execution on the weak core
[Figure: bar chart of overhead in terms of cycle count (0x to 20x) for Native, Translated (unoptimized), and Translated (optimized) on the callback, kfifo, and glob benchmarks]
- Optimization result
  - 5x overhead reduction
  - Within 2x of native execution
- Estimated energy saving
  - 70% energy reduced in suspend/resume
  - 30% overall battery life extended
- Benchmark: mobile sensing scenario [1]
1. M. Lentz, J. Litton, and B. Bhattacharjee. Drowsy power management. SOSP, 2015.

Summary
- Observation: busy/idle waits for IOs bottleneck the OS suspend/resume path
- Goal: offloading suspend/resume to a weak core with an incoherent cache and heterogeneous ISAs
- Key idea: binary translate and execute the unmodified kernel on the weak core
- Highlight: for the first time, we run a virtual environment on an embedded core for offloading specific kernel paths
[Figures: the virtualization diagram and the main core/weak core timeline, repeated from earlier slides]

Q/A

ARM big.LITTLE

Power Consumption Comparison between ARM big.LITTLE and OMAP4
SoC           Little Core Power      Big Core Power         Ratio
Exynos 5430   85 mW (Cortex-A7)      750 mW (Cortex-A15)    8.8
Exynos 5433   189 mW (Cortex-A53)    1480 mW (Cortex-A57)   7.8
OMAP 4460     21.1 mW (Cortex-M3)    672 mW (Cortex-A9)     31.8

BaseMark OS II - XML Parsing Energy Efficiency
Core                Performance    Energy       Performance/Energy
A15 (Exynos 5430)   99.69 MB/s     19.75 mWh    5.04
A7 (Exynos 5430)    77.93 MB/s     10.56 mWh    7.38
A57 (Exynos 5433)   155.29 MB/s    27.72 mWh    5.60
A53 (Exynos 5433)   109.36 MB/s    17.11 mWh    6.39
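The Ratio column appears to be the big core power divided by the little core power. A short Python sketch, using only the power figures listed above, reproduces the three ratios; the function and variable names are chosen here for illustration.

```python
# (SoC, little core power in mW, big core power in mW), as listed above.
POWER_MW = [
    ("Exynos 5430", 85.0, 750.0),
    ("Exynos 5433", 189.0, 1480.0),
    ("OMAP 4460", 21.1, 672.0),
]


def power_ratio(little_mw, big_mw):
    # How many times more power the big core draws than the little core.
    return big_mw / little_mw


def ratios():
    # Round to one decimal place, matching the precision on the slide.
    return {soc: round(power_ratio(lo, hi), 1) for soc, lo, hi in POWER_MW}


if __name__ == "__main__":
    for soc, ratio in ratios().items():
        print(soc, ratio)
```

The gap on OMAP4 (Cortex-M3 vs Cortex-A9) is roughly four times wider than on either big.LITTLE pair.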

How do we estimate our energy saving
- Without offloading:
  E_cpu = (T_busy_exec + T_busy_wait) * P_busy + T_idle * P_idle
- With offloading:
  E_pm = X * F * T_busy_exec * P'_busy + T_busy_wait * P'_busy + T_idle * P'_idle
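The two expressions can be evaluated with a few lines of Python. This sketch assumes the operators lost from the slide text are "=" and "+", and that the primed powers belong to the weak core. The slide does not define X and F, so they are treated here simply as multiplicative factors on the busy execution time. All input values in the usage example are made up for illustration and are not measurements from the talk.

```python
def energy_without_offload(t_busy_exec, t_busy_wait, t_idle, p_busy, p_idle):
    # E_cpu = (T_busy_exec + T_busy_wait) * P_busy + T_idle * P_idle
    return (t_busy_exec + t_busy_wait) * p_busy + t_idle * p_idle


def energy_with_offload(x, f, t_busy_exec, t_busy_wait, t_idle,
                        p_busy_weak, p_idle_weak):
    # E_pm = X * F * T_busy_exec * P'_busy
    #        + T_busy_wait * P'_busy + T_idle * P'_idle
    # x and f scale only the busy execution time (assumed reading).
    return (x * f * t_busy_exec * p_busy_weak
            + t_busy_wait * p_busy_weak
            + t_idle * p_idle_weak)


if __name__ == "__main__":
    # Hypothetical inputs: times in seconds, powers in mW.
    e_cpu = energy_without_offload(0.1, 0.3, 0.1, 600.0, 30.0)
    e_pm = energy_with_offload(2.0, 2.0, 0.1, 0.3, 0.1, 20.0, 4.0)
    print("E_cpu (mJ):", e_cpu)
    print("E_pm (mJ):", e_pm)
```

In this form, the wait and idle terms scale with the weak core's power, while only the execution term is inflated by X and F.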

Prior Art: Multikernel OSes
- Single system image
- One kernel for each type of cores
  - Helios [1]
  - Barrelfish [2]
  - K2 [3]
  - Popcorn Linux [4]
- Kernels often pass messages to communicate
- They give up compatibility with commodity kernels
[Figure: A multikernel OS; labels: Kernel A, Kernel B, Core A, Core B, DRAM, IO]
1. E. B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, and G. Hunt. Helios: heterogeneous multiprocessing with satellite kernels. SOSP, 2009.
2. Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. The Multikernel: A new OS architecture for scalable multicore systems. SOSP, 2009.
3. F. X. Lin, Z. Wang, and L. Zhong. K2: A mobile operating system for heterogeneous coherence domains. ASPLOS, 2014.
4. Antonio Barbalace, Marina Sadini, Saif B.M. Ansary, Christopher Jelesnianski, Akshay Ravichandran, Cagil Kendir, Alastair Murray, and Binoy Ravindran. Popcorn: Bridging the Programmability Gap in Heterogeneous-ISA Platforms. EuroSys, 2015.
