Handling Concurrency in Embedded Software Systems from an Architectural Point of View

Transcription

JASMIN JAHIĆ, SIMON
ic.github.io/hipeac2021
9:30 – 13:00, 18.01.2021, Budapest, Hungary
HANDLING CONCURRENCY IN EMBEDDED SOFTWARE SYSTEMS FROM ARCHITECTURAL POINT OF VIEW: PART 1

AGENDA
9:30 – 10:30  Session 1: Fundamental Issues with Concurrency in Embedded Software Systems from an Architectural Point of View
10:45 – 11:45  Session 2: Modelling and DSE Methods for Mixed-Critical Software Systems Using Multicore Architectures
12:00 – 13:00  Session 3: Synchronization in Concurrent Software is an Architectural Decision


SESSION 1 (9:30 – 10:30)
- Introduction to the topic
- Understand the basics of software system architecture
- Understand the basics of computing laws and how they relate to the architecture topic
- Understand important architectural properties of embedded systems affected by introducing concurrency

LITERATURE
[1] The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, Dr. Dobb's Journal, 30(3), March 2005
[2] Software Architecture in Practice, Len Bass, Paul Clements, Rick Kazman, 3rd edition, 2012
[3] Pragmatic Evaluation of Software Architectures, J. Knodel, M. Naab, 2016
[4] G. M. Amdahl, "Computer Architecture and Amdahl's Law," Computer, vol. 46, no. 12, pp. 38–46, Dec. 2013
[5] A Glimpse of Real-Time Systems Theory and Practice in the Wake of Multicore Processors and Mixed-Criticality, Tullio Vardanega, University of Padua, Italy, ACACES 2020, HiPEAC
[6] The Art of Multiprocessor Programming, M. Herlihy, N. Shavit, 2011

MOORE'S LAW AND DENNARD SCALING
(figure: microprocessor trend data)

MOORE'S LAW AND DENNARD SCALING
Free lunch: every new processor generation executed at a higher frequency, so software automatically became faster – that free lunch is over! [1]
Performance drivers after the breakdown of Dennard scaling:
- Computer architecture improvements
- Concurrency and parallelism (forced to use multicores)
- Power consumption
Drivers for using multicores:
- Improve execution time
- Improve throughput
- Redundancy (availability, reliability)
- Power consumption
- ... without compromising other system quality properties
(Pentium Dual-Core, 2007; Athlon 64 X2, 2007)

SOFTWARE SYSTEM ARCHITECTURE
"Software architecture is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them." [2]
Requirements → Drivers → Decisions

Design space exploration → Reasoning → Decision making

SOFTWARE QUALITY
- ISO/IEC 25010:2011 – systems and software quality requirements and evaluation
- ISO/IEC/IEEE 12207 – systems and software engineering – software life cycle processes
- IEEE 730 – software quality assurance
- IEEE 1012 – verification and validation (V&V)
Quality characteristics (ISO/IEC 25010, excerpt):
- Functional suitability: functional completeness, functional correctness, functional appropriateness
- Performance efficiency: time behaviour, resource utilization, capacity
- Compatibility: interoperability
- Usability: learnability, operability

QUALITY DRIVERS
Quantification of quality in a context – quality scenario template [3]:
ID – unique identifier | Status
Name – name of the scenario | Owner
Quality – related quality attribute (exactly one attribute)
Environment – context applying to this scenario; may describe both the context and the status of the system
Stimulus – the event or condition arising from this scenario
Response – the expected reaction of the system to the scenario event

QUALITY DRIVERS FOR ADOPTING MULTICORES: SET #1
- Execution time
- Redundancy (availability, reliability)
- Power consumption

EXECUTION TIME: IDEAL QUALITY DRIVER EXPECTATIONS
ID | Status | Name | Owner
Quality: execution time
Environment: software is executing on a single-core CPU (#cores = 1; execution time = t)
Stimulus: migrate to a dual-core CPU (#cores = 2)
Response: reduce execution time by half (execution time = t/2)

THEORETICAL LIMITATIONS OF PERFORMANCE GAINS
Some operations have to execute physically sequentially.
[4] "If one decided to improve the performance by putting two processors side by side with shared memory, one would find approximately 2.2 times as much hardware. The additional two-tenths in hardware accomplish the crossbar switching for the sharing. The resulting performance achieved would be about 1.8 ... on the assumption of each processor utilizing half of the memories about half of the time." (ILLIAC IV computer)
Gene M. Amdahl. 1967. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18–20, 1967, Spring Joint Computer Conference (AFIPS '67 (Spring)). Association for Computing Machinery, New York, NY, USA, 483–485.

THEORETICAL LIMITATIONS OF PERFORMANCE GAINS
Some logical problems are hard or impractical to partition into parts that can execute concurrently.
Amdahl's law: split the single-core execution time into a sequential part $T_s$ and a parallelizable part $T_p$; with $n$ cores,
$$\mathit{Speedup}(n) = \frac{T_s + T_p}{T_s + \frac{T_p}{n}}, \qquad \lim_{n\to\infty}\left(T_s + \frac{T_p}{n}\right) = T_s \ (T_s = \mathrm{const.}), \qquad \lim_{n\to\infty}\mathit{Speedup}(n) = \frac{T_s + T_p}{T_s}$$
Assumptions: fixed-size problem; $T_p$ is independent of $n$. The slowest (sequential-only) part limits the speedup.
(figure: execution time T split into a parallelizable part and a not-parallelizable, sequential-only part)
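A minimal numeric sketch of the bound above (the 10% serial fraction and the core counts are illustrative assumptions, not values from the slides):

```cpp
#include <cstdio>

// Amdahl's law: speedup of a fixed-size problem on n cores, where 'serial'
// is the fraction of the single-core time that cannot be parallelized,
// i.e. Ts / (Ts + Tp).
double amdahl_speedup(double serial, unsigned n) {
    return 1.0 / (serial + (1.0 - serial) / n);
}

int main() {
    const double serial = 0.10;  // illustrative: 10% sequential-only work
    for (unsigned n : {1u, 2u, 4u, 8u, 64u, 1024u}) {
        std::printf("n = %4u cores -> speedup = %.2f\n", n, amdahl_speedup(serial, n));
    }
    // The speedup approaches 1/serial = 10x no matter how many cores are added.
    return 0;
}
```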

AMDAHL'S LAW
Effect of Amdahl's law on speedup, as a function of the fraction of clock cycle time spent in serial mode: John L. Hennessy and David A. Patterson. 2019. A New Golden Age for Computer Architecture. Commun. ACM 62, 2 (February 2019), 48–60. https://doi.org/10.1145/3282307
"For example, when only 1% of the time is serial, the speedup for a 64-processor configuration is about 35."

GUSTAFSON'S LAW
Assumptions:
- The problem scales with the number of available cores (NOT a fixed-size problem)
- Fixed execution time
- Increase in throughput
With the execution time on the n-core machine fixed at $T = T_s + T_p$, the equivalent single-core time is $T_s + n\,T_p$, so the scaled speedup is
$$\mathit{Speedup}_{\text{scaled}}(n) = \frac{T_s + n\,T_p}{T_s + T_p}$$
John L. Gustafson. 1988. Reevaluating Amdahl's Law. Commun. ACM 31, 5 (May 1988), 532–533.
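A companion sketch for the scaled-speedup view above (again, the 10% serial fraction is an illustrative assumption):

```cpp
#include <cstdio>

// Gustafson's law: scaled speedup when the problem grows with the number of
// cores so that the wall-clock time stays fixed. 'serial' is the fraction of
// that fixed time spent in the sequential part.
double gustafson_speedup(double serial, unsigned n) {
    return serial + (1.0 - serial) * n;
}

int main() {
    const double serial = 0.10;  // illustrative: 10% of the fixed time is sequential
    for (unsigned n : {1u, 2u, 4u, 8u, 64u}) {
        std::printf("n = %3u cores -> scaled speedup = %.2f\n", n, gustafson_speedup(serial, n));
    }
    // Unlike Amdahl's fixed-size view, the scaled speedup keeps growing almost
    // linearly with n, because the parallel work grows with n.
    return 0;
}
```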

AMDAHL'S LAW VS. GUSTAFSON'S LAW: ASSUMPTIONS
B. H. H. Juurlink and C. H. Meenderinck. 2012. Amdahl's Law for Predicting the Future of Multicores Considered Harmful. SIGARCH Comput. Archit. News 40, 2 (May 2012).

EXECUTION TIME
Improve execution time:
- Parallelise a single task
- Increase throughput
Average case execution time vs. worst case execution time:
- Single task: user experience | real-time constraints
- Group of tasks: user experience (new features) | real-time constraints / freedom from interference
(figure: frequency of execution [app, execution path] over time, marking the best case execution time, the worst case execution time, and the upper bound)

SOFTWARE IN EMBEDDED SYSTEMS
(figure: example tasks with execution times of 7 s and 5 s)

WHAT COULD POSSIBLY GO WRONG?
Supervised Testing of Embedded Concurrent Software, PhD thesis, Jasmin Jahić, 2020

QUALITY DRIVERS FOR ADOPTING MULTICORES: SET #2
- Average execution time
- User experience
- Real-time constraints
- Safety-critical
- Do not compromise execution correctness
(Improve execution time: single task / group of tasks, average case / worst case, as in the table above)

QUALITY PROPERTIES OF EMBEDDED SYSTEMS RELATED TO MULTICORES
Set #1: execution time; redundancy (availability, reliability); power consumption
Set #2: average execution time; user experience; real-time constraints; safety-critical; do not compromise execution correctness

EXECUTION TIME: SIMPLE CASE
(figure: a single task executing on Core #1)

EXECUTION TIME: SIMPLE CASE
(figure: hardware involved in executing a single task on Core #1)
Core #1: pipeline and write-back, L1 cache, translation lookaside buffer (TLB); L2 cache, cache replacement policy, page table; memory bus, memory controller; sense amplifiers, DRAM memory banks; HDD/SSD

CHALLENGE: EXECUTION TIME
- CPU: pipelines, speculation
- Cache behaviour, cache pre-emption
- Memory hierarchy
- Application software: execution path depends on the input
Design and Analysis of Time-Critical Systems, Jan Reineke, Saarland University, Germany, Summer School ACACES 2017

MEMORY ACCESS
Computer Architecture: A Quantitative Approach, John L. Hennessy, David A. Patterson, 5th edition, 2011
Patterson, D. A. & Hennessy, J. L. (2017). Computer Organization and Design: The Hardware/Software Interface, RISC-V edition


EXECUTION TIME: MULTIPLE TASKS CASE
- Single-core execution time: 12 s
- Dual-core execution time: 7 s
- Speedup: 1.71x
(figure: the 7 s task on Core #1 and the 5 s task on Core #2)

EXECUTION TIME: MULTIPLE TASKS CASE
(figure: hardware involved when two tasks run on two cores)
Core #1 and Core #2: each with its own pipeline and write-back and its own L1 cache; shared: cache coherence, cache replacement policy, L2 cache, translation lookaside buffer (TLB), page table, memory bus, memory controller, sense amplifiers, DRAM memory banks, HDD/SSD

WCET OF TASKS ON MULTICORES
"The WCET of even the simplest single-path program running alone on a CPU does not stay the same when other programs run on other CPUs." [5]
(figure: single-task execution time vs. execution time under interference)
PROARTIS: PRObabilistic Analyzable Real Time Systems

EXECUTION TIME: MULTIPLE TASKS CASE
- New task 3: 7 s
(figure: the 7 s and 5 s tasks plus the new 7 s task on two cores)

EXECUTION TIME: MULTIPLE TASKS CASE
- Single-core execution time: 19 s
- Dual-core execution time: 12 s
- Speedup: 1.58x
(figure: the 7 s and 5 s tasks on Core #1, the new 7 s task on Core #2)
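A minimal sketch of how such dual-core numbers come about, using the task lengths from the slides (7 s, 5 s, 7 s) and a simple longest-processing-time-first assignment (the assignment strategy is an illustrative assumption, not something the slides prescribe):

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

int main() {
    std::vector<double> tasks = {7.0, 5.0, 7.0};   // task lengths in seconds (from the slides)
    const int cores = 2;

    // Longest-processing-time-first: assign each task to the least loaded core.
    std::sort(tasks.begin(), tasks.end(), std::greater<double>());
    std::vector<double> load(cores, 0.0);
    for (double t : tasks) {
        auto least = std::min_element(load.begin(), load.end());
        *least += t;
    }

    double single_core = 0.0;
    for (double t : tasks) single_core += t;                        // 19 s
    double dual_core = *std::max_element(load.begin(), load.end()); // 12 s (7+5 vs. 7)

    std::printf("single-core: %.0f s, dual-core: %.0f s, speedup: %.2fx\n",
                single_core, dual_core, single_core / dual_core);   // 1.58x
    return 0;
}
```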

EXECUTION TIME: MULTIPLE TASKS CASE
(figure: hardware and software layers involved when multiple tasks run on two cores)
Task scheduling; Core #1 and Core #2: each with its own pipeline and write-back and its own L1 cache; shared: cache coherence, cache replacement policy, L2 cache, translation lookaside buffer (TLB), page table, memory bus, memory controller, sense amplifiers, DRAM memory banks, HDD/SSD

QUALITY DRIVERS FOR ADOPTING MULTICORES: SET #3
- Core affinity
- Scheduling policy
- Interrupts
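As a concrete illustration of the first of these knobs, a minimal Linux sketch that pins the calling thread to core 0 using the Linux-specific pthread_setaffinity_np call; the choice of core 0 and the bare-bones error handling are illustrative:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>
#include <cstdio>

int main() {
    // Core affinity (set #3): restrict the calling thread to CPU core 0.
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(0, &cpuset);

    int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
        return 1;
    }

    // From here on, the scheduler will only run this thread on core 0,
    // which makes its interference pattern (caches, bus) more predictable.
    std::printf("thread pinned to core 0\n");
    return 0;
}
```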

SCHEDULING ON MULTICORE PROCESSORS
Definitions [5]:
- A valid schedule is said to be feasible if it satisfies the temporal constraints of every job.
- A job set is said to be schedulable by a scheduling algorithm if that algorithm always produces a valid schedule for that problem.
- A scheduling algorithm is optimal if it always produces a feasible schedule when one exists.
- Utilisation $U_i$ of a task $T_i$: the ratio between the execution time $C_i$ of the task and its period $P_i$: $U_i = \frac{C_i}{P_i}$
- Utilisation of the system: $U = \sum_i U_i \le m$; m – number of cores
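A small sketch of these utilisation formulas on an assumed task set (the C_i/P_i values are made up for illustration):

```cpp
#include <cstdio>
#include <vector>

struct Task {
    double C;  // execution time
    double P;  // period
};

int main() {
    // Illustrative task set, not from the slides.
    std::vector<Task> tasks = {{2.0, 10.0}, {3.0, 15.0}, {5.0, 20.0}, {4.0, 8.0}};
    const int m = 2;  // number of cores

    double U = 0.0;
    for (const Task& t : tasks) {
        double Ui = t.C / t.P;          // U_i = C_i / P_i
        U += Ui;
        std::printf("U_i = %.2f\n", Ui);
    }
    std::printf("system utilisation U = %.2f (bound: m = %d)\n", U, m);

    // U <= m is only a necessary condition: exceeding it makes the set
    // unschedulable, but staying below it does not guarantee schedulability.
    if (U > m) std::printf("task set cannot be scheduled on %d cores\n", m);
    return 0;
}
```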

SCHEDULING ON MULTICORE PROCESSORS
Utilisation: for m resources (cores) and n tasks, how should tasks be scheduled so as to avoid underutilisation of resources? How can idle resources be avoided (without using static scheduling), while at the same time:
- Minimising pre-emption
- Minimising spinning
- Meeting deadlines
Theorem [Hong, Leung: RTSS 1988, IEEE TC 1992]: no optimal on-line scheduler can exist for a set of jobs with two or more distinct deadlines on any multiprocessor system with m > 1 processors.

EXECUTION TIME: MULTIPLE TASKS CASE
(figure: the 7 s and 5 s tasks on Core #1 and the 7 s task on Core #2 over time)
Too late to decide about scheduling.

EXECUTION TIME: MULTIPLE THREADS CASE
- Single-core execution time: 19 s
- Dual-core execution time: 9.5 s
- Speedup: 2x (ideally, but not really)
(figure: 7 s and 2.5 s threads on Core #1, 5 s and 4.5 s threads on Core #2)

CONCURRENCY BUG EXAMPLE
(figure: thread1 on Core 1 and thread2 on Core 2 read and write a shared variable S, first without and then with locks)
Without locks, both threads read S = 100, update it independently, and write back their results; because the R(S) and W(S) operations interleave, one update overwrites the other and the final value of S is wrong. With LOCK/UNLOCK around the read-modify-write, thread2 WAITs until thread1 releases the lock, the accesses are serialized, and the final value is correct.
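A minimal, runnable sketch of the same lost-update pattern (a plain shared counter rather than the exact values in the figure): without the mutex the two threads race on the shared variable; with it, the read-modify-write becomes atomic.

```cpp
#include <cstdio>
#include <mutex>
#include <thread>

long shared = 100;       // shared variable S
std::mutex s_mutex;      // protects 'shared'

void add(long delta, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(s_mutex);  // remove this line to observe the race
        long tmp = shared;   // R(S)
        tmp += delta;
        shared = tmp;        // W(S)
    }
}

int main() {
    std::thread t1(add, +1, 100000);   // thread1
    std::thread t2(add, -1, 100000);   // thread2
    t1.join();
    t2.join();
    // With the lock the result is deterministic (100); without it,
    // interleaved read-modify-write sequences lose updates.
    std::printf("S = %ld\n", shared);
    return 0;
}
```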

QUALITY DRIVERS FOR ADOPTING MULTICORES: SET #4
- Ways and means to partition software – partitioning strategy
- Thread start-up time
- Synchronisation
- Liveness
- Concurrency bugs – bugs that exist on execution paths possible only because of concurrency

QUALITY PROPERTIES OF EMBEDDED SYSTEMS RELATED TO MULTICORES
Set #1: execution time; redundancy (availability, reliability); power consumption
Set #2: average execution time; user experience; real-time constraints; safety-critical; do not compromise execution correctness
Set #3: core affinity; scheduling policy; interrupts
Set #4: ways and means to partition software – partitioning strategy; thread start-up time; synchronisation; liveness; concurrency bugs – bugs that exist on execution paths possible only because of concurrency

COMPUTER ARCHITECTURE IMPROVEMENTS
CPU performance: $\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$
- Instruction count
- CPI – cycles per instruction
- Clock rate
Focus on architectural improvements and on how to use the larger number of transistors without relying on silicon performance improvements:
- Instruction set (e.g., RISC-V)
- Instruction-level parallelism – pipelining
- Data-level parallelism
- Prediction (e.g., branch prediction)
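A tiny worked example of the classic CPU-performance equation above (the instruction count, CPI, and clock rate are made-up illustrative values):

```cpp
#include <cstdio>

int main() {
    // CPU time = (instruction count * cycles per instruction) / clock rate
    const double instruction_count = 2.0e9;  // 2 billion instructions (assumed)
    const double cpi = 1.5;                  // average cycles per instruction (assumed)
    const double clock_rate = 3.0e9;         // 3 GHz (assumed)

    const double cpu_time = instruction_count * cpi / clock_rate;  // = 1.0 s
    std::printf("CPU time = %.2f s\n", cpu_time);

    // Lowering CPI (better pipelining, prediction) or raising the clock rate
    // improves performance without touching the program's instruction count.
    return 0;
}
```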

A MULTITHREADED PROCESS
(figure: process p with threads 1, 2, ..., n; each thread has its own stack pointer and stack, while the heap, BSS & DATA (static variables), and text segments are shared)
"... each thread runs independently of the others, and each thread may run a different sequence of instructions." – C++ Concurrency in Action: Practical Multithreading, Anthony Williams, 2012
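A small sketch of that memory model in code: the global (static) variable is visible to both threads, while each thread's local variable lives on its own stack. The variable names are illustrative.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> shared_counter{0};  // data segment: one copy, shared by all threads

void worker(int id) {
    int local = id * 100;            // lives on this thread's private stack
    shared_counter.fetch_add(1);     // all threads touch the same object
    std::printf("thread %d: &local = %p, &shared_counter = %p\n",
                id, static_cast<void*>(&local),
                static_cast<const void*>(&shared_counter));
}

int main() {
    std::thread t1(worker, 1), t2(worker, 2);
    t1.join();
    t2.join();
    // The two 'local' addresses differ (separate stacks);
    // the shared_counter address is the same in both threads.
    std::printf("shared_counter = %d\n", shared_counter.load());
    return 0;
}
```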

FREE LUNCH
ID: 001 | Status | Name | Owner
Quality: average case execution time – single task
Environment: a single task is executing on a CPU; execution time = t
Stimulus: migrate to a new hardware (CPU) generation platform (#cores, CPU architecture improvements, CPU frequency, memory (size, speed, hierarchy))
Response: significantly reduced (by factor k) execution time; execution time = t/k

FREE LUNCH
Driver #001: execution time = t/k, with k composed of k1, k2, k3, k4:
- #cores – irrelevant (single task) – k1 = 0
- CPU architecture improvements – k2
- CPU frequency – k3 = 0
- Memory – k4

FREE LUNCH
ID: 001 | Status | Name | Owner
Quality: average case execution time – single task – no new tasks – no partitioning
Environment: a single task is executing on a CPU; execution time = t
Stimulus: migrate to a new hardware (CPU) generation platform (#cores, CPU architecture improvements, CPU frequency, memory (size, speed, hierarchy))
Response: significantly reduced (by factor k) execution time; execution time = t/k

THROUGHPUT AND USER EXPERIENCE
(Set #3: core affinity, scheduling policy, interrupts)
ID: 002 | Status | Name | Owner | Stakeholders | Quantification
Quality: average case execution time – multiple tasks – no new tasks – no partitioning
Environment: multiple tasks are executing on a CPU; execution time = t
Stimulus: migrate to a new hardware (CPU) generation platform (#cores, CPU architecture improvements, CPU frequency, memory (size, speed, hierarchy), set #3 parameters)
Response: significantly reduced (by factor k) execution time; execution time = t/k

THROUGHPUT AND NEW FEATURES
(Set #3: core affinity, scheduling policy, interrupts)
ID: 003 | Status | Name | Owner
Quality: average case execution time – multiple tasks – new tasks – no partitioning
Environment: multiple tasks are executing on a CPU; execution time = t
Stimulus: add new features/new tasks and reconfigure the system (#features (and their requirements), set #3 parameters)
Response: the system runs with the new features and with a new execution time that is acceptable (#newFeatures, new execution time)

THROUGHPUT AND RECONFIGURATION
(Set #3: core affinity, scheduling policy, interrupts)
ID: 004 | Status | Name | Owner | Stakeholders | Quantification
Quality: average case execution time – multiple tasks – no new tasks – no partitioning
Environment: multiple tasks are executing on a multicore CPU; execution time = t
Stimulus: configure set #3 parameters
Response: significantly reduced (by factor k) execution time; execution time = t/k

SPEEDUP OF A SINGLE TASK
(Set #3: core affinity, scheduling policy, interrupts. Set #4: ways and means to partition software – partitioning strategy; thread start-up time; synchronisation; liveness; concurrency bugs – bugs that exist on execution paths possible only because of concurrency)
ID: 005 | Status | Name | Owner
Quality: average case execution time – single task – partitioning – no dependencies
Environment: a task is executing on a CPU; execution time = t; #cores > 1
Stimulus: partition the task into threads (#threads > 1, set #3 parameters, set #4 parameters (partitioning strategy, thread start-up time))
Response: significantly reduced (by factor k) execution time; execution time = t/k

SPEEDUP OF A SINGLE TASK
(Set #3 and set #4 parameters as above)
ID: 006 | Status | Name | Owner | Stakeholders | Quantification
Quality: average case execution time – single task – partitioning – dependencies, shared memory
Environment: a task is executing on a CPU; execution time = t; #cores > 1
Stimulus: partition the task into threads (#threads > 1, set #4 parameters, set #3 parameters)
Response: significantly reduced (by factor k) execution time; execution time = t/k
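A minimal sketch in the spirit of scenarios ID 005/006: a single summation task is split across a few std::thread workers, each writing only its own partial result so that the threads do not contend on shared data. The array size and thread count are illustrative.

```cpp
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1u << 22;            // illustrative problem size
    const unsigned workers = 4;                // illustrative partitioning (#threads > 1)
    std::vector<double> data(n, 1.0);

    std::vector<double> partial(workers, 0.0); // one slot per thread: no shared writes
    std::vector<std::thread> pool;

    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = w * n / workers;
            const std::size_t end   = (w + 1) * n / workers;
            partial[w] = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        });
    }
    for (auto& t : pool) t.join();

    const double sum = std::accumulate(partial.begin(), partial.end(), 0.0);
    std::printf("sum = %.0f\n", sum);          // expected: n
    // Note the set #4 costs: thread start-up time, the final combination step,
    // and (if the threads shared a single accumulator) synchronisation.
    return 0;
}
```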

SOFTWARE PARTITIONING – MULTITHREADING
What else is affected by partitioning software tasks into threads?
Part 3: Synchronization in Concurrent Software is an Architectural Decision

WHAT ABOUT WORST CASE EXECUTION TIME?
(Set #3: core affinity, scheduling policy, interrupts)
- We can try to limit concurrency (set #3 parameters)
- In general, more cores and more tasks make it harder to predict WCET – increased hardware interference
- Optimal scheduling on multicores: some theoretical concepts are hard to implement [5] (RTOS not ready)
- Use multicores to decrease WCET? Not (always) a good idea [5]

SOME APPROACHES FOR PREDICTING EXECUTION TIME
- Usually WCET, on binary executables, considering intrinsic cache and pipeline behaviour
- Precision Timed (PRET) Machines – ptolemy.berkeley.edu/projects/chess/pret/
- aiT WCET Analyzers – www.absint.com/ait
- Timing behaviour of AUTOSAR multi-core ECUs – www.timingarchitects.com/
(figure: control-flow graph with basic blocks BB1–BB7 mapped onto the dual-core hardware model: task scheduling, per-core pipelines and L1 caches, cache coherence, cache replacement policy, L2 cache, TLB, page table, memory bus, memory controller, sense amplifiers, DRAM memory banks, HDD/SSD)

ARCHITECTURE MODELLING
Model hardware – the level of detail depends on prediction needs:
- Transistors
- Memory (cache, DRAM, cache policy)
- Processor (pipelining, temperature, number of cores, frequency)
Analyses:
- Static code analysis
- Dynamic monitoring
- Perform analysis on models

AMALTHEA
Open-source tool platform for engineering embedded multi- and many-core software systems
http://www.amalthea-project.org/

ARCHITECTURAL VIEWS FOR CONCURRENCY
(...191-viewpoints-andperspectives.pdf)
- Process View – the "4+1" view: P. B. Kruchten, "The 4+1 View Model of Architecture," IEEE Software, vol. 12, no. 6, pp. 42–50, 1995
- Concurrency View: N. Rozanski and E. Woods, Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives, 2nd ed., Upper Saddle River, NJ: Addison-Wesley, 2012

ARCHITECTURAL VIEWS FOR MULTITHREADED PROGRAMS – A FRAMEWORK FOR AUTOMATIC EXTRACTION OF CONCURRENCY-RELATED ARCHITECTURAL PROPERTIES FROM
(...ithub.io/)

SIMULATORS
- SystemC
- Memory (e.g., DRAMSys: tool for optimizing memory systems through simulation analyses – https://www.iese.fraunhofer.de/en/innovation trends/autonomoussystems/memtonomy/DRAMSys.html)
- The Sniper Multi-Core Simulator – https://snipersim.org//w/The Sniper Multi-Core Simulator
- gem5 – https://www.gem5.org/

Speedups from performance engineering a program that multiplies two 4096-by-4096 matrices. "Absolute speedup" is time relative to Python, and "relative speedup", shown with an additional digit of precision, is time relative to the preceding line. The later steps: (4) parallelizing the code to run on all 18 of the processing cores, (5) exploiting the processor's memory hierarchy, (6) vectorizing the code, and (7) using Intel's special Advanced Vector Extensions (AVX) instructions.
IS CONC
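As a flavour of step (4) in that list, a minimal sketch that parallelizes the outer loop of a naive matrix multiplication across std::thread workers (the matrix size and thread count are illustrative; this does not reproduce the cited measurements):

```cpp
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 512;          // illustrative size (the cited example uses 4096)
    const unsigned workers = 4;         // illustrative core count

    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    // Each worker computes a contiguous block of rows of C = A * B,
    // so threads never write to the same element.
    auto multiply_rows = [&](std::size_t row_begin, std::size_t row_end) {
        for (std::size_t i = row_begin; i < row_end; ++i)
            for (std::size_t k = 0; k < n; ++k)          // k-before-j improves locality
                for (std::size_t j = 0; j < n; ++j)
                    C[i * n + j] += A[i * n + k] * B[k * n + j];
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back(multiply_rows, w * n / workers, (w + 1) * n / workers);
    for (auto& t : pool) t.join();

    std::printf("C[0][0] = %.0f\n", C[0]);   // expected: 2 * n = 1024
    return 0;
}
```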

