Arm Architecture 2020 Extensions - Linaro


Linaro Connect Sept 2020
Arm Architecture 2020 Extensions
Martin Weidmann
Director Product Management, ATG
Arm
LVC20-214
© 2020 Arm Limited (or its affiliates)

Annual cadence: evolving the CPU architecture
[Timeline figure: Armv8.0-A, followed by Armv8.1-A through Armv8.6-A across 2014 to 2019; feature labels: AArch64, Crypto, PointerAuth]

What's new in 2020
Armv8.7-A
- Support for 52-bit addresses with 4K and 16K granules
- Enhanced support for PCIe hot unplug
- Atomic store operations for interacting with accelerators
- WFE/WFI with timeouts
- Improvements to PAN
Future Architecture Technologies
- Branch Record Buffer
- Call Stack Recorder

Armv8.7-A

52-bit VA/IPA/PAs
- Armv8.2-A introduced support for 52-bit VAs, IPAs and PAs
  - Only available when using the 64K granule
  - There are some platforms which are unable to adopt 64K granules
- Armv8.7 extends 52-bit addressing to 4K and 16K granules
- Input addresses: T*SZ field rules relaxed to allow specifying up to 52 bits of address space
  - Can result in an extra level of walk (level -1)
- Output addresses: to enable the larger output addresses, the shareability attribute moved from descriptors to TCR_ELx
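The extra level of walk follows from simple arithmetic on the granule size. The sketch below is a simplified model, not the architectural rules: it assumes 8-byte descriptors, a final lookup at level 3, and ignores concatenated tables and any implementation limits. The function name `walk_levels` is made up for illustration.

```python
import math

# Simplified model: the page offset uses log2(granule) bits, and each table
# level resolves log2(granule / 8) bits because descriptors are 8 bytes.
GRANULE_BYTES = {"4K": 4096, "16K": 16384, "64K": 65536}
LAST_LEVEL = 3  # assumption: the final lookup is always level 3


def walk_levels(input_bits, granule):
    """Return (number_of_levels, starting_level) for an input address size."""
    size = GRANULE_BYTES[granule]
    offset_bits = size.bit_length() - 1   # 12, 14 or 16
    bits_per_level = offset_bits - 3      # 9, 11 or 13
    levels = math.ceil((input_bits - offset_bits) / bits_per_level)
    return levels, LAST_LEVEL - levels + 1


for granule in GRANULE_BYTES:
    for input_bits in (48, 52):
        levels, start = walk_levels(input_bits, granule)
        print(f"{granule:>3} granule, {input_bits}-bit input: "
              f"{levels} levels, starting at level {start}")
```

Under these assumptions only the 4K granule needs a fifth level for 52 bits, which is where level -1 comes from; the 16K granule still fits in four levels.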

PCIe hot-unplug (1)
- PCIe devices can be unplugged at any time
  - Could occur in the middle of an access
- PCIe root complex responsible for generating a response after a fixed timeout
  - Typically 50 ms
- If a CPU has outstanding accesses to the removed device, it could be blocked until the timeout expires
  - 50 ms = 100 million cycles @ 2 GHz
- Other CPUs can continue to make progress
[Diagram: two Arm CPUs, interconnect, PCIe root complex, PCIe device A, PCIe device B, RAM. One CPU executes LDR x0, PCIe A; the device is unexpectedly removed; the CPU is blocked until the Root Complex generates a response after the timeout]
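The cycle count quoted on the slide can be checked directly. This is only the arithmetic; the helper name `stall_cycles` is hypothetical.

```python
def stall_cycles(timeout_ms, clock_hz):
    """CPU cycles that elapse while a load waits for the completion timeout."""
    return timeout_ms * clock_hz // 1000


cycles = stall_cycles(50, 2_000_000_000)
print(f"{cycles:,} cycles")  # prints "100,000,000 cycles"
```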

PCIe hot-unplug (2)
- TLBI + DSB waits for all memory transactions using the old translation to complete
- If a CPU receiving a TLBI simply waits for all outstanding memory transactions to complete:
  - A CPU awaiting a PCIe hot-unplugged completion will take 50 ms to acknowledge the TLBI
  - The CPU that sent out the TLBI is now also exposed to a 50 ms delay
- Impact beyond the CPU that is directly talking to the PCIe endpoint
[Diagram: two Arm CPUs, interconnect, PCIe root complex, PCIe device A, PCIe device B, RAM. The first CPU executes LDR x0, PCIe A and waits for the transaction to complete before responding to the TLBI. The second CPU executes TLBI; DSB; LDR x0, RAM and is blocked waiting for acknowledgment of the TLBI]

XS attribute
- New XS attribute for devices with potentially long delays
  - New qualifier to Device-*
- TLBI and DSB changes
  - Existing TLBIs are unchanged
  - New TLBIs added which are only required to wait for transactions with XS = 0 (fast) to complete
  - Similar changes to DSB
- Receiving PE now only needs to track whether outstanding transactions are fast or slow (XS = 1)
[Diagram: two Arm CPUs, interconnect, PCIe root complex, PCIe device A, PCIe device B, RAM. RAM is marked as Fast in the TTs (XS = 0); the PCIe devices are marked as Slow in the TTs (XS = 1). The first CPU executes LDR x0, PCIe A and can respond (Ack) to an incoming TLBI without waiting for the timeout. The second CPU executes TLBI nXS; DSB nXS; LDR x0, RAM]
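The effect of the nXS variants on the receiving PE can be sketched as a small model. Everything here is illustrative: the `Transaction` type, the latencies, and the function name are assumptions, and a real PE tracks outstanding accesses in hardware rather than as a list.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    target: str
    remaining_ms: float  # time until this access completes (made-up values)
    xs: int              # 0 = fast, 1 = slow, as marked in the translation tables


def tlbi_ack_delay_ms(outstanding, nxs):
    """Time before a receiving PE can acknowledge an incoming TLBI.

    A plain TLBI waits for every outstanding transaction; the nXS variant
    only waits for those with XS = 0.
    """
    waited = [t.remaining_ms for t in outstanding if not (nxs and t.xs == 1)]
    return max(waited, default=0.0)


outstanding = [
    Transaction("RAM", remaining_ms=0.0001, xs=0),
    Transaction("PCIe device A (removed)", remaining_ms=50.0, xs=1),
]
print("TLBI    :", tlbi_ack_delay_ms(outstanding, nxs=False), "ms")
print("TLBI nXS:", tlbi_ack_delay_ms(outstanding, nxs=True), "ms")
```

In this model the sender of a TLBI nXS is no longer held up by the load to the removed device, because that load was marked slow.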

64-byte atomics
- Growing trend for accelerators that support 64-byte atomic loads and stores
- New stores with return value:
  - ST64BV Xs, Xt, [Xn|SP]
  - ST64BV0 Xs, Xt, [Xn|SP]
    - Substitutes the bottom 32 bits for the value in ACCDATA_EL1
  - Xs acts as a return value
- New instructions without return value:
  - ST64B Xt, [Xn|SP]
  - LD64B Xt, [Xn|SP]
- Only permitted to non-cacheable addresses
[Diagram: an Arm CPU issues ST64BV (enqueue operation) through the interconnect and PCIe root complex to an accelerator work item queue. The return value reports whether the item got placed on the queue successfully, and the CPU checks for success]
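A rough model of the enqueue flow on this slide is sketched below. The payload is treated as one 512-bit integer, and the `WorkQueue` class and its status values (0 for accepted, 1 for rejected) are hypothetical; real status encodings are defined by the device and the architecture, not by this sketch.

```python
MASK32 = (1 << 32) - 1
MASK512 = (1 << 512) - 1


def st64bv0_payload(data, accdata):
    """Model the 64-byte value sent by ST64BV0: bits [31:0] come from ACCDATA_EL1."""
    return ((data & MASK512) & ~MASK32) | (accdata & MASK32)


class WorkQueue:
    """Hypothetical accelerator queue that accepts or rejects 64-byte items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def enqueue(self, payload):
        """Return 0 if the item was queued, 1 if it was not (made-up encoding)."""
        if len(self.items) >= self.capacity:
            return 1
        self.items.append(payload)
        return 0


queue = WorkQueue(capacity=1)
item = st64bv0_payload(data=(0xABCD << 32) | 0xFFFFFFFF, accdata=0x1234)
for attempt in range(2):
    status = queue.enqueue(item)  # plays the role of the value returned in Xs
    print(f"attempt {attempt}: status {status}")
```

The point of the return value is that software learns in the same operation whether the item was accepted, so it can retry without a separate read of the device.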

WFE/WFI with timeout
- A lot of potential usage of WFE/WFI is blocked by the unlimited time that the WFx might be asleep
- Armv8.7 introduces variants of WFE/WFI with a software-specified timeout
  - Specified in terms of time, not cycles
- WFET Xd / WFIT Xd
  - Xd holds a 64-bit value to compare with CNTVCT_EL0
    - Generates a local event to wake the WFxT when CNTVCT_EL0 >= Xd value
    - If WFxT is woken for any reason, the count will be discarded
    - Event only applies to the PE that executes the instruction
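Because Xd is compared against the counter, software passes an absolute deadline rather than a duration. A minimal sketch of that calculation follows; the function names and the example counter frequencies are assumptions, and counter wrap-around is ignored.

```python
NS_PER_SECOND = 1_000_000_000


def wfxt_deadline(now_ticks, timeout_ns, counter_hz):
    """Absolute counter value to place in Xd for a timeout of timeout_ns."""
    ticks = (timeout_ns * counter_hz + NS_PER_SECOND - 1) // NS_PER_SECOND  # round up
    return now_ticks + ticks


def timeout_event(counter_ticks, deadline):
    """The local wake-up event is generated once the counter reaches Xd."""
    return counter_ticks >= deadline


deadline = wfxt_deadline(now_ticks=5_000, timeout_ns=10_000, counter_hz=1_000_000_000)
print(deadline)                         # prints 15000
print(timeout_event(14_999, deadline))  # prints False
print(timeout_event(15_000, deadline))  # prints True
```

Expressing the timeout in counter ticks is what makes it a time rather than a cycle count: the result does not depend on the CPU clock frequency.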

Enhancing PAN
- PSTATE.PAN was introduced as part of Armv8.1
  - Causes a permission fault for a privileged data access to unprivileged data memory
  - Looks for AP[1] == 1 in the page tables
- Did not consider the case of privileged data access to user execute-only memory
  - AP[1] == 0 but UXN == 0
  - Siguza's blog - https://siguza.github.io/PAN/
- Linux had offered the ability to mark memory as User Execute-only to protect JITed code
  - Makes it harder to attack the JITed code, or to find gadgets within it
  - But that can be exploited to work around the PAN protection
- SCTLR_EL1/2 bit to extend PAN so it causes EL1/2 accesses to fault on pages that have EL0 instruction or data access
  - Permitted to be built in any version from v8.1
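The gap and its fix can be written out as a truth function. This sketch models only PAN's contribution to the permission check, not the full set of access permission rules, and the parameter name `epan` is just a label used here for the new SCTLR control bit.

```python
def pan_faults(ap1, uxn, pan, epan):
    """Does PAN cause a privileged data access to this page to fault?

    ap1  : AP[1] from the descriptor (1 = EL0 has data access)
    uxn  : UXN from the descriptor (0 = EL0 may execute)
    pan  : PSTATE.PAN
    epan : the new SCTLR_EL1/2 control that extends PAN (name assumed here)
    """
    if not pan:
        return False
    user_data = ap1 == 1
    user_exec = uxn == 0
    return user_data or (epan and user_exec)


# User execute-only page: AP[1] == 0 but UXN == 0
print(pan_faults(ap1=0, uxn=0, pan=True, epan=False))  # prints False
print(pan_faults(ap1=0, uxn=0, pan=True, epan=True))   # prints True
```

The first call shows the original behaviour, where the execute-only page slips past PAN; the second shows the extended check catching it.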

Future Architecture Technologies
Debug and Visibility
Enhancing software development on Arm

Enabling developers
Enhanced performance attribution
- What's my hot code?
  - Feed in to tools like AutoFDO
- Where in my call stack or call graph?
  - Better attribution of events to real code paths
Enhanced debug
- Low-impact consumption of call stacks
  - Stack unwinding in software is slow and introduces probe effects
- Lower-overhead logging
  - Capturing call stacks on interesting events, such as malloc/free

Enabling developers
[Overlay on the previous slide: an AddressSanitizer report showing the call stacks captured at the access, the free and the allocation]

ERROR: AddressSanitizer: heap-use-after-free on address 0x60700000dfb5
READ of size 1 at 0x60700000dfb5 thread T0
    #0 0x4007d7 in D(char*) /home/use_after_free.cpp:6
    #1 0x4007d7 in main /home/use_after_free.cpp:45
    #2 0x2b3e6817ac04 in __libc_start_main (/lib64/libc.so.6+0x21c04)
    #3 0x400826 (a.out+0x400826)

0x60700000dfb5 is located 5 bytes inside of 80-byte region
freed by thread T0 here:
    #0 0x2b3e669d3800 in __interceptor_free asan_malloc_linux.cc:45
    #1 0x4007a7 in C() /home/use_after_free.cpp:19
    #2 0x4007a7 in B() /home/use_after_free.cpp:35
    #3 0x4007a7 in A() /home/use_after_free.cpp:39
    #4 0x4007a7 in main /home/use_after_free.cpp:44
    #5 0x2b3e6817ac04 in __libc_start_main (/lib64/libc.so.6+0x21c04)

previously allocated by thread T0 here:
    #0 0x2b3e669d3b18 in __interceptor_malloc asan_malloc_linux.cc:62
    #1 0x40079c in C() /home/use_after_free.cpp:18
    #2 0x40079c in B() /home/use_after_free.cpp:35
    #3 0x40079c in A() /home/use_after_free.cpp:39
    #4 0x40079c in main /home/use_after_free.cpp:44
    #5 0x2b3e6817ac04 in __libc_start_main (/lib64/libc.so.6+0x21c04)

Call Stack Recorder Extension (CSRE)
Objective
- Capture the call stack in an easy to consume format
  - To copy on periodic or event-driven interrupts
  - To copy on malloc/free calls
Requirements
- Userspace or Kernel controllable
- Capture in main memory
  - Allows simple memcpy() of contents, and scales to capture the full call stack
- Low overheads while recording and context switching
- Low cost of reading the call stack on malloc/free
Call Stack Record Buffer
- Allocated by kernel or userspace
Call Stack Record
- Value of LR after BL, 8 bytes in size
- Written on each BL, updating the pointer
  - Faults reported synchronously
- Pointer decremented on each RET
Call Stack Recorder
- Separate controls provided at each of EL0/EL1/EL2, for fast switching of recorder on entry to kernel
- Usual traps for Virtualization
- Separate traps for EL0 register reads and writes
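The record buffer behaviour described on this slide can be mimicked with a toy software model. This is a sketch only: the class name, the little-endian record layout, the exception standing in for a synchronous fault, and the handling of a RET on an empty buffer are all assumptions made for illustration.

```python
class CallStackRecorder:
    """Toy model of a call stack record buffer holding 8-byte LR values."""

    RECORD_SIZE = 8

    def __init__(self, capacity_records):
        self.buffer = bytearray(capacity_records * self.RECORD_SIZE)
        self.pointer = 0  # byte offset of the next free record

    def on_bl(self, link_register):
        """BL: write the LR value as a new record and advance the pointer."""
        end = self.pointer + self.RECORD_SIZE
        if end > len(self.buffer):
            raise MemoryError("record buffer full")  # stands in for a synchronous fault
        self.buffer[self.pointer:end] = link_register.to_bytes(self.RECORD_SIZE, "little")
        self.pointer = end

    def on_ret(self):
        """RET: step the pointer back by one record (no-op if already empty)."""
        if self.pointer >= self.RECORD_SIZE:
            self.pointer -= self.RECORD_SIZE

    def snapshot(self):
        """Copy out the live records, outermost caller first (the memcpy() step)."""
        raw = bytes(self.buffer[:self.pointer])
        size = self.RECORD_SIZE
        return [int.from_bytes(raw[i:i + size], "little") for i in range(0, len(raw), size)]


recorder = CallStackRecorder(capacity_records=16)
for return_address in (0x1000, 0x2000, 0x3000):  # three nested calls
    recorder.on_bl(return_address)
recorder.on_ret()                                # innermost call returns
print([hex(lr) for lr in recorder.snapshot()])   # prints ['0x1000', '0x2000']
```

Reading the call stack is a single bounded copy of the live part of the buffer, which is the property that makes capture on malloc/free cheap compared with unwinding in software.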

Branch Record Buffer Extension (BRBE)
Objective
- Capture a recent sequence of branches in an easy-to-consume format
  - For statistical capture and feed in to FDO tooling
Requirements
- Low compute and memory overheads for capture/analysis
  - Program trace (ETM) can be expensive in multiple ways
  - Prefer uncompressed capture and no need for the program image
- Low overheads while recording
- Low cost of context switch and on reading
Branch Record
- Source VA of taken branch/exception
- Target VA of taken branch/exception
- Info: source/target valid, branch/exception type
Branch Record Buffer
- EL1 feature, for recording EL0 and/or EL1
- Accessible via system registers, with 3 per record
  - 96 registers, for up to 32 records
  - Banking system for more than 32 records
- Invalidate-buffer instruction
  - To prevent future reads revealing old data
- ISB needed before reading (but not TSB CSYNC)
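A small model helps show what "a recent sequence of branches" means in practice. The sketch below assumes newest-first ordering and a free-form `info` string; the class and field names are illustrative and do not reflect the real register layout.

```python
from collections import deque
from typing import NamedTuple

REGISTERS_PER_RECORD = 3  # source, target, info
RECORDS = 32


class BranchRecord(NamedTuple):
    source_va: int
    target_va: int
    info: str  # branch or exception type; this encoding is made up


class BranchRecordBuffer:
    """Keeps only the most recent taken branches, newest first."""

    def __init__(self, records=RECORDS):
        self.records = deque(maxlen=records)

    def on_taken_branch(self, source_va, target_va, info):
        # When the buffer is full, the oldest record falls off the far end.
        self.records.appendleft(BranchRecord(source_va, target_va, info))

    def invalidate(self):
        """Drop all records so later reads cannot reveal old data."""
        self.records.clear()

    def read(self):
        return list(self.records)


print(REGISTERS_PER_RECORD * RECORDS, "system registers")  # prints "96 system registers"

buffer = BranchRecordBuffer(records=2)
buffer.on_taken_branch(0x1000, 0x2000, "call")
buffer.on_taken_branch(0x2010, 0x3000, "call")
buffer.on_taken_branch(0x3020, 0x2014, "return")
print(len(buffer.read()))  # prints 2
```

Because each record already holds full source and target addresses, a profiler can consume it without decompression or access to the program image, which is the contrast the slide draws with ETM trace.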

Find out more

More /cpu-architecture/a-profile/exploration-tools
Arm Architecture Reference Manual expected Jan 2021

Martin Weidmann
Director Product Management
martin.weidmann@arm.com
Thank you

The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks
