Vector and SIMD Processors


Vector and SIMD Processors
Eric Welch & James Evans
Multiple Processor Systems
Spring 2013

Outline
• Introduction
• Traditional Vector Processors
  o History & Description
  o Advantages
  o Architectures
  o Components
  o Performance Optimizations
• Modern SIMD Processors
  o Introduction
  o Architectures
  o Use in signal and image processing

History of Vector Processors
• Early Work
  o Development started in the early 1960s at Westinghouse
  o Goal of the Solomon project was to substantially increase arithmetic performance by using many simple co-processors under the control of a single master CPU
  o Allowed a single algorithm to be applied to a large data set
• Supercomputers
  o Dominated supercomputer design through the 1970s into the 1990s
  o Cray platforms were the most notable vector supercomputers
    - Cray-1: introduced in 1976
    - Cray-2, Cray X-MP, Cray Y-MP
• Demise
  o In the late 1990s, conventional microprocessor designs improved drastically in price-to-performance, ending the dominance of vector supercomputers

Description of Vector Processors
• CPU that implements an instruction set operating on 1-D arrays, called vectors
• Vectors contain multiple data elements
• The number of data elements per vector is typically referred to as the vector length
• Both instructions and data are pipelined to reduce decoding time

[Figure: a scalar add (add r3, r1, r2) performs 1 operation on registers r1 and r2, while a vector add (add.vv v3, v1, v2) performs N operations across the vector length of v1 and v2]
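To make the scalar/vector contrast concrete, here is a minimal C sketch (the function and variable names are ours, not from the slides). The scalar loop issues one add, with its own fetch and decode, per element; a vector machine collapses the same work over one vector length into a single add.vv instruction.

    #include <stddef.h>

    /* Scalar version: one add instruction executed per element.
       A vector processor replaces the whole loop body, up to the
       vector length, with a single add.vv v3, v1, v2. */
    void vec_add(const double *a, const double *b, double *c, size_t n) {
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }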

Advantages of Vector Processors
• Require Lower Instruction Bandwidth
  o Reduced by fewer fetches and decodes
• Easier Addressing of Main Memory
  o Load/store units access memory with known patterns
• Elimination of Memory Wastage
  o Unlike cache access, every data element requested by the processor is actually used – no cache misses
  o Latency only occurs once per vector during pipelined loading
• Simplification of Control Hazards
  o Loop-related control hazards are eliminated
• Scalable Platform
  o Increase performance by using more hardware resources
• Reduced Code Size
  o A short, single instruction can describe N operations

Vector Processor Architectures
• Memory-to-Memory Architecture (Traditional)
  o For all vector operations, operands are fetched directly from main memory, then routed to the functional unit
  o Results are written back to main memory
  o Includes early vector machines through the mid-1980s: Advanced Scientific Computer (TI), Cyber 200 & ETA-10
  o Major reason for demise was the large startup time

Vector Processor Architectures (cont.)
• Register-to-Register Architecture (Modern)
  o All vector operations occur between vector registers
  o If necessary, operands are fetched from main memory into a set of vector registers (by the load-store unit)
  o Includes all vector machines since the late 1980s: Convex, Cray, Fujitsu, Hitachi, NEC
  o SIMD processors are based on this architecture

Components of Vector Processors
• Vector Registers
  o Typically 8-32 vector registers with 64-128 64-bit elements
  o Each contains a vector of double-precision numbers
  o Register size determines the maximum vector length
  o Each includes at least 2 read ports and 1 write port
• Vector Functional Units (FUs)
  o Fully pipelined, starting a new operation every cycle
  o Perform arithmetic and logic operations
  o Typically 4-8 different units
• Vector Load-Store Units (LSUs)
  o Move vectors between memory and registers
• Scalar Registers
  o Hold single elements for interconnecting FUs, LSUs, and registers

Performance Optimizations
• Increase Memory Bandwidth
  o Memory banks are used to reduce load/store latency
  o Allow multiple simultaneous outstanding memory requests
• Strip Mining
  o Generates code to handle vector operands whose size is less than or greater than the size of the vector registers (see the sketch below)
• Vector Chaining
  o Equivalent to data forwarding in vector processors
  o Results of one pipeline are fed into the operand registers of another pipeline
• Scatter and Gather
  o Retrieves data elements scattered throughout memory and packs them into sequential vectors in vector registers
  o Promotes data locality and reduces data pollution
• Multiple Parallel Lanes, or Pipes
  o Allows a vector operation to be performed in parallel on multiple elements of the vector
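As a concrete illustration of strip mining, here is a minimal C sketch, assuming a machine maximum vector length of 64 elements (MVL and the function name are our inventions, not from the slides). A vector compiler generates this structure automatically; the inner loop stands in for one vector instruction whose length is set through the vector length register.

    #include <stddef.h>

    #define MVL 64  /* assumed maximum vector length of the machine */

    /* Strip mining: process an arbitrary-length vector in chunks of
       at most MVL elements, the most one vector register can hold. */
    void add_strip_mined(const double *a, const double *b,
                         double *c, size_t n) {
        for (size_t i = 0; i < n; i += MVL) {
            size_t len = (n - i < MVL) ? (n - i) : MVL; /* set VLR */
            for (size_t j = 0; j < len; j++)            /* one vector op */
                c[i + j] = a[i + j] + b[i + j];
        }
    }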

Vector Chaining Example

Organization of Cray Supercomputer

Performance of Cray Supercomputers

Modern SIMD Introduction
• Single Instruction Multiple Data is part of Flynn's taxonomy (not MIMD as discussed in class)
• Performs the same instruction on multiple data points concurrently
• Takes advantage of data-level parallelism within an algorithm
• Commonly used in image and signal processing applications
  o Large numbers of samples or pixels calculated with the same instruction
• Disadvantages:
  o Larger registers and functional units use more chip area and power
  o Difficult to parallelize some algorithms (Amdahl's Law; see the formula below)
  o Parallelization requires explicit instructions from the programmer
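The Amdahl's Law limit mentioned above can be made precise (the formula is the standard one, not taken from the slides): if a fraction p of the execution time is vectorizable and that part runs s times faster, the overall speedup is

    \text{Speedup} = \frac{1}{(1 - p) + p/s}

so even with a 4-wide SIMD unit (s = 4), a program that is 75% vectorizable gains only 1 / (0.25 + 0.75/4) ≈ 2.29 overall.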

SIMD Processor Performance Trends

[Figure: processor performance trends, with series for SIMD alone and for SIMD and MIMD combined]

Modern SIMD Processors
• Most modern CPUs have SIMD architectures
  o Intel SSE and MMX, ARM NEON, MIPS MDMX
• These architectures include instruction set extensions which allow both sequential and parallel instructions to be executed
• Some architectures include separate SIMD coprocessors for handling these instructions
• ARM NEON
  o Included in Cortex-A8 and Cortex-A9 processors
• Intel SSE
  o Introduced in 1999 in the Pentium III processor
  o SSE4 currently used in the Core series

SIMD Processor Introduction
• A NEON example with 128-bit registers (four 32-bit elements each):

    VLD.32   Q1, 0x0     ; Q1 = [M[3], M[2], M[1], M[0]]
    VLD.32   Q2, 0x8     ; Q2 = [M[11], M[10], M[9], M[8]]
    VADD.U32 Q3, Q2, Q1  ; Q3 = [M[3]+M[11], M[2]+M[10], M[1]+M[9], M[0]+M[8]]
    VST.32   Q3, 0x10    ; 0x10 = M[0]+M[8], 0x11 = M[1]+M[9],
                         ; 0x12 = M[2]+M[10], 0x13 = M[3]+M[11]

• non-SIMD: 8 load instructions + 4 add instructions + 4 store instructions = 16 total instructions
• SIMD: 2 load instructions + 1 add instruction + 1 store instruction = 4 total instructions
• Possible speedup: 4
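The same four-element add can be written in C with NEON intrinsics rather than assembly; a minimal sketch (the function name add4 and the array m are ours, standing in for the slide's memory M[], with indices 0, 8, and 16 mirroring the addresses 0x0, 0x8, and 0x10):

    #include <arm_neon.h>
    #include <stdint.h>

    void add4(uint32_t *m) {
        uint32x4_t q1 = vld1q_u32(&m[0]);   /* VLD.32  Q1 <- M[0]..M[3]  */
        uint32x4_t q2 = vld1q_u32(&m[8]);   /* VLD.32  Q2 <- M[8]..M[11] */
        uint32x4_t q3 = vaddq_u32(q2, q1);  /* VADD.U32: 4 adds at once  */
        vst1q_u32(&m[16], q3);              /* VST.32: results to M[16]..M[19] */
    }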

ARM NEON SIMD Architecture
• 16 128-bit SIMD registers
• Separate sequential and SIMD processors
• Both have access to the same L2 cache but separate L1 caches
• Instructions are fetched in the ARM processor and sent to the NEON coprocessor

[Figure: ARM Cortex-A8 processor and NEON SIMD coprocessor]

Intel SSE SIMD Architecture
• Streaming SIMD Extensions
• 16 128-bit registers
• SIMD instructions executed along with sequential instructions
• Adds floating-point operations to Intel's MMX SIMD

[Figure: Intel Core architecture with combined SSE functional units]

Software Programming
• Intel and ARM both have vectorizing compilers which will compile code using SIMD instructions (see the example below)
• Many audio/video SIMD libraries are available
• To achieve the best performance, custom coding at the assembly level should be used
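As an example of code a vectorizing compiler handles well, here is a sketch (the loop and the flags are common illustrations, not from the slides): unit-stride accesses, no loop-carried dependences, and restrict-qualified pointers let a compiler such as gcc (with -O3 -ftree-vectorize, plus -mfpu=neon on a Cortex-A8) emit SIMD instructions for the loop body.

    /* A vectorization-friendly loop: restrict lets the compiler
       prove the arrays do not overlap, so it can process several
       floats per iteration with SIMD adds and multiplies. */
    void scale_add(float *restrict dst, const float *restrict src,
                   float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = k * src[i] + dst[i];
    }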

Specialized Instructions
• NEON SIMD
  o VZIP – interleaves two vectors
  o VMLA – multiply and accumulate
  o VRECPE – reciprocal estimate
  o VRSQRTE – reciprocal square root estimate
• Intel SSE4
  o PAVG – vector average
  o DPPS, DPPD – dot product
  o PREFETCHT0 – prefetch data into all cache levels
  o MONITOR, MWAIT – used to synchronize across threads
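As a small illustration of the multiply-accumulate instruction listed above, the NEON intrinsic vmlaq_f32 maps to VMLA and computes a + b * c in each of four lanes (the wrapper name mac4 is ours):

    #include <arm_neon.h>

    /* One step of a four-lane multiply-accumulate, e.g. an FIR
       filter tap: acc += coeff * sample in each 32-bit lane. */
    float32x4_t mac4(float32x4_t acc, float32x4_t coeff,
                     float32x4_t sample) {
        return vmlaq_f32(acc, coeff, sample);  /* VMLA */
    }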

Performance Impact of SIMD
• NEON on Cortex-A8 with the gcc compiler
• Two vectorization methods:
  o Intrinsics – using intrinsic functions to vectorize
  o Manual – vectorizing using instructions at the assembly level
• Applied to an image warping algorithm
  o Maps a pixel from a source to a destination image by an offset
  o Calculated on four pixels in parallel (max speedup 4)

Speedup of the different SIMD programming methods (normalized to the original code):
  Original: 1.000 | Intrinsics: 2.195 | Manual: 3.090

Performance Impact of SIMD
• SSE on Intel i7 and AltiVec on IBM Power 7 processors
• SIMD applied to Media Bench II, which contains multimedia applications for encoding/decoding media files (JPEG, H263, MPEG2)
• Tested three compilers with three methods:
  o Auto Vectorization – no changes to code
  o Transformations – code changes to help the compiler vectorize
  o Intrinsics – functions which compile to SIMD

Average speedup (and percentage of parallelizable loops) for Media Bench II:

  Method             | XLC           | ICC           | GCC
  Auto Vectorization | 1.66 (52.94%) | 1.84 (71.77%) | 1.58 (44.71%)
  Transformations    | 2.97          | 2.38          | -
  Intrinsics         | 3.15          | 2.45          | -

Conclusions
• Vector processors provided the early foundation for processing large amounts of data in parallel
• Vector processing techniques can still be found in video game consoles and graphics accelerators
• SIMD extensions are a descendant of vector processors and are included in most modern processors
• Challenging programming and Amdahl's Law are the main factors limiting the performance of SIMD

Questions?
