Vector Processors - Indiana University Bloomington


Vector Processors
Kavitha Chandrasekar, Sreesudhan Ramkumar

Agenda
– Why vector processors
– Basic vector architecture
– Vector execution time
– Vector load/store units and vector memory systems
– Vector length control
– Vector stride

Limitations of ILP
Exploiting more ILP requires:
– Increasing the instruction width (superscalar)
– Increasing the machine pipeline depth
– Hence, increasing the number of in-flight instructions
This in turn requires larger hardware structures, such as the ROB and the rename register files, and more logic to track dependences. Even a VLIW design needs more hardware and logic.

Vector Processor
Vector processors operate on linear arrays of numbers (vectors); each iteration of a loop becomes one element of a vector.
How they overcome the limitations of ILP:
– Dramatic reduction in fetch and decode bandwidth, since one instruction specifies a whole vector of operations.
– No data hazards between elements of the same vector; hazard-checking logic is needed only between two vector instructions.
– Heavily interleaved memory banks: because one access is initiated for the whole vector rather than for a single word, the high latency of initiating a main-memory access (compared with a cache access) is amortized.
– Since loops are reduced to vector instructions, the control hazards the loop branch would otherwise cause disappear.
– Good performance even on data with poor locality.

Basic Architecture
– Vector and scalar units
– Types:
  – Vector-register processors
  – Memory-memory vector processors
– Vector units:
  – Vector registers (each with 2 read ports and 1 write port)
  – Vector functional units (fully pipelined)
  – Vector load/store unit (fully pipelined)
  – Set of scalar registers

VMIPS vector instructions

MIPS vs VMIPS (DAXPY loop)
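The MIPS and VMIPS listings for this slide did not survive extraction. As a hedged substitute, the C sketch below contrasts a scalar DAXPY loop (Y = a × X + Y) with a version that emulates the VMIPS vector instruction sequence one instruction at a time, using 64-element arrays as stand-ins for vector registers. The function names and the returned instruction count are illustrative assumptions, not the slide's code.

```c
#include <stddef.h>

#define MVL 64  /* assumed maximum vector length (64 elements, as in VMIPS) */

/* Scalar (MIPS-style) DAXPY: every element costs two loads, a multiply,
   an add and a store, plus loop overhead. */
void daxpy_scalar(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Vector (VMIPS-style) DAXPY for exactly MVL elements. Each inner loop
   emulates ONE vector instruction. Returns the number of vector
   instructions "issued" (the scalar load of a is not counted). */
int daxpy_vector64(double a, const double *x, double *y) {
    double v1[MVL], v2[MVL], v3[MVL], v4[MVL];
    int issued = 0;

    /* LV V1,Rx: load X */
    for (int i = 0; i < MVL; i++) v1[i] = x[i];
    issued++;
    /* MULVS.D V2,V1,F0: multiply every element by scalar a */
    for (int i = 0; i < MVL; i++) v2[i] = a * v1[i];
    issued++;
    /* LV V3,Ry: load Y */
    for (int i = 0; i < MVL; i++) v3[i] = y[i];
    issued++;
    /* ADDV.D V4,V2,V3: element-wise add */
    for (int i = 0; i < MVL; i++) v4[i] = v2[i] + v3[i];
    issued++;
    /* SV Ry,V4: store the result back to Y */
    for (int i = 0; i < MVL; i++) y[i] = v4[i];
    issued++;

    return issued;
}
```

Both functions compute the same result; the point is that the vector version needs 5 vector instructions for all 64 elements, while the scalar loop issues several instructions per element.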

Execution time of vector instructions
Factors:
– Length of the operand vectors
– Structural hazards among operations
– Data dependences
Overhead:
– Initiating multiple vector instructions in a clock cycle
– Start-up overhead (more details soon)

Vector Execution time (contd.)
Terms:
– Convoy: a set of vector instructions that can begin execution together in one clock period
  – Instructions in a convoy must not contain any structural or data hazards
  – Analogous to placing scalar instructions in a VLIW instruction
  – One convoy must finish before the next begins
– Chime: the unit of time taken to execute one convoy
Hence a vector sequence of m convoys executes in m chimes; for a vector length of n, this takes approximately m × n clock cycles.

Example Convoy
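The convoy example on this slide was lost in extraction. The sketch below shows one way to apply the definition above: walk a vector instruction sequence in program order and start a new convoy whenever an instruction would create a structural hazard (its functional unit is already used in the current convoy) or a data hazard (it reads a register written in the current convoy, with no chaining). The instruction encoding, the single load/store unit, and the limit of 32 vector registers are hypothetical assumptions.

```c
#include <stdbool.h>

/* Hypothetical encoding: one functional unit per instruction, one
   destination register, up to two source registers (-1 = unused). */
enum unit { LOADSTORE, MULTIPLY, ADD };

struct vinstr {
    enum unit fu;
    int dst;      /* destination vector register, -1 for stores */
    int src[2];   /* source vector registers, -1 if unused */
};

/* Greedily packs instructions, in program order, into convoys and returns
   the number of convoys, i.e. the number of chimes. Assumes register
   numbers are below 32. */
int count_convoys(const struct vinstr *code, int len) {
    int convoys = 0;
    bool unit_busy[3] = {false};
    bool written[32] = {false};

    for (int k = 0; k < len; k++) {
        /* structural hazard: unit already used in this convoy */
        bool hazard = (convoys == 0) || unit_busy[code[k].fu];
        /* data hazard: reads a result produced in this convoy */
        for (int s = 0; s < 2; s++)
            if (code[k].src[s] >= 0 && written[code[k].src[s]])
                hazard = true;
        if (hazard) {  /* close the current convoy, open a new one */
            convoys++;
            for (int u = 0; u < 3; u++) unit_busy[u] = false;
            for (int r = 0; r < 32; r++) written[r] = false;
        }
        unit_busy[code[k].fu] = true;
        if (code[k].dst >= 0) written[code[k].dst] = true;
    }
    return convoys;
}
```

For the DAXPY sequence LV V1; MULVS.D V2,V1; LV V3; ADDV.D V4,V2,V3; SV V4 this packing gives 4 convoys (the second LV shares a convoy with MULVS.D), so with n = 64 the sequence takes about 4 × 64 = 256 clock cycles.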

Start-up overhead
– Start-up time: the time between issuing the instruction and the first result emerging from the pipeline
– Once the pipeline is full, a result is produced every cycle
– If vector lengths were infinite, the start-up overhead would be amortized; for finite vector lengths, however, it adds significant overhead

Startup overhead-example
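The numeric example for this slide is missing. A simple model consistent with the text: each convoy first pays its start-up latency (the largest among its instructions) and then delivers one result per cycle for n elements, and convoys run one after another. The latencies used below (load/store 12, multiply 7, add 6 cycles) are assumed values for illustration.

```c
/* Total cycles for a sequence of convoys that execute back to back.
   startup[c] = start-up latency of convoy c (largest among its
   instructions); n = vector length. Simplified model, not cycle-exact. */
long convoy_time(const int *startup, int convoys, long n) {
    long total = 0;
    for (int c = 0; c < convoys; c++)
        total += startup[c] + n;  /* fill the pipeline, then n results */
    return total;
}
```

With the four DAXPY convoys {LV}, {MULVS.D, LV}, {ADDV.D}, {SV} and the assumed latencies, the start-up costs are 12, 12, 6 and 12 cycles. For n = 64 the model gives 42 + 256 = 298 cycles, so start-up is about 14% of the time; for n = 1000 it drops to about 1%.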

Vector Load-Store Units and Vector Memory Systems
– Start-up time: the time to get the first word from memory into a register
– To produce one result every clock, multiple memory banks are used
– Why vector processors need multiple memory banks:
  – Many vector processors allow multiple loads and stores per clock cycle
  – Support for nonsequential access
  – Support for sharing of system memory by multiple processors

Example: Number of memory banks required
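The worked example is missing from the transcription. The usual reasoning: a bank that has just been accessed stays busy for ceil(bank busy time / CPU clock) clocks, and new accesses arrive every clock, so the number of banks must be at least accesses per clock × busy clocks. The sketch uses integer picoseconds to keep the ceiling exact; the input values in the usage note are hypothetical.

```c
/* Minimum number of memory banks so that no bank is accessed again while
   still busy. Times are in picoseconds to keep the arithmetic exact. */
long banks_needed(long accesses_per_clock, long bank_busy_ps, long clock_ps) {
    long busy_clocks = (bank_busy_ps + clock_ps - 1) / clock_ps;  /* ceiling */
    return accesses_per_clock * busy_clocks;
}
```

For example, with 32 processors each issuing 4 loads and 2 stores per clock (192 accesses per clock), a 2.167 ns clock and a 15 ns bank busy time, each bank is busy for 7 clocks and at least 192 × 7 = 1344 banks are needed.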

Real world issues
– The vector length in a program is not always fixed (say, at 64)
– Programs need to access nonadjacent elements in memory
Solutions:
– Vector length control
– Vector stride

Vector Length Control
Example: here the value of n might be known only at run time. If n is a parameter of a procedure, it can even change during execution.
– Hence a VLR (Vector Length Register) controls the length of a vector operation at run time
– MVL (Maximum Vector Length) is the maximum length of a vector operation (processor dependent)

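Vector Length Control (contd.)
Strip mining: when a vector is longer than MVL, the loop is split into strips of at most MVL elements, with the VLR set to each strip's length. Typically the first strip handles the remainder (n mod MVL elements) and every later strip handles MVL elements.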
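A sketch of strip mining applied to DAXPY, written in C so it runs anywhere. The variable vl plays the role of the VLR, and each pass of the inner loop stands in for one vector instruction sequence of length vl. The function name and the returned strip count are illustrative assumptions.

```c
#include <stddef.h>

/* Strip-mined DAXPY: the first strip covers n % mvl elements, every later
   strip covers exactly mvl elements. Returns the number of strips executed
   (the first one may be empty when n is a multiple of mvl). */
size_t daxpy_stripmined(size_t n, size_t mvl, double a,
                        const double *x, double *y) {
    size_t low = 0;           /* first element of the current strip */
    size_t vl = n % mvl;      /* VLR value for the odd-sized first strip */
    size_t strips = 0;

    for (size_t j = 0; j <= n / mvl; j++) {
        for (size_t i = low; i < low + vl; i++)  /* one vector op of length vl */
            y[i] = a * x[i] + y[i];
        low += vl;            /* advance to the next strip */
        vl = mvl;             /* all remaining strips are full length */
        strips++;
    }
    return strips;
}
```

For n = 150 and MVL = 64 the strips have lengths 22, 64 and 64. Putting the short strip first means every later iteration can simply reset the VLR to MVL.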

Execution time due to strip mining
Key factors that contribute to the running time of a strip-mined loop consisting of a sequence of convoys:
1. The number of convoys in the loop, which determines the number of chimes.
2. The overhead for each strip-mined sequence of convoys: the cost of executing the scalar code for strip-mining each block, plus the vector start-up cost for each convoy.
The total running time T_n for a vector sequence operating on a vector of length n is:
$$T_n = \left\lceil \frac{n}{MVL} \right\rceil \times (T_{loop} + T_{start}) + n \times T_{chime}$$

Example
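The example on this slide was lost. The sketch below evaluates T_n = ceil(n / MVL) × (T_loop + T_start) + n × T_chime directly. The inputs in the usage note are hypothetical: A = B × s with n = 200 as three dependent convoys (load, multiply, store), so T_chime = 3, an assumed T_start = 12 + 7 + 12 = 31 cycles, and an assumed loop overhead T_loop = 15 cycles.

```c
/* T_n = ceil(n / MVL) * (T_loop + T_start) + n * T_chime, in clock cycles. */
long vector_loop_time(long n, long mvl, long t_loop, long t_start, long t_chime) {
    long strips = (n + mvl - 1) / mvl;  /* ceil(n / MVL) */
    return strips * (t_loop + t_start) + n * t_chime;
}
```

With those assumptions, T_200 = 4 × (15 + 31) + 200 × 3 = 184 + 600 = 784 cycles, or about 3.9 cycles per element instead of the 3 the chime count alone would suggest.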

Vector Stride
– Stride handles access to nonadjacent elements in memory
– Example: a matrix-multiply loop can be strip-mined as vector multiplications, where each row of B is the first operand and each column of C is the second operand
– If the matrices are stored in column-major order, the elements of a row of B are not adjacent in memory
– The stride is the (uniform) distance between these nonadjacent elements; it allows nonsequential memory elements to be accessed as one vector
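A minimal C emulation of a strided vector load, similar in spirit to VMIPS's load-with-stride instruction: element i of the vector register comes from base[i × stride]. Here the stride is counted in elements rather than bytes, and the function name is a hypothetical choice.

```c
#include <stddef.h>

/* Emulated strided load: vreg[i] = base[i * stride] for i < vl.
   Stride is in elements (a hardware stride would usually be in bytes). */
void load_strided(double *vreg, const double *base, size_t stride, size_t vl) {
    for (size_t i = 0; i < vl; i++)
        vreg[i] = base[i * stride];
}
```

For a column-major matrix with `rows` rows, element B(i, j) lives at B[i + j × rows], so row i is loaded with base = &B[i] and stride = rows. For a 3 × 4 matrix, row 1 is gathered from B[1], B[4], B[7] and B[10].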

Vector processors - Contd.

Agenda
o Enhancing vector performance
o Measuring vector performance
o SSE instruction set and applications
o A case study: Intel Larrabee vector processor
o Pitfalls and fallacies

Enhancing Vector performance
General:
o Pipelining the individual operations of one instruction
o Reducing start-up latency
Addressing the following hazards effectively:
o Structural hazards
o Data hazards
o Control hazards

Pipelining & reducing Startup latency


Addressing Structural hazards - Multiple Lanes
Structural hazards are addressed by pipelining and by using multiple parallel lanes.

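Multiple Lanes - Contd.
Vector registers and floating-point units are localized within lanes: each lane holds its own portion of the vector register file and its own functional-unit pipelines.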
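A rough C model of multiple lanes, assuming the common interleaving in which element i is handled by lane i mod L. Each clock every lane retires one element, so once the pipelines are full a vector instruction of length vl takes about ceil(vl / L) clocks. The function name and the returned clock count are illustrative; start-up latency is ignored.

```c
#include <stddef.h>

/* Emulates a vector add on `lanes` parallel lanes and returns the number of
   clocks needed once the pipelines are full (start-up not included). */
size_t addv_multilane(double *vd, const double *va, const double *vb,
                      size_t vl, size_t lanes) {
    size_t clocks = 0;
    for (size_t base = 0; base < vl; base += lanes) {  /* one clock per group */
        for (size_t lane = 0; lane < lanes && base + lane < vl; lane++) {
            size_t i = base + lane;  /* element i lives in lane i % lanes */
            vd[i] = va[i] + vb[i];
        }
        clocks++;
    }
    return clocks;
}
```

Going from 1 lane to 4 lanes cuts a 64-element add from 64 to 16 clocks without changing the instruction stream. Because each lane only touches its own elements, the register file slices and adders need no cross-lane wiring for element-wise operations.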

Addressing Data hazards - Flexible chaining
o Chaining is similar to forwarding: it allows a vector operation to start as soon as the individual elements of its vector source operand become available
o Example:
Instruction            Start-up time (cycles)    Vector length (elements)
MULV.D V1, V2, V3      7                         64
ADDV.D V4, V1, V5      6                         64

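Flexible Chaining - Contd.
MULV.D V1, V2, V3
ADDV.D V4, V1, V5
(STM, STA = start-up time of the multiply and of the add; VLM, VLA = vector length of the multiply and of the add)
Unchained: STM + VLM + STA + VLA = 7 + 64 + 6 + 64 = 141 cycles
Chained: STM + STA + VLM/A = 7 + 6 + 64 = 77 cycles

Metric           Unchained          Chained
Time (cycles)    141                77
Cycles/result    141/64 ≈ 2.2       77/64 ≈ 1.2
FLOPs/clock      128/141 ≈ 0.9      128/77 ≈ 1.7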
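The two timing formulas above can be written as small C helpers. The function names are hypothetical; the model is the slide's own: without chaining the dependent instruction waits for the whole first vector, while with chaining only the start-up latencies add and the vector length is paid once.

```c
/* Dependent pair, e.g. MULV.D followed by ADDV.D that reads its result. */
long unchained_time(long st_first, long st_second, long vl) {
    /* the second instruction waits for the entire first vector */
    return (st_first + vl) + (st_second + vl);
}

long chained_time(long st_first, long st_second, long vl) {
    /* the second instruction starts as soon as the first element arrives */
    return st_first + st_second + vl;
}
```

With start-up times 7 and 6 and a vector length of 64 these give 141 and 77 cycles, matching the table. Since chaining hides one full vector length, its benefit grows with the vector length.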

Addressing Control hazards - Vector mask
o Loops containing conditional statements cannot run in vector mode directly
Solution:
o Convert the control dependence into a data dependence: execute the conditional test in vector mode and store its result in the vector mask register (VM)
o Run the dependent instructions in vector mode; they take effect only for elements whose vector mask bit is set

Vector mask - Contd.
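A C emulation of the vector-mask technique for a loop of the form `if (x[i] != 0) x[i] = x[i] - y[i]`. The vector length of 64 and the function name are assumptions. The first pass turns the IF into data (one mask bit per element); the second pass stands in for the masked subtract, which in hardware runs over all elements but writes only where the mask bit is set.

```c
#include <stdbool.h>
#include <stddef.h>

#define VL 64  /* assumed vector length */

/* Vector-mask version of:  for (i) if (x[i] != 0) x[i] = x[i] - y[i]; */
void masked_sub(double *x, const double *y) {
    bool vm[VL];
    /* step 1: vector compare sets the mask where x[i] != 0 */
    for (size_t i = 0; i < VL; i++)
        vm[i] = (x[i] != 0.0);
    /* step 2: subtract under mask; only enabled elements are written */
    for (size_t i = 0; i < VL; i++)
        if (vm[i])
            x[i] = x[i] - y[i];
}
```

Masked elements still occupy pipeline slots, so this approach costs the full vector length even when few mask bits are set.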

Improving Vector mask - Scatter & Gather method
o Step 1: Set the VM bits based on the control condition
o Step 2: CVI (Create Vector Index): create an index vector that points to the addresses of the elements whose VM bit is set
o Step 3: LVI (Load Vector Index, GATHER): load only the valid operands, using the index vector from step 2
o Step 4: Execute the arithmetic operation on the compressed vector
o Step 5: SVI (Store Vector Index, SCATTER): store the valid outputs back, using the index vector from step 2

Scatter & Gather - Contd.
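The five steps can be emulated in C for a loop such as `if (a[i] != 0) a[i] = a[i] + c[i]`. The function name and the MVL limit are hypothetical: the caller must keep n at most MVL, just as a hardware vector register limits the vector length.

```c
#include <stddef.h>

#define MVL 64  /* assumed register length; caller must ensure n <= MVL */

/* Scatter/gather version of:  for (i) if (a[i] != 0) a[i] = a[i] + c[i];
   Returns the length of the compressed vector. */
size_t gather_add_scatter(double *a, const double *c, size_t n) {
    size_t idx[MVL];
    double va[MVL], vc[MVL];
    size_t k = 0;

    for (size_t i = 0; i < n; i++)      /* steps 1-2: mask + CVI */
        if (a[i] != 0.0)
            idx[k++] = i;
    for (size_t j = 0; j < k; j++) {    /* step 3: LVI (gather) */
        va[j] = a[idx[j]];
        vc[j] = c[idx[j]];
    }
    for (size_t j = 0; j < k; j++)      /* step 4: add on k elements only */
        va[j] = va[j] + vc[j];
    for (size_t j = 0; j < k; j++)      /* step 5: SVI (scatter) */
        a[idx[j]] = va[j];
    return k;
}
```

The arithmetic now runs on k elements instead of n, at the cost of building the index vector and of the indexed memory accesses. That trade-off is why this method only pays off when few elements are selected.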

Comparison - Basic vector mask & Scatter - Gather
Conclusion: scatter and gather runs faster than the basic vector mask if fewer than one-quarter of the elements are nonzero.

Enhancing Vector performance - Summary
General:
o Pipelining the individual operations of one instruction
o Reducing start-up latency
Structural hazards:
o Multiple lanes
Data hazards:
o Flexible chaining
Control hazards:
o Basic vector mask
o Scatter & gather

Measuring Vector Performance - Total execution time
Scale for measuring performance: the total execution time of the vector loop, T_n
o Used to compare the performance of different instruction sequences on the same processor
o Unit: clock cycles
o n: vector length
o MVL: maximum vector length
o T_loop: loop overhead
o T_start: start-up overhead
o T_chime: number of chimes (convoys) in the loop

Measuring Vector Performance - MFLOPS
MFLOPS: millions of floating-point operations per second
o Used to compare the performance of two different processors
o R_n: the MFLOPS rate achieved on a vector of length n (unit: millions of operations per second)
o R_infinity: the MFLOPS rate for an infinite-length vector (theoretical peak performance)
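A sketch of how R_n and R_infinity follow from T_n. R_n is the floating-point work divided by the time; for R_infinity, T_n = ceil(n / MVL) × (T_loop + T_start) + n × T_chime means the cycles per element tend to T_chime + (T_loop + T_start) / MVL as n grows. The input values in the usage note (500 MHz clock, T_loop = 15, T_start = 49, T_chime = 3, MVL = 64, 2 FLOPs per element as in DAXPY) are hypothetical.

```c
/* R_n in MFLOPS: flops_per_elem * n operations done in t_n cycles at
   clock_mhz million cycles per second. */
double r_n(double flops_per_elem, long n, double clock_mhz, long t_n) {
    return flops_per_elem * n * clock_mhz / t_n;
}

/* R_infinity in MFLOPS: limit of R_n as n grows, using the asymptotic
   cycles per element T_chime + (T_loop + T_start) / MVL. */
double r_inf(double flops_per_elem, double clock_mhz,
             long mvl, long t_loop, long t_start, long t_chime) {
    double cycles_per_elem = t_chime + (double)(t_loop + t_start) / mvl;
    return flops_per_elem * clock_mhz / cycles_per_elem;
}
```

With the hypothetical values, the cycles per element approach 3 + 64/64 = 4, so R_infinity = 2 × 500 / 4 = 250 MFLOPS, well below the 2 × 500 / 3 ≈ 333 MFLOPS that ignoring loop and start-up overhead would suggest.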

SSE Instructions
o Streaming SIMD Extensions (SSE) is a SIMD instruction set extension to the x86 architecture
o SSE instructions are similar to vector instructions
o SSE originally added eight new 128-bit registers, known as XMM0 through XMM7
o Each register can pack together (the original SSE supported only the single-precision format; SSE2 added the double-precision and integer formats):
  o four 32-bit single-precision floating-point numbers, or
  o two 64-bit double-precision floating-point numbers, or
  o two 64-bit integers, or
  o four 32-bit integers, or
  o eight 16-bit short integers, or
  o sixteen 8-bit bytes or characters

SSE Instruction set & Applications
Sample instructions for floating-point operations:
o Scalar: ADDSS, SUBSS, MULSS, DIVSS
o Packed: ADDPS, SUBPS, MULPS, DIVPS
Example applications: multimedia, scientific and financial applications
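The difference between the scalar (SS) and packed (PS) forms can be seen with the standard SSE intrinsics from `<xmmintrin.h>`, which compilers typically translate to ADDSS and ADDPS. This sketch assumes an x86 target; the wrapper function names are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics, x86 only */

/* Packed add: all four single-precision lanes are added (ADDPS). */
void add_packed(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);             /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Scalar add: only the lowest lane is added (ADDSS); the upper three
   lanes are copied from the first operand. */
void add_scalar(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the packed version yields {11, 22, 33, 44}, while the scalar version yields {11, 2, 3, 4}.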

A Case study - Intel Larrabee
o Architecture: Larrabee is the code name of a many-core visual computing architecture, Intel's new approach to a GPU
o Considered to be a hybrid between a multi-core CPU and a GPU
o Combines the functions of a multi-core CPU with the functions of a GPU

Larrabee - The Big picture
o In-order execution: execution is more deterministic, so instruction and task scheduling can be done by the compiler
o Each Larrabee core contains a 512-bit vector processing unit, able to process 16 single-precision floating-point numbers at a time
o Uses an extended x86 instruction set with additional features, such as scatter/gather instructions and a mask register, designed to make using the vector unit easier and more efficient

Larrabee VPU Architecture
o The 16-wide vector ALU in each core executes integer, single-precision float and double-precision float instructions
o The width of 16 is a trade-off between increased computational density and the difficulty of achieving high utilization with wider units
o Supports swizzling and replication
o Mask register and index register operations

Larrabee Data types
o 32 512-bit vector registers and 8 16-bit vector mask registers
o Each vector register can hold either:
  o 16 32-bit elements (16 float32 or 16 int32 values), or
  o 8 64-bit elements (8 float64 or 8 int64 values)
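To illustrate how a 16-bit mask register pairs with a 16-wide vector of 32-bit elements, here is a plain C emulation in which bit i of the mask enables lane i and disabled lanes keep their old destination value. This models the idea only; the function is hypothetical and is not Larrabee's actual instruction encoding or intrinsic API.

```c
#include <stdint.h>

#define LANES 16  /* 512-bit register / 32-bit elements */

/* Hypothetical masked 16-wide add: dst[i] = a[i] + b[i] only where bit i
   of `mask` is set; other lanes of dst are left unchanged. */
void masked_add16(float dst[LANES], const float a[LANES],
                  const float b[LANES], uint16_t mask) {
    for (int i = 0; i < LANES; i++)
        if (mask & (1u << i))
            dst[i] = a[i] + b[i];
}
```

A mask of 0x00FF, for example, updates lanes 0 to 7 and leaves lanes 8 to 15 untouched; one mask bit per 32-bit lane is exactly why 16-bit mask registers suffice.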

Larrabee Instruction set
o Vector arithmetic, logic and shift
o Vector mask generation
o Vector load/store
o Swizzling
o Vector multiply-add and multiply-subtract instructions

Past, Present & Future of Vector processors
Past:
o Cray X1
o Earth Simulator
Present:
o Cray Jaguar
o Larrabee
Future: AVX (Advanced Vector Extensions)
o Sandy Bridge (Intel)
o Bulldozer (AMD)

Pitfalls and Fallacies
Pitfalls:
o Concentrating on peak performance and ignoring start-up overhead (on memory-memory vector architectures)
o Increasing vector performance without a comparable increase in scalar performance
Fallacy:
o You can get vector performance without providing memory bandwidth (by reusing vector registers)

Recap
o Why vector processors
o Basic vector architecture
o Vector execution time
o Vector load/store units and vector memory systems
o Vector length: VLR
o Vector stride
o Enhancing vector performance
o Measuring vector performance
o SSE instruction set and applications
o A case study: Intel Larrabee vector processor
o Pitfalls and fallacies


Thank you.
