CIS 665: GPU Programming And Architecture

Original slides by: Suresh Venkatasubramanian
Updates by: Joseph Kider

Administrivia
- Instructor: Joseph Kider (kiderj at seas.upenn.edu)
  - Office Hours: Tuesdays 3-5pm
  - Office Location: Moore 103, HMS Lab
- Meeting Time
  - Time: Tuesday and Thursday, 6-9pm
  - Location: Towne 319
- Website: http://www.seas.upenn.edu/ cis665/

Administrivia
- Course Description: This course will examine the architecture and capabilities of modern GPUs (graphics processing units). The GPU has grown in power in recent years, to the point where many computations can be performed faster on the GPU than on a traditional CPU. GPUs have also become programmable, allowing them to be used for a diverse set of applications far removed from traditional graphics settings.
  Topics covered will include architectural aspects of modern GPUs, with a special focus on their streaming parallel nature; writing programs on the GPU using high-level languages like Cg, CUDA, and SlabOps; and using the GPU for graphics and general-purpose applications in the areas of geometry modelling, physical simulation, scientific computing, and games.
  The course will be hands-on; there will be regular programming assignments, and students will also be expected to work on a project (most likely a larger programming endeavour, though more theoretical options will be considered).
- NOTE: Students will be expected to have a basic understanding of computer architecture, graphics, and OpenGL.
- Course Survey
- Time Change

Administrivia
- Grading: Grading for this course is as follows. There are no final or midterm exams. The grading will be based on homeworks, projects, and a presentation. Detailed allocations are tentatively as follows:
  - Homeworks (3-4) (75%): Each student will complete 3-4 programming assignments over the semester. These assignments start to fill the student's 'toolbox' of techniques and provide an understanding of the implementation of game rendering, animation, and general-purpose algorithms performed on GPUs. The last assignment will include an open area: a problem of your choice to solve.
  - Paper Presentation (20%): Each student will present one or two papers on a topic that interests them, chosen from a short list of important papers and subject areas relevant to the GPU literature.
  - Quizzes and class participation (5%): A small portion to check whether you're attending and paying attention in class.

Bonus Days
- Each of you gets three bonus days.
- A bonus day is a no-questions-asked one-day extension that can be used on most assignments.
- You can use multiple bonus days on the same thing.
- Intended to cover illnesses, interview visits, just needing more time, etc.
- I have a strict late policy: if it's not turned in on time (11:59pm on the due date), 25% is deducted; 2 days late, 50%; 3 days late, 75%; more than that, 99%. Use your bonus days.
- Always add a readme, and note in it if you use bonus days.

Administrivia
- Do I need a GPU? What do I need?
- Yes: an NVIDIA G8 series card or higher.
- No: the HMS Lab has computers with G80 architecture cards (by request/need and limited in number (3-5), first come, first served).

Course Goals
- Learn how to program massively parallel processors and achieve
  - high performance
  - functionality and maintainability
  - scalability across future generations
- Acquire the technical knowledge required to achieve the above goals
  - principles and patterns of parallel programming
  - processor architecture features and constraints
  - programming API, tools and techniques

Academic Honesty
- You are allowed and encouraged to discuss assignments with other students in the class. Getting verbal advice/help from people who've already taken the course is also fine.
- Any reference to assignments from previous terms or web postings is unacceptable.
- Any copying of non-trivial code is unacceptable.
  - Non-trivial = more than a line or so.
  - Includes reading someone else's code and then going off to write your own.

Academic Honesty (cont.)
- Penalties for academic dishonesty:
  - Zero on the assignment for the first occasion
  - Automatic failure of the course for repeat offenses

Text/Notes
1. No required text you have to buy.
2. GPU Gems 1-3 (1 and 2 online)
3. NVIDIA, NVIDIA CUDA Programming Guide, NVIDIA, 2007 (reference book)
4. T. Mattson, et al., “Patterns for Parallel Programming,” Addison Wesley, 2005 (recommended)
5. The Cg Tutorial (online)
6. Lecture notes will be posted at the web site

Tentative Schedule
- Review Syllabus
- Talk about paper topic choice

What is a GPU?
- GPU stands for Graphics Processing Unit.
- Simply put, it is the processor that resides on your graphics card.
- GPUs allow us to achieve the unprecedented graphics capabilities now available in games (ATI Demo).

Why Program on the GPU?
[Figure: GPU observed GFLOPS vs. CPU theoretical peak GFLOPS, 2005-2006. From a 2006 GDC presentation by NVIDIA.]

Why Massively Parallel Processor
- A quiet revolution and potential build-up
  - Calculation: 367 GFLOPS vs. 32 GFLOPS (GPU vs. CPU)
  - Memory Bandwidth: 86.4 GB/s vs. 8.4 GB/s (GPU vs. CPU)
  - Until the last couple of years, programmed through graphics APIs
  - GPU in every PC and workstation: massive volume and potential impact

How has this come about?
- Game design has become ever more sophisticated.
- Fast GPUs are used to implement complex shader and rendering operations for real-time effects.
- In turn, the demand for speed has led to ever-increasing innovation in card design.
- The NV40 architecture has 225 million transistors, compared to about 175 million for the Pentium 4 EE 3.2 GHz chip.
- The gaming industry has overtaken the defense, finance, oil and healthcare industries as the main driving factor for high-performance processors.

GPU: Fast co-processor?
- GPU speed is increasing at cubed-Moore's Law.
- This is a consequence of the data-parallel streaming aspects of the GPU.
- GPUs are cheap! Put a couple together, and you can get a supercomputer.
- So can we use the GPU for general-purpose computing?
NYT May 26, 2003: TECHNOLOGY; From PlayStation to Supercomputer for $50,000: National Center for Supercomputing Applications at University of Illinois at Urbana-Champaign builds supercomputer using 70 individual Sony PlayStation 2 machines; project required no hardware engineering other than mounting PlayStations in a rack and connecting them with a high-speed network switch.

Future Apps Reflect a Concurrent World
- Exciting applications in the future mass computing market have traditionally been considered “supercomputing applications”
  - Molecular dynamics simulation, video and audio coding and manipulation, 3D imaging and visualization, consumer game physics, and virtual reality products
  - These “super-apps” represent and model the physical, concurrent world
- Various granularities of parallelism exist, but...
  - the programming model must not hinder parallel implementation
  - data delivery needs careful management

Yes! Wealth of applications
Data Analysis, Motion Planning, Particle Systems, Voronoi Diagrams, Force-field Simulation, Molecular Dynamics, Graph Drawing, Geometric Optimization, Physical Simulation, Matrix Multiplication, Database Queries, Conjugate Gradient, Sorting and Searching, Range Queries, Image Processing, Signal Processing, Finance, Optimization, Planning, Radar, Sonar, Oil Exploration
... and graphics too!!

When does “GPU fast co-processor” work?
- Real-time visualization of complex phenomena
- The GPU (like a fast parallel processor) can simulate physical processes like fluid flow, n-body systems, and molecular dynamics.
- In general: massively parallel tasks

When does “GPU fast co-processor” work?
- Interactive data analysis
- For effective visualization of data, interactivity is key.

When does “GPU fast co-processor” work?
- Rendering complex scenes (like the ATI demo)
- Procedural shaders can offload much of the expensive rendering work to the GPU. Still not the Holy Grail of “80 million triangles at 30 frames/sec*”, but it helps.
  * Alvy Ray Smith, Pixar.
- Note: The GeForce 8800 has an effective 36.8 billion texel/second fill rate.

General-purpose Programming on the GPU: What do you need?
- In the abstract:
  - A model of the processor
  - A high-level language
- In practical terms:
  - Programming tools (compiler/debugger/optimizer)
  - Benchmarking

Follow the language
- Some GPU architecture details are hidden, unlike CPUs (less so now than previously).
- OpenGL (or DirectX) provides a state machine that represents the rendering pipeline.
- Early GPU programs used properties of the state machine to “program” the GPU.
- Recent GPUs provide high-level programming languages to work with the GPU as a general-purpose processor.

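Programming using OpenGL state
- One “programmed” in OpenGL using state variables like blend functions, depth tests and stencil tests:

    glEnable(GL_BLEND);               /* turn on framebuffer blending */
    glBlendEquationEXT(GL_MIN_EXT);   /* result = min(source, destination) */
    glBlendFunc(GL_ONE, GL_ONE);      /* blend factors (not used by MIN) */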
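As a rough illustration of what this blend state does, the C sketch below emulates a single-channel framebuffer on the CPU. With the MIN blend equation, each incoming fragment value is combined with the stored value as dst = min(src, dst) (the factors set by glBlendFunc are not applied for GL_MIN_EXT), so drawing several full-screen passes leaves the per-pixel minimum. The function names, the 1-D "framebuffer", and the nearest-site distance example are hypothetical choices for illustration, not code from the course.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical CPU stand-in for one pass drawn with
   glBlendEquationEXT(GL_MIN_EXT): each pixel keeps min(src, dst). */
void blend_min_pass(float *framebuffer, const float *fragments, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (fragments[i] < framebuffer[i]) {
            framebuffer[i] = fragments[i];
        }
    }
}

/* Illustrative use: a 1-D distance field. Each "pass" draws |x - site|
   for one site at every pixel; MIN blending keeps the distance to the
   nearest site. `scratch` must hold `width` floats. */
void distance_field_1d(float *framebuffer, float *scratch, size_t width,
                       const float *sites, size_t num_sites)
{
    assert(framebuffer != NULL && scratch != NULL && width > 0);
    for (size_t x = 0; x < width; ++x) {
        framebuffer[x] = FLT_MAX;  /* "clear" to a value larger than any distance */
    }
    for (size_t s = 0; s < num_sites; ++s) {
        for (size_t x = 0; x < width; ++x) {
            float d = (float)x - sites[s];
            scratch[x] = d < 0.0f ? -d : d;  /* fragments emitted by this pass */
        }
        blend_min_pass(framebuffer, scratch, width);
    }
}
```

With sites at 1 and 6 on an 8-pixel row, the framebuffer ends up holding 1, 0, 1, 2, 2, 1, 0, 1: no pass "knows" about the others, yet the fixed-function blend state combines them.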

Follow the language
- As the rendering pipeline became more complex, new functionality was added to the state machine (via extensions).
- With the introduction of vertex and fragment programs, full programmability was introduced to the pipeline.

Follow the language
- With fragment programs, one could write general programs at each fragment:

    MUL tmp, fragment.texcoord[0], size.x;   # tmp = texcoord * size.x
    FLR intg, tmp;                           # integer part (floor)
    FRC frac, tmp;                           # fractional part
    SUB frac_1, frac, 1.0;                   # frac - 1

- But writing (pseudo-)assembly code is clumsy and error-prone.
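To make the four instructions concrete, here is a hypothetical C rendering of what they compute for one fragment. It assumes `size` is a program parameter whose x component is broadcast to all four lanes (as the `.x` swizzle does) and that each instruction works component-wise on a 4-vector, as ARB fragment program instructions do; the struct and function names are invented for illustration.

```c
#include <assert.h>

typedef struct { float v[4]; } vec4;

/* Results of the instructions for one fragment (names are hypothetical). */
typedef struct {
    vec4 intg;    /* FLR intg, tmp          -> floor(tmp)       */
    vec4 frac;    /* FRC frac, tmp          -> tmp - floor(tmp) */
    vec4 frac_1;  /* SUB frac_1, frac, 1.0  -> frac - 1.0       */
} frag_out;

/* floor() without libm; valid only for moderate magnitudes. */
static float floor_small(float x)
{
    assert(x > -1.0e9f && x < 1.0e9f);
    float t = (float)(long)x;        /* truncate toward zero */
    return (t > x) ? t - 1.0f : t;   /* step down for negative non-integers */
}

/* Component-wise evaluation of:
     MUL tmp, fragment.texcoord[0], size.x;
     FLR intg, tmp;
     FRC frac, tmp;
     SUB frac_1, frac, 1.0;                  */
frag_out run_fragment(vec4 texcoord, float size_x)
{
    frag_out out;
    for (int i = 0; i < 4; ++i) {
        float tmp = texcoord.v[i] * size_x;   /* MUL: scalar size.x is broadcast */
        out.intg.v[i] = floor_small(tmp);
        out.frac.v[i] = tmp - out.intg.v[i];
        out.frac_1.v[i] = out.frac.v[i] - 1.0f;
    }
    return out;
}
```

For texcoord (-0.25, 0.5, 0.75, 1.0) and size.x = 10, intg is (-3, 5, 7, 10) and frac is (0.5, 0, 0.5, 0). Even this tiny computation needs four opcodes and hand-managed temporaries, which is the clumsiness the slide refers to.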

Follow the language
- Finally, with the advent of high-level languages like HLSL, Cg, GLSL, CUDA, CTM, BrookGPU, and Sh, general-purpose programming has become much easier:

    float4 main(in float2 texcoords : TEXCOORD0,
                in float2 wpos : WPOS,
                uniform samplerRECT pbuffer,
                uniform sampler2D nvlogo) : COLOR
    {
        float4 currentColor = texRECT(pbuffer, wpos);
        float4 logo = tex2D(nvlogo, texcoords);
        return currentColor + (logo * 0.0003);
    }

A Unifying Theme: Streaming
All the graphics language models share basic properties:
1. They view the frame buffer as an array of “pixel computers”, with the same program running at each pixel (SIMD).
2. Fragments are streamed to each pixel computer.
3. The pixel programs have limited state.

What is stream programming?
- A stream is a sequence of data (could be numbers, colors, RGBA vectors, ...).
- A kernel is a (fragment) program that runs on each element of a stream, generating an output stream (pixel buffer).
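A minimal CPU sketch of these two definitions, written in C: the stream is an array of RGBA values, the kernel is a pure function from one element to one element, and running the kernel is a loop over the stream. On a GPU the loop disappears and each element is handled by its own “pixel computer”; the loop here only stands in for that. The equal-weight grayscale kernel and all names are hypothetical examples.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float r, g, b, a; } rgba;

/* A kernel maps one input element to one output element and keeps no
   state between elements, so all elements are independent. */
typedef rgba (*kernel_fn)(rgba in);

/* One pass: apply `kernel` to every element of the input stream.
   Input and output are distinct buffers, mirroring the usual GPU rule
   that a pass should not sample the texture it is rendering into. */
void run_kernel(kernel_fn kernel, const rgba *in, rgba *out, size_t n)
{
    assert(kernel != NULL && (const rgba *)out != in);
    for (size_t i = 0; i < n; ++i) {
        out[i] = kernel(in[i]);
    }
}

/* Example kernel: equal-weight grayscale, alpha passed through unchanged. */
rgba gray_kernel(rgba c)
{
    float y = (c.r + c.g + c.b) / 3.0f;
    rgba result = { y, y, y, c.a };
    return result;
}
```

Because gray_kernel reads only its own element, the order of the loop does not matter; that independence is what lets the hardware run every element at once.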

Stream Program → GPU
- Kernel → vertex/fragment program
- Input stream → stream of fragments or vertices or texture data
- Output stream → frame buffer or pixel buffer or texture
- Multiple kernels → multi-pass rendering sequence on the GPU

To program the GPU, one mustthink of it as a (parallel) streamprocessor.

What is the cost of a program?
- Each kernel represents one pass of a multi-pass computation on the GPU.
- Readbacks from the GPU to main memory are expensive, and so is transferring data to the GPU.
- Thus, the number of kernels in a stream program is one measure of how expensive a computation is.
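To make “number of kernels” concrete, the C sketch below sums n values the way a multi-pass GPU computation commonly does: each pass is a kernel that halves the stream by adding pairs, and the passes ping-pong between two buffers standing in for two textures. Summing n = 2^k values takes k = log2(n) passes, so the pass count grows with input size. It assumes a power-of-two n, and the function name is hypothetical; this is an illustration, not course code.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n values (n a power of two) with repeated halving passes.
   `a` holds the input and is overwritten; `b` must hold at least n/2
   floats (and at least 1). The number of passes is stored in *passes. */
float multipass_sum(float *a, float *b, size_t n, int *passes)
{
    assert(n > 0 && (n & (n - 1)) == 0);  /* power of two */
    float *src = a, *dst = b;
    int count = 0;
    while (n > 1) {
        n /= 2;
        /* One kernel: output element i gathers input elements 2i and 2i+1. */
        for (size_t i = 0; i < n; ++i) {
            dst[i] = src[2 * i] + src[2 * i + 1];
        }
        /* Swap roles, as a multi-pass computation swaps render targets. */
        float *tmp = src;
        src = dst;
        dst = tmp;
        ++count;
    }
    *passes = count;
    return src[0];
}
```

For the eight values 1 through 8 it returns 36 after 3 passes (3, 7, 11, 15, then 10, 26, then 36). A CPU would do this in one loop; on the GPU each of those passes is a separate kernel launch, which is why pass count is treated as a cost.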

What is the cost of a program?
- Each kernel is a geometry/vertex/fragment or CUDA program. The more complex the program, the longer a fragment takes to move through the rendering pipeline.
- Kernel complexity is another measure of cost in a stream program.

What is the cost of a program?
- Texture or memory accesses on the GPU can be expensive if the accesses are nonlocal.
- The number of memory accesses is also a measure of complexity in a stream program.

What is the cost of a program?
- Conditional statements do not work well on streaming processors.
- Fragmentation of code is also a measure of complexity in a stream program.
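One way to see why conditionals hurt is a simplified model of lockstep (SIMD) execution: elements are processed in groups, and if elements within a group disagree on a condition, the whole group runs both branch bodies and a per-element mask keeps the right result. The C sketch below counts that modeled cost. The group width LANES and the per-branch costs are made-up parameters chosen for illustration; actual branch granularity varies across GPU generations.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4       /* hypothetical SIMD group width */
#define COST_THEN 10  /* made-up cost of the "then" body */
#define COST_ELSE 2   /* made-up cost of the "else" body */

/* Computes out[i] = in[i] > 0 ? 2 * in[i] : -in[i] as a lockstep group would:
   every body needed by at least one lane in a group is executed by the whole
   group, and a per-lane mask selects which result is kept.
   Returns the total modeled cost. */
int simd_branch(const float *in, float *out, size_t n)
{
    assert(n % LANES == 0);
    int cost = 0;
    for (size_t g = 0; g < n; g += LANES) {
        int mask[LANES];
        int any_then = 0, any_else = 0;
        for (int l = 0; l < LANES; ++l) {
            mask[l] = in[g + l] > 0.0f;
            any_then |= mask[l];
            any_else |= !mask[l];
        }
        if (any_then) cost += COST_THEN;
        if (any_else) cost += COST_ELSE;
        for (int l = 0; l < LANES; ++l) {
            float then_value = 2.0f * in[g + l];
            float else_value = -in[g + l];
            out[g + l] = mask[l] ? then_value : else_value;
        }
    }
    return cost;
}
```

With LANES = 4, the input {1, 2, 3, 4, -1, 2, -3, 4} costs 22: the first group runs only the "then" body, while the second group diverges and pays for both. Eight positive inputs cost 20. Under this model, the more the code paths are fragmented within a group, the more work is wasted.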

The GPGPU Challenge
- Be cognizant of the stream nature of the GPU.
- Design algorithms that minimize cost under streaming measures of complexity rather than traditional measures.
- Implement these algorithms efficiently on the GPU, keeping in mind the limited resources (memory, program length) and various bottlenecks (conditionals) on the card.

What will this course cover?

1. Stream Programming Principles
- OpenGL, the fixed-function pipeline and the programmable pipeline
- The principles of stream hardware
- Viewing the GPU as a realization of a stream programming abstraction
- How do we program with streams?
- How should one think in terms of streams?

2. Basic Shaders
- How do we compute complex effects found in today's games?
  - Parallax Mapping
  - Reflections
  - Skin and Hair
  - And more...

3. Special Effects
- How do we interact?
  - Particle Systems
  - Deformable Mesh
  - Morphing
  - Animation

4. GPGPU
- How do we use the GPU as a fast co-processor?
  - GPGPU languages such as CUDA
  - High-performance computing
  - Numerical methods and linear algebra:
    - Inner products
    - Matrix-vector operations
    - Matrix-matrix operations
    - Sorting
    - Fluid simulations
    - Fast Fourier transforms
    - Graph algorithms
    - And more...
  - At what point does the GPU become faster than the CPU for matrix operations? For other operations?
- (This will be about half the course.)

5. Optimizations
- How do we use the full potential of the GPU?
- What makes the GPU fast?
- What tools are there to analyze the performance of our algorithms?

6. Physics, the Future of the GPU?
- Physical Simulation
- Collision Detection

7. Artificial Intelligence (The Next Future of the GPU)
- Massive Simulations
- Flocking Algorithms
- Conjugate Gradient

What we want you to get out of this course!
1. Understanding of the GPU as a graphics pipeline
2. Understanding of the GPU as a high-performance compute device
3. Understanding of GPU architectures
4. Programming in Cg and CUDA
5. Exposure to many core graphics effects performed on GPUs
6. Exposure to many core parallel algorithms performed on GPUs

Main Languages we will use
- OpenGL and Cg
  - Graphics languages for understanding visual effects.
  - You should already have an understanding of OpenGL; please see myself or Joe after class if this is not the case.
  - We will NOT be using DirectX or HLSL because of the high learning curve.
- CUDA
  - GPGPU language. This will be used for any general-purpose algorithms. (Only works on NVIDIA cards.)
  - We will NOT be using CTM because it is a lower-level language than CUDA.

Class URLs
- Blackboard site:
  - Check here for assignments and announcements.
- Course Website: www.seas.upenn.edu/ cis665
  - Check here for lectures and related articles.
  - READ the related articles! (They're good.)
