Asynchronous Peer-to-Peer Device Communication - NVIDIA

13th ANNUAL WORKSHOP 2017
Asynchronous Peer-to-Peer Device Communication
Feras Daoud, Leon Romanovsky
[ 28 March, 2017 ]

Agenda
- Peer-to-Peer communication
- PeerDirect technology
- PeerDirect and PeerDirect Async
- Performance
- Upstream work

Peer-to-Peer Communication

Peer-to-Peer Communication
"Direct data transfer between PCI-E devices without the need to use main memory as a temporary storage or use of the CPU for moving data."
Main advantages:
- Allow direct data transfer between devices
- Control the peers directly from other peer devices
- Accelerate transfers between different PCI-E devices
- Improve latency, system throughput, CPU utilization, and energy usage
- Cut out the middleman

PeerDirect Technology

Timeline

Prior To GPUDirect
- GPUs use driver-allocated pinned memory buffers for transfers
- RDMA drivers use pinned buffers for zero-copy, kernel-bypass communication
- It was impossible for RDMA drivers to pin memory allocated by the GPU
- Userspace needed to copy data between the GPU driver's system memory region and the RDMA memory region
[Diagram: GPU data staged through CPU memory via the chipset]
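To make the staging concrete, here is a minimal sketch of that two-copy path. It assumes the QP, the work request (pointing at the RDMA-registered buffer's MR), and both host buffers already exist; all names are placeholders for this example, not part of the original slides.

    /* Sketch of the pre-GPUDirect path: data moves GPU -> CUDA host buffer ->
     * RDMA-registered buffer before it can be sent. qp and wr (whose sg_list
     * points at rdma_buf's MR) are assumed to be set up elsewhere. */
    #include <cuda_runtime.h>
    #include <infiniband/verbs.h>
    #include <string.h>

    void send_from_gpu(struct ibv_qp *qp, struct ibv_send_wr *wr,
                       const void *gpu_buf, void *cuda_host_buf,
                       void *rdma_buf, size_t len)
    {
        /* 1. GPU memory -> host buffer owned by the GPU driver. */
        cudaMemcpy(cuda_host_buf, gpu_buf, len, cudaMemcpyDeviceToHost);

        /* 2. Host buffer -> RDMA-registered region (the redundant CPU copy). */
        memcpy(rdma_buf, cuda_host_buf, len);

        /* 3. Send from the RDMA region. */
        struct ibv_send_wr *bad_wr;
        ibv_post_send(qp, wr, &bad_wr);
    }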

GPUDirect/GPUDirect P2P
- GPU and RDMA device share the same "pinned" buffers
- GPU copies the data to system memory
- RDMA device sends it from there
Advantages:
- Eliminate the need to make a redundant copy in CUDA host memory
- Eliminate CPU bandwidth and latency bottlenecks
[Diagram: CPU memory shared between CPU, chipset, GPU memory, and GPU]
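As an illustration of the shared pinned buffer, the sketch below registers one host buffer with both CUDA and the RDMA stack, so only the single GPU-to-host copy remains. pd, qp and wr are assumed to exist elsewhere, and the helper names are invented for this example.

    /* Sketch of the GPUDirect 1.0 path: one pinned host buffer is visible to
     * both CUDA and the HCA, so the host-to-host copy disappears. */
    #include <cuda_runtime.h>
    #include <infiniband/verbs.h>
    #include <stdlib.h>

    struct ibv_mr *setup_shared_buf(struct ibv_pd *pd, void **buf, size_t len)
    {
        *buf = malloc(len);
        cudaHostRegister(*buf, len, cudaHostRegisterDefault);     /* pin for CUDA DMA */
        return ibv_reg_mr(pd, *buf, len, IBV_ACCESS_LOCAL_WRITE); /* pin for RDMA     */
    }

    void send_from_gpu(struct ibv_qp *qp, struct ibv_send_wr *wr,
                       const void *gpu_buf, void *shared_buf, size_t len)
    {
        /* Single copy: GPU memory -> shared pinned buffer. */
        cudaMemcpy(shared_buf, gpu_buf, len, cudaMemcpyDeviceToHost);

        /* Send directly from the shared buffer. */
        struct ibv_send_wr *bad_wr;
        ibv_post_send(qp, wr, &bad_wr);
    }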

GPUDirect RDMA/PeerDirect
- CPU synchronizes between GPU tasks and data transfer
- HCA directly accesses GPU memory
Advantages:
- Direct path for data exchange
- Eliminate the need to make a redundant copy in host memory
[Diagram: HCA reads GPU memory directly through the chipset]

GPUDirect RDMA/PeerDirect: CPU Utilization
[Timeline diagram: GPU, CPU, HCA]

    while (fin) {
        gpu_kernel<<<..., stream>>>(buf);   /* GPU compute on the stream               */
        cudaStreamSynchronize(stream);      /* CPU waits for the kernel to finish      */
        ibv_post_send(buf);                 /* CPU posts the send of buf (abbreviated) */
        ibv_poll_cq(cqe);                   /* CPU polls for the completion (abbreviated) */
    }

GPUDirect Async/PeerDirect Async
- Control the HCA from the GPU
- Performance
  - Enable batching of multiple GPU and communication tasks
  - Reduce latency
  - Reduce CPU utilization
- Lightweight CPU
  - Less power
- CPU prepares and queues compute and communication tasks on the GPU
- GPU triggers communication on the HCA
- HCA directly accesses GPU memory
[Diagram: CPU queues work; GPU triggers the HCA; HCA accesses GPU memory]

GPUDirect Async/PeerDirect Async
[Timeline diagram: GPU, CPU, HCA]

    while (fin) {
        gpu_kernel<<<..., stream>>>(buf);       /* queue compute on the stream           */
        gds_stream_queue_send(stream, qp, buf); /* queue the send; GPU triggers the HCA  */
        gds_stream_wait_cq(stream, cqe);        /* queue the wait for the completion     */
    }

CPU is free
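Under the hood, the GPU-side trigger boils down to memory writes and waits queued on the CUDA stream. The sketch below illustrates that mechanism with the CUDA driver's stream memory operations (available on platforms that support them); it is not the PeerDirect Async API itself, and db_ptr, db_value and cqe_flag_ptr are placeholders for addresses and values that, in the real flow, come from the verbs "bytecode".

    /* Illustration only: GPU-triggered doorbell write and completion wait,
     * expressed with CUDA stream memory operations. */
    #include <cuda.h>

    void queue_send_and_wait(CUstream stream,
                             CUdeviceptr db_ptr,       /* mapped HCA doorbell (placeholder)  */
                             unsigned int db_value,    /* doorbell value for the posted WQE  */
                             CUdeviceptr cqe_flag_ptr) /* mapped completion flag (placeholder) */
    {
        /* The GPU rings the doorbell once the preceding kernel on `stream` finishes. */
        cuStreamWriteValue32(stream, db_ptr, db_value, CU_STREAM_WRITE_VALUE_DEFAULT);

        /* The stream then waits until the HCA reports the completion. */
        cuStreamWaitValue32(stream, cqe_flag_ptr, 1, CU_STREAM_WAIT_VALUE_GEQ);
    }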

Peer-to-Peer Evolution
GPUDirect:
- Eliminate the need to make a redundant copy in CUDA host memory
- Eliminate CPU bandwidth and latency bottlenecks
PeerDirect:
- Eliminate the need to make a redundant copy in host memory
- Direct path for data exchange
PeerDirect Sync:
- Control the RDMA device from the GPU
- Reduce CPU utilization

PeerDirect

PeerDirect: How Does It Work?
- Allows ibv_reg_mr() to register peer memory
- Peer devices implement a new kernel module - io_peer_mem
  - It registers with the RDMA subsystem via ib_register_peer_memory_client()
- io_peer_mem implements the following callbacks:
  - acquire() - detects whether a virtual memory range belongs to the peer
  - get_pages() - asks the peer for the physical memory addresses matching the memory region
  - dma_map() - requests the bus addresses for the memory region
  - Matching callbacks for release: dma_unmap(), put_pages() and release()
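To visualize the shape of such a client, here is a self-contained, deliberately simplified callback table. It mirrors the roles listed above but is hypothetical: the real structure and prototypes are defined by peer_mem.h in MLNX_OFED (and implemented by modules such as nv_peer_mem), and they differ from these.

    /* Hypothetical, simplified callback table; the real interface lives in
     * MLNX_OFED's peer_mem.h and its prototypes differ from these. */
    #include <stddef.h>
    #include <stdint.h>

    struct peer_mem_client_sketch {
        /* Does [addr, addr + size) belong to this peer device? */
        int  (*acquire)(uintptr_t addr, size_t size, void **ctx);
        /* Pin the peer pages backing the range. */
        int  (*get_pages)(uintptr_t addr, size_t size, void *ctx);
        /* Produce DMA (bus) addresses for the pinned pages. */
        int  (*dma_map)(void *ctx, uint64_t *dma_addrs, size_t *n_addrs);
        /* Teardown, mirroring the calls above. */
        void (*dma_unmap)(void *ctx);
        void (*put_pages)(void *ctx);
        void (*release)(void *ctx);
    };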

PeerDirect: Memory Region Registration
[Sequence diagram: user-space verbs app, RDMA subsystem, peer client, peer device, HCA]
1. The application calls ibv_reg_mr()
2. (a) acquire() - the peer client claims the address range ("mine!")
3. (b) get_pages() - the peer device pins the peer pages and returns the physical pages
4. dma_map() - DMA addresses are returned and the MR is registered with the HCA
5. ibv_reg_mr() returns success; the MR can be used for PeerDirect
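For a sense of what this looks like from user space, the sketch below registers a cudaMalloc'ed buffer directly with ibv_reg_mr(). It assumes an RDMA-capable NIC, a CUDA GPU, and a loaded peer client module (e.g. nv_peer_mem), and it picks the first HCA purely for brevity.

    /* Sketch: registering GPU memory with ibv_reg_mr(), which only succeeds
     * when a peer client for the GPU is loaded. */
    #include <cuda_runtime.h>
    #include <infiniband/verbs.h>
    #include <stdio.h>

    int main(void)
    {
        int num;
        struct ibv_device **devs = ibv_get_device_list(&num);
        struct ibv_context *ctx = ibv_open_device(devs[0]);   /* first HCA, for brevity */
        struct ibv_pd *pd = ibv_alloc_pd(ctx);

        void *gpu_buf;
        size_t len = 1 << 20;
        cudaMalloc(&gpu_buf, len);                             /* device (GPU) memory */

        /* With PeerDirect, this pins GPU pages via the peer client's
         * acquire()/get_pages()/dma_map() callbacks instead of get_user_pages(). */
        struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);
        printf("GPU MR %s (lkey=0x%x)\n", mr ? "registered" : "failed",
               mr ? mr->lkey : 0);

        if (mr) ibv_dereg_mr(mr);
        cudaFree(gpu_buf);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }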

PeerDirect Async

PeerDirect Async: How Does It Work?
- Allows peer devices to control the network card
  - Latency reduction, batching of management operations
- Two new supported operations:
  - Queue a set of send operations to be triggered by the GPU - ibv_exp_peer_commit_qp()
  - Test for a "successful completion" - ibv_exp_peer_peek_cq()
- Dedicated QPs and CQs for PeerDirect Sync
  - Avoids interlocking PeerDirect Sync with the normal post-send/poll-CQ paths
- Device agnostic
  - Currently built to support NVIDIA GPUs
  - Can support other HW as well - FPGAs, storage controllers

Transmit Operation
Create a QP - mark it for PeerDirect Sync - associate it with the peer.
1. Post work requests using ibv_post_send()
   - The doorbell record is not updated
   - The doorbell is not rung
2. Use ibv_exp_peer_commit_qp() to get bytecode for committing all WQEs currently posted to the send work queue
3. Queue the translated bytecode operations on the peer, after the operations that generate the data to be sent
[Diagram: (1) CPU queues the work request, (2) CPU passes the bytecode to the GPU, (3) GPU triggers the send on the HCA using the bytecode]
A conceptual sketch of this flow follows.
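The sketch below is conceptual: ibv_post_send() and the CUDA stream write are real calls, but peer_op and get_commit_ops() are hypothetical stand-ins for the descriptor list that ibv_exp_peer_commit_qp() returns in the experimental verbs API, whose exact structures vary across MLNX_OFED releases.

    /* Conceptual transmit path: post WQEs (doorbell suppressed on a
     * PeerDirect Sync QP), fetch the commit "bytecode", and queue the
     * corresponding stores on the GPU stream. */
    #include <cuda.h>
    #include <infiniband/verbs.h>

    struct peer_op {           /* hypothetical: one store descriptor     */
        CUdeviceptr  target;   /* doorbell record or doorbell register   */
        unsigned int value;    /* value to store                         */
    };

    /* Hypothetical helper standing in for ibv_exp_peer_commit_qp(). */
    static int get_commit_ops(struct ibv_qp *qp, struct peer_op *ops, int max_ops)
    {
        (void)qp; (void)ops; (void)max_ops;
        return 0;              /* a real build would translate the exp bytecode here */
    }

    void queue_send_on_gpu(CUstream stream, struct ibv_qp *qp,
                           struct ibv_send_wr *wr)
    {
        struct ibv_send_wr *bad_wr;
        ibv_post_send(qp, wr, &bad_wr);      /* 1. post WQEs; doorbell not rung */

        struct peer_op ops[4];
        int n = get_commit_ops(qp, ops, 4);  /* 2. get the commit "bytecode"    */

        for (int i = 0; i < n; i++)          /* 3. queue the stores after the
                                              *    data-producing kernels        */
            cuStreamWriteValue32(stream, ops[i].target, ops[i].value,
                                 CU_STREAM_WRITE_VALUE_DEFAULT);
    }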

Completion Handling
Create a CQ - mark it for PeerDirect Sync - associate it with the peer.
1. Use ibv_exp_peer_peek_cq() to get bytecode for peeking the CQ at a specific offset from the currently expected CQ entry
2. Queue the translated operations on the peer, before the operations that use the received data
3. Synchronize the CPU with the peer to ensure that all the operations have ended
4. Use ibv_poll_cq() to consume the completion entries
[Diagram: (1) CPU passes the poll bytecode to the GPU, (2) GPU peeks for the completion on the HCA, (3) GPU reports it has finished, (4) CPU reclaims the completions]
A conceptual sketch of this flow follows.
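Again conceptually: cuStreamWaitValue32(), cuStreamSynchronize() and ibv_poll_cq() are real calls, while peek_target and peek_value are hypothetical stand-ins for the poll descriptor that ibv_exp_peer_peek_cq() would return.

    /* Conceptual completion path: the GPU stream waits on the CQE word the
     * HCA will write; the CPU later reclaims the entries with ibv_poll_cq(). */
    #include <cuda.h>
    #include <infiniband/verbs.h>

    void wait_recv_on_gpu(CUstream stream, struct ibv_cq *cq,
                          CUdeviceptr peek_target,  /* mapped CQE flag word (hypothetical)       */
                          unsigned int peek_value)  /* value meaning "CQE ready" (hypothetical)  */
    {
        /* 1.-2. Queue the peek before the kernels that consume the received data:
         *       the stream blocks until the HCA writes the expected value. */
        cuStreamWaitValue32(stream, peek_target, peek_value, CU_STREAM_WAIT_VALUE_GEQ);

        /* ... kernels that use the received buffer are launched on `stream` here ... */

        /* 3.-4. Synchronize with the peer, then consume the completion entries. */
        cuStreamSynchronize(stream);
        struct ibv_wc wc;
        ibv_poll_cq(cq, 1, &wc);
    }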

Performance

Performance Mode
[*] Modified ud_pingpong test: recv + GPU kernel + send on each side.
2 nodes: Ivy Bridge Xeon, K40, Connect-IB, MLNX switch; 10,000 iterations, message size: 128 B, batch size: 20

Economy Mode
- 25% faster
- 45% less CPU load
[*] Modified ud_pingpong test; HW same as in the previous slide.

Upstream Work

Peer-to-Peer – Upstream Proposals
- Peer-to-Peer DMA Mapping
  - Map DMA addresses of one PCI device into the IOVA of another device
- ZONE_DEVICE
  - Extend ZONE_DEVICE functionality to memory not cached by the CPU
- RDMA extension to DMA-BUF
  - Allow creating a memory region from a DMA-BUF file handle
- IOPMEM
  - A block device for PCI-E memory
- Heterogeneous Memory Management (HMM)
  - A common address space will allow migration of memory between devices

13th ANNUAL WORKSHOP 2017
THANK YOU
Feras Daoud, Leon Romanovsky

BACKUP

Bytecode

