NVIDIA GPUDirect Storage


NVIDIA GPUDirect Storage Overview Guide
DU-10094-001 v0.95.1 | May 2021

Table of Contents

Chapter 1. Introduction
  1.1. Related Documents
  1.2. Benefits for a Developer
  1.3. Intended Uses
  1.4. Versioning History
  1.5. Update History
Chapter 2. Functional Overview
  2.1. Explicit and Direct
  2.2. Performance Optimizations
    2.2.1. Implementation Performance Enhancements
    2.2.2. Concurrency Across Threads
    2.2.3. Asynchrony
    2.2.4. Batching
  2.3. Compatibility and Generality
  2.4. Monitoring
  2.5. Scope of the Solutions in GDS
  2.6. Dynamic Routing
    2.6.1. cuFile Configuration for Dynamic Routing
    2.6.2. cuFile Configuration for DFS Mount
    2.6.3. cuFile Configuration Validation for Dynamic Routing
Chapter 3. Software Architecture
  3.1. Software Components
  3.2. Primary Components
    3.2.1. Workflows for GDS Functionality
    3.2.2. Workflow 1
    3.2.3. Workflow 2
  3.3. Aligning with Other Linux Initiatives
Chapter 4. Deployment
  4.1. Software Components for Deployment
  4.2. Using GPUDirect Storage in Containers
  4.3. Internal and External Dependencies

Chapter 1. Introduction

GDS enables a direct data path for direct memory access (DMA) transfers between GPU memory and storage, which avoids a bounce buffer through the CPU. This direct path increases system bandwidth and decreases the latency and utilization load on the CPU.

This guide provides a high-level overview of GDS, guidance to help you enable filesystems for GDS, and some insights about the features of a filesystem and how they relate to GDS. The guide also outlines the functionality, considerations, and software architecture of GDS. This high-level introduction sets the stage for the deeper technical information in the cuFile API Reference Guide for GDS users who need to modify the kernel.

1.1. Related Documents

This section lists the other GDS documentation that is available to help you understand and use GDS. Since the original creation of this document, the following documents and online resources provide additional context for the optimal use and understanding of this specification.

Refer to the following guides for more information about GDS:
‣ GPUDirect Storage Design Guide
‣ cuFile API Reference Guide
‣ GPUDirect Storage Release Notes
‣ GPUDirect Storage Benchmarking and Configuration Guide
‣ GPUDirect Storage Best Practices Guide
‣ GPUDirect Storage Installation and Troubleshooting Guide
‣ GPUDirect Storage O_DIRECT Requirements Guide

To learn more about GDS, refer to the following blogs:
‣ GPUDirect Storage: A Direct Path Between Storage and GPU Memory.
‣ The Magnum IO series.

1.2. Benefits for a Developer

Here is some information about the benefits that GDS provides for application developers.

GDS provides the following benefits:
‣ Enables a direct path between GPU memory and storage.
‣ Increases the bandwidth, reduces the latency, and reduces the load on CPUs and GPUs for data transfers.
‣ Reduces the performance impact of, and dependence on, CPUs to process storage data transfers.
‣ Acts as a performance force multiplier on top of the compute advantage for computational pipelines that are fully migrated to the GPU, so that the GPU, rather than the CPU, has the first and last touch of data that moves between storage and the GPU.
‣ Supports interoperability with other OS-based file access, which enables data to be transferred to and from the device by using traditional file IO and then accessed by a program that uses the cuFile APIs.

The cuFile APIs and their implementations provide the following benefits:
‣ A family of APIs that provide CUDA applications with the best-performing access to local or distributed file and block storage. Block storage validation might be added in the future.
‣ Consistency with the long-term direction of the Linux community, for example, with respect to peer-to-peer RDMA.
‣ Increased performance relative to existing standard Linux file IO when transferring to and from the GPU.
‣ Greater ease of use by removing the need for careful, expert management of memory allocation and data movement.
‣ A simpler API sequence relative to existing implicit file-GPU data movement methods, which require more complex management of memory and data movement on and between the CPU and GPU.
‣ Broader support for unaligned transfers than the POSIX pread and pwrite APIs with O_DIRECT, which require buffered IO or unaligned handling in the application code.
‣ Generality across a variety of storage types that span various local and distributed filesystems, block interfaces, and namespace systems, including standard Linux and third-party solutions.

The Stream subset of the cuFile APIs provides the following benefits:
‣ Asynchronous offloaded operations are ordered with respect to a CUDA stream.
‣ IO after compute: the GPU kernel produces data before it is transferred to IO.
‣ Compute after IO: after the data transfer is complete, the GPU kernel can proceed.
‣ Available concurrency across streams.

‣ Using different CUDA streams allows the possibility of concurrent execution and the concurrent use of multiple DMA engines.

1.3. Intended Uses

Here is a list of how you can use the cuFile features:
‣ cuFile implementations boost throughput when IO between storage and GPU memory is a performance bottleneck. This condition arises in cases where the compute pipeline has been migrated to the GPU from the CPU, so that the first and last agents to touch data, before or after transfers with storage, execute on the GPU.
‣ cuFile APIs are currently explicit and read or write between storage and buffers that completely fit into the available GPU physical memory.
‣ The cuFile APIs are a suitable match for coarse-grained streaming transfers rather than fine-grained random access. For coarse-grained transfers, the underlying software overheads of making a kernel transition and going through the operating system can be amortized; for fine-grained accesses, they cannot.

1.4. Versioning History

Here is some information about the numbering scheme that is used for the documentation.

A common versioning scheme is used for documents and for utilities with a -v switch that corresponds to the following major releases:
‣ 0.4 Pre-Alpha
‣ 0.5 Alpha
‣ 0.7 Beta, April 2020
  ‣ CPU-staged fallback path to POSIX-compliant filesystems when the driver is absent.
  ‣ Added support for DDN EXAScaler parallel filesystem solutions (based on the Lustre filesystem) and WekaFS.
  ‣ Deployment: tarball with installer.
‣ 0.7.1 Beta Update 1, June 2020
  ‣ Bug fixes.
  ‣ Documentation improvements.
‣ 0.8 Release, October 2020
‣ 0.9 Release, November 2020
‣ 0.95 Release, April 2021

‣ 0.95.1 Release, May 2021

The overall schema is major release.minor release.patch number. The minor release number is incremented for each validated minor release, and this value returns to 0 with each major release. Patch numbers might be used for bug fixes in unofficial releases. Until version 1.0, API definitions may continue to change.

See the GPUDirect Storage Release Notes for details of the functional and performance changes since previous releases. The following subsections pertain to documentation changes.

Note: There are API name changes and API argument changes that have occurred since the Alpha release and that developers must accommodate to compile.

1.5. Update History

This section provides information about the updates to this guide.

Updates Since Version 0.9

Since version 0.9 of this guide was created, the following content is new:
‣ Added the Dynamic Routing feature.

Updates Since Version 0.7

Since version 0.7 of this guide was created, the following sections are new:
‣ Functional Overview
‣ GPUDirect Storage Requirements, except Software Components and Alignment with Other Linux Initiatives, which also had minor updates.
‣ Using GPUDirect Storage in Containers

Sections 1, 2, and a part of section 3 from the cuFile API Reference Guide have been moved to this guide.

Updates Since Version 0.5 (Alpha 2)

The following sections have been updated since version 0.5, relative to the cuFile API Reference Guide:
‣ 1.4: Shifted from upcoming features to Beta availability. Concretized references to partner support, which will simplify future work.
‣ 2.2: Compatibility mode has been added.
‣ 2.4: Added monitoring functionality such as Ftrace, logging, and profiling.
‣ 5.1: Updated deployment specifics and library dependencies.
‣ 5.3: Dependencies have been refined.

‣ 5.4: Limitations were updated, and specifics of distributed filesystem support were added.

Updates Since Version 0.4 (Alpha 1)

The following updates have been made to this document since version 0.4 of the cuFile API Reference Guide:
‣ 1.2: Greater clarity around ease of use and unaligned IO as a benefit.
‣ 2.5: Increased clarity around GPLv2.

Chapter 2. Functional Overview

This chapter provides a functional overview of GDS. It covers basic usage, generality, performance considerations, and the scope of the solution. This documentation applies to the cuFile APIs, which are issued from the CPU.

2.1. Explicit and Direct

GDS is a performance-centric solution, so the performance of an end-to-end transfer is a function of latency overheads and the maximal achievable bandwidth.

Here are some terms:

Explicit programmatic request
    An explicit programmatic request that immediately invokes the transfer between the storage and the GPU memory is proactive.
Implicit request
    An implicit request to storage, which is induced by a memory reference that causes a page miss from the GPU back to the CPU, and potentially from the CPU to storage, is reactive.

Note: Reactive activity tends to induce more overhead. As a result of being explicit and proactive, GDS maximizes performance with its explicit cuFile APIs.

Latency is lower when extra copies are avoided and the highest-bandwidth paths are taken. Without GDS, an extra copy through a bounce buffer in the CPU is necessary, which introduces latency and lowers effective bandwidth.

Note: The latency improvements from GDS are most apparent with small transfers.

With GDS, although there are exceptions, a zero-copy approach is possible. Additionally, when a copy through the CPU is no longer necessary, the data path does not include the CPU. On some systems, a direct path between local or remote storage that goes through a PCIe switch offers at least twice the peak bandwidth compared to a data path through the CPU. Using the cuFile APIs to access GDS technology enables explicit and direct transfers, which offer lower latency and higher bandwidth.

For direct data transfers between GPU memory and storage, the file must be opened in O_DIRECT mode. If the file is not opened in this mode, contents might be buffered in CPU system memory, which is incompatible with direct transfers.
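To make this sequence concrete, the following is a minimal sketch, not taken from the guide, of a read from an O_DIRECT file into GPU memory through the cuFile APIs. The function name read_into_gpu, the path argument, the BUF_SIZE value, and the error-handling strategy are illustrative only; the program must be linked against libcufile (for example, with -lcufile), and gpumem_buf is assumed to be a device buffer of at least BUF_SIZE bytes that was already allocated with cudaMalloc.

    /* Minimal sketch: read BUF_SIZE bytes from an O_DIRECT file into GPU memory. */
    #define _GNU_SOURCE                 /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include "cufile.h"

    #define BUF_SIZE (1 << 20)          /* 1 MiB, an assumed transfer size */

    int read_into_gpu(const char *path, void *gpumem_buf)
    {
        /* O_DIRECT is required for the direct data path; without it, contents
           might be buffered in CPU system memory. */
        int fd = open(path, O_RDONLY | O_DIRECT);
        if (fd < 0)
            return -1;

        CUfileDescr_t descr;
        memset(&descr, 0, sizeof(descr));
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
        descr.handle.fd = fd;

        CUfileHandle_t handle;
        CUfileError_t status = cuFileHandleRegister(&handle, &descr);
        if (status.err != CU_FILE_SUCCESS) {
            close(fd);
            return -1;
        }

        /* Registering the GPU buffer is optional but avoids per-IO registration cost. */
        cuFileBufRegister(gpumem_buf, BUF_SIZE, 0);

        /* Read BUF_SIZE bytes from file offset 0 into the GPU buffer at offset 0. */
        ssize_t nread = cuFileRead(handle, gpumem_buf, BUF_SIZE, 0, 0);

        cuFileBufDeregister(gpumem_buf);
        cuFileHandleDeregister(handle);
        close(fd);
        return nread < 0 ? -1 : 0;
    }

A long-running application would typically also bracket its cuFile usage with cuFileDriver{Open, Close}, which are listed among the driver APIs in Section 2.5.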

Explicit Copy versus Using mmap

The following code samples compare the code sequence of an explicit copy with a sequence that uses mmap and incurs an implicit page fault where necessary.

This code sample uses an explicit copy:

    int fd = open(file_name, ...);
    void *sysmem_buf, *gpumem_buf;
    sysmem_buf = malloc(buf_size);
    cudaMalloc(&gpumem_buf, buf_size);
    pread(fd, sysmem_buf, buf_size, ...);
    cudaMemcpy(gpumem_buf, sysmem_buf, buf_size, cudaMemcpyHostToDevice);
    doit<<<...>>>(gpumem_buf, ...);   // no faults

This code sample uses mmap:

    int fd = open(file_name, O_DIRECT, ...);
    void *mgd_mem_buf;
    cudaMallocManaged(&mgd_mem_buf, buf_size);
    mmap(mgd_mem_buf, buf_size, ..., fd, ...);
    doit<<<...>>>(mgd_mem_buf, ...);  // faults on references to mgd_mem_buf

In the first sample, pread is used to move data from storage into a CPU bounce buffer, sysmem_buf, and cudaMemcpy is used to move that data to the GPU. In the second sample, mmap makes the managed memory backed by the file. References from the GPU to managed memory that is not present in GPU memory induce a fault back to the CPU, and potentially to storage, which causes an implicit transfer.

GDS enables DMA between agents (NICs or NVMe drives) near storage and GPU memory. Traditional POSIX read and write APIs only work with addresses of buffers that reside in CPU system memory. The cuFile APIs, in contrast, operate on addresses of buffers that reside in GPU memory. The two therefore look very similar but have a few differences, as the following comparison shows.

Comparing the POSIX APIs and the cuFile APIs

The following code samples compare the POSIX APIs and the cuFile APIs. POSIX pread and pwrite require buffers in CPU system memory and an extra copy, but cuFile read and write only requires file handle registration.

This code sample uses the POSIX APIs:

    int fd = open(...);
    void *sysmem_buf, *gpumem_buf;
    sysmem_buf = malloc(buf_size);
    cudaMalloc(&gpumem_buf, buf_size);
    pread(fd, sysmem_buf, buf_size, ...);
    cudaMemcpy(gpumem_buf, sysmem_buf, buf_size, cudaMemcpyHostToDevice);
    doit<<<...>>>(gpumem_buf, ...);

This code sample uses the cuFile APIs:

    int fd = open(file_name, O_DIRECT, ...);
    CUfileHandle_t fh;
    CUfileDescr_t desc;
    desc.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    desc.handle.fd = fd;
    cuFileHandleRegister(&fh, &desc);
    void *gpumem_buf;
    cudaMalloc(&gpumem_buf, buf_size);
    cuFileRead(fh, gpumem_buf, buf_size, ...);
    doit<<<...>>>(gpumem_buf, ...);

Here are the essential cuFile functionalities:
‣ Explicit data transfers between storage and GPU memory, which closely mimic POSIX pread and pwrite.
‣ Non-buffered IO (using O_DIRECT), which avoids the use of the filesystem page cache and creates an opportunity to completely bypass CPU system memory.
‣ Performing IO in a CUDA stream, so that it is both asynchronous and ordered relative to the other commands in that same stream.

The direct data path that GDS provides relies on the availability of filesystem drivers that are enabled with GDS. These drivers run on the CPU and implement the control path that sets up the direct data path.

2.2. Performance Optimizations

After there is a viable path to explicitly and directly move data between storage and GPU memory, there are additional opportunities to improve performance.

2.2.1. Implementation Performance Enhancements

GDS provides a user interface that abstracts the implementation details. The performance optimizations in that implementation involve trade-offs that are enhanced over time and are tuned to each platform and topology.

The following figure lists some of these performance optimizations:

Figure 1. Performance Optimizations

‣ Path selection
  There might be multiple paths available between endpoints. In an NVIDIA DGX-2 system, for example, GPU A and GPU B, which are connected to CPU sockets CPU A and CPU B respectively, may be connected via two paths:
  ‣ GPU A -- CPU A PCIe root port -- CPU A to CPU B via the CPU interconnect -- CPU B along another PCIe path to GPU B.
  ‣ GPU A -- GPU B using NVLink.
  Similarly, a NIC that is attached to CPU A, and to GPU A via PCIe through an intervening switch, has a choice of data paths to GPU B:
  ‣ The NIC -- CPU A PCIe root port, CPU A -- CPU B via the CPU interconnect, and CPU B along another PCIe path -- GPU B.
  ‣ The NIC -- a staging buffer in GPU A and NVLink -- GPU B.
‣ Staging in intermediate buffers
  Bulk data transfers are performed with DMA copy engines. Not all paths through a system are possible with a single-stage transfer, and sometimes a transfer is broken into multiple stages with a staging buffer along the way.
  In the NIC-GPU A-GPU B example above, a staging buffer in GPU A is required, and the DMA engine in GPU A or GPU B is used to transfer data between GPU A's memory and GPU B's memory.
  Data might be transferred through the CPUs along PCIe only, or directly between GPUs over NVLink. Although DMA engines can reach across PCIe endpoints, paths that involve NVLink may involve staging through a buffer (GPU A).
‣ Dynamic routing

  Paths and staging. Two paths are available between endpoints on the two halves of such a system: a PCIe path or an NVLink path.

2.2.2. Concurrency Across Threads

Here is some information about how GDS manages concurrency across threads.

Note: All APIs are expected to be thread safe.

Using GDS is a performance optimization. After the applications are functionally enabled to move data directly between storage and a GPU buffer by passing a pointer to the GPU buffer down through the application layers, performance is the next concern. IO performance at the system level comes from concurrent transfers on multiple links and across multiple devices. Concurrent transfers across each of four x4 NVMe PCIe devices are necessary to get full bandwidth from one x16 PCIe link. Since there are PCIe links to each GPU and to each NIC, many concurrent transfers are necessary to saturate the system. GDS does not boost concurrency, so this level of performance tuning is managed by the application; a minimal sketch of application-managed concurrency appears after Section 2.2.4 below.

2.2.3. Asynchrony

Another form of concurrency, between the CPU and one or more GPUs, can be achieved in an application thread through asynchrony. In this process, work is submitted for deferred execution by the CPU, and the CPU can continue to submit more work to a GPU or complete work on the CPU. This process adds support in CUDA for async IO, which can enable a graph of interdependent work, including IO, to be submitted for deferred execution.

There is a plan for this to be enabled in a future version of GDS with an asynchronous subset of the cuFile APIs, and this feature will add a CUDA stream as an argument. These APIs will also add a pointer to an integer that holds the number of transferred bytes, which is asynchronously updated. Refer to the cuFile API Reference Guide for more information.

2.2.4. Batching

Here is some information about how batching is used in GDS.

There is some fixed overhead involved with each submission from an application into the cuFile implementation. For usage models where many IO transactions are submitted simultaneously, batching reduces the overhead by amortizing that fixed overhead across the transactions in the batch, which improves performance.

Applications might also submit a batch of IO transactions and start working on a subset of completed transactions without having to wait for the whole set. An automatically updated set of flags that indicates which transactions in a batch have completed allows the application to proceed before the entire set of transactions in the batch has completed.

The cuFile batch APIs require the application developer to allocate and populate a data structure with a set of descriptors for the IO transactions in the batch, and an initialized bit vector to indicate the completion status. Batch APIs are also asynchronous and use a CUDA stream argument. Refer to the cuFile API Reference Guide for more information.
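As an illustration of the application-managed concurrency described in Section 2.2.2, the following is a minimal sketch, not taken from the guide, in which several threads each issue their own cuFileRead for a separate chunk of a file. The names NTHREADS, CHUNK, read_worker, and read_concurrently, the per-thread GPU buffers, and the chunk layout are illustrative assumptions; handle registration, buffer registration, and error checking are omitted for brevity.

    /* Sketch: application-managed concurrency. Each thread reads its own
       CHUNK-sized slice of the file into its own GPU buffer. */
    #include <pthread.h>
    #include <sys/types.h>
    #include "cufile.h"

    #define NTHREADS 4
    #define CHUNK    (4 << 20)          /* 4 MiB per thread, for illustration */

    struct read_args {
        CUfileHandle_t fh;              /* shared, already-registered cuFile handle */
        void          *gpu_buf;         /* per-thread device buffer of CHUNK bytes */
        off_t          offset;          /* per-thread file offset */
    };

    static void *read_worker(void *p)
    {
        struct read_args *a = (struct read_args *)p;
        /* Transfers CHUNK bytes from the file directly into this thread's GPU buffer. */
        ssize_t n = cuFileRead(a->fh, a->gpu_buf, CHUNK, a->offset, 0);
        (void)n;                        /* error checking omitted in this sketch */
        return NULL;
    }

    void read_concurrently(CUfileHandle_t fh, void *gpu_bufs[NTHREADS])
    {
        pthread_t tid[NTHREADS];
        struct read_args args[NTHREADS];

        for (int i = 0; i < NTHREADS; ++i) {
            args[i].fh      = fh;
            args[i].gpu_buf = gpu_bufs[i];
            args[i].offset  = (off_t)i * CHUNK;
            pthread_create(&tid[i], NULL, read_worker, &args[i]);
        }
        for (int i = 0; i < NTHREADS; ++i)
            pthread_join(tid[i], NULL);
    }

How many threads are needed to saturate a given link depends on the platform topology; the point is only that the application, not GDS, creates the concurrency.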

2.3. Compatibility and Generality

Although the purpose of GDS is to avoid using a bounce buffer in CPU system memory, the ability to fall back to this approach allows the cuFile APIs to be used ubiquitously, even under suboptimal circumstances. A compatibility mode is available for unsupported configurations that maps IO operations to a fallback path. This path stages through CPU system memory for systems where one or more of the following conditions is true:

‣ Explicit configuration control by using the user version of the cufile.json file. Refer to the cuFile API Reference Guide for more information.
‣ The lack of availability of the nvidia-fs.ko kernel driver, for example, because it was not installed on the host machine where a container with an application that uses cuFile is running.
‣ The lack of availability of relevant GDS-enabled filesystems on the selected file mounts, for example, because one of several used system mounts does not support GDS.
‣ Filesystem-specific conditions, such as when O_DIRECT cannot be applied. Vendors, middleware developers, and users who are doing a low-level analysis of filesystems should review the GPUDirect Storage O_DIRECT Requirements Guide for more information.

Refer to cuFileHandleRegister in the cuFile API Reference Guide for more information.

Performance on GPU-based applications that transfer between storage and GPU memory in compatibility mode is at least the same as, or better than, current CPU-based APIs when GDS is not used. Testing for the CPU path is limited to POSIX-based APIs and to qualified platforms and filesystems that do not include GDS.

Even when transfers are possible with GDS, a direct transfer is not always possible. Here is a sampling of cases that are handled seamlessly by the cuFile APIs:

‣ The buffer is not aligned, such as the following:
  ‣ The file offsets are not 4KB-page aligned.
  ‣ The GPU memory buffer address is not 4KB-page aligned.
  ‣ The IO request size is not a multiple of 4KB.
  ‣ The requested IO size is too small, and the filesystem cannot support RDMA.
‣ The size of the transfer exceeds the size of the GPU BAR1 aperture.
‣ The optimal transfer path between the GPU memory buffer and storage involves an intermediate staging buffer, for example, to use NVLink.

The compatibility mode and the seamless handling of cases that require extra steps broaden the generality of GDS and make it easier to use.
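As a small illustration of the seamless handling described above, the following hedged sketch issues a read whose size and file offset are deliberately not 4KB-aligned. The function name read_unaligned and the specific values are made up for illustration; the registered file handle and device buffer are assumed to come from an earlier sequence such as the one in Section 2.1.

    /* Sketch: an unaligned read that the cuFile implementation handles internally.
       The same call is used whether GDS takes the direct path or stages the
       unaligned portions through intermediate buffers. */
    #include <sys/types.h>
    #include "cufile.h"

    ssize_t read_unaligned(CUfileHandle_t fh, void *gpumem_buf)
    {
        size_t size        = 100 * 1024 + 37;   /* not a multiple of 4 KiB */
        off_t  file_offset = 513;               /* not 4 KiB-page aligned  */

        /* No special-case code is needed in the application. */
        return cuFileRead(fh, gpumem_buf, size, file_offset, 0);
    }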

2.4. Monitoring

This section provides information about the monitoring facilities that are available to track functional and performance issues in GDS. GDS supports the following monitoring facilities:

‣ Ftrace
  Exported symbols for GDS functions can be traced by using Ftrace. You can also use static tracepoints in the libcufile.so library, but the tracepoints are not yet supported for nvidia-fs.ko. Refer to the GPUDirect Storage Troubleshooting Guide for more information.
‣ Logging
  Error conditions and debugging outputs can be generated in a log file. This information is useful for conditions that affect many of the APIs but need only be reported once, or that affect APIs with no return value to report errors. The cufile.json file is used to select the reporting level, such as ERROR, WARN, INFO, DEBUG, or TRACE.
‣ Profiling
  GDS can be configured to collect a variety of statistics.

These facilities, and the limitations of third-party tools support, are described in greater detail in the GPUDirect Storage Troubleshooting Guide.

2.5. Scope of the Solutions in GDS

Here is some information about the solutions that are available in GDS.

GDS has added new APIs with functionality that is not supported by today's operating systems, including direct transfers to GPU buffers, asynchrony, and batching. These APIs offer a performance boost, with a platform-tuned and topology-tuned selection of paths and staging, which adds enduring value.

The implementations under the cuFile APIs overcome limitations in current operating systems. Some of those limitations are transient and may be removed in future versions of operating systems. Although those solutions are not currently available and may require time for adoption, other GDS-enabled solutions are needed today. Here are the solutions that are currently available in GDS:

‣ Third-party vendor solutions for distributed filesystems.
‣ Long-term support through open source, upstreamed Linux that future GDS implementations will seamlessly use.
‣ Local filesystem support by using modified storage drivers (currently for experimentation only).
‣ The overall cuFile architecture involves a combination of components, some from NVIDIA and some from third parties.

‣ Here is a list of the NVIDIA-originated content:
  ‣ The user-level cuFile library, libcufile.so, which implements the following in closed-source code:
    ‣ cuFile driver APIs:
      ‣ cuFileDriver{Open, Close}
      ‣ cuFileDriver{GetProperties, Set*}
    ‣ cuFile IO APIs:
      ‣ cuFileHandle{Register, Deregister}
      ‣ cuFileBuf{Register, Deregister}
      ‣ cuFile{Read, Write}
    ‣ Stream subset of the cuFile APIs (future):
      ‣ cuFile{Read, Write}Async
    ‣ cuFileBatch APIs (future):
      ‣ cuFileBatchIO{Submit, GetStatus, Cancel, Destroy}
    ‣ Calls to VFS components in standard Linux, whether the filesystem is standard Linux, NFS, a distributed filesystem, and so on.
  ‣ nvidia-fs.ko, the kernel-level driver:
    ‣ Implements callbacks from modified Linux kernel modules or from proprietary filesystems that enable direct DMA to GPU memory.
    ‣ Licensed under GPLv2. Likewise, any third-party kernel components that call the nvidia-fs APIs should expect to be subject to GPLv2.
‣ Third-party content:
  ‣ Proprietary code stacks that replace portions of the Linux filesystem, block system, and so on.

2.6. Dynamic Routing

GDS Dynamic Routing is a feature for choosing the optimal path for cuFileReads and cuFileWrites to and from files on network-based filesystems such as DDN EXAScaler, VAST NFS, and WekaFS. On hardware platforms where GPUs do not share the same root port with the storage NICs, peer-to-peer (p2p) transactions may have higher latency and are inefficient compared to p2p traffic under PCIe switches.

With this feature, based on the platform configuration, the cuFile library tries to efficiently route the I/O to and from the GPU without paying the penalty of cross-root-port p2p traffic. For example, if the storage NIC shares a common PCIe root port with another allowed GPU (say, GPU1) and the target GPU (say, GPU0) is across the CPU root complex, the cuFile library can use a bounce

buffer on GPU1 to perform the p2p transaction to GPU1 and then copy the data to the target GPU0. The presence of NVLink across GPUs can further accelerate the subsequent device-to-device (GPU1 to GPU0) I/O transfers by using NVLink instead of PCIe.

For each mount or volume, the cuFile library pre-computes the best GPUs, that is, those that have the smallest PCIe distance from the available storage NICs, for routing the I/O. During reads and writes, cuFile checks whether the target GPU shares a common PCIe switch with a storage NIC so that the traffic does not need to cross the CPU root complex. If the path is already optimal, dynamic routing does not apply; otherwise, the cuFile library selects a candidate GPU for the intermediate bounce buffer and performs a device-to-device copy to the target GPU buffer.

Note: There might be multiple candidate GPUs for staging the intermediate buffer, and they may not be equidistant from all the storage NICs. In that case, cuFile relies on the underlying filesystem driver to pick the best storage NIC for the candidate GPU, based on the nvidia-fs callback interface that chooses the best NIC for a given GPU buffer.

2.6.1. cuFile Configuration for Dynamic Routing
