Virtualized Graphics With Teradici PCoIP Hardware Accelerator

Upload by : Konnor Frawley

Virtualized Graphics with Teradici PCoIP Hardware Accelerator
PERFORMANCE STUDY

INTRODUCTION

Virtual Desktop Infrastructure (VDI) is now a mature technology that organizations are adopting at an increasing rate. However, users who need high-end graphics capabilities in their work (such as 3D modeling and simulation) have been skeptical about the performance of VDI with 3D graphics applications, compared to the performance they can expect from a workstation with a dedicated GPU. Recent advances in VDI aim to narrow this performance gap by allowing either a pool of virtual desktops to share a GPU or a GPU to be dedicated to a specific virtual desktop, depending on user requirements.

This white paper describes how the Teradici PCoIP Hardware Accelerator (previously known as the APEX 2800 Server Offload card) can be used with virtualized GPUs to support high-end graphics applications more effectively in a VMware Horizon View VDI environment. Using measured quantitative data, the white paper shows the improvement in user experience and the reduction in CPU load that the Teradici PCoIP Hardware Accelerator provides with 3D graphics workloads, making it an essential component of any virtualized GPU solution.

The paper begins with an introduction to the basic terminology used in supporting virtualized graphics in a VMware Horizon View environment. The second section describes the value proposition of the Teradici PCoIP Hardware Accelerator with 3D graphics, and includes a discussion of the non-overlapping, complementary functions of a GPU and the Hardware Accelerator. The test environment, test methodology, and workload are described next. The summary results are then presented and discussed to show how the Teradici PCoIP Hardware Accelerator enhances the end user experience while providing cost-effective user consolidation in high-end graphics use cases. The discussion of results also highlights how the benefit of reduced CPU usage may be partly traded off for a better end user experience. Finally, after our concluding remarks, the Appendix provides additional details of the test environment.

3D GRAPHICS IN VMWARE VIEW ENVIRONMENT

VMware has progressively introduced higher-performance 3D graphics capabilities into VMware Horizon View. This section briefly introduces the graphics capabilities supported by VMware Horizon View.

Graphics with Soft 3D

[Figure 1 diagram labels: Virtual Machine 1 ... Virtual Machine n, VMware Driver, Soft 3D, PCoIP Agent, VMware ESXi, Teradici Driver]

FIGURE 1: SOFTWARE RENDERER IN INDIVIDUAL VMS (VMWARE SOFT GPU)

VMware vSphere 5.0 and View 5.0 introduced basic 3D support in VMs without requiring a hardware GPU in the ESXi server. This support gave end users such as task and knowledge workers the ability to use Windows Aero and to run basic 3D applications like Google Earth in a VM. Figure 1 shows the basic components of Soft 3D at a high level. The main advantage of Soft 3D is that it enables low-end graphics features without a hardware GPU. However, Soft 3D incurs CPU overhead from rendering the displays and offers limited 3D performance.

Virtual Shared Graphics Acceleration (vSGA)

[Figure 2 diagram labels: Virtual Machine 1 ... Virtual Machine n, VMware Driver, vSGA, PCoIP Agent, GPU Vendor Driver, VMware ESXi, Teradici Driver]

FIGURE 2: PHYSICAL GPU RESOURCES SHARED BY MULTIPLE VMS (VMWARE VSGA)

With vSphere 5.1 and VMware Horizon View 5.2, virtual Shared Graphics Acceleration (vSGA) allows multiple VMs to share hardware GPU resources in an ESXi server. The main benefit is cost effectiveness: high-performance graphics are provided to multiple users from shared hardware. Figure 2 shows the main components of the vSGA architecture. A GPU vendor-specific driver resides in the ESXi hypervisor, and no changes are needed in individual VMs. The focus of this white paper is to highlight the complementary benefits of using the Teradici Hardware Accelerator together with a GPU in this vSGA mode.
Virtual Dedicated Graphics Acceleration (vDGA)

[Figure 3 diagram labels: Virtual Machine 1 ... Virtual Machine n, GPU Vendor Driver, vDGA, PCoIP Agent, Teradici Driver, VMware ESXi]

FIGURE 3: DEDICATED GPU RESOURCES FOR INDIVIDUAL VMS (VMWARE VDGA)

The capability to dedicate a GPU in an ESXi server to exclusive use by a specific VM, known as virtual Dedicated Graphics Acceleration (vDGA), is supported with vSphere 5.5 and VMware Horizon View 5.3. This makes the full capabilities of a hardware GPU available to a single VM. A GPU vendor-specific driver resides in the VM. The ESXi server hardware must support the "PCI Passthrough" feature, and the GPU appears to the VM as a directly connected PCIe device. vDGA is intended for users who need high-end graphics processing, such as 3D modeling and simulation in computer-aided design and manufacturing (CAD/CAM), or similar complex graphics-oriented applications. Figure 3 depicts the vDGA components at a high level. Our initial tests in vDGA mode, using the Teradici PCoIP Hardware Accelerator together with a dedicated GPU, show promising results that will be published in a future white paper.

HOW IS PCOIP HARDWARE ACCELERATOR VALUABLE FOR 3D WORKLOADS?

In this section, we first discuss the value proposition of the Teradici PCoIP Hardware Accelerator with 3D graphics workloads. We then describe how the functions performed by the Hardware Accelerator differ from those performed by a GPU, to show that the two do not overlap but complement each other in improving the overall performance of 3D graphics applications.

Value Proposition of Teradici PCoIP Hardware Accelerator

Many 3D graphics applications are CPU-bound (i.e., the CPU is the performance bottleneck, rather than other resources such as memory or storage access) and single-threaded (i.e., they use only a single CPU core). For these types of applications, a server with fewer cores at a higher clock frequency delivers higher performance than a server with more cores at a lower clock frequency. The Teradici PCoIP Hardware Accelerator frees up a precious server core that would otherwise have been used by the PCoIP Software Encoder. In addition to freeing up a CPU core, the Hardware Accelerator can process pixels at a higher peak throughput than the Software Encoder. The net results are:

- Dramatically improved 3D application experience
- Optimized server with fewer, high-frequency cores
- Higher frame rate delivered to the end user

In the "Test Results" section, using measured, quantitative data from our tests, we show how the Teradici Hardware Accelerator provides the above benefits and explain why the Hardware Accelerator is an essential component in virtualized 3D graphics environments.

GPU Functions versus Teradici PCoIP Hardware Accelerator Functions

Figure 4 illustrates at a high level how pixels are rendered and encoded before being sent to the remote client with Soft 3D, with a GPU, and with a Hardware Accelerator. Soft 3D processes graphics commands and primitives from a 3D application and delivers pixels to the VM frame buffer. If a GPU is present (shared or dedicated), it takes over this function from Soft 3D and delivers pixels to the frame buffer much faster.

In the absence of a Teradici PCoIP Hardware Accelerator, the PCoIP Software Encoder extracts the pixels from the frame buffer, encodes them, and delivers the encoded pixels to the client through the network interface. If a Hardware Accelerator is used, it performs the functions of the Software Encoder with a higher peak throughput. The Hardware Accelerator frees up the CPU cycles that would have been used by the Software Encoder, making them available for other processing functions or for other users on the server. This is particularly critical for CPU-bound applications, where fewer, high-frequency CPU cores are the optimal choice.

[Figure 4 diagram labels: Windows Desktop VM (VMware Horizon View), VM OS and Applications, ESXi Hypervisor Host, Rendered Display, GPU PassThrough or Shared GPU, VMW pixel buffer, PCoIP Soft Encoder, HW Accelerator Drivers, Hardware GPU rendered pixels plus Hardware Accelerator, Send encoded pixels only to clients]

FIGURE 4: COMPLEMENTARY FUNCTIONS OF GPU AND HW ACCELERATOR

As described above (Figure 4), the GPU and Hardware Accelerator functions do not overlap. Both speed up 3D applications: the GPU by generating pixels into the frame buffer faster, and the Hardware Accelerator by delivering those pixels to the endpoint faster, offloading the pixel encoding functions from the CPU and freeing up a CPU core for other work.

Test Environment and Methodology

[Figure 5 diagram labels: ESXi Server (Host VMs), ESXi Server (Client VMs), ESXi Server (Management VMs), Switch, PCoIP Zero Client]

FIGURE 5: TEST ENVIRONMENT

Figure 5 illustrates the test environment. Two ESXi servers and a PCoIP Zero Client are connected to a switch. One ESXi server supports the desktop VMs, while the other supports the management VMs. The management VMs include the vCenter server, the VMware Horizon View connection server, and Active Directory with DNS and DHCP servers. All servers were running ESXi 5.5 with hyper-threading and turbo boost enabled. The ESXi server that supports the desktop VMs also has the Teradici PCoIP Hardware Accelerator and an NVIDIA GRID K2 card (which consists of two GPUs) installed. Additional details of the test environment are included in the Appendix.

The tests were conducted with a single desktop VM (VMware Horizon View 5.3 agent) and a PCoIP Zero Client containing the Teradici PCoIP Processor chip. The desktop VM had the Aero and 3D settings enabled and the display resolution set to 1920 x 1080. See the Appendix for more details on the test configuration.

CPU usage statistics were collected with VMware ESXtop (a statistics-gathering utility) while the workload was running, using a sampling interval of 5 seconds. Frame rate numbers were extracted from the PCoIP server log and the PCoIP Zero Client web user interface. In all tests, caching was disabled and the maximum frame rate was set to 60 frames per second. The remaining PCoIP protocol parameters were left at their default values. The workload was run for about 10 minutes, and each data point is the average of at least 3 repeated test runs, to minimize the impact of run-to-run variability.

The Teradici PCoIP Hardware Accelerator makes dynamic decisions about offloading (i.e., when to offload and when not to) based on factors such as the activity level of a display and VM priority. In our testing, we disabled this dynamic capability so that the offloading decision could be controlled deterministically.
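The sampling and averaging scheme described above can be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: the function names are hypothetical, and it assumes that each run is first reduced to the mean of its samples and that the per-run means are then averaged, which the paper does not spell out.

```python
from statistics import mean

SAMPLE_INTERVAL_S = 5     # ESXtop sampling interval stated in the paper
RUN_LENGTH_S = 10 * 60    # each run lasted about 10 minutes
MIN_RUNS = 3              # each data point averages at least 3 runs


def expected_samples(run_length_s=RUN_LENGTH_S, interval_s=SAMPLE_INTERVAL_S):
    """Approximate number of ESXtop samples collected in one run."""
    return run_length_s // interval_s


def data_point(runs, min_runs=MIN_RUNS):
    """Average per-run means into one data point (assumed procedure).

    runs: list of runs, each a list of sampled values (e.g. CPU usage in %).
    """
    if len(runs) < min_runs:
        raise ValueError(f"need at least {min_runs} runs, got {len(runs)}")
    return mean(mean(samples) for samples in runs)
```

With a 5-second interval, a 10-minute run yields about 120 samples, so each reported data point would summarize roughly 360 or more samples.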

Description of Workload

Google Earth was used as the workload in the tests. The main reasons for choosing Google Earth were:

- It can easily be run in both DirectX and OpenGL modes.
- It is easy for any interested party to use it and reproduce the results.

The workload emulates a user spinning the earth back and forth in different directions with Google Earth's mouse navigation tools, in a maximized window. A Python script was used to perform the following sequence of actions:

- Spin the earth about the horizontal axis, in the N - S direction, for about 5 seconds.
- Then spin about an axis inclined 120 degrees anti-clockwise from the horizontal axis, in the NE - SW direction, for another 5 seconds.
- Next, spin about the horizontal axis, in the S - N direction, for about 5 seconds.
- Lastly, spin about an axis inclined 60 degrees anti-clockwise from the horizontal axis, in the NW - SE direction, for another 5 seconds.

This sequence of actions was repeated for approximately 10 minutes. The workload was run in both OpenGL and DirectX modes. The version of Google Earth used was 7.1.2.2041.

Test Results

In this section we present the end user experience (using client frame rate as a representative metric) and the CPU usage results from our tests. Results are shown with and without the Teradici PCoIP Hardware Accelerator in vSGA (shared GPU) mode. Results without the Hardware Accelerator are labeled "Soft Encoder", referring to the PCoIP Software Encoder that is used when the Hardware Accelerator is not in use.
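The four-step spin sequence from the workload description can be sketched as a repeating schedule. This is not the authors' script, which is not reproduced in the paper: the names below are hypothetical, the actual mouse automation is omitted, and "about 5 seconds" and "approximately 10 minutes" are treated as exact values for illustration.

```python
from itertools import cycle

# (axis tilt in degrees anti-clockwise from horizontal, direction, seconds)
SPIN_STEPS = [
    (0, "N-S", 5),
    (120, "NE-SW", 5),
    (0, "S-N", 5),
    (60, "NW-SE", 5),
]


def build_schedule(total_s=600, steps=SPIN_STEPS):
    """Return (start_time_s, tilt_deg, direction, duration_s) tuples
    that repeat the spin steps until total_s seconds are filled."""
    schedule = []
    elapsed = 0
    for tilt, direction, duration in cycle(steps):
        if elapsed + duration > total_s:
            break
        schedule.append((elapsed, tilt, direction, duration))
        elapsed += duration
    return schedule
```

Under these assumptions one pass through the sequence takes 20 seconds, so a 10-minute run repeats it 30 times. A real driver would replace each tuple with mouse drag events sent to the Google Earth window.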

End User Experience

A common metric for quantifying the end user experience in objective tests is the frame rate seen by the end user on the client display (which may differ from the frame rate that the application delivers to the frame buffer in the VM). We call this the "client frame rate" and use it as our end user experience metric, since it represents what the end user perceives as the frame rate of the application in a remote application scenario.

Client frame rate is the rate at which the end user's display updates in response to changing image content from the host. The client frame rate is therefore limited either by the server's ability to generate the content or by its ability to encode the content. In our tests, the server's ability to generate the content was never artificially limited (our VM has 4 vCPUs and an NVIDIA K2 GPU). A high client frame rate produces smooth transitions in a series of related scenes (such as a fast-moving object) and a visually pleasing experience, while the same scenes played at a lower frame rate make the transitions between consecutive scenes more abrupt, resulting in a poorer, perhaps even irritating, user experience.

[Figure 6 chart: normalized FPS (0% to 100%) under vSGA for OpenGL Soft Encoder, OpenGL HW Accelerator, DirectX Soft Encoder, and DirectX HW Accelerator. Data labels: 100.0, 98.0, 72.0, 54.0. Annotations: increase in client frame rate with PCoIP Hardware Accelerator, 39% for OpenGL and 80% for DirectX.]

FIGURE 6: NORMALIZED FRAME RATES WITH HW ACCELERATOR & SHARED GPU (Google Earth in OpenGL & DirectX modes)

Figure 6 shows the relative frame rates of the four scenarios, normalized to the frame rate of the OpenGL with Hardware Accelerator scenario. As the figure shows, the Hardware Accelerator increases the OpenGL client frame rate by 39% compared to the OpenGL client frame rate with the Software Encoder. With DirectX, the increase in client frame rate due to the Hardware Accelerator is much higher, at 80%. These results indicate that the CPU cycles freed up by the Hardware Accelerator were used to process more frames in a given time interval (thereby utilizing the GPU to a higher degree), which raised the client frame rate substantially. In our tests, this improvement in end user experience from a higher frame rate is more prominent in DirectX mode.
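The reported increases can be checked against the data labels in Figure 6. The worked equation below assumes the labels map to the bars as OpenGL Soft Encoder = 72.0, OpenGL HW Accelerator = 100.0, DirectX Soft Encoder = 54.0, and DirectX HW Accelerator = 98.0; this mapping is inferred from the normalization and the stated percentages, not given explicitly in the text.

```latex
\[
\text{increase} = \frac{F_{\text{HW}} - F_{\text{SW}}}{F_{\text{SW}}} \times 100\%
\]
\[
\text{OpenGL: } \frac{100.0 - 72.0}{72.0} \times 100\% \approx 38.9\%
\qquad
\text{DirectX: } \frac{98.0 - 54.0}{54.0} \times 100\% \approx 81.5\%
\]
```

The OpenGL value rounds to the reported 39%. The DirectX value is close to the reported 80%; the small gap presumably comes from rounding in the normalized data labels.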

CPU Reduction with the use of Hardware Accelerator

Figure 7 depicts the CPU usage of the desktop VM in the OpenGL and DirectX modes of the Google Earth workload.

[Figure 7 chart: CPU usage (%) under vSGA. OpenGL Soft Encoder 48.5, OpenGL HW Accelerator 33.4 (31% decrease); DirectX Soft Encoder 38.7, DirectX HW Accelerator 35.4 (9% decrease).]

FIGURE 7: CPU USAGE WITH HW ACCELERATOR & SOFT ENCODER (Google Earth in OpenGL & DirectX modes)

The desktop VM has four virtual CPUs, and ESXtop reports the overall CPU usage out of 400%. For easier interpretation, the CPU usage values were normalized to a 100% scale by dividing them by four. As Figure 7 shows, in OpenGL mode with vSGA, the Hardware Accelerator reduces CPU usage by about 31% (from 48.5% to 33.4%). This reduction allows more users or additional applications to share the same ESXi server without overloading it.

In DirectX mode with vSGA, the Hardware Accelerator reduces CPU usage by about 9%. As noted in the "End User Experience" discussion above, the increase in frame rate in this case is quite high, about 80%. This shows that more of the CPU cycles freed up by the Hardware Accelerator were spent on increasing the frame rate, thereby providing an enhanced user experience. The CPU offload benefit is thus partly traded for a significantly improved user experience, which is the appropriate trade-off for 3D applications.

In summary, with OpenGL, the Hardware Accelerator speeds up the application and significantly reduces CPU usage, while with DirectX, it speeds up the application even more, with a more modest CPU reduction as a result.
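The CPU normalization and the reduction percentages above can be reproduced with a few lines of arithmetic. This is a minimal sketch with hypothetical function names; the normalized values come from Figure 7, and the DirectX pair (38.7 and 35.4) is read from the figure labels rather than stated in the text.

```python
VCPUS = 4  # the desktop VM has four virtual CPUs


def normalize_cpu(esxtop_pct, vcpus=VCPUS):
    """Convert an ESXtop reading (out of vcpus * 100%) to a 0-100% scale."""
    return esxtop_pct / vcpus


def pct_reduction(soft_encoder, hw_accelerator):
    """Relative CPU reduction, in percent, versus the Software Encoder."""
    return (soft_encoder - hw_accelerator) / soft_encoder * 100


# Normalized CPU usage (%) from Figure 7: (Soft Encoder, HW Accelerator)
cpu_usage = {"OpenGL": (48.5, 33.4), "DirectX": (38.7, 35.4)}

for mode, (soft, hw) in cpu_usage.items():
    print(f"{mode}: {pct_reduction(soft, hw):.0f}% lower CPU usage")
```

Note that the percentages are relative reductions: in OpenGL mode the absolute drop is about 15 points on the 100% scale, which is roughly 31% of the Software Encoder baseline.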

Why Is the Hardware Accelerator an Essential Component in Virtualized 3D Graphics?

Figures 6 and 7 show independently the benefits of the Teradici PCoIP Hardware Accelerator in enhancing the user experience (in terms of client frame rate) and in reducing CPU usage. In this section, we combine the results presented in those figures and normalize them to make the Hardware Accelerator's benefits clearer.

Figure 8 was created by dividing the frame rate values in Figure 6 by the corresponding CPU usage values in Figure 7. This gives the frames per second (FPS) per unit of CPU usage for each scenario (i.e., OpenGL/DirectX with Soft 3D/vSGA and PCoIP Soft Encoder/Hardware Accelerator). These values were then separated into two groups, OpenGL and DirectX, normalized to the highest "FPS per CPU Usage" value within each group (by dividing each value by the highest value in its group), and expressed as a percentage. Figure 8 shows these normalized "FPS per CPU Usage" percentages for the OpenGL and DirectX groups with vSGA, with and without the Hardware Accelerator.

Since the CPU reduction and the frame rate increase happen at the same time, we use Figure 8 to show the normalized number of frames generated for a given amount of CPU. In both the OpenGL and DirectX scenarios, the Hardware Accelerator delivers nearly twice as many frames per second as the Software Encoder for the same amount of CPU. As discussed earlier, this significantly enhanced user experience (at a given level of CPU usage) is possible because the Hardware Accelerator frees up the CPU cycles previously used for the PCoIP encoding function, which can now be used for the application and rendering functions. These results show the importance of the Teradici PCoIP Hardware Accelerator as an essential component in supporting 3D graphics in a virtualized environment to achieve optimal CPU performance and user experience.

[Figure 8 chart: normalized frames per second per CPU usage (%) under vSGA for OpenGL Soft Encoder, OpenGL HW Accelerator, DirectX Soft Encoder, and DirectX HW Accelerator. Data labels: 100%, 100%, 50%, 51%. Annotation: 2X frames delivered per CPU utilization.]

FIGURE 8: NET IMPACT OF HARDWARE ACCELERATOR ON END USER EXPERIENCE

Figure 8 is a normalized comparison of the number of frames generated for a given amount of CPU, with and without the Hardware Accelerator.
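The construction of Figure 8 can be sketched from the values in Figures 6 and 7. The function name is hypothetical, and the mapping of Figure 6 data labels to scenarios (72.0 and 100.0 for OpenGL, 54.0 and 98.0 for DirectX) is an assumption inferred from the reported percentages.

```python
# Normalized client frame rates from Figure 6 (assumed label-to-bar mapping)
fps = {
    ("OpenGL", "soft"): 72.0, ("OpenGL", "hw"): 100.0,
    ("DirectX", "soft"): 54.0, ("DirectX", "hw"): 98.0,
}
# Normalized CPU usage (%) from Figure 7
cpu = {
    ("OpenGL", "soft"): 48.5, ("OpenGL", "hw"): 33.4,
    ("DirectX", "soft"): 38.7, ("DirectX", "hw"): 35.4,
}


def normalized_fps_per_cpu(fps, cpu):
    """Divide FPS by CPU usage, then scale each API group to its best value."""
    ratio = {key: fps[key] / cpu[key] for key in fps}
    result = {}
    for api in {a for a, _ in ratio}:
        best = max(v for (a, _), v in ratio.items() if a == api)
        for (a, encoder), v in ratio.items():
            if a == api:
                result[(a, encoder)] = v / best * 100
    return result


for key, value in sorted(normalized_fps_per_cpu(fps, cpu).items()):
    print(key, f"{value:.1f}%")
```

Under these assumptions the Hardware Accelerator bars are 100% by construction and both Software Encoder bars come out at roughly 50%, in line with the 50% and 51% labels in Figure 8 and the "nearly twice as many frames" statement. Small differences from the published labels are expected because the inputs here are already rounded.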

Concluding Remarks

In this paper, we have presented the results of our study on the performance of the Teradici PCoIP Hardware Accelerator in supporting 3D graphics together with a physical GPU in a VMware Horizon View environment. The study focused on vSGA mode, which allows multiple VMs to share the GPU resources in an ESXi server.

Our results show that the Teradici PCoIP Hardware Accelerator significantly enhances the end user experience by increasing the client frame rate while reducing the CPU required to deliver each frame to the user. The reduced CPU usage allows increased VM consolidation: more users can be supported on the same server before reaching a CPU utilization threshold that would affect all users sharing the server. Deferring an expensive server upgrade while providing a better end user experience translates to lower overall IT costs for an organization. More importantly, for demanding graphics users, the PCoIP Hardware Accelerator allows VMs to run on fewer, higher-frequency cores, which is the best configuration for 3D graphics applications. The Teradici PCoIP Hardware Accelerator is therefore an essential component for supporting 3D graphics in a virtualized environment and achieving optimal CPU performance and user experience.

Our preliminary tests in vDGA mode (which allows a desktop VM to use a physical GPU in an ESXi server exclusively) show even more promising results in lowering the CPU usage of 3D graphics applications. A future paper will include an in-depth study of the performance of the Teradici PCoIP Hardware Accelerator with 3D graphics in vDGA mode.

APPENDIX

The appendix contains additional details about the test environment, including the ESXi servers, the desktop VM, and the zero client.

Details of the ESXi Servers

                                      ESXi Server (Desktop VM)    ESXi Server (Management VMs)
Hardware/Model                        Dell R720                   SuperMicro X8DDT-H
Processor                             Intel Xeon E5-2660          Intel Xeon E5645
Speed                                 2.2 GHz                     2.4 GHz
Processor sockets                     2                           2
Cores per socket                      8                           6
Hyperthreading                        Enabled                     Enabled
Memory - Total                        262.098 GB                  98.294 GB
ESXi Version                          5.5.0, 1331820              5.5.0, 1331820
Nvidia ESXi5.5 Driver                 319.65                      Not Applicable
Teradici PCoIP Hardware Accelerator   SVN Rev. 26149              Not Applicable

Storage capacity (values as listed; the per-server split is not recoverable from the flattened table): 2.34 TB, 325.94 GB SSD, 1.36 TB

Details of Desktop VM and Zero Client

                             Desktop VM                           Zero Client
Windows version              Windows 7 - Ultimate SP1 (64-bit)    Not Applicable (N/A)
VMware View version          View Agent 5.3,1427931 (GA)          N/A
Number of vCPUs              4                                    N/A
Memory                       16 GB                                N/A
VM version                   vmx-09                               N/A
Teradici PCoIP Processor     N/A                                  TERA2321P
FW version                   N/A                                  R4.2,15160

© 2004-2014 Teradici Corporation. All rights reserved. Teradici and PCoIP are trademarks of Teradici Corporation and may be registered in the United States and/or other countries. All other trademarks are property of their respective owners. Specifications subject to change without notice. WP-1-140318
