FPGA Virtualization - University Of California, San Diego


FPGA Virtualization
For CSE291J Virtualization
Yizhou Shan
Feb 27, 2020

What is FPGA?

[Figure: programmable bits and LUT memory]
(Images from: Parallel Programming for FPGAs, http://kastner.ucsd.edu/hlsbook/)

Development Process
- High-level languages: C, Python, Scala (optional; you can start from HDL)
- HDL -> Synthesis (mins-hrs) -> Netlist -> Place and Route (P&R) (hrs-days) -> Bitstream
- Bitstream is like a binary

Use FPGA as an accelerator
- Image/Video Processing
- Machine Learning
  - DNN/CNN and more
  - A good alternative to GPUs
- Bio Analysis
- Network Acceleration (e.g., SmartNIC)
- Storage Acceleration (e.g., SmartSSD)
- Graph/KVS and more
You don’t really need to understand FPGA in order to use it
Recent language advances have boosted its adoption

Massive Deployment, Cloud FPGA
Microsoft Project Catapult
- Released in 2014 (internal, not public)
- Since then, it has been used to accelerate
  - Bing Search
  - Azure Network
  - Machine Learning
AWS, Alibaba, etc.
- Public cloud FPGA
- High-end Xilinx chips
- Large scale, low-cost, and fast dev
- Current model
  - Single user, no sharing
(Image from Catapult, ISCA’14)

Discussion: Is FPGA the future?
- Microsoft is betting big on FPGA
- Google and Amazon are leaning towards ASIC
- Can FPGA take over GPU or Google TPU or ASIC in the future?
Compared to FPGA: Cost, Energy, Dev, [...]
- [...]: lower, Depends, Lower
- ASIC, Google TPU: Lower, Lower, Slower, Depends, Lower
- ASIC, Amazon Nitro: Lower, Lower, Slower, Depends, Lower

Towards sharing cloud FPGAs
[Figure: VM 0, VM 1, VM 2, VM 3 using vFPGAs on top of a shell on a physical FPGA]
Reasons for sharing
- Customer: pay-as-you-go
- Vendor: consolidation
Strawman solution:
- vFPGA on top of a physical FPGA
Key technique: Partial Reconfiguration (PR)
- Change a part of a running FPGA design, e.g., update vFPGA0, without disturbing others
- Limitation: fixed slots -> fragmentation
- Resizing needs to reprogram the whole chip
But sharing needs more:
- Protection
- Elasticity
- Compatibility
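To make the fixed-slot limitation concrete, here is a minimal Python sketch of a chip pre-cut into equal PR slots. The class `FixedSlotFPGA`, the LUT counts, and the first-free placement policy are all hypothetical and only illustrate the fragmentation argument; they do not model any vendor's PR flow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FixedSlotFPGA:
    """Toy model (assumption): the chip is pre-cut into equal-size PR slots."""
    num_slots: int
    slot_luts: int  # capacity of one slot, in LUTs (illustrative unit)
    slots: List[Optional[str]] = field(init=False)
    used: Dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        self.slots = [None] * self.num_slots
        self.used = {}

    def place(self, name: str, luts: int) -> bool:
        """Put a vFPGA into the first empty slot; a design cannot span slots."""
        if luts > self.slot_luts:
            return False  # would need bigger slots, i.e. a whole-chip reprogram
        for i, owner in enumerate(self.slots):
            if owner is None:
                self.slots[i] = name
                self.used[name] = luts
                return True
        return False  # every slot is taken

    def wasted_luts(self) -> int:
        """Internal fragmentation: capacity reserved by occupied slots but unused."""
        return sum(self.slot_luts - luts for luts in self.used.values())

    def free_luts(self) -> int:
        """Capacity sitting in completely empty slots."""
        return self.slots.count(None) * self.slot_luts


fpga = FixedSlotFPGA(num_slots=4, slot_luts=100)
fpga.place("vFPGA0", 30)
fpga.place("vFPGA1", 40)
ok = fpga.place("vFPGA2", 150)  # rejected: larger than any single slot
print(ok, fpga.wasted_luts(), fpga.free_luts())
```

In this run 330 LUTs are idle (130 wasted inside occupied slots plus 200 in empty slots), yet the 150-LUT design is still rejected because no single slot can hold it.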

AmorphOS: enable cloud FPGA sharing
- High-level goals
  - Protection among untrusted FPGA applications
  - Dynamic Scaling, or Elasticity
  - Compatibility across vendors

AmorphOS
- A framework to efficiently share cloud FPGAs among untrusted users
  - A set of APIs
  - A way to partition the chip: Zone
  - A way to scale/package FPGA apps: Morphlet
  - A way to mix FPGA apps: High-throughput/Low-latency mode
  - A way to protect resources from untrusted FPGA applications: Hull
  - Finally, a way to deploy mixed apps onto the protected and partitioned chip: Registry

Zone and Morphlet
- Partition the chip into multi-level zones
  - Global zone for the whole chip
  - Then smaller sub-zones
- Morphlet
  - An instance of a user FPGA bitstream (like a container, and scalable)
  - It can morph, i.e., dynamically change resource requirements
  - How? E.g., change the array size N for int buf[N].
(Image from AmorphOS, OSDI’18)

Scheduling Morphlets
- Low Latency Mode
  - Fixed zones, PR
  - Default Morphlet bitstream
- High Throughput Mode
  - Combine multiple Morphlets
  - Co-schedule on a global zone
Low latency: switching is fast
High throughput: more area is used
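The two modes can be contrasted with a small Python sketch. The switching rule used here (stay in low-latency mode while every Morphlet fits its own fixed zone, otherwise combine Morphlets on the global zone) is an assumption made for illustration; the slides do not state the actual AmorphOS policy. The function `pick_mode` and all area numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_mode(
    demands: Dict[str, int],  # Morphlet name -> area of its default bitstream
    num_zones: int,           # number of fixed PR zones
    zone_area: int,           # area of one fixed zone
    chip_area: int,           # area of the global zone (whole chip)
) -> Tuple[str, List[str]]:
    """Return the chosen mode and the Morphlets that get scheduled."""
    fits_zones = len(demands) <= num_zones and all(
        area <= zone_area for area in demands.values()
    )
    if fits_zones:
        # Low latency: one Morphlet per fixed zone, switched via PR.
        return "low-latency", list(demands)

    # High throughput: pack Morphlets together onto the global zone,
    # in arrival order, until the chip area runs out.
    placed: List[str] = []
    used = 0
    for name, area in demands.items():
        if used + area <= chip_area:
            placed.append(name)
            used += area
    return "high-throughput", placed


print(pick_mode({"a": 20, "b": 30}, num_zones=2, zone_area=40, chip_area=100))
print(pick_mode({"a": 20, "b": 30, "c": 25}, num_zones=2, zone_area=40, chip_area=100))
```

With two zones, two Morphlets stay in low-latency mode; a third one forces the combined, whole-chip placement, which uses more area but requires a full reprogram to change.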

[Figure: mode switches labeled reprogram, PR, reprogram]
(Image from AmorphOS, OSDI’18)

Discussion: Scheduling Tasks in FPGA
- What are two common approaches?
- Can we do the same on FPGA?
Space sharing is easy.
Time sharing is not, esp. for preemptive time sharing
- It’s hard to read back the state; that is not what FPGAs are designed for
[Figure: an FPGA with a shell and a run queue; vFPGA0, vFPGA1, and vFPGA2 rotate between them at Time 0, Time 1, Time 2]

Protection
- Vendor Shells
  - Basic raw PCIe, memory controller IPs
  - Clocks, virtual LEDs, Pads, etc.
  - No sharing, protection, or multiplexing mechanism.
Why not just let users deploy those raw IPs?
1. Those are heavy lifting tasks
   a. A novice user can easily spend days/months on just setting up
2. Safety. Misconfiguration might harm the chip.
[Figure: AWS EC2 F1 shell; AWS F1 Shell with vFPGA0, vFPGA1, vFPGA2, vFPGA3 as customer logic]

Protection
- What needs to be protected?
  - Host/On-board DRAM
  - Other host PCIe devices
- AmorphOS Hull
  - Hardens and extends vendor shells
  - Isolation/Protection/Fairness
  - Interfaces
    - Control (CntrlReg)
    - Virtual Memory (AMI)
    - Data Transfer (PCIe)
- Basic idea
  - Mediate all IO requests
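The "mediate all IO requests" idea can be sketched in Python as a layer that checks every memory access before it reaches DRAM. The base-and-bound scheme, the class name `Hull`, and its methods are assumptions chosen for brevity; the slides do not describe how the AmorphOS hull or its AMI interface is actually built.

```python
from typing import Dict, Tuple


class Hull:
    """Toy mediation layer (assumption): base-and-bound checks per Morphlet."""

    def __init__(self, dram_size: int) -> None:
        self.dram = bytearray(dram_size)  # stands in for on-board DRAM
        self.segments: Dict[str, Tuple[int, int]] = {}  # name -> (base, size)
        self.next_base = 0

    def attach(self, morphlet: str, size: int) -> None:
        """Give a Morphlet a private DRAM segment."""
        if self.next_base + size > len(self.dram):
            raise MemoryError("not enough DRAM left")
        self.segments[morphlet] = (self.next_base, size)
        self.next_base += size

    def _translate(self, morphlet: str, vaddr: int, length: int) -> int:
        """Check bounds, then map a Morphlet-local address to a DRAM address."""
        base, size = self.segments[morphlet]
        if vaddr < 0 or length < 0 or vaddr + length > size:
            raise PermissionError(f"{morphlet}: access outside its segment")
        return base + vaddr

    def write(self, morphlet: str, vaddr: int, data: bytes) -> None:
        paddr = self._translate(morphlet, vaddr, len(data))
        self.dram[paddr:paddr + len(data)] = data

    def read(self, morphlet: str, vaddr: int, length: int) -> bytes:
        paddr = self._translate(morphlet, vaddr, length)
        return bytes(self.dram[paddr:paddr + length])


hull = Hull(dram_size=64)
hull.attach("vFPGA0", 32)
hull.attach("vFPGA1", 32)
hull.write("vFPGA0", 0, b"secret")
print(hull.read("vFPGA1", 0, 6))  # vFPGA1 sees only its own (zeroed) segment
```

Because user logic never issues a raw DRAM address, an untrusted Morphlet can neither read another tenant's segment nor step outside its own.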

Registry
Rationale
- Compiling takes time (hours to days)
- To switch, the next bitstream must be ready
- AmorphOS will precompile bitstreams and put them into a bitstream registry
For this particular case, we need 4 bitstreams!
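A bitstream registry can be pictured as a lookup table filled ahead of time. The Python sketch below keys each precompiled bitstream by the set of Morphlets it combines; `BitstreamRegistry`, `fake_compile`, and the key format are hypothetical, and the compile step is a stand-in for the hours-long synthesis and P&R run.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional


def fake_compile(combo: FrozenSet[str]) -> bytes:
    """Stand-in for synthesis + place and route (hours to days in reality)."""
    return ("bitstream:" + "+".join(sorted(combo))).encode()


class BitstreamRegistry:
    """Toy registry (assumption): one bitstream per combination of Morphlets."""

    def __init__(self, compile_fn: Callable[[FrozenSet[str]], bytes]) -> None:
        self._compile = compile_fn
        self._store: Dict[FrozenSet[str], bytes] = {}

    def precompile(self, combos: Iterable[Iterable[str]]) -> None:
        """Build every combination we expect to schedule, ahead of time."""
        for combo in combos:
            key = frozenset(combo)
            if key not in self._store:
                self._store[key] = self._compile(key)

    def lookup(self, combo: Iterable[str]) -> Optional[bytes]:
        """Return a ready bitstream, or None if it was never precompiled."""
        return self._store.get(frozenset(combo))

    def __len__(self) -> int:
        return len(self._store)


registry = BitstreamRegistry(fake_compile)
registry.precompile([["A"], ["B"], ["A", "B"]])
print(len(registry), registry.lookup(["B", "A"]), registry.lookup(["C"]))
```

A lookup miss means the scheduler cannot switch to that combination right away, which is why the combinations have to be enumerated and compiled before they are needed.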

Limitations of AmorphOS
- App Support
  - The key to AmorphOS’s success is its ability to right-size apps
  - To take advantage, apps must be written in a way that can scale
  - Thus, its solution is more about managing apps than managing the chip
- Virtualization Support
  - How to use host resources in a virtualization-enabled node? E.g., with an IOMMU in place.
- Runtime
  - The hull protection is static and lacks a dynamic runtime management part
  - A SW-programmer friendly interface: e.g., malloc/free, read file, etc.

Advancement in this field
ViTAL, ASPLOS’20
A framework that is able to partition any FPGA application, and to partition the FPGA into identical blocks.
Thus it overcomes the app support limitation, and also has a finer-grained scheduling unit.
(No app modification needed)
(Image from ViTAL, ASPLOS’20)

Advancement in this field
Optimus, ASPLOS’20
Deals with DMA and the IOMMU.
Essentially implements an IOTLB and an IO Page Table Walker in the FPGA
(Image from Optimus, ASPLOS’20)
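For readers unfamiliar with the two components named above, here is a generic software simulation in Python of an IOTLB in front of a page table walker. It is not Optimus's hardware design: the two-level table, 4 KiB pages, the LRU replacement policy, and the class name `ToyIOMMU` are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Dict

PAGE_SHIFT = 12   # 4 KiB pages (assumption)
LEVEL_BITS = 10   # index bits per page table level (assumption)
LEVEL_MASK = (1 << LEVEL_BITS) - 1


class ToyIOMMU:
    """Two-level IO page table with a small LRU IOTLB in front of it."""

    def __init__(self, tlb_entries: int = 4) -> None:
        self.root: Dict[int, Dict[int, int]] = {}  # L1 index -> L2 table
        self.tlb = OrderedDict()                   # vpn -> pfn, in LRU order
        self.tlb_entries = tlb_entries
        self.walks = 0                             # page table walks performed

    def map(self, iova: int, paddr: int) -> None:
        """Install a page mapping from an IO virtual to a physical address."""
        vpn = iova >> PAGE_SHIFT
        self.root.setdefault(vpn >> LEVEL_BITS, {})[vpn & LEVEL_MASK] = (
            paddr >> PAGE_SHIFT
        )

    def _walk(self, vpn: int) -> int:
        """Page table walker: follow both levels, fault if unmapped."""
        self.walks += 1
        try:
            return self.root[vpn >> LEVEL_BITS][vpn & LEVEL_MASK]
        except KeyError:
            raise PermissionError(f"IO page fault at vpn {vpn:#x}") from None

    def translate(self, iova: int) -> int:
        """Translate a DMA address, consulting the IOTLB before walking."""
        vpn, offset = iova >> PAGE_SHIFT, iova & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)         # hit: refresh LRU position
            pfn = self.tlb[vpn]
        else:
            pfn = self._walk(vpn)             # miss: walk, then cache
            self.tlb[vpn] = pfn
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict least recently used
        return (pfn << PAGE_SHIFT) | offset


mmu = ToyIOMMU()
mmu.map(0x10000, 0x80000000)
print(hex(mmu.translate(0x10123)), hex(mmu.translate(0x10456)), mmu.walks)
```

The second translation hits the IOTLB, so only one walk is counted; a DMA to an unmapped address raises a fault instead of reaching host memory.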

Summary
- FPGA is massively deployed in the cloud
- Shared cloud FPGA is still in its infancy
- A lot of exciting research is going on, trying to improve the model

Thank you.
Questions?
