
RapidWright¹: Enabling Custom Crafted Implementations for FPGAs
Chris Lavin & Alireza Kaviani
FCCM - May 1, 2018
¹ Wright: maker or builder

FPGA Implementation Tools Lead a Challenging Life
– Highest QoR possible!!
– Highest utilization possible!!
– For every imaginable design!!
– 10X faster compile time!!
– Oh, and make it easy to use for software programmers
Copyright 2018 Xilinx

Xilinx Silicon Capability
[Chart: Architecture Performance¹ in MHz (axis 0 to 800) for 7 Series (28nm), UltraScale (20nm), and UltraScale+ (16nm); series: Logic & DSP, BRAM]
¹ Speed grade: -2

What is RapidWright?
Companion framework for Vivado
– Communicates through DCPs
– Fast, light-weight, open source
– Java, Python support
Enables targeted solutions
– Reuse & relocate pre-implemented modules
– Systematic shells & overlays
– Generate on-the-fly implementations
Academic ecosystem
– Algorithm validation
– Rapid prototyping of CAD concepts
[Diagram: Vivado flow synth_design → opt_design → place_design → phys_opt_design → route_design → phys_opt_design, with a .DCP after each step; a Checkpoint Reader and Checkpoint Writer connect the .DCPs to the 'Secret Sauce' (your code here)]

RapidWright Framework
[Diagram labels: DESIGN MODEL (Netlist), ARCH MODEL (Device), APPLICATIONS, PART GENERATION, Vivado (.dcp), Labs Vivado (.xdd), XDD Parser, Device Files Creator (.dat)]

Modular Pre-Implemented Methodology
1. Restructure Design (USER COMPLETED)
2. Packing & Placement Planning (USER ASSISTED)
3. Stitch, Place & Route Implementation (TOOL AUTOMATED)
[Diagram labels: Design with MEM0 and MEM1 blocks, Vivado OOC Flow, IP Cache*, RapidWright]

Creating Pre-implemented Modules (Vivado OOC Flow)
[Diagram: OOC Synthesis → PBlock Generator (or user supplied) → OOC Place & Route → IP Cache]

Step 3: RapidWright Pre-implemented Module Flow
[Diagram labels: IPI Design, Vivado OOC Flow, IP Cache*, Impl Guide File, IPI Design Parser, Block Stitcher, …ck Placer, Fully Placed, Partially …]
* IP Cache augmented w/ RapidWright

Design Results¹

Design         Device    Baseline   (Step 1)        (Step 3)        Total
Seismic        …         …          …               …               …
FMA            …         …          …               …Hz (53%)       54%
GEMM           KU115*    391MHz     437MHz (10%)    462MHz (6%)     16%
Heart of Gold  ZU9EG*    368MHz     569MHz (55%)    541MHz (-5%)    50%

* Constrained area of the …

Design         …             …             DSP Utilization   BRAM Utilization
Seismic        93% (226k)    5% (26k)      -                 -
FMA            25% (166k)    50% (656k)    97% (5360)        6% (130)
GEMM           19% (64k)     20% (134k)    87% (2400)        -
Heart of Gold  46% (30k)     29% (38k)     42% (272)         96% (208)

¹ Speed grade: -2

2. FMA Design
GOAL
– Highest compute (TeraOp/s) possible
– 16-bit fused multiply accumulate
KU115 Implementation
– 1340 kernel instances (4x10 CLEs, 1x4 DSPs each)
– 97% DSP utilization
– 4.4 TeraOp/s
"Fabric discontinuities"
– SLR boundary
– IO columns
– Depopulated CLEs (SLR crossing)

RUN                            FMAX (MHz)
Baseline (initial design)      270
Vivado (restructured design)   273 (+1%)
Pre-implemented Flow           417 (+53%)

[Figure: FMA KERNEL]
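The KU115 figures on this slide can be cross-checked with simple arithmetic. The sketch below is not the authors' calculation. It assumes each DSP contributes two operations per cycle (one multiply, one accumulate) and that the KU115 has 5,520 DSP slices; neither number appears on the slide.

```python
# Cross-check of the FMA slide figures (a sketch under stated assumptions).
KERNELS = 1340              # from the slide
DSPS_PER_KERNEL = 4         # from the slide: 1x4 DSPs per kernel
FMAX_MHZ = 417              # from the slide: pre-implemented flow
OPS_PER_DSP_PER_CYCLE = 2   # ASSUMPTION: one multiply plus one accumulate
KU115_TOTAL_DSPS = 5520     # ASSUMPTION: device DSP slice count


def fma_summary(kernels=KERNELS, dsps_per_kernel=DSPS_PER_KERNEL,
                fmax_mhz=FMAX_MHZ, ops_per_dsp=OPS_PER_DSP_PER_CYCLE,
                device_dsps=KU115_TOTAL_DSPS):
    """Return DSP count, DSP utilization (0..1), and peak TeraOp/s."""
    dsps = kernels * dsps_per_kernel
    utilization = dsps / device_dsps
    teraops = dsps * ops_per_dsp * fmax_mhz * 1e6 / 1e12
    return {"dsps": dsps, "utilization": utilization, "teraops": teraops}


if __name__ == "__main__":
    s = fma_summary()
    print(f"DSPs used:   {s['dsps']}")
    print(f"Utilization: {s['utilization']:.1%}")
    print(f"Peak:        {s['teraops']:.2f} TeraOp/s")
```

Under these assumptions the DSP count matches the 5360 in the utilization table, and the utilization and throughput land close to the slide's 97% and 4.4 TeraOp/s.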

Re-locatability & Multiple …
[Figure labels: …l#5, IO]

Crossing IOs: AXI Stream Register Slices
Strategically placed register slices to cross long distances
– Cross chip IO columns
Latency insertion/connectivity is easily automated

Debug Productivity
Leverage unused resources to place blocks and route …
[Diagram labels: routed.dcp, …eneration, Debug Probe Router, Debug Block Cache, instrumented.dcp]

RapidWright vs. Vivado for Debug Instrumentation
[Chart: runtime in minutes (axis 0.0 to 45.0), Baseline vs. RapidWright Debug, for dsp1 (9% CLBs), 10g (10% CLBs), dsp2 (20% CLBs), sparc (31% CLBs), and 21ch (70% CLBs); speedup labels: 35x, 24x, 12x, 97x, 33x]

Fully Placed and Routed Designs in Seconds

Design             RapidWright Flow   Vivado        Speedup
microblazeDesign   12.5 seconds       232 seconds   19x
fpFIRFilter        18.6 seconds       183 seconds   10x
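The speedup column follows directly from the two runtime columns. A minimal sketch that recomputes it from the slide's numbers (the function and dictionary names are illustrative only):

```python
# Recompute the speedup column from the runtimes quoted on the slide.
RUNTIMES_SECONDS = {
    # design: (RapidWright flow, Vivado)
    "microblazeDesign": (12.5, 232.0),
    "fpFIRFilter": (18.6, 183.0),
}


def speedup(vivado_seconds, rapidwright_seconds):
    """Ratio of Vivado runtime to RapidWright runtime."""
    if rapidwright_seconds <= 0:
        raise ValueError("runtime must be positive")
    return vivado_seconds / rapidwright_seconds


if __name__ == "__main__":
    for design, (rw, vivado) in RUNTIMES_SECONDS.items():
        print(f"{design}: {speedup(vivado, rw):.2f}x")
```

Rounded to whole numbers, the ratios agree with the 19x and 10x shown on the slide.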

On-the-fly, Pre-implemented Module Generators (Demo)
Example: x^2 + 3*x - 5
Build modules on-demand
– Placed and routed in seconds
– Reusable and compose-able
– Target spec performance
Parameterizable Generators
– Adder
– Subtractor
– Multiplier
Polynomial Solution Generator
– Runs at spec 775MHz (UltraScale+, SG2)
– Constructed on-the-fly in seconds
– Still in development
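The idea of composing a polynomial from parameterizable adder, subtractor, and multiplier generators can be pictured in plain software. The sketch below is only an analogy. It does not use the RapidWright API, every function name is hypothetical, and it assumes the demo polynomial is x^2 + 3*x - 5 evaluated with fixed-width wraparound arithmetic.

```python
# Software analogy for composing a polynomial from parameterizable
# operator generators. All names are hypothetical; nothing here is
# RapidWright API or produces a placed and routed module.

def make_adder(width):
    """Return a width-bit adder that wraps on overflow."""
    mask = (1 << width) - 1
    return lambda a, b: (a + b) & mask


def make_subtractor(width):
    """Return a width-bit subtractor (two's complement wraparound)."""
    mask = (1 << width) - 1
    return lambda a, b: (a - b) & mask


def make_multiplier(width):
    """Return a width-bit multiplier that keeps the low bits."""
    mask = (1 << width) - 1
    return lambda a, b: (a * b) & mask


def make_polynomial(width):
    """Compose x^2 + 3*x - 5 (ASSUMED polynomial) from the three operators."""
    add = make_adder(width)
    sub = make_subtractor(width)
    mul = make_multiplier(width)

    def poly(x):
        x_squared = mul(x, x)          # x * x
        three_x = mul(3, x)            # 3 * x
        return sub(add(x_squared, three_x), 5)

    return poly


if __name__ == "__main__":
    poly16 = make_polynomial(16)
    print(poly16(4))   # 16 + 12 - 5
```

Each generator takes the bit width as its parameter, mirroring how a hardware generator would be specialized before its instances are stitched together.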

AWS F1 - LinkBlaze Shell Floorplan
Goal: Achieve Spec Performance
– 775MHz on UltraScale+, SG2
– Minimize overhead of overlays/shells
LinkBlaze [1]: Data movement soft NoC
– 128 bit, bi-directional
– Modular design
– Pre-implemented modules
– Captures high performance implementation
Challenge: Crossing SLR
– Solved using two techniques in RapidWright
– Custom clocking of leaf clock buffer and delay tuning
– Custom clock root routing per SLR crossing
[Floorplan labels: DDR, LB, AXI, SLR0, SLR1, AWS Shell (DDR & PCIe)]
[1] LinkBlaze: Efficient global data movement for FPGAs (ReConFig 2017)

SLR Crossing Solution – Clocking Techniques
Leaf Clock Buffers (LCBs)
– Custom route RX/TX to same LCB
– Tune LCB delay to avoid hold issues
Inter-SLR Compensation
– 15% tax of clk delay between root and RX flop
– Minimize by custom creating clock roots per SLR
[Figure labels: Custom Routed Clock Root, Add Extra Delay to Capture Clock, LCB & Custom Clk Route, TX, RX, SLL, Vivado 2017.3, … MHz]

RapidWright SLR Crossing DCP Generator (Demo)
SLR crossing module from scratch
– Parameterizable
– Closes timing at 760MHz (Clk period: 1.313ns)
– Routed clock, placed and routed
– Runs in seconds

SLR Crossing DCP Generator
This RapidWright program creates a placed and routed DCP that can be imported into UltraScale+ designs to aid in high speed SLR crossings. See RapidWright documentation for more information.

Option   Description
------   -----------
-?, -h   Print Help
-a       [String: Clk input net name] (default: clk_in)
-b       [String: Clock BUFGCE site name] (default: BUFGCE_X0Y218)
-c       [String: Clk net name] (default: clk)
-d       [String: Design Name] (default: slr_crosser)
-i       [String: Input bus name prefix] (default: input)
-l       [String: Comma separated list of Laguna sites for each SLR crossing] (default: LAGUNA…)
-n       [String: North bus name suffix] (default: north)
-o       [String: Output DCP File Name] (default: slr_crosser.dcp)
-p       [String: UltraScale+ Part Name] (default: xcvu9p-flgc2104-2-i)
-q       [String: Output bus name prefix] (default: output)
-r       [String: INT clk Laguna RX flops] (default: GCLK_B_0_1)
-s       [String: South bus name suffix] (default: south)
-t       [String: INT clk Laguna TX flops] (default: GCLK_B_0_0)
-u       [String: Clk output net name] (default: clk_out)
-v       [Boolean: Print verbose output] (default: true)
-w       [Integer: SLR crossing bus width] (default: 512)
-x       [Double: Clk period constraint (ns)] (default: 1.538)
-y       [String: BUFGCE cell instance name] (default: BUFGCE_inst)
-z       [Boolean: Use common centroid] (default: false)
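The slide quotes both clock periods (1.313ns, and the 1.538ns default for -x) and frequencies (760MHz, and the 775MHz spec target mentioned earlier). A small helper for moving between the two is sketched here; the function names are illustrative and not part of the generator.

```python
# Convert between clock period (ns) and frequency (MHz).
# Helper names are illustrative; they are not part of RapidWright.

def period_ns_to_mhz(period_ns):
    """Frequency in MHz for a clock period given in nanoseconds."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ns


def mhz_to_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    print(f"1.313 ns -> {period_ns_to_mhz(1.313):.1f} MHz")  # demo period
    print(f"1.538 ns -> {period_ns_to_mhz(1.538):.1f} MHz")  # -x default
    print(f"775 MHz  -> {mhz_to_period_ns(775):.3f} ns")     # spec target
```

The 1.313ns period corresponds to just over 760MHz, consistent with the slide, while the default 1.538ns constraint is a more relaxed target of roughly 650MHz.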

Vision: Pre-implemented Modules
[Chart: COMPLEXITY / DIFFICULTY vs. PROBLEM SIZE, positioning Vivado-optimized OOC Solutions, Parameterizable Circuit Generators, and Algorithmic Engines (SAT Solvers, ILP, …)]

Takeaways
RapidWright enables customized solutions
– Relocate & replicate pre-implemented modules
– On-the-fly circuit generators
– Leverage algorithmic engines (SAT Solvers, ILP, …)
Modular pre-implemented methodology
– Up to 50% performance improvement
– 10X productivity gains
– Near-spec performance (94% of spec)
www.rapidwright.io
– Open source -- try it out today
– Documentation, tutorials, source code, and demos

www.rapidwright.io
