C-to-Verilog : High-Level Synthesis Using LLVM

C-to-Verilog.com: High-Level Synthesis Using LLVM
Nadav Rotem, Haifa University

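Computing tradeoffs
- Different kinds of computational problems
- Different kinds of architectural solutions

[Figure: a spectrum of platforms. Python and Java run on CPUs; C/C++ and OpenCL target multicore CPUs, GPUs, and DSPs; Verilog and VHDL target FPGAs and ASICs. CPUs are easy to program and flexible, but slow; FPGAs and ASICs offer performance and power efficiency, but are difficult to program.]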

Introduction to High-Level Synthesis

Hardware description languages
- Complex digital systems are made of basic logic elements (AND, NOT, FF, etc.)
- Designers use Hardware Description Languages to describe logic blocks
- HDLs use a C-like syntax to express hardware

module toplevel(clock, reset, out);
  input clock;
  input reset;
  output reg out;
  reg flop1;
  reg flop2;

  always @ (posedge reset or posedge clock)
    if (reset) begin
      flop1 <= 0;
      flop2 <= 1;
    end else begin
      flop1 <= flop2;
      flop2 <= flop1;
      out <= flop2 + flop1;
    end
endmodule

The problem with HDL
- Need to 'think' hardware
- All parts of the circuit operate at the same time
- Explicit notion of time (clock, synchronization)
- Explicit notion of space (size and connectivity of components)
- Extremely long compilation cycle
- Difficult to develop and verify
- Details, details, details
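The point that all parts of the circuit operate at the same time can be made concrete with a small C model of the two registers in the toplevel module from the previous slide. This is only an illustrative sketch: the struct and function names are hypothetical, and the model ignores the out register. It contrasts Verilog's non-blocking (<=) update, where both flops swap on every clock edge, with a naive sequential reading of the same two statements.

```c
#include <stdbool.h>

/* Hypothetical C model of the two registers in the toplevel module. */
struct regs {
    bool flop1;
    bool flop2;
};

/* The reset branch: flop1 <= 0; flop2 <= 1; */
static struct regs reset_regs(void) {
    struct regs r = { .flop1 = false, .flop2 = true };
    return r;
}

/* One rising clock edge with non-blocking (<=) semantics: every right-hand
   side reads the values from before the edge, then all registers update
   together, so the two flops swap. */
static struct regs clock_edge(struct regs cur) {
    struct regs next;
    next.flop1 = cur.flop2;
    next.flop2 = cur.flop1;
    return next;
}

/* The same two statements read as sequential C: the second statement sees
   the value the first one just wrote, so nothing is swapped. */
static struct regs sequential_reading(struct regs r) {
    r.flop1 = r.flop2;
    r.flop2 = r.flop1;
    return r;
}
```

Starting from reset (flop1 = 0, flop2 = 1), one clock_edge gives flop1 = 1 and flop2 = 0, while sequential_reading leaves both at 1. Thinking "in software" about this circuit gives the wrong answer.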

High-level synthesis
- HLS: compilation of high-level languages, such as C, to Hardware Description Languages
- Easier to write and test code in C
- Use a subset of C:
  - No I/O, recursion, jump by value, etc.
  - Standard hardware/software interface

while (true) {
  out = val1 + val2;
  ...
}

[Figure: this C loop next to the Verilog toplevel module shown earlier, the kind of HDL an HLS tool produces from it.]

C-to-Verilog.com
- LLVM-based high-level synthesis system
- Developed as a graduate research project
- The website is a web interface for the synthesis system
- Free, open, etc.

High-Level Synthesis using LLVM

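HLS using LLVM
- Use Clang and LLVM to parse and optimize the code
- Special LLVM passes optimize the code at the IR level
- An HLS backend synthesizes the code to Verilog

[Figure: the flow from Clang through the LLVM optimization passes to the Verilog backend, which emits Verilog.]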

HLS backend for LLVM

Simple High-Level Synthesis
- It is trivial to compile sequential C-like code to HDL
- A state machine can represent the original code
- We can create a state for each opcode
- Example:

entry:
  %A = add i32 %B, 5
  %C = icmp eq i32 %A, 0
  br i1 %C, label %next, label %entry

case (state)
  ST0: begin
    A <= B + 5;
    state <= ST1;
  end
  ST1: begin
    C <= (A == 0);
    state <= ST2;
  end
  ST2: begin
    if (C)
      state <= ST9;
    else
      state <= ST0;
  end
  ...
endcase
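The cost of this one-state-per-opcode translation is easy to see by simulating it in C: every state does the work of one IR instruction and occupies one clock cycle. This is a sketch under assumptions: the state names follow the slide, ST9 is taken to be the first state of block %next, and run_fsm is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical state encoding: one state per IR instruction of block
   %entry; ST9 stands in for the first state of block %next. */
enum state { ST0, ST1, ST2, ST9 };

/* Runs the one-state-per-opcode machine cycle by cycle, starting in ST0,
   and returns how many clock cycles pass before it reaches ST9. The loop
   in the example never changes %B, so the machine spins forever unless
   B + 5 == 0; max_cycles bounds the simulation. */
static unsigned run_fsm(int32_t B, unsigned max_cycles) {
    enum state state = ST0;
    int32_t A = 0;
    int C = 0;
    unsigned cycles = 0;
    while (state != ST9 && cycles < max_cycles) {
        switch (state) {
        case ST0: /* %A = add i32 %B, 5 */
            A = B + 5;
            state = ST1;
            break;
        case ST1: /* %C = icmp eq i32 %A, 0 */
            C = (A == 0);
            state = ST2;
            break;
        case ST2: /* br i1 %C, label %next, label %entry */
            state = C ? ST9 : ST0;
            break;
        default:
            break;
        }
        cycles++; /* each state occupies one clock cycle */
    }
    return cycles;
}
```

With B = -5 the machine leaves the block after 3 cycles, one per instruction, even though the add and the compare could in principle share a cycle. That gap is what the optimizations on the next slides target.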

High-level synthesis challenges
- The simple state-machine translation is inefficient
- We want to optimize for:
  - Fast designs (few clock cycles to complete)
  - High frequency (short clock period)
  - Size and resource efficiency (few gates, few memory ports)
  - Low power

Scheduling pipelined resources
- Generally, in HLS, resources can be synthesized as needed:
  - unlimited registers, arithmetic operations, etc.
- Some resources are limited and need to be shared:
  - external memory ports
  - ASIC multipliers (for FPGA synthesis)
- Hardware resources are often pipelined to reach high frequencies:
  - multiplier: 5 stages, memory: 2 stages, etc.
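The benefit of a pipelined unit shows up in a back-of-the-envelope count. The sketch below assumes a fully pipelined unit that accepts a new operation every cycle and a batch of independent operations (no data dependencies between them); the function names are hypothetical.

```c
/* Cycles until the last of n independent operations finishes on a single
   fully pipelined unit with the given number of stages: a new operation
   issues every cycle, so the last one issues at cycle n - 1 and its result
   is ready `stages` cycles later. */
static unsigned pipelined_cycles(unsigned n, unsigned stages) {
    return n == 0 ? 0 : (n - 1) + stages;
}

/* The same work on an unpipelined unit that must finish one operation
   before starting the next. */
static unsigned unpipelined_cycles(unsigned n, unsigned stages) {
    return n * stages;
}
```

With the 5-stage multiplier from the slide, 10 independent multiplications finish after 14 cycles when pipelined, versus 50 without pipelining. A single multiplication still takes 5 cycles either way, which is why the scheduler has to keep the pipeline fed.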

List Scheduling
- Schedule a single basic block
- Convert a DDG (data dependence graph) into a [Time x Resource] table
- Requirements:
  - Preserve DDG dependencies
  - Expose parallelism
  - Conserve resources; use pipelined resources
- After scheduling, HDL syntax generation is simple

Example
- Multiplier: 3 cycles
- Load/Store: 2 cycles
- Other: 1 cycle

[Figure: a DDG of LoadA, LoadB, LoadC, three Mults, Add, Xor, StoreD, and StoreE. Executed one operation at a time (bad), the block takes 21 cycles; list-scheduled into a [Time x Resource] table with Mem, Mult, and Other columns, it takes 13 cycles.]
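The slide's DDG edges cannot be fully recovered from the text, so the sketch below uses an invented graph with the same ten operations and the slide's latencies (multiplier 3, load/store 2, other 1). It assumes one fully pipelined unit of each kind and uses the longest latency path to the end of the block as the priority, a common list-scheduling heuristic but not necessarily the one C-to-Verilog uses. All names (ddg, list_schedule, and so on) are hypothetical.

```c
/* Latencies from the example (cycles). */
enum unit { MEM, MUL, ALU, NUM_UNITS };
static const int latency[NUM_UNITS] = { 2, 3, 1 };
/* Assumed: one fully pipelined unit of each kind, accepting one new op per cycle. */
static const int capacity[NUM_UNITS] = { 1, 1, 1 };

#define N_OPS 10
#define MAX_PREDS 2

struct op {
    const char *name;
    enum unit unit;
    int npreds;
    int preds[MAX_PREDS]; /* indices of the ops this one depends on */
};

/* Hypothetical DDG: same operation mix as the slide, invented edges,
   listed in topological order (every pred has a smaller index). */
static const struct op ddg[N_OPS] = {
    { "LoadA",  MEM, 0, { 0 } },
    { "LoadB",  MEM, 0, { 0 } },
    { "LoadC",  MEM, 0, { 0 } },
    { "Mult1",  MUL, 2, { 0, 1 } }, /* A * B */
    { "Mult2",  MUL, 2, { 1, 2 } }, /* B * C */
    { "Mult3",  MUL, 2, { 0, 2 } }, /* A * C */
    { "Add",    ALU, 2, { 3, 4 } }, /* Mult1 + Mult2 */
    { "Xor",    ALU, 2, { 5, 6 } }, /* Mult3 ^ Add */
    { "StoreD", MEM, 1, { 6 } },    /* store Add */
    { "StoreE", MEM, 1, { 7 } },    /* store Xor */
};

/* Priority: latency of the longest path from each op to the end of the block. */
static void compute_heights(int height[N_OPS]) {
    for (int i = N_OPS - 1; i >= 0; i--) {
        int best = 0;
        for (int j = i + 1; j < N_OPS; j++)
            for (int k = 0; k < ddg[j].npreds; k++)
                if (ddg[j].preds[k] == i && height[j] > best)
                    best = height[j];
        height[i] = latency[ddg[i].unit] + best;
    }
}

/* An op is ready once every predecessor has issued and its latency has elapsed. */
static int is_ready(const int start[N_OPS], int i, int cycle) {
    for (int k = 0; k < ddg[i].npreds; k++) {
        int p = ddg[i].preds[k];
        if (start[p] < 0 || start[p] + latency[ddg[p].unit] > cycle)
            return 0;
    }
    return 1;
}

/* Cycle by cycle, repeatedly issue the ready op with the greatest height whose
   unit still has a free issue slot this cycle (ties go to the lower index).
   Fills start[] with issue cycles and returns the schedule length in cycles. */
static int list_schedule(int start[N_OPS]) {
    int height[N_OPS];
    compute_heights(height);
    for (int i = 0; i < N_OPS; i++)
        start[i] = -1;

    int scheduled = 0, length = 0;
    for (int cycle = 0; scheduled < N_OPS; cycle++) {
        int used[NUM_UNITS] = { 0 };
        for (;;) {
            int pick = -1;
            for (int i = 0; i < N_OPS; i++) {
                if (start[i] >= 0 || used[ddg[i].unit] >= capacity[ddg[i].unit])
                    continue;
                if (is_ready(start, i, cycle) && (pick < 0 || height[i] > height[pick]))
                    pick = i;
            }
            if (pick < 0)
                break;
            start[pick] = cycle;
            used[ddg[pick].unit]++;
            scheduled++;
            if (cycle + latency[ddg[pick].unit] > length)
                length = cycle + latency[ddg[pick].unit];
        }
    }
    return length;
}

/* Baseline: run the ops one after another with no overlap. */
static int sequential_length(void) {
    int total = 0;
    for (int i = 0; i < N_OPS; i++)
        total += latency[ddg[i].unit];
    return total;
}
```

For this made-up graph, running the ops back to back takes 21 cycles, the same total as the slide's bad schedule because the operation mix is the same. list_schedule overlaps the three loads and three multiplications in the pipelined units (issue cycles 0 to 5) and finishes in 11 cycles. The slide's own graph, with its own edges, needs 13.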

IR Optimizations for hardware

Reduce-bitwidth-opt
- CPUs have fixed-width execution units
- In hardware synthesis, arithmetic operations are synthesized into circuits
- Narrower arithmetic operations translate into smaller circuits, which operate at higher frequencies

Reduce-bitwidth-opt
- Reduce the bit width in several cases:
  1. Detect local bit-reducing patterns (masks)
  2. Reduce constant integers to the lowest bit width
  3. Use the smallest possible arithmetic operation based on the input widths
- A simple LLVM pass

Examples:
  1. Y = X & 0xFF → Y is i8
  2. i32 0x5 → i3
  3. i8 + i8 → i9; i4 * i4 → i8
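The three cases can be written down as small width rules. This is a sketch assuming unsigned (zero-extended) values; the helper names are hypothetical, and the actual pass works on LLVM IR rather than on plain integers.

```c
#include <stdint.h>

/* Case 2: bits needed to hold an unsigned constant (at least 1). */
static unsigned const_width(uint64_t c) {
    unsigned w = 1;
    while (c >>= 1)
        w++;
    return w;
}

/* Case 1: Y = X & mask is no wider than the mask's highest set bit
   (or than X itself, if X is narrower). */
static unsigned and_mask_width(unsigned x_width, uint64_t mask) {
    unsigned m = const_width(mask);
    return m < x_width ? m : x_width;
}

/* Case 3: an unsigned add needs one extra bit for the carry out of the wider input. */
static unsigned add_width(unsigned a, unsigned b) {
    return (a > b ? a : b) + 1;
}

/* Case 3: an unsigned product of a-bit and b-bit values fits in a + b bits. */
static unsigned mul_width(unsigned a, unsigned b) {
    return a + b;
}
```

With these rules, const_width(0x5) is 3, and_mask_width(32, 0xFF) is 8, add_width(8, 8) is 9, and mul_width(4, 4) is 8, matching the examples above.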

Arithmetic tree height reduction
[Figure: the same product computed as a long chain of multiplications (long dependency chain, low parallelism) and as a balanced tree (short dependency chain, high parallelism).]

Arithmetic tree height reduction
- Simple LLVM pass
- Collect long chains of arithmetic operations and balance them
- Only for commutative (and associative) arithmetic and logic operations
- Not suitable for software, where the number of registers is limited
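Balancing shortens the dependency chain from n - 1 adders to about log2(n) levels. The sketch below, with hypothetical function names, sums an array pairwise, level by level, and reports the depth. It relies on addition being associative and commutative, which holds for unsigned integer addition in C (wrap-around included) but not, for example, for floating-point addition.

```c
/* Left-to-right chain ((v0 + v1) + v2) + ...: n - 1 dependent adders. */
static unsigned chain_depth(unsigned n) {
    return n ? n - 1 : 0;
}

/* Balanced reduction: adds neighbours pairwise, one tree level per pass,
   overwriting v[] in place. Returns the sum and stores the number of
   adder levels (the dependency-chain length) in *depth. */
static unsigned tree_sum(unsigned *v, unsigned n, unsigned *depth) {
    unsigned levels = 0;
    while (n > 1) {
        unsigned half = n / 2;
        for (unsigned i = 0; i < half; i++)
            v[i] = v[2 * i] + v[2 * i + 1];
        if (n % 2) {            /* an odd element passes through to the next level */
            v[half] = v[n - 1];
            half++;
        }
        n = half;
        levels++;
    }
    *depth = levels;
    return n ? v[0] : 0;
}
```

For 32 inputs the chain is 31 adders deep while the tree is only 5 levels deep. The tree still uses 31 adders in total but exposes up to 16 of them in parallel. In hardware that parallelism costs little, whereas on a CPU the extra live values compete for a limited register file.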

HLS flow example

Pop-count example

// count the number of 1's in a word
unsigned popCnt(unsigned input) {
  unsigned sum = 0;
  for (unsigned i = 0; i < 32; i++) {
    sum += (input) & 1;
    input = input / 2;
  }
  return sum;
}

Pop-count
- Runtime:
  - The program has a loop, which executes 32 times.
- Size:
  - Has several 32-bit registers.
  - Has control-flow logic.
- Frequency:
  - Has 32-bit adders with long carry chains.
- IR-level optimizations can be very beneficial

Pop-count
- First, we let LLVM unroll and optimize the loop

// count the number of 1's in a word
unsigned popCnt(unsigned input) {
  unsigned sum = 0;
  sum += (input >> 0) & 1;
  sum += (input >> 1) & 1;
  sum += (input >> 2) & 1;
  ...
  return sum;
}

Pop-count
- Next, we balance the long chain of adders into a tree of 31 additions

Pop-count

// count the number of 1's in a word
unsigned popCnt(unsigned input) {
  unsigned sum0, sum1, sum2, sum3, ..., sum30;
  unsigned t0, t1, t2, ..., t31;
  t0 = (input >> 0) & 1;
  t1 = (input >> 1) & 1;
  t2 = (input >> 2) & 1;
  ...
  sum0 = t0 + t1;
  sum1 = t2 + t3;
  ...
  sum30 = sum29 + sum28;
  return sum30;
}
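The balanced version can be checked against the original loop. In this C sketch, loops stand in for the fully unrolled straight-line code above, and the function names popCnt_loop and popCnt_tree are hypothetical. The tree version also counts its additions to confirm that it uses 31 of them over five levels.

```c
#include <stdint.h>

/* The original loop, used as the reference. */
static unsigned popCnt_loop(uint32_t input) {
    unsigned sum = 0;
    for (unsigned i = 0; i < 32; i++) {
        sum += input & 1;
        input = input / 2;
    }
    return sum;
}

/* Unrolled and balanced: extract the 32 bits t0..t31, then add them in a
   tree of 16 + 8 + 4 + 2 + 1 = 31 additions, five levels deep. Returns the
   count; *adds receives the number of additions performed. */
static unsigned popCnt_tree(uint32_t input, unsigned *adds) {
    unsigned t[32];
    for (unsigned i = 0; i < 32; i++)
        t[i] = (input >> i) & 1;        /* t_i = (input >> i) & 1 */
    unsigned n = 32, count = 0;
    while (n > 1) {                      /* one level of the adder tree */
        for (unsigned i = 0; i < n / 2; i++) {
            t[i] = t[2 * i] + t[2 * i + 1];
            count++;
        }
        n /= 2;                          /* 32 is a power of two, so no odd leftovers */
    }
    *adds = count;
    return t[0];
}
```

For example, popCnt_tree(0xDEADBEEF, &adds) returns 24, the same as the loop, with adds set to 31.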

Pop-count
- Finally, we reduce the bit width of each operation:
  - & 1 results in a 1-bit value
  - Addition of 1-bit inputs → 2-bit result
  - Addition of 2-bit inputs → 3-bit result
  - ...
  - Addition of 5-bit inputs → 6-bit result

Pop-count

// count the number of 1's in a word
unsigned popCnt(unsigned input) {
  uint1_t t0, t1, ...;
  uint2_t sum0, sum1, sum2, sum3, ...;
  uint3_t sum16, ..., sum23;
  ...
  uint6_t sum30;
  t0 = (input >> 0) & 1;
  ...
  sum0 = t0 + t1;
  ...
  sum30 = sum28 + sum29;
  return sum30;
}
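The widths follow from the maximum value at each level: the outputs of & 1 are at most 1, and each level adds two values from the level below, so the maximum doubles (1, 2, 4, ..., 32) and a value at level l needs l + 1 bits. The uint1_t to uint6_t names above are not standard C types; they stand for narrow hardware values, so this sketch only computes the widths. The function names are hypothetical.

```c
/* Smallest number of bits that can hold the unsigned value v (at least 1). */
static unsigned bits_for(unsigned long v) {
    unsigned w = 1;
    while (v >>= 1)
        w++;
    return w;
}

/* For the 32-input pop-count tree: level 0 values come from "& 1" (at most
   1); each level adds two values from the level below, so the maximum
   doubles. widths[l] receives the bits needed at level l (0..5); returns
   the width of the final sum. */
static unsigned popcount_level_widths(unsigned widths[6]) {
    unsigned long max = 1;
    for (unsigned l = 0; l <= 5; l++) {
        widths[l] = bits_for(max);
        max *= 2;
    }
    return widths[5];
}
```

The computed widths are 1, 2, 3, 4, 5, and 6 bits, so the widest adder in the tree is 6 bits instead of 32.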

Pop-count
- Finally, we pass the hardware-optimized IR to the backend for scheduling and syntax generation
- Small operations can be scheduled into a single clock cycle

[Figure: the schedule across cycles 0, 1, and 2; each & 1 is only a wire, and several small additions share a clock cycle.]

Pop-count
- IR-level optimizations are very beneficial:
  - Size: fewer and smaller registers, no control flow
  - Frequency: smaller arithmetic operations (32 bits → 6 bits)
  - Cycles: fewer cycles (32 → 4)

Conclusion
- High-level synthesis automates circuit design
- LLVM is an invaluable tool when developing an HLS compiler
- An HLS compiler is made of IR-level optimization passes and a scheduling backend

Questions?

