# FPGA Tool-flows: **CASPER** and Beyond

Wesley New SKA-SA wesley@ska.ac.za




Department: Science and Technology REPUBLIC OF SOUTH AFRICA



















- SKA-SA
- DBE Team: Digital Back End
- Real-time data processing
- Use FPGA-based hardware
- ROACH Board
- CASPER Collaboration





- Provide an overview of FPGA technologies
- Introduce the CASPER Collaboration
- Examine the CASPER FPGA Design-flow
- Introductions to the tutorials
- Discuss the future of FPGA design-flows and how CASPER and SKA can take advantage of these

#### **Background: FPGAs**

- FPGA: Field Programmable Gate Array
- Effectively a reconfigurable semiconductor
- Consists of Logic Elements and Interconnects
- Well suited to parallel DSP computing
- Contain Hard Cores
- Getting progressively more complex
- Design for FPGAs using Hardware Description Languages (HDLs): Verilog and VHDL
- There is a move towards higher-level design
- CPU, GPU, FPGA, ASIC

# **FPGA: Logic and Interconnects**



SRAM cells throughout the FPGA determine the functionality of the device
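To make the role of the SRAM configuration cells concrete, the sketch below models a single 4-input lookup table (LUT) in plain Python. The class name `Lut4` and the 16-bit configuration words are illustrative assumptions, not taken from any vendor's tools. The point is only that the same logic element computes a different function when different bits are loaded into its configuration memory.

```python
class Lut4:
    """Toy model of a 4-input LUT: 16 SRAM bits hold one output bit per input pattern."""

    def __init__(self, config_bits):
        # config_bits is a 16-bit integer; bit i is the output for input pattern i.
        self.config_bits = config_bits & 0xFFFF

    def evaluate(self, a, b, c, d):
        # The four inputs form an address into the 16-entry truth table.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.config_bits >> index) & 1


# Loading different configuration bits turns the same element into different gates.
and4 = Lut4(0x8000)   # only input pattern 1111 produces 1
xor4 = Lut4(0x6996)   # parity (XOR) of the four inputs
```

Reprogramming the device amounts to writing new values into many such tables and into the cells that control the interconnect.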



- 2 major players, Xilinx and Altera, plus a few smaller ones
- Each provides its own software for designing for its FPGAs
- Xilinx ISE/Vivado
- Altera Quartus
- Both offer plug-ins for Simulink to take advantage of block diagram style design and simulation
- Complexities of porting designs between vendors
- IP specific to a vendor






# HDL (HW Description Lang)

- 2 major languages: Verilog and VHDL
- Verilog has a more C-like syntax
- HDLs use an event-driven methodology
- Generally, the values of registers change on the edges of clocks
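The event-driven idea can be illustrated with a small Python model of a clocked register. This is a sketch only: the `Register` class and its `on_clock_event` method are hypothetical names used for illustration, not part of any HDL or simulator. The register reacts to changes of the clock and captures its input only on a rising edge.

```python
class Register:
    """Toy D-type register: q takes the value of d only on a rising clock edge."""

    def __init__(self):
        self.d = 0
        self.q = 0
        self._last_clk = 0

    def on_clock_event(self, clk):
        # Called whenever the clock signal changes (the "event").
        if self._last_clk == 0 and clk == 1:   # rising edge detected
            self.q = self.d
        self._last_clk = clk


reg = Register()
reg.d = 5
reg.on_clock_event(0)          # no edge yet: q keeps its reset value
value_before_edge = reg.q
reg.on_clock_event(1)          # rising edge: q captures d
value_after_edge = reg.q
reg.d = 9
reg.on_clock_event(0)          # falling edge: q holds its value
value_after_fall = reg.q
```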

# **CPU v GPU v FPGA v ASIC**

- Tradeoffs, tradeoffs, tradeoffs
- ASICs: long development time, hard to change, high NRE (non-recurring engineering) costs, run at higher speeds
- FPGAs: short development time, expensive per unit, poor at floating-point, highly reconfigurable
- GPUs: easier to design for, good at floating-point, high power consumption
- CPUs: very general purpose, easy to develop for, lower performance
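The cost side of the FPGA versus ASIC trade-off can be sketched as a break-even calculation: total cost is the one-off NRE cost plus the per-unit cost times the volume. The function names and all figures below are hypothetical placeholders chosen for easy arithmetic, not real prices.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost of producing `volume` devices."""
    return nre + unit_cost * volume


def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume above which the ASIC becomes cheaper overall than the FPGA."""
    if unit_fpga <= unit_asic:
        raise ValueError("expected the FPGA to cost more per unit than the ASIC")
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


# Hypothetical figures for illustration only.
volume = break_even_volume(nre_asic=1_000_000, unit_asic=10,
                           nre_fpga=0, unit_fpga=510)
asic_total = total_cost(1_000_000, 10, volume)
fpga_total = total_cost(0, 510, volume)
```

Below the break-even volume the FPGA is the cheaper option; above it the ASIC's high NRE cost is amortised over enough units to win.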

#### **FPGA vs ASIC Cost Per Unit**



## HDL vs Traditional SW

- HDL is very susceptible to bad coding
- This can make designs use more power and resources
- The way of thinking when writing in an HDL is very different from software
- HDL statements are concurrent
- Statements written one after another are executed simultaneously as opposed to sequentially
- For loops are also executed in parallel: they are unrolled into replicated hardware
- This is a huge change to the traditional programming mindset
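The difference in mindset can be shown with a shift register written two ways in Python. This is an illustrative sketch with hypothetical function names: the first function follows ordinary software semantics, the second mimics what the equivalent HDL describes, where every register samples the old state on the same clock edge.

```python
def software_style(regs, new_value):
    """Sequential semantics: each assignment sees the result of the previous one."""
    regs = list(regs)
    regs[0] = new_value
    for i in range(1, len(regs)):
        regs[i] = regs[i - 1]      # regs[i - 1] was already overwritten
    return regs


def hardware_style(regs, new_value):
    """Concurrent semantics: all registers update together from the OLD state."""
    old = list(regs)
    return [new_value] + old[:-1]  # the loop is effectively unrolled into parallel registers


state = [1, 2, 3, 4]
sw = software_style(state, 9)      # the new value floods the whole chain
hw = hardware_style(state, 9)      # every value moves along by exactly one stage
```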





- Synthesis: translate HDL to gates and optimise
- Map: map the gates to the resources of the FPGA
- Place and Route: place the output of the Map stage on the FPGA and route the connections
- Timing analysis: run through the design and check that the timing constraints are met
- Bitfile generation: create the file to upload to the FPGA
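The timing analysis step can be pictured as a slack calculation: a path meets timing if its delay fits within one clock period. The sketch below is a deliberate simplification (it ignores setup time, clock skew and similar effects), and the function names and path delays are hypothetical.

```python
def clock_period_ns(clock_mhz):
    """Clock period in nanoseconds for a clock rate in MHz."""
    return 1000.0 / clock_mhz


def timing_slack_ns(clock_mhz, path_delay_ns):
    """Positive slack means the path settles before the next clock edge."""
    return clock_period_ns(clock_mhz) - path_delay_ns


def meets_timing(clock_mhz, path_delays_ns):
    """The design passes only if every path has non-negative slack."""
    return all(timing_slack_ns(clock_mhz, d) >= 0 for d in path_delays_ns)


# Hypothetical path delays for a 200 MHz clock (5 ns period).
passing = meets_timing(200, [3.2, 4.1, 4.9])
failing = meets_timing(200, [3.2, 4.1, 5.3])
```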



- CASPER: Collaboration for Astronomy Signal Processing and Electronics Research
- Open-source philosophy
- Provides a series of FPGA-based hardware: IBOB, BEE2, ROACH1, ROACH2 and many ADCs
- Provides a design-flow for developing applications for this hardware
- Provides a space in which to share and collaborate in astronomy instrumentation design
- Members from all around the world



#### **ROACH2** Architecture








\*Complex multiply allows for fine delay control and per-channel digital gain control. White coloured blocks not yet implemented.

## **CASPER/MSSGE Toolflow**

- Matlab
- Simulink
- Xilinx System Generator
- Xilinx EDK
- CASPER Libraries, framework and base projects

## Matlab/Simulink

- Simulink provides an environment for block diagram design
- Provides blocks to aid in simulating designs
- Such as Signal Generators and Scopes
- It is also possible to use the Matlab language to aid in simulation
- For example to generate inputs, compare outputs and verify the simulation



# XSG (Xilinx System Generator)

- Plugs into Simulink
- Provides the Xilinx blockset to use in the Simulink environment
- Simulation models are also provided
- Lets the designer target a particular FPGA chip to tailor the blocks for best performance
- Generates a netlist of the whole DSP design
- This is then pulled into a base project and connected up appropriately





- Pcores controllers
- Used as part of the glue logic to pull aspects of the design together
- Manages the bus infrastructure

## **Typical FPGA Design**



## Simple CASPER Design



System Generator

## **Design Configuration**



Block Parameters: XSG core config

The XSG Core Config block is used to configure the System Generator design. It configures the Xilinx System Generator block parameters automatically. It needs to be at the top level of all designs being compiled with the casper_xps toolflow.

| Parameter                | Value         |
|--------------------------|---------------|
| Hardware Platform        | ROACH2:sx475t |
| User IP Clock Source     | sys_clk       |
| User IP Clock Rate (MHz) | 400           |
| Sample Period            | 1             |
| Synthesis Tool           | XST           |

#### **CASPER: Base Designs**

- Each hardware platform supported by the CASPER tools has a base design
- Applications (DSP designs) get pulled into the project
- This manages clocking infrastructure
- Control bus infrastructure
- And configures constraints

## **CASPER: DSP Libraries**

- Green Blocks
- Each block has a mask script
- This redraws the underlying block when a parameter is changed
- This allows a huge amount of flexibility when designing a block

#### **PFB FIR**

Block diagram: pfb\_fir\_real (taps=4, add\_latency=1), with inputs sync and pol1\_in1 to pol1\_in4, and outputs sync\_out and pol1\_out1 to pol1\_out4.



Function Block Parameters: pfb_fir_real (mask)

- Fold adders into DSPs: causes adders to be absorbed into DSP blocks (supported in Virtex5)
- Adder implementation: cores using Fabric or DSP48 or behavioral HDL

| Parameter                           | Value                           |
|-------------------------------------|---------------------------------|
| Size of PFB (2^? pnts)              | 12                              |
| Total Number of Taps                | 4                               |
| Windowing Function                  | hamming                         |
| Number of Simultaneous Inputs (2^?) | 2                               |
| Make Biplex                         | 0                               |
| Input Bitwidth                      | 8                               |
| Output Bitwidth                     | 18                              |
| Coefficient Bitwidth                | (not visible in the screenshot) |

# **CASPER: Controller Libraries**

- Yellow Blocks
- Any block that interacts with peripherals
- These are scripted to pull in the correct core for the hardware
- Registers accessible from the CPU, DRAM controllers, ADC controllers

## **Controller Libraries**





#### **Complex Design**



## **Simulation is Key**

- Simulink provides the ability to simulate designs
- This is one of the most important features of the tools
- Unfortunately bit-wise simulation can take days to complete
- Forced to simulate smaller sections of the design and test their integration on the FPGA

## **Tut1 Bit Simulation**



Figure: simulation waveform for the symmetric\_fir test bench (symmetric\_fir\_tb), showing the signals clk, reset, clk\_enable, x\_in, h\_in1 to h\_in4, ce\_out, y\_out and delayed\_x\_out alongside the reference signals y\_out\_ref and delayed\_x\_out\_ref.

#### **CASPER Success**

- Model driven development approach
- Easy to use
- Abstracts the application designer away from the low-level technical aspects of FPGAs, so that they can focus on the application
- Collaboration, open-source

# **Future of FPGA Design-flows**

- Model driven development approach is key
- One click compile solutions
- Easy migration of designs from one hardware platform to another
- High-level Languages used for FPGA design
- Mathworks HDL Coder, MyHDL (Python), C to Gates, Migen

## **Ideal Simulation**



- Bitwise simulation takes a long time
- Need the ability to simulate parts of the design and then use a higher level simulation to verify the design as a whole
- Different levels of simulation
- Bit-wise
- Functional Verification
- Co-simulation
- Simulation models for each module in the design
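The gap between functional verification and bit-wise simulation can be sketched with a small FIR filter modelled twice in Python: once in floating point and once with integer coefficients and integer arithmetic only. The function names, the tap values and the choice of 8 fractional bits are hypothetical, picked purely for illustration.

```python
def fir_float(samples, coeffs):
    """Functional model: floating-point FIR, quick to write and run."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def fir_fixed(samples, coeffs, frac_bits):
    """Bit-accurate style model: quantised coefficients, integer arithmetic only."""
    scale = 1 << frac_bits
    int_coeffs = [round(c * scale) for c in coeffs]
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(int_coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)            # result carries frac_bits fractional bits
    return out


samples = [0, 10, -20, 30, 127, -128, 5, 0]   # 8-bit style integer inputs
coeffs = [0.1, 0.4, 0.4, 0.1]                 # hypothetical symmetric taps
reference = fir_float(samples, coeffs)
fixed = fir_fixed(samples, coeffs, frac_bits=8)
worst_error = max(abs(f / 256 - r) for f, r in zip(fixed, reference))
```

The functional model checks the algorithm, while the fixed-point model shows the quantisation error that the hardware would actually introduce.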





- One click compile solutions
- Easy migration of applications from one hardware platform to another





- Target independent synthesizable Verilog and VHDL, but...
- Convert Mealy and Moore state charts to HDL
- Convert Matlab code to HDL
- Provides automatic pipelining
- Provides resource estimations
- Integration between design documents and design
- Can target custom boards

#### **HDL Coder**






# HDL Coder and CASPER

- Can integrate the current mask scripts, which redraw the CASPER blocks, with the HDL Coder blockset
- Ability to simulate or verify
- Support for custom boards

#### **HDL Coder**







What is MyHDL?

MyHDL is an open-source Python package that enables Python to be used as a hardware description language. It does this by means of the Python Generator and Decorator functionality. MyHDL code can be converted to either Verilog or VHDL and then implemented onto silicon.

The power of Python is that it provides a high level design language and the ability to simulate the design using other Python packages such as NumPy and SciPy.



#### How does MyHDL Model Hardware in a Functional/O-O Language?

- Concurrency
- A = B
- B = A
- Generators provide an elegant solution for modelling concurrency
- A generator is a resumable function
- Instead of return we use yield
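The idea of using resumable functions to model concurrency can be sketched in plain Python, without MyHDL itself. The `Sig` class, the `assign` process and the small scheduling loop below are hypothetical stand-ins, written only to show how the concurrent pair A = B, B = A swaps the two values on every clock edge instead of making them equal.

```python
class Sig:
    """Minimal signal: reads see `val`, writes go to `next` until committed."""

    def __init__(self, val):
        self.val = val
        self.next = val

    def commit(self):
        self.val = self.next


def assign(dst, src):
    """Process modelling the concurrent statement `dst = src`."""
    while True:
        dst.next = src.val   # read the current value, schedule the new one
        yield                # suspend here until the next clock edge


a, b = Sig(1), Sig(2)
processes = [assign(a, b), assign(b, a)]   # A = B and B = A, running "at once"

history = []
for _ in range(3):                         # three clock edges
    for p in processes:
        next(p)                            # resume each generator up to its yield
    for s in (a, b):
        s.commit()                         # all signals update together
    history.append((a.val, b.val))
```

Because each generator keeps its place between calls, every process behaves like an independent piece of hardware that wakes up on each edge.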





- Enables Python to be used as a high-level modelling language.
- Converts Python to HDL
- Can be used to wrap existing HDL code
- Provides the ability to simulate and model designs
- Allows OO concepts to be used in hardware design, e.g. bus objects



# **MyHDL Architecture**

- Python conversion to HDL
- Using MyHDL to wrap HDL modules
- Modelling the HDL modules in Python
- Able to use ngc files, HDL and Python in one design



## **MyHDL Example**

```
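from myhdl import Signal, always, intbv

# Function wrapper and port list reconstructed from the Verilog instantiation
# of bram_sync_dp; the original listing showed only the body.
def bram_sync_dp(rst, a_clk, a_wr, a_addr, a_data_in, a_data_out,
                 b_clk, b_wr, b_addr, b_data_in, b_data_out,
                 RAM_DATA_WIDTH, RAM_ADDR_WIDTH):
    # One signal per memory word, shared by both ports
    mem = [Signal(intbv(0)[RAM_DATA_WIDTH:]) for i in range(2**RAM_ADDR_WIDTH)]

    @always(a_clk.posedge)
    def a_logic():
        if rst:
            a_data_out.next = 0
        else:
            a_data_out.next = mem[a_addr.val]
            if a_wr:
                mem[a_addr.val].next = a_data_in.val  # non-blocking write, as in the Verilog

    @always(b_clk.posedge)
    def b_logic():
        if rst:
            b_data_out.next = 0
        else:
            b_data_out.next = mem[b_addr.val]
            if b_wr:
                mem[b_addr.val].next = b_data_in.val  # non-blocking write, as in the Verilog

    return a_logic, b_logic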
```

#### **MyHDL Example**

```
localparam RAM_DATA_DEPTH = 2**RAM_ADDR_WIDTH; // depth
reg [RAM_DATA_WIDTH-1:0] mem [RAM_DATA_DEPTH-1:0];

// Port A
always @(posedge a_clk) begin
   if (`ifdef ACTIVE_LOW_RST !rst `else rst `endif)
      a_data_out <= {RAM_DATA_WIDTH{1'b0}};
   else begin
      a_data_out <= mem[a_addr];
      if (a_wr) begin
         mem[a_addr] <= a_data_in;
      end
   end
end

// Port B
always @(posedge b_clk) begin
   if (`ifdef ACTIVE_LOW_RST !rst `else rst `endif)
      b_data_out <= {RAM_DATA_WIDTH{1'b0}};
   else begin
      b_data_out <= mem[b_addr];
      if (b_wr) begin
         mem[b_addr] <= b_data_in;
      end
   end
end
```



## **MyHDL Example**

```
# BRAM Verilog instantiation
bram_sync_dp_wrapper.verilog_code = \
"""
bram_sync_dp #(
  .RAM_DATA_WIDTH ($RAM_DATA_WIDTH),
  .RAM_ADDR_WIDTH ($RAM_ADDR_WIDTH)
) bram_sync_dp_$block_name (
  .rst ($rst),
  .a_clk ($a_clk),
  .a_wr ($a_wr),
  .a_addr ($a_addr),
  .a_data_in ($a_data_in),
  .a_data_out ($a_data_out),
  .b_clk ($b_clk),
  .b_wr ($b_wr),
  .b_addr ($b_addr),
  .b_data_in ($b_data_in),
  .b_data_out ($b_data_out)
);
"""
```

# **MyHDL Architecture**

**Levels of Flexibility** 

- Parameterized modules
- Generate Statements
- Precompiler Directives
- Python Scripting (Redrawing)

#### **Levels of Simulation**

- Functional Verification
- Co-Simulation
- Bit-accurate Simulation (via third-party software)







- CASPER is a successful design flow
- But we need to keep up with the latest technologies and software
- MyHDL has some great methodologies but lacks a block diagram design environment
- HDL Coder is a very good product and would provide most of the features we need, although it isn't cheap