DigiKey-emag- Edge AI&ML-Vol-10

Why and how to use Efinix FPGAs for AI/ML imaging – Part 2: Image capture and processing

cost through system integration.

Implementation

As discussed in Part 1, the Efinix architecture is unique in that it uses eXchangeable Logic and Routing (XLR) cells to provide both routing and logic functionality. A video system such as the reference design is both logic heavy and routing heavy: extensive logic is required to implement the image processing features, and extensive routing is needed to connect the IP cells at the required frequencies. The reference design uses approximately 42% of the XLR cells within the device, leaving ample room for additions, including custom applications such as edge ML.

Usage of the block RAM and digital signal processing (DSP) blocks is also very efficient: the design uses only 4 of the 640 DSP blocks and 40% of the memory blocks (Table 1).

| Core Resources | Used / Available |
|---|---|
| Inputs | 1264 / 3706 |
| Outputs | 1725 / 4655 |
| XLRs | 73587 / 172800 |
| Memory Blocks | 508 / 1280 |
| DSP Blocks | 4 / 640 |

Table 1: Resource allocation on the Efinix architecture shows only 42% of the XLR cells are used, leaving ample room for additional processes. Image source: Adam Taylor

At the device I/O, the DDR interface for the LPDDR4x provides the application memory for the Sapphire SoC and the image frame buffers. All of the device's dedicated MIPI resources are used, along with 50% of the phase-locked loops (PLLs) (Table 2). The general-purpose I/O (GPIO) provides the I²C communications along with several of the interfaces connected to the Sapphire SoC, including NOR flash, USB UART, and SD card. The HSIO provides the high-speed video output to the ADV7511 HDMI transmitter.

| Periphery Resource | Used / Available |
|---|---|
| DDR | 1 / 1 |
| GPIO | 22 / 27 |
| HSIO | 20.0 / 59 |
| JTAG User TAP | 1 / 4 |
| MIPI RX | 4 / 4 |
| MIPI TX | 4 / 4 |
| Oscillator | 0 / 1 |
| PLL | 4 / 8 |

Table 2: Snapshot of the interface and I/O resources used. Image source: Adam Taylor

One of the crucial elements when designing with FPGAs is not only implementing and fitting the design within the FPGA, but also being able to place the logic and achieve the required timing performance when routed. Long gone are the days of single-clock-domain FPGA designs. The Ti180 reference design has several different clocks, all running at high frequencies. The final timing table shows the maximum frequencies achieved for the clocks within the system. The requested timing performance can also be seen in the constraints (Figure 5), which have a maximum clock frequency of 148.5 megahertz (MHz) for the HDMI output clock.

Figure 5: Clock constraints for the reference design. Image source: Adam Taylor

Timing implementation against the constraints shows the potential of the Titanium FPGA XLR structure, as it reduces the possible routing delay, thereby increasing design performance (Table 3).

| Timing | Value |
|---|---|
| Worst Negative Slack (WNS) | 0.182 ns |
| Worst Hold Slack (WHS) | 0.026 ns |
| i_pixel_clk | 211.909 MHz |
| tx_escclk | 261.370 MHz |
| i_pixel_clk_tx | 210.881 MHz |
| i_sys_clk | 755.858 MHz |
| i_axi0_mem_clk | 130.429 MHz |
| i_sys_clk_25mhz | 234.577 MHz |
| i_soc_clk | 187.231 MHz |
| i_hdmi_clk | 233.918 MHz |
| mipi_dphy_rx_inst1_WORD_CLKOUT_HS | 273.973 MHz |
| mipi_dphy_rx_inst2_WORD_CLKOUT_HS | 262.881 MHz |
| mipi_dphy_rx_inst3_WORD_CLKOUT_HS | 204.290 MHz |
| mipi_dphy_rx_inst4_WORD_CLKOUT_HS | 207.598 MHz |
| mipi_dphy_tx_inst1_SLOWCLK | 201.979 MHz |
| mipi_dphy_tx_inst2_SLOWCLK | 191.865 MHz |
| mipi_dphy_tx_inst3_SLOWCLK | 165.235 MHz |
| mipi_dphy_tx_inst4_SLOWCLK | 160.823 MHz |
| jtag_inst1_TCK | 180.505 MHz |

Table 3: Timing implementation against the constraints shows the potential of the Titanium FPGA XLR structure to reduce the possible routing delay, thereby increasing design performance. Image source: Adam Taylor

Conclusion

The Ti180 M484 reference design clearly showcases the capabilities of Efinix FPGAs and the Ti180 in particular. The design leverages several of the unique I/O structures to implement a complex image processing path that supports several incoming MIPI streams. This image processing system operates under the control of a soft-core Sapphire SoC, which implements the necessary sequential processing elements of the application.

One of the key benefits of reference designs is that they can be used to kickstart application development on custom hardware, enabling developers to take critical elements of the design and build on them with their own customizations. This includes the ability to use Efinix’s TinyML flow to implement vision-based TinyML applications running on the FPGA. Such applications can leverage both the parallel nature of FPGA logic and the ability to easily add custom instructions to RISC-V processors, allowing the creation of accelerators within the FPGA logic.
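As a quick sanity check on the figures quoted in Tables 1 to 3, the following Python sketch recomputes the utilization percentages and compares the requested HDMI clock period with the period implied by the reported maximum frequency. It is an illustration only: the dictionary and function names are hypothetical, the numbers are copied from the tables, and it assumes that `i_hdmi_clk` is the clock carrying the 148.5 MHz HDMI constraint.

```python
# Resource figures copied from Tables 1 and 2 as (used, available).
# All names below are illustrative and not part of any Efinix tool.
CORE = {
    "Inputs": (1264, 3706),
    "Outputs": (1725, 4655),
    "XLRs": (73587, 172800),
    "Memory Blocks": (508, 1280),
    "DSP Blocks": (4, 640),
}
PERIPHERY = {
    "DDR": (1, 1),
    "GPIO": (22, 27),
    "HSIO": (20, 59),
    "JTAG User TAP": (1, 4),
    "MIPI RX": (4, 4),
    "MIPI TX": (4, 4),
    "Oscillator": (0, 1),
    "PLL": (4, 8),
}

# Assumption: the 148.5 MHz constraint mentioned in the text applies to i_hdmi_clk.
HDMI_CONSTRAINT_MHZ = 148.5
HDMI_ACHIEVED_MHZ = 233.918  # i_hdmi_clk row in Table 3


def utilization_pct(used, available):
    """Return resource utilization as a percentage."""
    return 100.0 * used / available


def period_ns(freq_mhz):
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1000.0 / freq_mhz


def period_margin_ns(constraint_mhz, achieved_mhz):
    """Requested period minus the minimum period implied by the achieved Fmax."""
    return period_ns(constraint_mhz) - period_ns(achieved_mhz)


if __name__ == "__main__":
    for table in (CORE, PERIPHERY):
        for name, (used, available) in table.items():
            print(f"{name:14s} {utilization_pct(used, available):6.1f} %")
    margin = period_margin_ns(HDMI_CONSTRAINT_MHZ, HDMI_ACHIEVED_MHZ)
    print(f"HDMI clock period margin: {margin:.3f} ns")
```

Run as a script, this prints one utilization line per resource. The XLR figure comes out at about 42.6%, in line with the roughly 42% quoted in the article, and memory block usage at about 39.7%. The period margin is a per-clock comparison and is not the same quantity as the worst negative slack reported by the tools, which is evaluated across all constrained paths.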
