Why and how to use Efinix FPGAs for AI/ML imaging – Part 2: Image capture and processing
cost through system integration.

One of the key benefits of reference designs is that they can be used to kickstart application development on custom hardware, enabling developers to take critical elements of the design and build on them with the customizations they need. This includes the ability to use Efinix’s TinyML flow to implement vision-based TinyML applications running on the FPGA. This can leverage both the parallel nature of FPGA logic and the ability to easily add custom instructions to RISC-V processors, allowing the creation of accelerators within the FPGA logic.

Implementation

As discussed in Part 1, the Efinix architecture is unique in that it uses eXchangeable Logic and Routing (XLR) cells to provide both routing and logic functionality. A video system such as the reference design is both logic and routing heavy: extensive logic is required to implement the image processing features, and extensive routing is needed to connect the IP cells at the required frequencies. The reference design uses approximately 42% of the XLR cells within the device, leaving ample room for additions, including custom applications such as edge ML.

Usage of the block RAM and digital signal processing (DSP) blocks is also very efficient, using only 4 of the 640 DSP blocks and 40% of the memory blocks (Table 1).

At the device I/O, the DDR interface for the LPDDR4x is used to provide the application memory for the Sapphire SoC and the image frame buffers. All of the device-dedicated MIPI resources are utilized, along with 50% of the phase-locked loops (PLLs) (Table 2). The general-purpose I/O (GPIO) is used to provide the I²C communications along with several of the interfaces connected to the Sapphire SoC, including NOR flash, USB UART, and SD card. The high-speed I/O (HSIO) is used to provide the high-speed video output to the ADV7511 HDMI transmitter.

One of the crucial elements when designing with FPGAs is not only fitting the design within the FPGA, but also placing the logic and achieving the required timing performance once the design is routed. Long gone are the days of single-clock-domain FPGA designs. There are several different clocks, all running at high frequencies, in the Ti180 reference design. The final timing table shows the maximum frequencies achieved for the clocks within the system. The requested timing performance can be seen in the constraints (Figure 5), which have a maximum clock frequency of 148.5 megahertz (MHz) for the HDMI output clock. Timing implementation against the constraints shows the potential of the Titanium FPGA XLR structure as it reduces the possible routing delay, thereby increasing design performance (Table 3).

Conclusion

The Ti180 M484 reference design clearly showcases the capabilities of Efinix FPGAs and the Ti180 in particular. The design leverages several of the unique I/O structures to implement a complex image processing path that supports several incoming MIPI streams. This image processing system operates under the control of a soft-core Sapphire SoC, which implements the necessary sequential processing elements of the application.

Core Resources        Used / Available
Inputs                1264 / 3706
Outputs               1725 / 4655
XLRs                  73587 / 172800
Memory Blocks         508 / 1280
DSP Blocks            4 / 640

Table 1: Resource allocation on the Efinix architecture shows only 42% of the XLR cells are used, leaving ample room for additional processes. Image source: Adam Taylor

Periphery Resource    Used / Available
DDR                   1 / 1
GPIO                  22 / 27
HSIO                  20.0 / 59
JTAG User TAP         1 / 4
MIPI RX               4 / 4
MIPI TX               4 / 4
Oscillator            0 / 1
PLL                   4 / 8

Table 2: Snapshot of the interface and I/O resources used. Image source: Adam Taylor

Figure 5. Clock constraints for the reference design. Image source: Adam Taylor

Timing                               Value
Worst Negative Slack (WNS)           0.182 ns
Worst Hold Slack (WHS)               0.026 ns
i_pixel_clk                          211.909 MHz
tx_escclk                            261.370 MHz
i_pixel_clk_tx                       210.881 MHz
i_sys_clk                            755.858 MHz
i_axi0_mem_clk                       130.429 MHz
i_sys_clk_25mhz                      234.577 MHz
i_soc_clk                            187.231 MHz
i_hdmi_clk                           233.918 MHz
mipi_dphy_rx_inst1_WORD_CLKOUT_HS    273.973 MHz
mipi_dphy_rx_inst2_WORD_CLKOUT_HS    262.881 MHz
mipi_dphy_rx_inst3_WORD_CLKOUT_HS    204.290 MHz
mipi_dphy_rx_inst4_WORD_CLKOUT_HS    207.598 MHz
mipi_dphy_tx_inst1_SLOWCLK           201.979 MHz
mipi_dphy_tx_inst2_SLOWCLK           191.865 MHz
mipi_dphy_tx_inst3_SLOWCLK           165.235 MHz
mipi_dphy_tx_inst4_SLOWCLK           160.823 MHz
jtag_inst1_TCK                       180.505 MHz

Table 3: Timing implementation against the constraints shows the potential of the Titanium FPGA XLR structure to reduce the possible routing delay, thereby increasing design performance. Image source: Adam Taylor
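The relationship between the HDMI clock constraint and the achieved frequency in Table 3 can be made concrete with a short calculation. The following is a minimal sketch rather than part of the reference design: the dictionary and function names are illustrative, the frequencies are copied from Table 3, and only the 148.5 MHz HDMI constraint is used because it is the only constraint value stated in the text.

```python
# Achieved maximum clock frequencies from Table 3, in MHz.
FMAX_MHZ = {
    "i_pixel_clk": 211.909,
    "tx_escclk": 261.370,
    "i_pixel_clk_tx": 210.881,
    "i_sys_clk": 755.858,
    "i_axi0_mem_clk": 130.429,
    "i_sys_clk_25mhz": 234.577,
    "i_soc_clk": 187.231,
    "i_hdmi_clk": 233.918,
    "mipi_dphy_rx_inst1_WORD_CLKOUT_HS": 273.973,
    "mipi_dphy_rx_inst2_WORD_CLKOUT_HS": 262.881,
    "mipi_dphy_rx_inst3_WORD_CLKOUT_HS": 204.290,
    "mipi_dphy_rx_inst4_WORD_CLKOUT_HS": 207.598,
    "mipi_dphy_tx_inst1_SLOWCLK": 201.979,
    "mipi_dphy_tx_inst2_SLOWCLK": 191.865,
    "mipi_dphy_tx_inst3_SLOWCLK": 165.235,
    "mipi_dphy_tx_inst4_SLOWCLK": 160.823,
    "jtag_inst1_TCK": 180.505,
}

# The only constraint value quoted in the article text (HDMI output clock).
HDMI_CONSTRAINT_MHZ = 148.5


def period_ns(freq_mhz: float) -> float:
    """Convert a clock frequency in MHz to its period in nanoseconds."""
    return 1000.0 / freq_mhz


def headroom_percent(fmax_mhz: float, constraint_mhz: float) -> float:
    """Percentage by which the achieved Fmax exceeds the requested frequency."""
    return (fmax_mhz / constraint_mhz - 1.0) * 100.0


def slowest_clock(fmax_table: dict) -> str:
    """Name of the clock with the lowest achieved maximum frequency."""
    return min(fmax_table, key=fmax_table.get)


if __name__ == "__main__":
    hdmi_fmax = FMAX_MHZ["i_hdmi_clk"]
    print(f"HDMI constraint period: {period_ns(HDMI_CONSTRAINT_MHZ):.3f} ns")
    print(f"HDMI achieved period:   {period_ns(hdmi_fmax):.3f} ns")
    print(f"HDMI headroom:          {headroom_percent(hdmi_fmax, HDMI_CONSTRAINT_MHZ):.1f} %")
    print(f"Slowest clock in table: {slowest_clock(FMAX_MHZ)}")
```

The same `headroom_percent` check could be applied to the other clocks once their constraint values are read from the design's constraint file (Figure 5).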
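The utilization percentages quoted in the text (about 42% of XLR cells, 40% of memory blocks, 50% of PLLs) can be cross-checked against the used/available pairs in Tables 1 and 2. This is a small illustrative sketch: the dictionary layout and the function name are assumptions made for the example, and the numbers are copied from the two tables.

```python
# Used / available counts copied from Table 1 (core) and Table 2 (periphery).
CORE_RESOURCES = {
    "Inputs": (1264, 3706),
    "Outputs": (1725, 4655),
    "XLRs": (73587, 172800),
    "Memory Blocks": (508, 1280),
    "DSP Blocks": (4, 640),
}

PERIPHERY_RESOURCES = {
    "DDR": (1, 1),
    "GPIO": (22, 27),
    "HSIO": (20.0, 59),
    "JTAG User TAP": (1, 4),
    "MIPI RX": (4, 4),
    "MIPI TX": (4, 4),
    "Oscillator": (0, 1),
    "PLL": (4, 8),
}


def utilization_percent(used: float, available: float) -> float:
    """Return the share of a resource that is in use, as a percentage."""
    return 100.0 * used / available


if __name__ == "__main__":
    for table in (CORE_RESOURCES, PERIPHERY_RESOURCES):
        for name, (used, available) in table.items():
            pct = utilization_percent(used, available)
            print(f"{name:<15} {used:>8} / {available:<8} {pct:6.2f} %")
```

Running the sketch lists every resource with its utilization, which makes it easy to see how much of the device remains free for additions such as edge ML logic.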