The co-processor architecture: an embedded system architecture for rapid prototyping
the proven performance of the existing hardware can be coupled with flexibility and future-proofing. Even within existing systems, the co-processor architecture gives designers options that would otherwise not be available [6].
the co-processor architecture to the embedded systems designer but also showcased the performance-enhancing options available with modern FPGA tools. Enhancements, like the ones mentioned below, may not be available or may have less impact for other hardware architectures. The discrete cosine transform (DCT) was selected as a computationally intensive algorithm, and its progression from a C-based implementation to an HDL-based implementation was at the heart of these findings. DCT was chosen since this algorithm is used in digital signal processing for pattern recognition and filtering [8]. The empirical findings were based upon a laboratory exercise, which was completed by the author
Figure 8: Xilinx Vivado HLS design flow.
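The article does not reproduce the C source used in the laboratory exercise, so the following is only a minimal Python sketch of the textbook DCT-II, included to show the nested multiply-accumulate loops that make the algorithm computationally intensive. The function names `dct_1d` and `dct_2d`, the orthonormal scaling, and the 8x8 block size in the usage note are assumptions for illustration, not details taken from the exercise.

```python
import math


def dct_1d(samples):
    """Naive orthonormal DCT-II of a 1-D sequence (O(n^2) multiply-accumulates)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        # Orthonormal scaling: the DC term uses sqrt(1/n), all others sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for i in range(n):
            acc += samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        coeffs.append(scale * acc)
    return coeffs


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    row_pass = [dct_1d(row) for row in block]
    # zip(*row_pass) walks the columns of the row-transformed block.
    col_pass = [dct_1d(list(col)) for col in zip(*row_pass)]
    # col_pass holds one transformed column per entry; transpose back to row-major.
    return [list(row) for row in zip(*col_pass)]
```

As a quick check, calling `dct_2d` on an 8x8 block filled with ones should concentrate all of the energy in the top-left (DC) coefficient and leave the rest at approximately zero. The inner loops over `i` and `k` are the kind of loop nests that pipelining and array-partitioning options in an HLS tool would target.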
Rapid prototyping advantages
need to be an expert in hardware or HDL to modify, route, or implement different soft-core processors or components within the FPGA. So long as the designer is aware of the interface and the formats of the data, they have full control over the signal paths and can refine the system’s performance.
Empirical findings – the discrete cosine transform case study
The empirical findings not only confirmed the flexibility availed by
At its heart, the rapid prototyping process strives to cover a substantial amount of product development area by executing tasks in parallel, identifying ‘bugs’ and design issues quickly, and validating data and signal paths, especially those within a project’s critical path. However, for this process to produce streamlined, efficient results, there must be sufficient expertise in the required project areas. Traditionally, this means that there must be a hardware engineer, an embedded software or DSP engineer, and an HDL engineer. There are now plenty of interdisciplinary professionals who may be able to satisfy multiple roles; however, there is still substantial project overhead involved in coordinating these efforts. In their paper, ‘An FPGA based rapid prototyping platform for wavelet coprocessors’, the authors promote the idea that using a co-processor architecture allows a single DSP engineer to fulfil all of these roles efficiently and effectively.
For this study, the team began designing and simulating the desired DSP functionality within MATLAB’s Simulink tool. This served two primary functions: it 1) verified the desired performance through simulation, and 2) served as a baseline against which future design choices could be compared. After simulation, critical functionalities were identified and divided into different cores; these are soft-core components and processors that can be synthesized within an FPGA. The most important step during this work was to define the interface among these cores and components and to compare the data-exchange performance against the desired, simulated performance. This design process closely aligned with Xilinx’s design flow for embedded systems and is summarized in Figure 7.
Figure 7: Implementation design flow.
architecture satisfied over 75% of the automotive market’s baseline entertainment functionality; however, it lacked the ability to address video processing applications and wireless communications. By including an FPGA within this existing architecture, further flexibility and capability can be added to this already-existing design approach. The Figure 5 architecture is suitable for both video processing and wireless communications management. By pushing the DSP functionalities to the FPGA, the Amanda processor can serve a system management role and is freed to implement a wireless communications stack. As both the Amanda and FPGA have access to the external memory, data can be rapidly exchanged among the system’s processors and components.
The second infotainment architecture in Figure 6 highlights the FPGA’s ability to address both the incoming high-speed analog data and the handling of the compression and encoding needed for video applications. In fact, all of this functionality can be pushed into the FPGA and, through the use of parallel processing, can be addressed in real time. By including an FPGA within an existing hardware architecture,
By dividing the system into synthesizable cores, the DSP engineer can focus upon the most critical aspects of the signal processing chain. She/he does not

| Optimization | Latency (min) | Latency (max) | Interval (min) | Interval (max) |
|---|---|---|---|---|
| Default (solution 1) | 2935 | 2935 | 2935 | 2935 |
| Pipeline inner loop (solution 2) | 1723 | 1723 | 1723 | 1723 |
| Pipeline outer loop (solution 3) | 843 | 843 | 843 | 843 |
| Array partition (solution 4) | 477 | 477 | 477 | 477 |
| Dataflow (solution 5) | 476 | 476 | 343 | 343 |
| Inline (solution 6) | 463 | 463 | 98 | 98 |

Table 1: FPGA algorithm execution optimization findings (latency and interval).
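The latency and interval figures in Table 1 are easier to compare as ratios against the default solution. The sketch below copies the values from the table and computes those ratios. The helper names `speedups` and `cycles_for_blocks` are hypothetical, the table's units are not stated and are treated here as unitless counts, and the throughput formula (first result after `latency`, then one further result every `interval`) is an assumed simple pipeline model rather than something stated in the article.

```python
# Latency and interval values copied from Table 1. Min and max are equal in
# every row, so one number per column is kept: (latency, interval).
TABLE_1 = {
    "Default (solution 1)": (2935, 2935),
    "Pipeline inner loop (solution 2)": (1723, 1723),
    "Pipeline outer loop (solution 3)": (843, 843),
    "Array partition (solution 4)": (477, 477),
    "Dataflow (solution 5)": (476, 343),
    "Inline (solution 6)": (463, 98),
}


def speedups(table, baseline="Default (solution 1)"):
    """Return (latency ratio, interval ratio) of each solution versus the baseline."""
    base_latency, base_interval = table[baseline]
    return {
        name: (base_latency / latency, base_interval / interval)
        for name, (latency, interval) in table.items()
    }


def cycles_for_blocks(latency, interval, n_blocks):
    """Assumed pipeline model: first block done after `latency`, then one per `interval`."""
    return latency + (n_blocks - 1) * interval


if __name__ == "__main__":
    for name, (lat_x, int_x) in speedups(TABLE_1).items():
        print(f"{name:34s} latency x{lat_x:5.2f}  interval x{int_x:5.2f}")
```

Under these assumptions, the inline solution's latency is a little over six times shorter than the default's, while its interval is roughly thirty times shorter, which is why the later optimizations matter most for back-to-back blocks rather than for a single transform.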
we get technical