We get technical
Embedded and MCUs | Volume 16
How to implement a voice user interface on resource-constrained MCUs
How single-board computers extend the reach of industrial automation
A guide for the ESP32 microcontroller series
How to perform firmware updates without halting firmware execution
Editor’s note

Embedded systems and microcontrollers (MCUs) form the backbone of modern electronics, driving innovation across industries from industrial automation and automotive to consumer electronics and IoT. As embedded technology evolves, engineers must navigate a landscape of increasing complexity, balancing power efficiency, performance, and security while meeting the growing demands for connectivity and real-time processing.

One of the biggest shifts in embedded development today is the integration of AI and machine learning at the edge. The rise of MCUs with dedicated AI acceleration is enabling real-time inferencing in applications such as predictive maintenance, machine vision, and intelligent automation. Engineers now have access to hardware platforms that can perform complex computations locally, reducing the need for cloud reliance while improving latency and security.

Another key trend is the ongoing push towards RISC-V architecture. As an open-source alternative to proprietary instruction set architectures, RISC-V is gaining traction among developers looking for customization, scalability, and cost-effective solutions. This shift is reshaping the embedded industry, offering engineers new opportunities to design tailored solutions while fostering innovation through open collaboration.

Security remains a critical concern, particularly as more embedded systems become connected. From secure boot and hardware root of trust to post-quantum cryptography, the demand for robust security frameworks is growing. Engineers must consider not just performance and efficiency but also long-term resilience against emerging cybersecurity threats.

This ebook explores the latest advancements in embedded systems and MCUs, offering insights into how engineers can harness cutting-edge technology to develop next-generation applications.
How to implement a voice user interface on resource-constrained MCUs
The co-processor architecture: an embedded system architecture for rapid prototyping
How single-board computers extend the reach of industrial automation
Getting started with the Raspberry Pi Pico multicore microcontroller board using C
A guide for the ESP32 microcontroller series
Special feature: retroelectro The birth of the microprocessor and Chuck Peddle
How to select and use an audio codec and microcontroller for embedded audio feedback files
How to perform firmware updates without halting firmware execution
How to implement Time Sensitive Networking to ensure deterministic communication
How to implement a voice user interface on resource-constrained MCUs

Written by DigiKey’s North American Editors

Smart speakers and other connected hubs form the heart of the smart home, allowing users to control devices and access the Internet. Two trends are apparent as these devices proliferate: users prefer voice control over button presses or complicated menu systems, and there is increasing discomfort with continuous Cloud connectivity because of privacy concerns.

However, a robust and secure voice user interface (VUI) typically demands powerful hardware and complex software for voice recognition. Anything less will likely result in poor performance and unsatisfactory user experiences. Also, many smart speakers and hubs are battery powered, so a VUI must be achieved within a tight power budget. Such an ambitious project can be daunting for a developer lacking experience with voice interfaces.

Chip makers are responding by introducing a technique based on phonemes that significantly reduces the processing requirements. The result is highly accurate and efficient VUI software that can run on familiar 32-bit microcontrollers (MCUs) and is supported by easy-to-use design tools.

This article describes VUI challenges and use cases. It then introduces commercial, easy-to-use MCU application software and local phoneme-based VUI software for connected home applications. The article concludes by showing developers how to get started on VUI projects using Renesas MCUs, VUI software, and evaluation kits.

The challenges of building a VUI

A VUI is speech recognition technology that enables interaction with a computer, smartphone, home automation system, or other device using voice commands. After early engineering challenges, the technology has matured into a reliable control interface and is now widely used in smart speakers and other smart home devices. The key benefit of a VUI is its convenience: instant control from anywhere within voice range with no need to use a keyboard, mouse, buttons, menus, or other interfaces to input commands (Figure 1).

Figure 1: VUI technology has been widely adopted in homes and smart buildings because it is convenient and flexible. Image source: Renesas

The downside of a VUI is its complexity. Conventional technology is based on the lengthy training of a model with specific words or phrases. But natural language processing is word-order independent, which demands considerable development work and significant computing power to run in real-time. This has slowed the broader adoption of VUIs.

Now, a new technique simplifies VUI software to the extent that it can run on small, efficient microcontrollers (MCUs) such as Arm Cortex-M devices. This technique relies on the fact that all words in each spoken language are made up of linguistic sounds called phonemes. There are far fewer phonemes than words; English has 44, Italian has 32, and the traditional Hawaiian language has just 14. If a VUI uses an English command set of 200 words, each word could be broken down into its associated phonemes from the set of 44.

Within VUI software, each phoneme could then be identified by a numeric code (or a ‘token’), with the various tokens forming the language. Storing words as sounds requires extensive computational resources and takes up far more memory space than phonemes stored as tokens. Processing phoneme tokens (and thus command words) in an expected order further simplifies computation and makes it possible to run VUI software locally on a modest MCU (Figure 2).

Figure 2: Representing words using phonemes demands fewer microcontroller resources. Image source: Renesas

This means that the software efficiencies achieved by using phonemes allow the processing to run locally. Removing the need for Cloud processing means there is no requirement for continuous internet connectivity that introduces user privacy and data security concerns.

Renesas has shown a commercial VUI software package based on the phoneme principle as part of its ecosystem. The software, called Cyberon DSpotter, creates a VUI algorithm that is streamlined enough to run on Renesas RA series MCUs featuring Arm Cortex-M4 and M33 cores.

Developing with Cyberon DSpotter

Cyberon DSpotter is built on a library of phonemes and phoneme combinations. This is an alternative approach to the traditional and computing-heavy training of algorithms to recognize specific words. To break down words into phonemes and then represent them as tokens, the developer can use the DSpotter Modeling Tool. DSpotter is embedded (non-Cloud) software that works as a local voice trigger and command-recognition solution with robust noise reduction. It consumes minimal resources and is highly accurate. Depending on the selected MCU, secure data transfer can also be implemented.

DSpotter asks for each command word or phrase, which the tool breaks down into phonemes. The command set and supporting data for the VUI are then built into a binary file that the developer includes in the project along with the Cyberon library. The library and the binary file are used together on the MCU to support the recognition of the desired speech commands.

The DSpotter tool creates ‘CommandSets’ that can be logically connected by the developer’s program to create a VUI with different levels. This allows for multi-level commands such as, ‘I’d like the lightbulb set to high, please’: the command words being ‘lightbulb’, followed by ‘set’, and ‘high’. Each command in a group has its own index, as does each command within a level (Figure 3).

Figure 3: The DSpotter tool allows the creation of ‘CommandSets’ that can be logically connected by the developer’s program to create a VUI with different levels. Image source: Renesas

The DSpotter library processes incoming sound and searches for phonemes that match the commands in the database. When it finds a match, it returns with the index and group numbers. Such an arrangement allows the main application code to create a hierarchical switch statement to process the command words/phrases as they come. The resulting library can be small enough to fit on an MCU with just 256 kilobytes (Kbytes) of flash memory and 32 Kbytes of SRAM. The CommandSet can grow if more memory is available.

It is important for the developer to appreciate that there are limitations to the phoneme method for a VUI. The relatively limited resources of the MCU dictate that Cyberon DSpotter is speech recognition rather than voice recognition. This means the software cannot perform natural language processing. Hence, if the command words don’t follow a logical sequence (for example, ‘high’, ‘lightbulb’, ‘set’ instead of ‘lightbulb’, ‘set’, ‘high’), the system won’t recognize the command and will reset back to the top level.

One design suggestion is to add a visual indicator to the VUI (for example, an LED) to indicate when the processor assumes it is at the top level of the CommandSet, prompting the user to reissue the command in the logical sequence (Figure 4).

Figure 4: The streamlined nature of Cyberon DSpotter requires that commands follow a logical sequence, or they won’t be recognized. Image source: Renesas

Running a non-Cloud VUI with restricted resources

The efficiency of Cyberon DSpotter allows it to run on Renesas’ RA2, RA4, and RA6 families of Arm Cortex-M MCUs. These are popular across a wide range of consumer, industrial, and IoT applications. They are supported by easy-to-use design tools, making it relatively straightforward to build a simple VUI without extensive coding experience or in-house expertise.

The choice of a particular RA family MCU primarily comes down to the complexity of commands and the Cyberon library’s size. A smart light switch, which requires a modest command set and limited computing power to operate effectively, could be based on the R7FA4W1AD2CNG from the RA4 family. This MCU has a battery-friendly 48-megahertz (MHz) Arm Cortex-M4 core supported by 512 Kbytes of flash memory and 96 Kbytes of SRAM. It features a segment LCD controller, a capacitive touch sensing unit, Bluetooth Low Energy (Bluetooth LE) wireless connectivity, USB 2.0 Full-Speed, a 14-bit analog-to-digital converter (ADC), a 12-bit digital-to-analog converter (DAC), plus security and safety features (Figure 5).

Figure 5: The R7FA4W1AD2CNG MCU provides ample resources to build a non-Cloud VUI for applications like a smart light switch. Image source: Renesas

A more extensive Cyberon DSpotter library and a more powerful core are needed for an application such as a smart speaker. A suitable candidate is the R7FA6M4AF3CFM. This MCU from the RA6 family features the more powerful 200 MHz Arm Cortex-M33 core supported by 1 megabyte (Mbyte) of flash memory and 256 Kbytes of SRAM. It has a CAN bus, Ethernet, I²C, LIN bus, a capacitive touch sensing unit, and many other interfaces and peripherals.

The RA4 and RA6 families are supported by evaluation boards, the RTK7EKA4W1S00000BJ and the RTK7EKA6M4S00001BE, respectively, to allow a developer to exercise the MCUs’ capabilities. Each evaluation board has the target MCU and an onboard debugger.

Renesas also offers a VUI solution kit to accelerate development. The kit is similar to the evaluation boards in that it incorporates the target device and debuggers. The board also features several I/O interfaces and has four microphones: two analog and two digital.

Access to the software needed for development with the VUI solution kit is available on Cyberon’s website. This includes complimentary Cyberon DSpotter Modeling Tool access and features an e2 studio project with a working voice CommandSet (e2 studio is an Eclipse-based integrated development environment (IDE) for Renesas MCUs). The example CommandSet can be used as a template for developing custom voice command sequences. The system’s reactions can then be monitored using a terminal window. It generally takes about 15 minutes to create the VUI structure shown in Figure 4.

More sophisticated application software design for the Cyberon package is supported by the company’s Renesas Flexible Software Package (FSP) for embedded system designs using the RA families. The FSP is based on an open software ecosystem and includes Azure RTOS or FreeRTOS, legacy code, and third-party ecosystems. It can run in several IDEs, including e2 studio.
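The multi-level command handling described earlier, where each match arrives as a group number and an index and an out-of-sequence word sends the system back to the top level, can be modelled on a host PC before any firmware is written. The following is a minimal sketch in Python rather than MCU code. The group numbering, the word lists, and the `lookup()` stand-in for the recognizer are all hypothetical and are not taken from the Cyberon DSpotter library.

```python
# Hypothetical CommandSets: group number -> list of command words.
# The numbering and vocabulary are illustrative assumptions only.
TOP, ACTION, LEVEL = 0, 1, 2
COMMAND_SETS = {
    TOP: ["lightbulb", "fan"],
    ACTION: ["set", "off"],
    LEVEL: ["low", "medium", "high"],
}


class CommandSequencer:
    """Tracks the active CommandSet and assembles a multi-level command."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Return to the top level and discard any partial command."""
        self.group = TOP
        self.words = []

    @property
    def at_top_level(self):
        """True when waiting for a top-level word (could drive an LED)."""
        return self.group == TOP

    def on_match(self, group, index):
        """Handle one (group, index) result; return a finished command or None."""
        if group != self.group:
            # Word arrived out of the expected sequence: reset to the top level.
            self.reset()
            return None
        word = COMMAND_SETS[group][index]
        self.words.append(word)
        if group == TOP:
            self.group = ACTION
        elif group == ACTION and word != "off":
            self.group = LEVEL
        else:
            # 'off' or a level word completes the command.
            command = tuple(self.words)
            self.reset()
            return command
        return None


def lookup(word):
    """Stand-in for the recognizer: map a word to its (group, index)."""
    for group, words in COMMAND_SETS.items():
        if word in words:
            return group, words.index(word)
    return None


def run(spoken_words):
    """Feed a sequence of words through the sequencer and collect commands."""
    sequencer = CommandSequencer()
    completed = []
    for word in spoken_words:
        match = lookup(word)
        if match is None:
            continue  # filler words such as 'please' are ignored
        result = sequencer.on_match(*match)
        if result is not None:
            completed.append(result)
    return completed
```

Calling `run(["lightbulb", "set", "high"])` yields one completed command, while `run(["high", "lightbulb", "set"])` yields none because the first word is out of sequence and the remaining words never reach a level word. In firmware, the same structure would typically become the hierarchical switch statement the article mentions.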
How well does the VUI perform?

It is one thing for a VUI to perform well in a quiet laboratory, but quite another for it to work accurately with significant background noise. A typical operating environment for a smart speaker could include a TV or radio, conversation, other music sources, and the general hubbub of a household or a social gathering. Moreover, the VUI will have to contend with dialects and less-than-perfect diction. Despite these challenges, users expect almost flawless performance.

To improve performance in a difficult listening environment, Cyberon DSpotter software running on the Renesas RA family of MCUs includes noise immunity features that require minimal processor resources. To demonstrate its efficacy, tests were done with a Cyberon DSpotter VUI listening to commands while subject to various background noise sources at 1.5- and 3-meter (m) distances, and with signal-to-noise ratios (SNRs) of 0, 5, and 10 decibels (dB). In all cases, the VUI outperformed the Amazon Alexa benchmark (Table 1).

SNR      Background noise   Distance   Hit-rate   Alexa requirement
(Clean)  none               1.5 m      100.00%    90%
(Clean)  none               3 m        100.00%    90%
10 dB    Babble             1.5 m      98.55%     80%
10 dB    Babble             3 m        98.84%     80%
10 dB    Music              1.5 m      98.26%     80%
10 dB    Music              3 m        98.55%     80%
10 dB    TV                 1.5 m      98.84%     80%
10 dB    TV                 3 m        98.55%     80%
5 dB     Babble             1.5 m      98.84%     80%
5 dB     Babble             3 m        96.24%     80%
5 dB     Music              1.5 m      98.84%     80%
5 dB     Music              3 m        97.08%     80%
5 dB     TV                 1.5 m      93.37%     80%
5 dB     TV                 3 m        90.72%     80%

Table 1: Command success test results for a Cyberon-powered VUI with various sources of background noise. In all cases, the VUI outperformed the Amazon Alexa benchmark. Image source: Renesas

Conclusion

VUIs are rapidly becoming the preferred consumer control interface for smart products. A speech control approach using phonemes as the basis of commands and a strict command structure can dramatically reduce memory and computing requirements, allowing the technology to run locally on small, resource-constrained MCUs.
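The comparison in Table 1 can be checked with a few lines of Python. This is only a sketch that re-enters the published figures by hand. The tuple layout and variable names are assumptions made for illustration, and `None` stands for the clean (no added noise) condition.

```python
# Rows transcribed from Table 1:
# (SNR in dB or None for clean, noise source, distance in m, hit rate %, requirement %)
ROWS = [
    (None, "none", 1.5, 100.00, 90),
    (None, "none", 3.0, 100.00, 90),
    (10, "Babble", 1.5, 98.55, 80),
    (10, "Babble", 3.0, 98.84, 80),
    (10, "Music", 1.5, 98.26, 80),
    (10, "Music", 3.0, 98.55, 80),
    (10, "TV", 1.5, 98.84, 80),
    (10, "TV", 3.0, 98.55, 80),
    (5, "Babble", 1.5, 98.84, 80),
    (5, "Babble", 3.0, 96.24, 80),
    (5, "Music", 1.5, 98.84, 80),
    (5, "Music", 3.0, 97.08, 80),
    (5, "TV", 1.5, 93.37, 80),
    (5, "TV", 3.0, 90.72, 80),
]

# True only if every measured hit rate exceeds its requirement.
all_pass = all(hit > required for _, _, _, hit, required in ROWS)

# The test condition with the lowest measured hit rate.
worst = min(ROWS, key=lambda row: row[3])

print("All conditions exceed the requirement:", all_pass)
print("Lowest hit rate:", worst)
```

The lowest hit rate in the table occurs with TV noise at 5 dB SNR and a 3 m distance, which is the condition a designer would most likely want to reproduce when validating a custom CommandSet.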
The co-processor architecture: an embedded system architecture for rapid prototyping

Written By Noah Madinger, Colorado Electronic Product Design (CEPD)

Editor’s note: although well known for its digital processing performance and throughput, the co-processor architecture provides the embedded systems designer opportunities to implement project management strategies, which improve both development costs and time to market. This article, focused specifically upon the combination of a discrete microcontroller (MCU) and a discrete field programmable gate array (FPGA), showcases how this architecture lends itself to an efficient and iterative design process. Leveraging researched sources, empirical findings, and case studies, the benefits of this architecture are explored, and exemplary applications are provided. Upon this article’s conclusion, the embedded systems designer will have a better understanding of when and how to implement this versatile hardware architecture.

Introduction

The embedded systems designer finds themselves at a juncture of design constraints, performance expectations, and schedule and budgetary concerns. Indeed, even the contradictions in modern project management buzzwords and phrases further underscore the precarious nature of this role: ‘fail fast’; ‘be agile’; ‘future-proof it’; and ‘be disruptive!’. The acrobatics involved in even trying to satisfy these expectations can be harrowing, and yet, they have been spoken and continue to be reinforced throughout the market. What is needed is a design approach, which allows for an evolutionary iterative process to be implemented, and just like with most embedded systems, it begins with the hardware architecture.

The co-processor architecture, a hardware architecture known for combining the strengths of both microcontroller unit (MCU) and field programmable gate array (FPGA) technologies, can offer the embedded designer a process capable of meeting even the most demanding requirements, and yet it allows for the flexibility necessary to address both known and unknown challenges. By providing hardware capable of iteratively adapting, the designer can demonstrate progress, hit critical milestones, and take full advantage of the rapid prototyping process.

Within this process are key project milestones, each with their own unique value to add to the development effort. Throughout this article, these will be referred to by the following terms: The Digital Signal Processing with the Microcontroller milestone, the System Management with the Microcontroller milestone, and the Product Deployment milestone.

By the conclusion of this article, it will be demonstrated that a flexible hardware architecture can be better suited to modern embedded systems design than a more rigid approach. Furthermore, this approach can result in improvements to both project cost and time to market. Arguments, provided examples, and case studies will be used to defend this position. By observing the value of each milestone within the design flexibility that this architecture provides, it becomes clear that an adaptive hardware architecture is a powerful driver in pushing embedded systems design forward.

Exploring the strengths of the co-processor architecture: design flexibility and high-performance processing

A common application for FPGA designs is to interface directly with a high-speed analog-to-digital converter (ADC). The signal is digitized, read into the FPGA, and then some digital signal processor (DSP) algorithms are applied to this signal. Last of all, the FPGA then makes decisions based upon the findings. Such an application will serve as the example throughout this article.

Furthermore, Figure 1 illustrates a generic co-processor architecture, where the MCU and FPGA are connected through the MCU’s external memory interface. The FPGA is treated as if it were a piece of external static random-access memory (SRAM). Signals come back to the MCU from the FPGA and serve as hardware interrupt lines and status indicators. This allows the FPGA to indicate critical states to the MCU, such as communicating that an ADC conversion is ready, or a fault has occurred, or another noteworthy event has happened.

Figure 1: Generic co-processor diagram (MCU + FPGA). Image source: CEPD

The strengths of the co-processor approach are probably best seen within the deliverables of each of the above-mentioned milestones. Value is assessed by not only listing the accomplishments of a task or phase but also by assessing the enablement that these accomplishments allow. The answers to the following questions assist in assessing the overall value of a milestone’s deliverables:
■ Can the progress of other team members now more rapidly continue, as project dependencies and bottlenecks are removed?
■ How do the accomplishments of the milestone enable further parallel execution paths?
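The memory-mapped arrangement of Figure 1, in which the MCU treats the FPGA as external SRAM and is alerted through an interrupt line, can be prototyped as a small host-side model. The sketch below is written in Python for illustration only. The register offsets, bit assignments, and class names are hypothetical and do not come from the article or from any specific device.

```python
import struct

# Hypothetical register map: byte offsets inside the FPGA's address window.
REG_STATUS = 0x00        # bit 0: ADC data ready, bit 1: fault
REG_SAMPLE_COUNT = 0x04  # number of 16-bit samples in the buffer
BUF_SAMPLES = 0x100      # start of the sample buffer

STATUS_DATA_READY = 0x1
STATUS_FAULT = 0x2


class FpgaWindow:
    """Models the FPGA as a block of external SRAM seen by the MCU."""

    def __init__(self, size=0x1000):
        self.mem = bytearray(size)
        self.irq_handler = None  # stands in for the hardware interrupt line

    def read32(self, offset):
        return struct.unpack_from("<I", self.mem, offset)[0]

    def write32(self, offset, value):
        struct.pack_into("<I", self.mem, offset, value & 0xFFFFFFFF)

    def publish_samples(self, samples):
        """FPGA side: place ADC samples in the buffer and raise the interrupt."""
        for i, sample in enumerate(samples):
            struct.pack_into("<h", self.mem, BUF_SAMPLES + 2 * i, sample)
        self.write32(REG_SAMPLE_COUNT, len(samples))
        self.write32(REG_STATUS, self.read32(REG_STATUS) | STATUS_DATA_READY)
        if self.irq_handler is not None:
            self.irq_handler()


class Mcu:
    """MCU side: reacts to the interrupt and reads data through the window."""

    def __init__(self, fpga):
        self.fpga = fpga
        self.received = []
        fpga.irq_handler = self.on_interrupt

    def on_interrupt(self):
        status = self.fpga.read32(REG_STATUS)
        if status & STATUS_DATA_READY:
            count = self.fpga.read32(REG_SAMPLE_COUNT)
            self.received = list(
                struct.unpack_from(f"<{count}h", self.fpga.mem, BUF_SAMPLES)
            )
            # Acknowledge by clearing the data-ready bit.
            self.fpga.write32(REG_STATUS, status & ~STATUS_DATA_READY)


fpga = FpgaWindow()
mcu = Mcu(fpga)
fpga.publish_samples([100, -200, 300])
print(mcu.received)
```

On real hardware, the reads and writes would be volatile pointer accesses into the external memory region and the callback would be an interrupt service routine. Agreeing on a register map like this early gives the firmware and HDL developers a shared contract to work against in parallel.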
The digital signal processing with the microcontroller milestone

The first development stage that this hardware architecture allows places the MCU front and center. All things being equal, MCU and executable software development is less resource and time-consuming than FPGA and hardware description language (HDL) development. Thus, by initiating product development with the MCU as the primary processor, algorithms can be implemented, tested, and validated more rapidly. This allows algorithmic and logical bugs to be discovered early in the design process, and this also allows for substantial portions of the signal chain to be tested and validated.

The FPGA’s role in this initial milestone is to serve as a high-speed data gathering interface. Its task is to reliably pipe data from the high-speed ADC, alert the MCU that data is available, and present this data on the MCU’s external memory interface. Although this role does not include implementing HDL-based DSP processes or other algorithms, it is nonetheless highly critical.

The FPGA development performed at this phase lays the foundation for the product’s ultimate success both within the product development efforts and upon release to the market. By focusing on just the low-level interface, adequate time can be dedicated to testing these essential operations. Only once the FPGA is reliably and confidently performing this interfacing role, can this milestone be concluded confidently.

Figure 2: Architecture – digital signal processing with the microcontroller. Image source: CEPD

Key deliverables from this initial milestone include the following benefits:
1. The full signal path – all amplifications, attenuations, and conversions – will have been tested and validated
2. The project development time and effort will have been reduced by initially implementing the algorithms in software (C/C++); this is of considerable value to management and other stakeholders, who must see the feasibility of this project before approving future design phases
3. The lessons learned from implementing the algorithms in C/C++ will be directly transferable to HDL implementations – through the use of software-to-HDL tools, e.g., Xilinx HLS

The system management with the microcontroller milestone

The second development stage, which this co-processor approach offers, is defined by the moving of DSP processes and algorithm implementations from the MCU to the FPGA. The FPGA is still responsible for the high-speed ADC interface, however, by assuming these other roles, the speed and parallelism offered by the FPGA are fully utilized. Additionally, unlike the MCU, multiple instances of the DSP processes and algorithm channels can be implemented and run simultaneously.

Figure 3: Architecture – system management with the microcontroller. Image source: CEPD

Built upon the lessons learned from the MCU’s implementation, the designer carries this confidence forward into this next milestone. Tools, such as the aforementioned Vivado HLS from Xilinx, provide a functional translation from the executable C/C++ code to synthesizable HDL. Now, timing constraints, process parameters, and other user preferences must still be defined and implemented, however, the core functionality is preserved and translated to the FPGA fabric.

For this milestone, the MCU’s role is that of a system manager. Status and control registers within the FPGA are monitored, updated, and reported on by the MCU. Furthermore, the MCU manages the user interface (UI). This UI could take the form of a web server accessed over an Ethernet or Wi-Fi connection, or it could be an industrial touchscreen interface giving access to users at the point of use. The key takeaway from the MCU’s new, more refined role is this: by being relieved from the computationally intensive processing tasks, both the MCU and FPGA are now being leveraged in tasks for which they are well suited.

Key deliverables from this milestone include these benefits:
1. Fast, parallel execution of DSP processes and algorithm implementations is being provided by the FPGA. The MCU provides a responsive and streamlined UI and manages the product’s processes
2. Having been first developed and validated within the MCU, algorithmic risks have been mitigated, and these mitigations are translated over into synthesizable HDL. Tools, such as Vivado HLS, make this translation an easier process. Furthermore, FPGA-specific risks can be mitigated through integrated simulation tools, such as the Vivado design suite
3. Stakeholders are not exposed to significant risk by moving the processes over to the FPGA. On the contrary, they get to see and enjoy the benefits that the FPGA’s speed and parallelism provide. Measurable performance improvements are observed and focus can now be given to readying this design for manufacturing

The product deployment milestone

With the computationally intensive processing being addressed within the FPGA, and the MCU handling its system management and user interface roles, the product is ready for deployment. Now, this paper does not advocate for bypassing Alpha and Beta releases; however, the emphasis for this milestone is on the capabilities that the co-processor architecture provides to product deployment.

Both the MCU and FPGA are field updateable devices. Several advancements have been made to make FPGA updates just as accessible as software updates. Moreover, since the FPGA is within the addressable memory space of the MCU, the MCU can serve as the access point for the entire system: receiving both updates for itself as well as for the FPGA. Updates can be conditionally scheduled, distributed, and customized on a per end-user basis. Last of all, user and use-case logs can be maintained and associated with specific build implementations. From these data sets, performance can continue to be refined and enhanced even after the product is in the field.

Perhaps the strengths of this total-system updatability are no more underscored than in space-based applications. Once a product is launched, maintenance and updates must be performed remotely. This could be as simple as changing logical conditions, or as complicated as updating a communications modulation scheme. The programmability offered by FPGA technologies and the co-processor architecture can accommodate the entirety of this range of capabilities, all while offering radiation-hardened component choices.

The final key takeaway from this milestone is progressive cost reduction. Cost reductions, bill of materials (BOM) changes, and other optimizations can also occur at this milestone. During field deployments, it may be discovered that the product can operate just as well with a less expensive MCU, or less capable FPGA. Because of the co-processor architecture, designers are not stuck using components whose capabilities exceed their application’s needs. Furthermore, should a component become unavailable, the architecture allows for new components to be integrated into the design. This is not the case with a single-chip, system on a chip (SoC) architecture, or with a high-performance DSP or MCU that attempts to handle all of the product’s processing. The co-processor architecture is a good mix of capability and flexibility giving the designer more choices and freedoms both with the development phases and upon release to the market.

Figure 4: Application program, host processor, and FPGA-based hardware - used in satellite communications example.
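The idea from the product deployment milestone that the MCU acts as the single access point for updates to both itself and the FPGA can be illustrated with a short model. This is a minimal sketch in Python under stated assumptions: the package fields, target names, and the CRC-32 integrity check are hypothetical choices for illustration and are not described in the article.

```python
import zlib

# Hypothetical update targets reachable through the MCU.
TARGET_MCU = "mcu"
TARGET_FPGA = "fpga"


def make_package(target, version, image):
    """Bundle an image with its target, version, and a CRC-32 checksum."""
    return {
        "target": target,
        "version": version,
        "image": image,
        "crc32": zlib.crc32(image),
    }


class SystemUpdater:
    """MCU-side dispatcher that validates and routes update images."""

    def __init__(self):
        self.installed = {TARGET_MCU: None, TARGET_FPGA: None}
        self.storage = {TARGET_MCU: b"", TARGET_FPGA: b""}
        self.log = []  # build/update history kept for field analysis

    def apply(self, package):
        target = package["target"]
        if target not in self.storage:
            self.log.append(("rejected", target, "unknown target"))
            return False
        if zlib.crc32(package["image"]) != package["crc32"]:
            self.log.append(("rejected", target, "crc mismatch"))
            return False
        # In a real system this would write MCU flash or the FPGA
        # configuration memory mapped into the MCU's address space.
        self.storage[target] = package["image"]
        self.installed[target] = package["version"]
        self.log.append(("installed", target, package["version"]))
        return True


updater = SystemUpdater()
ok_fpga = updater.apply(make_package(TARGET_FPGA, "1.1.0", b"bitstream-bytes"))

corrupted = make_package(TARGET_MCU, "2.0.0", b"firmware-bytes")
corrupted["image"] = b"firmware-bytez"  # simulate corruption in transit
ok_mcu = updater.apply(corrupted)

print(ok_fpga, ok_mcu, updater.installed)
```

Keeping a log entry for every accepted or rejected package mirrors the article’s point that usage logs can be associated with specific build implementations after the product is in the field.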
Supporting research and related case studies

Satellite communications example

In short, the value of a co-processor is to offload the primary processing unit so that tasks are executed upon hardware, in which accelerations and streamlining can be taken advantage of. The advantage of such a design choice is a net increase in computational speed and capabilities, and, as this article argues, a reduction in development cost and development time. Perhaps one of the most compelling realms for these benefits is in the area of space communications systems.

In their publication, FPGA based hardware as coprocessor, G. Prasad and N. Vasantha detail how data processing within an FPGA blends the computational needs of satellite communications systems without the high non-recurring engineering (NRE) costs of application-specific integrated circuits (ASICs) or the application-specific limitations of a hard-architecture processor. Just as was described in the Digital Signal Processing with the Microcontroller Milestone, their design begins with the application processor performing a majority of the computationally intensive algorithms. From this starting point, they identify the key sections of software that consume a majority of the central processing unit (CPU) clock’s cycles and migrate these sections over to HDL implementation. The graphical representation is highly similar to what has been presented so far, however, they have chosen to represent the Application Program as its own independent block, as it can be either realized in the Host (Processor) or in the FPGA based Hardware.

By utilizing a peripheral component interconnect (PCI) interface and the host processor’s direct memory access (DMA), peripheral performance is dramatically increased. This is mostly observed within the improvements for the Derandomization process. When this process was performed in the host processor’s software, there was clearly a bottleneck in the real-time response of the system. However, when moved to the FPGA, the following benefits were observed:
■ The Derandomization process executed in real-time without causing bottlenecks
■ The host processor’s computational overhead was significantly reduced, and it could now better perform a desired logging role
■ The total performance of the entire system was scaled up

All of this was achieved without the costs associated with an ASIC, and while enjoying the flexibility of programmable logic [5]. Satellite communications present considerable challenges, and this approach can verifiably meet these requirements, and continue to provide design flexibility.

Automotive infotainment example

Entertainment systems within automobiles are distinguishing features for discerning consumers. Unlike a majority of automotive electronics, these devices are highly visible and are expected to provide exceptional response time and performance. However, designers are often squeezed between the current needs of the design and the flexibility, which future features will require. For this example, the implementation needs of signal processing and wireless communications will be used to highlight the strengths of the co-processor hardware architecture.

One of the predominant automotive entertainment system architectures used was published by the Delphi Delco Electronics Systems corporation. This architecture employed an SH-4 MCU with a companion ASIC, Hitachi’s HD64404 Amanda peripheral. This

Figure 5: Infotainment FPGA co-processor architecture example 1.

Figure 6: Infotainment FPGA co-processor architecture example 2.
architecture satisfied over 75% of the automotive market's baseline entertainment functionality; however, it lacked the ability to address video processing applications and wireless communications. By including an FPGA within this existing architecture, further flexibility and capability can be added to this already-existing design approach. The Figure 5 architecture is suitable for both video processing and wireless communications management. By pushing the DSP functionalities to the FPGA, the Amanda processor can serve a system management role and is freed to implement a wireless communications stack. As both the Amanda and FPGA have access to the external memory, data can be rapidly exchanged among the system's processors and components.

The second infotainment architecture in Figure 6 highlights the FPGA's ability to address both the incoming high-speed analog data and the handling of the compression and encoding needed for video applications. In fact, all of this functionality can be pushed into the FPGA and, through the use of parallel processing, can be addressed in real-time. By including an FPGA within an existing hardware architecture, the proven performance of the existing hardware can be coupled with flexibility and futureproofing. Even within existing systems, the co-processor architecture provides options to designers, which would otherwise not be available [6].

Rapid prototyping advantages

At its heart, the rapid prototyping process strives to cover a substantial amount of product development area by executing tasks in parallel, identifying ‘bugs’ and design issues quickly, and validating data and signal paths, especially those within a project’s critical path. However, for this process to truly produce streamlined, efficient results there must be sufficient expertise in the project areas required. Traditionally, this means that there must be a hardware engineer, an embedded software or DSP engineer, and an HDL engineer. Now, there are plenty of interdisciplinary professionals who may be able to satisfy multiple roles; however, there is still substantial project overhead involved in coordinating these efforts. In their paper, An FPGA based rapid prototyping platform for wavelet coprocessors, the authors promote the idea that using a co-processor architecture allows a single DSP engineer to fulfil all of these roles, efficiently and effectively.

For this study, the team began designing and simulating the desired DSP functionality within MATLAB’s Simulink tool. This served two primary functions: 1) it verified the desired performance through simulation, and 2) it served as a baseline against which future design choices could be compared and referenced. After simulation, critical functionalities were identified and divided into different cores – these are soft-core components and processors that can be synthesized within an FPGA. The most important step during this work was to define the interface among these cores and components and to compare the data-exchange performance against the desired, simulated performance. This design process closely aligned with Xilinx’s design flow for embedded systems and is summarized in Figure 7.

Figure 7: Implementation design flow.

By dividing the system into synthesizable cores, the DSP engineer can focus upon the most critical aspects of the signal processing chain. She/he does not need to be an expert in hardware or HDL to modify, route, or implement different soft-core processors or components within the FPGA. So long as the designer is aware of the interface and the formats of the data, they have full control over the signal paths and can refine the system’s performance.

Empirical findings – the discrete cosine transform case study

The empirical findings not only confirmed the flexibility availed by the co-processor architecture to the embedded systems designer but also showcased the performance-enhancing options available with modern FPGA tools. Enhancements, like the ones mentioned below, may not be available or may have less impact for other hardware architectures. The discrete cosine transform (DCT) was selected as a computationally intensive algorithm, and its progression from a C-based implementation to an HDL-based implementation was at the heart of these findings. DCT was chosen since this algorithm is used in digital signal processing for pattern recognition and filtering [8]. The empirical findings were based upon a laboratory exercise, which was completed by the author and coworkers, to obtain the Xilinx Alliance Partner certification for 2020-2021. The following tools and devices were used in this effort:
■ Vivado HLS v2019
■ The device for assessment and simulation was the xczu7ev-ffvc1156-2-e

Beginning with the C-based implementation, the DCT algorithm accepts two arrays of 16-bit numbers; array ‘a’ is the input array to the DCT, and array ‘b’ is the output array from the DCT. The data width (DW) is therefore defined as 16, and the number of elements within the arrays (N) is 1024/DW, or 64. Last of all, the size of the DCT matrix (DCT_SIZE) is set to 8, which means an 8 x 8 matrix is used.

Following the premise of this article, the C-based algorithm implementation allows the designer to quickly develop and validate the algorithm’s functionality. Although execution time is an important consideration, this validation weights functionality above it. This weighting is allowed, since the ultimate implementation of this algorithm will be in an FPGA, where hardware acceleration, loop unrolling, and other techniques are readily available.

Once the DCT code was created within the Vivado HLS tool as a project, the next step was to begin synthesizing the design for FPGA implementation. It is at this step that some of the most impactful benefits from moving an algorithm’s execution from an MCU to an FPGA become apparent. For reference, this step is equivalent to the System Management with the Microcontroller milestone discussed above. Modern FPGA tools allow for a suite of optimizations and enhancements that greatly improve the performance of complex algorithms. Before analyzing the results, there are some important terms to keep in mind:
■ Latency – The number of clock cycles required to execute all iterations of the loop [10]
■ Interval – The number of clock cycles before the next iteration of a loop starts to process data [11]
■ BRAM – Block Random Access Memory
■ DSP48E – Digital Signal Processing Slice for the UltraScale Architecture
■ FF – Flipflop
■ LUT – Look-up Table
■ URAM – UltraRAM (large on-chip memory blocks available in UltraScale+ devices)

Figure 8: Xilinx Vivado HLS design flow.

Default
The default optimization setting comes from the unaltered result of translating the C-based algorithm to synthesizable HDL. No optimizations are enabled, and this can be used as a performance reference to better understand the other optimizations.

Pipeline inner loop
The PIPELINE directive instructs Vivado HLS to pipeline the inner loops so that new data can start being processed while existing data is still in the pipeline. Thus, new data does not have to wait for the existing data to be complete before processing can begin.

Pipeline outer loop
By applying the PIPELINE directive to the outer loop, the outer loop’s operations are now pipelined. However, the inner loops’ operations now occur concurrently. Both the latency and interval time are cut in half through applying this directive to the outer loop.

Array partition
This directive maps the contents of the loops to arrays and thus flattens all of the memory access to single elements within these arrays. By doing this, more RAM is consumed, but again, the execution time of this algorithm is cut in half.

Dataflow
This directive allows the designer to specify the target number of clock cycles between each of the input reads. This directive is only supported for the top-level function. Only loops and functions exposed to this level will benefit from this directive.

Inline
The INLINE directive flattens all loops, both inner and outer. Both row and column processes can now execute concurrently. The number of required clock cycles is kept to a minimum, even if this consumes more FPGA resources.

Table 1: FPGA algorithm execution optimization findings (latency and interval).
| Optimization | Latency min | Latency max | Interval min | Interval max |
|---|---|---|---|---|
| Default (solution 1) | 2935 | 2935 | 2935 | 2935 |
| Pipeline inner loop (solution 2) | 1723 | 1723 | 1723 | 1723 |
| Pipeline outer loop (solution 3) | 843 | 843 | 843 | 843 |
| Array partition (solution 4) | 477 | 477 | 477 | 477 |
| Dataflow (solution 5) | 476 | 476 | 343 | 343 |
| Inline (solution 6) | 463 | 463 | 98 | 98 |

Table 2: FPGA algorithm execution optimization findings (resource utilization).
| Optimization | BRAM_18K | DSP48E | FF | LUT | URAM |
|---|---|---|---|---|---|
| Default (solution 1) | 5 | 1 | 246 | 964 | 0 |
| Pipeline inner loop (solution 2) | 5 | 1 | 223 | 1211 | 0 |
| Pipeline outer loop (solution 3) | 5 | 8 | 516 | 1356 | 0 |
| Array partition (solution 4) | 3 | 8 | 862 | 1879 | 0 |
| Dataflow (solution 5) | 3 | 8 | 868 | 1654 | 0 |
| Inline (solution 6) | 3 | 16 | 1086 | 1462 | 0 |

Conclusion

The co-processor hardware architecture provides the embedded designer with a high-performance platform that maintains its design flexibility throughout development and past product release. By first validating algorithms in C or C++, processes, data and signal paths, and critical functionality can be verified in a relatively short amount of time. Then, by translating the processor-intensive algorithms into the co-processor FPGA, the designer can enjoy the benefits of hardware acceleration and a more modular design.

Should parts become obsolete, or optimizations be required, the same architecture can allow for these changes. New MCUs and new FPGAs can be fitted into the design, all the while the interfaces can remain relatively untouched. Additionally, since both the MCU and FPGA are field updatable, user-specific changes and optimizations can be applied in the field and remotely. In closing, this architecture blends the development speed and availability of an MCU with the performance and expandability of an FPGA. With optimizations and performance enhancements available at every development step, the co-processor architecture can meet the needs of even the most challenging requirements – both for today’s designs and beyond.
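The case study's C source is not reproduced in the article, so the following is only a software reference sketch of the kind of functional model that can be validated before any HLS directives are applied. It uses the parameters quoted in the text (DW = 16, N = 1024/DW = 64, DCT_SIZE = 8). The function names, the orthonormal DCT-II scaling, the row-major layout of the 64-element array as an 8 x 8 block, and the rounding of results to integers are all assumptions for illustration.

```python
import math

DW = 16               # data width in bits, as quoted in the case study
N = 1024 // DW        # number of array elements (64)
DCT_SIZE = 8          # 8 x 8 transform


def dct_coefficients(size=DCT_SIZE):
    """Orthonormal DCT-II coefficient matrix (scaling is an assumption)."""
    coeffs = []
    for k in range(size):
        scale = math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
        coeffs.append([scale * math.cos((2 * n + 1) * k * math.pi / (2 * size))
                       for n in range(size)])
    return coeffs


def dct_1d(vec, coeffs):
    """One-dimensional DCT of a single row or column."""
    return [sum(c * v for c, v in zip(row, vec)) for row in coeffs]


def dct_2d(block, coeffs):
    """Separable 2D DCT: a row pass followed by a column pass."""
    rows = [dct_1d(r, coeffs) for r in block]            # row process
    cols = [dct_1d(c, coeffs) for c in zip(*rows)]       # column process
    return [list(r) for r in zip(*cols)]                 # transpose back


def dct(a):
    """Transform input array 'a' (N values) into output array 'b' (N values)."""
    if len(a) != N:
        raise ValueError("expected %d input samples" % N)
    block = [a[r * DCT_SIZE:(r + 1) * DCT_SIZE] for r in range(DCT_SIZE)]
    out = dct_2d(block, dct_coefficients())
    return [int(round(v)) for row in out for v in row]


# A constant block has all of its energy in the DC coefficient.
b = dct([100] * N)
assert b[0] == 800 and all(v == 0 for v in b[1:])
```

The separate row and column passes in `dct_2d` are the two nested-loop stages that the pipelining, partitioning, and inlining directives described above act upon once the algorithm is synthesized.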
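The trade-off reported in Table 1 and Table 2 can be tallied with a few lines of arithmetic. The sketch below copies the latency, interval, DSP48E, and LUT figures from those tables and expresses each solution relative to the default; the dictionary layout and function names are assumptions made for this illustration, and no figures beyond those in the tables are introduced.

```python
# (latency, interval) in clock cycles, copied from Table 1
TIMING = {
    "Default": (2935, 2935),
    "Pipeline inner loop": (1723, 1723),
    "Pipeline outer loop": (843, 843),
    "Array partition": (477, 477),
    "Dataflow": (476, 343),
    "Inline": (463, 98),
}

# (DSP48E, LUT) counts, copied from Table 2
RESOURCES = {
    "Default": (1, 964),
    "Pipeline inner loop": (1, 1211),
    "Pipeline outer loop": (8, 1356),
    "Array partition": (8, 1879),
    "Dataflow": (8, 1654),
    "Inline": (16, 1462),
}


def speedup(baseline, optimized):
    """How many times fewer cycles the optimized solution needs."""
    return baseline / optimized


def summarize():
    """Return (name, latency speedup, interval speedup, LUT growth) per solution."""
    base_latency, base_interval = TIMING["Default"]
    base_lut = RESOURCES["Default"][1]
    rows = []
    for name, (latency, interval) in TIMING.items():
        lut = RESOURCES[name][1]
        rows.append((name,
                     round(speedup(base_latency, latency), 2),
                     round(speedup(base_interval, interval), 2),
                     round(lut / base_lut, 2)))
    return rows


for row in summarize():
    print("%-20s latency x%-6s interval x%-6s LUT x%s" % row)
```

Read this way, the Inline solution needs roughly one sixth of the default latency and about one thirtieth of the default interval, in exchange for 16 DSP48E slices instead of 1 and about 1.5 times the LUTs.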
How single-board computers extend the reach of industrial automation Written by Jeff Shepard
The availability of single-board computers (SBCs) like Arduino and Raspberry Pi that are rated for use in industrial environments, together with software development tools based on the International Electrotechnical Commission (IEC) 61131-3 standard, has opened new opportunities for machine and factory automation designers. Some of these new SBC-based solutions also open new possibilities for automating environmental monitoring, smart home and building installations, agricultural applications, and other non-industrial systems.

Industrial SBCs are being used in machine controllers, industrial PCs (IPCs), Industrial Internet of Things (IIoT) gateways, micro programmable logic controllers (PLCs), soft PLCs, analog and digital input/output (I/O) modules, and more. These SBC-based devices are built on open hardware and open software platforms, sometimes including full root rights.

Compliance with IEC 61131-3 means that the five standard automation programming languages are supported: ladder diagram, structured text, function block diagram, sequential function chart, and instruction list. Being built using SBCs means developers can also turn to languages like Java, Python, C, or C++, providing greater flexibility than traditional industrial control hardware. Some support data security from the hardware to the Cloud or a higher-level network like an enterprise resource planning (ERP) system with an onboard secure element and International Telecommunication Union (ITU) X.509 Standard public key compliance.

This article presents examples of SBC-based solutions available to machine and automation designers from Arduino, Industrial Shields, and KUNBUS for various applications, including small- to medium-scale automation, embedded control in small machines, and large factory automation installations. The article closes with a look at how PROFINET and deterministic networking can be implemented on SBC PLCs.

Arduino PLCs
One of the benefits of most Arduino-based PLCs is the availability of the Arduino PLC integrated development environment (IDE) for writing control software. The Arduino PLC IDE enables users to choose any of the five programming languages defined by IEC 61131-3 and quickly code PLC applications or port existing ones. It also includes ready-to-use Arduino sketches (programs), tutorials, and libraries.

Industrial Shields’ Arduino-based PLCs can be programmed using the Arduino IDE or directly using C. These PLCs include open-source tools and can be programmed with multiple software platforms. They can be programmed through the USB or Ethernet ports for remote connections. Users can continuously monitor the status of all the variables, inputs, and outputs. The model IS.MDUINO.21+ from Industrial Shields is rated for operation from 0°C to +60°C, and its ATmega processor achieves a throughput of 16 MIPS at 16 MHz (Figure 1). Features include:
■ 13 Inputs:
  ■ 7 opto isolated digital (5 VDC to 24 VDC)
  ■ 2 Interrupts (5 VDC to 24 VDC)
  ■ 6 software configurable as analog (0 VDC to 10 VDC, 10 bit) or digital (5 VDC to 24 VDC)
■ 8 Outputs:
  ■ 5 opto isolated digital (5 VDC to 24 VDC)
  ■ 3 software configurable as analog (0 VDC to 10 VDC, 8 bit), digital (5 VDC to 24 VDC), or pulse width modulated (5 VDC to 24 VDC)
■ 256 KB memory
■ Ethernet, RS-232, RS-485 and USB communications
■ Expandable with up to 127 modules

Figure 1: The model IS.MDUINO.21+ from Industrial Shields has 13 inputs and 8 outputs. Image source: Industrial Shields

Micro PLCs
The Arduino Opta is a micro PLC designed to support IIoT applications. Programmable with the Arduino PLC IDE, it supports Arduino sketches and standard PLC languages. The main processor is the dual-core STM32H747 with a 480 MHz Cortex M7, a 240 MHz Cortex M4, and 1 MB program memory that supports real-time control, monitoring, and implementation of predictive maintenance algorithms. Secure over-the-air (OTA) firmware updates are supported by the onboard secure element and X.509 compliance.

Opta PLCs are available in three variants differentiated by their communications capabilities. All three include USB-C. The models are:
■ Opta Lite, model AFX00003, that adds 10/100BASE-T Ethernet
■ Opta RS485, model AFX00001, that adds 10/100BASE-T Ethernet and half-duplex RS-485
■ Opta Wi-Fi, model AFX00002, that adds 10/100BASE-T Ethernet, half-duplex RS-485, 802.11 b/g/n Wi-Fi, and Bluetooth low energy (BLE)

These micro PLCs have eight programmable analog/digital inputs and four normally-open relay outputs rated for 10 A (2.3 kW). The real-time clock (RTC) typically retains power for ten days at +25°C, and network time protocol (NTP) synchronization is available through the Ethernet port. They are DIN rail compatible to speed system integration (Figure 2).

Figure 2: Opta Lite Arduino micro PLC showing the four 10 A relay outputs on the left front of the unit. Image source: Arduino

Embedded PLC for small machines
Designers of small machines for labelling, forming and sealing, carton packing, gluing, electric ovens, industrial washers and dryers, mixers, and so on can turn to the 170 x 90 x 50 millimeters (mm) Portenta Machine Control PLC. It has a DIN bar compatible housing and push-in terminals for fast connection and is rated for operation from -40°C to +85°C without external cooling (Figure 3). The main processor is the dual-core STM32H747 with a 480 MHz Cortex M7 and a 240 MHz Cortex M4. The board can support flat screen displays, touch panels, keyboards, joysticks, and mice for installer and operator interfaces. It can be programmed using the Arduino PLC IDE or other embedded development platforms. The Portenta Machine Control can support predictive maintenance and artificial intelligence (AI) software. Its embedded RTC supports synchronization of processes and enables real-time data collection and remote control of equipment.
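The analog ranges quoted for the IS.MDUINO.21+ (0 VDC to 10 VDC inputs at 10-bit resolution, and 0 VDC to 10 VDC outputs at 8-bit resolution) imply a conversion between raw counts and volts in application code. The sketch below assumes a simple linear mapping in which the largest code corresponds to 10 V; the article does not state the exact transfer function, and the function names are hypothetical, so the device documentation should be checked before relying on these numbers.

```python
def counts_to_volts(raw, bits=10, full_scale_v=10.0):
    """Convert a raw analog input reading to volts (linear mapping assumed)."""
    max_count = (1 << bits) - 1          # 1023 for a 10-bit input
    if not 0 <= raw <= max_count:
        raise ValueError("raw reading out of range for %d bits" % bits)
    return raw * full_scale_v / max_count


def volts_to_counts(volts, bits=8, full_scale_v=10.0):
    """Convert a requested output voltage to a raw code (linear mapping assumed)."""
    max_count = (1 << bits) - 1          # 255 for an 8-bit output
    if not 0.0 <= volts <= full_scale_v:
        raise ValueError("voltage outside the 0 V to full-scale range")
    return round(volts * max_count / full_scale_v)


print(counts_to_volts(512))    # mid-scale input reading, in volts
print(volts_to_counts(2.0))    # raw code for a 2 V output request
```

Under this assumption, one input step is about 9.8 mV (10 V / 1023) and one output step is about 39 mV (10 V / 255), which gives a feel for the resolution available to a control loop.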
The Portenta Machine Control can connect to various external sensors and actuators with isolated and programmable digital and analog I/O connections, three configurable temperature channels, and an I2C connector. Resettable fuses protect all I/Os. Network connectivity is supported by USB, Ethernet, Wi-Fi, BLE and RS-485.

Figure 3: The Portenta Machine Control board is designed for embedded applications in a wide range of machines. Image source: Arduino

Raspberry Pi for factory automation
More complex automation tasks can benefit from the processing power of Raspberry Pi 4-based PLCs using the Broadcom BCM2711B0 processor. Fabricated on a 28 nanometer (nm) process, the BCM2711B0 uses the Cortex-A72 architecture. It has four cores with a clock speed of 1.5 GHz and 4 GB RAM. It integrates numerous peripherals, including timers, interrupt controller, general purpose I/O (GPIO), USB, PCM/I2S digital audio interface, direct memory access (DMA) controller, I2C masters, serial peripheral interface (SPI) masters, PWM, universal asynchronous receivers/transmitters (UARTs), dual micro HDMI ports that support 4K output, and more.

Industrial Shields’ Raspberry Pi Ethernet PLCs use the BCM2711B0, operate with 12 VDC to 24 VDC input voltages, and draw up to 1.5 A of current. They include the Linux operating system and have dual Ethernet ports, dual RS-485 ports, Wi-Fi, BLE, and CAN bus options, making them capable of connecting with many devices using multiple protocols and communications ports. They have been optimized for applications that benefit from real-time control and are available with 2, 4, and 8 GB of RAM. Examples of Industrial Shields’ Raspberry Pi PLCs include:
■ 012003000200, with 4 GB RAM and 21 I/Os (Figure 4)
■ 012003001100, with 4 GB RAM and 54 I/Os
■ 016003000200, with 4 GB RAM, 21 I/Os, and general packet radio service (GPRS) cellular connectivity

Figure 4: Industrial Shields’ Raspberry Pi Ethernet PLC with 4 GB RAM and 21 I/Os. Image source: Industrial Shields

Bridging Arduino and Raspberry Pi in PLCs with SimpleComm
The SimpleComm C++ library lets designers send data using RS-485, RS-232, Ethernet, and other protocols. It can be adapted to different communications topologies like ad-hoc, master-slave, and client-server. The original program has an intuitive application programming interface (API) for Arduino environments. Industrial Shields recently adapted SimpleComm for the Linux environment found on Raspberry Pi PLCs.

IPC and IIoT gateway solution
When greater flexibility is needed, designers can turn to KUNBUS’ RevPi Core S and SE IPCs and the RevPi Connect S and SE IIoT gateway, all based on Raspberry Pi and designed for DIN rail mounting (Figure 5). In addition to providing circuit diagrams, KUNBUS uses an open-source adaptation of the Raspberry Pi operating system (OS) with a real-time operation patch. The Raspberry Pi OS offers robust interoperability with a wide range of software applications developed for Raspberry Pi. KUNBUS works with software vendors to support supervisory control and data acquisition (SCADA) software for controlling, monitoring, and analyzing industrial devices and processes. The availability of full root access speeds up the implementation of custom programs.

The RevPi Core S and SE are built on an open hardware and open software platform that conforms to the IEC 61131 standard. RevPi Core S units are compatible with all KUNBUS expansion modules, including fieldbus gateways. RevPi Core SE units are compatible with KUNBUS I/O modules but don’t support the fieldbus gateways. RevPi Core S/SE IPCs have USB, Micro-USB, Ethernet, and HDMI connections. They feature a 1.5 GHz quad-core processor with 1 GB RAM, and models are available with 8, 16, and 32 GB of storage. For example, model PR100360, RevPi Core S has 16 GB of memory.

To support IIoT connectivity, the RevPi Connect S and SE Gateways are available with up to 32 GB of memory and include two RJ45 10/100 Ethernet sockets, two USB ports, a 4-pin RS-485 interface, plus micro-HDMI, and micro-USB sockets. The two Ethernet sockets support simultaneous connectivity with automation and information technology (IT) networks. Because the platform is open source, applications can be programmed using Node-RED, Python, and C. RevPi Connect can be upgraded with PROFINET, EtherNet/IP, EtherCAT, Modbus TCP, and Modbus RTU functionality without the use of expansion modules. Examples of RevPi Connect units include:
■ PR100363, RevPi Connect S with 16 GB memory
■ PR100197, RevPi digital I/O expansion module
■ PR100250, RevPi analog expansion module

Figure 5: Examples of RevPi Core SE IPC (left) and RevPi Connect IIoT Gateway (right). Image source: KUNBUS

PROFINET and SBC PLCs
SBC PLCs can be sophisticated devices capable of supporting advanced networking protocols. Process field network (PROFINET) is an open standard for industrial networking devices like PLCs, drives, robots, diagnostic tools, etc. It runs over industrial Ethernet and is optimized for collecting data and controlling industrial equipment with real-time communications. It’s available to run on most Arduino and Raspberry Pi PLCs.

Industrial automation networks need high-speed and deterministic communication. PROFINET focuses on deterministic performance that delivers messages exactly when needed and expected. That means delivering each message with the appropriate speed based on the task being performed. Not all tasks are equally time sensitive. PROFINET can deliver messages on various protocols, including:
■ PROFINET Real-Time (RT)
■ PROFINET Isochronous Real-Time (IRT)
■ Time Sensitive Networking (TSN)
■ TCP/IP (or UDP/IP)

Conclusion
A wide range of SBC-based PLCs and industrial networking devices based on Arduino and Raspberry Pi technologies are available. They use open-source software and, in some cases, open-source hardware. Arduino PLCs are available as standard-sized units for small networks, micro PLCs for space-sensitive installations, and machine controllers for embedded applications. Quad-core Raspberry Pi-based PLCs can support more complex industrial networking applications. Raspberry Pi-based IPCs and IIoT gateways that support high levels of flexibility in network design and deployment are available.