The Birth of Real-Time Edge Intelligence: When DSP and AI Met the Arm Processor

Over the past 40 years, I’ve worked with a wide range of microcontrollers and DSPs across industries—from biomedical systems and industrial control to radar and audio processing. What follows is not theory—it’s a reflection based on firsthand experience with many of the chips and platforms that have shaped the embedded world as we know it.

Before the Age of General-Purpose DSPs

In the early 1980s, Digital Signal Processing (DSP) was often too demanding for microprocessors, which lacked the speed and specialised instruction sets. To address this, manufacturers developed dedicated hardware accelerators—custom ICs built to perform tasks such as Fast Fourier Transforms (FFT), digital filtering, and modulation. Companies such as TRW LSI Products, Harris Semiconductor and even Intel offered function-specific chips that could be dropped into hardware designs to offload processing from the main CPU. These chips laid the groundwork for programmable DSPs, which would later unify these capabilities into a single, software-controlled processor—offering far greater flexibility and reusability.

The Three Titans of Classic DSP: TI, Analog Devices and Motorola

Back in the golden age of classic DSPs, three companies stood out: Texas Instruments, Analog Devices, and Motorola. Each brought its own innovations, architectures, and ecosystems that defined digital signal processing for decades.

Texas Instruments (TI) took the lead with its TMS320 family. Introduced in 1983, the fixed-point TMS32010 was one of the first general-purpose DSPs on the market. TI’s later C2000 series brought DSP-like performance to microcontrollers, making it a popular choice for motor control and industrial automation. But it was the TMS320C6000 series, launched in the late 1990s, that truly set a new benchmark. These VLIW (Very Long Instruction Word) DSPs, such as the C62x (fixed-point) and C67x (floating-point), enabled complex signal analysis, control, and real-time processing with high instruction throughput. Later multicore versions such as the C6474 scaled this up with three high-performance DSP cores on a single chip, and some custom implementations went as far as six cores. These chips made it possible to perform real-time control, sensor fusion, and high-throughput signal analysis in a way that felt almost futuristic, achieving over 28,000 MIPS!

TI also addressed ultra-low-power use cases with the C55x series—a fixed-point DSP line optimised for battery-powered audio and telecommunication applications. Whether for performance or power efficiency, TI had an answer across the spectrum.

Analog Devices (ADI) also played a central role in shaping DSP. Their SHARC and TigerSHARC families offered high-performance floating-point computation, making them a favourite in professional audio, instrumentation, and aerospace applications. Equally important was the Blackfin family—a fixed-point, 16/32-bit hybrid architecture introduced in the early 2000s. Blackfin was optimised for embedded multimedia and signal processing tasks, combining DSP capability with control-oriented features like MMUs, timers, and flexible I/O. ADI offered excellent tools, tight integration with analog components, and very strong real-time performance for demanding applications.

Motorola, meanwhile, contributed both microcontroller and DSP innovations. The legendary 68000, launched in 1979, wasn’t strictly a DSP, but it offered 32-bit general-purpose performance that was far ahead of its time. It was widely adopted in embedded control systems and even some signal processing applications, powering everything from defence platforms to the Commodore Amiga. Motorola also developed a dedicated line of DSPs in the 56000 family, which gained a following in telecommunications and audio. In the 1990s Motorola moved toward the PowerPC architecture, and in 2004 it spun off its semiconductor business as Freescale. Still, its legacy in early DSP computing remains significant.

Collectively, these three vendors established the blueprint for programmable signal processing—offering both fixed-point and floating-point variants, with increasing performance, integration, and software support over time. Their contributions laid the groundwork for today’s hybrid processors, SoCs, and enhanced microcontrollers.

Enter Microchip’s dsPIC: A Different Kind of DSP

When Microchip launched the dsPIC series in 2001, it was a radical departure from the classic DSP playbook. Instead of focusing on high-performance signal processing, Microchip blended DSP-like instructions into a 16-bit microcontroller core—creating a hybrid that prioritised real-time control, affordability, and embedded peripherals all-in-one. This unorthodox approach defied convention, yet proved surprisingly effective in areas such as motor control and power electronics.

In 2007, Microchip introduced the PIC32, a 32-bit MIPS-based family without hardware floating point, intended to compete with Arm-based microcontrollers. Floating-point support only arrived in PIC32MZ devices in 2013, which still retained the MIPS lineage. By then, the industry had largely moved on. After acquiring Atmel in 2016, Microchip began transitioning toward Arm-based designs but found itself playing catch-up. STMicroelectronics, NXP, and others had already embraced Arm more than a decade earlier.

Hitachi and the SH Architecture

Another important player in the 1990s was Hitachi, whose SuperH (SH) family of microcontrollers gained traction in both consumer electronics and automotive systems. The SH architecture offered a compact 32-bit RISC instruction set and DSP-like instructions, making it well-suited for signal processing tasks without the complexity of a full DSP.

The SH-2, in particular, was developed for motor control and stood out for its high performance and well-designed 32-bit architecture. It was backed by a professional-grade toolchain, more expensive than Microchip’s offerings and far more capable. While Microchip’s later dsPIC family also targeted fixed-point DSP applications, it was positioned more toward the hobbyist and cost-sensitive market. In contrast, the SH-2 catered to industrial and automotive use cases that demanded greater precision, performance, and software maturity.

The SH line powered everything from set-top boxes and printers to game consoles, with the SH-2 used in the Sega Saturn and the more powerful SH-4 driving the Sega Dreamcast. After Hitachi’s semiconductor business merged with Mitsubishi’s to form Renesas in 2003, the SH series continued for a while but was gradually replaced by newer architectures such as RX and, eventually, Arm-based cores. Nonetheless, the SH family remains a fascinating example of early DSP-capable microcontrollers.

How Arm Processors Changed the Game

The real turning point for Arm came in 1995, when it secured a major contract with Nokia. At the time, Nokia was searching for a low-power processor for its mobile phones: something that could offer signal processing without the power drain of conventional DSPs.

Arm responded with the Thumb instruction set—a compact 16-bit format that dramatically reduced code size while preserving much of the performance of 32-bit Arm instructions. Later, Thumb-2 extended this approach with mixed 16/32-bit support, enabling DSP-like functionality within a power-efficient and compact silicon footprint. It was a game-changer.

However, what truly propelled Arm forward was its strategy and licensing model. Rather than manufacturing chips themselves, Arm licensed its processor IP to silicon vendors—forming deep partnerships with companies such as STMicroelectronics, NXP, Texas Instruments, and Analog Devices. These vendors integrated Arm cores into their own SoCs, often alongside analog front ends, accelerators, or even DSP blocks. The result was a wave of highly optimised, application-specific devices built atop a shared architecture.

A key milestone came with the first Raspberry Pi, built around Broadcom’s BCM2835 SoC with an ARM11 core; later models moved to Cortex-A cores. The Pi’s success brought Arm processors into education, prototyping, and hobbyist markets, seeding a new generation of developers trained on Arm platforms.

Combined with a robust ecosystem of compilers, development tools, and middleware, Arm’s architectural dominance spread rapidly across consumer, industrial, and IoT domains.

The Emergence of the Enhanced Microcontroller

Today’s microcontrollers are no longer just simple control devices. Many now include SIMD instructions, DSP acceleration, advanced timers, cryptographic modules, and hardware-based security. This gives rise to what we can call the enhanced microcontroller: a hybrid class of devices that blurs the line between DSPs and MCUs by combining:

General-purpose control and peripheral integration
DSP capabilities for signal conditioning and real-time analysis
Hardware security features such as TrustZone
Low power consumption for battery-powered and IoT systems
Affordable pricing for mass deployment

STMicroelectronics’ STM32 devices based on Cortex-M4 and Cortex-M7 cores, such as the STM32F4 and STM32F7 series, are textbook examples. These microcontrollers can handle filtering, FFTs, and real-time signal analysis with ease, all while supporting familiar C-based toolchains and low-power sleep modes. They may not match a high-end floating-point DSP in every respect, but they strike an ideal balance for most embedded applications.
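To make the filtering workload concrete, here is a minimal sketch of a direct-form FIR filter in Q15 fixed point, the 16-bit format that the dual multiply-accumulate instructions of the Cortex-M4/M7 DSP extension are designed around. It is plain, portable C written under assumed parameters (a 4-tap filter and a simple shifting delay line); the type and function names are hypothetical, and in production CMSIS-DSP's `arm_fir_q15` would be the natural choice.

```c
#include <stddef.h>
#include <stdint.h>

#define NTAPS 4  /* illustrative filter length (assumption) */

/* Hypothetical filter state: Q15 coefficients plus a delay line. */
typedef struct {
    int16_t coeffs[NTAPS];  /* h[0..NTAPS-1] in Q15 */
    int16_t state[NTAPS];   /* state[0] holds the newest input sample */
} fir_q15_t;

/* Clamp a wide intermediate value to the signed 16-bit (Q15) range. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Process one input sample: y[n] = sum_k h[k] * x[n-k]. */
int16_t fir_q15_step(fir_q15_t *f, int16_t x)
{
    /* Shift the delay line by one sample and insert the new input. */
    for (size_t k = NTAPS - 1; k > 0; --k)
        f->state[k] = f->state[k - 1];
    f->state[0] = x;

    /* Q15 x Q15 products are Q30; sum them in a 64-bit accumulator. */
    int64_t acc = 0;
    for (size_t k = 0; k < NTAPS; ++k)
        acc += (int32_t)f->coeffs[k] * f->state[k];

    /* Round, shift back to Q15 and saturate (assumes arithmetic right shift). */
    return sat_q15((acc + (1 << 14)) >> 15);
}
```

Setting all four coefficients to 8192 (0.25 in Q15) turns this into a 4-point moving average: a constant input of 16384 (0.5) ramps the output through 4096, 8192 and 12288 before settling at 16384. For longer filters, a circular buffer avoids the cost of shifting the delay line on every sample.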

Arm Helium: Specialised Edge Hardware Acceleration

Arm Helium, the M-profile Vector Extension (MVE) introduced in Armv8.1-M, takes this trend a step further. Designed with edge AI in mind, Helium allows microcontrollers to perform complex filtering, sensor fusion, and even neural inference tasks with impressive efficiency.
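Helium operates on 128-bit vector registers, so one instruction can process eight 16-bit lanes, and its tail-predicated loops handle a final partial vector by disabling the unused lanes instead of running a separate scalar clean-up loop. The portable C below only emulates that loop structure for a Q15 dot product so the idea can be run on any host; it is a hypothetical sketch, not real MVE code, which would use the intrinsics in `arm_mve.h` or rely on the compiler's auto-vectoriser.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* 128-bit vector register / 16-bit elements */

/* Scalar emulation of a tail-predicated vector dot product. */
int64_t dot_q15_lanes(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    /* Each outer iteration stands in for one 8-lane vector operation. */
    for (size_t i = 0; i < n; i += LANES) {
        size_t remaining = n - i;
        for (size_t j = 0; j < LANES; ++j) {
            /* Predicate: lane j is active only while i + j < n. Inactive
             * lanes are skipped, mirroring a predicated multiply-accumulate
             * that neither reads past the end nor adds to the result. */
            if (j < remaining)
                acc += (int32_t)a[i + j] * b[i + j];
        }
    }
    return acc;
}
```

For n = 10 the outer loop runs twice, and the second pass has only two active lanes.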

The Cortex-M52 captures the essence of this shift, bringing DSP, general-purpose control, security, and low-power performance together into a single core. It includes Arm TrustZone for embedded security, supports Helium acceleration, and enables localised processing of tasks that once required external compute or specialised DSPs.

Many new Helium-based MCUs also leverage TSMC’s 22nm ultra-low-power process, delivering up to 50% power savings over 40nm chips. This makes edge intelligence viable even in battery- or solar-powered deployments, with no compromise on performance.
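A back-of-envelope calculation shows why power savings translate directly into deployment lifetime. The sketch below assumes a simple duty-cycled device and an ideal battery; the function names are hypothetical, and the capacity, currents and duty cycle in the note after it are illustrative assumptions, not figures for any particular MCU.

```c
/* Average current (mA) of a device that is active for a fraction `duty`
 * of the time and asleep otherwise. */
double avg_current_ma(double active_ma, double sleep_ma, double duty)
{
    return duty * active_ma + (1.0 - duty) * sleep_ma;
}

/* Idealised battery life in hours: ignores self-discharge, temperature
 * effects and regulator losses. */
double battery_life_h(double capacity_mah, double avg_ma)
{
    return capacity_mah / avg_ma;
}
```

With a hypothetical 1000 mAh cell, 10 mA active current, 10 µA sleep current and a 1% duty cycle, the average is about 0.11 mA, or roughly 9,100 hours. Halving both currents doubles that figure, but halving only the active current gives about 1.8 times as much, because the unchanged sleep current now carries more weight.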

It’s Not Just the Hardware—It’s the Ecosystem

Arm’s success is also rooted in the richness of its ecosystem. Developers benefit from mature toolchains, CMSIS-DSP libraries, trusted third-party support, and a wide community of contributors.

This infrastructure allows engineers to focus on solving domain-specific problems rather than wrestling with the underlying hardware. Thanks to tools from ASN, Qeexo, MathWorks, and Edge Impulse, even sophisticated algorithms, such as biomedical filters, IoT sensor-cleaning filters, and predictive maintenance monitors, can now be designed, validated, and deployed as efficient C code within minutes, without requiring deep DSP expertise.
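Filter-design tools commonly emit IIR filters as a cascade of second-order sections. The sketch below shows one such section (a biquad) in transposed direct form II, a structure often preferred for floating-point implementations; the struct layout and names are hypothetical and do not reflect the output format of any of the tools named above.

```c
/* One second-order IIR section implementing the difference equation
 *   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
 * with a0 normalised to 1. Coefficient values come from the designer. */
typedef struct {
    float b0, b1, b2;  /* feed-forward coefficients */
    float a1, a2;      /* feedback coefficients (a0 == 1) */
    float z1, z2;      /* internal state of the transposed structure */
} biquad_t;

/* Transposed direct form II: two state variables, five multiplies. */
float biquad_step(biquad_t *s, float x)
{
    float y = s->b0 * x + s->z1;
    s->z1 = s->b1 * x - s->a1 * y + s->z2;
    s->z2 = s->b2 * x - s->a2 * y;
    return y;
}
```

With b0 = 1, a1 = -0.5 and the other coefficients zero, the section reduces to y[n] = x[n] + 0.5 y[n-1], and an impulse produces 1, 0.5, 0.25, and so on. Higher-order designs chain several sections, feeding each output into the next input.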

Arm’s ecosystem lowers the entry barrier while raising the ceiling, enabling individual developers and small teams to compete with traditional DSP engineering departments. It’s this accessibility and scalability that makes the platform so compelling for modern embedded development.

Enter the Market Disruptor: Espressif Systems

While much of the evolution in embedded DSP has been driven by microcontrollers based on Arm Cortex cores from vendors such as STMicroelectronics, Texas Instruments, Analog Devices and NXP, a disruptive force entered the scene in 2016 with the launch of Espressif Systems’ ESP32-WROOM-32.

Based on dual-core Xtensa® 32-bit LX6 processors, the original ESP32 combines integrated Wi-Fi and Bluetooth with a hardware single-precision floating-point unit (FPU) and DSP-style instructions such as multiply-accumulate and saturation arithmetic. Despite lacking hardware support for division and square root, its real-world floating-point performance is often comparable to an Arm Cortex-M4F, thanks to its 240 MHz clock and efficient memory architecture.
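Without a hardware divide instruction, division is usually built from multiplies and adds. One standard technique is Newton-Raphson iteration for the reciprocal, sketched below in portable C; the function name, the caller-supplied seed and the iteration count are assumptions for illustration, and this is not Espressif's library code.

```c
/* Approximate 1/d using only multiply and subtract.
 * Newton-Raphson on f(r) = 1/r - d gives r' = r * (2 - d * r);
 * the relative error is roughly squared on every iteration.
 * Converges only if the seed r0 lies strictly between 0 and 2/d. */
float recip_newton(float d, float r0, int iterations)
{
    float r = r0;
    for (int i = 0; i < iterations; ++i)
        r = r * (2.0f - d * r);
    return r;
}
```

Real implementations typically obtain a good seed from a small lookup table or a hardware estimate, so only one or two iterations are needed. A division a/d then becomes a multiply by the reciprocal, and a similar iteration, r' = r(3 - x r^2)/2, yields 1/sqrt(x) for square roots.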

Originally aimed at wireless control applications, the ESP32 quickly gained traction for low-cost edge processing tasks, including audio filtering, FFT analysis, and real-time control. Espressif’s open SDK and active global community made it a go-to platform for hobbyists, start-ups, and even commercial IoT products.

While it lacks the advanced DSP acceleration of Arm Helium or the precision of high-end floating-point DSPs, the ESP32’s exceptional cost-to-performance ratio has democratised edge intelligence. It has shown the world that embedded DSP doesn’t have to be expensive or exclusive.

Where Do We Stand Now?

DSP has not disappeared—it has evolved. What was once the domain of dedicated chips has become a fundamental capability, embedded across a wide range of computing platforms. From soft DSP cores in FPGAs to integrated signal processing units in SoCs, and now to scalable vector extensions within microcontrollers, the function of DSP is alive and well—although the form has changed.

Specialist chips, such as those from Texas Instruments or Analog Devices, have not vanished. Many are now integrated into SoCs, handling radar, video, and high-performance industrial tasks in highly integrated systems. The IWR6843 mmWave radar SoC from TI is a perfect example of this convergence: a C674x floating-point DSP, two Arm Cortex-R4 processors, communication channels, memory, RF front end, and antennas all in one compact, high-performance chip. Newer flavours of the SoC include an AI engine for micro-Doppler pattern classification, so that inference can run at the edge in real time!

However, for most embedded applications, enhanced microcontrollers based on Arm processors now deliver the best balance of performance, power, and price. With integrated support for signal processing, connectivity, security, and energy efficiency—all within a mature ecosystem—they have become the logical choice for the next generation of intelligent edge devices.

Across the Cortex-M family, options like the M4 and M7 provide a strong foundation for signal processing in real-time control applications. The Helium-enabled M55, M85, and, most recently, M52 take this further, offering vectorised DSP acceleration, better energy efficiency, and increasingly robust security features.

The Cortex-M52 in particular captures the essence of this evolution. It unifies three essential capabilities for Edge AI into one compact, low-power device:

DSP/AI functionality for signal processing algorithms, such as filtering and feature extraction, and running ML models
Low power consumption to enable battery or energy-harvesting applications
Hardware-based security, including Arm TrustZone, to safeguard code and data

This convergence enables true Real-Time Edge Intelligence (RTEI), where signals are captured, interpreted, and acted upon locally, without relying on cloud infrastructure or external accelerators. Tasks such as biomedical filtering, sensor fusion, anomaly detection, and embedded inference can now be performed directly at the edge, at a fraction of the power and cost previously required.

And with newer devices manufactured on advanced low-power nodes—such as TSMC’s 22nm process—power consumption is reduced by up to 50% compared to older 40nm designs. This makes the deployment of smart, responsive, and secure edge systems more sustainable and scalable than ever before.

And today, Arm cores power the large majority of 32-bit microcontrollers, a testament to Arm’s rich ecosystem and proven technologies.

This is not just a shift in hardware—it’s a redefinition of where and how computing happens in the embedded world of 2025 and beyond. It happens in real time, at the edge.

Author

Dr. Sanjeev Sarpal

Sanjeev is an RTEI (Real-Time Edge Intelligence) visionary and expert in signals and systems with a track record of successfully developing over 26 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT/RTEI solutions and strategies for I5.0, telemedicine, smart healthcare, smart grids and smart buildings.

July 7, 2025, by Dr. Sanjeev Sarpal

How AI Hijacked the Word ‘Algorithms’

Algorithms Dr. Sanjeev Sarpal

Ask someone what an algorithm is today, and you’re likely to hear: “That’s what powers ChatGPT,” or “That’s what TikTok uses to recommend videos.” In the public imagination, the word algorithm has been almost completely consumed by one thing: AI.

But long before neural networks and machine learning took over the conversation, algorithms had a different meaning, a broader and more fundamental one. They were the logical, step-by-step procedures, or mathematical recipes, that powered everything from heart monitors and machine automation to advanced autopilot systems. These were built using human reasoning and established signals and systems theory grounded in hard science. Yet today, they seem to have become invisible.

Algorithms That Built the Real World

Not all algorithms are probabilistic models trained on massive datasets. Some are mathematical systems designed to perform deterministic, repeatable, and real-time operations:

Digital filters – Based on convolution and implemented via difference equations, these are the workhorses of DSP.
Z-transforms – The analytical backbone of signals and systems, essential for understanding and designing discrete-time systems.
Fourier analysis – Fundamental for noise cancellation and predictive maintenance applications.
Control algorithms – From basic PID controllers to Kalman filters used in automotive, aerospace and robotics.
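The first two items are closely linked. Taking the Z-transform of a general linear difference equation (with zero initial conditions) yields the filter's transfer function, and its poles determine stability. The first-order example on the second line is purely illustrative.

```latex
y[n] = \sum_{k=0}^{M} b_k\,x[n-k] - \sum_{k=1}^{N} a_k\,y[n-k]
\quad\Longrightarrow\quad
H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}

% Example: y[n] = x[n] + 0.5\,y[n-1], i.e. b_0 = 1, a_1 = -0.5
H(z) = \frac{1}{1 - 0.5\,z^{-1}} = \frac{z}{z - 0.5},
\qquad \text{pole at } z = 0.5,\ |0.5| < 1 \Rightarrow \text{stable},
\qquad h[n] = 0.5^{n}\,u[n]
```

For a causal filter, stability requires every pole to lie strictly inside the unit circle, a property that can be checked analytically before the filter is ever deployed.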

These are not just theoretical methods. They run our phones, cars, aircraft, medical devices, and more. They’re power-efficient, predictable, and interpretable. They’re deterministic algorithms.
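As a concrete example of a deterministic control algorithm, here is a minimal sketch of a discrete-time PID controller in C. It uses one common textbook form (rectangular integration, backward-difference derivative, clamp-based anti-windup); the struct, names, gains and limits are hypothetical. Every call executes the same fixed sequence of operations, which is what makes its latency and resource usage predictable.

```c
/* Hypothetical controller state; all gains and limits are placeholders. */
typedef struct {
    float kp, ki, kd;        /* proportional, integral, derivative gains */
    float dt;                /* fixed sample period in seconds */
    float out_min, out_max;  /* actuator limits */
    float integral;          /* integral of the error */
    float prev_error;        /* error at the previous sample */
} pid_ctrl_t;

/* One control update; the same fixed sequence of operations runs every call. */
float pid_step(pid_ctrl_t *c, float setpoint, float measurement)
{
    float error = setpoint - measurement;
    float integral = c->integral + error * c->dt;        /* rectangular integration */
    float derivative = (error - c->prev_error) / c->dt;  /* backward difference */
    float out = c->kp * error + c->ki * integral + c->kd * derivative;

    /* Clamp to the actuator range; only commit the new integral when the
     * output is not saturated (a simple anti-windup scheme). */
    if (out > c->out_max) {
        out = c->out_max;
    } else if (out < c->out_min) {
        out = c->out_min;
    } else {
        c->integral = integral;
    }
    c->prev_error = error;
    return out;
}
```

In practice the derivative term is often low-pass filtered, or computed on the measurement rather than the error, to avoid a spike when the setpoint changes.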

What AI Can’t Replace

AI excels at tasks that involve ambiguity, classification, and massive amounts of data. But the world isn’t just made of pictures, text, and social media trends.

Many practical mission-critical applications, such as robotic factory automation, automotive cruise control, and power plant control systems, require deterministic, real-time, fail-safe operation. This can only be achieved with DSP algorithms and Lockstep processor technology.

For these systems, we don’t need probabilistic guesses. We need determinism, in the sense of:

Understanding how the algorithm works and how it arrived at its result
Stability guarantees
Known latencies
Precise resource usage
Fail-safe operation (Lockstep and TMR technology)

That’s not AI. That’s signals and systems and advanced fail-safe processor technology.

Lockstep Processor Technology for Mission-Critical Systems

Lockstep processor technology ensures redundancy and fault tolerance in real-time applications. Processors such as the Arm Cortex-R4 and Cortex-R5 feature lockstep functionality, where two identical processors run the same instructions simultaneously and compare results.

This approach is especially effective for detecting hardware faults arising from component ageing, environmental factors, or EMI. However, since both processors execute the same code, software bugs will not be detected, and a mismatch can only be flagged, not corrected. Where operation must continue despite a hardware fault, Triple Modular Redundancy (TMR) can be used, where three processors vote to determine the correct result. TMR running identical software shares the same blind spot for software bugs.
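In lockstep and TMR processors the comparison and voting are done in hardware, but the voting logic itself is easy to show in software. This hypothetical C sketch performs a bitwise two-out-of-three majority vote on 32-bit results and separately reports any disagreement so the fault can be logged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitwise 2-out-of-3 majority: each output bit takes the value held by at
 * least two channels, so any single faulty channel is outvoted. */
uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}

/* True if any channel differs from the others, so the fault can be
 * logged even though the voted output is still usable. */
bool tmr_disagreement(uint32_t a, uint32_t b, uint32_t c)
{
    return (a != b) || (a != c);
}
```

Voting bit by bit means a result can be assembled from different channels when faults hit different bits; comparing whole words and requiring two identical copies is a common alternative.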

While AI systems operate probabilistically and often lack transparency in failure modes, Lockstep-based designs ensure deterministic execution, immediate fault detection, and predictable safety behaviour. When safety is paramount, Lockstep wins hands down.

ASIL Compliance and Automotive Safety

Lockstep is vital for Automotive Safety Integrity Level (ASIL) compliance. In systems like adaptive cruise control, a processor mismatch will safely disengage the system rather than risk unsafe behaviour. For the most critical applications, such as flight control or nuclear plant monitoring, TMR is used to maintain operation despite single-point failures.
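The disengage-on-mismatch behaviour can be sketched as a small latched state machine. In this hypothetical C example two channels compute the same command; if they disagree, the controller latches into a disengaged state and outputs zero. The names, the exact-equality comparison and the assumption that a zero command is the safe state are illustrative only, not taken from any ASIL-certified design.

```c
#include <stdint.h>

typedef enum { ACC_ACTIVE, ACC_DISENGAGED } acc_state_t;

/* Hypothetical adaptive cruise control output stage. */
typedef struct {
    acc_state_t state;
    int32_t torque_cmd;  /* last command passed to the actuator */
} acc_ctrl_t;

/* Accept a command only when both redundant channels agree. */
void acc_apply(acc_ctrl_t *acc, int32_t cmd_primary, int32_t cmd_checker)
{
    if (acc->state == ACC_DISENGAGED) {
        acc->torque_cmd = 0;  /* stay in the safe state until reset (latched) */
        return;
    }
    if (cmd_primary != cmd_checker) {
        /* Mismatch: trust neither channel; fall back to the assumed safe state. */
        acc->state = ACC_DISENGAGED;
        acc->torque_cmd = 0;
        return;
    }
    acc->torque_cmd = cmd_primary;
}
```

Latching matters: once a mismatch has been seen, later agreeing commands are ignored until the system is deliberately reset and re-checked.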

The Disappearance of Craftsmanship

AI didn’t rise because it was superior. It rose because it was easier.

In a world where fewer people understand physics, system modelling, or real-time signal design, it’s far more convenient to train a black box on loads of examples than to derive a model from first principles.

Most younger engineers today struggle with modelling a dynamic system. They don’t really understand the physics of the system, or how to turn its dynamics into the transfer functions or block diagrams needed for implementation. These difficulties are compounded by a lack of knowledge of analog electronics and of the effects of noise on data.

So we give them data. And we give them frameworks to grind through that data until a result falls out. And when these AI-driven systems fail, there is no coefficient to tweak. No transfer function to inspect. Only a tangled web of parameters, and the shrug of a retrain.

It would seem that the artistry of algorithm design—once rooted in insight, mathematics and physics—is being replaced by guesswork.

A Place for Both Technologies

AI isn’t the villain! In fact, in many cases, it’s a breakthrough.

There are real-world problems, ones too complex or nonlinear to be modelled easily, where AI offers a practical solution. Face recognition, gesture detection, and certain chemical substance classification tasks are examples where traditional signal processing struggles and AI simplifies the unsimplifiable.

But we must remember: just because we can’t model something doesn’t mean we should stop trying. AI should be seen as one of the many tools that can be employed, not a substitute for understanding.

Generative AI, like ChatGPT, is proving invaluable in tasks such as writing and refactoring code — accelerating prototyping and helping engineers quickly develop new ideas. While this is a remarkable technical achievement, it must be viewed in context, as the AI lacks true understanding, common sense, and awareness of the real-world constraints surrounding the problem it’s addressing.

Real-Time Edge Intelligence — Built on Science

The future doesn’t belong to AI alone. It belongs to hybrid systems that combine the best of both worlds.

Real-Time Edge Intelligence (RTEI) is the next frontier: using DSP algorithms grounded in physics and mathematics to extract features of interest, followed by ML models that classify or make higher-level inferences on high-quality feature data.

This layered approach offers:

Interpretability
Efficiency
Robustness
Scientific traceability

In short, ML models work better when they are fed by high-quality features derived from DSP algorithms based on scientific principles.
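The layered idea can be sketched in a few lines: a deterministic DSP stage turns a frame of samples into a small feature vector, and a tiny model classifies those features. In this hypothetical C example the features (mean power and zero-crossing rate) and the linear classifier are placeholders; in a real system the features would come from domain knowledge and the weights from training.

```c
#include <stddef.h>

/* Hypothetical feature vector computed by the deterministic DSP stage. */
typedef struct {
    float mean_power;  /* average of x[n]^2 over the frame */
    float zcr;         /* zero-crossing rate, crossings per sample step */
} features_t;

features_t extract_features(const float *x, size_t n)
{
    features_t f = {0.0f, 0.0f};
    if (n == 0)
        return f;
    float sum_sq = 0.0f;
    size_t crossings = 0;
    for (size_t i = 0; i < n; ++i) {
        sum_sq += x[i] * x[i];
        /* Count a crossing whenever the sign changes (zero counts as positive). */
        if (i > 0 && ((x[i - 1] < 0.0f) != (x[i] < 0.0f)))
            ++crossings;
    }
    f.mean_power = sum_sq / (float)n;
    f.zcr = (n > 1) ? (float)crossings / (float)(n - 1) : 0.0f;
    return f;
}

/* Linear classifier on the features: returns 1 when w.f + bias > 0.
 * The weights would come from offline training; none are given here. */
int classify(features_t f, const float w[2], float bias)
{
    float score = w[0] * f.mean_power + w[1] * f.zcr + bias;
    return score > 0.0f;
}
```

For example, with weights {0, 1} and a bias of -0.5, an alternating frame such as {1, -1, 1, -1} is classified as 1 and a constant frame as 0. Because the features are physically meaningful, the decision can be traced back to them.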

How AI Hijacked the Conversation

In part, this is our own fault. As an industry, we let the word algorithm become synonymous with black-box AI. We stopped talking about the elegance of well-designed digital filters for cleaning sensor data.

Even the educational pipeline reflects this shift:

Students focus on TensorFlow rather than on system modelling and applied mathematics.
Conferences highlight AI papers, calling them innovation, while ignoring solutions based on DSP innovation.
Educational curricula and investors increasingly regard AI as the only form of innovation, ignoring traditional DSP altogether.

It would seem that the hype around AI has buried a lot of true innovation using traditional science and mathematics, which needs to change if we’re truly to reap the strength of RTEI for solving the massive societal challenges facing us in the years to come.

June 7, 2025, by Dr. Sanjeev Sarpal

Are We Using Algorithms Wisely?

Algorithms Dr. Sanjeev Sarpal

In our age of AI, IoT, and real-time signal processing, it’s tempting to believe that algorithmic thinking is a modern invention. But over 1,100 years ago, in the heart of the Islamic Golden Age, two extraordinary thinkers were already shaping what we now call systems engineering: Muḥammad ibn Mūsā al-Khwārizmī and Abū Yūsuf Yaʻqūb ibn Isḥāq al-Kindī.

Their brilliance wasn’t just in what they knew—it was in how they thought.

The Power of Critical Thinking

Al-Khwārizmī and Al-Kindī lived in a time when knowledge was unified. Mathematics, philosophy, astronomy, and medicine were seen as part of a single, coherent worldview. This intellectual breadth allowed them to see connections others missed.

In some ways, generative AI attempts to mimic this cross-disciplinary reach, drawing associations across vast domains with billions of model parameters. But unlike Al-Kindī and Al-Khwārizmī, it operates without common sense, without understanding, and without true context. This alone makes AI fundamentally inferior to the critical thinking of these scholars. Their minds weren’t just expansive; they were disciplined, purposeful, and deeply structured.

They also had something rare today: time. Free from the pressure to publish and the noise of constant alerts, they dedicated themselves to real understanding. Without emails, 24-hour news feeds, or KPIs pulling at their attention, they achieved the kind of clarity that produced lasting contributions to mathematics, science, and cryptography, many of which still shape our world today.

Although the word algorithm derives from the Latinized name of Al-Khwārizmī, the idea behind algorithms (structured procedures for solving problems) is far older and more global. Across different civilizations, these ideas took root in unique yet strikingly similar ways.

In classical Tamil, the word சூத்திரம் (sūthiram) has long referred to concise formulas or rules. Found in ancient grammar, mathematics, and philosophy, a sūthiram was a compact, logical step: an algorithm in spirit. Tamil arithmetic manuscripts used them to express procedures for multiplication and permutations, echoing the same structured problem-solving seen in other cultures.

Even today, Tamil retains this legacy. In schools and textbooks, சூத்திரம் appears alongside the transliterated அல்காரிதம்—a reminder that algorithmic thinking has deep, indigenous roots. The algorithm is not a modern invention. It’s a universal pattern—and every culture has its own word for the thread.

This global legacy reminds us that algorithms were once tools of understanding, not just instruments of automation. If we are to use algorithms wisely today, we must restore that spirit: clear purpose, cultural context, and conscious design.

And Yet, Here We Are

Today, we automate decisions, and design systems that affect real lives. The legacy of Al-Khwārizmī and Al-Kindī challenges us to ask:

Are our algorithms truly serving people—or just chasing efficiency?

Nowhere is this more urgent than in AI. As we build increasingly powerful models, the temptation is to optimize for performance at all costs. But Al-Khwārizmī reminds us that an algorithm must be understandable. And Al-Kindī reminds us that it must serve human reason and values.

This is precisely why Real-Time Edge Intelligence (RTEI) is so important. RTEI combines human reasoning, grounded in deterministic DSP algorithms, with digital reasoning, driven by machine learning.

But there is a crucial difference: digital reasoning can hallucinate. It can invent patterns and correlations—what we sometimes call “creativity”—but it has no ethics, no intent, and no understanding. Human reasoning, on the other hand, brings structure, purpose, and responsibility.

RTEI is not just a technical fusion. It is a moral architecture: one that tempers the unpredictability of AI with the reliability and interpretability of classical DSP. In a way, it echoes the very legacy of these scholars: logic and structure, guided by ethics and accountability.

A Principle That Still Applies in 2025

These scholars remind us that clear, ethical, and deeply structured thinking transcends time.

This is where the classical principle of رفع الحرج (removal of hardship) becomes strikingly relevant. It wasn’t just about compassion—it was about reducing confusion and unnecessary complexity in systems.

RTEI carries this legacy forward: combining the reliability of DSP with the adaptability of ML to build systems that are powerful, yet clear and responsible. In a world where generative AI can hallucinate, we need that balance more than ever.

Sometimes, the most radical idea is to slow down and think like it’s the 9th century!

May 27, 2025, by Dr. Sanjeev Sarpal

The Future of Edge AI: Real-Time Edge Intelligence (RTEI)

Algorithms Dr. Sanjeev Sarpal

True progress in Edge AI won’t be achieved by simply throwing more silicon at the problem. It will come from smarter models, energy-aware algorithms, adaptive architectures, robust memory systems, and a deep rethinking of what “intelligence at the edge” truly means. This article introduces the concept of Real-Time Edge Intelligence (RTEI) as the foundation for the next generation of computing.

Why Edge AI Matters More Than Ever

Edge AI is no longer optional. With billions of devices generating real-time data, the need for low-latency, private, and energy-efficient intelligence is undeniable. The edge is not merely an extension of cloud services; it is a new frontier for autonomy, privacy, and responsiveness.

Real-Time Edge Intelligence (RTEI) is the evolution: autonomous, adaptive intelligence operating locally where data is created. RTEI prioritizes immediacy, privacy, energy efficiency, deterministic behavior, and autonomy over mere computational brute force.

Where the Industry is Getting it Wrong

The industry is still obsessed with scaling computational power (bigger NPUs, more cores, faster memory) without fundamentally rethinking software, architecture, or intelligence models. The default AI deployment model remains cloud-heavy, treating edge inference as a scaled-down afterthought. Meanwhile, the growing “energy debt” of AI is a looming crisis. In the United States, data-centre consumption is projected to reach up to 12% of total electricity use by 2028. Similarly, in Europe, electricity demand from data-centres is expected to grow roughly 2.4-fold by 2030, from 62 terawatt-hours (TWh) in 2022 to over 150 TWh. This would raise data-centres’ share of the EU’s total electricity consumption from approximately 2% to around 5%, largely driven by the expansion of AI workloads.

Edge AI must evolve. It must be Real-Time Edge Intelligence.

The Real Future: Smarter, Leaner, Specialized Edge Intelligence

The future belongs to compact models, made possible through better feature extraction based on scientific understanding of the processes being modeled. By leveraging domain knowledge and intelligent preprocessing, we can dramatically simplify machine learning models, reducing the need for brute-force complexity. While techniques such as TinyML, pruning, quantization, and sparsity can enhance efficiency, they alone are not sufficient to achieve truly scalable, real-time edge intelligence. A deeper understanding of the underlying processes, leading to better feature extraction and model design, is critical to minimizing complexity and maximizing performance at the edge.

Energy-aware AI will prioritize event-driven, minimalist computation rather than brute-force processing. Hybrid architectures will combine human-defined rule-based logic with neural networks, offering both transparency in decision-making and adaptability to complex real-world data. Devices will enable continuous, private learning without cloud dependence, while systems will be built with security-by-design at both hardware and firmware levels to ensure autonomous, trusted operation. Above all, Real-Time Edge Intelligence must serve human needs directly, not corporate agendas.
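Event-driven computation can be as simple as a cheap, fixed-cost check that decides whether the expensive model needs to run at all. The hypothetical C function below gates on the mean power of a frame; the function name and the threshold are application-specific assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cheap, fixed-cost gate run on every frame: true when the frame's mean
 * power reaches the (hypothetical, application-specific) threshold. */
bool frame_is_active(const float *frame, size_t n, float threshold)
{
    if (n == 0)
        return false;
    float energy = 0.0f;
    for (size_t i = 0; i < n; ++i)
        energy += frame[i] * frame[i];
    return energy / (float)n >= threshold;
}
```

A caller would then run its model only on active frames, for example `if (frame_is_active(buf, N, th)) label = run_model(buf, N);`, where `run_model` is a hypothetical inference entry point. Adding hysteresis (separate on and off thresholds) would stop the gate from chattering around the threshold.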

Critically, RTEI systems must ensure real-time determinism, enabling predictable and reliable decision-making even under variable load conditions. Efficient memory architectures and local memory bandwidth optimization are foundational design considerations for enabling such real-time behaviour.

Practical Technology Shifts Required

The future demands edge-optimized toolchains that simplify the journey from training to deployment, targeting energy-efficient and predictable inference. Edge-first architectures must prioritize real-world latency, energy use, and built-in security. A hybrid model of edge and cloud will allow dynamic collaboration between local and cloud resources. In the long term, neuromorphic computing—brain-inspired systems—will offer dramatic improvements in efficiency.

Toolchain maturity and developer enablement will be critical success factors. Without easy-to-use, efficient development platforms, RTEI solutions will fail to scale beyond isolated projects.

One promising hardware platform is the latest generation of Arm Cortex-M processors with Helium technology, the M-Profile Vector Extension (MVE) found in cores such as the Cortex-M55 and Cortex-M85. These processors integrate vector processing capabilities optimized for AI, DSP, and machine learning workloads at the edge, delivering the performance and energy efficiency needed for true RTEI applications. The combination of Helium’s SIMD capabilities with secure, deterministic real-time processing offers a practical foundation for future edge intelligence solutions.

At the same time, technical evolution must be matched with rigorous, internationally recognized standards. Regulatory bodies such as the IEC (International Electrotechnical Commission) must develop and enforce standards for AI systems, particularly at the edge, to ensure safety, reliability, security, and ethical alignment. Legislation alone is too slow to keep pace with technological change; therefore, technical standards must lead the way in setting best practices, safeguarding users, and accelerating responsible innovation.

Real Challenges to Solve

Data ownership must be protected by design, ensuring user privacy is not compromised. With increased autonomy comes a wider attack surface, so local security must be significantly strengthened—starting at the silicon level, not as an afterthought. Memory access and real-time processing requirements must be integrated from the earliest design stages. Finally, to avoid a fragmented ecosystem, open standards and interoperability must be embraced and actively developed through collaboration between industry and standards organizations.

The Future Will Not Be Cloud-Centric. It Will Be Edge-Intelligent

Real-Time Edge Intelligence is built on several core pillars. It demands contextual awareness, where intelligence is shaped by immediate environments. It emphasizes energy-conscious design, relying on minimalist, event-driven computation. Autonomous adaptation is critical, with continuous local learning and refinement. Deterministic, predictable behavior underpins trust and usability. Security must be inherent, with hardened local autonomy built into silicon, firmware, and software. Most importantly, RTEI must aim for human-centric goals, enhancing real human experiences.

RTEI is not a miniaturized cloud service, nor a bigger NPU strapped onto an old architecture. It is not a privacy patch on centralized AI or a marketing buzzword. Rather, RTEI is smarter, smaller, ethical, immediate intelligence. It is a new frontier for engineering and creativity, and a survival strategy for an energy-constrained, privacy-conscious future.

We must stop scaling hardware for its own sake and start scaling wisdom—building smarter, faster, more trusted intelligence where it matters: in real time, at the edge.

RTEI is not optional. It is inevitable.

Author

Dr. Sanjeev Sarpal

Sanjeev is an RTEI (Real-Time Edge Intelligence) visionary and expert in signals and systems with a track record of successfully developing over 26 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT/RTEI solutions and strategies for I5.0, telemedicine, smart healthcare, smart grids and smart buildings.

April 28, 2025 / by Dr. Sanjeev Sarpal

BLW Filtering for ECG Systems: Getting IEC/FDA compliance right

Algorithms, Biomedical Dr. Sanjeev Sarpal

Baseline wander (BLW) is one of the most persistent, and most misunderstood, problems in ECG systems. Caused by respiration, torso movement and electrode impedance drift, BLW introduces a low-frequency drift that distorts the ECG baseline and key waveform features, compromising QRS detection, HRV analysis and other clinical analytics.

While BLW suppression is essential and mandated in all ECG systems, developers often face two key challenges:

Interpreting the filtering requirements in standards such as IEC 60601-2-25 and 60601-2-47, which define passband specifications but fall short on practical implementation guidance
Designing real-time, morphology-preserving filters that are efficient enough to run on embedded platforms, such as the low-power microcontrollers used in modern wearable and portable devices.

In both the United States and Europe, regulatory bodies reference the IEC 60601-2-25 and 60601-2-47 standards when evaluating ECG systems. In the United States, the FDA uses them as a key part of its review process, while in Europe they form the basis for CE marking under the MDR (Medical Device Regulation, EU 2017/745). As such, adhering to IEC 60601-2-25 and 60601-2-47 is essential for global regulatory compliance.

Despite the growing use of AI and ML in ECG signal processing, current IEC and FDA standards have not yet caught up. There is no formal guidance on how to validate AI-based filtering or classification pipelines, leaving developers without a clear compliance path when using these emerging techniques.

As a result, most developers still rely on traditional digital filtering methods — especially when targeting regulatory approval. This is particularly true in wearable and portable devices, where Arm Cortex-M microcontrollers, such as the STM32 family, are the dominant hardware platform. Yet even here, many continue to follow legacy filtering advice inherited from the analog era, without re-evaluating what’s truly required for today’s real-time digital systems.

In this article, we break down what the standards actually require, explain where the widely cited 0.05 Hz cutoff comes from, and show how to design real-time high-pass filters that balance signal quality, compliance and power efficiency, all within the constraints of modern embedded platforms such as the STM32 microcontroller.

What the standards don’t say, and why this is confusing

IEC 60601-2-25 and IEC 60601-2-47 do define passband requirements (e.g., 0.05–150 Hz or 0.67–40 Hz), but they don’t offer practical engineering guidance on how to meet those requirements on resource-constrained platforms like wearables.

They don’t cover things such as:

How to implement the filter in a real-time embedded system
Trade-offs between FIR vs IIR in morphology-sensitive signals
What to do about SoCs with built-in analog filtering
Energy/performance implications of filter design choices
How to practically validate the signal chain for compliance — e.g. when conducting the ‘sinewave test’, should the amplitude of the 0.67-40Hz test sinewaves be measured continuously or averaged over time? Should I use RMS or peak amplitude when comparing the amplitudes with the 5Hz reference amplitude?

This leaves engineers with many unanswered questions:

Is a 0.05 Hz cutoff always necessary?
What level of phase distortion is acceptable?
Can I use an IIR filter and still be compliant?
How do I qualify an SoC with undocumented internal filters?
What measurement methodology should I use for compliance testing?

As a result, developers often either overdesign their filters based on worst-case diagnostic assumptions — or under-design, thinking any high-pass filter will suffice. Neither approach guarantees compliance or robust signal quality.

These issues often lead to overengineered solutions, misinterpretation of compliance requirements, or filters that distort clinically relevant features — all of which can compromise both product performance and regulatory approval.

0.05 Hz: A Diagnostic-Grade requirement from IEC standards

The 0.05 Hz lower cutoff frequency originates from IEC 60601-2-25, which governs diagnostic electrocardiographs. It is also referenced in IEC 60601-2-47 for ambulatory ECG systems when high-fidelity diagnostic performance is required.

The mandated system bandwidth of 0.05-150 Hz is intended to preserve:

ST-segment deviations
T-wave alternans and morphology
Other slow-changing diagnostic features

This specification is well-suited to clinical and hospital environments but presents serious challenges for embedded systems.

Why 0.05 Hz is difficult to implement in real time

To meet this requirement, developers usually consider two filtering options:

IIR filters: These are computationally efficient but exhibit non-linear phase response, causing phase distortion that alters the shape of the QRS complex. This makes IIR filters unsuitable for ECG applications where signal morphology is critical.

FIR filters with linear phase: These preserve morphology but require very long filter lengths—often extending to 5-10k filter coefficients for a 0.05 Hz cutoff—resulting in increased memory use, latency, and computational cost.

Choosing the best cut-off frequency

Wearable ECG systems operate under tight power and processing constraints. These systems typically target a bandwidth of 0.5-40 Hz, which is sufficient to preserve the key features of the ECG waveform while easing implementation on embedded hardware. This range aligns with the reduced fidelity requirements in IEC 60601-2-47 for wearable or monitoring devices.

Where does BLW actually occur?

In practice, the majority of BLW energy is concentrated below 0.5 Hz. This is supported by:

Real-world ECG datasets
Clinical literature on motion and respiration artefacts
Empirical testing using wearable sensors under normal conditions

Sources of BLW such as respiration (~0.2–0.3 Hz), slow body movement and electrode drift typically lie in the <0.5 Hz region. This means that a high-pass filter with a cutoff of 0.5 Hz or 0.67 Hz will suppress most baseline drift and, provided its phase response is linear, will do so without distorting the QRS complex or T-wave morphology.

This insight is essential for wearable and mobile ECG systems, where developers must balance:

Signal fidelity
Computational efficiency
Power consumption
Compliance with IEC 60601-2-47 (where applicable)

Over-filtering (e.g., using a 1 Hz high-pass cutoff) can begin to distort critical ECG features, particularly the ST segment and T-wave. On the other hand, under-filtering may leave residual baseline drift that impacts analytics such as heart rate variability and R-peak detection.

The following animation demonstrates how the QRS is warped when using a 1st order IIR filter.

The 0.5–0.67 Hz cutoff range is often the optimal balance in wearable ECG systems — effective in suppressing baseline wander while preserving essential waveform morphology.

In fact, IEC 60601-2-47 stipulates that a 0.67 Hz cutoff may be used if no phase distortion is introduced. We can therefore conclude that a signal bandwidth of 0.67-40 Hz is IEC compliant for ambulatory systems.

When implemented with a linear-phase FIR filter, this approach meets both signal quality and regulatory requirements — and remains computationally feasible for embedded targets such as the STM32.

Implementation on STM32 microcontrollers

STM32 microcontrollers built on the Arm Cortex-M architecture (e.g. M4, M7 cores) are a popular choice for wearable ECG systems. They offer a solid combination of processing performance, on-chip memory, and energy efficiency — all critical factors for medical-grade real-time signal processing.

When implementing high-pass filters for baseline wander suppression, several hardware and algorithmic factors must be considered:

Hardware Floating-Point Support speeds up development: devices like the STM32F4, F7 and H7 feature single-precision floating-point units (FPUs). This enables rapid application development (RAD), as developers can prototype and validate filters quickly without having to deal with fixed-point scaling and rounding errors.
FIR Filters preserve morphology, but are computationally heavy: linear-phase FIR filters are strongly recommended for ECG applications because they preserve the temporal relationships of the waveform, which is essential for correct analysis. However:
- A 0.5 Hz cutoff at 200 Hz sampling typically requires hundreds of filter coefficients, which is manageable on most STM32 devices (Cortex-M4/M7) and only just manageable on ultra-low-power Cortex-M0+ devices, which have no FPU.
- In contrast, a 0.05 Hz cutoff typically requires thousands of filter coefficients (10k+ in some cases), resulting in high memory usage and processing load, often beyond what is practical for real-time wearable designs.
Effect on battery life: long FIR filters keep the FPU active nearly continuously, which prevents the microcontroller from entering low-power states. This significantly reduces battery life, especially in wearables running at higher sample rates, e.g. 500 Hz. For this reason, the cutoff frequency must be carefully selected to balance signal quality with energy efficiency.
Avoid IIR Filters for morphology-critical Applications: While IIR filters are computationally attractive due to their low order, they introduce non-linear phase distortion. This warps the QRS complex, alters timing relationships between ECG segments, and undermines compliance with medical waveform standards. For any application where waveform shape matters, IIR filters should be avoided.

Ultimately, for wearable ECG systems targeting compliance and signal integrity, linear-phase FIR filters on STM32 microcontrollers provide the most practical and reliable foundation for real-time baseline wander removal.

Multirate FIR Filtering for computationally efficient ECG filtering

FIR filter cascades are among the most practical and precise methods for implementing baseline wander removal, particularly when combined with decimation and multirate techniques. Baseline wander removal using FIR filters can be implemented in two primary ways: by cascading multiple FIR low-pass filters and subtracting the result from the original signal, or by designing a cascade of high-pass FIR filters directly. Both approaches are suitable for use in systems that must preserve ECG morphology, with the subtraction method offering particularly clear control over the high-pass behaviour when combined with multirate stages.
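The subtraction method can be sketched as a streaming structure: a symmetric, odd-length low-pass FIR estimates the baseline, and the input is delayed by the filter’s group delay, D = (N − 1)/2 samples, so that the two are time-aligned before subtraction. The struct and function names and the BLW_MAX_TAPS bound are hypothetical, and the low-pass coefficients are assumed to be designed elsewhere. For clarity the sketch runs at a single rate; in a multirate implementation the baseline would be estimated at the decimated rate and interpolated back before the subtraction.

```c
#include <stddef.h>

#define BLW_MAX_TAPS 1024   /* hypothetical upper bound for this sketch */

typedef struct {
    const float *h;          /* symmetric, odd-length low-pass (baseline estimator) */
    size_t n_taps;
    float buf[BLW_MAX_TAPS]; /* circular history of the last n_taps inputs */
    size_t head;             /* index of the newest sample in buf */
} blw_sub;

/* Returns 0 on success, -1 if the length is even, zero or too long. */
int blw_sub_init(blw_sub *s, const float *h, size_t n_taps)
{
    if (n_taps == 0 || n_taps % 2 == 0 || n_taps > BLW_MAX_TAPS) return -1;
    s->h = h;
    s->n_taps = n_taps;
    s->head = 0;
    for (size_t i = 0; i < n_taps; ++i) s->buf[i] = 0.0f;
    return 0;
}

/* One sample in, one baseline-corrected sample out: y[n] = x[n - D] - lp[n]. */
float blw_sub_process(blw_sub *s, float x)
{
    size_t n = s->n_taps, d = (n - 1) / 2;  /* D = group delay of the low-pass */
    s->head = (s->head + 1) % n;
    s->buf[s->head] = x;
    float baseline = 0.0f;
    size_t idx = s->head;
    for (size_t k = 0; k < n; ++k) {        /* k = age of the sample in buf */
        baseline += s->h[k] * s->buf[idx];
        idx = (idx == 0) ? n - 1 : idx - 1;
    }
    float delayed = s->buf[(s->head + n - d) % n];   /* x[n - D] */
    return delayed - baseline;
}
```

For example, with a 201-tap moving average as a (crude) baseline estimator at 200 Hz, a constant offset is removed completely once the buffer has filled, while a 10 Hz square wave passes with an error below 1%.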

This approach offers a high degree of flexibility, as the designer can control the cutoff frequency, stopband attenuation, and transition width with great precision. It also supports linear-phase operation, ensuring that waveform features like the QRS complex remain undistorted — a crucial factor for diagnostic applications.

Although other techniques exist — such as moving average filters and the Kolmogorov-Zurbenko (KZ) cascade, which is a cascade of identical moving average filters — these methods are computationally efficient and linear-phase by nature, but they lack the precise frequency shaping and design flexibility that FIR filters provide. FIR-based filtering, by contrast, gives developers precise control over the cutoff, transition width, and stopband characteristics — making it a strong choice for applications where accurate control over the frequency characteristics are required — particularly in systems targeting regulatory compliance.

FIR filters can also be optimized using polyphase decomposition, which restructures the filter to operate only at decimated output points. This avoids unnecessary computation and memory access, especially when combined with multistage decimation, where each FIR stage operates at a lower sampling rate than the previous one. Instead of a single large filter at the original sampling rate, several short FIRs operating at lower rates achieve the same result with far fewer operations per second.
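The sketch below contrasts the two approaches for a decimation factor M. The naive version filters every input sample and then throws away M − 1 of every M results; the polyphase version splits the coefficients into M branches, e_p[k] = h[kM + p], and evaluates only the outputs that are kept, so it performs roughly 1/M of the multiply-accumulates. Both produce the same output. The function names are illustrative and the code favours clarity over speed.

```c
#include <stddef.h>

/* Reference: filter every input sample, then keep every m-th output. */
void decimate_naive(const double *x, size_t nx, const double *h, size_t nh,
                    size_t m, double *y, size_t ny)
{
    for (size_t n = 0; n < nx; ++n) {
        double acc = 0.0;
        for (size_t k = 0; k < nh && k <= n; ++k)
            acc += h[k] * x[n - k];
        if (n % m == 0 && n / m < ny)
            y[n / m] = acc;                 /* the other m-1 results are wasted */
    }
}

/* Polyphase: branch p applies e_p[k] = h[k*m + p] to the input phase
 * x[(j-k)*m - p]; only the ny retained outputs are ever computed. */
void decimate_polyphase(const double *x, size_t nx, const double *h, size_t nh,
                        size_t m, double *y, size_t ny)
{
    for (size_t j = 0; j < ny; ++j) {
        double acc = 0.0;
        for (size_t p = 0; p < m; ++p) {              /* polyphase branch */
            for (size_t k = 0; k * m + p < nh; ++k) { /* taps of e_p */
                size_t back = k * m + p;              /* original tap index */
                if (back <= j * m && j * m - back < nx)
                    acc += h[back] * x[j * m - back];
            }
        }
        y[j] = acc;
    }
}
```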

Even though the Cortex-M4/M7 cores in devices such as the STM32F4/F7 offer only limited parallelism, the polyphase structure maps efficiently onto the packed 16-bit SIMD instructions of their DSP extension (for fixed-point implementations), improving throughput and reducing power consumption. FIR filtering, when designed with multirate and polyphase techniques, provides a scalable and standards-compliant solution for real-time baseline wander removal on embedded platforms.

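The following C code snippet uses the Arm CMSIS-DSP library to implement the front end of such a multirate scheme for a 0.05 Hz cutoff, assuming the signal is decimated from 200 Hz to 20 Hz. Note that arm_fir_decimate_f32 applies its FIR as an anti-aliasing low-pass at the input rate (200 Hz) before discarding samples, so the long baseline filter (a 0.05 Hz low-pass) is run afterwards, at 20 Hz, with arm_fir_f32. The resulting baseline estimate is then interpolated back to 200 Hz and subtracted from the suitably delayed input to remove the baseline wander. The decimation factor of 10 significantly reduces the computational load, making a long FIR filter far more practical for embedded systems. The tap counts below are illustrative, and the coefficients are assumed to be designed offline.

#include "arm_math.h"

#define BLOCK_SIZE         200   // Input samples per call (must be a multiple of DECIMATION_FACTOR)
#define DECIMATION_FACTOR  10    // Decimate by 10 (200 Hz → 20 Hz)
#define OUT_SIZE           (BLOCK_SIZE / DECIMATION_FACTOR)
#define AA_TAPS            60    // Anti-aliasing low-pass length at 200 Hz (illustrative)
#define BL_TAPS            120   // Baseline low-pass length at 20 Hz (illustrative; depends on transition width)

// Coefficients designed offline
extern float32_t aaCoeffs[AA_TAPS];   // anti-aliasing low-pass, fs = 200 Hz, cutoff below 10 Hz
extern float32_t blCoeffs[BL_TAPS];   // 0.05 Hz baseline low-pass, fs = 20 Hz

static float32_t aaState[AA_TAPS + BLOCK_SIZE - 1];
static float32_t blState[BL_TAPS + OUT_SIZE - 1];
static arm_fir_decimate_instance_f32 decimator;
static arm_fir_instance_f32 baselineFilter;

// Initialization: call once at start-up
arm_status blw_init(void)
{
    arm_fir_init_f32(&baselineFilter, BL_TAPS, blCoeffs, blState, OUT_SIZE);
    return arm_fir_decimate_init_f32(&decimator, AA_TAPS, DECIMATION_FACTOR,
                                     aaCoeffs, aaState, BLOCK_SIZE);
}

// Processing: BLOCK_SIZE ECG samples at 200 Hz in, OUT_SIZE baseline samples at 20 Hz out
void blw_estimate(float32_t *ecgIn, float32_t *baselineOut)
{
    float32_t decimated[OUT_SIZE];
    arm_fir_decimate_f32(&decimator, ecgIn, decimated, BLOCK_SIZE);  // anti-alias + decimate
    arm_fir_f32(&baselineFilter, decimated, baselineOut, OUT_SIZE);  // long low-pass at 20 Hz
}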

As a closing remark, many developers, especially in research or offline processing environments, use zero-phase IIR filtering by applying the filter forward and then backward through the signal. While this technique avoids phase distortion and preserves waveform morphology, it is inherently non-causal: the backward pass needs samples that have not yet arrived. This distinction is critical, as forward-backward (zero-phase) filtering cannot be run on streaming data in a real-time embedded system, making it unsuitable for wearable applications.

Biomedical SoCs: Powerful, but require full signal-chain qualification

Several IC vendors — including Analog Devices (ADI), Texas Instruments (TI), and STMicroelectronics — have developed highly integrated biomedical SoCs that feature built-in analog front ends (AFEs). These devices often include:

Programmable gain amplifiers
Instrumentation amplifiers
Integrated analog filtering
High-resolution ADCs with CIC (Cascaded Integrator-Comb) filters

While these SoCs offer impressive integration and are well suited to compact, low-power designs, they also introduce a significant challenge: developers must still qualify the entire signal chain, including the analog and digital filter stages, to ensure IEC compliance.

Typical challenges include:

Measuring the overall frequency response of the signal path
Evaluating the effects of CIC decimation filters and analog high-pass/low-pass stages
Understanding the behaviour of any hidden gain or filtering elements inside the AFE or ADC

Because much of this behaviour is not fully documented, it often requires empirical validation using signal injection and sweep testing — a tedious and time-consuming process that many teams underestimate.

Without this validation, there’s a real risk of:

Non-compliant frequency response
Undetected distortion of QRS or ST segments
Unexpected interaction between analog filtering and digital signal processing

Bottom line: integrated SoCs are powerful, but compliance is never automatic. It’s up to the developer to fully characterize and correct the signal path to meet the requirements of IEC 60601-2-47 — especially when targeting diagnostic or screening-grade wearable ECG systems.

Recommendations based on legacy systems

The commonly referenced 0.05 Hz high-pass cutoff originates from the behaviour of first-order analog filters used in ECG systems developed in the 1970s and 1980s. These filters had slow roll-off and introduced significant phase distortion, but were accepted at the time due to the limitations of analog circuitry and minimal awareness of morphology distortion effects.

Over time, this cutoff value was incorporated into standards such as IEC 60601-2-25, and later IEC 60601-2-47, often without re-evaluation in the context of modern digital systems. As a result, much of today’s ECG design guidance continues to reflect analog-era limitations, despite dramatic advances in hardware and signal processing.

It’s also worth noting that much of the legacy design advice found in textbooks and reference designs stems from the ‘analog era’ — when discrete op-amps, analog filters, and limited processing power dictated system architecture. While those principles served their purpose, modern biomedical SoCs and real-time digital filtering pipelines operate under entirely different constraints and possibilities.

As a result, engineers must challenge inherited assumptions, revalidate their signal path in the digital domain, and adopt design practices suited to real-time embedded systems — not outdated analog models.

This shift in mindset is essential not only for meeting IEC compliance in modern systems, but for achieving robust, efficient, and clinically reliable signal quality — especially in wearable and low-power applications.

Designing for Clarity, Compliance and Real-time constraints

Tools such as the ASN Filter Designer simplify ECG filter development by providing:

Reference designs for linear-phase, IEC-compliant ECG filter cascades (0.5-40 Hz)
High-accuracy frequency response charts
The ability to load and stream ECG datasets to visualize filter performance in real time
Code export for Arm processors (ANSI C), Python and MATLAB

This allows engineers to deploy real-time, medically robust filters quickly and with confidence.

Key Takeaways

A 0.05 Hz high-pass cutoff is referenced in both IEC 60601-2-25 (diagnostic ECG) and IEC 60601-2-47 (ambulatory ECG) for systems intended to support full diagnostic fidelity. However, this requirement is not appropriate for most wearable applications.

In wearable environments, baseline wander is dominated by motion and respiration artefacts, most of which lie below 0.5 Hz. As such, filtering at 0.05 Hz does not sufficiently suppress BLW, and imposes significant implementation burdens — including high memory usage, increased latency, high computation requirements and greater power consumption.

Crucially, IEC 60601-2-47 permits a high-pass cutoff of 0.67 Hz for ambulatory systems, provided that no phase distortion is introduced. When implemented using linear-phase FIR filtering, a bandwidth of 0.67–40 Hz is both IEC-compliant and technically practical for real-time implementation on embedded platforms such as the STM32.

AI-based ECG processing

While the current standards, including IEC 60601-2-47, provide guidance for traditional digital filtering approaches, it’s important to note that they have not yet caught up with AI-based ECG processing techniques. As of today, there are no formal standards, validation protocols, or compliance pathways defined for ECG systems using AI/ML for baseline correction or morphology analysis.

This lack of regulatory clarity means that, for now, classical digital filtering — particularly linear-phase FIR filters — remains the most robust, transparent, and standards-aligned approach for ensuring ECG signal quality and achieving IEC/FDA compliance in wearable and ambulatory systems.

ASN Solutions and Expertise

At ASN, we’ve supported numerous international clients in designing medically robust ECG systems — including helping them achieve FDA and IEC 60601-2-47 compliance. From real-time FIR filter design to signal chain validation and compliance testing workflows, we offer practical, implementation-ready solutions tailored for modern embedded systems and wearable applications.

March 28, 2025 / by Dr. Sanjeev Sarpal

Rethinking AI: Why Edge Intelligence must go beyond traditional ML

Algorithms Dr. Sanjeev Sarpal

AI has been glorified as the future of automation, often portrayed as the ultimate solution for efficiency, decision-making, and innovation across industries. It has been marketed as an all-encompassing technology capable of transforming everything from healthcare and finance to autonomous systems and industrial processes.

In practice, this narrative does not match reality: AI in its current form is too limited to be relied upon for mission-critical applications. While it has demonstrated some success in controlled settings, it struggles to adapt to real-world complexity and unpredictability. Although tech giants celebrate cloud-trained AI models, these solutions typically fail spectacularly when deployed in dynamic, unpredictable environments. This is because AI lacks common-sense reasoning and struggles with real-world subtlety, i.e. it doesn’t understand the real world in the same way that humans do. It is typically trained on synthetic or limited datasets, which fail to capture the diverse and complex scenarios it is expected to handle. As a result, AI systems often misinterpret context, leading to unreliable or misleading outcomes in unpredictable operating environments.

The Consequences of AI’s Limitations

The lack of a common-sense understanding of how the physical world works, combined with limited training data, is a fundamental limitation of AI systems. It can lead to costly failures, false predictions and, in the worst cases, complete system breakdowns, making such systems unsuitable for environments where precision and reliability are paramount.

Another significant limitation, particularly for large-scale models like LLMs, is that AI models require powerful computing resources, making them inefficient for real-time, low-power edge applications. That being said, advances in Nvidia’s latest chipsets, such as the Jetson Orin series, are certainly helping to bridge this gap by providing high-performance, power-efficient AI processing directly on edge devices.

While these new chipsets allow AI models to run locally and reduce reliance on cloud computing, AI in general still faces challenges such as excessive power consumption compared to deterministic DSP algorithm-based solutions, reliance on limited datasets, and a lack of explainability. These factors make AI unsuitable for industries requiring strict regulatory compliance and safety. While some smaller AI models can be optimised for edge deployment, many modern AI architectures remain computationally expensive and impractical for real-time, low-power edge applications.

Furthermore, most ML models rely on generalised feature extraction algorithms (mean, standard deviation, kurtosis, correlation etc) and are trained on limited, often unrealistic datasets. AI’s reasoning is entirely data-driven, meaning that we still don’t fully understand how the models work, making them very different from traditional DSP algorithms that use a mathematical recipe or a set of predefined rules. When faced with new, unseen conditions, AI often produces inaccurate or misleading results. In contrast, RTEI (Real-time Edge Intelligence) leverages DSP algorithms that are based on science, making them scientifically accurate and reliable in complex, real-world applications.

The AI Illusion: Why Traditional AI Fails at the Edge

Cloud-based AI solutions dominate today’s landscape because they require vast computational resources to function effectively. However, when deployed on edge devices with limited power and processing capacity, AI’s inefficiencies become apparent.

Predictive maintenance in industrial environments serves as a pertinent example. AI-based solutions are often promoted as game-changers, yet they struggle with a fundamental issue: the lack of real-world failure data. Most foremen and factory managers will not allow researchers to deliberately break machines for data collection, so AI models end up trained on synthetic or limited failure cases. This creates significant gaps in the understanding of the normal and abnormal behaviour of the machine or process, leading to potential misdiagnoses and operational inefficiencies.

A more effective approach—Real-Time Edge Intelligence (RTEI)—combines DSP algorithms for feature extraction with ML models for classification. For example, in vibration analysis, DSP-based techniques can be used to generate harmonic fingerprints of velocity and displacement (feature extraction), which can be used to detect anomalies before they lead to system failure. These fingerprints or features are then fed into ML models for fault classification. This hybrid approach ensures accuracy and robustness, as DSP algorithms rely on physics and engineering principles (e.g. Fourier analysis, Kalman filtering) rather than data-driven learning.

RTEI: The Future of Edge Intelligence

RTEI (Real-time Edge Intelligence) represents a fundamental shift in AI for edge computing. By integrating real-time DSP algorithms with ML models, RTEI enhances accuracy, reliability, and computational efficiency. Unlike traditional AI, which operates on probabilistic reasoning, RTEI leverages fundamental scientific principles, making it more predictable and suited for mission-critical applications like autonomous vehicles, industrial automation, and medical diagnostics, where any misjudgements could have catastrophic consequences.

For Edge IoT to reach its true potential, intelligence must be embedded directly into devices. We are already seeing promising advancements: industrial-grade vibration analysis systems that use real-time DSP algorithms to detect early signs of mechanical failure; aircraft autopilot systems that rely on deterministic control algorithms rather than AI, ensuring mission-critical reliability in aviation; and self-driving cars that use LiDAR, cameras and other sensors to navigate autonomously without depending solely on AI-based decision-making. These systems prioritise reliability through scientifically proven methods rather than speculative, data-driven predictions.

Because AI reasoning is fundamentally different from that of traditional DSP algorithms, a key point to realise is this: DSP algorithms are based on predefined rules and mathematical concepts (i.e. they are designed with human intelligence), whereas how an AI model reaches its result remains an enigma. This is the primary reason why such models should not be allowed to operate on critical processes without scrutiny.

RTEI enhances the overall solution by adapting its feature extraction algorithms to real-world variations, ensuring consistently high-quality data for AI classification. For example, when analog sensor data are acquired with an ADC, temperature variations in the instrumentation electronics (such as oscillator drift) can cause the sampling rate to vary slightly. This variation leads to a mismatch between the ideal model and the real-world signal presented to the classifier. A conventional AI model would struggle here, as such variations were probably not taken into account during its training phase.

This is where DSP algorithms, such as a method that analyses sample timestamps or a Kalman filter, can shine, as the sampling rate variation can be included in the estimation model. The DSP estimation algorithm can then track the signal’s sampling rate in real time and use this estimate in the subsequent feature extraction steps. This ensures that only high-quality features are provided to the classifier in environments with varying temperature (a very realistic scenario!). Finally, this approach has the added advantage of requiring less ML training data, which expedites development and lowers project costs.
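As an illustration of the estimation idea, the sketch below tracks a slowly drifting sampling period with a scalar Kalman filter fed by the differences between consecutive sample timestamps. It is a minimal sketch under stated assumptions, not the author’s method: the random-walk drift model, the noise variances `q` and `r`, the simulated 200 ppm clock error and the function name `track_sample_period` are all hypothetical choices made for illustration.

```python
import random


def track_sample_period(intervals, q=1e-18, r=2e-12, p0=1.0):
    """Scalar Kalman filter tracking the sampling period (seconds).

    Assumed model (illustrative only):
      state:       T_n = T_{n-1} + w_n,  w_n ~ N(0, q)   (slow thermal drift)
      measurement: z_n = T_n + v_n,      v_n ~ N(0, r)   (timestamp jitter,
                                                          treated as white)
    """
    t_est = intervals[0]  # initialise from the first measured interval
    p = p0                # large initial variance: trust early measurements
    estimates = []
    for z in intervals:
        p += q                      # predict: the period may have drifted
        k = p / (p + r)             # Kalman gain
        t_est += k * (z - t_est)    # correct the estimate with the new interval
        p *= 1.0 - k                # updated estimate variance
        estimates.append(t_est)
    return estimates


# Hypothetical ADC nominally at 1 kHz whose clock runs 200 ppm slow because
# of temperature, with 1 us of Gaussian timestamp jitter.
random.seed(1)
true_period = 1e-3 * (1 + 200e-6)
timestamps = [n * true_period + random.gauss(0.0, 1e-6) for n in range(5001)]
intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]

period = track_sample_period(intervals)[-1]
fs_est = 1.0 / period
print(f"estimated sampling rate: {fs_est:.3f} Hz")
```

In a real device, `fs_est` (rather than the nominal rate) would then be passed to the feature extraction stage, for example when converting FFT bin indices into hertz.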

The Role of Arm Processors in RTEI

A major driving force behind this revolution is the Arm processor ecosystem. In contrast to cloud-based AI solutions, Arm Cortex processors (including the newer Cortex-M cores with the Helium vector extension) provide a power-efficient way to run edge-optimized AI models in real time. These processors are already at the core of smart sensors, embedded systems, and industrial automation, ensuring that AI-powered IoT devices can process and react instantly to changes in their environments.

Lockstep Processor Technology for Mission-Critical Systems

A fundamental aspect of mission-critical systems is Lockstep processor technology, which ensures redundancy and fault tolerance in real-time applications. Processors such as the Arm Cortex-R4 and Cortex-R5 are designed with lockstep functionality, where two identical processors run the same instructions simultaneously and compare results.

Lockstep technology is particularly useful for detecting hardware faults, which can arise from environmental influences, component ageing and external interference. For example, bit flips in memory can occur due to electromagnetic interference (EMI) from industrial machinery or power supply fluctuations, corrupting data memory and leading to algorithmic errors.

A key concept of dual-core lockstep processing is that it detects discrepancies but does not determine which processor is correct. Since both processors execute the same software, software bugs will appear identically on both, making lockstep ineffective for detecting programming errors. For true fault tolerance and error correction, a Triple Modular Redundancy (TMR) approach is often used, where three processors execute the same software and a majority vote determines the correct outcome, either on every cycle or at the end of each function.
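The difference between dual-core lockstep (detect only) and TMR (detect and correct) can be illustrated in a few lines. This is a purely software sketch under stated assumptions: the function names are hypothetical, and in real lockstep hardware the comparison is performed by dedicated logic on the core outputs, not by application code.

```python
def lockstep_compare(out_a: int, out_b: int) -> bool:
    """Dual-core lockstep (hypothetical helper): flag a fault if outputs differ.

    A mismatch says *that* something went wrong, not *which* core is right.
    """
    return out_a != out_b


def tmr_vote(a: int, b: int, c: int) -> int:
    """Triple modular redundancy (hypothetical helper): bitwise 2-of-3 vote.

    Each output bit takes the value held by at least two of the three
    results, so any single corrupted result is outvoted.
    """
    return (a & b) | (a & c) | (b & c)


correct = 0b1011_0010          # result every healthy core should produce
upset = correct ^ 0b0000_1000  # the same result with one bit flipped (e.g. by EMI)

# Lockstep detects the fault but cannot tell which core is faulty...
print(lockstep_compare(correct, upset))              # True
# ...whereas TMR masks it, whichever core was hit.
print(tmr_vote(correct, upset, correct) == correct)  # True
print(tmr_vote(upset, correct, correct) == correct)  # True

# A software bug gives the same wrong answer on every core, so neither
# scheme can detect it.
bug = 0b0000_0000
print(lockstep_compare(bug, bug), tmr_vote(bug, bug, bug) == correct)  # False False
```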

ASIL Compliance and Automotive Safety

Lockstep technology is widely used to achieve Automotive Safety Integrity Level (ASIL) compliance under ISO 26262, ensuring that automotive safety systems can detect and handle processor faults. For example, in an adaptive cruise control system, if a processor mismatch occurs, the system disengages cruise control rather than making an unsafe decision. This prioritisation of passenger safety over continuous operation is crucial for mission-critical applications.

For systems requiring the highest levels of reliability—such as flight control systems or nuclear power plants—TMR is employed to prevent a single fault from compromising the entire system.

Neoverse: High-Performance Edge Computing

Arm’s Neoverse architecture is a high-performance processor family designed for data centres, edge computing, and cloud infrastructure. Unlike Cortex processors, which are commonly used in mobile and embedded applications, Neoverse is optimized for scalability, power efficiency, and AI acceleration. This makes it well suited to high-performance edge computing and an interesting alternative to Nvidia’s Jetson Orin modules.

With advancements in Arm Cortex (especially Helium) and Neoverse architectures, developers can now deploy real-time AI workloads directly on edge devices, eliminating cloud dependencies. This means better security, reduced costs, and instant decision-making, all of which are essential for next-generation IoT applications.

Key takeaway: The Evolution from AI to RTEI

Rather than dismissing AI’s role in edge applications, RTEI represents the next evolutionary step—one that acknowledges AI’s limitations and enhances its capabilities through deterministic DSP algorithms. Traditional AI struggles to generalize beyond pre-trained scenarios, lacks commonsense reasoning, and remains a black box in decision-making. These weaknesses make it ill-suited for dynamic, real-world applications.

The time has come to move beyond the assumption that cloud-trained AI can work everywhere. Instead, RTEI offers a hybrid intelligence system—combining the strengths of AI and DSP for real-time, reliable, and efficient edge intelligence.

By embedding intelligence directly into edge devices using Arm processors, lockstep technology, and deterministic DSP algorithms, we can build smarter, safer and more adaptable systems.

Author

Dr. Sanjeev Sarpal

Sanjeev is a RTEI (Real-Time Edge Intelligence) visionary and expert in signals and systems with a track record of successfully developing over 26 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT/RTEI solutions and strategies for I5.0, telemedicine, smart healthcare, smart grids and smart buildings.

January 31, 2025 / by Dr. Sanjeev Sarpal

Beyond the Hype: How Real-Time Edge Intelligence will surpass Generative AI in 2025

Algorithms Dr. Sanjeev Sarpal

Generative AI (software capable of producing new content such as text, images, music or animations) has captured the public’s imagination. Many websites offer tools for creating playful visuals or novelty outputs with just a few clicks, but these entertainment-focused applications often fail to address the pressing, real-world demands of modern businesses.

While such generative technologies excel at creativity and rapidly prototyping ideas, Real-Time Edge Intelligence (RTEI) is positioned to have a far greater impact where it counts most, i.e. mission-critical operations. By processing data at the Edge and providing immediate insights, RTEI directly tackles key needs in manufacturing, healthcare, automotive, logistics, and other fields—scenarios in which split-second decisions and reliability are paramount.

Immediate Decision-Making

While generative AI excels at creative tasks such as producing articles or amusing content, its outputs are typically not time-sensitive in physical environments, i.e. they’re not real-time. RTEI, on the other hand, processes real-time sensor data (e.g. temperature, CO₂ levels, vibration, etc.) at the Edge in order to take split-second actions. Typical applications include worker safety systems, autonomous vehicles and factory robotics. This local intelligence can literally save lives and prevent costly downtime for factory owners.

Operational ROI and Efficiency

Generative AI’s large DNN models are computationally intensive, demanding powerful cloud GPU infrastructure that is power-hungry and extremely expensive. With the energy crisis currently gripping the Western world, cutting costs is paramount.

Edge-based intelligence runs on significantly lower-cost hardware (such as Arm Helium microcontrollers), often with lower bandwidth needs and reduced energy consumption, making them more suitable for cost-critical factory automation. Real-time data analysis at the edge can uncover anomalies in machinery before they lead to breakdowns or defects. This proactive approach significantly reduces downtime and waste—outcomes with immediate, tangible returns on investment (ROI).

Industry 5.0 Alignment

Industry 5.0 highlights collaboration between humans and machines, with an emphasis on personalization and sustainability. RTEI facilitates swift, localized decisions that keep production lines adaptive, safe, and responsive to human input—enabling a true fusion of human creativity and machine efficiency.

Workers can use Augmented Reality (AR) glasses or even other wearable devices to receive real-time insights on equipment performance or process instructions. By leveraging local analytics, these tools remain functional and effective mediums in areas with poor cloud connectivity—a massive advantage in busy factories or at remote sites with poor Wi-Fi coverage.

Arm Helium: specialised Edge hardware accelerators

Emerging technologies like Arm Helium, also known as the M-Profile Vector Extension (MVE) of the Armv8.1-M architecture, enable complex filtering, sensor fusion, and inference tasks to be performed at the microcontroller level. This hardware acceleration makes edge-based AI solutions far more powerful than before, paving the way for advanced local inference and localised signal processing of sensor data.

The new Cortex-M52 core is particularly interesting as it combines Helium with Arm’s TrustZone security in a compact core. This allows a single low-power edge device to handle intensive AI tasks (e.g., computer vision), perform DSP operations for filtering IoT sensor data, and provide robust hardware security, all in one solution!

Many of the latest Helium-based microcontrollers also leverage TSMC’s 22nm ultra-low-power semiconductor technology, delivering up to a 50% reduction in power consumption compared to the older 40nm process. As a result, RTEI can now be deployed in battery- or solar-powered devices, extending sustainability and reach without sacrificing performance.

Key takeaways

Generative AI will remain a formidable tool for creativity, rapid prototyping, and entertainment-oriented applications. However, its reliance on large-scale cloud resources and the generally non-urgent nature of its outputs limit its immediate impact on mission-critical operations. Real-Time Edge Intelligence (RTEI), by contrast, offers instantaneous, localised decision-making, fortified data security, and tangible cost savings: precisely the attributes demanded by many business owners.

As we enter a turbulent 2025, where rising inflation and energy costs are daily matters of concern, RTEI’s practical benefits will far outweigh the playful allure of generative AI. The evolution of specialised hardware (exemplified by Arm Helium) further confirms RTEI as the essential technology shaping next-generation manufacturing, healthcare, logistics and beyond. In this rapidly changing landscape, RTEI is destined to surpass generative AI in real-world importance, defining the future of industrial and operational intelligence.

January 24, 2025 / by Dr. Sanjeev Sarpal

Maximizing IoT sensor performance: Getting the most out of your sensor with DSP and AI

Algorithms Dr. Sanjeev Sarpal

In an era dominated by IoT applications, sensors are everywhere—embedded in our homes, vehicles, industries, and even our bodies. They generate an immense amount of data that holds valuable insights waiting to be uncovered. Traditional DSP algorithms like the Fast Fourier Transform (FFT) and Kalman filters have been fundamental in analysing this data, effectively extracting features of interest and filtering out noise. These algorithms excel in tasks such as frequency analysis, state estimation, and noise reduction, providing precise and reliable results.

However, as the complexity and volume of sensor data grow, relying solely on DSP algorithms is no longer sufficient. The patterns and anomalies within large-scale, multidimensional data streams often exceed the capabilities of traditional methods. This is where AI and ML models become indispensable. AI/ML models are adept at handling complex, nonlinear patterns and can make predictions based on learned data. Yet they have no common-sense understanding of the process they are modelling, and they are highly dependent on the quality of their input data.

Combining the strengths of both DSP algorithms and AI/ML models leads to more robust and efficient sensor data processing systems. DSP techniques can preprocess and enhance the data, making it cleaner and more relevant for AI models to analyse. Arm Cortex processors play a pivotal role in this augmentation. Renowned for their efficiency and performance, they are widely used in AIoT (Artificial Intelligence of Things) solutions, enabling the simultaneous execution of DSP algorithms and AI/ML models directly on edge devices. This combination allows for intelligent data processing that is both rapid and power-efficient, meeting the demands of modern technology applications.

The Necessity of DSP Algorithms

DSP algorithms are essential for transforming raw sensor data into meaningful information. Sensors often collect data that is noisy or distorted, making direct interpretation challenging. DSP algorithms tackle these issues by performing noise reduction, signal enhancement, and feature extraction.

For example, the FFT converts time-domain signals into frequency-domain representations, revealing patterns crucial for applications like vibration analysis and audio. Digital filters such as low-pass, band-pass and high-pass filters eliminate unwanted frequency regions, isolating signals of interest and improving data quality.
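To make the FFT step concrete, the sketch below implements a textbook recursive radix-2 FFT in plain Python and uses it to find the dominant frequency of a synthetic vibration signal. The signal (a DC offset, a 50 Hz component and a weaker 120 Hz component, sampled at 1024 Hz) and the helper names are assumptions for illustration only; firmware on an Arm Cortex-M device would normally call an optimised library FFT instead.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n & (n - 1):
        raise ValueError("FFT length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_tone(x, fs):
    """Return (frequency_hz, amplitude) of the largest non-DC spectral peak."""
    n = len(x)
    spectrum = fft(x)
    # Single-sided amplitude spectrum: scale by 2/N and skip the DC bin (k = 0).
    amps = [2.0 * abs(spectrum[k]) / n for k in range(1, n // 2)]
    k_peak = 1 + max(range(len(amps)), key=amps.__getitem__)
    return k_peak * fs / n, amps[k_peak - 1]


fs = 1024.0   # assumed sampling rate in Hz
n = 1024      # FFT length, giving a frequency resolution of fs / n = 1 Hz
signal = [
    0.5
    + 1.0 * math.sin(2 * math.pi * 50 * i / fs)
    + 0.3 * math.sin(2 * math.pi * 120 * i / fs)
    for i in range(n)
]

freq, amp = dominant_tone(signal, fs)
print(f"dominant tone: {freq:.1f} Hz, amplitude {amp:.3f}")
```

The test tones were chosen to fall exactly on FFT bins; real signals usually fall between bins, which is why windowing and peak interpolation are common refinements.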

Without DSP techniques, valuable insights within sensor data might remain hidden. DSP algorithms lay the groundwork by refining the data, ensuring that both traditional analysis methods and AI/ML models receive high-quality inputs. They provide reliable results based on established mathematical principles and human reasoning, which is essential in critical applications like medical devices, aerospace, and industrial automation where precision and repeatability are paramount.

As such, it’s important to realise that preprocessing of sensor data with DSP algorithms is an essential step, since AI/ML models rely heavily on the quality of input data for accurate predictions.

Moreover, DSP algorithms are efficient and can operate in real-time on devices with limited resources, such as Arm Cortex processors, making them ideal for edge computing where real-time processing is needed.

State-of-the-art AIoT microcontrollers

The Arm Cortex-M52, M55 and M85 are targeted at AIoT applications on microcontrollers. These processors are based on Arm’s Armv8.1-M architecture and implement its M-Profile Vector Extension (MVE), nicknamed Helium, which provides 128-bit vector operations (such as the dot products) needed for ML and some DSP algorithms.

However, as only a few IC vendors (Alif, Ambiq, Samsung, Renesas, Himax, Bestechnic, Qualcomm) have released, or are planning to release, devices, Helium processors remain a hidden gem for the near future.
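The idea behind a 128-bit vector dot product can be emulated in a few lines: each 128-bit register holds sixteen 8-bit lanes, all sixteen products are formed in one step, and they are summed into a wider 32-bit accumulator. The sketch below is a plain-Python model of that data flow only, assuming int8 inputs as used by quantised ML models; it is not Helium code, and the names `LANES`, `to_int32` and `vector_dot_int8` are hypothetical. Real firmware would use MVE intrinsics or an optimised library such as CMSIS-DSP or CMSIS-NN.

```python
LANES = 16  # a 128-bit vector holds 16 lanes of 8-bit integers


def to_int32(v: int) -> int:
    """Wrap an integer to the signed 32-bit range, as a 32-bit register would."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def vector_dot_int8(a, b):
    """Emulated dot product of two int8 lists, processed 16 lanes at a time."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    acc = 0
    for start in range(0, len(a), LANES):
        va = a[start:start + LANES]
        vb = b[start:start + LANES]
        # Tail handling: pad the last partial vector with zeros so unused lanes
        # contribute nothing (Helium uses tail predication for this purpose).
        va = va + [0] * (LANES - len(va))
        vb = vb + [0] * (LANES - len(vb))
        # One "vector" step: 16 lane-wise products reduced into the accumulator.
        acc = to_int32(acc + sum(x * y for x, y in zip(va, vb)))
    return acc


weights = [((3 * i) % 256) - 128 for i in range(40)]      # hypothetical int8 weights
inputs = [((7 * i + 5) % 256) - 128 for i in range(40)]   # hypothetical int8 activations
print(vector_dot_int8(weights, inputs))
```

With 40 elements the loop performs three vector steps (16 + 16 + 8 lanes), compared with 40 scalar multiply-accumulate iterations, which is the source of the speed-up on real vector hardware.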

The Necessity of AI/ML Models

While DSP algorithms are powerful, they are generally designed to address specific problems and may not scale well with the increasing complexity and volume of sensor data. AI/ML models come into play by offering the ability to learn from data, identify complex patterns, and make predictions without explicit programming for each task. They are particularly useful when:

Patterns are too complex for manual feature extraction: In cases like image and speech recognition, where the features of interest are not easily extracted using traditional DSP methods.
Data is high-dimensional or unstructured: AI/ML models can handle large datasets with numerous variables, finding relationships that may not be apparent using scientific reasoning.
Adaptive learning is required: ML models can be improved over time with more training data as it becomes available.

However, it is important to realise that AI/ML models lack common sense and are heavily reliant on the data they are trained on. As such, they may misinterpret or overlook important features if the input data is noisy or lacks proper pre-processing.

Augmenting DSP and AI/ML: a complementary approach

To maximize the benefits of sensor data processing, a hybrid approach that combines DSP algorithms with AI/ML models is often the most effective. Here’s how they complement each other:

Pre-processing with DSP:
- Noise Reduction: Digital filters (e.g. low-pass) can be used to clean up the signal before it reaches the ML model.
- Feature Extraction: Algorithms such as the FFT or the discrete wavelet transform (DWT) can extract meaningful features that reduce the dimensionality of the data and highlight important patterns.
AI/ML for Pattern Recognition:
- Classification and Regression: ML models can take the features extracted by DSP algorithms and perform tasks like anomaly detection, predictive maintenance, and classification.
- Adaptive Learning: ML models can adapt to new data trends over time, improving their accuracy and usefulness.
Feedback Mechanisms:
- Model Refinement: The outputs from AI/ML models can inform adjustments in DSP algorithms, creating a feedback loop that enhances overall system performance.

Example Application: Vibration analysis in Industrial equipment

DSP Stage:
- FFT Analysis: Converts vibration signals (usually captured from an accelerometer) from the time to frequency domain to identify characteristic frequencies associated with specific mechanical faults.
- Feature Extraction: Extracts features like peak frequencies, amplitudes, and harmonics. These amplitude features can be further scaled (using the Fourier-domain integration property, i.e. dividing by 2πf for velocity or by (2πf)² for displacement) to estimate velocity or displacement from the original acceleration data.
AI/ML Stage:
- Fault Classification: An ML model trained on labelled data predicts the type of fault (e.g., imbalance, misalignment, bearing wear) based on the extracted features.
- Predictive Maintenance Scheduling: Regression models estimate the remaining useful life of equipment, allowing for proactive maintenance.
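The velocity and displacement scaling mentioned in the DSP stage can be sketched as follows. Integration in time corresponds to division by j2πf in the frequency domain, so each integration divides a spectral amplitude by 2πf. The peak list (a hypothetical 24.75 Hz running speed and two harmonics), the units and the function name are assumptions for illustration only.

```python
import math


def accel_to_velocity_displacement(freq_hz, accel_amp):
    """Convert a single-tone acceleration amplitude (m/s^2) at freq_hz into
    velocity (mm/s) and displacement (um) amplitudes.

    Each time-domain integration divides the spectral magnitude by 2*pi*f.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive (DC cannot be integrated)")
    omega = 2.0 * math.pi * freq_hz
    velocity_mm_s = accel_amp / omega * 1e3      # m/s -> mm/s
    displacement_um = accel_amp / omega**2 * 1e6  # m -> um
    return velocity_mm_s, displacement_um


# Hypothetical FFT peaks from accelerometer data: (frequency in Hz, m/s^2).
peaks = [(24.75, 2.0), (49.5, 0.8), (148.5, 0.5)]
for f, a in peaks:
    v, d = accel_to_velocity_displacement(f, a)
    print(f"{f:7.2f} Hz: accel {a:.2f} m/s^2 -> vel {v:.2f} mm/s, disp {d:.1f} um")
```

Because the division by 2πf grows without bound as f approaches zero, low-frequency noise is strongly amplified; in practice the bins below a few hertz are usually excluded before converting.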

Benefits of augmentation:

Improved Accuracy: Pre-processing with DSP algorithms enhances the quality of data fed into AI/ML models.
Efficiency: Reduces computational load by focusing on relevant features, which is especially important for edge devices with limited resources.
Reliability: Combining deterministic DSP outputs with probabilistic AI/ML predictions leads to more robust systems.

Detecting motor faults via harmonic fingerprint analysis

Key takeaways

The fusion of DSP algorithms and AI/ML models represents a powerful paradigm for sensor data processing in modern technology. DSP algorithms provide the necessary tools for signal enhancement and feature extraction, ensuring that the data is in the best possible form for analysis. Despite lacking any common sense (as discussed in a previous article), AI/ML models certainly excel at finding complex patterns and making predictions based on the processed data, making them attractive for many modern AIoT applications.

Arm Cortex processors play a pivotal role in this integration, offering the computational capabilities required to run both DSP algorithms and AI/ML models efficiently on the same platform. This synergy enables the development of advanced AIoT solutions that can process sensor data intelligently at the edge, leading to faster decisions and reduced latency. This is further strengthened by Arm’s TrustZone extension, which provides developers with a hardware-based security model, offering a high level of protection against hacking, theft of encryption keys and counterfeiting.

As the volume and complexity of sensor data continue to grow, leveraging the strengths of both DSP and AI/ML will be essential for advancing technology across industries. By adopting a complementary approach and utilising capable computational platforms such as Arm’s Cortex family of processors, we can build more effective, efficient, and intelligent systems that meet the demands of the future.

November 28, 2024 / by Dr. Sanjeev Sarpal

An AI/ML model is not an algorithm

Algorithms Dr. Jayakumar Singaram, Dr. Sanjeev Sarpal

In modern computing, three key concepts define how machines process information and solve problems: Large Language Models (LLMs), algorithms, and computer programs. Each plays a unique role in how tasks are performed and how intelligent systems operate.

LLMs, such as ChatGPT, are advanced artificial intelligence models trained on massive amounts of text data to understand and generate human-like language responses. They excel at language-based tasks but rely on patterns in data rather than true intelligence based on human reasoning.

Algorithms, on the other hand, are step-by-step instructions (typically following a mathematical recipe) designed to solve specific problems or perform defined tasks. The rules or mathematical recipes that an algorithm follows have been designed by humans using reasoning and strict logic. As such, the output of the algorithm is deterministic, and can be recreated and explained by anybody following the method’s mathematical recipe or set of rules using the same input data.

Computer programs are the broader collection of code that encompasses both algorithms and models like LLMs, orchestrating various tasks by following sets of instructions. While algorithms are the building blocks for problem-solving, LLMs are specialized tools for tasks involving natural language, and programs bring these elements together to create functional software.

Understanding the differences between these three components helps clarify the architecture of modern computational systems. In this article, we discuss the differences between these terms and technologies, and provide hints and tips and a few practical examples for developers working on AIoT applications.

Programs and Algorithms: a Basic Example

A program consists of a set of instructions, often built upon one or more algorithms, to perform specific tasks based on a given input. An algorithm is a step-by-step procedure or formula for solving a problem, while a program is the implementation of that algorithm in a specific programming language.

For instance, consider a simple sorting algorithm like Bubble Sort, which can be implemented in a program:

The algorithm defines how to repeatedly compare and swap adjacent elements in a list until the list is sorted.
The program written in a language like Python or C++ implements this algorithm to sort any given list of numbers.
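The algorithm/program distinction can be seen in a short Python implementation. The algorithm is the compare-and-swap procedure described above; the program is this particular encoding of it. The early exit when a pass makes no swaps is a common optional refinement rather than part of the basic method, and the function name is our own.

```python
def bubble_sort(values):
    """Return a sorted copy of `values` using Bubble Sort.

    Repeatedly compare adjacent elements and swap them if they are out of
    order; after each pass the largest remaining element has "bubbled" to
    the end, so the next pass can stop one position earlier.
    """
    items = list(values)          # work on a copy; leave the input untouched
    n = len(items)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:           # no swaps in this pass: already sorted
            break
    return items


print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```

Running the program any number of times on the same input always produces the same output, and every step can be traced by hand.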

The key point is that a traditional program does not learn from the input or adapt its behaviour. It just follows the instructions of the algorithm every time based on the specific problem it is designed to solve.

LLMs: A Learning-Based Approach

In contrast, a Large Language Model (LLM) does not rely on predefined algorithms for specific tasks. Instead, it is trained on vast amounts of data and uses this training to predict responses based on learned patterns. For example:

If you ask an LLM to generate a recipe for pizza, it predicts the next word or sentence based on patterns it has seen in training data.
The LLM does not follow a fixed algorithm for generating recipes, but instead uses its learned understanding to predict the best response.

Unlike traditional programs, LLMs do not rely on strict rules or algorithms. They are probabilistic models that learn from a wide range of data, and their output is based on prediction rather than direct instruction.

Key Differences Between Programs and LLMs

Algorithm vs Learning: A traditional program follows strict instructions based on algorithms. LLMs, on the other hand, learn from data and use this learning to generate responses.
Fixed Output vs Prediction: In a program, the output is fixed for a given input based on the algorithm. An LLM predicts responses based on patterns, so the output can vary even with similar inputs.
No Adaptation vs Adaptation: Programs do not adapt or change their behavior unless reprogrammed. LLMs are capable of generating responses based on what they have learned, adapting to new inputs within the scope of their training.

Misconceptions about Algorithms and DLN/ML Models

Many people refer to an ML model as an algorithm. This is incorrect, although the two terms are very closely related. In this section we distinguish between the two, and provide some practical examples.

Is it correct to distinguish between an ‘algorithm’ and a Deep Learning Network (DLN) / ML model? Yes: the terms are often used interchangeably, but they have distinct meanings.

An algorithm is a step-by-step procedure or set of rules for solving a problem, while a machine learning (ML) model is the output generated after an algorithm is applied to data during the training process. Essentially, an ML model is the learned representation or a mathematical construct based on an algorithm that can make predictions or decisions on new data.

For example, when training a neural network (which uses an algorithm like backpropagation), the result is an ML model that can classify images or recognize patterns. The algorithm guides the learning process, but the model is what performs the task after training.
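The same distinction can be made concrete with linear regression. In the sketch below, gradient descent is the algorithm: a fixed, human-designed update rule. The model is what the algorithm leaves behind, here just the two learned numbers `w` and `b`, which are then used to make predictions. The data points, learning rate and iteration count are illustrative assumptions.

```python
def train_linear_model(xs, ys, lr=0.05, epochs=2000):
    """Gradient descent (the algorithm) fitting y = w*x + b by minimising MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}  # the trained model: nothing but learned parameters


def predict(model, x):
    """Using the model: apply the learned parameters to new data."""
    return model["w"] * x + model["b"]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # assumed training data, generated from y = 2x + 1
model = train_linear_model(xs, ys)
print(model, predict(model, 10.0))
```

In this toy case the learned parameters happen to be easy to interpret; in a deep network the "model" is millions of such numbers, which is why its reasoning is so much harder to explain.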

What is an Algorithm?

As mentioned earlier, an algorithm is a set of rules or a mathematical recipe used to perform a specific task or to solve a problem. In ML, an algorithm is the method used to train an ML model. Examples include linear regression, decision trees, k-nearest neighbors and gradient descent.

Algorithms are very well established in the IoT sensor world for a variety of tasks, such as instrumentation and measurement, cleaning sensor data, AR (augmented reality), predictive maintenance with MEMS sensors and navigation (drones, cars and robotics). The latter makes heavy use of Kalman filtering and sensor fusion, which have been used with great success for decades.

As a simple example of an algorithm, consider the task of calculating the mean (average) of the set of numbers in the dataset \(z=[3,2,1,4,6]\). The mean can be calculated using the following mathematical recipe,

\(\displaystyle\mu = \frac{1}{5}\sum_{n=0}^{4}z(n) = 3.2\)

Note that this result is deterministic, in the sense that it can be recreated and, more importantly, explained by anybody following the function’s mathematical recipe using the same input data. This is very different from an ML model that might reach the same result for the same input dataset; as discussed in the next section, explaining how the model reached its result remains an enigma.
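The same recipe can be written as a short Python function (the name `mean` is our own choice). It is a direct transcription of the equation: sum the samples, then divide by their count. For the same input it always returns the same value, which is exactly the reproducibility property described above.

```python
def mean(values):
    """Arithmetic mean: sum the samples and divide by their count."""
    if not values:
        raise ValueError("the mean of an empty sequence is undefined")
    return sum(values) / len(values)


z = [3, 2, 1, 4, 6]
print(mean(z))  # 3.2
```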

What is a DLN/ML Model?

An ML model is the output produced when a training algorithm is applied to one or more datasets. Feature extraction algorithms (e.g. mean, standard deviation and correlation) are typically used during training to extract the features of interest that are fed to the model. The resulting model represents the learned patterns, parameters, or rules that can be used to make predictions on new data.

A key point to realise here is that, unlike algorithms based on predefined rules and mathematical concepts, how an ML model reaches its result remains an enigma. This is the primary reason why such models shouldn’t be allowed to operate on critical processes without scrutiny. In this sense, a trained model is a statistical construct (energy-based models such as Boltzmann machines are one example) whose behaviour is determined by the data it was trained on.

In many AIoT applications, Kalman based sensor fusion is typically used for feeding the ML model with high quality features of the underlying process, thus significantly improving the accuracy of the AI system.

How Algorithms and DLN/ML Models Interact

A model provides the capability to make decisions based on input data. It can recognize patterns, make predictions, and adapt to new information. Essentially, a model simulates cognitive functions that are typically associated with human thinking, such as dealing with ambiguity and uncertainty, but as discussed in a previous article, AI does not have any common sense, as it has no understanding of the underlying data or process that it is modelling.

On the other hand, an algorithm is a set of defined instructions or a mathematical recipe. It is a rules-based, step-by-step procedure used for calculations, data processing, and automated reasoning tasks. Algorithms are the backbone of software and can solve a wide range of problems by following their defined logic.

However, not all functions are computable: there are certain problems for which no algorithm can be formulated to provide a solution. These are referred to as non-computable functions. The classic example is the halting problem: no algorithm can decide, for every possible program and input, whether that program will eventually stop. In such cases, even the most advanced algorithms cannot determine an outcome, highlighting a fundamental limitation in computational theory.

Human Intelligence and Digital Intelligence

In the field of computation, it is essential to differentiate between traditional algorithms and machine learning models. An algorithm is a direct output of human intelligence, crafted through logical reasoning and problem-solving techniques. It represents a set of predefined instructions designed to solve specific problems. The human mind formulates these steps to ensure a consistent and accurate outcome.

In contrast, a trained machine learning (ML) model is the product of digital intelligence. While algorithms underpin the model’s structure, the true power of an ML model arises through its capacity to learn and adapt from new training data. This process involves iteratively adjusting parameters to optimize performance in tasks like prediction, classification, or decision-making. In this sense, the model evolves beyond its initial algorithmic foundation, generating insights and results that may not be directly encoded by human logic.

“An algorithm is a direct manifestation of human intelligence, designed through logic, reasoning, and problem-solving techniques. On the other hand, a trained machine learning model represents the outcome of digital intelligence, which evolves through the iterative processing of data.”

The convergence of these two forms of intelligence—human and digital—marks a significant shift in computational systems. Algorithms, though foundational, are static and require manual updates. Machine learning models, by contrast, learn from experience, dynamically evolving with each new piece of training data. This shift positions ML models as more flexible and adaptive tools for solving complex problems where human-defined rules may fall short.

The distinction between human-driven algorithms and data-driven machine learning models emphasizes the growing role of adaptive systems in areas such as autonomous driving, personalized medicine, and financial forecasting. As machine learning continues to evolve, the boundaries between explicit programming and emergent behavior will continue to blur, paving the way for systems capable of independent learning and decision-making.

Low-Pass Filter and CNN for Classifying Periodic Signals

Both a Low-Pass Filter (LPF) and a Convolutional Neural Network (CNN) can be employed to handle periodic signals, but their approaches and purposes differ fundamentally.

Low-Pass Filter (LPF)

A Low-Pass Filter is an algorithm designed to attenuate the high-frequency components of a signal while allowing the low-frequency components to pass. Its primary use is to filter or clean a signal rather than classify it. Applications of the LPF in AIoT include removing glitches from sensor data and cleaning up noise on a measured periodic signal prior to feature extraction and subsequent ML classification, leading to higher accuracy.

A practical IIR (infinite impulse response) digital filter used in both AIoT and IoT may be defined in terms of a finite number of poles, \(p\), and zeros, \(q\), by the linear constant-coefficient difference equation,

\(\displaystyle y(n)=\sum_{k=0}^{q}b_k x(n-k)-\sum_{k=1}^{p}a_ky(n-k) \)

where \(a_k\) and \(b_k\) are the filter’s denominator and numerator polynomial coefficients (with the denominator normalised so that \(a_0 = 1\)), whose roots are the filter’s poles and zeros, respectively. An LPF can be used for all types of signals, not just periodic ones; however, in this article we limit the discussion to periodic signals.
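The difference equation above can be evaluated sample by sample in a direct-form I structure. The Python sketch below is a minimal, unoptimised illustration rather than production filter code; the function name `iir_filter` and the first-order coefficients (\(b_0 = 1-\alpha\), \(a_1 = -\alpha\), with \(\alpha = 0.9\)) are hypothetical choices made only to give a simple low-pass example.

```python
def iir_filter(b, a, x):
    """Direct-form I evaluation of the difference equation
    y(n) = sum_{k=0}^{q} b[k] x(n-k) - sum_{k=1}^{p} a[k] y(n-k),
    assuming the coefficients are normalised so that a[0] == 1."""
    if a[0] != 1:
        raise ValueError("expected normalised coefficients with a[0] == 1")
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward (numerator, zeros) terms
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback (denominator, poles) terms; the sum starts at k = 1
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# Hypothetical first-order LPF: y(n) = (1 - alpha) x(n) + alpha y(n-1)
alpha = 0.9
b = [1 - alpha]
a = [1.0, -alpha]

step_response = iir_filter(b, a, [1.0] * 50)
nyquist_response = iir_filter(b, a, [(-1.0) ** n for n in range(200)])
```

With \(\alpha = 0.9\), the step response approaches 1 (unity DC gain), while an alternating \(\pm 1\) input at the Nyquist frequency settles to an amplitude of about \(0.1/1.9 \approx 0.053\), which is the low-pass behaviour these coefficients are meant to produce.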

Limitations for Classification

While an LPF can enhance a periodic signal by reducing high-frequency noise, it does not classify the signal. It merely transforms the input based on fixed mathematical operations, with no ability to learn from data or adapt its behaviour.

Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a machine learning model designed to recognize patterns in data by learning from training examples. It can be trained to classify periodic signals by learning distinctive features in the signal’s structure.

Operation

The CNN applies a series of convolution operations:

\(\displaystyle S(i,j) = (I * K)(i,j) = \sum_m \sum_n I(m,n) K(i-m, j-n) \)

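where \(I\) is the input signal, \(K\) is the kernel, and \(S(i,j)\) is the resulting feature map. For a one-dimensional signal, such as a sampled periodic waveform, this reduces to a single sum, \(S(i) = \sum_m I(m) K(i-m)\).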
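As a minimal sketch of the one-dimensional case, the Python code below computes the full discrete convolution and then applies a ReLU, as a convolutional layer would. The kernel `[1, -1]` is a hypothetical hand-picked stand-in for a learned kernel; in a trained CNN the kernel values come from training, not from the designer. Note also that many deep-learning frameworks actually compute cross-correlation (the kernel is not flipped); because the kernels are learned, this makes no difference to what the network can represent.

```python
def conv1d_full(signal, kernel):
    """Full 1-D discrete convolution: S(i) = sum_m I(m) K(i - m)."""
    n_out = len(signal) + len(kernel) - 1
    out = [0.0] * n_out
    for i in range(n_out):
        for m in range(len(signal)):
            j = i - m  # kernel index K(i - m)
            if 0 <= j < len(kernel):
                out[i] += signal[m] * kernel[j]
    return out


def relu(values):
    """Element-wise rectified linear unit, a common CNN activation."""
    return [max(0.0, v) for v in values]


# Hypothetical periodic input: a square wave with a period of 8 samples.
square = [1.0 if (n // 4) % 2 == 0 else -1.0 for n in range(16)]

# Hypothetical kernel standing in for a learned one: [1, -1] computes
# I(i) - I(i-1), so after the ReLU it responds only to rising edges.
feature_map = relu(conv1d_full(square, [1.0, -1.0]))
```

The largest response, 2.0, appears at index 8, where the square wave rises from -1 to +1; the smaller responses at the ends come from the implicit zero padding of a full convolution.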

Classification

Unlike the LPF, the CNN is capable of learning to classify different periodic signals through training. The learned filters allow the network to distinguish between signals based on the periodic features it identifies.

Extracted vs. Learned Features

Low-Pass Filter: Performs a deterministic operation that modifies the signal but cannot classify it.
CNN: Learns from data and can classify periodic signals by recognizing their features.

In conclusion, while a Low-Pass Filter may assist in signal preprocessing, a trainable model such as a CNN is required for the task of classifying signals.

Adaptive Low-Pass Filters

An adaptive low-pass filter (LPF), such as one based on the Least Mean Squares (LMS) algorithm, introduces several key features and benefits compared with a traditional, static LPF:

Dynamic Adaptability: Adaptive LPFs adjust their characteristics in response to variations in the input signal, allowing for real-time filtering of noise or unwanted frequencies, especially in non-stationary signals.
Error Minimization: These filters utilize a feedback mechanism to minimize the difference (error) between the desired output and the actual output. The filter coefficients are continuously updated based on this error, enhancing the filter’s adaptability to changing signal conditions.
Improved Performance in Noisy Environments: Adaptive LPFs can track changes in the noise and so maintain signal quality, which is particularly valuable in applications like audio processing, telecommunications, and biomedical signal processing where signal characteristics can fluctuate.
Applications in Real-Time Systems: The adaptability of these filters makes them suitable for real-time systems, such as echo cancellation in telecommunication, where the noise characteristics may vary dynamically, ensuring consistent performance over time.
Computational Complexity: While adaptive filters provide significant advantages, they also come with increased computational complexity due to the need for constant updates to the filter coefficients, which can be a concern in systems with limited processing capabilities.
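The feedback mechanism described above can be sketched with the standard LMS update: the output is \(y(n) = \mathbf{w}^T\mathbf{x}(n)\), the error is \(e(n) = d(n) - y(n)\), and the coefficients are updated as \(\mathbf{w} \leftarrow \mathbf{w} + \mu\, e(n)\, \mathbf{x}(n)\). The Python sketch below is a generic adaptive FIR filter; whether it behaves as a low-pass filter depends on the desired signal \(d(n)\) it is adapted against. As a hypothetical test case it identifies an unknown three-tap FIR system; the step size \(\mu = 0.05\), the system `h` and the random input are illustrative assumptions.

```python
import random


def lms_filter(x, d, num_taps, mu):
    """Least Mean Squares adaptive FIR filter.

    x: input samples, d: desired samples (same length).
    Returns (outputs, errors, final_weights).
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # tap-delay line, buf[0] is the newest sample
    y_out, e_out = [], []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]                        # shift in the new sample
        yn = sum(wk * bk for wk, bk in zip(w, buf))  # filter output
        en = dn - yn                                 # error vs. desired output
        # Coefficient update: w <- w + mu * e(n) * x(n)
        w = [wk + mu * en * bk for wk, bk in zip(w, buf)]
        y_out.append(yn)
        e_out.append(en)
    return y_out, e_out, w


# Hypothetical test case: identify an unknown 3-tap FIR system h.
rng = random.Random(0)
h = [0.5, 0.3, -0.2]
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
_, errors, w = lms_filter(x, d, num_taps=3, mu=0.05)
```

Each sample costs roughly two multiply-accumulates per tap (one for the output, one for the coefficient update), which illustrates the computational overhead noted in the list.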

In summary, an adaptive LPF handles varying signal conditions more effectively than a static one, which makes it particularly valuable in applications requiring real-time signal processing and improves overall performance and robustness against noise and interference.

An adaptive low-pass filter (LPF) differs significantly from a traditional LPF in terms of feature extraction and learning capabilities.

Feature Extraction vs. Feature Learning

Traditional LPF: This filter focuses on extracting specific frequency components from a signal by applying fixed coefficients determined by the filter design, which remain constant during operation. As a result, it extracts features based on pre-defined criteria.
Adaptive LPF: Uses algorithms such as Least Mean Squares (LMS) to adjust its filter coefficients in real time based on the input signal characteristics. This enables the adaptive LPF to extract features that dynamically correspond to changing signal conditions, but it does not learn features in the same manner as a neural network.

Comparison with CNNs

Convolutional Neural Networks (CNNs): CNNs are designed to learn features from data through multiple layers, allowing them to automatically extract high-level features from raw inputs. Unlike traditional LPFs, CNNs perform feature learning, adapting to the input data through training on labeled datasets.
While adaptive LPFs adjust their response based on signal changes, they do not perform feature learning like CNNs. They can optimize their filter characteristics based on feedback but lack the hierarchical feature learning approach present in CNNs.

Adaptive LPFs can extract features based on the immediate conditions of the signal; however, they do not ‘learn’ features in the same way that CNNs do. Instead, adaptive LPFs optimize the extraction process in real-time, making them effective in environments where signal characteristics vary.

Comparison of Adaptive Low-Pass Filters and Convolutional Neural Networks

Adaptive low-pass filters (LPFs), such as those using the Least Mean Squares (LMS) algorithm, exhibit several similarities with convolutional neural networks (CNNs) regarding their operational principles and learning mechanisms.

Adaptive Coefficients: Adaptive LPFs modify their coefficients based on the input signal, similar to how CNNs adjust their weights during training to minimize loss on a dataset.
Supervised Learning: Both systems are trained against a reference to optimize performance. An adaptive filter adjusts in real time using feedback from a desired signal, while a CNN learns complex patterns from labeled data over many training iterations.
Feature Extraction: Adaptive LPFs extract relevant features dynamically, while CNNs automatically learn to identify hierarchical features through their architecture.
Learning Methodology: Adaptive LPFs adjust their parameters based on incoming data but do not learn complex representations as CNNs do. CNNs can learn multiple levels of abstraction through backpropagation.
Structure and Complexity: CNNs consist of multiple layers, allowing them to learn intricate patterns, whereas adaptive LPFs typically operate with a single, simpler structure focused on modifying coefficients.

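The first three items (adaptive coefficients, supervised learning and feature extraction) are similarities, whereas the last two (learning methodology, and structure and complexity) are differences.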

While adaptive LPFs and CNNs share similarities in their adaptive behaviors and feature extraction capabilities, they fundamentally differ in methodologies and complexities. Adaptive LPFs do not fully replicate the intricate learning capabilities of CNNs, though both aim to improve task performance through adaptation.

Filter Order vs. Network Depth

Order of the Filter: The order of an adaptive low-pass filter (LPF) determines its ability to capture and process complex signal characteristics. A higher-order filter can approximate a more complex frequency response, allowing it to better handle diverse signal patterns, similar to how a deeper convolutional neural network (CNN) can learn more complex representations.
Learning Capabilities: While both CNNs and adaptive LPFs adjust their parameters based on input, CNNs inherently possess a more advanced learning capability through multiple layers, each designed to extract different levels of abstraction from the data. This allows CNNs to learn hierarchical feature representations effectively. In contrast, increasing the order of an adaptive LPF can enhance its feature extraction capabilities, but it still lacks the more sophisticated learning mechanisms that CNNs implement, such as backpropagation through multiple nonlinear layers (an adaptive FIR filter is itself a single linear convolution).
Complex Features: CNNs excel in extracting spatial hierarchies in data (e.g., images) by applying filters across multiple layers, progressively identifying edges, shapes, and more abstract features. Adaptive LPFs, when designed with a higher order, can capture complex signal behaviours, but their ability to generalize or learn from large datasets is limited compared to CNNs.

While increasing the order of an adaptive LPF can enhance its performance in signal processing, it does not equate to the deep learning capabilities of CNNs. CNNs utilize their layered architecture to learn complex features in a more robust and generalized manner, making them more suitable for tasks like image recognition and classification.

Parameter Estimation

Parameter estimation plays a crucial role in both traditional algorithmic processes and machine learning. It involves determining the best parameters for a given model based on observed data.

Algorithmic Parameter Estimation

In traditional algorithmic contexts, parameter estimation involves using specific algorithms to find optimal parameters for mathematical models. Key methods include:

Least Squares Estimation (LSE)

This method minimizes the sum of the squared differences between observed and predicted values. The parameter estimation is given by:

\(\displaystyle\hat{\theta} = \arg \min_{\theta} \sum_{i=1}^n (y_i - f(x_i; \theta))^2 \)

where \(\hat{\theta}\) denotes the estimated parameters. This concept is central to Kalman filtering, in which a state-space model of the process uses the state estimates (i.e. the parameters of interest) to perform the prediction. The Kalman update equations then attempt to minimise the error between the model output and the observed data in a least squares sense on a sample-by-sample basis.
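For the special case of a straight-line model, \(f(x_i; \theta) = \theta_0 + \theta_1 x_i\), the least squares minimiser has a closed form obtained from the normal equations. The Python sketch below computes it; the function names and the noisy data points are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Closed-form least squares fit of y = theta0 + theta1 * x.

    Minimises sum_i (y_i - (theta0 + theta1 * x_i))^2 by solving
    the normal equations for a straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def sum_squared_error(xs, ys, theta):
    """The objective being minimised, evaluated at parameters theta."""
    theta0, theta1 = theta
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.9, 8.2, 10.8, 14.1]  # hypothetical noisy observations
theta_hat = least_squares_line(xs, ys)
```

For these points the fit is \(\hat{\theta}_0 = 2.04\) and \(\hat{\theta}_1 = 2.99\), and nudging either parameter away from these values increases the sum of squared errors.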

Maximum Likelihood Estimation (MLE)

MLE estimates parameters by maximizing the likelihood function, which reflects the probability of the observed data under the model parameters:

\(\displaystyle\hat{\theta} = \arg \max_{\theta} L(\theta; \text{data}) \)

where \(L(\theta; \text{data})\) represents the likelihood function.
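As a concrete sketch, assume (hypothetically) that the data are independent samples from a Gaussian with unknown mean \(\mu\) and variance \(\sigma^2\). Maximising the log-likelihood then gives closed-form estimates: the sample mean, and the sample variance computed with \(1/n\) rather than \(1/(n-1)\). The Python code below evaluates the log-likelihood so the maximum can be checked; the data values are made up for illustration.

```python
import math


def gaussian_log_likelihood(data, mu, var):
    """log L(mu, var; data) for i.i.d. Gaussian samples."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in data) / (2 * var))


def gaussian_mle(data):
    """Closed-form maximisers of the Gaussian likelihood."""
    n = len(data)
    mu_hat = sum(data) / n
    var_hat = sum((x - mu_hat) ** 2 for x in data) / n  # divides by n, not n - 1
    return mu_hat, var_hat


data = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0]  # hypothetical measurements
mu_hat, var_hat = gaussian_mle(data)
```

Perturbing either estimate (for example shifting \(\hat{\mu}\) by \(\pm 0.05\) or scaling \(\hat{\sigma}^2\) by 0.9 or 1.1) lowers the log-likelihood, as expected at a maximum.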

Parameter Estimation in Machine Learning

In machine learning, parameter estimation is integral to model training and involves iterative optimization techniques. Examples include:

Training Neural Networks

Parameters such as weights and biases are estimated using gradient-based optimization methods, typically through:

\(\displaystyle\theta_{n+1} = \theta_n - \alpha \nabla_{\theta} L(\theta_n)\)

where \(\theta_n\) represents the parameters at iteration \(n\), \(\alpha\) is the learning rate, and \(L(\theta)\) is the loss function.
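The update rule can be sketched on a simple problem: fitting a straight line \(\theta_0 + \theta_1 x\) by gradient descent on the mean squared error loss. The Python code below is a minimal illustration, not how neural-network frameworks implement training; the learning rate \(\alpha = 0.05\), the iteration count and the data points are hypothetical choices.

```python
def gradient_descent_line(xs, ys, alpha=0.05, iterations=2000):
    """Fit y = theta0 + theta1 * x by minimising the mean squared error
    L(theta) = (1/n) * sum_i (y_i - theta0 - theta1 * x_i)^2
    with the update theta_{n+1} = theta_n - alpha * grad L(theta_n)."""
    n = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Residuals at the current parameters (both gradients use these)
        residuals = [y - (theta0 + theta1 * x) for x, y in zip(xs, ys)]
        grad0 = -2.0 / n * sum(residuals)
        grad1 = -2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.9, 8.2, 10.8, 14.1]  # hypothetical noisy samples
theta = gradient_descent_line(xs, ys)
```

Because this loss is a convex quadratic, gradient descent converges to the same parameters a closed-form least squares fit would give (here about 2.04 and 2.99). For this particular data set, a learning rate above roughly 0.15 makes the iterations diverge, which illustrates why \(\alpha\) has to be tuned.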

Bayesian Parameter Estimation

In Bayesian methods, parameters are estimated based on posterior distributions that combine prior beliefs with observed data:

\(\displaystyle p(\theta | \text{data}) \propto p(\text{data} | \theta) \cdot p(\theta)\)

where \(p(\theta | \text{data})\) is the posterior distribution.
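The proportionality can be made concrete with a grid approximation: evaluate likelihood times prior at many candidate parameter values and normalise so the results sum to one. The Python sketch below does this for the success probability \(\theta\) of a Bernoulli process; the prior shape (proportional to \(\theta(1-\theta)\), i.e. a Beta(2, 2) density), the data of 7 successes in 10 trials and the grid size are hypothetical. For this conjugate case the exact posterior is a Beta(9, 5) distribution with mean \(9/14 \approx 0.643\), which the grid result should approximate.

```python
def grid_posterior(successes, trials, prior, num_points=1000):
    """Grid approximation of p(theta | data), which is proportional to
    p(data | theta) * p(theta), for a Bernoulli success probability theta."""
    thetas = [(i + 0.5) / num_points for i in range(num_points)]  # bin midpoints
    unnorm = [
        (t ** successes) * ((1 - t) ** (trials - successes)) * prior(t)
        for t in thetas
    ]
    total = sum(unnorm)
    return thetas, [u / total for u in unnorm]  # normalised weights


def beta22_prior(t):
    """Hypothetical prior: unnormalised Beta(2, 2) density."""
    return t * (1 - t)


thetas, post = grid_posterior(7, 10, beta22_prior)
posterior_mean = sum(t * p for t, p in zip(thetas, post))
map_estimate = thetas[post.index(max(post))]
```

The grid's most probable value lands next to the exact posterior mode, \(8/12 \approx 0.667\).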

In both traditional algorithms and machine learning contexts, the aim is to find the optimal parameters that best fit the model to the observed data.

Key Takeaways

Many people refer to an ML model as an algorithm. This is incorrect, although the two terms are closely related. An algorithm is a direct output of human intelligence, crafted through logical reasoning and problem-solving techniques. It represents a set of predefined instructions or a mathematical recipe designed to solve specific problems, and the human mind formulates these steps to ensure a consistent and accurate outcome. In contrast, a trained machine learning (ML) model is the product of digital intelligence: training algorithms are applied to datasets to construct the model.

A key takeaway is that algorithms are based on predefined rules and mathematical concepts, whereas the behaviour of an ML model is encoded in parameters learned from training data rather than in explicit rules. As such, how an ML model reaches its result can remain an enigma, which is the primary reason why such models shouldn’t be allowed to operate on critical processes without scrutiny.

Authors

Dr. Sanjeev Sarpal

Sanjeev is an RTEI (Real-Time Edge Intelligence) visionary and expert in signals and systems with a track record of successfully developing over 26 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT/RTEI solutions and strategies for I5.0, telemedicine, smart healthcare, smart grids and smart buildings.
Dr. Jayakumar Singaram

Jayakumar is an Arm ambassador and seasoned expert in semiconductor technology and AIoT. He advises companies such as Mistral Solutions, SunPlus Software, and Apollo Tyres at the strategic level on their AIoT solutions. He successfully founded Epigon Media Technologies, which focuses on Research and Development for the global market, and is also the co-author of the book "Deep Learning Networks: Design, Development, and Deployment."

October 25, 2024, by Dr. Jayakumar Singaram and Dr. Sanjeev Sarpal

Deploying Real-Time Edge Intelligence solutions with Arm processor technology

Algorithms Dr. Sanjeev Sarpal

In the rapidly evolving landscape of digital transformation, organizations are increasingly leveraging Real-Time Edge Intelligence (RTEI) solutions to enhance operational efficiency and decision-making capabilities. RTEI refers to the deployment of advanced data processing and analytics at the edge of the network (i.e. closer to where data is generated) rather than relying solely on centralized cloud infrastructure. This approach addresses the latency, bandwidth and connectivity challenges posed by traditional data processing methods and offers significant benefits across multiple sectors, particularly when solutions are built with Arm processor technology.

Key concepts of Real-Time Edge Intelligence

Improved Response Times: RTEI enables immediate data processing and analysis, resulting in faster decision-making. For industries like healthcare, manufacturing, and transportation, this can mean the difference between success and failure in critical situations. Arm processors allow for high-performance computing in compact form factors, making them ideal for real-time applications.
DSP/ML at the Edge: Arm’s extensive ecosystem of partner solutions and in-built algorithmic accelerator technology make it much easier to deploy DSP algorithms and ML models at the edge. This enables RTEI solutions to provide real-time insights and predictions about the processes they monitor, empowering organizations to automate processes and respond dynamically to changing conditions.
Data cleaning and feature extraction: Arm-based devices can clean noisy sensor data and extract features of interest at the edge, sending only relevant data to the cloud. This minimizes bandwidth usage and optimizes network performance, ensuring that only critical data is transmitted. Arm’s low-power architecture is ideal for this task, allowing devices to perform complex computations in battery-powered applications.
Cost Efficiency: By reducing the amount of data sent to the cloud, organizations can lower bandwidth costs and cloud storage expenses. The efficient processing capabilities of Arm processors allow for more effective resource use, leading to operational cost savings. Their energy efficiency further contributes to reduced operational costs in large-scale deployments.
Increased Reliability: RTEI solutions can operate independently of cloud connectivity, ensuring that mission-critical applications continue to function even during a network outage. The robustness of Arm technology in various environmental conditions enhances system reliability and operational resilience, particularly in the remote locations typical of many IoT applications.
Scalability: Arm-based solutions can be easily scaled to accommodate growing data volumes and an increasing number of connected devices. The modularity of Arm architecture supports the development of a diverse ecosystem of devices, making it easier for organizations to adapt to changing business needs.
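The data cleaning and feature extraction idea can be sketched in a few lines: rather than streaming every raw sample, the device summarises each frame as a handful of features and transmits only those. The Python sketch below is illustrative only; the choice of features (mean, RMS and peak), the 100-sample frame length and the synthetic 50 Hz test signal sampled at 1 kHz are hypothetical.

```python
import math


def frame_features(frame):
    """Summarise one frame of raw samples as a few features
    (mean, RMS, peak) so only these need to leave the device."""
    n = len(frame)
    mean = sum(frame) / n
    rms = math.sqrt(sum(s * s for s in frame) / n)
    peak = max(abs(s) for s in frame)
    return {"mean": mean, "rms": rms, "peak": peak}


def summarise_stream(samples, frame_len):
    """Split a stream into non-overlapping frames and keep only features."""
    return [frame_features(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


# Hypothetical sensor stream: a 50 Hz sinewave sampled at 1 kHz for 1 s.
fs = 1000
raw = [math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]
summary = summarise_stream(raw, frame_len=100)
```

Here one second of data (1,000 samples) is reduced to 10 frames of three features each, i.e. 30 values, roughly a 33-fold reduction in what needs to be transmitted.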

Enhanced security and privacy with Arm TrustZone

Security is a critical concern for edge devices, particularly those handling sensitive data. Arm TrustZone (available, for example, on the Cortex-M33, Cortex-M52 and Cortex-A processors) implements a security paradigm that separates untrusted applications running in a Rich Execution Environment (REE) from trusted applications (TAs) running in a secure Trusted Execution Environment (TEE). The basic idea behind a TEE is that TAs and their associated data are protected because they are isolated from the REE and its applications. This security model provides a high level of protection against hacking, theft of encryption keys and counterfeiting, and therefore offers an elegant way of protecting sensitive client information.

DSP support for Algorithms

DSP is critical for many RTEI applications, including audio and video processing, sensor signal processing and data analysis. Arm’s broad range of processors offers extensive DSP capabilities, allowing complex algorithms to be implemented in both fixed-point and floating-point. The Cortex-M family dominates the low-power microcontroller market, as described below, whereas the more powerful application-class Cortex-A processors target single-board computers, such as the Raspberry Pi, and smartphones. The Cortex-R family targets real-time, safety-critical applications, such as automotive and radar.

All three types of processors offer algorithmic support, but the Cortex-M family is particularly interesting because it adds DSP functionality to low-power microcontroller devices, making it highly desirable for the IoT market, as we discuss in the following section.

Cortex-M processors

Although a few processor technologies exist for microcontrollers (e.g. RISC-V, Xtensa, MIPS), over 90% of the microcontrollers used in the smart product market are powered by so-called Arm Cortex-M processors that offer a combination of high algorithmic performance, low-power and security. The Arm Cortex-M4 is a very popular choice with several silicon vendors (including ST, TI, NXP, ADI, Nordic, Microchip, Renesas), as it offers DSP (digital signal processing) functionality traditionally found in more expensive devices and is low-power.

Acceleration of DSP calculations

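The Armv7E-M architecture (used, for example, by the Cortex-M4 and Cortex-M7) supports a DSP extension that adds SIMD (single instruction, multiple data) instructions, which can significantly improve the performance of an algorithm. The basic idea behind SIMD is the parallel execution of one instruction (e.g. add, subtract, multiply, absolute value) on multiple data elements. In the Armv7E-M DSP extension, SIMD instructions operate on packed 8-bit or 16-bit integer and fixed-point values held in 32-bit registers; for example, a single SMLAD instruction performs two 16-bit multiplies and accumulates both results. Wider 128-bit vector processing, including floating-point vectors, is provided by Helium (the M-Profile Vector Extension) in Armv8.1-M processors such as the Cortex-M55 and Cortex-M85, and by Neon in Cortex-A processors.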

Modern Arm compilers can automatically vectorise suitable C loops into SIMD instructions, so explicitly calling DSP extension intrinsics in your code is often unnecessary; for performance-critical kernels, optimised libraries such as CMSIS-DSP are a common alternative. The net result for your application is faster code, which means lower power consumption and, for wearables, longer battery life.

What algorithmic operations would use this?

The following examples give an idea of operations that can be significantly sped up with SIMD intrinsics:

vadd can be used to expedite the calculation of a dataset’s mean. Typical applications include average temperature/humidity readings over a week, or even removing the DC offset from a dataset.
vsub can be used to expedite numerical differentiation in peak finding for a sinewave tracking application.
vabs can be used to expedite full-wave rectification of a signal, the first step in calculating its envelope in EMG biomedical and smart grid applications.
vmul can be used for windowing a frame of data prior to FFT analysis. This is also useful in audio applications using the overlap-and-add method.
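To make these operations concrete, the Python sketch below implements each one as a plain element-wise loop over a frame. It is a scalar reference, not SIMD code: Python does not emit vector instructions, and on a Cortex-M device loops like these would be the candidates for vectorisation. The helper names (`remove_dc`, `first_difference`, `find_peaks`, `full_wave_rectify`, `hann_window`) are hypothetical, and the periodic Hann window is one common choice among several.

```python
import math


def remove_dc(x):
    """Mean (a chain of adds) subtracted from every sample (vadd/vsub)."""
    mean = sum(x) / len(x)
    return [s - mean for s in x]


def first_difference(x):
    """Numerical differentiation d[n] = x[n+1] - x[n] (vsub)."""
    return [b - a for a, b in zip(x, x[1:])]


def find_peaks(x):
    """Indices where the first difference turns from positive to non-positive."""
    d = first_difference(x)
    return [n for n in range(1, len(d)) if d[n - 1] > 0 and d[n] <= 0]


def full_wave_rectify(x):
    """Element-wise absolute value, first step of an envelope detector (vabs)."""
    return [abs(s) for s in x]


def hann_window(x):
    """Element-wise multiply by a periodic Hann window before an FFT (vmul)."""
    n = len(x)
    return [s * 0.5 * (1 - math.cos(2 * math.pi * k / n)) for k, s in enumerate(x)]


# Example frame: one period of a sinewave, 16 samples.
sine = [math.sin(2 * math.pi * k / 16) for k in range(16)]
peaks = find_peaks(sine)
```

For this 16-sample sinewave, `find_peaks` returns `[4]`, the sample where \(\sin(2\pi k/16)\) reaches 1.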

The hardware floating-point unit is very good at expediting the MAC (multiply and accumulate) operations used in digital filtering: on the Cortex-M4, for example, a single-precision floating-point MAC takes three cycles, while add, subtract and multiply each take one cycle. Division is considerably slower (14 cycles on the Cortex-M4).
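Because an FIR filter is just a sequence of multiply-accumulates, its MAC count, and hence a rough cycle budget, is easy to estimate. The Python sketch below writes the filter as explicit MAC steps and counts them; `mac_cycles_per_second` is a hypothetical back-of-the-envelope helper that assumes the three-cycle MAC figure quoted above and ignores loads, stores and loop overhead, so real cycle counts will be higher.

```python
def fir_filter(h, x):
    """FIR filter written as explicit multiply-accumulate (MAC) steps:
    y(n) = sum_k h[k] * x(n - k). Returns (outputs, number_of_macs)."""
    y = []
    macs = 0
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]  # one MAC
                macs += 1
        y.append(acc)
    return y, macs


def mac_cycles_per_second(num_taps, sample_rate_hz, cycles_per_mac=3):
    """Rough lower bound on cycles/s spent on MACs alone for a
    steady-state FIR filter (ignores loads, stores and loop overhead)."""
    return num_taps * sample_rate_hz * cycles_per_mac


# Hypothetical 4-tap moving average applied to a step input.
y, macs = fir_filter([0.25] * 4, [1.0] * 8)
budget = mac_cycles_per_second(num_taps=32, sample_rate_hz=16_000)
```

For example, a 32-tap filter running at 16 kHz needs 512,000 MACs per second, or about 1.5 million cycles per second under this assumption.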

Key takeaways

As organizations continue to embrace digital transformation, Real-Time Edge Intelligence (RTEI) solutions, particularly when integrated with Arm processor technology, stand out as key enablers of innovation and efficiency. By harnessing the power of edge computing and the performance advantages of Arm’s Cortex-A and Cortex-M architectures, the security benefits of Arm TrustZone, and the DSP capabilities for advanced algorithms, businesses can achieve rapid decision-making, enhance security, and optimize operational costs. The future of data processing lies at the edge, and those who adopt RTEI solutions powered by Arm technology will be well-positioned to thrive in an increasingly competitive landscape.

Author

Dr. Sanjeev Sarpal


October 9, 2024, by Dr. Sanjeev Sarpal