Posts

AI (Artificial Intelligence) has its roots with the famous mathematician Alan Turing, who was the first known person to conduct substantial research into the field that he referred to as machine intelligence. Turing’s work was published as Artificial Intelligence and was formally categorised as an academic discipline in 1956. In the years following, work undertaken at IBM by Arthur Samuel led to the term Machine Learning, and the field was born.

In terms of definitions: AI is an umbrella term, whereas ML (Machine Learning) is a more specific subset of AI focused on producing inference using trained networks. During training, the dataset plays a key role in ML quality during inference. AI provides scope for ML and Deep Learning. In fact, Deep Learning Networks use amazing Transformer models for the current generation AI world.

  • AI is the overarching field focused on creating intelligent systems, whereas ML is a subset of AI that involves creating models to learn from data and make decisions.
  • ML is crucial for IoT because it enables efficient data analysis, predictive maintenance, smart automation, anomaly detection, and personalized user experiences, all of which are essential for maximizing the value and effectiveness of IoT deployments.

The difference between AI and ML in a nutshell

  • Artificial Intelligence (AI):
    • Definition: AI is a broad field of computer science focused on creating systems capable of performing tasks that normally require human intelligence. These tasks include reasoning, learning, problem-solving, perception, and language understanding.
    • Scope: Encompasses a wide range of technologies and methodologies, including machine learning, robotics, natural language processing, and more.
    • Example Applications: Voice assistants (e.g., Siri, Alexa), autonomous vehicles, game-playing agents (e.g., AlphaGo), and expert systems.
  • Machine Learning (ML):
    • Definition: ML is a subset of AI that involves the development of algorithms and statistical models that enable computers to learn from and make predictions or decisions based on data.
    • Scope: Focused specifically on creating models that can identify patterns in data and improve their performance over time without being explicitly programmed for specific tasks.
    • Example Applications: Spam detection, image recognition, recommendation systems (e.g., Netflix, Amazon), predictive maintenance of critical machinery and identifying medical conditions, such as heart arrhythmias and tracking vital life signs – the so called IoMT (Internet of Medical Things).

Why We Need ML for IoT

  • Data Analysis:
    • Massive Data: IoT devices generate a vast amount of data. ML is essential for analyzing this data to extract meaningful insights, detect patterns, and make informed decisions.
    • Real-Time Processing: ML models can process and analyze data in real-time, enabling immediate responses to changes in the environment, which is crucial for applications like autonomous vehicles and smart grids. They are also an invaluable tool for monitoring human well-being, such as tracking vital life signs, and checking motion sensor data for falls and epileptic fits in elderly and vulnerable persons.
  • Automation:
    • Smart Automation: ML enables IoT devices to automate complex tasks that require decision-making capabilities, such as adjusting climate control systems in smart buildings based on occupancy patterns.
    • Adaptability: ML models can adapt to changing conditions and improve their performance over time, leading to more efficient and effective automation.
  • Personalization:
    • User Experience: ML can analyze user preferences and behaviors to personalize experiences, such as recommending products, adjusting device settings, or providing personalized health insights from wearable devices.
    • Enhanced Interaction: Improves the interaction between users and IoT devices by making them more intuitive and responsive to individual needs.

What is AIoT exactly?

IoT nodes or edge devices use convolutional neural networks (CNN) or neural networks (NN) to perform inference on data collected locally. These devices can include cameras, microphones, or UAV-based sensors. By having the ability to perform inference locally on IoT devices, it enables intelligent communication or interaction with these devices. Other devices involved in the interaction can also be IoT devices, human users, or AIoT devices. This creates opportunities for AIoT (Artificial Intelligence of Things) rather than just IoT, as it facilitates more advanced and intelligent interactions between devices and humans.

  1. AIoT is interaction with another AIoT. In this case, there is a need for artificial intelligence on both sides to have a meaningful interaction.
  2. AIoT is interaction with IoT. In this case, there is a need for artificial intelligence on one side, and no AI on the other side. Thus, it is not a good and safe configuration for deployment.
  3. AIoT is interaction with a human. In this case, there is a need for artificial intelligence on one side and a human on the other side. This is a good configuration because the volume of data from the device to the human will be less.

Human Also in the Loop is a Thing of the Past

Historically, humans have used sensor devices, now referred to as IoT edges or nodes, to perform measurements before making decisions based on a particular set of data. In this process, both humans and IoT edge devices participate. Interaction between one IoT edge and another is common, but typically within restricted applications or well-defined subsystems. With the rise of AI technologies, such as ChatGPT and Watsonx, AI-enabled IoT devices are increasingly interacting with other IoT devices that also incorporate AI. This interaction is prevalent in advanced driver-assistance systems (ADAS) with Level 5 autonomy in vehicles. In earlier terms, this concept was known as Self-Organizing Networks or Cognitive Systems.

The interaction between two AIoT systems introduces new challenges in sensor fusion. For instance, the classic Byzantine Generals Problem has evolved into the Brooks-Iyengar Algorithms, which use interval measurements instead of point measurements to address Byzantine issues. This sensor fusion problem is closely related to the collaborative filtering problem. In this context, sensors must reach a consensus on given data from a group of sensors rather than relying on a single sensor. Traditionally, M measurements with N samples per measurement produce one outcome by averaging data over intervals and across sensor measurements.

Sensor fusion involves integrating data from multiple sensors to obtain more accurate and reliable information than what is possible with individual sensors. By rethinking this problem through the lens of collaborative filtering—an approach widely used in recommendation systems—we can uncover innovative solutions. In this analogy, sensors are akin to users, measurements are comparable to ratings, and the environmental parameters being measured are analogous to items. The goal is to achieve a consensus measurement, similar to how collaborative filtering aims to predict user preferences by aggregating various inputs. Applying collaborative filtering techniques to sensor fusion offers several advantages. Matrix factorization can reveal underlying patterns in the sensor data, handling noise and missing data effectively. Neighborhood-based methods leverage the similarity between sensors to weigh their contributions, enhancing measurement accuracy.

Probabilistic models, such as Bayesian approaches, provide a robust framework for managing uncertainty. By adopting these methods, we can improve the robustness, scalability, and flexibility of sensor fusion, paving the way for more precise and dependable applications in autonomous vehicles, smart cities, and environmental monitoring.

Kalman filtering and collaborative filtering represent two distinct approaches to processing sensor data, each with unique strengths and applications.

Kalman filtering is a recursive algorithm used for estimating the state of a dynamic system from noisy observed measurement data. It excels in real-time applications, offering a mathematically rigorous method of statistically estimating and predicting a model’s state estimates (i.e. a model’s parameters) using a known model of the system’s dynamics and statistical noise characteristics. However, it is important to note that although the ‘Kalman solution’ is optimum in a statistical sense, it may yield incorrect state estimates in a absolute deterministic sense.

In contrast, collaborative filtering, typically used in recommendation systems, aggregates data from multiple sensors (or users) to identify patterns and similarities. This approach doesn’t rely on a predefined model of system dynamics but instead leverages historical data to improve accuracy. Collaborative filtering is particularly effective when dealing with large datasets from multiple sensors, making it suitable for applications where the relationships between sensors can be learned and exploited.

Both methods can enhance sensor data reliability, but their effectiveness depends on the context: Kalman filtering for dynamic, real-time systems with welldefined models, and collaborative filtering for complex, multi-sensor environments where data-driven insights are crucial.

In our AIoT work, we implement Collaborative Filtering across multiple M sensors or AIoT edges to achieve consensus on a measured value over a specified interval. Then use a Restricted Boltzmann Machine (RBM) model for collaborative filtering. Additionally, we deploy and run these types of models within a network of IoT edge devices. This approach leverages the distributed computing capabilities of IoT edges to enhance the performance and scalability of our collaborative filtering solution.

The integration of Collaborative Filtering algorithms with CMSIS (Cortex Microcontroller Software Interface Standard) on Arm devices presents a significant advancement in leveraging edge computing for intelligent decision-making. Collaborative Filtering, commonly used in recommendation systems, can be enhanced on Arm Cortex-M processors by utilizing the CMSIS-DSP library. This combination allows for efficient signal processing and data analysis directly on microcontroller-based systems, enabling real-time and power-efficient computations. This approach can be particularly powerful in IoT applications, where Arm devices often operate. By implementing Restricted Boltzmann Machines (RBM) using CMSIS, devices can process and analyze sensor data locally, reducing latency and bandwidth usage. This local computation capability can lead to more responsive and intelligent IoT systems, paving the way for advanced applications in smart environments, healthcare, and personalized user experiences.

Signal Processing on the IoT edge

The objective is to measure the signal \(x_n\) for a duration of \(T\) seconds with a sampling rate \(F_s\). The samples collected during that duration \(T\) are \(r_n=1,2,\ldots N\) samples. These measurements are performed \(M\) times repeatedly. Since there are \(M\) sets of \(x_n\) samples of the signal, the revised objective is to find a representative of these \(M\) sets of samples. Let \(\tilde{x}_n \) be the above-mentioned representative.

Let \(y_m(n) = x(n) + v_m(n) \), where \( v_m(n)\) is the measurement noise during the \(m\)-th measurement.

By performing \(M\) measurements, is it possible to

  • Improve the Signal to Noise Ratio (SNR)?
  • Estimate \(x_n\) using Maximum Likelihood and achieve better performance as per the Cramer-Rao bound?
  • Use a priori information about the source that created \(x_n\) and estimate \(x_n\) using a Bayesian network?

To reduce noise and obtain a more accurate representation of the output signal, multiple measurements of \(y(n)\) are taken over time: \(y_1(n), y_2(n), \ldots, y_M(n) \).

The averaged output signal \(\overline{y(n)} \) is calculated as the mean of these measurements:


\(\displaystyle\overline{y(n)} = \frac{1}{M} \sum_{i=0}^{M-1} y_i(n)\)

Consider a smart thermostat system in a home (part of an AIoT system). The thermostat measures the room temperature \(y(n)\) and adjusts the heating or cooling based on the desired setpoint \(u(n)\).

The following averaging measurement might not yield results that overcome the bounds defined by the Cramer-Rao bound:

\(y_m(n) = x(n) + v_m(n)\)

where \(v_m(n)\) is the measurement noise during the \(m\)-th measurement.

In this context, \(y_m(n) \) represents the noisy measurements of the signal \(x(n)\). Averaging these measurements can reduce the noise variance, but it does not necessarily surpass the theoretical lower bounds on the variance of unbiased estimators, as defined by the Cramer-Rao bound. The Cramer-Rao bound provides a fundamental limit on the precision with which a parameter can be estimated from noisy observations.

  • System Description: The thermostat system is represented by \(H(z)\), which controls the heating/cooling based on the input \(u(n)\). The output signal \(y(n)\) represents the measured room temperature.
  • Multiple Time Measurements: The thermostat takes temperature measurements every minute, producing a set of outputs \(y_1(n), y_2(n), \ldots, y_M(n)\).
  • Averaging: To get a more accurate representation of the room temperature and to filter out noise (e.g., transient changes due to opening a door), the thermostat averages these measurements: \(\overline{y(n)} = \frac{1}{M} \sum_{i=0}^{M-1} y_i(n)\). By averaging the noisy output values \(y_i(n)\), the thermostat system can make more stable and accurate adjustments, leading to a more comfortable and energy-efficient environment.
  • Latency: One annoying situation that occurs by the averaging operation, is that it increases the system’s latency, i.e. the smoothed output temperature value lags the observed noisy temperature value taken at time n. This delay is referred to as latency or Group delay in digital filters, and must also be taken into account when designing a closed loop control system. The subject of minimising latency in digital filters can fill a whole book in itself, but suffice to say, IIR digital filters generally have lower latency than FIR filters counterparts. The Moving average filter described herein can be considered as a special case of the FIR filter, as all filter coefficients are equal to one.
       In order to improve matters, Minimum phase filters (also referred to as zero-latency filters) may be used to overcome the inherent \(N/2\) latency (group delay) in a linear phase FIR filter, by moving any zeros outside of the unit circle to their conjugate reciprocal locations inside the unit circle. The result of this ‘zero flipping operation’ is that the magnitude spectrum will be identical to the original filter, and the phase will be nonlinear, but most importantly the latency will be reduced from \(N/2\) to something much smaller (although non-constant), making it suitable for real-time control applications where IIR filters are typically employed.

AI Model in Signal Processing

In signal processing, where signals are sensed by sensors, statistical parameterized models, Bayesian networks, and energy models play crucial roles. Statistical parameterized models help in estimating signal parameters efficiently, providing a structured approach to model signal behavior. Bayesian networks offer a probabilistic framework to infer and predict signal characteristics, accommodating uncertainties inherent in sensor data. Energy models, such as those utilizing MCMC with Contrastive Divergence, optimize the representation of signal data by minimizing energy functions, leading to improved signal reconstruction. Similarly, energy models via Restricted Boltzmann Machines and Backpropagation facilitate learning complex signal patterns, enhancing the accuracy of signal interpretation and noise reduction. Together, these models enable robust analysis and processing of signals, crucial for applications like noise reduction, signal enhancement, and feature extraction.

The Cramer-Rao bound (CRB) provides a lower bound on the variance of unbiased estimators, indicating the best possible accuracy one can achieve when estimating parameters from noisy data. This bound applies to traditional estimation methods under certain assumptions, such as unbiasedness and a specific noise model.

MCMC does not directly ‘overcome’ the Cramer-Rao bound, it provides a framework for obtaining parameter estimates that can be more accurate and robust in practice, especially in complex and high-dimensional settings. This improved performance arises from the ability to use prior information, handle complex models, and perform Bayesian inference. Markov Chain Monte Carlo (MCMC) methods, however, are used primarily for sampling from complex probability distributions and performing Bayesian inference. While MCMC methods themselves do not directly ‘overcome’ the CramerRao bound in a traditional sense, they offer advantages in estimation that may be interpreted as achieving better practical performance under certain conditions:

Individual Models

Each model will have ts own bias and variance characteristics. High-capacity models may fit the training data well (low bias) but may perform poorly on new data (high variance). Low-capacity models may underfit the training data (high bias) but have more stable predictions (low variance).

Averaging Models (Ensembles)

By combining the outputs of multiple models, ensemble methods aim to reduce the overall variance. This results in more robust predictions compared to individual models, particularly when the individual models have high variance.

Combining many models seems promising for some applications. When the model capacity is low, it’s difficult to capture the regularities in the data. Conversely, if the model capacity is too large, it may overfit the training data. By using multiple models, such as in AIoT where models can be sensor-centric or device-centric, better results can be achieved compared to using a single huge model.

  • High-capacity models tend to have low bias but high variance.
  • Averaging models reduces variance, leading to more stable predictions.
  • Bias remains unchanged by averaging, so it’s essential to use models with appropriately low bias.
  • The ensemble approach can outperform individual models by leveraging the strengths of multiple models, especially in scenarios like AIoT, where combining sensor-centric or device-centric models can lead to improved results.

In some cases, an individual predictor may perform better compared to a combined predictor. However, if individual predictors disagree significantly, then the combined predictor can perform well.

AIoT system building blocks

An essential pre-building block in any AIoT system is the feature extraction algorithm. The challenge for any feature extraction algorithm is to extract and enhance any relevant sensor data features in noisy or undesirable circumstances and then pass them onto the ML model in order to provide an accurate classification. The concept is illustrated below:

As seen above, an AIoT system may actually contain multiple feature blocks per sensor and in some cases fuse the features locally before sending them onto the ML model for classification such that the system may then draw a conclusion. The challenge is therefore how to capture sensor data for training and design suitable algorithms to extract features of interest?

The challenge is actually two fold: namely how to capture the datasets for analysis and then which algorithms to use for Feature engineering.

Although a few commercial solutions are available (e.g. Node-RED, Labview, Mathworks Instrumentation toolbox), the latter two are expensive for most developers who just require simple data capture/logging via the UART. One possible solution is Arm’s SDS Framework that provides developers with a set of tools for capturing and playback real-world data using Arm Virtual Hardware. Where, the captured SDS data files can be subsequently converted into a single CSV file for use in 3rd party applications for algorithm development. Unfortunately, the SDS framework is primarily aimed at Arm SoC developers and not particularly suitable for developers working with EVMs/kits.
  Therefore, most developers use web tools based on AutoML (eg. Qeexo) that will assist with the data capture from hardware (eg. from an ST Nucleo board) and then try an automate the ML modelling process by choosing a set of limited feature extraction algorithms (such as mean, median, standard deviation, kurtosis etc) and then try and produce a suitable classification model. In theory, this sounds great, but there are a number of problems with this approach, as performance is dependent on the quality and relevance of datasets. Our experience has shown that the best performance can be obtained from knowledge of the physical process, and by designing Feature extraction algorithms using scientific principles tailored to the process that you are trying to model.

Example: Feature Engineering for human fall detection

A common requirement of most IoMT biomedical wearable products is detecting Human fall detection with a smartwatch, just using accelerometer data. Traditional fall detection algorithms using MEMS sensors are based on the ‘Falling’ concept, whereby all three axes fall close to zero for a second or so. Although this works well for falling objects, such as a cup or box falling from a table, it is not suitable for humans. The challenge is illustrated below:

As seen, a human’s fall is very different to a box or other object falling.

The challenge is discriminating between normal everyday activities and falls. By analysing datasets of net acceleration data of typical everyday activities, such as someone walking, using their smartphone, brushing their teeth or doing some morning exercises, and fall data it is not always easy to discriminate between the two using ’standard’ statistical features.

Therefore, we need to apply some physics to the process that we’re trying to model in order to derive specific features from the sensor data, so that we can make a classification – i.e. is it a fall, or not.

Analysing the diagram we see that there are actually 4 phases from where the person is standing through to the point of the person lying on the ground. So the big question is how do we go about modelling these phases just using accelerometer data? This is best analysed by breaking the fall up into phases:

  • Happy: where the subject is upright and going about their daily business.
  • Falling: Depending on the subject, this period can be very short (around 100ms) and manifests itself very differently to an object falling directly down (i.e. freefall). The net acceleration will usually manifest itself as a negative gradient starting from about 1g tending towards zero, as the body’s centre of gravity changes. This usually lasts for about 60-100ms.
  • Impact: this is the primary event to detect, as any impact from a standing posture with a hard surface will produce a large shock pulse that is several orders of magnitude >1g over a short period.
  • Inactivity: this usually follows impact with the ground, whereby the subject is lying flat and is motionless for several seconds. In the case of a collision with an object (e.g. a piece of furniture or a door) or as a result of a severe medical condition, such as a stroke or heart attack, the subject may become unconscious. In this case, the system should be able to discriminate between inactivity from normal movements, such as hand or slight limb movement and light movement (caused by breathing) and decide whether to alert medical services. In the case that no movement is detected, i.e. the subject may have died as a result of the fall, there is no need to provide swift medical assistance.

Armed with this knowledge we can now use Feature engineering to design our features. This forms the essence of building features based on understanding of the physical process.

What tools and processor technology are available?

Although a few processor technologies exist for microcontrollers (e.g. RISC-V, Xtensa, MIPS), over 90% of the microcontrollers used in the smart product market are powered by so-called Arm Cortex-M processors. These are split up into various market segments, depending on energy requirements and algorithmic performance.

The low-end cores, such as the M0, M0+ and M3 are good for simpler algorithms, such as sensor cleaning filters, simple analytics as they have limited memory and no hardware FP support. To give you an idea of performance, for those of you who own a Fitbit, this is based on the M3 processor.

However, the biggest plus (especially for the M0 family), is that they can have very low power footprint making them an ideal choice for coin cell battery powered wearable applications, as devices can be made to run for months and even in some cases up to a year.

For developers looking for decent computational performance, the M4F is an excellent choice as it has hardware FP support, which is ideal for rapid application development of algorithms. In fact, the Arm Cortex-M4 is a very popular choice with several silicon vendors (including ST, TI, NXP, ADI, Nordic, Microchip, Renesas), as it offers DSP (digital signal processing) functionality traditionally found in more expensive devices and is low-power.

If you need more your application needs more computational performance, then the M7 is an excellent choice, where some devices even offer H/W double precision floating point support, which is ideal for audio enhancement and biomedical algorithms.

For those of you looking for hardware security, then the M33 is a good choice, as it implements Arm TrustZone security architecture, as well as having the computational performance of the M4.

State-of-the art AIoT microcontrollers

Released in 2020, the Arm Cortex-M55 processor and its bigger brother the Cortex-M85 are targeted for AIoT applications on microcontrollers. These processors use Arm’s powerful Armv8.1-M architecture that implement their M-Profile Vector Extension (MVE) technology (nicknamed Helium) allowing for 128bit vector mathematical operations (such as dot product operations) needed for ML and some DSP algorithms.

In November 2023, Arm announced the release of the Cortex-M52 processor for AIoT applications. This processor looks to replace the older M33 processor, as it combines Helium technology with Arm TrustZone technology. However, as only a few IC vendors (Alif, Ambiq, Samsung, Renesas, HiMax, Bestechnic, Qualcomm) have currently released or are planning to release any devices, Helium processors remain a gem for the future.

Toolchains

Arm provides developers with extensive easy-to-use tooling and tried and tested software libraries. The Arm’s CMSIS-DSP and CMSIS-NN frameworks for algorithm development and machine learning (ML) are two very popular examples that are open source and are used internationally by tens of thousands of developers.

The Arm-CMSIS framework solutions are further strengthened by Arm partners ASN and Qeexo who provide developers with easy-to-use real-time filtering, feature extraction (ASN Filter Designer) and ML tooling (Qeexo AutoML) and reference designs, expediting the development of IoT applications, including industrial, audio and biomedical. These solutions have been optimised for Arm processors with the help of Arm’s architecture experts and insider knowledge of compiler workings.

Deployment of Deep Learning Networks to the IoT Edge

Deploying a trained model onto an Edge device requires meticulous attention and effort. Fortunately, there are many tools available to help developers achieve this, such as Qeexo AutoML and the DLtrain toolset. The latter offers robust support for developers working with Arm processor-based boards with Android platforms. DLtrain utilizes the Android NDK (native development kit) to deploy neural networks (NN) or convolutional neural networks (CNN) in the Linux kernel of the Android platform. The deployed components include JNI options to support applications developed in Java, bridging the gap between low-level implementation and high-level application development. Find out more here.

Deploying deep learning (DL) networks on Arm cores of Android platforms involves integrating these networks into the Linux kernel via the Android NDK. While application development is primarily done in Java, DL networks receive input from the Android layer (SDK) and efficiently perform inference. The results are then passed back to the Java side via the Java Native Interface (JNI). The following list describes the layers involved in performing inference on an Android device:

  1. Top Layer: User Interface
  2. Second Layer: Java
  3. Third Layer: Android SDK
  4. Fourth Layer: Arm
  5. Bottom Layer: GPU

This hierarchical structure ensures that the user interface seamlessly interacts with underlying DL networks, optimizing performance and maintaining an efficient workflow from input to inference to output.

Key takeways

AI is an umbrella term focused on creating intelligent systems, whereas ML is a subset of AI that involves creating models to learn from data and make decisions. ML is crucial for IoT because it enables efficient data analysis, predictive maintenance, smart automation, anomaly detection, and personalized user experiences, all of which are essential for maximizing the value and effectiveness of IoT deployments.

Arm and its rich ecosystem of partners provide IoT developers with extensive easy-to-use tooling and tried and tested software libraries for designing an implementing IoT algorithms for their smart products. Arm Cortex-MxF processors expedite RAD by virtue of their ease of use and hardware floating-point support, and modern semiconductor technology ensures low-power profiles making the technology an excellent fit for IoT/AIoT mobile/wearables applications.

Authors

  • Dr. Jayakumar Singaram

    Jayakumar is a seasoned expert in semiconductor technology and AIoT. He advices companies such as Mistral Solutions, SunPlus Software, and Apollo Tyres at the strategic level on their AIoT solutions. He successfully founded Epigon Media Technologies, which focuses on Research and Development for the global market, and is also the co-author of the book "Deep Learning Networks: Design, Development, and Deployment."

    View all posts
  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.

    View all posts

Many IoT applications use a sinewave for estimating the amplitude of an entity of interest – some examples include:

  • Measuring material fatigue/strain with a loadcell – in vehicle and bridge/building applications measuring material fatigue and strain is essential for safety. An AC sinusoidal excitation overcomes the difficulty of dealing with instrumentation electronics DC offsets.
  • Calibrating CT (current transformers) sensors channels – a sinusoid of known amplitude is applied to channel input and the output amplitude is measured.
  • Measuring gas concentration in infra-red gas sensors – the resulting sinusoid’s amplitude is used to provide an estimate of gas concentration.
  • Measuring harmonic amplitudes in power quality smart grids applications – in 50/60Hz  power systems, certain harmonic amplitudes are of interest.
  • ECG biomedical compliance testing – channel compliance with IEC regulations needed for FDA testing typically uses a set of sinewaves at known amplitudes, to ensure that the channel amplitude error is within specification.

In a previous article, we discussed how differentiation could be used to find the peaks and troughs of sinewave, i.e. finding the zero crossing points. However, a much more traditional approach has been to use fullwave rectification, whereby a non-linear operator and lowpass filtering are employed. The concept used is described below:

  1. Remove any DC or low-frequency offsets via a highpass filter.
  2. Apply a non-linear operator via an abs() or sqr() non-linear operator.
  3. Lowpass filter the result to obtain an estimate of the sinusoid’s amplitude.
  4. Scale the amplitude.

Although this sounds easy, care should be taken to understand the effects of how the non-linear operator alters the waveform and affects the estimation of amplitude using lowpass filtering.

IoT application

A typical IoT application using a sinusoid is shown below:

As seen, the waveform can be modelled as:

\(x\left(n\right)=A\,sin\left(2\pi f_ot\right)+B\)

Where, \(f_o\)  is the frequency of oscillation and \(A\) is the amplitude of sinusoid respectively. Notice that the sinusoid is non-linear and symmetrical around the offset, \(B\). Notice also that it has a peak-to-peak amplitude of \(2A\), since specifying an amplitude \(A\) results in a bipolar amplitude of \(±A\). As many microcontrollers employ low-cost unipolar ADCs, the bipolar sinusoid needs to be offset by a DC offset, \(B\) (usually achieved by a resistor network) to ensure that the signal remains within the common-mode range of the ADC input.

As mentioned above, before applying the non-linear operator any DC offsets need to be removed. This can easily be achieved with either an IIR or FIR highpass filter. If using an IIR filter, it should be noted that the filter’s phase and group delay (latency) will significantly increase at the cut-off frequency, so a degree of experimentation is required to find a good trade-off.

After highpass filtering the data, we can apply the non-linear operator. Two popular operators are the abs()and sqr() operators.

Using the abs() operator, the Fourier series of \(\left|A \,sin(2\pi f_ot)\right|\) is shown below:

\(\left|A\ sin(2\pi f_ot)\right|\ =\ \displaystyle A\left[ \frac{2}{\pi}\ -\ \displaystyle\frac{4}{\pi}\normalsize{\sum\limits_{k=1}^{\infty}}\frac{cos(4k\pi f_ot)}{4k^2-1}\right]\)

Analysing the equation, it can be seen that the abs() operation doubles the frequency and that the DC component is actually \(\frac{2A}{\pi}\), as illustrated below.

As seen, lowpass filtering this result in its current form will produce an amplitude estimate of \(\frac{2A}{\pi}\) (dashed red line), which is clearly incorrect for estimating the sinewave’s amplitude, \(A\). However, this can be simply remedied by scaling the amplitude estimates by \(\frac{\pi}{2}\), which removes the bias, leaving the sinewave’s amplitude, \(A\).

Likewise, for a sqr() operator, we can define the resulting waveform using trigonometrical identities, i.e.

\({sin^2(2\pi f_ot)}\ =\ A\left[\displaystyle\frac{1\ -\ cos(4\pi f_ot)}{2}\right]\)

Lowpass filtering this signal requires a correction scaling factor of 2.  

Lowpass filter

Although any lowpass filter will suffice, the moving average filter is used by most developers by virtue of its computational simplicity and noise reduction characteristics. A more detailed explanation of moving average filters can be found here.

A 24th order moving average filter with a post gain of \(\frac{\pi}{2}\) or 1.571 is shown below.

Applying this moving average filter to a sinewave \(f_o=10Hz, A=0.5\), sampled at 500Hz processed with the abs()operator we obtain the following:

As seen, the amplitude estimation of the sinusoid using a lowpass filter and the \(\frac{\pi}{2}\) scaling factor is now correct. However, for real world applications that contain noise, it is considered to be more accurate to measure the RMS amplitude, in which case the scaling factor becomes \(\frac{\pi}{2\sqrt 2} \).

Note that these scaling factors are only valid for sinusoidal scaling. If your waveform is non-sinusoidal (e.g. triangular or square or affected by harmonics) another scaling factor/method will be required, as discussed below.

True RMS

In practice, many sinusoidal waveforms will be affected by harmonics (e.g. smart grid power systems) which will alter the shape of the main sinusoid and offset the RMS estimate using the \(\frac{\pi}{2\sqrt 2} \) scaling factor concept.

A much better method is to calculate the True RMS, whereby the sqr() operator is used for the full wave rectification, but this time a sqrt() function is used for scaling after the lowpass operation. The results of the two methods are shown below, where it can be seen that the True RMS method correctly estimates the signal’s RMS amplitude.

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.

    View all posts

Over the decades Microsoft has produced various programming languages (e.g. C#, F#, Visual Basic) and flavours of their .NET paradigm, such as the very popular 4.x .NET Framework.  

C# is very popular among Windows developers, as it is based on objected orientated programming concepts together with a few other pearls, such as automatic memory management (something that had to be handled manually in C++), in an attempt to provide many of the advantages of C++ in a much more simplified way.

These pros combined with the .NET Framework’s extensive charting library, UI toolbox, Networking, ML engine libraries and general ease of use in Microsoft Visual Studio was one of the primary reasons that many Windows-based scientific programs (e.g. ASN Filter Designer) were developed using the C# language and .NET Framework.

Over the last few years, Microsoft has tried to consolidate their .NET solution into a single, fast, cross-platform solution with the advent of the .NET Core paradigm. .NET Core is an open-source framework for developing Windows, web applications, services, and mobile applications and it can be run on Windows, Mac and Linux. Microsoft contends that .NET Core is much faster than the .NET Framework since the architecture has been completely rewritten to make it modular, lightweight, fast and cross-platform compatible.

DSP filtering library for C# .NET applications

For many modern scientific applications digital filtering of datasets is just one of the many components needed for an algorithm. This could be extracting ECG and PPG biomedical features, cancelling powerline interference or removing the noise from a dataset. In many cases, it is desirable to perform the filtering in real-time on streaming data as part of an algorithm updating dashboard analytics.

ASN Filter Designer provides .NET developers with an automatic code generator and SDK for developing high-performance data filtering applications in C#.

The tool’s project wizard bundles all of the necessary SDK framework files needed to run a designed filter cascade without the need for any other dependencies or 3rd party plugins. The deployed code is fully compatible with Microsoft Visual Studio and all .NET versions.

A complete overview of the SDK is available here.

 

 

Download demo now

Licencing information

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.

    View all posts

Advances in telemedicine healthcare products over the past decades have been truly miraculous with ingenious little devices invented by start-ups as well as by larger corporations, .e.g Apple’s smart watch and the Fitbit. These advancements have been facilitated by the availability of low-cost microcontrollers offering algorithmic functionality, allowing developers to implement wearables with excellent battery life and edge based real-time data analysis.

Over 90% of the microcontrollers used in the smart product market are powered by so called Arm Cortex-M processors that offer a combination of high algorithmic performance, low-power and security. The Arm Cortex-M4 is a very popular choice with hundreds of silicon vendors (including ST, TI, NXP, ADI, Nordic, Microchip, Renesas), as it offers DSP (digital signal processing) functionality traditionally found in more expensive devices and is low-power. Arm and its rich eco system of partners provide developers with easy-to-use tooling and tried and tested software libraries, such as the CMSIS-DSP and CMSIS-NN frameworks for algorithm development and machine learning.

The choice is vast, and can be very confusing. Therefore, here are some practical hints and tips for both managers and developers to help you decide which Arm Cortex-M processor is best for your biomedical product.

Which Arm Cortex-M processor do I choose for my biomedical application?

The Arm Cortex-M0+ processor is an ultra-low power 32-bit processor designed for very low-cost IoT applications, such as simple wearable devices. The low price point is comparable with equivalent 8-bit devices, but with 32-bit performance. Microcontrollers built around the M0+ processor provide developers with excellent battery life (months to years), a rich peripheral set and a basic amount of connectivity and computational performance. The latter means that only simple algorithms can be implemented, such as algorithms for correcting baseline wander and minimizing the effects of motion artefacts using accelerometer data via an adaptive filter, such as the NLMS algorithm. Although for PPG pulse rate measurement applications, the sampling rate is typically 50Hz, leaving the processor plenty of time to perform various simpler algorithmic operations, such as digital filtering and zero-crossing detection.

For high performance PPG applications, sampling rates in the order of 500Hz are typically used. These types of applications usually look at more biomedical features, such as identifying the Systolic and Diastolic phases and finding the Dicrotic notch using feature extraction algorithms and ML models. These extra functionalities provide a significant strain on the processor’s abilities, and as such are beyond the abilities of the M0+.

The Cortex-M3 is a step up from the M0+, offering better computational performance but with less power efficiency. The extra processing power, rich hardware peripheral set for connecting other sensors and connectivity options makes the M3 a very good choice for developers looking to develop slightly more advanced wearable products, such as the Fitbit device that is based on ST’s low-power STM32L series of microcontrollers.

High performance wearables and beyond

The Arm Cortex-M4 processor and its more powerful bigger brother the Cortex-M7 are highly-efficient embedded processors designed for IoT applications that require decent real-time signal processing performance and memory. Depending on the flavour of the processor, the M4F/M7F processors implement DSP hardware accelerated instructions, as well hardware floating point support. This lends itself to the efficient implementation of much more computationally intensive biomedical DSP and ML algorithms needed for more advanced telemedicine products.

The hardware floating point support unit expedites RAD (rapid application development), as algorithms and functions developed in Matlab or Python can be ported to C for implementation without the need for a lengthy data arithmetic quantisation analysis. Microcontrollers based on the M4F or M7F, usually offer many of the hardware peripheral and connectivity advantages of the M3, providing developers with a very powerful, low power development platform for their telemedicine application.

The Arm Cortex-M33 is a step up from the M4 focusing on algorithms and hardware security via Arm’s TrustZone technology and memory-protection units. The Cortex-M33 processor attempts to achieve an optimal blend between real-time algorithmic performance, energy efficiency and system security.

State-of-the art AI microcontrollers

Released in 2020, the Arm Cortex-M55 processor and its bigger brother the Cortex-M85 are targeted for AI applications on microcontrollers. These processors feature Arm’s Helium vector processing technology, bringing energy-efficient digital signal processing (DSP) and machine learning (ML) capabilities to the Cortex-M family. In November 2023, Arm announced the release of Cortex-M52 processor for IoT applications. This processor looks to replace the older M33 processor, as it combines Helium technology with Arm TrustZone technology.

Although the IP for these processors is available for licencing, only a few IC vendors have developed a microcontroller, e.g. Samsung’s Exynos W920 SoC that has been specifically designed for the wearables market. The SoC packs two Arm Cortex-A55 processors, and the Arm Mali-G68 GPU using state-of-the art 5nm semiconductor technology. The chipset also features a dedicated low-power Cortex-M55 display processor for handling AoD (Always-on Display) tasks – although a little over the top for simple wearable devices, the Exynos processor family certainly seems like an excellent choice for building next generation AI capable low-power wearable products.

So, which one do I choose?

The compromise for biomedical product developers when choosing an M4, M7 or M33 based microcontroller over an M3 device usually comes down to a trade-off between algorithmic performance, security requirements and battery life. If good battery life and simple algorithms are key, then M3 devices are a good choice. However, if more computationally intensive analysis algorithms are required (such as ML models), then the M4 or M7 should be used.

As mentioned earlier, the Armv7E-M architecture used in M4/M7 processors supports a DSP extension that implements an SIMD (single instruction, multiple data) architecture extension that can significantly improve the performance of an algorithm. The hardware floating point unit is very good for expediting MAC (multiply and accumulate) operations used in digital filtering, requiring just three cycles to complete. Other DSP operations such as add, subtract, multiply and divide require just one cycle to complete.

The M7 out performs its M4 little brother by offering approximately twice the computational performance and some devices even offer hardware double precision floating point support which make M4/M7 processors attractive for high accuracy algorithms needed for medical analysis.

If data security is paramount, for example protecting and securing transferring patient data to a cloud service, then the M33 or the M52 (when avalaible) are good choices. These devices also offer a high level of protection against tampering and running of authorised code via TrustZone’s trusted execution environment.

Some IC vendors now offer hybrid micro-controllers that implement multi-processors on chip, such as ST’s ST32Wx family that combine the M0+ and M4 in order to get the advantages of each processor and maximise battery life. 

Finally, advances in semiconductor technology means that a modern M4F processor produced with 40nm process technology may match or even surpass the energy efficiency of an M3 produced with 90nm technology from several years ago. As such, higher performance processors that were until several years too costly and energy inefficient for low-cost wearables products are rapidly becoming a viable solution to this exciting marketplace.

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.

    View all posts

We live in a time where wearable/mobile products comprised of sensors, apps, AI and IoT (AIoT) technology are part of everyday life. Every year we hear about amazing advances in processor technology and AI algorithms for all aspects of life from industrial automation to futuristic biomedical products.

For developers, the requirement to design low-cost products with better battery life, higher computational performance and analytical accuracy, requires access to a suite of affordable processor technology, algorithmic libraries, design tooling and support.

This article aims to provide developers with an overview of all salient points required for algorithm implementation on Arm Cortex-M processors.

Can you give me a concrete example?

Almost all IoT sensor applications require some level of signal processing to enhance data and extract features of interest. This could be temperature, humidity, gas, current, voltage, audio/sound, accelerometer data or even biomedical data.  

Consider the following application for gas concentration measurement from an Infra-red gas sensor. The requirement is to determine the amplitude of the sinusoid in order to get an estimate of gas concentration – where the bigger amplitude is the higher the gas concentration will be.

Analysing the figure, it can be seen that the sinusoid is corrupted with measurement noise (shown in blue), and any estimate based on the blue signal will have a high degree of uncertainty about it – which is not very useful for getting an accurate reading of gas concentration!

After cleaning the sinusoid with a digital filter (red line), we obtain a much more accurate and usable signal for our gas concentration estimation challenge. But how do we obtain the amplitude?

Knowing that the gradient at the peaks is zero, a relativity easy and robust way of finding the peaks of the sinusoid is via numerical differentiation, i.e. computing the difference between sample values and then looking for the zero-crossing points in the differentiated data. Armed with the positions and amplitudes of the peaks, we can take the average and easily obtain the amplitude and frequency.  Notice that any DC offsets and low-frequency baseline wander will be removed via the differentiation operation.

This is just a simple example of how to extract the properties of a sinusoid in real-time using various algorithmic IP blocks. There are of course a number of other methods that may be used, such as complex filters (analytic signals), Kalman filters and the FFT (Fast Fourier Transform).

Arm Cortex-M processor technology

Although a few processor technologies exist for microcontrollers (e.g. RISC-V, Xtensa, MIPS), over 90% of the microcontrollers used in the smart product market are powered by so-called Arm Cortex-M processors that offer a combination of high algorithmic performance, low-power and security. The Arm Cortex-M4 is a very popular choice with several silicon vendors (including ST, TI, NXP, ADI, Nordic, Microchip, Renesas), as it offers DSP (digital signal processing) functionality traditionally found in more expensive devices and is low-power.

Algorithmic libraries and support

An obvious hurdle for many developers is how to port their algorithmic concept or methods from Python/Matlab into embedded C for real-time operation? This is easier said than done, as many software engineers are not well-versed in understanding the mathematical concepts needed to implement algorithms. This is further complicated by the challenge of how to implement algorithms developed by researchers that are not interested/experienced in developing real-time embedded applications.

A possible solution offered by the Mathworks (Embedded Coder) automatically translates Matlab algorithms and functions into C for Arm processors, but its high price tag and steep learning curve make it unattractive for many.

That being said, Arm and its rich ecosystem of partners provide developers with extensive easy-to-use tooling and tried and tested software libraries. Arm’s CMSIS-DSP and CMSIS-NN frameworks for algorithm development and machine learning (ML) are two very popular examples that are open source and are used internationally by tens of thousands of developers.

The Arm CMSIS-DSP software framework is particularly interesting as it provides IoT developers with a rich collection of fast mathematical and vector functions, interpolation functions, digital filtering (FIR/IIR) and adaptive filtering (LMS) functions, motor control functions (e.g. PID controller), complex math functions and supports various data types, including fixed and floating point. The important point to make here is that all of these functions have been optimised for Arm Cortex-M processors, allowing you to focus on your application rather than worrying about optimisation.  

The Arm-CMSIS framework solutions are strengthened by Arm partners ASN and Qeexo who provide developers with easy-to-use real-time filtering, feature extraction and ML tooling (AutoML) and reference designs, expediting the development of IoT applications, including industrial, audio and biomedical. These solutions have been optimised for Arm processors with the help of Arm’s architecture experts and insider knowledge of compiler workings.

A benchmark of ASN’s floating point application-specific DSP filtering library versus Arm’s CMSIS-DSP library is shown below for three types of Arm cores.

Framework Benchmarks: lower number of clock cycles means higher performance.

As seen, the performance of the ASN library is slightly faster by virtue of the application-specific nature of the implementation. The C code is automatically generated from the ASN Filter Designer tool.

Cortex-M4 and Cortex-M7

The Arm Cortex-M4 processor and its more powerful bigger brother the Cortex-M7 are highly-efficient embedded processors designed for IoT applications that require decent real-time signal processing performance and memory.

Both the Cortex-M4 and M7 core benefit from the Armv7E-M architecture that offers additional DSP extensions. Depending on the flavour of the processor, the M4F/M7F processors implement DSP hardware accelerated instructions (SIMD), as well as hardware floating point support via an FPU (floating point unit), giving them a significant performance boost over the Cortex-M3. The ‘F’ suffix signifies that the device has an FPU.

This lends itself to the efficient implementation of much more computationally intensive DSP and ML algorithms needed for more advanced IoT products and real-time control applications requiring highly deterministic operations.

Microcontrollers based on the M4F or M7F, usually offer many of the hardware peripheral and connectivity advantages of the simpler M3, providing developers with a very powerful, low-power development platform for their IoT application. The Cortex-M7F typically offers much higher performance than its Cortex-M4F little brother, doubling the performance on FFT, digital filters and other critical algorithms.

Floating point or fixed point?

The hardware floating point support unit expedites RAD (rapid application development), as algorithms and functions developed in Matlab or Python can be ported to C for implementation without the need for a lengthy data arithmetic quantisation analysis. Although floating point comes with its own problems, such as numeric swamping, whereby adding a large number to a small number ignores the smaller component. This can become troublesome in digital filtering applications using the standard Direct Form structure. It is for this reason that all floating-point filters should be implemented using the Direct Form Transposed structure, as discussed in the following article.

Correctly designing and implementing these tricks requires specialist knowledge of signal processing and C programming, which may not always be available within an organisation. This becomes even more frustrating when implementing new algorithms and concepts, where the effects of the arithmetic are yet to be determined.

Single vs double precision floating point

For a majority of IoT applications single precision (32-bit) floating point arithmetic will be sufficient, providing approximately 7 significant digits of precision. Double precision (64-bit) floating point provides approximately 15 significant digits of precision, but in truth should only be used in applications that require more than 7 significant digits of precision. Some examples include: FFT based noise cancellation, CIC correction filters and Rogowski coil compensation filters. 

Some Cortex-M7F’s (e.g. STM32F769) implement a Double precision FPU providing an extra performance boost to high numerical accuracy IoT applications.

Fixed point

Fixed point is not necessarily less accurate than floating point, but requires much more quantisation analysis, which becomes tricky for signals with a wide dynamic range. As with floating point careful analysis is required, as weird effects can appear due to the level of quantisation used, leading to unreliable behaviour if not properly investigated. It is this challenge that can slow down a development cycle significantly, in some cases taking months to validate a new algorithm.

Many developers have traditionally considered devices without an FPU (e.g., Cortex-M0/M3) as the best choice for low-power battery applications. However, when comparing a modern Cortex-M7 device manufactured using 40nm semiconductor process technology, to that of a ten-year-old Cortex-M3 using 180nm process technology, the Cortex-M7 device will likely have a lower power profile.

Acceleration of DSP calculations

The Armv7E-M architecture supports a DSP extension that implements an SIMD (single instruction, multiple data) architecture extension that can significantly improve the performance of an algorithm. The basic idea behind SIMD involves parallel execution of an instruction (eg. Add, Subtract, Multiply, Divide, Abs etc) on multiple data elements via the use of 64 or 128-bit registers. These DSP extension intrinsics (SIMD optimised instruction) support a variety of data types, such as integers, floating and fixed-point.

The high efficiency of the Arm compiler allows for the automatic dissemination of your C code in order to break it up into SIMD intrinsics, so explicit definition of any DSP extension intrinsics in your code is usually unnecessary. The net result for your application is much faster code, leading to better power consumption and for wearables, better battery life.

What algorithmic operations would use this?

The following examples give an idea of operations that can be significantly speeded up with SIMD intrinsics:

  • vadd can be used to expedite the calculation of a dataset’s mean. Typical applications include average temperature/humidity readings over a week, or even removing the DC offset from a dataset.
  • vsub can be used to expedite numerical differentiation in peak finding, as discussed in the example above.
  • vabs can be used for expediting the calculation of an envelope of a fullwave rectified signal in EMG biomedical and smartgrid applications.
  • vmul can be used for windowing a frame of data prior to FFT analysis. This is also useful in audio applications using the overlap-and-add method.

The hardware floating point unit is very good for expediting MAC (multiply and accumulate) operations used in digital filtering, requiring just three cycles to complete. Other DSP operations such as add, subtract, multiply and divide require just one cycle to complete.

Combining DSP, low-power and security: The Cortex-M33

The Arm Cortex-M33 is based on the Armv8-M architecture and is a step up from the Cortex-M4 focusing on algorithms and hardware security via Arm’s TrustZone technology and memory-protection units. The Cortex-M33 processor attempts to achieve an optimal blend between real-time algorithmic performance, energy efficiency and system security.

TrustZone technology

Arm TrustZone implements a security paradigm that discriminates between the running and access of untrusted applications running in a Rich Execution Environment (REE) and trusted applications (TAs) running in a secure Trusted Execution Environment (TEE).  The basic idea behind a TEE is that all TAs and associated data are secure as they are completely isolated from the REE and its applications.  As such, this security model provides a high level of security against hacking, stealing of encryption keys, counterfeiting, and provides an elegant way of protecting sensitive client information.

State-of-the art AI microcontrollers

Released in 2020, the Arm Cortex-M55 processor and its bigger brother the Cortex-M85 are targeted for AI applications on microcontrollers. These processors feature Arm’s new Helium vector processing technology based on the Armv8.1-M architecture that brings significant performance improvements to DSP and ML applications. However, as only a few IC vendors (Alif, Samsung, Renesas, HiMax, Bestechnic, Qualcomm) have currently released or are planning to release any devices, Helium processors remain a gem for the future. 

Key takeaways

Arm and its rich ecosystem of partners provide IoT developers with extensive easy-to-use tooling and tried and tested software libraries for designing an implementing IoT algorithms for their smart products. Arm Cortex-MxF processors expedite RAD by virtue of their ease of use and hardware floating-point support, and modern semiconductor technology ensures low-power profiles making the technology an excellent fit for IoT/AIoT mobile/wearables applications.

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.

    View all posts

Recent research suggests that ECG wearables devices (such as smart watches) are now medically suitable for providing predictive insights into serious heart conditions such as atrial fibrillation (A-Fib). These advancements have been facilitated by the availability of low-cost microcontrollers offering algorithmic functionality, allowing developers to implement wearables with excellent battery life and edge-based real-time data analysis.

Although the international research community has produced many innovative high-performance ECG and PPG biomedical algorithms, these are unfortunately limited to offline clinical analysis in Matlab or Python. As such, very little emphasis has been placed on building commercial real-time wearables algorithms on microcontrollers, leading manufacturers to conduct the research themselves and to design suitable candidates. 

This is further complicated by the requirement of manufacturers on how they will implement a developed algorithm in real-time on a low-cost microcontroller and still achieve decent battery life.

Arm Cortex-M microcontrollers

Over 90% of the microcontrollers used in the smart product market are powered by so-called Arm Cortex-M processors that offer a combination of high algorithmic performance, low-power and security. The Arm Cortex-M4 is a very popular choice with hundreds of silicon vendors (including ST, TI, NXP, ADI, Nordic, Microchip, Renesas), as it offers DSP (digital signal processing) functionality traditionally found in more expensive devices and is low-power.

The Cortex-M4F device offers floating point support, helping with RAD (rapid application development) as designs can be easily ported from Matlab/Python to C without the need of performing a detailed quantisation arithmetic analysis. As such, a design cycle can be cut from months to weeks, offering organisations a significant cost saving.

Arm and its rich ecosystem of partners provide developers with easy-to-use tooling and tried and tested software libraries, such as the CMSIS-DSP and CMSIS-NN frameworks and ASN’s DSP filtering library for algorithm development and machine learning.

FDA compliance

The AHA (American Heart Association) provides developers with guidelines for developing FDA-compliant ECG monitoring products. These are broken down into the following three categories: 

  1. Diagnostic: 0.05Hz -150Hz
  2. Ambulatory (wearables): 0.67Hz – 40Hz
  3. ST segment: 0.05Hz

The ECG measurements must be FDA compliant with IEC 60601-2 2-47 standards for ambulatory ECG, but what are the criteria and challenges?

Challenges with ECG/PPG measurements

Modelling the QRS complex found in ECG data is extremely difficult, as to date there is no concrete model available.  This is further complicated by the variety of ECG data depending on the position of the lead on the patient’s body and illnesses. The following list summaries the typical challenges faced by algorithm developers:

  1. Accurate baseline wander (BLW) removal remains one of the most challenging topics in ECG analysis.
  2. The BLW must be removed for accurate clinical analysis.
  3. BLW manifests itself as low-frequency ‘wander’ (typically <0.5Hz) from EMG and torso movement.
  4. QRS width widening and amplitude distortion due to filtering invalidates clinical analysis.
  5. Reducing EMG and measurement noise without altering the temporal biomedical relationships of the ECG signal.
  6. 50/60Hz powerline interference can swamp the ECG signal – this is primarily attributed to pickup by the long high impedance measurement cables. This is typically problematic for extended bandwidth wearable applications that go beyond 40Hz.
  7. Glitches, sudden movement and poor sensor contact with the skin: This is related to BLW, but usually manifests itself as abrupt glitches in the ECG measurement data. The correction algorithm must discriminate between these undesirable events and normal behaviour.
  8. IEC 60601-2 2-47 frequency response specifications:
    • Bandwidth: 0.67Hz – 40Hz.
    • Passband ripple: < ±0.5dB
    • Maximum ±10% amplitude error: most biomedical SoCs make use of a Sigma-Delta ADC, leading to amplitude droop.

Shortcomings with ECG/PPG algorithms

A mentioned in the previous section, much research has been conducted over the years with mixed results. The main shortcomings of these methods are summarised below:

  1. Computationally heavy: most algorithms have been designed for research in Matlab and not for real-time, e.g. wavelets have excellent performance but have high computational cost, leading to poor battery life and the need for an expensive processor.
  2. Large latency and warping: digital filtering chain introduces large latency, computational cost and can warp the characteristics of the biomedical features.
  3. Overlapping frequencies: there are many examples of unwanted noise overlapping the delicate ECG data, hence the popularity of time-frequency analysis, such as wavelets.
  4. Mixed results regarding BLW removal: spline removal is excellent, but it has high computational cost and has the added difficulty of finding good correction points between the QRS complexes. Linear phase FIR filtering is a good compromise but has very high computational cost (typically >1000 filter coefficients) due to the high sampling rate to cut-off frequency ratio. Non-linear phase IIR filter has low computational cost, but warps the ECG features, and is therefore unsuitable for clinical analysis.
  5. AI based kernel filters: ‘black box filter’ based on massive training data. Moderate implementation cost with performance dependent on the variety of training data, leading to unpredictable results in some cases.
  6. PPG analysis: has the added difficulty of eliminating motion from the measurement data, such as when walking or running. Although a range of tentative algorithms has been proposed by various researchers using accelerometer measurement data to correct the PPG data, very few commercial solutions are currently based on this technology.

It would seem that ECG and PPG analysis has some major obstacles to overcome, especially when considering how to deploy the algorithms on low-power microcontrollers.

The future: ASN’s real-time RCF algorithm and Advanced Analytics

Together with cardiologists from Medisch Spectrum Twente, ASN’s advanced analytics team developed the RCF (retrospective collaborative filtering) algorithm that uses time-frequency analysis to enhance the ECG data in real-time.

The essence of RCF algorithm centres around a highly optimised set of polynomial cleaning filters with different frequency characteristics that are applied to different segments of the QRS complex for enhancement. This has some synergy with wavelets, but it does not suffer from the computational burden associated with wavelet analysis.

The polynomial filters are peak preserving, meaning that they preserve the delicate biomedical peaks while smoothing out the unwanted noise/ripple. The polynomial fitting operation also overcomes the challenge of overlapping frequency content, as data within a specified region is smoothed out by the relevant filter.

RCF is further strengthened by the BLW killer IP block that implements a highly computationally efficient linear phase 0.67Hz highpass filter. The net effect is an FDA-compliant signal chain suitable for clinical analysis. The complete signal chain is extremely computationally efficient, and as such is suitable for Arm’s popular M3 and M4 Cortex-M processor families.

Real-time ECG feature extraction

The ECG waveform can be split up into segments, where each wave or segment represents a certain event in the cardiac cycle, as shown below:

As seen, the biomedical features are designated P, Q, R, S, T that define points in time within the cardiac cycle. The RCF algorithm is further strengthened with our state-of-the-art AAE (Advanced Analytics Engine) that automatically cleans and find these features for clinical analysis.

AAE supported analytics

  1. P-wave duration
  2. PR interval
  3. QRS duration
  4. QT duration (Bazett algorithm used for QTc)
  5. HR (RR interval)
  6. HRV (rMSSD algorithm used)

Armed with the real-time features, an ML model can be trained and provide valuable insights into patient health running on an edge processor inside a wearable device.

A-Fib

Atrial fibrillation (A-Fib) is the most frequent cardiac arrhythmia, affecting millions of people worldwide. An arrhythmia is when the heart beats too slowly, too fast, or in an irregular way. Signs of A-Fib are an irregular beating pattern and no p-waves. Our AAE provides developers with all of the relevant features needed to build an ML model for robust A-Fib detection.

Let us help you build your product

By combining advanced low-power processor technology, advanced mathematical algorithmic concepts and medical knowledge, our solution provides developers with an easy way of building wearable products for medical use. The high accuracy of our Advanced Analytics Engine (AAE) has been verified by cardiologists, and can be used with an additional ML model or standalone to provide people with valuable insights into potentially fatal health conditions, such as A-Fib without the need for an expensive medical examination at a hospital.

ASN’s ECG algorithmic solutions are ideal for building next generation ECG and PPG wearable products on Arm Cortex-M microcontrollers (e.g. STM32F4, MAX32660) and bio-sensor SoCs (MAX86150). These algorithms can be easily used with industry standard biomedical AFEs, such as: MAX30003, AFE4500 and AFE4950.

Please contact us for more information and to arrange an evaluation.

Author

  • Dr. Sanjeev Sarpal

    Sanjeev is an AIoT visionary and expert in signals and systems with a track record of successfully developing over 25 commercial products. He is a Distinguished Arm Ambassador and advises top international blue chip companies on their AIoT solutions and strategies for I4.0, telemedicine, smart healthcare, smart grids and smart buildings.

    View all posts

How do you get the best performance from your IoT smart sensor?

The global smart sensor market size is projected to grow from USD 36.6 billion in 2020 to USD 87.6 billion by 2025, at a CAGR of 19.0%. At least 80% of these IoT/IIoT smart sensors (temperature, pressure, gas, image, motion, loadcells) will use Arm’s Cortex-M technology.

IoT sensor measurement challenge

The challenge for most, is that many sensors used in these applications require filtering in order to clean the measurement data in order to make it useful for analysis.

Let’s have a look at what sensor data really is…. All sensors produce measurement data. These measurement data contain two types of components:

  • Wanted components, i.e. information what we want to know
  • Unwanted components, measurement noise, 50/60Hz powerline interference, glitches etc – what we don’t want to know

Unwanted components degrade system performance and need to be removed.

So, how do we do it?

DSP means Digital Signal Processing and is a mathematical recipe (algorithm) that can be applied to IoT sensor measurement data in order to clean it and make it useful for analysis.

But that’s not all! DSP algorithms can also help:

  • In analysing data, producing more accurate results for decision making with ML (machine learning)
  • They can also improve overall system performance with existing hardware. So ther’s no need to redesign your hardware: a massive cost saving!
  • To reduce the data sent off to the cloud by pre-analysing data. So send only the data which is necessary

Nevertheless, DSP has been considered by most to be a black art, limited only to those with a strong academic mathematical background. However, for many IoT/IIoT applications, DSP has been become a must in order to remain competitive and obtain high performance with relatively low cost hardware.

Do you have an example?

Consider the following application for gas sensor measurement (see the figure below). The requirement is to determine the amplitude of the sinusoid in order to get an estimate of gas concentration (bigger amplitude, more gas concentration etc). Analysing the figure, it is seen that the sinusoid is corrupted with measurement noise (shown in blue), and any estimate based on the blue signal will have a high degree of uncertainty about it – which is not very useful if getting an accurate reading of gas concentration!

Algorithms clean the sensor data

After ‘cleaning’ the sinusoid (red line) with a DSP filtering algorithm, we obtain a much more accurate and usable signal. Now we are able to estimate the amplitude/gas concentration. Notice how easy it is to determine the amplitude of red line.

This is only a snippet of what is possible with DSP algorithms for IoT/IIoT applications, but it should give you a good idea as to the possibilities of DSP.

How do I use this in my IoT application?

As mentioned at the beginning of this article, 80% of IoT smart sensor devices are deployed on Arm’s Cortex-M technology. The Arm Cortex-M4 is a very popular choice with hundreds of silicon vendors, as it offers DSP functionality traditionally found in more expensive DSPs. Arm and its partners provide developers with easy to use tooling and a free software framework (CMSIS-DSP). So, you’ll be up and running within minutes.

Continuing with our Analytics team study of the virus on Western European countries, we present our findings for data up to week 15 (14 April).

As discussed in our previous articles, in order to provide an objective comparison per country, the algorithmic results need to be standardised around the population of each country in order to produce a more accurate deaths per million inhabitants rate. The figure shown below summarises the results.

As seen, Belgium’s mortality rate (red) is significantly higher than any of its neighbours. Germany (blue) and the Netherlands (green) have the lowest mortality rates, and appear to be levelling off. This suggests that the Dutch and German governments testing, health care systems and social distancing strategies appear to be paying off.

It’s not completely clear why Belgium’s mortality rate is so much higher than its neighbours, but a possible explanation may be due to insufficient testing and the virus hitting various elderly care homes. We’ll follow Belgium’s progress over the coming weeks, and report our findings.

The UK

As discussed in a previous article, the UK had a one-week head start on its neighbours. Therefore, shifting the UK data left by six days, we obtain an interesting picture of the UK’s situation:

Applying a prediction model to the UK data (dashed magenta line), notice how the UK’s data follows France’s data. Although long term predication models should be viewed with a degree of scepticism (as there are too many unknown factors to consider), the prediction suggests that the UK’s mortality rate should follow France’s mortality rate.  

The good news for the UK population, is that the emergency measures in place, appear to be working and are leading to a decline in deaths!

6 reasons why ASN Filter Designer is a powerful real-time DSP platform e.g. life math scripting, tool creates your technical specification and documentation

The Netherlands is regarded by the International Monetary Fund (IMF) as one the richest countries in the world, with high life expectancy, good infrastructure and a liberal society.  The Dutch have historically been traders, learning multiple foreign languages and trading with the whole world – a practice that is still continued to this date. The Dutch love to travel, which may have been one of main factors for the Covid-19 virus gripping the Netherlands so severely.

The Covid-19 virus has led all European governments to effectively lockdown their countries in the hope of limiting the spread of the virus. Although some see this as a violation of their civil rights, the Dutch government’s ambition is to limit the spread of virus so that the health system can cope with a controlled flow of infections.

Population standardisation and carnival

New research from the University of Massachusetts, suggests that the median incubation period (i.e. the time between exposure to the virus and the appearance of the first symptoms) for Covid-19 is just over five days and that 97.5% of people who develop symptoms will do so within 11.5 days of infection.

John Hopkins University (JHU) provide an open database of confirmed cases, deaths and number of recoveries, obtained from data from the World health organisation (WHO), and various other health intuitions and governments. These datasets are broken down into countries and regions.

Applying our ANNA data modelling algorithms to the raw datasets provided by John Hopkins University (JHU), we were able to plot the mortality rate versus time for the Netherlands, as shown below.

Confirmed deaths for the Netherlands: source JHU database

Many models are based on raw measured values that are not adjusted for comparison with neighbouring countries, so called population standardisation, which can give a false perspective of the situation at hand.

The yearly Carnival festivals that takes place around the 23-Feb, attracts large crowds of people (shown on the right). This incubation period of approximately 12 days can be clearly seen in the data for the Netherlands, where the first deaths are reported around 7-March (14 days after carnival), suggesting that if not adequately treated in hospital, the patient will die within a few days.

Our analysis considered data obtained from the following five European countries (populations shown in parenthesis): Germany (83 million), the Netherlands (17 million), Belgium (11 million), UK (66 million) and France (67 million).

In order to provide an objective comparison per country, the algorithmic results were standardised around the population of each country in order to produce a deaths per million inhabitants rate. The figure shown below summarises the results.

Population standardisation trends: deaths per million inhabitants

Analysing the chart, it can be seen that when viewing the scaled dataset, the Netherlands (green) and France (black) have the highest mortality (death) rate, and Germany (blue) the lowest. France’s high mortality rate may be attributed to many foreigners visiting France for their winter holiday.

A disastrous combination of events

Analysing the various news reports, the Brabant province in the South of the Netherlands was a particular hotspot for the virus. Our findings as to the likely reasons why the contagion rate in Brabant is so high can be attributed to a combination of the following factors:

  • The yearly Carnival festivals taking place around the 23 February, which attract large crowds of people.
  • Frequent foreign travel of people working for large international business, such as Philips and ASML.
  • School holiday.
  • People taking their winter holidays in France and Italy.

Had carnival taken place several weeks earlier, the effects on the Dutch population may have very well been lower.

Another hotspot for the virus was Amsterdam, which like Brabant is a hub for international business, and a very densely populated region of the country.

Conclusions

The Covid-19 incubation period for the Netherlands is around 12 days.   

When standardising the mortality rate population data per million inhabitants with surrounding countries, the Netherlands and France have the highest mortality rate of all of their neighbouring countries. A likely explanation of the explosive outbreak in the Brabant province of the Netherlands, is due to Carnival festival, the school/winter holiday and international business travel. France’s high mortality rate may be attributed to many foreigners visiting France for their winter holiday.

Despite Germany’s large population of 83 million, the data shows that the German government’s handling of the situation has been very effective indeed. The German health system boasts over 25,000 intensive care beds, and respiration equipment. Comparing this this Netherlands, which just has a little over 1,150 beds, and adjusting for the population differences – Germany still has more than 4.5 times more intensive care beds at its desposal.

In terms of prevention: Germany’s National Association of Statutory Health Insurance Physicians, reports that it has capacity for approximately 12,000 Covid-19 tests per day, which surpasses all other European countries.