A recently unveiled machine learning (ML) technique called Faster Objects, More Objects (FOMO) appears to be the first ML model capable of real-time object detection, tracking, and counting on low-powered microcontrollers.
Object detection is hardly new; it is a widely used computer vision technique applied to tasks such as face detection, people counting, or tracking a ball during a football match. It detects objects in an input image and traditionally draws a bounding box around each one.
Developed by ML development platform Edge Impulse, what stands out about FOMO is its performance and tiny footprint – the firm claims it is 30 times faster than the well-established MobileNet-SSD and runs in less than 200KB of RAM.
Specifically, FOMO can process inputs at 30 frames per second (fps) on the Arduino Nicla Vision using 245KB of RAM. In comparison, MobileNet-SSD processes only a few frames per second on a more powerful Raspberry Pi 4, which comes in configurations of 1GB of RAM or more.
What is TinyML?
FOMO was designed as a TinyML model. TinyML is a subfield of ML focused on enabling ML applications on resource- and power-constrained devices. From a compute perspective, TinyML doesn’t rely on powerful GPUs or even the standard microprocessors used to run typical ML applications.
TinyML influencer Pete Warden argues that TinyML should operate at the power consumption level of a device that can run off a standard coin battery. This essentially confines it to less capable computing hardware such as microcontrollers (MCUs) and digital signal processors (DSPs).
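A back-of-envelope calculation shows why the coin-battery benchmark is so constraining. The figures below are assumptions for illustration: a typical CR2032 coin cell stores roughly 225 mAh at 3 V, and Warden's commonly cited TinyML target is an average draw of about 1 mW.

```python
# Rough energy budget for a coin-cell-powered TinyML device.
# Assumed figures (not from the article): CR2032 capacity and a ~1 mW
# average-power target often attributed to Pete Warden.
capacity_mah = 225        # typical CR2032 capacity
voltage_v = 3.0           # nominal cell voltage
budget_mw = 1.0           # assumed average power draw

energy_mwh = capacity_mah * voltage_v        # ~675 mWh of stored energy
runtime_days = energy_mwh / budget_mw / 24   # hours of runtime -> days
print(round(runtime_days, 1))                # ~28.1 days of continuous operation
```

At this budget, a device runs for about a month on one cell – far below what a GPU, or even a Raspberry Pi drawing a few watts, could manage.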
A low-powered or battery-powered microcontroller-based embedded device can perform ML tasks with real-time responsiveness. Such devices can be deployed at locations where supplying power can be challenging, such as factory floors with hundreds of machines, or at scores of retail outlets without troublesome power runs.
And because data don’t have to be sent to the cloud, TinyML can theoretically offer enhanced security due to the local nature of the computing.
For Edge Impulse, a startup working on deploying ML on edge devices, FOMO could dramatically expand the reach of ML-powered computer vision applications.
A middle ground for computer vision
How did Edge Impulse manage to create such a compact ML model? In a blog post, principal machine learning engineer Matthew Kelcey and senior developer relations engineer Louis Moreau noted that image classification is simpler than object detection.
The former takes an image as input, outputs the type of object detected, and works even on microcontrollers. Object detection, on the other hand, outputs information such as the class, object count, position, and size of each object, and can “hardly” be run on MCUs due to the complexity of such models.
FOMO adopts a middle ground by returning the centroid location of objects, but not their size. This makes it possible to identify and count objects, and to determine their relative positions, without excessive overhead.
“The FOMO model provides a variant in between; a simplified version of object detection that is suitable for many use cases where the position of the objects in the image is needed but when a large or complex model cannot be used due to resource constraints on the device,” they wrote.
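The centroid-style output described above can be sketched in a few lines. This is a hypothetical illustration, not Edge Impulse's actual API: it assumes the model emits a per-cell probability map over a grid (with an assumed 8-pixel cell size), which is thresholded to recover object centroids and a count.

```python
# Hypothetical post-processing for a centroid-based detector (assumed
# grid layout and 8-pixel cells; not Edge Impulse's actual code).

def centroids_from_heatmap(heatmap, cell_size=8, threshold=0.5):
    """heatmap: 2D list of per-cell probabilities for one object class.
    Returns (x, y, score) tuples in input-image pixel coordinates."""
    found = []
    for row, line in enumerate(heatmap):
        for col, p in enumerate(line):
            if p >= threshold:
                # Use the centre of the activated cell as the centroid.
                x = col * cell_size + cell_size // 2
                y = row * cell_size + cell_size // 2
                found.append((x, y, p))
    return found

# Toy 4x4 heatmap: two cells fire above the threshold -> two objects.
heatmap = [
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
objs = centroids_from_heatmap(heatmap)
print(len(objs))      # 2 objects counted
print(objs[0][:2])    # (12, 4) -> centroid of the first object
```

Note that the output is just a point and a score per object – there is no box regression step, which is part of why the approach stays so cheap.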
According to the developers, FOMO also performs better on a large number of small objects than MobileNet-SSD and YOLOv5 – another object detection algorithm.
Limitations of FOMO
As an ML model designed to run on microcontrollers, FOMO does suffer from some shortcomings that another model running on a more powerful system is unlikely to encounter.
For instance, Kelcey and Moreau acknowledged that FOMO will not detect distinct objects whose centroids “overlap”, as they end up occupying the same cell in the output. Moreover, FOMO operates best when all objects are of a similar size.
Reading between the lines, FOMO is probably better suited to monitoring a video feed of a manufacturing line than, say, counting shoppers at a shopping mall.
On the plus side, FOMO is compatible with any MobileNetV2 model and can leverage transfer learning to work with existing models. This gives FOMO “the capabilities to scale from the smallest microcontrollers all the way to full gateways or GPUs.”
More detailed documentation about FOMO can be found here.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/pisittar