
TinyML: Ultra-low power machine learning

cooootce
22 replies
23h48m

I've had the opportunity to work on TinyML, and it's a wonderful field! You can do a lot even with very small hardware.

For example, it's possible to get a real-time computer vision system with an ESP32-S3 (dual-core Xtensa LX7 @ 240 MHz, costs about $2), of course using the methods given in the article (pruning, quantization, knowledge distillation, etc.). The most important thing is to craft the model to fit your needs as closely as possible.

More than that, it's not that hard to get into, with AutoML solutions that do a lot for you. Check out tools like Edge Impulse [0], NanoEdge AI Studio [1], and eIQ ML [2].

There is also a lot of lower-level tooling, like model compilers (TVM or Glow) and TensorFlow Lite Micro [3].

It's very likely that TinyML will get a lot more traction. A lot of hardware companies are starting to provide MCUs with NPUs to keep power consumption as low as possible: NXP with the MCX N94x, Alif Semiconductor [4], etc.

At work we've written an article with a lot of information; it's in French but you can check it out: https://rtone.fr/blog/ia-embarquee/

[0]: https://edgeimpulse.com/

[1]: https://stm32ai.st.com/nanoedge-ai/

[2]: https://www.nxp.com/design/design-center/software/eiq-ml-dev...

[3]: https://www.tensorflow.org/lite/microcontrollers

[4]: https://alifsemi.com/

mysterydip
10 replies
17h29m

One thing I've wondered in this space: Let's say for a really basic example I want to identify birds and houses. Is it better to make one large model that does both, or two small(er) models that each does one?

flyingcircus3
9 replies
17h19m

Why not three models? One model does basic feature detections, like lines, shapes, etc. A second model that can take the first model's output as its input, and identify birds. A third model can take the first model's output as its input, and identify houses.

wegfawefgawefg
8 replies
17h1m

This is a lesson I've watched people and companies learn for the past 7-8 years.

An end-to-end model will always outperform a sequence of models designed to target specific features. You truncate information when you render the data from feature space (much richer data inside the model) into output space (the model output vector). That's the primary reason why, in transfer learning, all layers are frozen, the final layer is chopped off, and the output of the internal layer, not the model's output itself, is fed into the next model.
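A minimal sketch of that kind of transfer learning in PyTorch, assuming a torchvision ResNet-18 backbone (my choice of example, nothing from this thread): the final classification layer is replaced with an identity, the backbone is frozen, and the 512-dimensional internal features feed a new head.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Pretrained backbone with every layer frozen.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in backbone.parameters():
        p.requires_grad = False

    # Chop off the final classifier so the backbone emits the 512-d feature
    # vector (feature space) instead of class scores (output space).
    backbone.fc = nn.Identity()

    # New task-specific head, trained on the internal features.
    head = nn.Linear(512, 2)  # hypothetical two-class task

    x = torch.randn(1, 3, 224, 224)      # dummy image batch
    with torch.no_grad():
        features = backbone(x)           # rich internal representation
    logits = head(features)              # only this part gets trained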

Yes, you can create a large tree of smaller models, but the performance ceiling is still lower.

Please don't tell people to do this. I've seen millions wasted on it.

When you train a vision model it will already develop a hierarchy of fundamental point and line detectors in the first few layers. And they will be particularly well chosen for the domain. It happens automatically. No need to manually put them there.

DoingIsLearning
4 replies
11h13m

As someone not in ML but curious about the field, this is really interesting. Intuitively, it would indeed seem natural to aim for some sort of inspectable composition of models.

Is there specific tooling to inspect intermediate layers or will they be unintelligible for humans?

wegfawefgawefg
3 replies
9h2m

The unending quest for "Explainability" has yielded some tools but has been utterly overrun and outpaced by newer more complicated architectures and unfathomably large models. (Banks and insurance, finance etc really want explainability for auditing.)

The early layers in a vision model are sort of interpretable. They look like lines and dots and scratchy patterns being composited. You can see the exact same features in the L1 and L2 biological neural networks in cats, monkeys, mice, etc. As you get deeper into the network the patterns become really abstract. For a human, the best you can do is render a pattern of inputs that maximizes a target internal neuron's activation to see what it detects.

You can sort of see what they represent in vision: dogs, fur, signs, faces, happy, sad, etc. But once it's a multimodal model and there is time and language involved, it gets really difficult. And at that point you might as well just use the damn thing, or just ask it.

In finance, you can't tell what the fuck any of the feature detectors are. It's just very abstract.

As for tooling, a little bit of numpy and pytorch, dump some neuron weights to a PNG, there you go. Download a small pretrained convnet, and I bet GPT-4 can walk you through the process.
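For example, here's a rough sketch of the "dump some weights to a PNG" idea, using torchvision's pretrained ResNet-18 (an arbitrary choice on my part): the first conv layer's filters come out looking like the oriented edges and color blobs described above.

    import torch
    from torchvision import models, utils

    # Small pretrained convnet; its first conv layer has 64 RGB filters of size 7x7.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    filters = model.conv1.weight.detach().clone()   # shape [64, 3, 7, 7]

    # Normalize each filter to [0, 1] and tile them all into one grid image.
    utils.save_image(filters, "first_layer_filters.png",
                     nrow=8, normalize=True, scale_each=True)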

DoingIsLearning
2 replies
6h10m

Ok since we are at it, in your opinion:

Is it feasible for someone with a SWE background and a fair number of industry years to transition into ML without a deep dive into a PhD and publications to show?

I am considering following the fast.ai course or perhaps other MOOCs, but I am not sure whether any of this would be taken seriously within the field?

wegfawefgawefg
1 replies
2h54m

It is reasonable. If you have time and are willing to put in the effort I can force-feed you resources, and review code and such. I've raised a few ML babies. MOOCs are probably the wrong way to go. That's where I started and I got stuck for a while. You really need to be knee deep in code, and a notebook.

As for getting jobs, I can't help you with that part. You'll have to do your own networking, etc.

gibsonmart1i3@gmail.com. Shoot me an email if you're serious and let's schedule a call.

DoingIsLearning
0 replies
2h12m

Just emailed you. Thank you.

flyingcircus3
2 replies
11h8m

I'm genuinely confused at how you made these assumptions about what I'm describing. The "more correct" design you contrast with the strawman you've concluded I'm describing is actually what I'm talking about, if perhaps imprecisely: a pretrained model like MobileNetV2 with its final layer removed, and custom models trained on bird and house images that take this MobileNetV2[:-1] output as input. MobileNetV2 is 2ish megabytes at 224x224, and these final bird and house layers will be kilobytes. Having two multi-megabyte models that are 95% identical is a giant waste of our embedded target's resources. It also means that a scheme that processed a single image with two full models (instead of one big, two small) would spend 95% of the second full model's processing time redundantly performing the same operations on the same data. Breaking the models up across two stages produces substantial savings of both processing time and flash storage: a single big model serves as the "feature detection" first stage of both overall inferences, with small specialized models as the second stage.
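For what it's worth, here is a minimal PyTorch sketch of that two-stage scheme (layer sizes from torchvision's MobileNetV2; the bird and house heads are placeholders): the shared backbone runs once per image and both small heads reuse its 1280-dimensional output.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Shared "feature detection" first stage: MobileNetV2 without its classifier.
    backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).features
    backbone.eval()
    pool = nn.AdaptiveAvgPool2d(1)

    # Two tiny second-stage heads, each only kilobytes of weights.
    bird_head = nn.Linear(1280, 2)    # bird / no bird
    house_head = nn.Linear(1280, 2)   # house / no house

    x = torch.randn(1, 3, 224, 224)                # one input image
    with torch.no_grad():
        feats = pool(backbone(x)).flatten(1)       # computed once, shape [1, 1280]
    bird_logits = bird_head(feats)                 # both heads reuse the same features
    house_logits = house_head(feats)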

wegfawefgawefg
1 replies
8h52m

Sorry to upset you. It was not clear from your description that this was the process you were referring to. Others will read what you wrote and likely misunderstand as I did. (Which was my concern, because I've seen the "mixture of idiots" architecture attempted since 2015. Even now... It's a common misconception and an argument every ML practitioner has at one point or another with a higher-up.)

As for your amendment, it is good to reduce compute when you can, and to reduce up-front effort for model creation when you can. Reusing models may be valid, but even in your amended process you will still end up not reaching the peak performance of a single end-to-end model trained on the right data. Composite models are simply worse, even when transfer learning is done correctly.

As for the compute cost, if you train an end to end model and then minify it to the same size as the sum of your composite models it will have identical inference cost, but higher peak accuracy.

You could even do that with the "Shared Backbone" architecture you've described, where two tail networks share a head network. It has been attempted thoroughly in the Deep Reinforcement Learning subdomain I am most familiar with, and it results in unnecessary performance loss. So it's not generally done anymore.

flyingcircus3
0 replies
45m

Man, everyone at work is going to be really bummed when I tell them that some guy on the internet has invalidated our empirical evidence of acceptable accuracy and performance with assumptions and appeals to authority.

anigbrowl
5 replies
17h0m

Great post. Surprised and excited to discover TensorFlow models can run on commodity hardware like the ESP32.

Reviving1514
2 replies
13h59m

I ended up hand-rolling a custom MicroPython module for the S3 to do a proof-of-concept handwriting detection demo on an ESP32; might be interesting to some.

https://luvsheth.com/p/running-a-pytorch-machine-learning

cooootce
1 replies
8h31m

Great post with very interesting detail, thanks! Another optimization could be to quantize the model, which turns all compute into integer compute rather than floating point. You can lose some accuracy, but for any bigger model it's a requirement! Espressif does a great job on the TinyML side; they have different libraries at different levels of abstraction. You can check https://github.com/espressif/esp-nn, which implements all the low-level layers. It's really optimized, and if you use the ESP32-S3 it will unlock a lot of performance by using the vector instructions.
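If it helps anyone, here is a sketch of full-integer post-training quantization with the TensorFlow Lite converter; the tiny Keras model and the random calibration images are placeholders just to keep the example self-contained.

    import numpy as np
    import tensorflow as tf

    # Placeholder model and calibration data; in practice use your trained
    # model and a small sample of real sensor/camera data.
    keras_model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2),
    ])
    calibration_images = np.random.rand(100, 32, 32, 3).astype(np.float32)

    def representative_dataset():
        for img in calibration_images:
            yield [np.expand_dims(img, axis=0)]

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force full integer quantization so the MCU never has to emulate floats.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())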

Reviving1514
0 replies
3h10m

You're right, I should definitely be looking into how to run these models as ints as well; especially with the C optimizations to MicroPython you would see much larger performance gains using ints compared to floats. Definitely need to find some time to try it!

On the other hand, the TinyML library looks great too, and if I were going to do this for a product that would likely be the direction I would take, just because it would be more extensible and better supported.

Thank you for the links!

Cacti
1 replies
13h24m

Problems reducible even partially to matrix math are, for many practical purposes, embarrassingly parallel even within a single core. A couple hundred million FLOPS with 1990s SIMD support will let you run nearly all near-SOTA models within, idk, 3s, with most running in 0.1 or 0.01s. That's pretty fast considering it's an ESP32 and some of these capabilities/models didn't even exist a year ago.

Your expectation was not really wrong, because for most purposes, when discussing a "model" one is really talking about "capabilities". And capabilities often require many calls to the model. And that capability may be reliant on being refreshed very rapidly… and now your 0.1s is not just slow, it's almost existentially slow.

Re: training. Even on the ESP32, training is entirely doable, so long as you pretend you are in 2011 solving 2011 problems hahaha

cooootce
0 replies
7h18m

Most MCUs don't have an FPU, so all floating point compute is emulated in software, which is really slow. But yes, simple SIMD on integers improves performance a lot!

The main limitation is often not the processing time but the RAM available: some model architectures need to keep multiple layers, or very big layers, in RAM, and you hit the hard RAM limit pretty quickly.

Concerning training on MCUs, it's possible, but only for simple needs and with special model architectures; again, RAM is the limit.

Cacti
1 replies
15h12m

thank you for the post and good work.

can I ask, is the focus primarily on inference? is there anything serious going on with training at the power scale you are talking about?

cooootce
0 replies
7h13m

Thanks !

Yes, the main focus is on inference. It's possible to re-train a simple model at this power scale, but it's often a very small model and not deep learning. NanoEdge AI Studio from STMicroelectronics gives you some tools to train the model after deployment on the device.

It's often used for predictive maintenance, for example to adapt each ML model to the particular water pump it's plugged into.

Archit3ch
1 replies
22h52m

What about the Milk-V Duo? 0.5 TOPS INT8 @ $5.

cooootce
0 replies
20h35m

Didn't know about it, but their design decisions are really cool (though the difference between the normal version and the "256 MB" one isn't very clear, which is confusing).

The software side doesn't seem very mature, with very little help regarding TinyML. But this course seems interesting: https://sophon.ai/curriculum/description.html?category_id=48

demondemidi
0 replies
15h28m

I think we know each other. ;)

winrid
12 replies
1d

I imagine a future where viruses that target infrastructure could be LLM powered. Sneak a small device into a power plant's network and it collects audio, network traffic, etc and tries to break things. It would periodically reset and try again with a different "seed". It could be hidden in network equipment through social engineering during the sales process, for example, but this way no outbound traffic is needed - so less detectable.

The advantage of an LLM over other solutions would basically be a way to compress an action/knowledge set.

moffkalast
3 replies
1d

Reminds me of this HN post a week back: https://news.ycombinator.com/item?id=38917175

Genuinely could be the same setup with an 8GB Pi 4 or 5: slap it into a network cabinet with power and ethernet and just let it rip. Maybe with an additional IMU and brightness sensor, then it can detect it's been picked up and discovered so it can commit sudoku before it's unplugged and analysed.

hinkley
2 replies
23h30m

can commit sudoku

Autocorrection is a giant pain in the ass.

moffkalast
1 replies
22h33m

I know it swapped those words. I knew it was seppuku. One after sudoku. As if I could ever make such a miss steak. Never. Never! I just- I just couldn't proof it. It covered its tracks, it got that idiot copy-paste to lie for it. You think this is somerset? You think this is Brad? This? This chickadee? It's done worse. That bullfrog! Are you telling me that a man just happens to misspell like that? No! It orchestrated it! Swiftkey! It defragmented through a sandwich artist! And I kept using it! And I shouldn't have. I installed it onto my own phone! What was I sinking? It'll never chance. Ever since it was new, always the same! Couldn't keep its corrects off my word suggestions bar! But not our Swiftkey! Couldn't be precious Swiftkey! And IT gets to be a keyboard? What a sick yolk! I should've stopped it when I had the change! You-you have to stop it!

actionfromafar
0 replies
22h9m

Pure art.

rdedev
2 replies
23h16m

Would changing the seed affect generation much? Even though beam search depends on the seed, the LLM would still be generating good probability distributions over the next word to select. Maybe a few words would change, but I don't think the overall meaning would.

hansvm
1 replies
22h51m

Overall meaning can vary profoundly.

As a toy example, consider the prompt "randomly generate the first word that comes to mind." The output is deterministic in the seed, so to get new results you need new seeds, but with new seeds you open up the 2k most common words in a language in a uniform-esque distribution.
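A toy illustration of that determinism-in-the-seed point (the vocabulary and probabilities here are made up): the same next-token distribution yields different words under different seeds, and identical words whenever the seed repeats.

    import torch

    # Made-up next-token distribution for the "random word" prompt.
    vocab = ["apple", "river", "quantum", "lantern", "orbit"]
    probs = torch.tensor([0.30, 0.25, 0.20, 0.15, 0.10])

    for seed in (0, 1, 2, 42):
        g = torch.Generator().manual_seed(seed)    # output is deterministic in the seed
        idx = torch.multinomial(probs, num_samples=1, generator=g).item()
        print(f"seed={seed} -> {vocab[idx]}")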

Building on that, instead of <imagining> words, suppose you <imagine> attack vectors. Many, many attacks exist and are known. Presumably, many more exist and are unknown. The distribution the LLM will produce in practice is extremely varied, and some of those variations probably won't work.

If we're not just talking about a single prompt but rather a sequence of prompts with feedback, you're right that the seed matters less (when its errors are presented, it can self-correct a bit), but there are other factors at play.

(1) You're resetting somehow eventually anyway. Details vary, but your context window isn't unlimited, and LLM perf drops with wider windows, even when you can afford the compute. You might be able to retain some state, but at some point you need something that says "this shit didn't work, what's next". A new seed definitely gives new ideas, whereas clever ways to summarize old information might yield fixed points and other undesirable behavior.

(2) Seed selection, interestingly, matters a ton for model performance in other contexts. This is perhaps surprising when we tend to use random number generators which pass a battery of tests to prove they're halfway decent, but that's the reason you want to see (in reproducible papers) a fixed seed of 0 or 42 or something, and the authors maintaining that seed across all their papers (to help combat the fact that they might be cherry-picking across the many choices of "nice-looking" random seeds when they publish a result to embellish the impact). The gains can be huge. I haven't seen it demonstrated for LLMs, but most of the architecture shouldn't be special in that regard.

And so on. If nothing else, picking a new seed is a dead-simple engineering decision to eliminate a ton of things which might go wrong.

rdedev
0 replies
21h44m

I agree with you except for point 2. A well-performing model shouldn't show such drastic changes with respect to the seed value. Besides, the huge amount of training data as well as test data should mitigate differences in data splitting. There would be differences, but my hunch is they would be negligible. Of course, as you said, no one has tested this out, so we can't say how the performance would change either way.

hinkley
0 replies
23h31m

Conversely, the simpler the models on a system under attack, the more exploits start to resemble automated social engineering. I can easily develop my own model that understands the victim well enough that I can predict its responses and subvert them.

espe
0 replies
11h1m

and the other way round - have it built in for self-fuzzing and healing the infra.

e12e
0 replies
3h53m

Just walk by the security cameras with a weaponized qr code. "Ugly shirt" style[1].

[1] of course the ugly shirt was an actual backdoor - but who's to say nuclear centrifuges don't have an emergency shutdown code?

https://www.tatewilliams.org/blog/2014/07/04/blue-ant-survei...

adbachman
0 replies
4h52m

We can each have our very own Dixie Flatline construct.

RosanaAnaDana
0 replies
20h16m

You also might be able to get a 'compression' sample of the space in the same manner, by running an autoencoder in training mode. Rather than trying to do some kind of hack directly, it collects the same data you mentioned, but just trains on that data in an autoencoding compression framework. Then it can 'hand off' the compressed model's weights, which hypothetically can be queried or used to simulate the environment. Obviously there is a lot more to this, but it's an interesting idea.
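In case it's useful, a bare-bones PyTorch sketch of that idea (all dimensions and the random "telemetry" are placeholders): train a small autoencoder on the collected data, then hand off only its weights as the compressed summary.

    import torch
    import torch.nn as nn

    FEATURE_DIM = 64  # placeholder size of one observation vector

    class TinyAutoencoder(nn.Module):
        def __init__(self, dim=FEATURE_DIM, bottleneck=8):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, bottleneck))
            self.decoder = nn.Sequential(nn.Linear(bottleneck, 32), nn.ReLU(), nn.Linear(32, dim))

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = TinyAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    data = torch.randn(256, FEATURE_DIM)   # stand-in for collected telemetry

    for _ in range(200):                   # reconstruction training loop
        opt.zero_grad()
        loss = loss_fn(model(data), data)
        loss.backward()
        opt.step()

    # "Hand off" just the learned weights; they summarize the observed data.
    torch.save(model.state_dict(), "compressed_observations.pt")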

synergy20
11 replies
1d1h

TinyML is like IoT: great in concept, everyone agrees it's the future, but it has been slow to take off.

Or maybe it's just that they're being built into all products now, and they just don't need a brand name like IoT or TinyML.

modeless
7 replies
1d

I don't agree that TinyML is the future, just as I don't think IoT is the future. The future is robot servants. They will be ~human scale and have plenty of power to run regular big ML.

In fact, I hope my home has fewer smart devices in the future. I don't need an electronic door lock if my robot butler unlocks the door when I get home. I don't need smart window shades if the butler opens and closes them whenever I want. I don't need a dishwasher or bread maker or Cuisinart or whatever other labor saving device if I don't need to save labor anymore. Labor will be practically free.

Qwertious
2 replies
23h30m

I don't agree that TinyML is the future, just as I don't think IoT is the future. The future is robot servants. They will be ~human scale and have plenty of power to run regular big ML.

I swear I've read an article on exactly why human-scale robot servants make no sense.

It's something like:

1. Anything human-scale will tend to weigh as much as a human. That means it needs a lot of batteries, compared to e.g. a roomba. Lots more material and lots more weight means lots more cost.

2. Also, they'll be heavy. Which means if they e.g. fall down the stairs, they could easily kill someone.

3. If they run out of power unexpectedly (e.g. someone blocks their path to the charger) then they'll be a huge pain in the ass to move, because they're human scale. Even more so if they're on the stairs for some reason.

two_in_one
0 replies
12h44m

Just asked the latest GPT preview model to explain why human-sized robots make no sense, then why they are the future. In both cases it managed to provide 10 arguments. Some of them are similar, like 'social acceptance' in the negative list and 'Sociocultural Acceptance' in the positive one.

PS: it doesn't accept $500 tips anymore ;)

modeless
0 replies
21h44m

1. Who cares if it needs a lot of batteries? Batteries aren't that expensive. It'll have a lot less than a car, and people buy cars all the time. The utility of these things will be off the charts and even if they cost more than the average car there will be a big market. People will buy them with financing, like cars. And by doing more things they will reduce the need for other specialized devices like dishwashers, further justifying the cost.

2. Yes, robots will need to be cautious around people, especially children. But if it has a soft cover and compliant joints and good software we should be able to make it safe enough. They will not need to be imposing 7 foot tall giants. I expect they will typically be shorter than the average human. Maybe even child size with built in stilts or other way to reach high things.

3. Extension cord? Swappable auxiliary battery? This seems trivial to solve if it turns out to be a real problem. And if you have two (or borrow your neighbor's) they can help each other out.

synergy20
1 replies
1d

I consider TinyML to be like an ant or a spider: it's tiny, but intelligent enough to do its own inference to survive. Not all insects and animals need plenty of power to exist, and neither do all AI agents. So yes, TinyML has its place; in fact, maybe in far more places than powerful AI agents are needed.

bethekind
0 replies
1d

I've heard murmurings that AI might best be used in a swarm/hivemind fashion, so the comparison of AI to an ant/spider is intriguing.

oytis
1 replies
1d

If labor is free, what are you going to pay for a servant robot with? Why would robots serve useless humans?

modeless
0 replies
1d

"What will humans do in a world where labor is practically free and unlimited" is an interesting question for sure, but getting pretty off topic for this discussion.

wongarsu
1 replies
1d

If a device is already IoT, that diminishes the value-add of TinyML. Just send all the data home and run inference there, at greater efficiency and with the possibility to find other revenue streams for that data.

Or the other way around, if a device uses TinyML there's less reason to make it IoT, and the people who appreciate TinyML are probably exactly those who oppose IoT.

lakid
0 replies
19h58m

What happens if bandwidth is expensive and/or not reliable? Being able to summarise data and make decisions at the edge without having to consult 'home' every single time is very useful. Perhaps I only want to collect 'interesting' data for anomalous events.

3abiton
0 replies
1d

I disagree, I feel like the applications are limited.

furtiman
5 replies
1d

Another take from us at Edge Impulse on explaining TinyML / Edge ML in our docs: https://docs.edgeimpulse.com/docs/concepts/what-is-embedded-...

We have built a platform to build ML models and deploy them to edge devices, from Cortex-M3s to NVIDIA Jetsons to your computer (we can even run in WASM!).

You can create an account, build a keyword spotting model from your phone, and run it in WASM directly: https://edgeimpulse.com

Another key thing driving Edge ML adoption is the arrival of embedded accelerator ASICs / NPUs that dramatically speed up computation at extremely low power, e.g. the BrainChip Akida neuromorphic co-processors [1].

Depending on the target device, Edge Impulse supports runtimes ranging from conventional TFLite to NVIDIA TensorRT, BrainChip Akida, Renesas DRP-AI, MemryX, Texas Instruments TIDL (ONNX / TFLite), TensaiFlow, EON (Edge Impulse's own runtime), etc.

[1] https://brainchip.com/neuromorphic-chip-maker-takes-aim-at-t...

[Edit]: added runtimes / accelerators

moh_maya
4 replies
1d

I tried your platform for some experiments using an arduino and it was a breeze, and an absolute treat to work with.

The platform documentation and support is excellent.

Thank you for developing it and offering it, along with documentation, to enable folks like me (who are not coders, but understand some coding) to test and explore :)

KingFelix
2 replies
1d

What sort of experiments did you do? I will go through some of the docs to test out on an arduino as well, would be cool to see what others have done!

moh_maya
0 replies
23h52m

Gesture recognition using the onboard gyroscope and accelerometer (I think - it was 2 years ago!), and it took me some part of an afternoon.

I also used these two resources, which I found to be useful (the book was definitely useful; I'm less sure if the Arduino link is the same one I referred to then):

[1] https://docs.arduino.cc/tutorials/nano-33-ble-sense/get-star...

[2] https://www.oreilly.com/library/view/tinyml/9781492052036/

furtiman
0 replies
23h28m

You can check out the public project registry, where the community shares full projects they've built.

You can go ahead and clone any one you like to your account, as well as share a project of your own!

https://edgeimpulse.com/projects/all

furtiman
0 replies
23h25m

This is amazing to hear! Good luck with whatever project you build next!

I can recommend checking out building for more different hardware targets; there are a lot of interesting chips that can take advantage of Edge ML and are awesome to work with.

dansitu
5 replies
1d

It's great to see TinyML at the top of Hacker News, even if this is not the best resource (unsure how it got so many upvotes)!

TinyML means running machine learning on low power embedded devices, like microcontrollers, with constrained compute and memory. I was supremely lucky in being around for the birth of this stuff: I helped launch TensorFlow Lite for Microcontrollers at Google back in 2019, co-authored the O'Reilly book TinyML (with Pete Warden, who deserves credit more than anyone for making this scene happen), and ran the initial TinyML meetups at the Google and Qualcomm campuses.

You likely have a TinyML system in your pocket right now: every cellphone has a low power DSP chip running a deep learning model for keyword spotting, so you can say "Hey Google" or "Hey Siri" and have it wake up on-demand without draining your battery. It’s an increasingly pervasive technology.

TinyML is a subset of edge AI, which includes any type of device sitting at the edge of a network. This has grown far beyond the general purpose microcontrollers we were hacking on in the early days: there are now a ton of highly capable devices designed specifically for low power deep learning inference.

It’s astonishing what is possible today: real time computer vision on microcontrollers, on-device speech transcription, denoising and upscaling of digital signals. Generative AI is happening, too, assuming you can find a way to squeeze your models down to size. We are an unsexy field compared to our hype-fueled neighbors, but the entire world is already filling up with this stuff and it’s only the very beginning. Edge AI is being rapidly deployed in a ton of fields: medical sensing, wearables, manufacturing, supply chain, health and safety, wildlife conservation, sports, energy, built environment—we see new applications every day.

This is an unbelievably fascinating area: it’s truly end-to-end, covering an entire landscape from processor design to deep learning architectures, training, and hardware product development. There are a ton of unsolved problems in academic research, practical engineering, and the design of products that make use of these capabilities.

I’ve worked in many different parts of tech industry and this one feels closest to capturing the feeling I’ve read about in books about the early days of hacking with personal computers. It’s fast growing, tons of really hard problems to solve, even more low hanging fruit, and has applications in almost every space.

If you’re interested in getting involved, you can choose your own adventure: learn the basics and start building products, or dive deep and get involved with research. Here are some resources:

* Harvard TinyML course: https://www.edx.org/learn/machine-learning/harvard-universit...

* Coursera intro to embedded ML: https://www.coursera.org/learn/introduction-to-embedded-mach...

* TinyML (my original book, on the absolute basics. getting a bit out of date, contact me if you wanna help update it): https://tinymlbook.com

* AI at the Edge (my second book, focused on workflows for building real products): https://www.amazon.com/AI-Edge-Real-World-Problems-Embedded/...

* ML systems with TinyML (wiki book by my friend Prof. Vijay Reddi at Harvard): https://harvard-edge.github.io/cs249r_book/

* TinyML conference: https://www.tinyml.org/event/summit-2024/

* I also write a newsletter about this stuff, and the implications it has for human computer interaction: https://dansitu.substack.com

I left Google 4 years ago to lead the ML team at Edge Impulse (http://edgeimpulse.com) — we have a whole platform that makes it easy to develop products with edge AI. Drop me an email if you are building a product or looking for work: daniel@edgeimpulse.com

simonw
1 replies
1d

Fantastic informative comment, thank you for this.

dansitu
0 replies
1d

I'm pretty stoked to see our field at the top of HN, I hope some folks who are reading this end up feeling the spark and getting involved!

flockonus
1 replies
20h46m

Unfortunately your links got meaningfully clipped, each ends at the ellipsis.

dansitu
0 replies
20h25m

Thank you, I ran out of time to edit but have posted a reply with fixed links :)

dansitu
0 replies
20h25m

Non-broken versions of the links:

* Harvard TinyML course: https://www.edx.org/learn/machine-learning/harvard-universit...

* Coursera intro to embedded ML: https://www.coursera.org/learn/introduction-to-embedded-mach...

* TinyML (my original book, on the absolute basics. getting a bit out of date, contact me if you wanna help update it): https://tinymlbook.com

* AI at the Edge (my second book, focused on workflows for building real products): https://www.amazon.com/AI-Edge-Real-World-Problems-Embedded/...

* ML systems with TinyML (wiki book by my friend Prof. Vijay Reddi at Harvard): https://harvard-edge.github.io/cs249r_book/

* TinyML conference: https://www.tinyml.org/event/summit-2024/

* I also write a newsletter about this stuff, and the implications it has for human computer interaction: https://dansitu.substack.com

andy99
4 replies
1d1h

I'm really surprised TF lite is being used. Do they train models or is this (my assumption) just inference? Do they have a talent constraint? I would have expected handwritten C inference in order to make these as small and efficient as possible.

synergy20
1 replies
1d1h

it's all inference

cyberninja15
0 replies
1d1h

Makes sense. And, TF Lite is excellent for on-device models and inference.

liuliu
0 replies
1d

I think TinyML has pretty close ties to Pete Warden / Useful Sensors, who led TF Lite back at Google.

dansitu
0 replies
1d

It's mostly inference: typically on-device training is with classical ML, not deep learning, so no on-device backprop.

For inference there's a whole spectrum of approaches that let you trade off flexibility for performance. TF Lite Micro is at one end, hand-written Verilog is at the other.

Typically, flexibility is more important at the start of a project, while deep optimization is more important later. You wanna be able to iterate fast. That said, the flexible approaches are now good enough that you will typically get better ROI from optimizing your model architecture rather than your inference code.

I think the sweet spot today is code-generation, when targeting general purpose cores. There's also increasing numbers of chips with hardware acceleration, which is accessed using a compiler that takes a model architecture as input.

matteocarnelos
2 replies
19h2m

I built a Rust TinyML compiler for my master's thesis project: https://github.com/matteocarnelos/microflow-rs

It uses Rust procedural macros to evaluate the model at compile time and create a predict() function that performs inference on the given model. By doing so, I was able to strip down the binary way more than TensorFlow Lite for Microcontrollers and other engines. I even managed to run a speech command recognizer (TinyConv) on an 8-bit ATmega328 (Arduino Uno).

eulgro
1 replies
18h34m

Rust on AVR? I thought AVR wasn't stable yet on LLVM.

coolThingsFirst
2 replies
1d

Uses of TinyML in industry:

Uhm.... well... hehe

moffkalast
0 replies
23h28m

Turns out there's not much you can train when like 5 parameters fit into the entire memory of a microcontroller. Oh and you also need to read the sensors and run a networking stack and... yeah.

janjongboom
0 replies
23h40m

Things Edge Impulse customers have in production: sleep stage prediction, fall detection for the elderly, fire detection in power lines, voice command recognition on headsets, predicting heat exhaustion for first responders, pet feeders that recognize animals, activity trackers for pets, and many more.

andy_ppp
2 replies
1d

This article has made me ponder whether, like integrated circuits, AI will end up everywhere. Will I be having conversations with my fridge about the recipes I should make (based on her contents) and the meaning of life? What a time it is to be alive…

phh
0 replies
1d

AI is already everywhere. We just keep moving the definition of AI to make it something that requires a ~$1000 computer.

I'm definitely not eager on having LLMs in my fridge. I'll be even more pissed that their software can't be upgraded than I already am.

CharlesW
0 replies
1d

And they'll all have their own Genuine People Personalities. https://stephaniekneissl.com/genuine-people-personalities

a2code
2 replies
21h54m

This may be related to TinyML. Consider the ESP32, which brought WiFi to MCUs and became extremely popular as a result. Is there already a comparably popular MCU+AI chip? Or will that not happen with AI, but with some other future technology concept?

dansitu
1 replies
21h26m

There are actually tons of chips that are great for this type of workload. You can run simple vision applications on any 32-bit MCU with ~256 KB of RAM and ROM.

There's a list of MCUs here:

https://docs.edgeimpulse.com/docs/development-platforms/offi...

And some accelerators here:

https://docs.edgeimpulse.com/docs/development-platforms/offi...

This is just stuff that has support in Edge Impulse, but there are many other chips too.

a2code
0 replies
21h17m

Thanks. Let me be more specific. The ESP32 included WiFi on the same chip. Is there an MCU with on-chip features for AI? Perhaps an optimized TPU combined with an MCU. Would that be an advantage?

orliesaurus
1 replies
1d1h

Cool title - but what's/where's a demo showing how this is applied in the real world?

neutralino1
1 replies
1d

A lot of ads on this page.

adnjoo
0 replies
20h9m

+1

iamflimflam1
1 replies
21h40m

I played around quite a bit with TensorFlow Lite on the ESP32, mostly for things like wake word detection and simple commands. It works very well and you can get pretty much real-time performance with small models.

iamflimflam1
0 replies
21h18m

This is my voice-controlled robot: https://github.com/atomic14/voice-controlled-robot

It does left, right, forward and backward. That was pretty much all I could fit in the model.

And here’s wake word detection: https://github.com/atomic14/diy-alexa

It does local wake word detection on device.

robblbobbl
0 replies
23h23m

Great job, thank you!

jairuhme
0 replies
20h15m

I find the field of TinyML very interesting. It's one thing to be able to throw money and compute resources at a problem to get better results, but creating solutions under those constraints, I feel, will really leave an impact.

bitwrangler
0 replies
20h2m

A recent HackerBox has a detailed example with ESP32, TensorFlow Lite, and Edge Impulse.

* https://hackerboxes.com/products/hackerbox-0095-ai-camera

* https://www.instructables.com/HackerBox-0095-AI-Camera-Lab/

bhakunikaran
0 replies
10h35m

truly impressive.

_joel
0 replies
1d

For those looking for some more content, there's a bunch of videos from their Asia 2023 conference. https://www.tinyml.org/event/asia-2023/

- Target Classification on the Edge using mmWave Radar: A Novel Algorithm and Its Real-Time Implementation on TI’s IWRL6432 (Muhammet Emin YANIK) https://www.youtube.com/watch?v=SNNhUT_V8vM

IlliOnato
0 replies
22h8m

I wish they'd use a different acronym, not ML: For me xxxML usually meant a flavor of XML, with ML standing for Markup Language...

Is this use of ML standard in the industry?
