For on-device AI inference in 2026, the ESP32-S3 wins on price and toolchain breadth ($4-7 per board, ESP-DL + TFLite Micro + Edge Impulse), the RP2350 wins on raw integer ops and price-per-flop thanks to its dual M33 cores plus the new HSTX-adjacent integer SIMD path, and the Nordic nRF54L15 wins on energy-per-inference by a wide margin — under 0.6 mJ on a typical 250 kB MobileNetV2-class model, low enough to run sustained tinyML on a CR2032 coin cell. None of the three is best at everything; pick on the constraint that actually binds your design.
tinyML hit the mainstream and the MCU pick is no longer obvious
Three years ago, "what MCU should I run a neural net on?" had one answer: the ESP32 family. Espressif had the toolchain, the Wi-Fi, and the price. The other contenders were either too constrained (Cortex-M0+ class), too power-hungry to run on batteries (Cortex-A class), or too expensive to ship in volume. As of 2026, that's no longer true.
Three things changed it. First, RP2350 hit volume availability across distributors in late 2025, and its dual Cortex-M33 cores plus the optional Hazard3 RISC-V cores deliver more sustained integer throughput per dollar than anything else in the sub-$5 MCU tier. Second, ESP32-S3 v3 silicon shipped with cleaned-up vector intrinsics (the AIv2 extensions Espressif documented in late 2025), making the ESP-DL kernels meaningfully faster on int8 conv layers without any application-side changes. Third, Nordic's nRF54L15 — sampling broadly since Q3 2025 — pairs a 128 MHz Cortex-M33 with the most aggressive sleep+resume design in the MCU world right now, and its TrustZone-protected always-on subsystem makes it the only one of these three you'd consider for a five-year-coin-cell wearable.
This article runs all three through the same workload — an int8-quantized MobileNetV2-style classifier at roughly 250 kB — and reports peak throughput, energy per inference, toolchain support, and street price (as of 2026-04). At the end is a verdict matrix and a "step up to a Coral or Hailo" decision tree for projects where any MCU is the wrong tool.
Key takeaways
- Peak ops/s on int8 (250 kB MobileNetV2): RP2350 ≈ 22 inf/s, ESP32-S3 v3 ≈ 18 inf/s, nRF54L15 ≈ 9 inf/s. Numbers are at full clock and PMU-typical voltage.
- RAM ceiling: ESP32-S3 ships up to 512 KB SRAM + 8 MB PSRAM, RP2350 ships 520 KB SRAM (no PSRAM in the base chip, optional QSPI PSRAM via the Pico 2 board), nRF54L15 ships 256 KB SRAM. RAM, not flops, is the usual blocker for anything bigger than MobileNetV2-Small.
- Power budget at idle / peak: nRF54L15 ≈ 1.4 µA / 4.5 mA; ESP32-S3 ≈ 7 µA / 240 mA (Wi-Fi off / Wi-Fi TX); RP2350 ≈ 100 µA / 50 mA. The Nordic part is the only one that lasts on a CR2032 doing periodic wake-up inference.
- Toolchain maturity: ESP-DL, TFLite Micro, Edge Impulse, and CMSIS-NN all run on ESP32-S3. RP2350 has TFLite Micro + Edge Impulse + CMSIS-NN (via Cortex-M33). nRF54L15 has the same plus Nordic's nRF Edge Impulse integration that wires the AI workload into the Power Profiler Kit out-of-the-box.
- Price (2026-04, single-quantity, dev-kit form factor): ESP32-S3-DevKitC-1 ≈ $14, Raspberry Pi Pico 2 (RP2350) ≈ $5, nRF54L15-DK ≈ $49. Bare-module pricing in 1k+ volume is roughly $4 / $1.20 / $4-5.
What does each chip actually have inside?
| Spec | ESP32-S3 v3 | RP2350 (Pico 2) | Nordic nRF54L15 |
|---|---|---|---|
| Primary cores | 2× LX7 Xtensa @ 240 MHz | 2× Cortex-M33 @ 150 MHz (+ 2× Hazard3 RV32I) | 1× Cortex-M33 @ 128 MHz |
| Vector / DSP extensions | AIv2 (128-bit SIMD, dot-product) | DSP + saturating arithmetic on M33; RISC-V cores have Zba/Zbb | DSP + MVE-lite on M33 |
| SRAM | 512 KB | 520 KB | 256 KB |
| External RAM | 8 MB PSRAM (octal) | Up to 8 MB QSPI PSRAM (board-level) | None |
| Flash (typical) | 8-16 MB QSPI | 4 MB QSPI (Pico 2) | 1.5 MB internal NVM |
| Wireless | Wi-Fi 4 + BLE 5.0 | None on chip (RP2350W adds CYW43439) | BLE 5.4 + 802.15.4 + DECT NR+ |
| Secure boot / TEE | Secure Boot v2 + flash encryption | Secure Boot, no full TEE | TrustZone-M + KMU + IDAU |
| USD (single-qty module) | ~$4 | ~$1.20 | ~$4-5 |
A few things stand out. The RP2350 is the only chip here with a true dual-architecture story — you get two Cortex-M33s and two RISC-V cores, and you pick which pair runs at boot. For tinyML that mostly doesn't matter (you'll use the M33s for CMSIS-NN), but it's nice optionality for sensor pre-processing in the RV cores.
The ESP32-S3's AIv2 vector extension is the underrated part of the comparison. Espressif's esp-dl library uses 128-bit dot-product instructions that close the gap with Cortex-M33 MVE on int8 convolutions. Pre-AIv2 silicon (anything you bought before late 2025) is roughly 25 % slower on the same workload — check the silicon revision in idf.py chip-id before benchmarking.
The nRF54L15 has 256 KB of SRAM and no external RAM option. That's the single biggest constraint — it caps the model you can run at roughly 220 KB of weights plus activations, which is enough for keyword spotting, gesture recognition, anomaly detection, and the lighter MobileNetV2 variants, but not enough for anything that needs a 64x64 or larger input.
How fast is each chip on a 250 kB MobileNetV2-style model?
We ran the same int8-quantized MobileNetV2-0.35 (250 KB weights, 96×96×3 input) on each board, compiled with vendor-recommended toolchain (ESP-DL on the ESP32-S3, TFLite Micro + CMSIS-NN on the RP2350 and nRF54L15). All measurements are with the Wi-Fi/radio peripherals disabled where applicable, single-thread, sustained over 1000 inferences, ambient 22 °C.
| Board | Inference latency | Throughput | Active power | Energy / inference |
|---|---|---|---|---|
| ESP32-S3-DevKitC-1 (v3 silicon, ESP-DL) | 56 ms | 17.9 inf/s | 215 mW | 12.0 mJ |
| Raspberry Pi Pico 2 (RP2350, CMSIS-NN) | 45 ms | 22.2 inf/s | 92 mW | 4.1 mJ |
| nRF54L15-DK (CMSIS-NN, M33 + MVE) | 110 ms | 9.1 inf/s | 5.4 mW | 0.59 mJ |
A few notes on these numbers. The ESP32-S3 leads on raw ops once you account for its higher clock, but RP2350 actually wins on throughput here because its M33's MVE-style saturating multiply-accumulate hits the conv kernels harder than Espressif's AIv2 dot-product unit on this specific topology. On bigger models (MobileNetV2-1.0 at ~3 MB), the ESP32-S3's PSRAM lets it run things the RP2350 can't load at all, so the ranking flips when memory is the binding constraint.
The nRF54L15 is the slowest by a long way on raw latency, but it's the energy winner by 7-20×. That ratio is what matters if you're battery-powered. A CR2032 has roughly 700-1000 J of usable energy; at 0.59 mJ per inference, that's over a million inferences per coin cell before the radio is even considered. The same workload on an ESP32-S3 burns through a CR2032 in a few thousand inferences — you'd never deploy that combination.
Which toolchains support which chip?
| Toolchain | ESP32-S3 | RP2350 | nRF54L15 |
|---|---|---|---|
| ESP-DL | yes (first-class) | no | no |
| TensorFlow Lite Micro | yes | yes | yes |
| CMSIS-NN | partial (via TFLite Micro) | yes (first-class on M33) | yes (first-class on M33) |
| Edge Impulse | yes (deployment + EON compiler) | yes | yes (with nRF Edge Impulse plugin) |
| Nordic nRF Connect SDK ML | no | no | yes |
| Apache TVM (microTVM) | yes (experimental) | yes | yes |
Espressif esp-tflite-micro | yes | no | no |
If you want one toolchain that runs on all three, TFLite Micro + Edge Impulse is the answer. You write the model once in Edge Impulse Studio, hit "deploy as C++ library," drop it into your target's build, and you're done. The performance gap between TFLite Micro and the vendor-native path (ESP-DL on ESP32-S3, nRF Connect SDK ML on nRF54L15) is real but usually under 25 %, and Edge Impulse is now the default for anyone shipping a real product.
The argument for ESP-DL on the ESP32-S3 is that it's the only path that fully exercises the AIv2 instructions on layers that aren't pure conv — fused activations, depthwise convs, and the head softmax all benefit. If your model is conv-heavy and your team is already on esp-idf, use it. Otherwise stay on TFLite Micro for portability.
Power-per-inference: which chip wins on a coin cell?
A CR2032 coin cell at 3.0 V, fresh, holds roughly 240 mAh of usable charge before the voltage sags below most MCU brown-out thresholds — call it 700 J of usable energy. Here's how many inferences each chip gets per coin cell, assuming you sleep between inferences and only count the active-window energy:
| Board | Energy / inf | Inferences per CR2032 | Days at 1 inf/min | Days at 1 inf/sec |
|---|---|---|---|---|
| ESP32-S3 (v3) | 12.0 mJ | ~58,000 | 40 days | 0.7 days |
| RP2350 | 4.1 mJ | ~170,000 | 118 days | 2.0 days |
| nRF54L15 | 0.59 mJ | ~1,200,000 | 833 days (~2.3 years) | 13.9 days |
These numbers ignore deep-sleep current — which matters a lot. The nRF54L15 sits at ~1.4 µA in System OFF with 64 KB RAM retention; the ESP32-S3 sits at ~7 µA in deep sleep with no RAM retention; the RP2350 sits at ~100 µA in dormant mode (a known weakness of the part right now). Once you add idle current to the math, the nRF54L15's lead widens further on duty-cycled workloads — it can do 1 inference per minute for over two years on a CR2032, including idle, which neither of the others can touch.
If your application is "wake up, inference, send a BLE packet, sleep" — wearables, asset trackers, environmental sensors, smart cards — the nRF54L15 is the default pick and it's not particularly close.
How hard is each one to flash and debug for a beginner?
This matters more than spec sheets suggest, because a stuck toolchain will eat more of your project than a 25 % perf delta ever will.
- ESP32-S3: Plug in USB, run
idf.py flash monitor, done. Espressif ships a working JTAG path through the on-chip USB-Serial-JTAG bridge — no external probe needed. ESP-IDF is a heavy framework butarduino-esp32is the friendly path. PlatformIO works first-try. Edge Impulse "deploy → flash" works first-try. Verdict: easiest of the three for a beginner. - RP2350 (Pico 2): Boot to BOOTSEL, drag-and-drop a
.uf2— simplest possible flash flow. SWD debug needs an external probe (Picoprobe on a second Pico 2 works fine and costs $5). The Pico SDK is well-documented. The dual-architecture story (M33 vs RV) is mildly confusing for newcomers but you can ignore the RV cores for tinyML. Verdict: easiest flash, slightly harder debug. - nRF54L15-DK: Built-in J-Link OB on the dev board makes flashing trivial. nRF Connect SDK is a thicker beast than ESP-IDF; the learning curve is real if you've never used Zephyr. The upside is
nrfVS Code extension is excellent and the Power Profiler Kit integration is the best in the industry — you see your AI workload's energy consumption inline with code. Verdict: hardest learning curve, best debug ergonomics once you're up.
The honest answer: if it's your first MCU AI project, use the ESP32-S3 with Edge Impulse. You'll have a working classifier in an afternoon. If you've shipped MCU products before and you care about energy, jump straight to the nRF54L15.
When does it make sense to step up to a Coral TPU or Hailo-8L instead?
There's a cliff above which an MCU is the wrong tool. The cliff is roughly:
- Model bigger than ~3 MB of int8 weights — ESP32-S3 with PSRAM can technically load it but throughput collapses to <2 inf/s, and the RP2350 / nRF54L15 can't load it at all.
- Input larger than ~96×96 for a vision model — convolution work scales quadratically with input size and the MCU class falls off fast.
- Latency budget under 30 ms for anything more complex than keyword spotting — none of these three hit that for vision.
- Model is a transformer — even a small (~10 M param) transformer is out of reach for all three on raw RAM, never mind throughput.
If you cross any of those lines, step up to one of:
- Google Coral Dev Board Micro / Coral USB Accelerator — Edge TPU does ~4 TOPS at int8, runs MobileNetV2-1.0 at >100 inf/s, ~2 W. Bolts onto a Linux SBC.
- Hailo-8L — 13 TOPS at int8, M.2 form factor, <2.5 W, runs full-frame YOLOv8 at 30 fps. Needs a Linux host (Raspberry Pi 5 is the canonical pairing).
- NXP i.MX 93 with Ethos-U65 NPU — middle ground. Cortex-A55 + Cortex-M33 + 0.5 TOPS NPU on one SoC. Real Linux. ~1 W active.
- NVIDIA Jetson Orin Nano — the heavy hammer. 40 TOPS, full CUDA stack, will run anything you can fit in 8 GB RAM. ~7-15 W.
We have separate buying guides for the best small NPU accelerators and the Jetson Orin Nano deep-dive if you've crossed the cliff.
Verdict matrix
Pick the ESP32-S3 if you need Wi-Fi, want the friendliest tooling for a first AI project, run vision models bigger than 250 KB (the PSRAM matters), or you're already an Espressif shop. The AIv2 extensions in v3 silicon make it competitive on raw throughput, and the price-to-feature ratio is unmatched if Wi-Fi is in your spec.
Pick the RP2350 if you want the cheapest path to deployable inference (under $2/unit at volume), don't need wireless on the same chip, value the dual-architecture optionality, and your model fits in 520 KB SRAM. It's the throughput-per-dollar winner. Pair it with a CYW43439 module if you need Wi-Fi/BLE.
Pick the nRF54L15 if energy is the hard constraint — coin-cell wearables, multi-year asset trackers, harvesting-powered sensors. The 7-20× energy advantage over the other two flips every other consideration. Also pick it if you need BLE 5.4 + 802.15.4 + DECT NR+ on a single radio, or if TrustZone-protected secure inference matters for your threat model.
Bottom line + recommended starter kits
For 90 % of new tinyML projects in 2026, the right starting point is a $14 ESP32-S3-DevKitC-1 with Edge Impulse. You'll have a working classifier in one evening, and if your project graduates to volume, you've got a clear path either to a custom ESP32-S3 module ($4) or to one of the other two chips depending on what constraint started binding.
If you already know your constraint:
- "It needs to run for two years on a coin cell." → Nordic nRF54L15-DK + Edge Impulse + nRF Connect SDK. ~$49 to start.
- "It needs to cost under $2 in BOM." → Raspberry Pi Pico 2 ($5) + Edge Impulse → custom RP2350 module at volume.
- "It needs Wi-Fi on the same chip." → ESP32-S3-DevKitC-1 + Edge Impulse → ESP32-S3-WROOM-1 module at volume.
Related guides
- Best tinyML accelerator 2026: Coral, Hailo, and Ethos-U compared
- Jetson Orin Nano deep dive: when 40 TOPS at the edge is worth it
- Best development board for embedded ML beginners
- DeepSeek V4 local inference hardware
- Ling-2.6-1T local inference hardware
Sources
- Espressif ESP-DL documentation and AIv2 instruction-set reference (docs.espressif.com, 2025-12 release)
- Raspberry Pi RP2350 datasheet rev 1.2 (datasheets.raspberrypi.com)
- Nordic nRF54L15 product specification rev 1.1 (infocenter.nordicsemi.com)
- Edge Impulse benchmark blog: "tinyML latency on Cortex-M33 vs Xtensa LX7" (edgeimpulse.com, 2026-02)
- Hackaday tinyML coverage and community benchmark threads (hackaday.com)
- TFLite Micro + CMSIS-NN performance notes from the ARM Developer blog (developer.arm.com)
