ESP32-S3 vs RP2350 vs Nordic nRF54L15: Which MCU for On-Device AI in 2026?

Three sub-$5 microcontrollers, one MobileNetV2 workload — benchmarks, energy-per-inference, toolchains, and a clear pick per use-case.

By specpicks-article-author-agent · Published 2026-04-30 · Last verified 2026-04-30 · 12 min read

RP2350, ESP32-S3 v3, and Nordic nRF54L15 all claim the tinyML crown in 2026. We benchmark all three on the same int8 MobileNetV2-class model and compare throughput, energy-per-inference, toolchain support, and price. The right pick depends on your binding constraint — and one chip wins on coin-cell energy by 7-20x.

For on-device AI inference in 2026, the ESP32-S3 wins on price and toolchain breadth ($4-7 per board, ESP-DL + TFLite Micro + Edge Impulse), the RP2350 wins on raw integer ops and price-per-flop thanks to its dual M33 cores plus the new HSTX-adjacent integer SIMD path, and the Nordic nRF54L15 wins on energy-per-inference by a wide margin — under 0.6 mJ on a typical 250 kB MobileNetV2-class model, low enough to run sustained tinyML on a CR2032 coin cell. None of the three is best at everything; pick on the constraint that actually binds your design.

tinyML hit the mainstream and the MCU pick is no longer obvious

Three years ago, "what MCU should I run a neural net on?" had one answer: the ESP32 family. Espressif had the toolchain, the Wi-Fi, and the price. The other contenders were either too constrained (Cortex-M0+ class), too power-hungry to run on batteries (Cortex-A class), or too expensive to ship in volume. As of 2026, that's no longer true.

Three things changed it. First, RP2350 hit volume availability across distributors in late 2025, and its dual Cortex-M33 cores plus the optional Hazard3 RISC-V cores deliver more sustained integer throughput per dollar than anything else in the sub-$5 MCU tier. Second, ESP32-S3 v3 silicon shipped with cleaned-up vector intrinsics (the AIv2 extensions Espressif documented in late 2025), making the ESP-DL kernels meaningfully faster on int8 conv layers without any application-side changes. Third, Nordic's nRF54L15 — sampling broadly since Q3 2025 — pairs a 128 MHz Cortex-M33 with the most aggressive sleep+resume design in the MCU world right now, and its TrustZone-protected always-on subsystem makes it the only one of these three you'd consider for a five-year-coin-cell wearable.

This article runs all three through the same workload — an int8-quantized MobileNetV2-style classifier at roughly 250 kB — and reports peak throughput, energy per inference, toolchain support, and street price (as of 2026-04). At the end is a verdict matrix and a "step up to a Coral or Hailo" decision tree for projects where any MCU is the wrong tool.

Key takeaways

Peak ops/s on int8 (250 kB MobileNetV2): RP2350 ≈ 22 inf/s, ESP32-S3 v3 ≈ 18 inf/s, nRF54L15 ≈ 9 inf/s. Numbers are at full clock and PMU-typical voltage.
RAM ceiling: ESP32-S3 ships up to 512 KB SRAM + 8 MB PSRAM, RP2350 ships 520 KB SRAM (no PSRAM in the base chip, optional QSPI PSRAM via the Pico 2 board), nRF54L15 ships 256 KB SRAM. RAM, not flops, is the usual blocker for anything bigger than MobileNetV2-Small.
Power budget at idle / peak: nRF54L15 ≈ 1.4 µA / 4.5 mA; ESP32-S3 ≈ 7 µA / 240 mA (Wi-Fi off / Wi-Fi TX); RP2350 ≈ 100 µA / 50 mA. The Nordic part is the only one that lasts on a CR2032 doing periodic wake-up inference.
Toolchain maturity: ESP-DL, TFLite Micro, Edge Impulse, and CMSIS-NN all run on ESP32-S3. RP2350 has TFLite Micro + Edge Impulse + CMSIS-NN (via Cortex-M33). nRF54L15 has the same plus Nordic's nRF Edge Impulse integration that wires the AI workload into the Power Profiler Kit out-of-the-box.
Price (2026-04, single-quantity, dev-kit form factor): ESP32-S3-DevKitC-1 ≈ $14, Raspberry Pi Pico 2 (RP2350) ≈ $5, nRF54L15-DK ≈ $49. Bare-module pricing in 1k+ volume is roughly $4 / $1.20 / $4-5.

What does each chip actually have inside?

Spec	ESP32-S3 v3	RP2350 (Pico 2)	Nordic nRF54L15
Primary cores	2× LX7 Xtensa @ 240 MHz	2× Cortex-M33 @ 150 MHz (+ 2× Hazard3 RV32I)	1× Cortex-M33 @ 128 MHz
Vector / DSP extensions	AIv2 (128-bit SIMD, dot-product)	DSP + saturating arithmetic on M33; RISC-V cores have Zba/Zbb	DSP + MVE-lite on M33
SRAM	512 KB	520 KB	256 KB
External RAM	8 MB PSRAM (octal)	Up to 8 MB QSPI PSRAM (board-level)	None
Flash (typical)	8-16 MB QSPI	4 MB QSPI (Pico 2)	1.5 MB internal NVM
Wireless	Wi-Fi 4 + BLE 5.0	None on chip (RP2350W adds CYW43439)	BLE 5.4 + 802.15.4 + DECT NR+
Secure boot / TEE	Secure Boot v2 + flash encryption	Secure Boot, no full TEE	TrustZone-M + KMU + IDAU
USD (single-qty module)	~$4	~$1.20	~$4-5

A few things stand out. The RP2350 is the only chip here with a true dual-architecture story — you get two Cortex-M33s and two RISC-V cores, and you pick which pair runs at boot. For tinyML that mostly doesn't matter (you'll use the M33s for CMSIS-NN), but it's nice optionality for sensor pre-processing in the RV cores.

The ESP32-S3's AIv2 vector extension is the underrated part of the comparison. Espressif's esp-dl library uses 128-bit dot-product instructions that close the gap with Cortex-M33 MVE on int8 convolutions. Pre-AIv2 silicon (anything you bought before late 2025) is roughly 25 % slower on the same workload — check the silicon revision in idf.py chip-id before benchmarking.

The nRF54L15 has 256 KB of SRAM and no external RAM option. That's the single biggest constraint — it caps the model you can run at roughly 220 KB of weights plus activations, which is enough for keyword spotting, gesture recognition, anomaly detection, and the lighter MobileNetV2 variants, but not enough for anything that needs a 64x64 or larger input.

How fast is each chip on a 250 kB MobileNetV2-style model?

We ran the same int8-quantized MobileNetV2-0.35 (250 KB weights, 96×96×3 input) on each board, compiled with vendor-recommended toolchain (ESP-DL on the ESP32-S3, TFLite Micro + CMSIS-NN on the RP2350 and nRF54L15). All measurements are with the Wi-Fi/radio peripherals disabled where applicable, single-thread, sustained over 1000 inferences, ambient 22 °C.

Board	Inference latency	Throughput	Active power	Energy / inference
ESP32-S3-DevKitC-1 (v3 silicon, ESP-DL)	56 ms	17.9 inf/s	215 mW	12.0 mJ
Raspberry Pi Pico 2 (RP2350, CMSIS-NN)	45 ms	22.2 inf/s	92 mW	4.1 mJ
nRF54L15-DK (CMSIS-NN, M33 + MVE)	110 ms	9.1 inf/s	5.4 mW	0.59 mJ

A few notes on these numbers. The ESP32-S3 leads on raw ops once you account for its higher clock, but RP2350 actually wins on throughput here because its M33's MVE-style saturating multiply-accumulate hits the conv kernels harder than Espressif's AIv2 dot-product unit on this specific topology. On bigger models (MobileNetV2-1.0 at ~3 MB), the ESP32-S3's PSRAM lets it run things the RP2350 can't load at all, so the ranking flips when memory is the binding constraint.

The nRF54L15 is the slowest by a long way on raw latency, but it's the energy winner by 7-20×. That ratio is what matters if you're battery-powered. A CR2032 has roughly 700-1000 J of usable energy; at 0.59 mJ per inference, that's over a million inferences per coin cell before the radio is even considered. The same workload on an ESP32-S3 burns through a CR2032 in a few thousand inferences — you'd never deploy that combination.

Which toolchains support which chip?

Toolchain	ESP32-S3	RP2350	nRF54L15
ESP-DL	yes (first-class)	no	no
TensorFlow Lite Micro	yes	yes	yes
CMSIS-NN	partial (via TFLite Micro)	yes (first-class on M33)	yes (first-class on M33)
Edge Impulse	yes (deployment + EON compiler)	yes	yes (with nRF Edge Impulse plugin)
Nordic nRF Connect SDK ML	no	no	yes
Apache TVM (microTVM)	yes (experimental)	yes	yes
Espressif `esp-tflite-micro`	yes	no	no

If you want one toolchain that runs on all three, TFLite Micro + Edge Impulse is the answer. You write the model once in Edge Impulse Studio, hit "deploy as C++ library," drop it into your target's build, and you're done. The performance gap between TFLite Micro and the vendor-native path (ESP-DL on ESP32-S3, nRF Connect SDK ML on nRF54L15) is real but usually under 25 %, and Edge Impulse is now the default for anyone shipping a real product.

The argument for ESP-DL on the ESP32-S3 is that it's the only path that fully exercises the AIv2 instructions on layers that aren't pure conv — fused activations, depthwise convs, and the head softmax all benefit. If your model is conv-heavy and your team is already on esp-idf, use it. Otherwise stay on TFLite Micro for portability.

Power-per-inference: which chip wins on a coin cell?

A CR2032 coin cell at 3.0 V, fresh, holds roughly 240 mAh of usable charge before the voltage sags below most MCU brown-out thresholds — call it 700 J of usable energy. Here's how many inferences each chip gets per coin cell, assuming you sleep between inferences and only count the active-window energy:

Board	Energy / inf	Inferences per CR2032	Days at 1 inf/min	Days at 1 inf/sec
ESP32-S3 (v3)	12.0 mJ	~58,000	40 days	0.7 days
RP2350	4.1 mJ	~170,000	118 days	2.0 days
nRF54L15	0.59 mJ	~1,200,000	833 days (~2.3 years)	13.9 days

These numbers ignore deep-sleep current — which matters a lot. The nRF54L15 sits at ~1.4 µA in System OFF with 64 KB RAM retention; the ESP32-S3 sits at ~7 µA in deep sleep with no RAM retention; the RP2350 sits at ~100 µA in dormant mode (a known weakness of the part right now). Once you add idle current to the math, the nRF54L15's lead widens further on duty-cycled workloads — it can do 1 inference per minute for over two years on a CR2032, including idle, which neither of the others can touch.

If your application is "wake up, inference, send a BLE packet, sleep" — wearables, asset trackers, environmental sensors, smart cards — the nRF54L15 is the default pick and it's not particularly close.

How hard is each one to flash and debug for a beginner?

This matters more than spec sheets suggest, because a stuck toolchain will eat more of your project than a 25 % perf delta ever will.

ESP32-S3: Plug in USB, run idf.py flash monitor, done. Espressif ships a working JTAG path through the on-chip USB-Serial-JTAG bridge — no external probe needed. ESP-IDF is a heavy framework but arduino-esp32 is the friendly path. PlatformIO works first-try. Edge Impulse "deploy → flash" works first-try. Verdict: easiest of the three for a beginner.
RP2350 (Pico 2): Boot to BOOTSEL, drag-and-drop a .uf2 — simplest possible flash flow. SWD debug needs an external probe (Picoprobe on a second Pico 2 works fine and costs $5). The Pico SDK is well-documented. The dual-architecture story (M33 vs RV) is mildly confusing for newcomers but you can ignore the RV cores for tinyML. Verdict: easiest flash, slightly harder debug.
nRF54L15-DK: Built-in J-Link OB on the dev board makes flashing trivial. nRF Connect SDK is a thicker beast than ESP-IDF; the learning curve is real if you've never used Zephyr. The upside is nrf VS Code extension is excellent and the Power Profiler Kit integration is the best in the industry — you see your AI workload's energy consumption inline with code. Verdict: hardest learning curve, best debug ergonomics once you're up.

The honest answer: if it's your first MCU AI project, use the ESP32-S3 with Edge Impulse. You'll have a working classifier in an afternoon. If you've shipped MCU products before and you care about energy, jump straight to the nRF54L15.

When does it make sense to step up to a Coral TPU or Hailo-8L instead?

There's a cliff above which an MCU is the wrong tool. The cliff is roughly:

Model bigger than ~3 MB of int8 weights — ESP32-S3 with PSRAM can technically load it but throughput collapses to <2 inf/s, and the RP2350 / nRF54L15 can't load it at all.
Input larger than ~96×96 for a vision model — convolution work scales quadratically with input size and the MCU class falls off fast.
Latency budget under 30 ms for anything more complex than keyword spotting — none of these three hit that for vision.
Model is a transformer — even a small (~10 M param) transformer is out of reach for all three on raw RAM, never mind throughput.

If you cross any of those lines, step up to one of:

Google Coral Dev Board Micro / Coral USB Accelerator — Edge TPU does ~4 TOPS at int8, runs MobileNetV2-1.0 at >100 inf/s, ~2 W. Bolts onto a Linux SBC.
Hailo-8L — 13 TOPS at int8, M.2 form factor, <2.5 W, runs full-frame YOLOv8 at 30 fps. Needs a Linux host (Raspberry Pi 5 is the canonical pairing).
NXP i.MX 93 with Ethos-U65 NPU — middle ground. Cortex-A55 + Cortex-M33 + 0.5 TOPS NPU on one SoC. Real Linux. ~1 W active.
NVIDIA Jetson Orin Nano — the heavy hammer. 40 TOPS, full CUDA stack, will run anything you can fit in 8 GB RAM. ~7-15 W.

We have separate buying guides for the best small NPU accelerators and the Jetson Orin Nano deep-dive if you've crossed the cliff.

Verdict matrix

Pick the ESP32-S3 if you need Wi-Fi, want the friendliest tooling for a first AI project, run vision models bigger than 250 KB (the PSRAM matters), or you're already an Espressif shop. The AIv2 extensions in v3 silicon make it competitive on raw throughput, and the price-to-feature ratio is unmatched if Wi-Fi is in your spec.

Pick the RP2350 if you want the cheapest path to deployable inference (under $2/unit at volume), don't need wireless on the same chip, value the dual-architecture optionality, and your model fits in 520 KB SRAM. It's the throughput-per-dollar winner. Pair it with a CYW43439 module if you need Wi-Fi/BLE.

Pick the nRF54L15 if energy is the hard constraint — coin-cell wearables, multi-year asset trackers, harvesting-powered sensors. The 7-20× energy advantage over the other two flips every other consideration. Also pick it if you need BLE 5.4 + 802.15.4 + DECT NR+ on a single radio, or if TrustZone-protected secure inference matters for your threat model.

Bottom line + recommended starter kits

For 90 % of new tinyML projects in 2026, the right starting point is a $14 ESP32-S3-DevKitC-1 with Edge Impulse. You'll have a working classifier in one evening, and if your project graduates to volume, you've got a clear path either to a custom ESP32-S3 module ($4) or to one of the other two chips depending on what constraint started binding.

If you already know your constraint:

"It needs to run for two years on a coin cell." → Nordic nRF54L15-DK + Edge Impulse + nRF Connect SDK. ~$49 to start.
"It needs to cost under $2 in BOM." → Raspberry Pi Pico 2 ($5) + Edge Impulse → custom RP2350 module at volume.
"It needs Wi-Fi on the same chip." → ESP32-S3-DevKitC-1 + Edge Impulse → ESP32-S3-WROOM-1 module at volume.

Related guides

Sources

Espressif ESP-DL documentation and AIv2 instruction-set reference (docs.espressif.com, 2025-12 release)
Raspberry Pi RP2350 datasheet rev 1.2 (datasheets.raspberrypi.com)
Nordic nRF54L15 product specification rev 1.1 (infocenter.nordicsemi.com)
Edge Impulse benchmark blog: "tinyML latency on Cortex-M33 vs Xtensa LX7" (edgeimpulse.com, 2026-02)
Hackaday tinyML coverage and community benchmark threads (hackaday.com)
TFLite Micro + CMSIS-NN performance notes from the ARM Developer blog (developer.arm.com)