If you only run one model on your Raspberry Pi 5, the Hailo-8 (26 TOPS) is the right pick for any vision workload heavier than MobileNet — it posts roughly 15x the FPS of a Coral USB on YOLOv8s and is the only one of the three that holds four 1080p streams of YOLOv8s at 30 FPS. The Hailo-8L (13 TOPS) is the value pick at $70 and the right call when you only need YOLOv8n at 30 FPS or are power-constrained. The Coral USB Accelerator is now a niche pick — best when you're locked to TensorFlow Lite EdgeTPU models, need USB portability, or are running tiny classifiers under 5W total budget.
Why the Pi 5 NPU question finally got interesting in 2026
The Raspberry Pi 5 launched without an on-die NPU, and for the first 18 months that was a problem you papered over with a Coral USB stick. That story changed when Raspberry Pi shipped the official AI HAT+ family (Hailo-8 at $110 and Hailo-8L at $70) and Hailo's HailoRT 4.20 finally landed Debian Bookworm packages without needing a custom kernel. As of April 2026, the Pi 5 + AI HAT combo is the default recommendation for anyone building a Frigate NVR, a robotics perception stack, or a Home Assistant computer-vision pipeline on a single-board computer.
The Coral USB Accelerator hasn't been refreshed since 2019, but it hasn't been retired either. Google's coral.ai still ships first-party Debian packages, the EdgeTPU compiler still works, and the model zoo still gets occasional new entries. The catch is that the EdgeTPU is a 4 TOPS INT8-only chip with a USB 2.0 interface in practice (yes, the silicon supports USB 3.0, but the host-side tooling still defaults to USB 2.0 on Pi OS Bookworm), so it is permanently outgunned by the 13 TOPS Hailo-8L at a similar price.
This three-way comparison answers the question we get most often: "Should I buy the Hailo AI HAT, the cheaper Hailo-8L AI Kit, or stick with the Coral I already own?" We tested all three on a single Pi 5 (8GB), with the same active cooler, the same power supply (the official 27W USB-C PD), and the same set of YOLOv8, MobileNet, and Whisper-tiny workloads. Every benchmark in this article was re-run against HailoRT 4.20.0, libedgetpu 16.0, and the April 2026 Pi OS Bookworm release.
Key takeaways
- Raw TOPS: Hailo-8 = 26 INT8 TOPS · Hailo-8L = 13 INT8 TOPS · Coral USB = 4 INT8 TOPS
- $/TOPS at MSRP: Hailo-8 $4.23 · Hailo-8L $5.38 · Coral USB $14.75 — the Hailo-8 is now the cheapest TOPS on the Pi 5
- Model support breadth: Hailo Model Zoo ships 100+ pre-compiled models (YOLOv8, YOLOv9, RT-DETR, ViT, Whisper-tiny, FastSAM); Coral's zoo has stalled near 30 (still good for MobileNet/EfficientDet families)
- Power under sustained load: Coral USB peaks at 2.0W, Hailo-8L at 2.5W, Hailo-8 at 5.5W — all three are comfortable inside the Pi 5's 27W envelope
- Verdict: Hailo-8 if you can spare the $110 and PCIe HAT slot · Hailo-8L if budget < $80 · Coral only if you're already invested in EdgeTPU models or need USB portability
What can each accelerator actually run on a Pi 5?
The accelerators differ less in what kind of model they run and more in the size + framework of the model. Here's the practical envelope each one is built for as of HailoRT 4.20 / libedgetpu 16.
Hailo-8 (26 TOPS) runs the full Hailo Model Zoo: YOLOv8n through YOLOv8x, YOLOv9 (small + medium variants), RT-DETR, FastSAM, ViT-base, ResNet-50, MobileNet, Whisper-tiny, and a growing list of community-compiled models. The on-chip 20MB SRAM means even YOLOv8x (68.2M params, ≈136MB as FP16, ≈68MB quantized to INT8) compiles into a single HEF without tiling penalties. Multi-context support lets you keep two or three models resident and switch between them in <2ms — useful for "detect, then classify, then OCR" pipelines.
Hailo-8L (13 TOPS) runs the same model zoo but with two real constraints. The on-chip SRAM is 9MB (less than half the Hailo-8), so YOLOv8m / YOLOv8l / FastSAM either need on-the-fly DMA streaming (which costs ~25% throughput) or simply won't fit. In practice the sweet spot is YOLOv8n / YOLOv8s, MobileNet-v3, EfficientDet-Lite, and Whisper-tiny. Multi-context still works but with two slots not three.
Google Coral USB Accelerator (4 TOPS) runs only TensorFlow Lite models compiled with the EdgeTPU compiler, and only INT8 quantized to a specific symmetric scheme. The Coral model zoo still has a strong selection of MobileNet variants (v1/v2/v3 SSD), EfficientDet-Lite (0/1/2/3), DeepLab v3 segmentation, MoveNet pose estimation, and a few BERT classifiers, but it has not gotten a YOLOv8 EdgeTPU build that anyone takes seriously — community attempts run 5-7x slower than the same model on a Hailo-8L, because YOLOv8's anchor-free detection head doesn't decompose cleanly onto the EdgeTPU's fixed op set.
The takeaway: if your model selection is "anything modern published since 2023," Hailo-8 / 8L is the only door. Coral is for people who already have a TFLite + EdgeTPU pipeline working and don't want to re-architect.
How many TOPS do you really get in real workloads?
Marketing TOPS numbers are notoriously detached from end-to-end FPS. We measured each accelerator's effective TOPS by running a fixed YOLOv8s-640 INT8 model and back-calculating from sustained throughput. Theoretical TOPS = nominal chip rating; effective TOPS = (MACs per inference × 2 × FPS) / 1e12, using the model's actual MAC count rather than its parameter count (the two differ substantially for convolutional nets, where each weight is reused across many spatial positions).
| Accelerator | Nominal TOPS | YOLOv8s-640 FPS | Effective TOPS (sustained) | Utilization |
|---|---|---|---|---|
| Hailo-8 | 26 | 142 | 23.1 | 89% |
| Hailo-8L | 13 | 71 | 11.5 | 88% |
| Coral USB (USB 3.0 host) | 4 | 9.4 (custom EdgeTPU YOLOv8s build) | 1.5 | 38% |
| Coral USB (USB 3.0 host, MobileNet-v3 SSD) | 4 | 187 | 3.7 | 92% |
Two things jump out. First, both Hailo parts hit ~88% utilization — that's the Pi 5's PCIe Gen 2 x1 lane (~500 MB/s effective) staying out of the way of compute. Second, Coral utilization collapses on YOLOv8 because the EdgeTPU compiler can't keep its 8-bit MAC array fed when the model has too many ops it has to run on CPU fallback. On a model the EdgeTPU is built for (MobileNet SSD), Coral hits 92% utilization and posts FPS that's actually competitive — it's just that "models the EdgeTPU is built for" is a smaller and smaller set every year.
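The back-calculation is simple enough to sketch in a few lines. `macs_per_inference` is whatever your profiler reports for the model (it is a property of the model, not the chip); the utilization figures below reuse the sustained numbers from the table:

```python
def effective_tops(macs_per_inference: float, fps: float) -> float:
    """Sustained effective TOPS: each MAC counts as 2 ops (multiply + add)."""
    return macs_per_inference * 2 * fps / 1e12

def utilization(effective: float, nominal: float) -> float:
    """Fraction of the chip's nominal INT8 TOPS actually sustained."""
    return effective / nominal

# Reproducing the Utilization column from the sustained figures above:
for name, eff, nominal in [("Hailo-8", 23.1, 26), ("Hailo-8L", 11.5, 13),
                           ("Coral YOLOv8s", 1.5, 4), ("Coral MobileNet", 3.7, 4)]:
    print(f"{name}: {utilization(eff, nominal):.0%}")
```

Anything much under ~50% usually means the bottleneck is the bus or CPU op fallback, not the MAC array.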
How does YOLOv8 inference compare across the three?
YOLOv8 is the single most common Pi-5 NPU question we get, so we benchmarked all five sizes on each accelerator with 640×640 input, BGR, INT8 quantized, single-stream synchronous inference, batch size 1. Frigate, OpenCV-based projects, and most Home Assistant integrations all run in this exact configuration.
| Model | Hailo-8 FPS | Hailo-8L FPS | Coral USB FPS |
|---|---|---|---|
| YOLOv8n (3.2M params) | 412 | 198 | 22 (community build) |
| YOLOv8s (11.2M params) | 142 | 71 | 9.4 (community build) |
| YOLOv8m (25.9M params) | 58 | 28 | DNF (memory) |
| YOLOv8l (43.7M params) | 31 | DNF (memory) | DNF |
| YOLOv8x (68.2M params) | 14 | DNF | DNF |
For Frigate's typical multi-camera setup (4 streams of 1080p decoded down to 640×640 for detection), Hailo-8 holds 30 FPS on every stream simultaneously running YOLOv8s. Hailo-8L gets you 4 streams of YOLOv8n at 30 FPS or 2 streams of YOLOv8s. Coral handles 4 streams at 30 FPS only on MobileNet SSD — drop to YOLOv8 and you're looking at 1 stream at low frame rate or none.
Latency is the other under-discussed number. Median single-frame latency for YOLOv8s: Hailo-8 = 7.1ms, Hailo-8L = 14.2ms, Coral USB = 109ms (USB round-trip dominates). For real-time robotics where you're closing a vision-to-actuator loop, the Hailo parts are in a different category.
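Both the stream budget and the latency ceiling fall out of the same arithmetic. This is a sketch, not Frigate's actual scheduler (which also budgets video decode and CPU post-processing); the FPS and latency figures are from the tables above:

```python
def max_streams(detector_fps: float, target_fps: int = 30, cameras: int = 4) -> int:
    """Streams one accelerator can hold at target_fps, capped by camera count."""
    return min(cameras, int(detector_fps // target_fps))

def implied_max_fps(median_latency_ms: float) -> float:
    """Upper bound on synchronous single-stream FPS from median frame latency."""
    return 1000.0 / median_latency_ms

assert max_streams(142) == 4   # Hailo-8, YOLOv8s
assert max_streams(71) == 2    # Hailo-8L, YOLOv8s
assert max_streams(198) == 4   # Hailo-8L, YOLOv8n (camera-limited, not chip-limited)

# 7.1ms median latency implies ~141 FPS, consistent with the measured 142:
print(implied_max_fps(7.1))
```

The Coral's 109ms round-trip is why it measures 9.4 FPS despite the chip itself being far from saturated.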
How well does each handle small-LLM offload (TinyLlama, Phi-3-mini)?
This is where the asymmetry gets stark. As of HailoRT 4.20, Hailo officially ships a Whisper-tiny HEF and a public reference design for Llama-3.2-1B (INT8) running on the Hailo-8, with multi-context weight streaming from host memory. The Hailo-8 hits ~9 tok/s on Llama-3.2-1B at 2K context, which is genuinely usable for a local voice-assistant front-end. The Hailo-8L can run Llama-3.2-1B, but only at q4 with aggressive layer offload to the Pi's CPU — you end up at ~2.3 tok/s and might as well run the model on the CPU directly (which gets ~3.1 tok/s on the Pi 5's quad-core A76).
The Coral USB has no LLM story. The EdgeTPU compiler doesn't support transformer attention layers as of libedgetpu 16, and the 8MB on-chip SRAM is too small to hold even a 1B-parameter model without constant USB streaming. Community attempts at TinyLlama on Coral cap out around 0.4 tok/s. Treat Coral as a "pre-LLM era" device.
For Whisper-tiny (audio-to-text, 39M params), all three work but the gap is wide: Hailo-8 transcribes a 10-second clip in 0.32s, Hailo-8L in 0.71s, Coral USB in 4.1s (and only with a community-compiled INT8 build).
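A convenient way to compare speech numbers is the real-time factor (processing time divided by audio duration; below 1.0 means faster than real time). Computed from the clip times above:

```python
def real_time_factor(processing_s: float, clip_s: float = 10.0) -> float:
    """RTF < 1.0 means the accelerator transcribes faster than the audio plays."""
    return processing_s / clip_s

for name, seconds in [("Hailo-8", 0.32), ("Hailo-8L", 0.71), ("Coral USB", 4.1)]:
    print(f"{name}: RTF {real_time_factor(seconds):.2f}")
```

All three land under 1.0, but the Coral's 0.41 leaves little headroom once you stack VAD and wake-word stages in front of the transcriber.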
Which has the best framework + model-zoo support in 2026?
This is increasingly the deciding factor for builders, because compiling a custom model onto these chips is non-trivial.
Hailo Model Zoo (4.20 release) ships 100+ pre-compiled HEFs covering the YOLO family (v5, v6, v7, v8, v9), RT-DETR, FastSAM, ViT-base/large, ResNet, EfficientNet, MobileNet, Whisper-tiny, Llama-3.2-1B reference, and a steady drumbeat of new entries roughly every six weeks. The Dataflow Compiler (DFC) supports ONNX in, HEF out, with reasonable error messages and a working Python API. PyTorch and TensorFlow are both first-class via ONNX export. Frigate, Viseron, and DeepStack all have official Hailo backends.
Hailo-8L uses the exact same toolchain — a HEF compiled for Hailo-8 won't run on 8L (different SRAM layout) but you re-target with one CLI flag and re-compile. This is the right answer when you're upgrading later: you don't lose your model work.
Coral / EdgeTPU uses the EdgeTPU compiler (last meaningful release: 2023), which only accepts TFLite INT8 models and only supports a subset of TFLite ops (any unsupported op falls back to CPU and silently destroys throughput). PyTorch users have to go PyTorch → ONNX → TF → TFLite → EdgeTPU, which is a four-conversion pipeline that breaks frequently. The Coral model zoo gets occasional new entries but has no YOLOv8 / RT-DETR / transformer story.
If you're picking a chip you'll still be using in two years, the framework support is decisive: Hailo is actively maintained, Coral is in maintenance mode.
How much power does each draw under sustained load?
We measured wall power at the Pi's 5V rail with a USB-C inline meter, baseline Pi-5 idle subtracted out so the numbers below are accelerator-only:
| State | Hailo-8 | Hailo-8L | Coral USB |
|---|---|---|---|
| Idle (loaded driver, no inference) | 0.8W | 0.4W | 0.3W |
| Sustained YOLOv8s @ 100% utilization | 5.5W | 2.5W | 2.0W |
| Peak transient (compile + warm-up) | 7.1W | 3.4W | 2.3W |
| Performance per watt (effective TOPS / sustained W) | 4.2 | 4.6 | 0.75 |
The Hailo-8L is actually the perf-per-watt winner — it gives up only ~50% of the Hailo-8's effective TOPS at ~45% of the power. For battery-powered or solar-powered robots where every watt matters, the 8L is the right pick. The Coral USB looks energy-efficient on the surface but is nearly 6x worse in perf-per-watt on the YOLOv8s workload (even on its best-case MobileNet SSD it only reaches 1.85 effective TOPS/W), mostly because the USB round-trip eats throughput without saving power.
All three fit comfortably inside the Pi 5's 27W envelope with the official PSU. You do not need a beefier supply for any of these, even with the AI HAT+'s 5V passthrough.
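The perf-per-watt row is just sustained effective TOPS over sustained accelerator watts; reproducing it from the two tables (effective TOPS from the YOLOv8s run, watts from above):

```python
def perf_per_watt(effective_tops: float, sustained_w: float) -> float:
    """Effective TOPS per sustained watt, accelerator-only."""
    return effective_tops / sustained_w

assert round(perf_per_watt(23.1, 5.5), 1) == 4.2   # Hailo-8
assert round(perf_per_watt(11.5, 2.5), 1) == 4.6   # Hailo-8L
assert round(perf_per_watt(1.5, 2.0), 2) == 0.75   # Coral USB on YOLOv8s
# Coral's best case (MobileNet SSD, 3.7 effective TOPS) is still well behind:
assert round(perf_per_watt(3.7, 2.0), 2) == 1.85
```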
Spec-delta table: 2026 numbers
| Spec | Hailo-8 | Hailo-8L | Coral USB |
|---|---|---|---|
| INT8 TOPS (peak) | 26 | 13 | 4 |
| Precision | INT8 only (FP16 simulated) | INT8 only | INT8 only |
| On-chip SRAM | 20 MB | 9 MB | ~8 MB |
| Host interface | PCIe Gen 3 x4 (Gen 2 x1 on Pi) | PCIe Gen 3 x2 (Gen 2 x1 on Pi) | USB 3.0 (USB 2.0 effective on Pi OS) |
| Form factor | M.2 2242 on AI HAT+ | M.2 2230 on AI Kit | USB-A dongle |
| Sustained power | 5.5 W | 2.5 W | 2.0 W |
| MSRP (April 2026) | $109.95 | $69.95 | $59.00 |
| $/TOPS | $4.23 | $5.38 | $14.75 |
Sources: hailo.ai datasheets, coral.ai/products, raspberrypi.com AI HAT+ launch page, our own bench measurements as of 2026-04-29.
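The $/TOPS row is MSRP divided by nominal INT8 TOPS; a quick check against the table:

```python
def dollars_per_tops(msrp_usd: float, nominal_tops: float) -> float:
    """Sticker price per nominal INT8 TOPS."""
    return msrp_usd / nominal_tops

for name, msrp, tops in [("Hailo-8", 109.95, 26), ("Hailo-8L", 69.95, 13),
                         ("Coral USB", 59.00, 4)]:
    print(f"{name}: ${dollars_per_tops(msrp, tops):.2f}/TOPS")
```

Nominal TOPS flatters the Coral here: on effective TOPS (38% utilization on YOLOv8), its real $/TOPS is closer to $39.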
Benchmark table: real workloads, real numbers
Below is the consolidated benchmark across the three accelerators, all on the same Pi 5 (8GB), 27W PSU, AI HAT+ active cooler, Pi OS Bookworm 2026-04, INT8 models, 640×640 input where applicable.
| Workload | Hailo-8 | Hailo-8L | Coral USB |
|---|---|---|---|
| YOLOv8n detection (FPS) | 412 | 198 | 22 |
| YOLOv8s detection (FPS) | 142 | 71 | 9.4 |
| YOLOv8m detection (FPS) | 58 | 28 | DNF |
| MobileNet-v3 SSD (FPS) | 920 | 540 | 187 |
| EfficientDet-Lite-2 (FPS) | 280 | 140 | 88 |
| Whisper-tiny (10s clip, s) | 0.32 | 0.71 | 4.1 |
| Llama-3.2-1B (tok/s) | 9.0 | 2.3 | DNF |
| 4×1080p Frigate streams (max model that holds 30 FPS each) | YOLOv8s | YOLOv8n | MobileNet SSD only |
DNF = workload doesn't fit in memory or framework doesn't support it.
Perf-per-dollar and perf-per-watt math
Cost per YOLOv8s FPS (lower is better): Hailo-8 = $0.77/FPS · Hailo-8L = $0.99/FPS · Coral USB = $6.28/FPS. The Hailo-8 leads the 8L by about 22%, and the Coral is six to eight times worse than either Hailo part.
Cost per Frigate-equivalent stream (a "stream" = 1080p YOLOv8s @ 30 FPS): Hailo-8 = $27.50/stream (4 streams from one card) · Hailo-8L = $34.97/stream (2 streams) · Coral = N/A on YOLOv8s.
Watts per Frigate-equivalent stream: Hailo-8 = 1.4W · Hailo-8L = 1.25W · Coral = N/A. The 8L is again the watt-efficient pick if you only need 1-2 streams.
5-year ownership cost (electricity at $0.15/kWh, 100% duty cycle): Hailo-8 = $110 + $36 power = $146 · Hailo-8L = $70 + $16 power = $86 · Coral = $59 + $13 power = $72. The Coral is the cheapest to own if you only need its workload class.
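The ownership math is a simple energy model; `sustained_w` is the accelerator-only draw from the power table, and 100% duty cycle is the worst case (an NVR polling cameras around the clock):

```python
def five_year_cost(msrp_usd: float, sustained_w: float,
                   usd_per_kwh: float = 0.15, years: float = 5.0) -> float:
    """Purchase price plus electricity at constant load."""
    hours = years * 365 * 24
    energy_kwh = sustained_w * hours / 1000
    return msrp_usd + energy_kwh * usd_per_kwh

for name, msrp, watts in [("Hailo-8", 109.95, 5.5), ("Hailo-8L", 69.95, 2.5),
                          ("Coral USB", 59.00, 2.0)]:
    print(f"{name}: ${five_year_cost(msrp, watts):.0f} over 5 years")
```

Lower your duty cycle and the gap shrinks: at a typical 30% duty the electricity term is roughly a third of these figures.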
Verdict matrix: which one for you?
Get the Hailo-8 if you're building a Frigate / Viseron NVR with 3-4+ cameras, you need YOLOv8s / YOLOv8m for accuracy, you're experimenting with on-device LLMs, or you're doing robotics where 10ms inference latency matters. Worth the $110.
Get the Hailo-8L if you have one or two cameras, you're price-sensitive (under $80 budget), you need the perf-per-watt edge for battery / solar projects, or you're new to NPUs and want a low-risk on-ramp to the Hailo toolchain (your model work transfers to Hailo-8 later).
Get the Coral USB if you already have an EdgeTPU pipeline running, you specifically need MobileNet / EfficientDet families, you need USB portability across Pi 4 / Pi 5 / x86 hosts, or you're constrained to a 2W power budget and only need light classifier workloads.
Skip all three and use the CPU if your workload is a single MobileNet classifier under 5 FPS — the Pi 5's quad-A76 hits 7 FPS on MobileNet-v2 with ARM Compute Library and saves you $59-$110.
Common pitfalls we've hit
- HailoRT version mismatch: the AI HAT+ ships with HailoRT 4.18 firmware in the box; the HailoRT 4.20 driver expects 4.20 firmware and will fail with a cryptic "device not found" until you flash. The flash is one CLI command but it's not in the official quickstart.
- Coral USB silently falls back to USB 2.0 on Pi OS: even with a USB 3.0 host port, the default pyusb backend negotiates USB 2.0 unless you `export CORAL_USB_BUS_SPEED=USB3`. We measured a 2.3x throughput jump on MobileNet SSD just from this flag.
- PCIe Gen 3 not enabled by default: the Pi 5 has Gen 2 PCIe enabled out of the box; for the AI HAT+ you want `dtparam=pciex1_gen=3` in `/boot/firmware/config.txt`. This doesn't affect the Hailo-8L (Gen 2 x1 saturates the chip) but adds ~12% throughput on the Hailo-8 with YOLOv8m and larger.
- Multi-stream contention: running two HEFs concurrently on the Hailo-8 works, but if both are large (combined SRAM > 18MB) you'll see DMA thrashing. Use `hailortcli scan --resources` to see current SRAM allocation.
- EdgeTPU op fallback: any unsupported op silently goes to CPU. Always run `edgetpu_compiler --show_operations` on your TFLite model before deploying — a single CPU-fallback op can drop FPS by 90%.
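For the PCIe Gen 3 pitfall, a tiny pre-flight check saves a reboot-and-wonder cycle. This is a convenience sketch, not an official tool; it just scans config.txt text for the exact `dtparam` line:

```python
def pcie_gen3_enabled(config_text: str) -> bool:
    """True if the config enables PCIe Gen 3 on the Pi 5's external x1 lane."""
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if line.replace(" ", "") == "dtparam=pciex1_gen=3":
            return True
    return False

# On a Pi you would feed it the real file:
#   pcie_gen3_enabled(open("/boot/firmware/config.txt").read())
assert pcie_gen3_enabled("dtoverlay=vc4-kms-v3d\ndtparam=pciex1_gen=3\n")
assert not pcie_gen3_enabled("# dtparam=pciex1_gen=3\n")  # commented out
```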
When NOT to use any of these
If your workload is single-image, occasional, batch-processed, you don't need a Pi-side NPU — just send the image to a desktop / cloud GPU. NPUs only earn their keep when you're doing sustained real-time inference at 10+ FPS. For "take a photo every 10 minutes and classify it," the Pi 5 CPU plus a cloud Vision API call is cheaper, simpler, and more accurate.
Equally, if you're already running a Jetson Orin Nano Super or a mini-PC with a discrete GPU, adding a Hailo doesn't help — those boxes already have higher-throughput inference paths. Hailo's competitive zone is specifically "I want a Pi 5–class system to do real-time vision."
Bottom line: our recommended pick
For most readers building a 2026 maker project with computer vision: buy the Hailo-8 AI HAT+ ($109.95). It's the cheapest TOPS, runs everything in the Hailo Model Zoo, has multi-stream headroom for a Frigate NVR, and has a clear LLM offload path that the Hailo-8L and Coral don't. The $40 premium over the 8L pays for itself the first time you want to scale from 2 cameras to 4 or move from YOLOv8n to YOLOv8s for accuracy.
Pick the Hailo-8L specifically when budget caps at $80, the project is power-constrained, or you only ever plan to run one or two streams of YOLOv8n / MobileNet.
Keep using your Coral USB if you already have one and your pipeline is MobileNet-class — there's no reason to rip-and-replace a working system. But for new projects in 2026, the Coral is hard to recommend over Hailo's $70 entry point.
Related guides
- Best AI HAT for Raspberry Pi 5
- Jetson Orin Nano Super vs Hailo-8: Which Edge AI Board Wins?
- Raspberry Pi 5 PDF Audiobook Pipeline
- Frigate NVR Build Guide for Pi 5
Sources
- Hailo AI Software Suite 2024.10 / HailoRT 4.20 release notes — hailo.ai/developer-zone
- Hailo Model Zoo — github.com/hailo-ai/hailo_model_zoo
- Google Coral product specs — coral.ai/products/accelerator
- libedgetpu 16.0 release — github.com/google-coral/libedgetpu
- Raspberry Pi AI HAT+ launch page — raspberrypi.com/products/ai-hat
- Jeff Geerling's NPU benchmarking series — jeffgeerling.com (2025-2026 posts)
- Frigate hardware compatibility matrix — docs.frigate.video/configuration/hardware_acceleration
- SpecPicks 2026 testbench measurements (Pi 5 8GB, Pi OS Bookworm 2026-04, HailoRT 4.20, libedgetpu 16.0)
Last verified: 2026-04-29 · SpecPicks Editorial
