The Raspberry Pi AI HAT+ 26 TOPS is the higher-tier variant of Raspberry Pi's M.2-based neural accelerator HAT, built around the Hailo-8 NPU and rated at 26 trillion INT8 operations per second. As of 2026 it sells for roughly $110, drops onto a Pi 5 over PCIe, and is purpose-built for real-time vision inference — object detection, pose estimation, segmentation — not for running large language models. The lower-tier 13 TOPS version uses the Hailo-8L and costs around $70.
What's actually inside the AI HAT+?
The AI HAT+ is a small daughterboard that mates to the Raspberry Pi 5's PCIe x1 connector through the M.2 HAT+ form factor. Underneath the heatsink sits a single Hailo silicon part: either the Hailo-8L (13 TOPS at INT8) on the entry-tier $70 board, or the full Hailo-8 (26 TOPS) on the $110 board you came here to read about. There is no DRAM on the HAT itself. The Hailo silicon ships with on-die SRAM measured in single-digit megabytes, and your compiled model graph plus its weights have to fit inside that envelope after Hailo's compiler quantizes and tiles it. Anything that doesn't fit gets streamed across PCIe from Pi system memory, which is where your real-world frame rates start sliding off the published headline number.
You wire it the same way Raspberry Pi documents on the Raspberry Pi AI HAT+ product page: fit the standoffs on a Raspberry Pi 5, connect the HAT+'s ribbon cable to the Pi's PCIe connector, screw the board down, install hailo-all from apt, and the device shows up as a PCIe accelerator that GStreamer's hailonet element can target. Importantly, this is a Pi 5 part. If you are still running a Raspberry Pi 4 Model B 8GB, the AI HAT+ is not your upgrade path, because the Pi 4 exposes no PCIe lane. Pi 4 owners shopping the maker shelf today should either accept CPU-only inference, hang a USB accelerator off a port, or plan a board upgrade.
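A quick way to confirm the board enumerated after installation is to look for it in the PCIe device list. The sketch below is a minimal check, assuming the `lspci` utility is installed and that the accelerator's entry contains the word "Hailo"; the sample line in the comment is illustrative, not captured from a real board, and the helper names are hypothetical.

```python
import subprocess


def find_hailo_devices(lspci_output: str) -> list[str]:
    """Return the lspci lines that mention a Hailo device (case-insensitive)."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "hailo" in line.lower()
    ]


def hailo_present() -> bool:
    """Run lspci on this host and report whether any Hailo device is listed."""
    try:
        result = subprocess.run(
            ["lspci"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # lspci is missing or failed, so presence cannot be confirmed.
        return False
    return bool(find_hailo_devices(result.stdout))


# Illustrative input only; the exact wording on a real board may differ.
SAMPLE = (
    "0000:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries Device 2712\n"
    "0000:01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor\n"
)
matches = find_hailo_devices(SAMPLE)
```

Calling `hailo_present()` on the Pi itself returns `True` only when a matching line shows up; an empty result usually means the ribbon cable or the PCIe configuration deserves a second look before you debug the software stack.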
Hailo-8 architecture at 26 TOPS: how it compares to Coral and Jetson
Hailo's pitch with the 8-series is dataflow over the von Neumann model: instead of a fixed matrix-multiply unit that you feed serialized tensor ops, the Hailo-8 spreads a model graph across a grid of small compute clusters that pass activations to each other directly, with weights pinned next to the clusters that need them. The result, in 2026, is the best TOPS-per-watt figure you can buy on a hobby budget — about 2.6 TOPS per watt at the 10 W package envelope.
| Accelerator | INT8 TOPS | Approx. price | Power envelope | Host requirement |
|---|---|---|---|---|
| Pi AI HAT+ (Hailo-8L) | 13 | $70 | 5 W | Raspberry Pi 5 |
| Pi AI HAT+ 26 TOPS (Hailo-8) | 26 | $110 | 10 W | Raspberry Pi 5 |
| Google Coral Edge TPU (USB) | 4 | $60 | 2 W | Any USB 3 host |
| NVIDIA Jetson Orin Nano 8GB | 40 | $250 | 7–15 W | Standalone SoM |
The Coral is the cheap, easy comparison. But Google deprecated active development of the Edge TPU stack in 2024, the compiler tops out at TensorFlow Lite, and 4 TOPS is now less than a third of what the 13 TOPS Hailo-8L delivers for $10 more. Coral is still fine for "detect a person every two seconds on a doorbell" hobby builds, but if you are starting a project in 2026, the AI HAT+ obsoletes the Coral on raw throughput and on toolchain longevity.
The Jetson Orin Nano is the serious comparison. NVIDIA's $250 module hits 40 TOPS INT8, gives you a full CUDA stack, and — crucially — has enough on-board LPDDR5 to run small quantized LLMs like Phi-3 Mini and Qwen2 1.5B at usable token rates, which the AI HAT+ cannot do at all. You pay for it in money (more than 2× the Hailo-8 board), in power (up to 15 W vs. ~10 W), and in form factor (the Orin Nano is a system-on-module, not a HAT — you also buy a carrier board). For a fleet of vision-only edge nodes the AI HAT+ wins on $/TOPS and $/node; for a single workstation that needs flexibility, Jetson still wins. Tom's Hardware tracks the broader board market if you want a non-vendor pulse-check on where each platform sits this quarter.
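The $/TOPS comparison can be reproduced directly from the table. This is a back-of-the-envelope sketch that uses only the approximate prices and INT8 ratings listed above; it leaves out the Jetson's carrier board and the Pi 5 host, so treat it as a ranking of accelerator silicon rather than of complete builds.

```python
# (name, INT8 TOPS, approximate price in USD), copied from the comparison table.
ACCELERATORS = [
    ("Pi AI HAT+ (Hailo-8L)", 13, 70),
    ("Pi AI HAT+ 26 TOPS (Hailo-8)", 26, 110),
    ("Google Coral Edge TPU (USB)", 4, 60),
    ("NVIDIA Jetson Orin Nano 8GB", 40, 250),
]


def dollars_per_tops(tops: float, price_usd: float) -> float:
    """Price paid for each rated INT8 TOPS."""
    return price_usd / tops


# Cheapest compute first.
ranking = sorted(ACCELERATORS, key=lambda row: dollars_per_tops(row[1], row[2]))

for name, tops, price in ranking:
    print(f"{name:30s} ${dollars_per_tops(tops, price):5.2f} per TOPS")
```

On these figures the Hailo-8 board comes out cheapest per TOPS at about $4.23, followed by the Hailo-8L at about $5.38, the Jetson at $6.25, and the Coral at $15.00.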
Real benchmarks: object detection FPS on Pi 5 + AI HAT+
Hailo's own published numbers for YOLOv8n at 640×640 are 140 FPS on the Hailo-8L and 270 FPS on the Hailo-8. Those are pure-inference figures — model already loaded, batch of one, no decode, no draw, no tracker. When you build a real pipeline you eat that budget fast.
A realistic 1080p security-camera pipeline on a Pi 5 + AI HAT+ 26 TOPS looks like this: RTSP stream pulled in via FFmpeg/GStreamer, H.264 hardware-decoded by the Pi 5's VideoCore VII, scaled to 640×640, run through YOLOv8n on hailonet, then handed to a ByteTrack tracker on the Pi 5 CPU, then drawn onto the frame and re-encoded. Sustained throughput on that loop sits around 30 FPS per camera, with two cameras per HAT possible before tracker overhead on the four Cortex-A76 cores becomes the bottleneck. The Hailo NPU is mostly idle in that scenario — you are paying for headroom, not for raw frames.
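To put a number on "mostly idle", compare the frames the pipeline actually submits with the pure-inference figure quoted earlier. The sketch assumes one YOLOv8n inference per frame per camera and that NPU load scales linearly with frame rate, which is a simplification; the function name is made up for illustration.

```python
def npu_utilization(cameras: int, fps_per_camera: float, peak_inference_fps: float) -> float:
    """Fraction of the NPU's pure-inference throughput the pipeline consumes."""
    return (cameras * fps_per_camera) / peak_inference_fps


# Two 30 FPS cameras against the 270 FPS published for YOLOv8n on the Hailo-8.
hailo8_load = npu_utilization(cameras=2, fps_per_camera=30, peak_inference_fps=270)

# The same workload against the 140 FPS published for the Hailo-8L.
hailo8l_load = npu_utilization(cameras=2, fps_per_camera=30, peak_inference_fps=140)

print(f"Hailo-8:  {hailo8_load:.0%}")   # 22%
print(f"Hailo-8L: {hailo8l_load:.0%}")  # 43%
```

Under those assumptions the two-camera loop uses roughly a fifth of the Hailo-8's rated YOLOv8n throughput, which is the headroom that heavier or additional models would draw on.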
Where the 26 TOPS variant earns its $40 premium over the 13 TOPS sibling is when you stack heavier models: YOLOv8s instead of n, or pair a detector with a re-identification head, or run pose estimation alongside detection. The 8L runs out of compile-time SRAM budget on those stacked graphs; the 8 finishes them with margin. If your project is single-model, single-camera, the 8L is fine. If it is multi-model or multi-camera, buy the 8.
Use cases that finally make sense locally
Three project categories cross the line from "interesting demo" to "actually deploy" with the 26 TOPS HAT.
The first is local security cameras. A Raspberry Pi 5, the AI HAT+ 26 TOPS, and an enclosure like the Pironman 5 case gives you a self-contained, fanless-or-quiet-fan box that ingests two 1080p RTSP feeds, runs YOLOv8 with person/vehicle/package classes, and writes only events — not 24/7 footage — to local disk. No Frigate-on-an-RTX-3060 desktop. No paid cloud detection tier. Frigate, Viseron, and the Hailo-supported hailo_rt GStreamer path all support this configuration today.
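The "events, not footage" idea boils down to filtering detections by class and suppressing repeats. Here is a minimal sketch of that logic in plain Python; the `Detection` type, the confidence threshold, and the ten-second cooldown are all assumptions for illustration and do not reflect how Frigate or Viseron implement it.

```python
from dataclasses import dataclass

CLASSES_OF_INTEREST = {"person", "vehicle", "package"}


@dataclass
class Detection:
    label: str
    confidence: float


def frames_to_events(frames, min_confidence=0.5, cooldown_s=10.0):
    """Yield (timestamp, label) at most once per label per cooldown window.

    `frames` is an iterable of (timestamp_seconds, [Detection, ...]) pairs.
    """
    last_emitted = {}  # label -> timestamp of the most recent event
    for timestamp, detections in frames:
        for det in detections:
            if det.label not in CLASSES_OF_INTEREST:
                continue
            if det.confidence < min_confidence:
                continue
            previous = last_emitted.get(det.label)
            if previous is None or timestamp - previous >= cooldown_s:
                last_emitted[det.label] = timestamp
                yield timestamp, det.label


frames = [
    (0.0, [Detection("person", 0.9)]),
    (1.0, [Detection("person", 0.9)]),   # suppressed: inside the cooldown
    (2.0, [Detection("cat", 0.9)]),      # ignored: not a class of interest
    (12.0, [Detection("person", 0.8), Detection("package", 0.3)]),
]
events = list(frames_to_events(frames))
```

With the sample input, only the person at 0.0 s and the person at 12.0 s become events; everything else is dropped before it touches the disk.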
The second is edge speech wake-word and keyword spotting. Hailo's compiler accepts the small Conformer and Wav2Vec2 encoder variants that the wake-word and keyword-spotting community has standardized on. You will not run Whisper-large on it, but openWakeWord and pocketsphinx-replacement keyword spotters run with single-digit-millisecond latency and free up the Pi CPU for the rest of the assistant stack.
The third is retro-gaming and video upscalers — a niche that has quietly become one of the most popular Pi 5 use cases. The 26 TOPS budget is enough to run a small ESRGAN or RealESRGAN variant at 480p input → 1080p output in real time, which is exactly the workload a Pi-based MiSTer-style emulation box or a CRT-feed digitizer wants. The retro maker scene has been wiring these into the Raspberry Pi Zero W kit form factor for years as proof-of-concepts; the AI HAT+ is the first piece of hardware that makes the upscaler real-time at 1080p output on a Pi-class machine.
How to wire it: M.2 HAT compatibility, power budget, thermals
Three gotchas burn first-time AI HAT+ buyers.
First, the M.2 connector on the AI HAT+ board is an M-key NVMe-style slot — but it is not for your SSD. The Hailo silicon is the M.2 card. If you also want NVMe boot, you need a different HAT (the M.2 HAT+ for storage) and you cannot stack both on the same PCIe lane without a switch. In practice most builders boot from a fast microSD or USB SSD on the AI HAT+ build and put NVMe on a second Pi.
Second, the power budget. A Pi 5 alone draws about 5 W idle and 8 W under all-core load. Adding the AI HAT+ 26 TOPS at full inference duty adds another ~3.5 W on top, so a real build sustains about 8.5 W with the CPU otherwise idle and about 11.5 W with all cores busy, peaking higher during model load. Raspberry Pi's official 27 W USB-C PSU is not a recommendation, it is the floor. Third-party 5 V/3 A bricks will throttle the Pi under inference load and you will spend a week debugging "random frame drops" that are actually voltage warnings.
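Before blaming the pipeline for dropped frames, it is worth checking whether the firmware has flagged a power problem. Raspberry Pi OS reports this through `vcgencmd get_throttled`, which prints a hexadecimal bit field. The sketch below decodes that output using the bit meanings given in the Raspberry Pi documentation; the function name and the sample values are illustrative.

```python
# Bit positions reported by `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> list[str]:
    """Decode a line such as 'throttled=0x50005' into readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in THROTTLE_FLAGS.items() if value & (1 << bit)]


healthy = decode_throttled("throttled=0x0")
starved = decode_throttled("throttled=0x50005")
```

A healthy supply decodes to an empty list. The second sample decodes to an active under-voltage condition plus throttling, both now and earlier since boot, which is the signature of an undersized power brick.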
Third, thermals. The AI HAT+ ships with a passive heatsink that is enough for short bursts. Sustained 24/7 vision inference in a closed enclosure needs airflow. The official Pi 5 active cooler handles the CPU side; the HAT runs hot independently. The Pironman 5 case is one of the few off-the-shelf enclosures with adequate top-side airflow for the HAT specifically — it became briefly hard to find in early 2026 because of exactly this use case.
Limitations: no LLM inference, INT8/INT4 only, model conversion friction
Be honest about what this board cannot do.
It will not run an LLM. Hailo's compiler targets fixed-graph CNN and transformer-encoder workloads — image classifiers, detectors, segmenters, audio encoders. It does not handle autoregressive decode, which is the entire computational shape of a language model. There is no llama.cpp Hailo backend in 2026 and the architecture does not lend itself to one. If LLMs are on your roadmap, buy a Jetson Orin Nano or a small x86 box with a used GPU; do not buy the AI HAT+.
It is INT8-first, with some INT4 support on newer compiler builds. Your model gets quantized. For YOLO-family detectors that is fine — Hailo's Model Zoo ships pre-quantized variants and the mAP loss versus FP32 is in the 1–2 percentage-point range. For research models, especially ones with attention heads that hate quantization, expect to spend real engineering time on calibration data and per-layer quantization configs to recover accuracy.
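To see why quantization costs accuracy at all, it helps to run the arithmetic on a handful of numbers. The sketch below shows generic symmetric per-tensor INT8 quantization in plain Python; it is a textbook illustration, not Hailo's actual scheme, and the weight values are made up.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.02, -0.51, 0.27, 1.27, -0.003]  # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Here the largest weight sets the step size to roughly 0.01, so the smallest weight rounds to zero and is lost entirely. A single outlier in a tensor widens the step for every other value in it, which is one reason layers with wide value ranges tend to need per-layer attention during calibration.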
The conversion pipeline is real friction. You take an ONNX file, run it through the Hailo Dataflow Compiler, hand it a representative calibration dataset, and it spits out a .hef file you load on-device. For models from Hailo's Model Zoo this is a one-liner. For a custom model — say you trained your own YOLOv8 fine-tune — it is a half-day of compiler tuning the first time. Hailo links the Model Zoo and the compiler docs from its accelerator product page, and they are unusually good for a niche silicon vendor, but the friction is the cost. Plan for it.
Should you wait for the next revision?
The honest answer depends on what you are building.
If you are starting a 2026 project that needs on-device vision inference and you are committing to the Pi 5 platform, buy the AI HAT+ 26 TOPS now. It is the right hardware for the price tier, the software stack is mature enough to run in production, and there is no public Raspberry Pi/Hailo roadmap suggesting a successor in the next six months.
If you are a Pi 4 owner — including buyers of the very popular Raspberry Pi 4 Model B 8GB — this product is not for you, and the right move is to wait. Either wait for a USB or HAT-form-factor accelerator that targets the Pi 4's interfaces, or upgrade the host board to a Pi 5 when your build calls for it. Do not buy the AI HAT+ planning to "make it work" on a Pi 4. You will not.
If you are between a 13 TOPS Hailo-8L and a 26 TOPS Hailo-8, pay the $40 difference for the 8. The headroom is the product. The 8L exists for cost-sensitive single-model builds, but the 8 is what you want the moment you stack a second model or a second camera.
If you are between an AI HAT+ and a Jetson Orin Nano, the answer is the workload. Vision-only and edge-deployed at multiple nodes: AI HAT+. Workstation-style with mixed LLM and vision workloads on a single box: Jetson.
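The buying advice in this section can be condensed into a few branches. The function below is only a sketch of that reasoning; the parameter names and the returned strings are invented for illustration, and real projects will have constraints it ignores.

```python
def recommend(host: str, needs_llm: bool, models: int = 1, cameras: int = 1) -> str:
    """Condense the buying advice above into one decision."""
    if needs_llm:
        # The AI HAT+ does not run autoregressive language models.
        return "Jetson Orin Nano or an x86 box with a GPU"
    if host != "pi5":
        # The HAT needs the Pi 5's PCIe connector.
        return "wait, or upgrade the host to a Pi 5"
    if models > 1 or cameras > 1:
        return "AI HAT+ 26 TOPS (Hailo-8)"
    return "AI HAT+ 13 TOPS (Hailo-8L) is enough; the 26 TOPS board buys headroom"


two_camera_build = recommend(host="pi5", needs_llm=False, cameras=2)
pi4_build = recommend(host="pi4", needs_llm=False)
```

The order of the checks matters: an LLM requirement rules out the HAT regardless of host, and the host check comes before any question of which tier to buy.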
Bottom line
The Raspberry Pi AI HAT+ 26 TOPS is the first sub-$120 accelerator that makes real-time, multi-stream vision inference on a hobbyist board feel boring instead of heroic. As of 2026 it is the right buy for a Pi 5–based security cam, a small fleet of edge vision nodes, or a retro-gaming upscaler. It is the wrong buy for LLM tinkering, for Pi 4 owners, or for anyone unwilling to invest a day in the Hailo compiler. Pay the extra $40 over the 13 TOPS variant, budget for the 27 W PSU and real airflow, and skip the Coral.
