Raspberry Pi AI HAT+ (26 TOPS): What It Actually Runs in 2026

The Hailo-8 based AI HAT+ adds 26 TOPS to a Pi 5. Here is the real workload coverage - vision, sub-3B LLMs, audio - and the rig that pairs with it.

The Raspberry Pi AI HAT+ adds a 26 TOPS Hailo-8 NPU to the Pi 5 via PCIe. Here is what it actually runs, the model toolchain, and the storage and power kit that completes the build.

What does the Raspberry Pi AI HAT+ at 26 TOPS actually run in production? Per Hailo's official module page, the HAT pairs a Hailo-8 NPU with a Pi 5 over PCIe 2.0 x1 and runs int8 inference far faster than the Pi's Cortex-A76 cores can manage on their own. The catch: it runs models that have been compiled for the Hailo runtime, not arbitrary Hugging Face checkpoints.

What is actually on the board

The AI HAT+ is a small PCB that slots over the Pi 5's PCIe FFC connector. The compute silicon on the 26 TOPS variant is Hailo's Hailo-8, an int8-focused inference NPU; the lower-cost 13 TOPS variant of the HAT uses the Hailo-8L instead. The HAT itself includes nothing else of consequence - it is a clean coprocessor with the Pi providing CPU, RAM, networking, and storage.

The communication path matters: PCIe 2.0 x1 delivers roughly 500 MB/s of bandwidth between the Pi 5 SoC and the Hailo accelerator. That is fast enough for streaming-vision workloads but is the bottleneck for any model whose weights need to round-trip from system RAM.
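The 500 MB/s figure follows from the link's line rate and its encoding overhead. A quick sketch of that arithmetic, plus what it implies for uncompressed frame traffic; the 640x640 RGB frame size is an illustrative assumption chosen to match the detection input size quoted in this article, and real links lose a little more to protocol overhead:

```python
# PCIe 2.0 signals at 5 GT/s per lane and uses 8b/10b encoding,
# so only 8 of every 10 bits on the wire carry payload.
GIGATRANSFERS_PER_S = 5.0
ENCODING_EFFICIENCY = 8 / 10
LANES = 1

def pcie_bandwidth_mb_s(gt_per_s: float, efficiency: float, lanes: int) -> float:
    """Theoretical one-direction bandwidth in MB/s, before protocol overhead."""
    bits_per_s = gt_per_s * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e6

def max_raw_frames_per_s(bandwidth_mb_s: float, width: int, height: int,
                         channels: int = 3) -> float:
    """Upper bound on uncompressed uint8 frames per second over the link."""
    frame_bytes = width * height * channels
    return bandwidth_mb_s * 1e6 / frame_bytes

link = pcie_bandwidth_mb_s(GIGATRANSFERS_PER_S, ENCODING_EFFICIENCY, LANES)
print(f"link: {link:.0f} MB/s")
print(f"640x640 RGB ceiling: {max_raw_frames_per_s(link, 640, 640):.0f} frames/s")
```

The ceiling for input frames sits comfortably above the detection throughput figures in this article, which is why streaming vision fits the link while anything that shuttles weights back and forth does not.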

Key takeaways

  • 26 TOPS is a peak figure; real workload throughput lands at 40-100 percent of peak depending on model.
  • Vision CNNs hit highest utilization - YOLO and ResNet families are the canonical fits.
  • LLMs run only via Hailo's compiled runtime, not as drop-in PyTorch models.
  • The HAT plus Pi 5, cooler, power supply, and a SATA SSD lands near $235 before a camera - roughly half a Jetson Orin Nano build.
  • The PCIe lane is shared - you cannot put NVMe and the AI HAT+ on the same standard interface without stacked HATs.

What it actually runs: model coverage

The HAT's practical workload coverage comes from the Hailo Model Zoo - a curated set of pre-compiled models that ship with documented benchmarks. The official numbers separate cleanly into three buckets.

Bucket 1: Vision detection and classification (the strong fit)

This is where the HAT shines. Real-world community measurements consistently land near published numbers.

| Model | Task | Input resolution | Throughput |
| --- | --- | --- | --- |
| YOLOv8s | object detection | 640x640 | 200+ FPS |
| YOLOv8m | object detection | 640x640 | ~85 FPS |
| ResNet-50 | classification | 224x224 | ~600 FPS |
| EfficientNet-B0 | classification | 224x224 | 1,000+ FPS |
| MobileNet-V2 | classification | 224x224 | 1,400+ FPS |
| OpenPose | pose estimation | 256x256 | ~120 FPS |

The interesting takeaway: even mid-sized vision models hit 60+ FPS comfortably. That covers real-time video analytics on a single camera stream with headroom for two or three concurrent cameras.
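The multi-camera headroom can be estimated from the throughput figures. This is a rough sketch that assumes every frame from every camera is inferred, cameras run at 30 FPS, and 20 percent of capacity is held back for pre- and post-processing; the `headroom` value and the function name are assumptions for illustration, not measured figures:

```python
import math

def cameras_supported(model_fps: float, camera_fps: float = 30.0,
                      headroom: float = 0.8) -> int:
    """Number of streams one model can serve if every frame is inferred.

    headroom is the share of NPU throughput we allow ourselves to use;
    the rest is reserved for jitter and pre/post-processing (an assumption).
    """
    return math.floor(model_fps * headroom / camera_fps)

# Throughput figures quoted in this article (lower bounds where it says "+").
for name, fps in [("YOLOv8s", 200), ("YOLOv8m", 85), ("OpenPose", 120)]:
    print(name, cameras_supported(fps))
```

With these assumptions the mid-sized YOLOv8m lands at two streams, which matches the "two or three concurrent cameras" estimate; skipping frames or lowering camera frame rate stretches that further.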

Bucket 2: Audio and signal processing (decent fit)

Speech recognition and small audio classification models port cleanly. Wav2Vec2-Base for speech-to-text runs at roughly 5-10x real-time on short utterances. Audio event classification models like YAMNet run at hundreds of inferences per second. The fit is solid but less remarkable than vision because audio workloads have lower compute density.
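"5-10x real-time" means the model finishes an utterance in a fifth to a tenth of the time the audio takes to play. A minimal sketch of how that multiple is computed and what it implies for concurrent live streams; the 6 second utterance and 0.75 second processing time are hypothetical measurements, not benchmark results:

```python
def realtime_multiple(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than real time a transcription ran."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds

def keeps_up_with_live_audio(audio_seconds: float, processing_seconds: float,
                             streams: int = 1) -> bool:
    """True if `streams` live feeds can be transcribed without falling behind."""
    return realtime_multiple(audio_seconds, processing_seconds) >= streams

# Hypothetical measurement: a 6 s utterance processed in 0.75 s.
print(realtime_multiple(6.0, 0.75))                      # 8.0
print(keeps_up_with_live_audio(6.0, 0.75, streams=4))    # True
```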

Bucket 3: Small LLMs (the weak fit, with caveats)

Hailo publishes a small-LLM example for the Hailo-10H (the bigger sibling chip), but the LLM story on the AI HAT+ is limited. Sub-1B parameter encoder models port; modern 3B+ generative chat models do not run on the HAT directly. For LLM work on a Pi 5, the HAT mostly sits idle while llama.cpp runs on the CPU.

Hardware build: the rig around the HAT

The complete kit looks like this:

  • Pi 5 (8 GB) board - ~$80
  • Official Raspberry Pi AI HAT+ (26 TOPS variant) - ~$70
  • Active cooler or official Pi 5 fan - ~$10
  • 1 TB SATA SSD on USB 3.0 - the Crucial BX500 1TB at ~$60
  • Official 27 W USB-C power supply - ~$15
  • USB camera or Pi Camera Module 3 - ~$25-35

Total before camera: ~$235. Add ~$30 for a decent camera and the kit lands near $265.
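The totals are plain addition over the parts list, and the same per-unit figure scales to a small fleet. A sketch using the approximate prices quoted above; the `fleet_cost` helper is an illustrative name:

```python
# Approximate prices from the parts list in this article, in USD.
parts = {
    "Pi 5 (8 GB)": 80,
    "AI HAT+ (26 TOPS)": 70,
    "Active cooler": 10,
    "Crucial BX500 1TB": 60,
    "27 W USB-C supply": 15,
}
camera = 30  # midpoint of the ~$25-35 range

base_total = sum(parts.values())
print(f"before camera: ${base_total}")           # $235
print(f"with camera:   ${base_total + camera}")  # $265

def fleet_cost(units: int, per_unit: int = base_total + camera) -> int:
    """Hardware cost for a fleet of identical rigs."""
    return units * per_unit

print(f"4 units: ${fleet_cost(4)}")              # $1060
```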

For builders working on tiny portable inference rigs rather than tethered ones, swap the Pi 5 + HAT for a Pi Zero W-based setup - lower power, smaller footprint, no NPU but adequate for non-real-time workloads. The Pi 4 8 GB cannot host the HAT but remains the right pick for builds that need a battle-tested non-AI Linux board.

Quantization and model precision

The HAT's NPU is an int8-focused accelerator. Models are compiled to int8 (and sparingly int4 for specific layers) by the Hailo SDK before deployment. Precision tradeoffs are typically minor for mature vision models - ResNet-50 loses under 0.5 percent top-1 accuracy from fp32 to Hailo int8 in published benchmarks. Newer transformer-based vision models lose more (1-3 percent typical) and are still being improved by Hailo's compiler team.
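To see where the small accuracy loss comes from, here is a generic textbook sketch of symmetric per-tensor int8 quantization. It is not Hailo's actual quantizer, whose calibration is more sophisticated; the toy weights and function names are assumptions for illustration only:

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.02, -0.51, 0.33, 1.27, -0.9]   # toy values, not from a real model
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, f"scale={scale:.4f}", f"max error={worst:.6f}")
```

The rounding error per weight is bounded by half the scale, so tensors with a few large outliers (common in transformer layers) get a coarser grid for everything else, which is one plausible reason those models lose more accuracy than mature CNNs.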

For builders coming from a PyTorch background, the quantization step is the most surprising friction. You cannot just pip install the model. The Hailo SDK toolchain takes a trained model in ONNX or TFLite form and compiles it through a quantization-aware step into a Hailo-specific binary. The Model Zoo provides pre-compiled binaries that bypass this work for popular architectures.

Prefill vs streaming: workload patterns

For streaming workloads (live cameras, audio streams), the HAT acts as a continuous inference engine - frames in, predictions out, sub-100 ms latency. For prefill-heavy batch workloads (process 10,000 images once), the bottleneck is usually disk I/O from the SSD to the Pi's USB 3.0 bus, not the HAT itself.
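For batch jobs, a pipelined run moves at the speed of its slowest stage. The sketch below makes that explicit; the storage and preprocessing rates are placeholders to be replaced with measurements from your own rig, and only the NPU figure comes from the ResNet-50 throughput quoted in this article:

```python
def batch_seconds(num_items: int, stage_rates: dict) -> tuple:
    """Time for a pipelined batch job and the name of the slowest stage.

    stage_rates maps stage name -> sustained items per second.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return num_items / stage_rates[bottleneck], bottleneck

# Hypothetical rates for a 10,000-image classification job.
rates = {
    "ssd_read_over_usb3": 250.0,   # placeholder: images/s the disk path delivers
    "cpu_preprocess": 400.0,       # placeholder: images/s the Pi 5 cores can decode
    "npu_resnet50": 600.0,         # images/s, ResNet-50 figure from this article
}
seconds, stage = batch_seconds(10_000, rates)
print(f"{seconds:.0f} s, limited by {stage}")
```

Under these placeholder rates the NPU waits on storage for more than half of the run, which is the pattern described above.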

For builders processing offline corpora, an upgrade from the budget BX500 SATA SSD to a faster NVMe drive via a different HAT can roughly double end-to-end batch throughput. The catch is that the PCIe lane is shared - dedicating it to NVMe means losing the AI HAT+. Real builders running both buy a stacked HAT solution from a third party.

Context-length impact: not applicable here

Vision and small audio workloads do not have a "context length" in the LLM sense. For builders trying to push small generative LLMs to the HAT, context length is sharply bounded by the Hailo runtime's memory model - typically 1K-2K tokens. The Pi 5's RAM is the limit, not the HAT's compute.

Local vs cloud for edge AI

For real-time vision tasks, edge wins decisively on latency and cost:

| Dimension | Pi 5 + AI HAT+ | Cloud vision API |
| --- | --- | --- |
| Per-frame latency | sub-50 ms | round-trip + provider latency, 200-500 ms typical |
| Per-inference cost | electricity (~$0.00001) | $0.0001-$0.0015 per call |
| Network dependency | none | always |
| Privacy | full | provider-dependent |
| Model flexibility | Hailo Model Zoo set | any cloud model |

For sensor analytics, multi-camera home or shop monitoring, and any workload where latency to first prediction matters, edge wins clearly. Cloud still wins for one-off batch analysis where you need flexibility on model choice.
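The cost argument can be turned into a break-even count: how many inferences it takes before the hardware pays for itself. A sketch using figures from this article (a roughly $265 kit, ~$0.00001 per edge inference, and the cheapest quoted cloud price of $0.0001 per call); the function name is illustrative and the result ignores setup time and maintenance:

```python
def break_even_calls(hardware_usd: float, cloud_usd_per_call: float,
                     edge_usd_per_call: float) -> float:
    """Inferences after which owning the edge rig is cheaper than paying per call."""
    saving = cloud_usd_per_call - edge_usd_per_call
    if saving <= 0:
        raise ValueError("cloud must cost more per call for a break-even to exist")
    return hardware_usd / saving

calls = break_even_calls(265, 0.0001, 0.00001)
hours_at_30fps = calls / 30 / 3600
print(f"{calls:,.0f} calls, about {hours_at_30fps:.0f} h of one 30 FPS stream")
```

Even against the cheapest cloud price, a single always-on 30 FPS camera reaches break-even in a little over a day of footage, which is why continuous video is the clearest case for edge inference.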

Performance per dollar and per watt

The AI HAT+ adds roughly 4-6 W under sustained load to the Pi 5's baseline. The complete rig draws 10-14 W typical and peaks near 18 W during simultaneous CPU and NPU load. Compared to a Jetson Orin Nano dev kit at 15 W peak, the perf-per-watt picture is roughly equivalent at this workload class.
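Perf-per-watt is easiest to compare as energy per frame. A small sketch using the midpoint of the typical draw quoted above (12 W) and the ~85 FPS YOLOv8m figure from this article; comparable Jetson numbers are not given here, so plug in your own measurements for that side:

```python
def joules_per_frame(watts: float, fps: float) -> float:
    """Energy spent per inference at a sustained power draw and frame rate."""
    return watts / fps

def kwh_per_day(watts: float) -> float:
    """Energy used by a rig running 24 hours at a constant draw."""
    return watts * 24 / 1000

# 12 W: midpoint of the 10-14 W typical range; 85 FPS: YOLOv8m figure.
print(f"{joules_per_frame(12, 85):.3f} J/frame")
print(f"{kwh_per_day(12):.3f} kWh/day")
```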

Where the AI HAT+ wins is initial cost. A complete Pi 5 + HAT + storage + camera setup lands near $265 versus $450-550 for a Jetson Orin Nano kit with comparable storage and camera. For builders deploying multiple identical units (small fleet of edge cameras), the cost delta is the deciding factor.

Common pitfalls

  • Expecting drop-in PyTorch compatibility. The Hailo toolchain compiles models. Plan for the compile step or stick to Model Zoo binaries.
  • Pairing the HAT with a Pi 4. The Pi 4 lacks the PCIe interface the HAT needs. The HAT requires a Pi 5.
  • Trying to also run NVMe storage off the same PCIe lane. Pick one, or buy a stacked-HAT solution.
  • Skipping the active cooler. The Pi 5 throttles under sustained CPU load. Add the official cooler before you deploy.
  • Loading too many concurrent camera streams. The Pi's USB 3.0 bus bandwidth becomes the bottleneck at four cameras for typical resolutions.
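The throttling pitfall is easy to check on a deployed unit. Raspberry Pi OS ships `vcgencmd get_throttled`, which prints a hex bitmask such as `throttled=0x50005`. The sketch below decodes that mask; the flag wording and function names are my own, and `read_throttled` only works on a Pi with `vcgencmd` installed:

```python
import subprocess

# Bit positions reported by `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage now",
    1: "ARM frequency capped now",
    2: "throttled now",
    3: "soft temperature limit now",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}

def decode_throttled(line: str) -> list:
    """Decode a line such as 'throttled=0x50005' into readable flags."""
    value = int(line.strip().split("=", 1)[1], 16)
    return [name for bit, name in FLAGS.items() if value & (1 << bit)]

def read_throttled() -> list:
    """Query the firmware; requires a Raspberry Pi with vcgencmd available."""
    out = subprocess.run(["vcgencmd", "get_throttled"],
                         capture_output=True, text=True, check=True).stdout
    return decode_throttled(out)

print(decode_throttled("throttled=0x50005"))
```

An empty list after a sustained inference run is a reasonable sign that cooling and power are adequate; any "has occurred" flag means the rig throttled at some point since boot.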

When the AI HAT+ is the wrong choice

Pass on the HAT if your workload is not vision or audio inference. For general-purpose Linux servers, NAS work, or any project that does not have an AI inference step on the critical path, you are paying $70 for silicon that will sit idle. For full generative LLM work, the HAT does not help - the Pi 5 CPU runs llama.cpp adequately but the HAT does not accelerate it.

Bottom line

The Pi AI HAT+ at 26 TOPS turns a Pi 5 into a credible edge-AI inference appliance for vision and audio tasks at a fraction of the cost of a Jetson kit. Pair it with a Pi 5 (the only board that hosts it), a reliable 1 TB SSD, and a decent camera, and the complete kit lands near $265. Compile your models through the Hailo SDK or pick from the Model Zoo, and the HAT delivers what the spec sheet promises. Stay on a Pi Zero class board if your project does not need NPU acceleration in the first place.

Citations and sources

This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported.

SpecPicks earns a commission on qualifying purchases through both Amazon and eBay affiliate links. Prices and stock update independently.

Frequently asked questions

What does 26 TOPS actually mean for real workloads?
TOPS is a peak theoretical throughput number for int8 operations. Real-world model performance lands at a fraction of peak because models are memory-bound, not compute-bound, on most edge NPUs. For the AI HAT+ specifically, expect roughly 40-100 percent of peak depending on the model architecture - vision CNNs hit higher utilization than transformer encoders or generative LLMs.
Can I run a real LLM on the AI HAT+?
Yes, but with significant caveats. The Hailo toolchain supports specific compiled model architectures rather than running arbitrary Hugging Face checkpoints. Sub-3B encoder models and small generative models that have been ported to the Hailo runtime work. General Llama-3-8B-class chat models do not run on the HAT directly today; they run on the Pi 5 CPU with the HAT idle.
Does the AI HAT+ work on the Pi 4?
Officially no. The HAT requires the external PCIe connector that the Pi 5 added; the Pi 4 Model B has no user-accessible PCIe connector. The [Pi 4 8GB](/product/B0899VXM8F?tag=specpicks-articles-20) is still a great general-purpose board but cannot host the AI HAT+ without the kind of unsupported adapters that defeat the point of using official Pi hardware.
What is the right SSD to pair with a Pi 5 and AI HAT+?
A SATA SSD via USB 3.0 like the [Crucial BX500 1TB](/product/B07YD579WM?tag=specpicks-articles-20) is the simplest pairing - it leaves the PCIe lane available for the AI HAT+ rather than spending it on an M.2 NVMe HAT. For builders who want both NVMe storage and the AI HAT+, third-party stacked-HAT solutions exist but add physical complexity to the build.
How does this compare to a Jetson Orin Nano?
The Jetson Orin Nano delivers higher raw throughput (40 TOPS vs 26 TOPS) and natively runs PyTorch and CUDA, so model porting is dramatically easier. The AI HAT+ wins on price (about half the Jetson's cost when paired with a Pi 5), while power draw is roughly comparable: 10-14 W typical for the full Pi rig vs 15 W peak for the Jetson. For straightforward classification and detection workloads, the AI HAT+ is the better value.

— SpecPicks Editorial · Last verified 2026-06-10
