What does the Raspberry Pi AI HAT+ at 26 TOPS actually run in production? Per the official Hailo-8L module page, the HAT pairs a Hailo NPU with a Pi 5 over PCIe 2.0 x1 and delivers inference throughput well above what the Pi's Cortex-A76 cores can manage on their own. The catch: it runs only models that have been compiled for the Hailo runtime, not arbitrary Hugging Face checkpoints.
What is actually on the board
The AI HAT+ is a small PCB that mounts on the Pi 5 and connects through its PCIe FFC connector. The compute silicon is an int8-focused Hailo inference NPU, and the TOPS figure identifies the board variant rather than a power mode: the 13 TOPS board carries the Hailo-8L and the 26 TOPS board carries the Hailo-8. Both rely on the same compile-then-deploy Hailo toolchain. The HAT itself includes nothing else of consequence - it is a clean coprocessor with the Pi providing CPU, RAM, networking, and storage.
The communication path matters: PCIe 2.0 x1 delivers roughly 500 MB/s of bandwidth between the Pi 5 SoC and the Hailo accelerator. That is fast enough for streaming-vision workloads but is the bottleneck for any model whose weights need to round-trip from system RAM.
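The 500 MB/s figure follows from the link's signalling rate, and the same arithmetic shows how little of it a single camera stream needs. The sketch below assumes uncompressed 8-bit RGB frames and ignores protocol overhead beyond 8b/10b line coding; the function names are illustrative only.

```python
# Back-of-envelope PCIe 2.0 x1 budget for streaming frames to the NPU.
# Assumptions: uncompressed 8-bit RGB frames; only 8b/10b line coding
# is counted as overhead (packet headers and flow control are ignored).

def pcie_gen2_x1_bytes_per_s() -> float:
    transfers_per_s = 5e9         # PCIe 2.0 signals at 5 GT/s per lane
    encoding_efficiency = 8 / 10  # 8b/10b: 8 payload bits per 10 line bits
    return transfers_per_s * encoding_efficiency / 8  # bits -> bytes


def frame_stream_bytes_per_s(width: int, height: int, fps: int, channels: int = 3) -> int:
    return width * height * channels * fps


link = pcie_gen2_x1_bytes_per_s()
stream = frame_stream_bytes_per_s(640, 640, 30)
print(f"link:   {link / 1e6:.0f} MB/s")
print(f"stream: {stream / 1e6:.1f} MB/s ({stream / link:.1%} of link)")
```

A 640x640 stream at 30 FPS works out to roughly 37 MB/s, under a tenth of the link, which is why streaming vision fits comfortably while repeatedly moving large weight tensors does not.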
Key takeaways
- 26 TOPS is a peak figure; real workload throughput lands at 40-100 percent of peak depending on model.
- Vision CNNs hit highest utilization - YOLO and ResNet families are the canonical fits.
- LLMs run only via Hailo's compiled runtime, not as drop-in PyTorch models.
- The HAT plus Pi 5 plus a SATA SSD lands near $235 before a camera (about $265 with one) - roughly half the cost of a Jetson Orin Nano build.
- The Pi 5 exposes a single PCIe lane - NVMe storage and the AI HAT+ cannot share it without a stacked-HAT solution.
What it actually runs: model coverage
The Hailo-8L's practical workload coverage comes from the Hailo Model Zoo - a curated set of pre-compiled models that ship with documented benchmarks. The official numbers separate cleanly into three buckets.
Bucket 1: Vision detection and classification (the strong fit)
This is where the Hailo-8L shines. Real-world community measurements consistently land near published numbers.
| Model | Task | Input resolution | Throughput |
|---|---|---|---|
| YOLOv8s | object detection | 640x640 | 200+ FPS |
| YOLOv8m | object detection | 640x640 | ~85 FPS |
| ResNet-50 | classification | 224x224 | ~600 FPS |
| EfficientNet-B0 | classification | 224x224 | 1,000+ FPS |
| MobileNet-V2 | classification | 224x224 | 1,400+ FPS |
| OpenPose | pose estimation | 256x256 | ~120 FPS |
The interesting takeaway: even mid-sized vision models hit 60+ FPS comfortably. That covers real-time video analytics on a single camera stream with headroom for two or three concurrent cameras.
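The camera headroom claim can be checked with a rough division of model throughput by per-camera frame rate. This is a sketch only: the throughput values are copied from the table above, while the 30 FPS camera rate and the 80 percent utilization margin are assumptions of this example, not Hailo figures.

```python
# Model throughput figures taken from the table above (FPS at 640x640).
MODEL_FPS = {"YOLOv8s": 200, "YOLOv8m": 85}


def max_cameras(model_fps: float, camera_fps: float, utilization: float = 0.8) -> int:
    """Cameras that fit if every frame is inferred, keeping some headroom.

    `utilization` is an assumed safety margin, not a vendor number.
    """
    return int(model_fps * utilization // camera_fps)


for name, fps in MODEL_FPS.items():
    print(f"{name}: up to {max_cameras(fps, camera_fps=30)} cameras at 30 FPS")
```

Under these assumptions YOLOv8m supports two 30 FPS cameras; a third fits only if frames are skipped or the camera rate is lowered.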
Bucket 2: Audio and signal processing (decent fit)
Speech recognition and small audio classification models port cleanly. Wav2Vec2-Base for speech-to-text runs at roughly 5-10x real-time on short utterances. Audio event classification models like YAMNet run at hundreds of inferences per second. The fit is solid but less remarkable than vision because audio workloads have lower compute density.
Bucket 3: Small LLMs (the weak fit, with caveats)
Hailo publishes a small-LLM example for the Hailo-10H (the bigger sibling chip) but the Hailo-8L's LLM story is limited. Sub-1B parameter encoder models port; modern 3B+ generative chat models do not run on the HAT directly. For LLM work on a Pi 5, the HAT mostly sits idle while llama.cpp runs on the CPU.
Hardware build: the rig around the HAT
The complete kit looks like this:
- Pi 5 (8 GB) board - ~$80
- Official Raspberry Pi AI HAT+ (26 TOPS variant) - ~$70
- Active cooler or official Pi 5 fan - ~$10
- 1 TB SATA SSD on USB 3.0 - the Crucial BX500 1TB at ~$60
- Official 27 W USB-C power supply - ~$15
- USB camera or Pi Camera Module 3 - ~$25-35
Total before camera: ~$235. Add ~$30 for a decent camera and the kit lands near $265.
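For anyone adjusting the parts list, the totals are simple to recompute. The prices below are the approximate figures from the list above; the $30 camera is the midpoint this article uses.

```python
# Approximate prices (USD) copied from the parts list above.
KIT = {
    "Pi 5 (8 GB)": 80,
    "AI HAT+ (26 TOPS)": 70,
    "Active cooler": 10,
    "1 TB SATA SSD": 60,
    "27 W USB-C power supply": 15,
}
CAMERA = 30  # midpoint of the ~$25-35 camera range

base = sum(KIT.values())
print(f"before camera: ${base}")
print(f"with camera:   ${base + CAMERA}")
```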
For builders working on tiny portable inference rigs rather than tethered ones, swap the Pi 5 + HAT for a Pi Zero W-based setup - lower power, smaller footprint, no NPU but adequate for non-real-time workloads. The Pi 4 8 GB cannot host the HAT but remains the right pick for builds that need a battle-tested non-AI Linux board.
Quantization and model precision
The Hailo-8L is an int8-focused accelerator. Models are compiled to int8 (and sparingly int4 for specific layers) by the Hailo SDK before deployment. Precision tradeoffs are typically minor for mature vision models - ResNet-50 loses under 0.5 percent top-1 accuracy from fp32 to Hailo int8 in published benchmarks. Newer transformer-based vision models lose more (1-3 percent typical) and are still being improved by Hailo's compiler team.
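To make the precision tradeoff concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. It is a generic textbook illustration, not the Hailo compiler's actual calibration algorithm, and the sample weights are made up for the example.

```python
# Symmetric per-tensor int8 quantization: generic illustration only,
# NOT the Hailo SDK's calibration procedure.

def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0  # one float step per int8 step
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]


weights = [0.823, -0.414, 0.056, -1.27, 0.0]  # hypothetical sample weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.4f}", f"max_err={max_err:.4f}")
```

The round-trip error per value is bounded by half the scale, which is why well-conditioned CNN weights survive int8 with little accuracy loss while layers with a few large outliers fare worse.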
For builders coming from a PyTorch background, the quantization step is the most surprising friction. You cannot just pip install the model. The Hailo SDK toolchain takes a trained model in ONNX or TFLite form and compiles it through a quantization-aware step into a Hailo-specific binary. The Model Zoo provides pre-compiled binaries that bypass this work for popular architectures.
Batch vs streaming: workload patterns
For streaming workloads (live cameras, audio streams), the HAT acts as a continuous inference engine - frames in, predictions out, sub-100 ms latency. For batch workloads (process 10,000 images once), the bottleneck is usually disk I/O between the SSD and the Pi over USB 3.0, not the HAT itself.
For builders processing offline corpora, moving from the budget BX500 SATA SSD to a faster NVMe drive on a separate HAT can roughly double end-to-end batch throughput. The catch is that the Pi 5's single PCIe lane is shared - dedicating it to NVMe means losing the AI HAT+. Builders who need both use a third-party stacked-HAT solution.
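The storage-bound behaviour of offline jobs can be reasoned about with a two-stage pipeline model in which the slower stage sets the pace. The read rates and image size below are placeholder values chosen for illustration, not measurements; substitute numbers from your own drive and dataset.

```python
def batch_images_per_s(read_mb_per_s: float, image_mb: float, npu_fps: float) -> tuple[float, str]:
    """Steady-state rate of a read -> infer pipeline: the slowest stage wins."""
    disk_fps = read_mb_per_s / image_mb
    if disk_fps < npu_fps:
        return disk_fps, "storage"
    return npu_fps, "npu"


# Placeholder inputs (assumptions, not benchmarks):
#   40 MB/s effective small-file read rate, 0.25 MB per image,
#   600 FPS NPU throughput (the ResNet-50 figure quoted earlier).
rate, limit = batch_images_per_s(read_mb_per_s=40, image_mb=0.25, npu_fps=600)
print(f"{rate:.0f} images/s, limited by {limit}")

# Doubling the effective read rate doubles throughput while storage-bound.
rate2, limit2 = batch_images_per_s(read_mb_per_s=80, image_mb=0.25, npu_fps=600)
print(f"{rate2:.0f} images/s, limited by {limit2}")
```

The model ignores CPU-side decode and preprocessing, which can become the real limit once storage is fast enough.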
Context-length impact: not applicable here
Vision and small audio workloads do not have a "context length" in the LLM sense. For builders trying to push small generative LLMs to the HAT, context length is sharply bounded by the Hailo runtime's memory model - typically 1K-2K tokens. The limit is memory, not the HAT's compute.
Local vs cloud for edge AI
For real-time vision tasks, edge wins decisively on latency and cost:
| Dimension | Pi 5 + AI HAT+ | Cloud vision API |
|---|---|---|
| Per-frame latency | sub-50 ms | round-trip + provider latency, 200-500 ms typical |
| Per-inference cost | electricity (under ~$0.00001) | $0.0001-$0.0015 per call |
| Network dependency | none | always |
| Privacy | full | provider-dependent |
| Model flexibility | Hailo Model Zoo set | any cloud model |
For sensor analytics, multi-camera home or shop monitoring, and any workload where latency to first prediction matters, edge wins clearly. Cloud still wins for one-off batch analysis where you need flexibility on model choice.
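The electricity figure in the table can be sanity-checked from the rig's power draw. The sketch below uses 14 W (the upper end of the typical draw quoted in this article) and an assumed electricity price of $0.20 per kWh; both the price and the inference rates are assumptions of this example.

```python
def electricity_cost_per_inference(watts: float, usd_per_kwh: float, inferences_per_s: float) -> float:
    """USD of electricity consumed per inference at a steady rate."""
    usd_per_hour = watts / 1000 * usd_per_kwh
    return usd_per_hour / (inferences_per_s * 3600)


RIG_WATTS = 14       # upper end of the typical draw quoted in this article
USD_PER_KWH = 0.20   # assumed tariff; adjust for your region

for rate in (1, 30):
    cost = electricity_cost_per_inference(RIG_WATTS, USD_PER_KWH, rate)
    print(f"{rate:>2} inferences/s -> ${cost:.2e} per inference")
```

Under these assumptions both rates come out well below the rounded ~$0.00001 in the table, so that figure is best read as a conservative ceiling.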
Performance per dollar and per watt
The AI HAT+ adds roughly 4-6 W under sustained load to the Pi 5's baseline. The complete rig draws 10-14 W typical and peaks near 18 W during simultaneous CPU and NPU load. Compared to a Jetson Orin Nano dev kit at 15 W peak, the perf-per-watt picture is roughly equivalent at this workload class.
Where the AI HAT+ wins is initial cost. A complete Pi 5 + HAT + storage + camera setup lands near $265 versus $450-550 for a Jetson Orin Nano kit with comparable storage and camera. For builders deploying multiple identical units (small fleet of edge cameras), the cost delta is the deciding factor.
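For a fleet, the per-unit delta scales linearly. A quick sketch using the kit prices quoted in this article (the fleet size is an arbitrary example):

```python
PI_KIT_USD = 265             # Pi 5 + HAT + storage + camera, from this article
JETSON_KIT_USD = (450, 550)  # low and high ends of the range quoted above


def fleet_savings(units: int) -> tuple[int, int]:
    """Low and high estimates of upfront savings for `units` identical rigs."""
    low, high = JETSON_KIT_USD
    return units * (low - PI_KIT_USD), units * (high - PI_KIT_USD)


low, high = fleet_savings(10)
print(f"10 units: ${low}-${high} saved upfront")
```

This counts hardware purchase price only; it leaves out engineering time for the Hailo compile step and any difference in running costs.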
Common pitfalls
- Expecting drop-in PyTorch compatibility. The Hailo toolchain compiles models. Plan for the compile step or stick to Model Zoo binaries.
- Pairing the HAT with a Pi 4. The Pi 4 does not expose the PCIe connector the HAT needs. The HAT requires a Pi 5.
- Trying to also run NVMe storage off the same PCIe lane. Pick one, or buy a stacked-HAT solution.
- Skipping the active cooler. The Pi 5 throttles under sustained CPU load. Add the official cooler before you deploy.
- Loading too many concurrent camera streams. The Pi's USB 3.0 bus bandwidth becomes the bottleneck at four cameras for typical resolutions.
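The four-camera limit in the last pitfall is consistent with a rough bandwidth estimate for uncompressed video. The sketch assumes 1080p30 cameras sending YUY2 (2 bytes per pixel) and about 400 MB/s of usable throughput on a 5 Gbit/s USB 3.0 link; both are assumptions of this example, and cameras that send MJPEG or H.264 need far less.

```python
USB3_USABLE_BYTES_PER_S = 400e6  # assumed practical throughput of a 5 Gbit/s link


def camera_bytes_per_s(width: int, height: int, fps: int, bytes_per_pixel: int = 2) -> int:
    # 2 bytes per pixel models uncompressed YUY2 video
    return width * height * fps * bytes_per_pixel


def bus_utilization(n_cameras: int, width: int = 1920, height: int = 1080, fps: int = 30) -> float:
    return n_cameras * camera_bytes_per_s(width, height, fps) / USB3_USABLE_BYTES_PER_S


for n in range(1, 5):
    print(f"{n} camera(s): {bus_utilization(n):.0%} of usable USB 3.0 bandwidth")
```

With these inputs three cameras fit and the fourth pushes the estimate past 100 percent, so lowering resolution, frame rate, or switching to a compressed format is the way to scale further.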
When the AI HAT+ is the wrong choice
Pass on the HAT if your workload is not vision or audio inference. For general-purpose Linux servers, NAS work, or any project that does not have an AI inference step on the critical path, you are paying $70 for silicon that will sit idle. For full generative LLM work the HAT adds nothing - llama.cpp runs adequately on the Pi 5 CPU, and the HAT does not accelerate it.
Bottom line
The Pi AI HAT+ at 26 TOPS turns a Pi 5 into a credible edge-AI inference appliance for vision and audio tasks at a fraction of the cost of a Jetson kit. Pair it with a Pi 5 board (the HAT does not work with earlier boards), a reliable 1 TB SSD, and a decent camera, and the complete kit lands near $265. Compile your models through the Hailo SDK or pick from the Model Zoo, and the HAT delivers what the spec sheet promises. Stay on a Pi Zero class board if your project does not need NPU acceleration in the first place.
Citations and sources
- Raspberry Pi - AI HAT+ official product page - official specifications and pricing.
- Hailo - Hailo-8L AI Acceleration Module - canonical silicon specifications and TOPS rating.
- Hailo Model Zoo on GitHub - published model coverage and benchmark numbers.
This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported.
