'Count Anything' AI Model Does Object Counting — Harder Than It Sounds

Generalist counting on a Raspberry Pi now rivals per-class trained models

By Mike Perry · Published 2026-06-13 · Last verified 2026-07-24 · 9 min read

A generalist counting model that runs at 2–4 FPS on a Raspberry Pi 4 reportedly beats prior counting baselines on FSC-147, and its lead widens on long-tail categories where earlier models needed per-class training.

"Count Anything" is a generalist vision model trained specifically to count discrete objects in an image — cells under a microscope, cars in a parking lot, screws in a tray — without per-class finetuning. Early benchmarks show it beating prior counting-specific models by 8–15 percentage points on common datasets while running comfortably on a Raspberry Pi 4 8GB at 2–4 FPS, which is more than fast enough for the edge inspection use cases counting actually serves.

Why counting is harder than it looks

Image classification asks "is this a cat?" Object detection asks "where are the cats?" Counting asks "how many cats?", and that question hides three nested hard problems: the model has to recognize which objects are relevant, find every one of them without missing any, and avoid counting any of them twice. On a wide-field shot of a parking lot, a flock of birds, or a pipetted well plate, these failure modes compound, and unlike classification, counting can't degrade gracefully. A count that's off by 20% is just wrong.

The classical pipeline (detect-then-tally) breaks down at density. Detection models trained on COCO or similar datasets typically max out at a few dozen objects per image; ask one to count 400 cells in a microscope image and you'll get back 80 boxes and a lot of overlap. Density-estimation models — which predict a heatmap of object centers and integrate over it — solve the density problem but lose the ability to identify what they're counting, so you typically need a separate model per object class.
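To make the contrast concrete, here is a toy sketch in plain Python. It is not Count Anything's actual head. Each object becomes a small Gaussian blob normalized to sum to 1, so integrating (summing) the density map recovers the count even when objects overlap. A naive peak-picking tally, standing in for detect-then-tally, merges a tight cluster into a single detection. The grid size, `sigma`, and the 0.01 peak threshold are arbitrary illustrative choices.

```python
import math


def gaussian_density_map(points, height, width, sigma=2.0):
    """Toy density map: each (row, col) point adds a Gaussian blob that sums to 1."""
    density = [[0.0] * width for _ in range(height)]
    for cy, cx in points:
        kernel = [
            [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma**2)) for x in range(width)]
            for y in range(height)
        ]
        # Renormalize so each object contributes exactly 1, even near the image border.
        total = sum(sum(row) for row in kernel)
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


def count_from_density(density):
    """Density-estimation count: integrate (sum) the map over the whole image."""
    return sum(sum(row) for row in density)


def count_peaks(density, threshold=0.01):
    """Naive detect-then-tally stand-in: count strict local maxima above a threshold."""
    h, w = len(density), len(density[0])
    peaks = 0
    for y in range(h):
        for x in range(w):
            v = density[y][x]
            if v < threshold:
                continue
            neighbours = [
                density[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v > n for n in neighbours):
                peaks += 1
    return peaks


if __name__ == "__main__":
    # Three tightly packed objects plus one isolated object on a 32x32 grid.
    objects = [(5, 5), (5, 6), (6, 5), (20, 20)]
    dmap = gaussian_density_map(objects, 32, 32)
    print(f"density count: {count_from_density(dmap):.2f}")  # 4.00
    print(f"peak count: {count_peaks(dmap)}")  # 2: the cluster merges into one peak
```

The density sum stays at 4 no matter how tightly the first three objects overlap. The peak tally only sees the cluster as one object, which is the same undercounting a box detector shows on dense scenes.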

Count Anything reframes the task. The model takes an image plus a short text description ("count the cars") and outputs a count. Internally it combines vision-language alignment with a learned density-aggregation head, and it inherits enough open-world vision capability from its pretraining backbone to count categories it never saw during counting-specific training.

What "harder than it sounds" means in benchmarks

On the FSC-147 counting benchmark, the academic standard (results are conventionally reported as mean absolute error and RMSE, where lower is better), Count Anything reportedly beats prior class-agnostic counting models such as CounTR and BMNet+ variants by margins clear enough to see without squinting at noisy benchmark tables. On long-tail categories (counting bees in a hive, weeds in a field, defects on a manufactured part) the model's lead is even larger, because the prior generation needed per-class adaptation and Count Anything doesn't.

The catch is the same catch every "anything" model has: zero-shot performance is solid but not state of the art compared with a model finetuned for your specific count. If you have a labeled corpus of your exact use case, a specialized model still wins by a few points. The practical question is whether the operational savings of "one model, many tasks" outweigh the accuracy gap of "many models, one task each." For most edge applications, where the operator is a small team without ML staff, the generalist wins easily.

Can it run on edge hardware?

The published checkpoint runs at usable speeds on hardware most makers already own. A Raspberry Pi 4 Model B 8GB pushes about 2–4 FPS at 512×512 input resolution with the model running on the CPU via ONNX Runtime. Reports of around 0.5 FPS on a Raspberry Pi Zero W at smaller input resolutions deserve skepticism: the Zero W has only 512 MB of RAM and a single-core ARMv6 CPU, so it cannot load the full 800 MB checkpoint at all. It is better used as a camera node that ships frames to a Pi 4.

For workloads where 2 FPS isn't enough (say, a manufacturing inspection station that has to capture several frames per second as parts move down a conveyor belt), the realistic upgrade path is a small mini-PC with a Samsung 870 EVO SATA SSD for the model cache and either an NVIDIA Jetson or an x86 CPU with AVX-512. The model isn't VRAM-hungry; it's compute-hungry on integer-quantized math, which means a strong CPU with vector instructions can keep pace with a small GPU.
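A quick way to decide whether a board is fast enough is to turn belt speed into a frame-rate requirement. The sketch below assumes a hypothetical station where each frame covers a fixed length of belt and every part should appear in at least `frames_per_part` frames. The belt speed, field of view, and 70% headroom factor are illustrative numbers, not measurements. The 3 and 7 FPS inputs are picked from inside the article's 2–4 and 6–8 FPS ranges.

```python
def required_fps(belt_speed_m_s: float, field_of_view_m: float, frames_per_part: int = 1) -> float:
    """Frames per second needed so every part is captured at least frames_per_part times."""
    # A point on the belt stays in view for field_of_view / belt_speed seconds.
    seconds_in_view = field_of_view_m / belt_speed_m_s
    return frames_per_part / seconds_in_view


def board_keeps_up(board_fps: float, needed_fps: float, headroom: float = 0.7) -> bool:
    """Only rely on a fraction of the benchmark FPS (thermals, capture, and I/O eat the rest)."""
    return board_fps * headroom >= needed_fps


if __name__ == "__main__":
    need = required_fps(belt_speed_m_s=0.6, field_of_view_m=0.3, frames_per_part=2)
    print(f"need {need:.1f} FPS")                          # need 4.0 FPS
    print("Pi 4, FP32 (3 FPS):", board_keeps_up(3, need))  # False
    print("Pi 4, INT8 (7 FPS):", board_keeps_up(7, need))  # True
```

Under these assumptions the stock checkpoint falls short and the INT8 build clears the bar with a little margin, which is the kind of borderline case where a faster CPU or a Jetson starts to pay off.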

The published checkpoint is around 800 MB — small enough to fit on the Pi's SD card with room for the OS, an inference runtime, and a small image cache. ONNX export and an INT8 quantization pass drop the size to roughly 250 MB and roughly double the FPS at a small accuracy cost.
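The size figures are easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes the 800 MB checkpoint is almost entirely FP32 weights at 4 bytes each. This is an assumption, because the article gives no parameter count. INT8 stores 1 byte per weight, so a pure INT8 model would come to about 200 MB. The roughly 250 MB figure is consistent with a small share of tensors being left at FP32.

```python
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}
MB = 1_000_000


def params_from_size(size_mb: float, dtype: str = "fp32") -> float:
    """Approximate parameter count, assuming the file is almost all weights."""
    return size_mb * MB / BYTES_PER_WEIGHT[dtype]


def quantized_size_mb(num_params: float, fp32_fraction: float = 0.0) -> float:
    """INT8 model size when a fraction of the weights stays at FP32."""
    bytes_per_param = (
        fp32_fraction * BYTES_PER_WEIGHT["fp32"] + (1 - fp32_fraction) * BYTES_PER_WEIGHT["int8"]
    )
    return num_params * bytes_per_param / MB


if __name__ == "__main__":
    n = params_from_size(800)
    print(f"{n / 1e6:.0f}M params")                    # 200M params
    print(f"{quantized_size_mb(n):.0f} MB all-INT8")   # 200 MB
    print(f"{quantized_size_mb(n, 1 / 12):.0f} MB with 1/12 of weights at FP32")  # 250 MB
```

The same arithmetic explains why the full checkpoint cannot fit in a Pi Zero W's 512 MB of RAM, while the Pi 4's 8 GB leaves plenty of room.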

Real-world use cases that suddenly got cheaper

The interesting thing about Count Anything isn't the benchmark numbers — it's the workflow they enable. A few categories of work that previously required either ML staff or expensive vertical SaaS are now reachable for a one-person operation with a Pi and a USB camera:

Beekeepers counting bees on a frame for hive health monitoring.
Farmers counting livestock or weed density in field images from a drone or a fixed pole-mount camera.
Small manufacturing shops counting defects on a conveyor belt without buying into a vertical inspection platform.
Research labs counting cells, colonies, or organisms under a microscope without a custom-trained classifier.
Retail inventory counts from a shelf-mounted camera, particularly for high-density bins.
Wildlife researchers counting individuals in camera-trap imagery.

In each case the prior workflow involved either manual counting (slow, inconsistent) or training a custom model (expensive, brittle). Count Anything sits in the middle — slightly less accurate than a custom-trained model on a known class, dramatically more flexible.

How accurate is "good enough" for these tasks?

The answer depends on the use case's tolerance for error. A few rough heuristics:

Use case	Acceptable mean count error	Count Anything sits at
Bee count on a frame	±10%	±6–9%
Cars in a lot snapshot	±5%	±4–7%
Cells under microscope	±3%	±5–11%
Conveyor defect tally	±1%	not yet
Inventory shelf count	±5%	±4–8%

For the strictest use cases, such as conveyor defect tallies where a missed defect ships to a customer, or cell counts that need ±3%, Count Anything is not yet the right answer. For everything else it's at least a workable baseline you can deploy in a day rather than a quarter.
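Before trusting the table for your own deployment, measure the error on a small hand-counted validation set. The sketch below is a minimal version. It assumes "mean count error" means the mean absolute error as a percentage of the true count, which is one reasonable reading that the article does not define. The dictionary keys and sample counts are hypothetical.

```python
# Tolerances from the table above, as fractions (key names are illustrative).
TOLERANCE = {
    "bees_on_frame": 0.10,
    "cars_in_lot": 0.05,
    "cells_microscope": 0.03,
    "conveyor_defects": 0.01,
    "shelf_inventory": 0.05,
}


def mean_relative_error(pairs):
    """Mean of |predicted - true| / true over (predicted, true) pairs with true > 0."""
    errors = [abs(pred - true) / true for pred, true in pairs if true > 0]
    if not errors:
        raise ValueError("need at least one image with a nonzero hand count")
    return sum(errors) / len(errors)


def good_enough(use_case, pairs):
    """Compare the measured error with the tolerance for this use case."""
    return mean_relative_error(pairs) <= TOLERANCE[use_case]


if __name__ == "__main__":
    # (model count, hand count) for five hypothetical shelf images.
    shelf = [(96, 100), (52, 50), (205, 200), (38, 40), (120, 120)]
    print(f"mean error: {mean_relative_error(shelf):.1%}")        # 3.1%
    print("good enough:", good_enough("shelf_inventory", shelf))  # True
```

Relative error is used so that a miss of 5 on a bin of 200 and a miss of 1 on a bin of 40 carry comparable weight. Re-run the check whenever the prompt, camera, or lighting changes.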

Common pitfalls

A handful of gotchas show up consistently when teams move from notebook to production:

Camera angle drift. The model is robust to most lighting but very sensitive to perspective. A camera that gets bumped by 15 degrees can shift the count by 20% on a dense scene. Tape the camera down.
Lighting changes across the day. For outdoor use cases, a dawn shot and a noon shot have very different shadow patterns. Either calibrate per time of day or normalize exposure before inference (for example, with histogram equalization).
The "what to count" string matters. "Count the bees" produces different results from "count the honeybees." Standardize the prompt early and don't iterate on it without re-validating.
Occlusion at high density. No counting model handles fully occluded objects gracefully. If your use case routinely has objects piled on each other, expect undercounts and plan a fallback (sampling, multiple angles, periodic spread).
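Battery and thermal limits on the Pi. A Pi running ONNX inference at 4 FPS sustained will throttle in a closed enclosure without ventilation. Budget for cooling (a heatsink and fan, or a vented enclosure) and, on battery installs, for the extra power draw of sustained inference.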
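On Raspberry Pi OS you can check for both thermal throttling and under-voltage from Python. The sketch below parses the output of `vcgencmd get_throttled`, which looks like `throttled=0x50000`, into named flags using the firmware's documented bit positions. It also reads the SoC temperature from `/sys/class/thermal/thermal_zone0/temp`, which reports millidegrees Celsius. How you react to the flags, whether by pausing inference or logging a warning, is left to you.

```python
import subprocess
from pathlib import Path

# Bit positions reported by `vcgencmd get_throttled` on Raspberry Pi OS.
THROTTLE_BITS = {
    "under_voltage_now": 0,
    "freq_capped_now": 1,
    "throttled_now": 2,
    "soft_temp_limit_now": 3,
    "under_voltage_occurred": 16,
    "freq_capped_occurred": 17,
    "throttled_occurred": 18,
    "soft_temp_limit_occurred": 19,
}


def parse_throttled(output: str) -> dict:
    """Turn 'throttled=0x50005' into a dict of named boolean flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return {name: bool((value >> bit) & 1) for name, bit in THROTTLE_BITS.items()}


def read_throttled() -> dict:
    """Query the firmware (only works on a Pi with vcgencmd installed)."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
    )
    return parse_throttled(result.stdout)


def read_soc_temp_c(path: str = "/sys/class/thermal/thermal_zone0/temp") -> float:
    """The kernel reports the SoC temperature in millidegrees Celsius."""
    return int(Path(path).read_text().strip()) / 1000


if __name__ == "__main__":
    flags = read_throttled()
    if flags["under_voltage_occurred"]:
        print("Under-voltage seen since boot: check the power supply.")
    if flags["throttled_occurred"] or flags["soft_temp_limit_occurred"]:
        print(f"Thermal throttling seen; SoC is at {read_soc_temp_c():.1f} °C.")
```

The "occurred" bits stay set until reboot, so checking them once an hour catches brief throttling or brownouts that a single temperature reading would miss.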

When to NOT use Count Anything

A few clear cases where a different approach wins:

You already have 5,000+ labeled counts for one specific class and want every accuracy point — train a specialist.
The objects you're counting are extremely small relative to the image (sub-pixel) — you need an imaging upgrade, not a model upgrade.
The downstream decision is high-stakes (medical, safety) and accuracy below 1% error is required — the generalist isn't there yet.
You don't have any way to gather even a small validation set: without one, you can't verify that the model is working.

What this means for the edge AI hardware story

The broader trend Count Anything fits into is "general-purpose vision models that run on cheap hardware." A few years ago, counting objects on a Pi required a custom-trained model, a careful data pipeline, and an MLOps story. Now it requires the Pi, the model, and an afternoon. That collapses the cost of one entire category of "useful but small" applications — the ones that don't justify a startup but do justify an evening of tinkering for someone with a real problem.

For builders putting together a maker rig in 2026, the practical implication is that the Raspberry Pi 4 Model B 8GB is still the right default board for vision-edge projects. A Raspberry Pi Zero W is best kept to the lightest jobs, such as acting as a remote camera node. If you're scaling beyond a single Pi on a desk-side prototype rig, an ATX supply such as a Corsair RM650 feeding a 5V buck converter can power several boards; a single production unit only needs a proper 5V/3A supply.

What a working Count Anything edge deployment looks like

For makers who want to build with this today, the hardware loadout is small enough to fit on a desk:

Compute. A Raspberry Pi 4 Model B 8GB handles the core inference at 2–4 FPS on the published checkpoint, ~6–8 FPS after ONNX INT8 quantization. A Raspberry Pi Zero W, with only 512 MB of RAM, is better suited to capturing frames for a Pi 4 than to running the model itself. Treat reported figures of 0.5 FPS at 256×256 input with caution.
Camera. Any USB webcam or the Pi Camera Module v3. Resolution matters less than steady mounting and consistent lighting.
Power. The Pi 4 needs a real 5V/3A supply; cheap phone chargers will under-volt and cause unexplained crashes during inference. A bench supply or a Corsair RM650 PSU feeding a 5V buck converter is the robust answer for permanent installs.
Storage. The microSD card the Pi boots from is fine for the model and a small rolling image cache. For longer retention, a Samsung 870 EVO SATA SSD over a USB-to-SATA adapter holds weeks of telemetry without thrashing the SD card.
Software. Raspberry Pi OS Lite, Python 3.11, ONNX Runtime, and a tiny FastAPI server exposing the count over HTTP. Total install footprint under 4 GB.
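The "tiny server" piece can be very small. The sketch below swaps FastAPI for Python's built-in `http.server` so it runs with no extra packages. `count_objects` is a hypothetical stand-in for your capture-plus-ONNX-inference code. The `/count` endpoint, `prompt` query parameter, canonical prompt, and JSON shape are all illustrative choices. In FastAPI, the same idea becomes a single `@app.get("/count")` route.

```python
import json
import threading  # handy for running the server in the background (see usage note)
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical default: pin one prompt so counts stay comparable over time.
CANONICAL_PROMPT = "count the honeybees"


def count_objects(prompt: str) -> int:
    """Hypothetical stand-in for grabbing a frame and running the ONNX model."""
    return 42


class CountHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/count":
            self.send_error(404)
            return
        prompt = parse_qs(url.query).get("prompt", [CANONICAL_PROMPT])[0]
        body = json.dumps({"prompt": prompt, "count": count_objects(prompt)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet on a headless Pi


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    ThreadingHTTPServer((host, port), CountHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

Once the server is running, `curl "http://<pi-address>:8000/count?prompt=count%20the%20bees"` returns a small JSON object. To embed the server in a larger script, run `serve_forever` in a `threading.Thread` with `daemon=True`.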

The whole rig — Pi, camera, power, mount, enclosure — runs under $200 of parts. Compared to the typical alternative ("custom-trained model, MLOps stack, dedicated edge box") that's a transformative cost collapse for the kinds of small-scale counting jobs makers actually have.

Notes from early field deployments

A few patterns observed from the first wave of community Count Anything deployments:

Apiary monitoring. Bee counts on a frame correlate with hive health; an inexpensive Pi-based counter sampling once per hour has caught colony collapses 3–7 days earlier than visual inspection in published case studies.
Inventory at small retail. A shelf-mounted Pi running periodic counts of high-density bins (nuts, fasteners, small parts) has replaced manual stock checks in several maker-friendly hardware stores.
Lab workflow speed-up. Cell counting under inexpensive USB microscopes, a workflow that used to take a research assistant 20 minutes per slide, now takes about 30 seconds, with accuracy comparable to manual counts on the most abundant cell types. Assays that need ±3% error still call for a specialist model.
Wildlife camera-trap aggregation. Researchers running camera traps with Count Anything as a post-processing step have replaced thousands of human-counting hours per season with a few hours of model inference.
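The apiary pattern (hourly counts with an alert on a sudden drop) comes down to comparing the latest count with a robust baseline. The sketch below assumes a hypothetical 30% drop threshold and a 24-reading window, which is one day of hourly samples. Using the median keeps a single bad frame from shifting the baseline. Tune both values against your own data.

```python
from statistics import median


def sudden_drop(history, window=24, drop_threshold=0.30):
    """True if the newest count sits drop_threshold below the median of the previous window."""
    if len(history) < window + 1:
        return False  # not enough readings yet to form a baseline
    baseline = median(history[-window - 1 : -1])
    if baseline <= 0:
        return False
    return (baseline - history[-1]) / baseline >= drop_threshold


if __name__ == "__main__":
    hourly = [410, 395, 402, 420, 388, 405]
    print(sudden_drop(hourly + [250], window=6))  # True: roughly 38% below the median
    print(sudden_drop(hourly + [390], window=6))  # False
```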

What's notable about each of these is that none of them justified a full ML investment under the prior paradigm. They were always too small. Now they're each an evening of work plus the cost of a Pi.

Bottom line

Object counting was one of the few "obvious" computer vision tasks that stubbornly required custom models and labeled data into 2025. Count Anything is the model that finally generalizes it. Accuracy is competitive with specialized models on most categories, edge performance is workable on hardware most makers already own, and the time from "I have a problem" to "I have a working count" collapses from weeks to hours. It's not perfect for high-stakes work — but for the long tail of "I just need to know how many" problems, it's the answer for now. See Microsoft Research for the published technical write-up and Hugging Face for community checkpoints and ONNX-export recipes; the Raspberry Pi documentation covers the hardware setup.

SpecPicks earns a commission on qualifying purchases through both Amazon and eBay affiliate links. Prices and stock update independently.

Frequently asked questions

Why is object counting hard for AI models?

Counting requires the model to detect, separate, and tally many similar instances without double-counting or missing occluded ones, which is different from simply recognizing that an object class is present. Dense scenes, overlapping objects, and scale variation all degrade naive detectors. A dedicated counting model has to reason about quantity, not just identity, which is why doing this across arbitrary categories is a notable result.

Can Count Anything run on a Raspberry Pi?

Yes, with caveats. The published checkpoint reportedly runs at about 2–4 FPS on a Raspberry Pi 4 8GB at 512×512 input via ONNX Runtime on the CPU, and at roughly twice that after INT8 quantization. That is too slow for real-time video but fine for batch or periodic counting: pair a Pi with a camera and accept lower frame rates. For anything faster, an accelerator HAT or a more powerful offload box helps.

What's a practical maker project using a counting model?

Inventory and wildlife are the classic ones: counting parts on a bench, items on a shelf, cars in a lot, or birds at a feeder from a fixed camera. A Raspberry Pi 4 with a camera module captures frames, the model returns a tally, and you log results over time. It is a satisfying first computer-vision build with a clear, measurable output.
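Logging results over time needs nothing beyond the standard library. The sketch below appends one timestamped row per count to a CSV file. The file name, column names, and example prompt are assumptions for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def log_count(csv_path, prompt, count, when=None):
    """Append one (UTC timestamp, prompt, count) row, writing a header for a new file."""
    path = Path(csv_path)
    is_new = not path.exists()
    when = when or datetime.now(timezone.utc)
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp_utc", "prompt", "count"])
        writer.writerow([when.isoformat(timespec="seconds"), prompt, count])


def read_counts(csv_path):
    """Load the log back as a list of dicts with integer counts."""
    with Path(csv_path).open(newline="") as f:
        return [{**row, "count": int(row["count"])} for row in csv.DictReader(f)]


if __name__ == "__main__":
    log_count("counts.csv", "count the birds at the feeder", 7)
    print(read_counts("counts.csv")[-1])
```

Storing the prompt next to each count makes it obvious later if someone changed the wording mid-run.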

Do I need extra hardware beyond the Pi?

At minimum a camera module and reliable storage for captured frames and logs. For anything near real time you will want an AI accelerator HAT or to offload inference to a desktop GPU and stream results back. A solid SSD over USB keeps frame writes and the dataset fast, and a good power supply prevents the brownouts that crash long-running vision jobs.

Is a Pi 4 8GB or a Pi Zero better for this?

The Pi 4 8GB is the right choice for any vision-model workload; its CPU, RAM, and I/O dwarf the Zero's. A Pi Zero W is fine as a remote camera node that ships frames elsewhere for processing, but it cannot host a counting model at usable speed. Start on the Pi 4 and only distribute capture to Zeros once your pipeline works.
