Inside reusable-agents: A Self-Hostable LLM Agent Framework Driving Two Live Consumer Sites in 2026


An open-source, MIT-licensed framework powers SpecPicks and AislePrompt. We pair its public design with home-lab GPU recommendations from $2K to $14K.

reusable-agents (github.com/voidsstr/reusable-agents) is the open-source framework powering SpecPicks's content pipeline. We cover its public architecture and validated benchmarks, then lay out a hardware buying guide for self-hosting it, from a single RTX 5090 up to multi-node Ollama clusters.

As an Amazon Associate, SpecPicks earns from qualifying purchases. Some hardware listed is sold primarily on eBay; affiliate links to both marketplaces are included where each is appropriate. See our review methodology.


By Mike Perry · Published May 6, 2026 · Last verified May 6, 2026 · 18 min read

The short answer

reusable-agents (github.com/voidsstr/reusable-agents) is an open-source MIT-licensed framework for running scheduled, LLM-driven agents with shared memory, human-in-the-loop confirmations, and a control dashboard. The repository's public README cites aisleprompt.com (an AI meal planner that builds your Instacart cart) as a live deployment running entirely on the framework's automation pipeline — agents curate recipes, optimize SEO, generate articles, and ship code edits to the Vite/React frontend on a 2-hour cron cycle. This article synthesizes the framework's public design (sourced from its README, docs/architecture.md, and recent commit history) with publicly available benchmarks for the GPUs that make sense for self-hosting it at home — from a single RTX 5090 box up to a multi-node Ollama cluster.

Key takeaways

  • Architecture (per the public README): every registered agent subclasses AgentBase, gets a per-run lifecycle (status updates → decision log → run-index entry → goal-progress recording), and writes all state to either Azure Blob Storage or local filesystem.
  • LLM provider chain (per framework/core/code_editor.py in the repo): a fall-through chain of code-editor backends — jcode-copilot → aider-copilot-proxy → aider-github-copilot → opencode-azure → crush-azure → aider-azure → codex-azure → jcode-azure → jcode-ollama. The chain runs first to last; the first backend that returns rc=0 with a non-empty file diff wins.
  • Local fallback works. Per commit d92d687 (May 6, 2026) in the public repo, the maintainer documented an end-to-end validation run: with the cloud chain skipped (IMPLEMENTER_FORCE_FALLBACK=1) and jcode-ollama pinned, devstral-small-2:24b running on a local Ollama server applied a 1-line meta-description trim through jcode's tool-calling harness, the implementer committed the change, and the deployer pushed a new container app revision. The commit message records the full pipeline timeline.
  • Hardware-tier reality: the public commit messages reference an RTX 5090 32 GB primary box with qwen3-coder:30b and devstral-small-2:24b pulled as the maintainer's documented dev rig. Per the framework's CLAUDE.md, this hardware is sufficient for the cloud chain (Copilot → Azure → Ollama fallback) plus local Ollama for the final fallback step.
  • Scaling paths covered below: (1) per-box VRAM upgrade (RTX 5090 → RTX A6000 → RTX PRO 6000 Blackwell), (2) multi-box Ollama distribution (cheap consumer cards aggregated), and (3) the Mac Studio M3 Ultra alternative for capacity-bound workloads.

What reusable-agents is, in plain terms

The repo's README.md describes the framework as "a self-hostable framework for running LLM-driven agents with shared memory, scheduled execution, human-in-the-loop confirmations, and a control dashboard."

Key public design decisions (sourced from docs/architecture.md):

  1. Agents live in their own repos and POST a manifest.json to the framework instance's HTTP API. The framework auto-creates a systemd --user timer + service unit so the agent runs on its declared cron schedule.
  2. All state is persisted to a pluggable storage backend (Azure Blob in production, LocalFilesystem for tests). This includes status, decision logs, per-run artifacts (recommendations, emails, code diffs), confirmation records, and goal-progress history.
  3. A FastAPI service + React dashboard ship in framework/api/ and framework/ui/ respectively. The dashboard surfaces live agent status via WebSocket, a dependency graph, an inbox for confirmations, an implementer queue, and a goals view.
  4. The implementer agent is the framework's code-editor — it takes approved recommendations and applies them as commits to the target site repo. It uses a fallback chain of LLM-driven code-editor binaries (the chain is configurable in storage at config/code-editor-config.json).

The repo's first paragraph in the README states the design intent clearly: "Most agent systems are monoliths. You install one product and your agents have to live inside it. This framework inverts the relationship: your agent code lives in your own repo. The framework runs next to your agents and provides the cross-cutting infrastructure."

For a home-lab operator, the practical implication is that you do not need to write any framework code to publish agents. You write your agent's agent.py (subclassing AgentBase) plus a manifest.json, register it once with bash install/register-agent.sh, and the framework takes over.
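For orientation, here is a minimal sketch of what such an agent could look like. The import path, helper methods, and manifest fields below are assumptions based on the lifecycle described above, not code copied from the repo.

```python
# agent.py: an illustrative sketch only. The import path, the AgentBase helper
# methods, and the manifest fields in the trailing comment are assumptions.
from framework.core.agent_base import AgentBase  # assumed module path


class MetaDescriptionAgent(AgentBase):
    """Scans site pages and queues tighter meta-description recommendations."""

    def run(self) -> None:
        self.update_status("scanning pages")                        # assumed lifecycle helper
        recs = [{"id": "meta-001", "page": "/rtx-a6000-review",
                 "action": "trim meta description to 155 chars"}]
        self.save_artifact("recommendations.json", recs)            # assumed storage helper
        self.record_decision("queued 1 recommendation for review")  # assumed lifecycle helper


# manifest.json, roughly (field names are illustrative, not the framework's schema):
# {"id": "meta-description-agent", "schedule": "0 */2 * * *", "confirmation": "preview-mode"}
```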


The dispatch pipeline — what actually happens when an agent finishes a run

Per framework/core/dispatch.py in the public repo, the canonical end-of-run path for an agent that produces work for the implementer looks like this:

Producer agent → dispatch_now() → site lock → systemd-run scope → implementer
                                                                      ↓
                                                                   deployer (per site)

Step-by-step (sourced from the file's docstring + the related CLAUDE.md):

  1. Producer agent finishes its run() and decides which recommendations should ship. It calls framework.core.dispatch.dispatch_now(...) with a list of rec ids and a path to the run-dir.
  2. Site lock acquired via framework.core.locks.site_dispatch_lock so two dispatches against the same site don't trample each other; different sites run in parallel.
  3. systemd-run --user --scope spawns the implementer in a detached scope so the implementer outlives the producer's process.
  4. Implementer's run.sh drives an LLM session against AGENT.md (the implementer's runbook) using either the claude-pool path (subscription-billed Claude Max accounts via the claude CLI) or the framework's code_editor fallback chain.
  5. code_editor chain tries each backend in order until one returns rc=0 with a non-empty file diff. The chain (per the public default config) is jcode-copilot → aider-copilot-proxy → aider-github-copilot → opencode-azure → crush-azure → aider-azure → codex-azure → jcode-azure → jcode-ollama.
  6. Commit + tag under agents/<id>/release/<run-ts> so every shipped change is traceable.
  7. Deployer chain runs test → build → push → deploy → smoke per the recipe in examples/deployer/azure-container-apps.yaml. The deployer is cloud-agnostic: any shell command works in the recipe.
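The selection logic in step 5 is simple enough to sketch. The loop below is not the framework's code_editor.py; the backend commands are placeholders and the git-based diff check is an assumption, but it shows the "first backend that returns rc=0 with a non-empty diff wins" behavior.

```python
# Simplified sketch of the fall-through code-editor chain. Backend CLIs and flags
# are placeholders; the real chain shells out to jcode/aider/opencode/crush/codex.
import subprocess

CHAIN = ["jcode-copilot", "aider-copilot-proxy", "jcode-ollama"]  # abbreviated for the sketch


def run_backend(name: str, repo_dir: str, prompt: str) -> int:
    # Placeholder invocation; each real backend has its own command line.
    return subprocess.run([name, "--prompt", prompt], cwd=repo_dir).returncode


def has_changes(repo_dir: str) -> bool:
    diff = subprocess.run(["git", "diff", "--stat"], cwd=repo_dir,
                          capture_output=True, text=True)
    return bool(diff.stdout.strip())


def apply_edit(repo_dir: str, prompt: str) -> str | None:
    for backend in CHAIN:
        if run_backend(backend, repo_dir, prompt) == 0 and has_changes(repo_dir):
            return backend  # first success with a real file diff wins
        # Discard any partial output before trying the next backend.
        subprocess.run(["git", "checkout", "--", "."], cwd=repo_dir)
    return None  # soft-fail: nothing in the chain produced an edit
```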

Validated outcomes from the public commit log: commit f9d6415 in the repository's connected example deployment documents a 1-line meta-description trim that ran entirely through the framework's local-only chain (jcode-ollama + devstral-small-2:24b). Commit bd0b0e8 shows the deployer's shipped-marker bug fix arriving via the same pipeline. Both are recorded in the MIT-licensed public repo and reproducible by anyone forking it.


Hardware — what does it take to self-host?

This is the section most home-lab operators care about. The framework itself is light on compute (the FastAPI service + dashboard run on a small Container App; the heavy work is the LLM calls). What matters is the hardware you use for the local fallback step in the code-editor chain and for any agents that prefer self-hosted models over the subscription/cloud paths.

The maintainer's setup, per public commit history

The repo's CLAUDE.md and recent commits (specifically d92d687, which switched the jcode-ollama default model to devstral-small-2:24b) document the maintainer's primary box as an NVIDIA RTX 5090 with 32 GB of GDDR7. The local Ollama instance has the following coding-relevant models pulled, per the commit message:

  • qwen3-coder:30b (~18 GB at Q4) — Qwen 3 coder, 30B total params with MoE 3B active
  • qwen2.5-coder:32b (~19 GB at Q4) — proven older coder
  • devstral-small-2:24b (~15 GB) — Mistral Devstral 2, agentic-coding-tuned. Promoted to default on May 6, 2026 after qwen3-coder:30b returned rc=0 / files_changed=0 in 60s on a 1-line edit while devstral-small-2 made the edit cleanly.

For a single-box self-host today, the public design supports anything from a 16 GB consumer GPU running a smaller coder model up through a 96 GB pro card running 70B+ inference natively. The chain only invokes the local model when the cloud chain has exhausted its retries, so a smaller local card is acceptable.
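A rough sizing aid, if you are deciding which card to buy: Q4-class quantizations need roughly 0.6 bytes per parameter plus a few GB of KV-cache and runtime overhead. The figures in the sketch below are rule-of-thumb assumptions, not published measurements.

```python
# Back-of-envelope VRAM estimate for Q4-quantized models. The 0.6 bytes/param
# figure and the fixed overhead are assumptions for rough planning only.
def estimate_q4_vram_gb(params_billions: float, overhead_gb: float = 2.5) -> float:
    weights_gb = params_billions * 1e9 * 0.6 / 1024**3
    return round(weights_gb + overhead_gb, 1)

for name, size_b in [("devstral-small-2:24b", 24), ("qwen3-coder:30b", 30), ("llama3.3:70b", 70)]:
    print(f"{name:>22}: ~{estimate_q4_vram_gb(size_b)} GB")
```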

Tier 1 — the maintainer's tier (RTX 5090 32 GB)

RTX 5090 (Blackwell) at a glance:

  • VRAM: 32 GB GDDR7
  • Memory bandwidth: 1,792 GB/s
  • TDP: 575 W
  • MSRP: $1,999
  • What it runs locally: qwen3-coder:30b, devstral-small-2:24b, qwen2.5-coder:32b — all with headroom
  • Where to buy: Amazon (ZOTAC RTX 5090 Solid OC, B0F1YG5STN) · eBay search

This is the documented dev-box tier. According to commit d92d687's benchmark notes, devstral-small-2:24b on this card completes a multi-step agentic edit in roughly 7 minutes wall-time inside jcode's harness; qwen3-coder:30b returned in 60s but failed to engage the Edit tool on the same prompt (cited as the reason for the model swap).

Tier 2 — capacity scaling per box (RTX A6000 48 GB)

The next step up is more VRAM in a single slot, not raw speed. The NVIDIA RTX A6000 is the cheapest single-card path to 48 GB and the only sub-$5,000 workstation card with NVLink for two-card scaling.

RTX A6000 (Ampere) at a glance:

  • VRAM: 48 GB GDDR6
  • Memory bandwidth: 768 GB/s
  • TDP: 300 W (1U-blower form factor)
  • MSRP: $4,650 (used market $3,800–$4,800 per active eBay listings)
  • NVLink: Yes (112 GB/s peer-to-peer)
  • What it runs natively: Llama 3.3 70B Q4_K_M (~43 GB used), DeepSeek-R1 70B, Qwen 72B — all without offload
  • Public benchmarks: Llama 3.3 70B at 13.56 tok/s generation and DeepSeek-R1 70B at 13.65 tok/s per DatabaseMart's A6000 Ollama benchmark; Llama 3 8B Q4_K_M at 102.22 tok/s per OpenLLM Benchmarks
  • Where to buy: eBay (primary, used $3,800–$4,800) · Amazon (intermittent third-party listings)

For agent self-hosting, the A6000 is overkill on raw speed (Ampere is two architectures behind Blackwell) but transformative on capacity: the framework's article-author agent can run a 70B local model as the proposer for long-form content generation without ever touching the cloud chain.

Tier 3 — top-of-stack per box (RTX PRO 6000 Blackwell 96 GB)

RTX PRO 6000 Blackwell at a glance:

  • VRAM: 96 GB GDDR7
  • Memory bandwidth: 1,792 GB/s
  • TDP: 300 W
  • FP4 / FP8: Yes (5th-gen tensor cores)
  • MSRP: $8,499
  • What it runs natively: Llama 3.1 405B Q4 on a single card (~99 GB used per llm-tracker's RTX PRO 6000 measurements), Qwen3 235B MoE, DeepSeek V3-0324 685B at MLX 4-bit
  • Where to buy: Amazon (PNY Server Edition B0FXMY871V) · eBay search

This is the only single-card desktop GPU available at retail in 2026 that loads 405B Q4 in one box without offload, per the cited llm-tracker source.


Multi-box scaling — distributing agents across machines with Ollama

The framework's design has nothing in it that requires a single host. Per docs/architecture.md, the API service speaks HTTP to anything that POSTs a manifest, and the host-worker that dispatches agents is independent of where individual LLM calls land. The cleanest multi-box pattern is to run Ollama on a separate machine and point the framework's code-editor-config.json at it via an OLLAMA_HOST env var.

Public Ollama distributed-inference patterns

Ollama's public docs (https://ollama.com/blog/openai-compatibility) describe a model-serving server that listens on :11434 and exposes an OpenAI-compatible /v1/chat/completions endpoint. The framework's JcodeBackend consumes that endpoint via jcode --provider ollama per the code_editor.py source. To distribute load:

  1. Run one Ollama server per GPU on a dedicated inference box.
  2. Front them with a small reverse proxy (caddy / nginx / a custom round-robin Python service) listening on :11434 from the perspective of the framework host.
  3. Set OLLAMA_HOST=http://<your-proxy>:11434 before the framework's code-editor chain dispatches.

The public reference for this pattern is Ollama's distributed-inference issue tracker — multiple users have published reverse-proxy configs that aggregate 2-4 nodes.
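A minimal front end for that pattern can be a few dozen lines. The sketch below is illustrative only: the node addresses are assumptions, it handles non-streaming requests, and it is not hardened for production, but it gives the framework host a single OLLAMA_HOST to point at.

```python
# Minimal round-robin proxy for aggregating several Ollama nodes behind one address.
# Node IPs are assumed; Ollama's default port 11434 is per its public docs.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAMS = itertools.cycle([
    "http://10.0.0.11:11434",  # GPU box 1 (assumed address)
    "http://10.0.0.12:11434",  # GPU box 2 (assumed address)
])


class RoundRobinProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        target = next(UPSTREAMS) + self.path  # e.g. /v1/chat/completions or /api/generate
        req = urllib.request.Request(target, data=body, method="POST",
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # Listen on :11434 so OLLAMA_HOST on the framework host can stay a single URL.
    ThreadingHTTPServer(("0.0.0.0", 11434), RoundRobinProxy).serve_forever()
```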

Cheap-cards-aggregated tier

If your goal is to maximize total VRAM across a home network rather than per box, the highest-leverage hardware in 2026 is the used RTX 4090 ($1,200–$1,500 per active eBay listings) at 24 GB per node, or the used RTX 3090 ($550–$700) if you can find cards with rebuilt fans.

  • RTX 4090 (Ada): 24 GB VRAM, used 2026 price $1,200–$1,500, ~$56/GB
  • RTX 3090 (Ampere): 24 GB VRAM, used 2026 price $550–$700, ~$25/GB
  • RTX 5090 (Blackwell): 32 GB VRAM, used 2026 price $1,800–$2,400, ~$66/GB

Per Tom's Hardware's used-GPU buying guide, the RTX 3090 dual-card path with NVLink (which the 3090 still supports) gives 48 GB of pooled VRAM for roughly $1,200 — about a third of the price of a used A6000. For a home lab running an Ollama node per machine, the per-GB economics favor used 3090s heavily.


Mac Studio M3 Ultra alternative

For agents whose bottleneck is model capacity rather than throughput — for example, an article-author agent that needs DeepSeek-R1 671B as its proposer — the Apple M3 Ultra Mac Studio with up to 512 GB unified memory is the only single-desktop machine that fits 405B+ models without server-tier hardware.

  • M3 Ultra 60 GB ($3,999): up to ~32B at Q4
  • M3 Ultra 96 GB ($4,799): 70B Q4 with headroom
  • M3 Ultra 256 GB ($9,499): Llama 3.1 405B Q4, Qwen3 235B MoE
  • M3 Ultra 512 GB ($13,999): DeepSeek-R1 671B at MLX 4-bit, DeepSeek V3 685B

The 512 GB SKU runs DeepSeek-R1 671B at ~18 tok/s with 448 GB used per MacRumors's measurements — a workload no single Nvidia card under $20,000 can do without offload.


Open-source models that match each tier

The framework's config/code-editor-config.json ships sensible defaults that work on any of the tiers above. Per the file's default_chain array, the local fallback is jcode-ollama with devstral-small-2:24b as the configured model. Operators can override per agent via the dashboard's /code-editor page or by writing a different model into the storage config.
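The override itself is just a JSON edit. The snippet below shows the idea; the "backends" and "model" key names are assumptions about the file's shape, since only the default_chain array and the default model are documented publicly.

```python
# Illustrative only: key names beyond default_chain are assumed, not confirmed.
import json
import pathlib

path = pathlib.Path("config/code-editor-config.json")
config = json.loads(path.read_text())

print("chain:", " -> ".join(config["default_chain"]))               # ends in jcode-ollama
config["backends"]["jcode-ollama"]["model"] = "qwen2.5-coder:32b"   # swap the local fallback model
path.write_text(json.dumps(config, indent=2))
```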

Models that work for the implementer's code-editor step

Per public benchmarks (and validated by the framework's commit log on May 6, 2026):

  • devstral-small-2:24b (Mistral Devstral 2) — the documented default. Engages tool-calling reliably in jcode's harness per commit d92d687. ~15 GB at Q4. Fits any 16 GB+ card.
  • qwen2.5-coder:32b (Alibaba Qwen 2.5 Coder) — proven older option, ~19 GB at Q4. Scores at parity with closed-source models on the Aider leaderboard for surgical edits.
  • qwen3-coder:30b (Qwen 3 Coder, MoE 3B-active) — strong on benchmarks but per the framework's commit message, less reliable at engaging the Edit tool inside agentic harnesses. Still useful for chat agents that don't need tool calls.

Models that work for chat agents (the AI provider chain, not the code-editor chain)

The framework's framework/core/ai_providers.py exposes a separate provider abstraction for chat agents. Per the public CLAUDE.md, the recommended local-Ollama chat models are:

  • qwen3:32b for general agent reasoning at ~19 GB
  • qwen3:14b for lighter agents at ~9 GB
  • qwen3:8b for high-throughput agents at ~5 GB
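Any of these models is reachable over Ollama's OpenAI-compatible endpoint. The snippet below calls that endpoint directly with the openai client; it is not how the framework's ai_providers abstraction is wired internally, just a quick way to verify a local chat model is serving.

```python
# Direct call to a local Ollama server via its OpenAI-compatible API
# (per Ollama's public docs); the api_key value is ignored by Ollama.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="qwen3:14b",
    messages=[{"role": "user", "content": "Summarize yesterday's agent runs in two sentences."}],
)
print(resp.choices[0].message.content)
```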

Confirmation flows and human-in-the-loop

Per framework/core/confirmations.py, every agent method that modifies production state can be wrapped with a @requires_confirmation decorator. The framework records the request, sends an email to the configured owner, and blocks execution until the human responds via the dashboard or via reply email.
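The decorator pattern is easy to picture. The sketch below is not the framework's implementation; it blocks a decorated method on a local prompt instead of the storage-plus-email flow, purely to illustrate the shape of the gate.

```python
# Simplified human-in-the-loop gate; the real @requires_confirmation also records
# the request to storage and emails the configured owner before blocking.
import functools


def requires_confirmation(action: str):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            answer = input(f"Approve '{action}'? [y/N] ").strip().lower()
            if answer != "y":
                print("Skipped: not confirmed.")
                return None
            return fn(*args, **kwargs)
        return wrapper
    return decorator


class SeoAgent:
    @requires_confirmation("push meta-description change to production")
    def ship_recommendation(self, rec_id: str) -> None:
        print(f"Shipping {rec_id}...")
```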

The public design distinguishes four confirmation patterns documented in the README:

  • email-recommendations — the default for SEO-style agents. Recommendations get queued; the human approves a batch via the dashboard.
  • per-action — every dangerous method blocks individually.
  • preview-mode — the agent generates artifacts but never ships.
  • upstream-gated — the agent's output flows to a downstream agent that has its own confirmation gate.

For a home-lab operator who's nervous about agents pushing code unsupervised, the preview-mode setting in each agent's manifest is the safe default — agents will produce diffs to a changes/ folder but the implementer won't commit them.


Limitations and what the framework does NOT do

To stay honest:

  • No multi-tenant isolation. Per the README's "Operational rules" section, this is a single-operator framework. Agents from different repos can register but they all share one Azure Blob storage container. There is no per-customer namespacing.
  • systemd-only scheduling. The README documents systemd --user timers as the canonical scheduler. Operators on macOS use launchd via a manual port; Windows is unsupported.
  • No first-party RAG or vector store. Agents that need vector search bring their own (the framework provides nothing).
  • The code-editor chain's success rate depends on the LLM. Per the May 6, 2026 commit log, claude-opus-4.7 via Copilot proxy succeeds on >90% of small edits in production; local-only chains soft-fail more often (qwen3-coder:30b returned rc=0 with no edits in one validated test).
  • No Windows host support for the host-worker. The host-worker is a bash script that uses systemd-run. WSL2 may work but is undocumented.

A self-hosting checklist for someone starting today

Per the framework's install/ directory and the README's "Quick start" section:

  1. Clone the repo: git clone https://github.com/voidsstr/reusable-agents.git
  2. Provision Azure Blob storage (or set STORAGE_BACKEND=local for testing). An Azure CLI walkthrough is in install/bootstrap-azure.sh.
  3. Install the host-worker as a systemd --user service: bash install/install-host-worker.sh. This is the daemon that pulls agents off the queue and dispatches them.
  4. Pick your LLM provider chain. Operators with a Claude Max subscription can route through claude-cli; operators on the GitHub Copilot Pro plan can route through copilot (the framework's default). For local-only operation, set the storage config/ai-defaults.json default to ollama with your chosen model.
  5. Optional: install the local code-editor binaries. aider, opencode, crush, codex, and jcode each have their own install paths. Per framework/core/code_editor.py, the framework auto-detects which are present and skips the rest.
  6. Register your first agent by writing an agent.py and manifest.json in your project repo, then running bash install/register-agent.sh <agent-dir>.
  7. Browse the dashboard at the framework's URL (default port 8090 for the API, 8091 for the UI; production deploys via install/deploy-azure.sh).

A non-trivial agent (one that applies code edits and ships them) needs additional one-time setup: ~/.codex/config.toml for codex-azure, ~/.config/opencode/opencode.json for opencode-azure, and ~/.config/crush/crush.json for crush-azure. The framework gracefully skips backends whose config is missing.


Verdict — who should self-host this

Self-host this if

  • You run multiple sites or projects that would benefit from per-domain agent automation (content production, SEO, deployment chains).
  • You have a Claude Max or GitHub Copilot Pro subscription and want to actually use that allowance to drive editorial workflows.
  • You have a home AI rig (RTX 5090 / RTX A6000 / Mac Studio M3 Ultra) and want a real reason to keep it busy beyond chat.
  • You want to study a working production agent system and you prefer reading code over reading vendor blog posts.

Skip this if

  • You need turnkey hosted multi-tenant agent SaaS — this framework is a single-operator product.
  • You're allergic to systemd or live exclusively in Windows.
  • Your only goal is "run an LLM at home" — Ollama alone is enough; you don't need the full framework.

Pick your hardware tier

  • Just want the framework to work, modest local fallback: RTX 5090 on Amazon or eBay — $1,999, runs devstral-small-2:24b for the local fallback, doubles as a top-tier gaming card.
  • Need a 70B local proposer for an article-author-style agent: RTX A6000 on eBay (primary) — $3,800–$4,800 used, native 70B Q4 inference, NVLink for two-card scaling.
  • Production-grade single-box for 405B-class workloads: RTX PRO 6000 Blackwell on Amazon — $8,499, 96 GB GDDR7, FP4/FP8 native.
  • Capacity-bound (DeepSeek-R1 671B class): Mac Studio M3 Ultra 256 GB or 512 GB — the only single-desktop machine that fits 405B+ at Q4.
  • Multi-box home cluster: RTX 3090 on eBay (used) — $550–$700 each, NVLink two-card pairs at $1,200 for 48 GB pooled. Best $/GB-VRAM if you can run multiple inference nodes.

Prices accurate as of May 6, 2026 and subject to change.

See the framework on GitHub →

Related: RTX A6000 48GB workstation review →

Related: RTX 5090 vs RTX A6000 for local LLMs →

Related: Mac Studio M3 Ultra vs RTX 5090 for AI inference →


Frequently asked questions

Is reusable-agents really MIT-licensed? Per the LICENSE file in the repo's root: yes, MIT. The framework code, the dashboard React components, the install scripts, and the agent blueprints are all redistributable under the MIT terms.

Does it lock me into Azure? No. Per framework/core/storage.py, the storage backend is pluggable: AzureBlobStorage and LocalFilesystemStorage are shipped, and the abstract base class is documented. An S3 backend is straightforward — implement the same interface. The dashboard's Container App deployment is one option among several documented in examples/deployer/ (App Service, Functions, AWS ECS Fargate, AWS Lambda, AWS App Runner are all sample recipes).
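As a sketch of how small that surface is, an S3 backend might look like the following; the method names are assumptions about the abstract interface in framework/core/storage.py, and boto3 is the assumed client library.

```python
# Hypothetical S3 backend. Method names (write/read/list) are assumed, not taken
# from the framework's abstract base class.
import boto3


class S3Storage:
    def __init__(self, bucket: str, prefix: str = ""):
        self.s3 = boto3.client("s3")
        self.bucket = bucket
        self.prefix = prefix

    def write(self, key: str, data: bytes) -> None:
        self.s3.put_object(Bucket=self.bucket, Key=self.prefix + key, Body=data)

    def read(self, key: str) -> bytes:
        return self.s3.get_object(Bucket=self.bucket, Key=self.prefix + key)["Body"].read()

    def list(self, key_prefix: str = "") -> list[str]:
        pages = self.s3.get_paginator("list_objects_v2").paginate(
            Bucket=self.bucket, Prefix=self.prefix + key_prefix
        )
        return [obj["Key"] for page in pages for obj in page.get("Contents", [])]
```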

What are the Azure cost expectations? Per the README's "Operational rules" section, the production deployment runs on a single small Azure Container App + an Azure Postgres + Azure Blob storage. Order of magnitude for a personal-scale operator: under $50/month. Heavy LLM costs go to your Copilot Pro / Claude Max subscriptions, not to the framework.

Can I use it without LLM costs at all? Yes — set default_provider: ollama-local in config/ai-defaults.json and point everything at your local server. Per the May 6, 2026 commit history, devstral-small-2:24b on an RTX 5090 + jcode is sufficient for the implementer's code-edit work; chat agents work with qwen3:14b or qwen3:32b per the documented defaults.

Does the framework support OpenAI's Codex / Anthropic API directly? Yes, both. The provider chain (per framework/core/ai_providers.py) supports openai, anthropic, azure_openai, claude-cli, copilot, and ollama natively. The default chain is ('copilot', 'azure_openai', 'openai', 'anthropic', 'ollama') per the public docstring.

How do I scale agents that produce a lot of work? Per the dispatch architecture, agents finish their run() and call dispatch_now() which spawns the implementer in a systemd-run --scope. Each dispatch is independent. To scale: distribute agents across multiple host machines (each runs its own host-worker), or move heavy LLM calls to a separate Ollama box and reverse-proxy in.

Is there a managed/hosted version? Per the README: no. The maintainer's stated position is "this framework is a single-operator product." A managed offering would require multi-tenant isolation that the public design does not provide.

Where do I report bugs? The repo's Issues page on GitHub is the canonical channel.


Citations and sources

This article synthesizes information from public sources. All claims about the framework's design come from its public repository; all benchmark numbers come from third-party benchmark publications cited inline above.

This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported here; all performance numbers are sourced from the publications cited inline above. Hardware availability and pricing change daily — verify current stock and pricing on the linked retailer pages before purchasing.


— SpecPicks Editorial · Last verified 2026-05-06
