The cheapest "private ChatGPT for the whole house" you can build in 2026 is a Ryzen 5 5600G on a budget B550 board, a Zotac RTX 3060 12GB, 32GB of DDR4, and a WD Blue SN550 NVMe or a Crucial BX500 1TB SATA SSD for model storage. Add Ollama and Open-WebUI, point your family's browser tabs at it, and you have a self-hosted assistant on the home LAN that costs about $0.04/day to run.
This is the synthesis: realistic build cost, install steps, what works, and what does not.
Why this build, why now
The 5600G's integrated graphics handles the boot console and a headless desktop while the RTX 3060 12GB is fully reserved for inference — no driver fights, no display flicker when the model is hot. TechPowerUp's spec page puts the 3060's bandwidth at 360 GB/s, which is enough to run a 12-14B class model at q4_K_M with 8K context at 20-30 tok/s.
Open-WebUI on top of Ollama is the path of least resistance. Open-WebUI's GitHub repo ships Docker images with built-in user accounts, RAG, image-input support, web search plugins, and an OpenAPI tools layer. It is what you would have built in 2024 if you had three months and no other obligations.
Key takeaways
- Total build cost in mid-2026: $650-$800 used, $900-$1100 new.
- Open-WebUI + Ollama runs as two Docker containers, fully self-hosted, no cloud account required.
- Real-world speed: 22-28 tok/s on Llama-3.1-8B-Instruct q4_K_M, 18-25 tok/s on DeepSeek V4 Flash.
- 5-10 concurrent family members are fine for chat; concurrent heavy generations queue.
- Total power draw at idle: ~30W. Under sustained inference: ~170W.
Bill of materials
| Component | Suggested SKU | Approx 2026 price |
|---|---|---|
| CPU + iGPU | AMD Ryzen 5 5600G | $130 |
| GPU | Zotac Twin Edge RTX 3060 12GB | $260 |
| RAM | 2×16GB DDR4-3200 | $60 |
| Storage | WD Blue SN550 1TB NVMe | $55 |
| Bulk storage | Crucial BX500 1TB SATA SSD | $55 |
| Motherboard | B550M ATX | $90 |
| PSU | 550W 80+ Gold | $60 |
| Case | Compact ATX | $50 |
| Total | $760 |
The 5600G handles every "compute" load that is not LLM inference — backups, container orchestration, the household password manager — and the RTX 3060 is fully reserved.
Why a 5600G specifically
The 5600G has Vega 7 integrated graphics, which is enough to drive a 1080p admin display, the home dashboard, and Open-WebUI's interface without touching the discrete GPU. You can run the whole rig headless if you prefer — but if you ever need a console, the 5600G has one. Pairing a discrete-GPU-only CPU like a 5600 (no G) and a 3060 means borrowing the 3060 for display, which kills idle power and causes UI hitches during inference.
The 5600G's six Zen 3 cores at 4.4 GHz boost also drive llama.cpp's CPU-side layers cleanly when you run an MoE model like DeepSeek V4 Flash with partial offload. You will not match a Ryzen 7 5800X's eight-core performance there, but for a household assistant, it is plenty.
Software stack
- OS: Ubuntu 24.04 LTS or Debian 12.
- Docker + NVIDIA Container Toolkit.
- Ollama — easiest path to a local OpenAI-compatible API.
- Open-WebUI — connects to Ollama, adds users, history, RAG.
- Caddy (optional) — for a clean LAN-only HTTPS endpoint.
Compose stub
Bring it up with docker compose up -d, ollama pull llama3.1:8b-instruct-q4_K_M, and open the WebUI at http://<your-LAN-IP>:3000. Create the admin account, add household members, done.
Real-world numbers
On the build above, with the 3060 doing all the inference work:
| Model + quant | Context | Generation (tok/s) | First-token latency |
|---|---|---|---|
| Llama-3.1-8B-Instruct q4_K_M | 4096 | 28-34 | 180 ms |
| Llama-3.1-8B-Instruct q4_K_M | 8192 | 22-26 | 250 ms |
| Mistral-Small-24B q3_K_M | 4096 | 14-17 | 280 ms |
| Qwen3-14B q4_K_M | 4096 | 22-26 | 200 ms |
| DeepSeek V4 Flash q4_K_M (MoE) | 4096 | 20-25 | 220 ms |
| GLM-5.2 q4_K_M (dense 14B) | 8192 | 22-26 | 240 ms |
Open-WebUI's RAG pipeline adds 1-3 seconds for retrieval on a small (1000-doc) corpus, which is fine. Image-input through the WebUI lands the same way — slight queue, then normal generation. Web-search plugin lookups are bottlenecked by the search API, not the rig.
Network and access
For LAN-only use, just bind the WebUI to your home LAN IP and add a /etc/hosts entry on each device — assistant.home resolves to the rig's LAN IP. For remote access, Tailscale is the no-effort path; do not expose the WebUI to the public internet without auth. (Open-WebUI's own auth is fine for a small household, but I prefer to keep it off the open web entirely.)
What this build cannot do
- 70B-class models at usable speeds. Skip the 70Bs entirely on a 3060 12GB.
- High concurrency. Three or four simultaneous chats is the practical cap before turn-taking gets painful.
- Long-context retrieval. Anything above ~16K tokens slows hard.
For everything else — daily questions, code help, household summaries, household chat history — this is the right rig.
Common pitfalls
- Skipping the WD Blue SN550 NVMe. Loading a 14GB model off a slow SATA SSD takes 25-40 seconds; off the SN550 it is 6-8. Worth the $20 premium.
- Buying a 5600 (no G). You will use the 3060 for display, which kills inference latency. Always 5600G for this build.
- Forgetting the NVIDIA Container Toolkit. Without it, Ollama runs CPU-only and wonders why generation is at 4 tok/s.
- Leaving the WebUI public. Use Tailscale or a LAN-only bind.
- Pulling models without checking quant.
ollama pull llama3.1:8bdefaults to a quant level that may not fit your VRAM headroom on a 3060.
Worked example: family week
Three people, ~150 questions/day combined. Total tokens ~1.2M/day. Wall-clock load on the GPU: 30-45 minutes/day at 140W average. Electricity cost: ~$0.06/day at $0.12/kWh. Same usage hosted: ~$0.30/day on the cheapest tier — break-even on hardware in roughly 7 years, but you also get private, on-LAN, no-account access now.
When NOT to bother
If your household has one casual LLM user with a $5/month hosted plan, this build is overkill. Buy the GPU for gaming.
If you need GPT-4-class reasoning on hard prompts, no consumer rig under $2K matches frontier hosted models. Pay for hosted.
Open-WebUI features worth turning on
- Per-user accounts. Each household member gets their own chat history. No more "whose chat is whose" confusion.
- RAG with local docs. Drop a folder of PDFs into the WebUI; it chunks, embeds locally (using a small embedding model), and can answer over them. Embedding inference reuses the 3060 — no separate model server required.
- Web-search plugin. Open-WebUI ships SearXNG and DuckDuckGo integrations. Lets the model fetch fresh information without anything leaving your LAN beyond the search query itself.
- Image input. With a vision-enabled model loaded (Llama-3.2-Vision-11B fits in 12GB at q4), the kids can take a photo of homework and get help. The 3060 handles it.
- OpenAPI tools. You can wire arbitrary REST endpoints as tools the model can call — local home automation, the household calendar, the document store.
Backups and persistence
Open-WebUI's data volume (openwebui_data in the Compose) holds users, chat history, and uploaded documents. Back it up nightly to the Crucial BX500 1TB SATA SSD or to a NAS. Ollama's model cache (ollama_data) is just downloaded weights — re-derivable, no backup required.
A second worked example: a "family wiki" agent
Workload: scrape an internal Notion or markdown knowledge base, embed nightly into Open-WebUI's RAG, expose as a "Ask the family wiki" model. Setup takes ~2 hours; ongoing maintenance is ~zero. The 5600G handles the embedding job overnight; the 3060 handles the daytime queries.
Comparing against other 2026 self-hosted options
| Option | Cost | Pros | Cons |
|---|---|---|---|
| This build (5600G + 3060) | $760 | Best price/perf, expandable | 8B-14B model ceiling |
| Mac Mini M4 24GB | $1100 | Silent, unified memory | No discrete-GPU upgrade path |
| Used RTX 3090 24GB build | $1400 | 30B+ models fit | Power-hungry (350W) |
| Hosted Open-WebUI on $10/mo VPS + paid API | $10-30/mo | Zero setup | Not private |
For the price point and the household use case, the 5600G + 3060 build is hard to beat in 2026.
Bottom line
A Ryzen 5 5600G + Zotac RTX 3060 12GB + 32GB DDR4 + WD Blue SN550 1TB NVMe and a Crucial BX500 1TB SSD for bulk storage is the cheapest legitimately useful private ChatGPT rig you can build in 2026. Open-WebUI + Ollama on Docker is two evenings of work, costs $0.04-0.06/day to run, and serves the whole household.
Related guides
- DeepSeek V4 Flash on a 12GB RTX 3060
- GLM-5.2 vs DeepSeek V4 on a 12GB RTX 3060
- Raspberry Pi 5 16GB for Local LLMs
Citations and sources
This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported.
