Open-WebUI Self-Hosted on a Ryzen 5 5600G + RTX 3060: A Private ChatGPT at Home

A private ChatGPT-style web UI for the whole household, served from a Ryzen 5 5600G mini-rig with an RTX 3060 — here is the build, the setup, and what it actually feels like.

The cheapest "private ChatGPT for the whole house" you can build in 2026 is a Ryzen 5 5600G on a budget B550 board, a Zotac RTX 3060 12GB, 32GB of DDR4, a WD Blue SN550 NVMe for the OS and models, and a Crucial BX500 1TB SATA SSD for bulk storage. Add Ollama and Open-WebUI, point your family's browser tabs at it, and you have a self-hosted assistant on the home LAN that costs about $0.10/day in electricity when left on around the clock (~30W idle, $0.12/kWh).

This is the synthesis: realistic build cost, install steps, what works, and what does not.

Why this build, why now

The 5600G's integrated graphics handles the boot console and a headless desktop while the RTX 3060 12GB is fully reserved for inference: no driver fights, no display flicker when the model is hot. TechPowerUp's spec page puts the 3060's memory bandwidth at 360 GB/s, which is enough to run a 12-14B class model at q4_K_M with 8K context at 20-30 tok/s.
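A rough way to sanity-check that bandwidth claim: token generation is largely memory-bandwidth bound, because each new token needs roughly one pass over the weights, so bandwidth divided by model size gives an upper bound on tok/s. The sketch below assumes approximate q4_K_M weight sizes (about 4.9 GB for an 8B model, about 9 GB for a 14B model); those sizes and the function name are illustrative assumptions, not measurements.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    return bandwidth_gb_s / model_size_gb


RTX_3060_BANDWIDTH_GB_S = 360.0  # figure quoted in the article

# Approximate q4_K_M weight sizes; assumptions for illustration only.
ASSUMED_SIZES_GB = {"8B q4_K_M": 4.9, "14B q4_K_M": 9.0}

for name, size_gb in ASSUMED_SIZES_GB.items():
    ceiling = decode_ceiling_tok_s(RTX_3060_BANDWIDTH_GB_S, size_gb)
    print(f"{name}: ceiling ~{ceiling:.0f} tok/s")
```

Under these assumptions a 14B model tops out near 40 tok/s, so the 20-30 tok/s quoted above is plausible once KV-cache reads and kernel overhead are included.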

Open-WebUI on top of Ollama is the path of least resistance. Open-WebUI's GitHub repo ships Docker images with built-in user accounts, RAG, image-input support, web search plugins, and an OpenAPI tools layer. It is what you would have built in 2024 if you had three months and no other obligations.

Key takeaways

  • Total build cost in mid-2026: $650-$800 used, $900-$1100 new.
  • Open-WebUI + Ollama runs as two Docker containers, fully self-hosted, no cloud account required.
  • Real-world speed: 22-34 tok/s on Llama-3.1-8B-Instruct q4_K_M depending on context length, 20-25 tok/s on DeepSeek V4 Flash.
  • 5-10 household accounts are fine for chat; three or four simultaneous generations is the practical cap before requests queue noticeably.
  • Total power draw at idle: ~30W. Under sustained inference: ~170W.

Bill of materials

| Component | Suggested SKU | Approx. 2026 price |
| --- | --- | --- |
| CPU + iGPU | AMD Ryzen 5 5600G | $130 |
| GPU | Zotac Twin Edge RTX 3060 12GB | $260 |
| RAM | 2×16GB DDR4-3200 | $60 |
| Storage | WD Blue SN550 1TB NVMe | $55 |
| Bulk storage | Crucial BX500 1TB SATA SSD | $55 |
| Motherboard | B550M ATX | $90 |
| PSU | 550W 80+ Gold | $60 |
| Case | Compact ATX | $50 |
| Total | | $760 |

The 5600G handles every "compute" load that is not LLM inference — backups, container orchestration, the household password manager — and the RTX 3060 is fully reserved.

Why a 5600G specifically

The 5600G has Vega 7 integrated graphics, which is enough to drive a 1080p admin display, the home dashboard, and Open-WebUI's interface without touching the discrete GPU. You can run the whole rig headless if you prefer, but if you ever need a console, the 5600G has one. Pairing a CPU with no integrated graphics, like a 5600 (no G), with a 3060 means borrowing the 3060 for display, which raises idle power draw and causes UI hitches during inference.

The 5600G's six Zen 3 cores at 4.4 GHz boost also drive llama.cpp's CPU-side layers cleanly when you run an MoE model like DeepSeek V4 Flash with partial offload. You will not match a Ryzen 7 5800X's eight-core performance there, but for a household assistant, it is plenty.

Software stack

  1. OS: Ubuntu 24.04 LTS or Debian 12.
  2. Docker + NVIDIA Container Toolkit.
  3. Ollama — easiest path to a local OpenAI-compatible API.
  4. Open-WebUI — connects to Ollama, adds users, history, RAG.
  5. Caddy (optional) — for a clean LAN-only HTTPS endpoint.

Compose stub

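```yaml
services:
  ollama:
    image: ollama/ollama:latest
    runtime: nvidia   # needs the NVIDIA Container Toolkit on the host
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui_data:/app/backend/data
    ports:
      - "3000:8080"   # Open-WebUI listens on 8080 inside the container

# Named volumes must be declared at the top level.
volumes:
  ollama_data:
  openwebui_data:
```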

Bring it up with docker compose up -d, pull a model with docker compose exec ollama ollama pull llama3.1:8b-instruct-q4_K_M, and open the WebUI at http://<your-LAN-IP>:3000. Create the admin account, add household members, done.

Real-world numbers

On the build above, with the 3060 doing all the inference work:

| Model + quant | Context | Generation (tok/s) | First-token latency |
| --- | --- | --- | --- |
| Llama-3.1-8B-Instruct q4_K_M | 4096 | 28-34 | 180 ms |
| Llama-3.1-8B-Instruct q4_K_M | 8192 | 22-26 | 250 ms |
| Mistral-Small-24B q3_K_M | 4096 | 14-17 | 280 ms |
| Qwen3-14B q4_K_M | 4096 | 22-26 | 200 ms |
| DeepSeek V4 Flash q4_K_M (MoE) | 4096 | 20-25 | 220 ms |
| GLM-5.2 q4_K_M (dense 14B) | 8192 | 22-26 | 240 ms |

Open-WebUI's RAG pipeline adds 1-3 seconds for retrieval on a small (1000-doc) corpus, which is fine. Image-input through the WebUI lands the same way — slight queue, then normal generation. Web-search plugin lookups are bottlenecked by the search API, not the rig.
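To check generation speed on your own rig, Ollama's non-streaming /api/generate response reports eval_count (generated tokens) and eval_duration (nanoseconds), from which tok/s follows directly. The sketch below is a minimal helper using only the standard library; the URL assumes Ollama is published on port 11434 on the same host, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumes Ollama is reachable on the port published in the Compose stub.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request body."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def benchmark(model: str, prompt: str, url: str = OLLAMA_URL) -> float:
    """Send one prompt to a running Ollama server and return tok/s."""
    req = urllib.request.Request(
        url,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return tokens_per_second(json.load(resp))
```

Calling benchmark("llama3.1:8b-instruct-q4_K_M", "Explain DNS in one paragraph.") gives a single sample; averaging several runs at the context length you care about makes for a fairer comparison with the table.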

Network and access

For LAN-only use, just bind the WebUI to your home LAN IP and add a /etc/hosts entry on each device — assistant.home resolves to the rig's LAN IP. For remote access, Tailscale is the no-effort path; do not expose the WebUI to the public internet without auth. (Open-WebUI's own auth is fine for a small household, but I prefer to keep it off the open web entirely.)

What this build cannot do

  • 70B-class models at usable speeds. Skip the 70Bs entirely on a 3060 12GB.
  • High concurrency. Three or four simultaneous chats is the practical cap before turn-taking gets painful.
  • Long-context retrieval. Anything above ~16K tokens slows hard.

For everything else — daily questions, code help, household summaries, household chat history — this is the right rig.

Common pitfalls

  1. Skipping the WD Blue SN550 NVMe. Loading a 14GB model off a slow SATA SSD takes 25-40 seconds; off the SN550 it is 6-8 seconds. At roughly the same price per terabyte, the NVMe is the obvious pick for model storage.
  2. Buying a 5600 (no G). You will use the 3060 for display, which costs VRAM and causes hitches during inference. Always 5600G for this build.
  3. Forgetting the NVIDIA Container Toolkit. Without it, Ollama runs CPU-only and you will wonder why generation is stuck at 4 tok/s.
  4. Leaving the WebUI public. Use Tailscale or a LAN-only bind.
  5. Pulling models without checking quant. ollama pull llama3.1:8b pulls whatever quant the default tag points to, which may not match your VRAM headroom on a 3060; pull an explicit tag such as llama3.1:8b-instruct-q4_K_M instead.
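For the VRAM-headroom check in the last pitfall, a back-of-envelope budget is weights plus KV cache plus some runtime overhead. The sketch below uses the standard KV-cache size formula with Llama-3.1-8B's published shape (32 layers, 8 KV heads, head dimension 128); the 4.6 GiB weight figure, the 1 GiB overhead allowance, and the fp16 cache are assumptions for illustration.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: keys and values for every layer and position."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem
    return total_bytes / 2**30


def fits_in_vram(weights_gib: float, kv_gib: float,
                 vram_gib: float = 12.0, overhead_gib: float = 1.0) -> bool:
    """True if weights + KV cache + an assumed runtime overhead fit in VRAM."""
    return weights_gib + kv_gib + overhead_gib <= vram_gib


# Llama-3.1-8B shape: 32 layers, 8 KV heads, head dimension 128, fp16 cache.
kv = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, context=8192)
print(f"KV cache at 8K context: {kv:.2f} GiB")
# 4.6 GiB is an assumed size for the q4_K_M weights.
print("Fits in 12 GiB:", fits_in_vram(weights_gib=4.6, kv_gib=kv))
```

The same two functions can be reused for larger models by swapping in their layer counts and weight sizes; a result close to the 12 GiB limit is a hint to drop context length or pick a smaller quant.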

Worked example: family week

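Three people, ~150 questions/day combined. Total tokens ~1.2M/day. Wall-clock load on the GPU: 30-45 minutes/day at 140W average, which is 0.07-0.105 kWh, or about $0.01/day at $0.12/kWh. The ~30W idle draw for the rest of the day adds roughly $0.08/day if the rig stays on around the clock, for a total near $0.10/day. Same usage hosted: ~$0.30/day on the cheapest tier. At a saving of about $0.20/day, break-even on $760 of hardware takes roughly 10 years, but you also get private, on-LAN, no-account access now.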
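The running-cost arithmetic is easy to redo with your own tariff and usage. The sketch below takes the article's figures as inputs (45 minutes/day at 140W, ~30W idle, $0.12/kWh, $0.30/day hosted, $760 of hardware) and assumes the rig stays powered on around the clock; the function names are hypothetical.

```python
def daily_cost_usd(load_minutes: float, load_watts: float, idle_watts: float,
                   usd_per_kwh: float) -> float:
    """Electricity cost per day for a rig that is on around the clock."""
    load_hours = load_minutes / 60
    idle_hours = 24 - load_hours
    kwh = (load_hours * load_watts + idle_hours * idle_watts) / 1000
    return kwh * usd_per_kwh


def break_even_years(hardware_usd: float, hosted_usd_per_day: float,
                     self_hosted_usd_per_day: float) -> float:
    """Years until the daily saving over a hosted plan repays the hardware."""
    saving = hosted_usd_per_day - self_hosted_usd_per_day
    return hardware_usd / saving / 365


always_on = daily_cost_usd(45, 140, 30, 0.12)
# Setting idle_watts to 0 isolates the cost of inference time alone.
inference_only = daily_cost_usd(45, 140, 0, 0.12)
years = break_even_years(760, 0.30, always_on)

print(f"Always on: ${always_on:.2f}/day")
print(f"Inference only: ${inference_only:.2f}/day")
print(f"Break-even: {years:.1f} years")
```

With these inputs the idle draw dominates the bill, so a rig that sleeps when unused would land much closer to the inference-only figure.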

When NOT to bother

If your household has one casual LLM user with a $5/month hosted plan, this build is overkill. Buy the GPU for gaming.

If you need GPT-4-class reasoning on hard prompts, no consumer rig under $2K matches frontier hosted models. Pay for hosted.

Open-WebUI features worth turning on

  • Per-user accounts. Each household member gets their own chat history. No more "whose chat is whose" confusion.
  • RAG with local docs. Drop a folder of PDFs into the WebUI; it chunks, embeds locally (using a small embedding model), and can answer over them. Embedding inference reuses the 3060 — no separate model server required.
  • Web-search plugin. Open-WebUI ships SearXNG and DuckDuckGo integrations. Lets the model fetch fresh information without anything leaving your LAN beyond the search query itself.
  • Image input. With a vision-enabled model loaded (Llama-3.2-Vision-11B fits in 12GB at q4), the kids can take a photo of homework and get help. The 3060 handles it.
  • OpenAPI tools. You can wire arbitrary REST endpoints as tools the model can call — local home automation, the household calendar, the document store.
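To make the "chunks, embeds" step in the RAG bullet concrete, here is a minimal sketch of fixed-size chunking with overlap, the general idea behind splitting documents before embedding. It is not Open-WebUI's actual splitter; the function name and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap their neighbour."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final window is not pure overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding a little text twice.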

Backups and persistence

Open-WebUI's data volume (openwebui_data in the Compose) holds users, chat history, and uploaded documents. Back it up nightly to the Crucial BX500 1TB SATA SSD or to a NAS. Ollama's model cache (ollama_data) is just downloaded weights — re-derivable, no backup required.
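A nightly backup can be as simple as a timestamped archive of the data directory written to the bulk drive. The sketch below is one way to do that with the standard library; the function name is hypothetical, and the source path you pass in is an assumption to verify on your own host (docker volume inspect shows where a named volume actually lives).

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source: Path, dest_dir: Path) -> Path:
    """Write a timestamped .tar.gz of `source` into `dest_dir` and return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(str(source), arcname=source.name)
    return archive
```

Run it from cron or a systemd timer with the volume's data directory as source and a folder on the BX500 as dest_dir. Stopping the open-webui container for the duration of the copy avoids archiving the database mid-write.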

A second worked example: a "family wiki" agent

Workload: scrape an internal Notion or markdown knowledge base, embed nightly into Open-WebUI's RAG, expose as a "Ask the family wiki" model. Setup takes ~2 hours; ongoing maintenance is ~zero. The 5600G handles the embedding job overnight; the 3060 handles the daytime queries.

Comparing against other 2026 self-hosted options

| Option | Cost | Pros | Cons |
| --- | --- | --- | --- |
| This build (5600G + 3060) | $760 | Best price/perf, expandable | 8B-14B model ceiling |
| Mac Mini M4 24GB | $1100 | Silent, unified memory | No discrete-GPU upgrade path |
| Used RTX 3090 24GB build | $1400 | 30B+ models fit | Power-hungry (350W) |
| Hosted Open-WebUI on $10/mo VPS + paid API | $10-30/mo | Zero setup | Not private |

For the price point and the household use case, the 5600G + 3060 build is hard to beat in 2026.

Bottom line

A Ryzen 5 5600G + Zotac RTX 3060 12GB + 32GB DDR4 + WD Blue SN550 1TB NVMe and a Crucial BX500 1TB SSD for bulk storage is the cheapest legitimately useful private ChatGPT rig you can build in 2026. Open-WebUI + Ollama on Docker is two evenings of work, costs about $0.10/day in electricity when left on around the clock, and serves the whole household.

Citations and sources

This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported.

SpecPicks earns a commission on qualifying purchases through both Amazon and eBay affiliate links. Prices and stock update independently.

Frequently asked questions

Does the Ryzen 5 5600G's integrated GPU help with inference?
The 5600G's Vega iGPU is useful for driving a display and light system tasks, freeing the RTX 3060 entirely for model inference. For LLM token generation itself, the discrete RTX 3060 does the real work; the iGPU's value here is letting you dedicate all 12GB of the 3060's VRAM to the model rather than the desktop.

How much VRAM does Open-WebUI itself consume?
Open-WebUI is a front end and consumes negligible VRAM; the memory is spent by the backend model running under Ollama or another runtime. Budget your 12GB almost entirely for the model and KV cache, and run the WebUI container alongside it on the same host without meaningfully reducing the VRAM available for inference.

Should I boot from the SATA SSD or the NVMe drive?
Boot and store models on the faster WD Blue SN550 NVMe drive so model loads and container startups stay quick, and use the Crucial BX500 SATA SSD for bulk storage of additional model files and backups. NVMe's higher sequential read noticeably shortens the wait when swapping between large quantized checkpoints.

Can this build serve more than one user at a time?
A single RTX 3060 can serve a handful of light concurrent chats, but throughput drops as simultaneous requests share the same 12GB and compute. For a small household or a couple of teammates it is fine; for many concurrent users you would want more VRAM or a multi-GPU setup and a serving runtime tuned for batching.

What does it cost to leave this running 24/7?
At the ~30W idle draw quoted above, always-on operation uses about 0.72 kWh/day, or roughly $2.60/month at $0.12/kWh, before any inference load. Enabling GPU power management and letting the model unload when idle keeps draw low between sessions, so the practical cost of a private always-on assistant stays reasonable for home use.

— SpecPicks Editorial · Last verified 2026-06-19
