Open WebUI — self-hosted ChatGPT for your local models


The polished web UI for Ollama that actually looks like ChatGPT.

Open WebUI (formerly Ollama WebUI) is the answer to "how do I give my family a ChatGPT-like interface to my local Ollama?"

What it does

  • Multi-user auth with RBAC — kids get one account, adults another, admin gets model management
  • RAG pipeline built in — drop a PDF, ask questions, Open WebUI handles embedding + retrieval
  • Model switching per conversation
  • Function calling / tool use (via pipelines)
  • Responsive design — works on your phone over LAN

Install with Docker

```bash
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

Hit http://localhost:3000, create the admin account, point it at your Ollama instance (defaults work if Ollama runs on the same machine).

Hooking up RAG

  1. Admin → Documents → Upload PDF/Markdown/TXT
  2. Open WebUI chunks + embeds automatically (default: sentence-transformers all-MiniLM-L6-v2 running locally on CPU; switch to nomic-embed-text via Ollama by setting RAG_EMBEDDING_ENGINE=ollama and RAG_EMBEDDING_MODEL=nomic-embed-text)
  3. In chat, toggle the document → Open WebUI injects relevant chunks into your prompt
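The chunk → embed → retrieve → inject loop is easier to picture with a toy version in hand. This is a plain-Python sketch, not Open WebUI's code. A word-count vector stands in for a real embedding model such as all-MiniLM-L6-v2, and the chunk sizes and sample notes are arbitrary values chosen for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into overlapping windows measured in characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, k: int = 3) -> list:
    # Rank chunks by similarity to the query and keep the top k
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list) -> str:
    # "Inject" the retrieved chunks ahead of the user's question
    joined = "\n---\n".join(context)
    return f"Answer using this context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    notes = ("Guest Wi-Fi is called HomeGuest and resets every Sunday. "
             "The NAS backs up family photos nightly at 2 AM. "
             "The router password is on the sticker under the device.")
    pieces = chunk(notes, size=60, overlap=10)
    question = "When does the NAS back up photos?"
    print(build_prompt(question, retrieve(question, pieces, k=1)))
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk. A real embedding model also matches paraphrases ("backup" vs "backs up") that word counts miss.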

For serious RAG, swap the bundled Chroma store for a dedicated vector DB such as Qdrant.

Why not just SillyTavern / LibreChat / LM Studio?

  • SillyTavern: roleplay-focused, heavier customization per character. Different use case.
  • LibreChat: a fuller ChatGPT-style clone with broad multi-provider support, but a heavier setup.
  • LM Studio: desktop app only, single-user. Great for solo dev; not for a family.

Open WebUI is the sweet spot for "one server, many users, local-first."


Deployment playbook — family, team, or public

Family / home use (single container, trusted LAN)

```bash
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```

Access at http://<home-server-ip>:3000. The first account to sign up becomes the admin. Later sign-ups default to a "pending" role, so have family members create their accounts, then approve them and assign roles from the admin panel.

Team use (behind Caddy/Traefik, auth via OIDC)

Add a reverse proxy for HTTPS and SSO. Example Caddyfile:

```caddyfile
chat.example.com {
  forward_auth auth.example.com {
    uri /api/verify
    copy_headers Remote-User Remote-Groups Remote-Name Remote-Email
  }
  reverse_proxy localhost:3000
}
```

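Open WebUI only trusts a proxy identity header when you tell it to. Set WEBUI_AUTH_TRUSTED_EMAIL_HEADER to the header that carries the user's email address (with Authelia that is Remote-Email, so make sure it is listed in copy_headers), and optionally WEBUI_AUTH_TRUSTED_NAME_HEADER=Remote-Name. Because the app believes whatever that header says, keep port 3000 reachable only through the proxy; otherwise a client could send the header itself.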
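Trusting an identity header is only safe when requests cannot reach the app without passing through the proxy. The hypothetical helper below shows that rule explicitly: it honours the email header only when the connecting peer is a known proxy address. The network ranges, the `resolve_user_email` name and the header choice are assumptions for illustration, not Open WebUI's implementation.

```python
import ipaddress
from typing import Optional

# Hypothetical values: where your reverse proxy connects from, and the header it sets
TRUSTED_PROXY_NETS = [ipaddress.ip_network("127.0.0.1/32"), ipaddress.ip_network("172.16.0.0/12")]
EMAIL_HEADER = "Remote-Email"


def resolve_user_email(headers: dict, peer_ip: str) -> Optional[str]:
    """Return the proxy-asserted email, or None if the request can't be trusted."""
    addr = ipaddress.ip_address(peer_ip)
    if not any(addr in net for net in TRUSTED_PROXY_NETS):
        # Didn't come through the proxy: ignore any identity header it carries
        return None
    # HTTP header names are case-insensitive
    lowered = {k.lower(): v for k, v in headers.items()}
    email = lowered.get(EMAIL_HEADER.lower(), "").strip()
    return email or None


if __name__ == "__main__":
    print(resolve_user_email({"Remote-Email": "kid@home.lan"}, "127.0.0.1"))
    print(resolve_user_email({"Remote-Email": "admin@home.lan"}, "203.0.113.9"))
```

If the app itself does not perform a check like this, the network layout has to do it instead: bind the container to localhost or a private Docker network that only the proxy can reach.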

Public-facing (rate-limited, captcha, strict resource limits)

Don't. If you need a public chat UI, use LibreChat — Open WebUI was built for trusted-environment use and doesn't harden the abuse surface by default.

Hooking into Ollama / LiteLLM / OpenAI

In admin → Connections:

  • Ollama: add http://host.docker.internal:11434 — Open WebUI detects installed models automatically.
  • OpenAI-compatible (LiteLLM, vLLM, copilot-api): add the URL + key. Any OpenAI-shaped endpoint works; LiteLLM, a widely used multi-provider proxy, pairs especially well with Open WebUI.
  • Anthropic native: enable via the Anthropic connection type; paste your API key. Supports Claude 4.x.
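"Detects installed models" and "OpenAI-shaped endpoint" come down to two small HTTP contracts, sketched below with the standard library. The first reads the model list from an Ollama `GET /api/tags` response body. The second builds, but does not send, a `POST` to an OpenAI-compatible `/chat/completions` route. Ollama serves that route under `http://localhost:11434/v1` and accepts any placeholder key. The model names and helper function names are illustrative assumptions, and this is not how Open WebUI structures its own client code.

```python
import json
import urllib.request


def parse_ollama_tags(payload: str) -> list:
    """Extract model names from an Ollama GET /api/tags response body."""
    return [m["name"] for m in json.loads(payload).get("models", [])]


def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but don't send) a POST to an OpenAI-compatible /chat/completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    sample = '{"models": [{"name": "llama3.1:8b"}, {"name": "nomic-embed-text:latest"}]}'
    print(parse_ollama_tags(sample))
    req = build_chat_request("http://localhost:11434/v1", "ollama", "llama3.1:8b",
                             [{"role": "user", "content": "Hello"}])
    print(req.full_url)
    # urllib.request.urlopen(req) would actually send it to a running server
```

Pointing `base_url` at a LiteLLM or vLLM server instead should only change the URL, key, and model name. That is what makes a single connection type cover so many backends.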

Building RAG without losing your mind

  1. Set the embedding model in admin → Settings → Documents → Embedding model. The default is sentence-transformers all-MiniLM-L6-v2 (CPU-local, ~500 MB RAM); for better quality, switch the engine to Ollama and pick nomic-embed-text.
  2. Upload docs via admin → Documents. Per-user collections are also supported.
  3. In chat, click the document-picker icon to scope the conversation to a collection.

RAG caveats:

  • Default retriever is vector-only (cosine similarity on the embedding store). Turn on ENABLE_RAG_HYBRID_SEARCH to add BM25 lexical search plus a CrossEncoder reranker; it's usually worth it. For more control over storage, swap the bundled Chroma store for Qdrant.
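  • Chunk size matters. Check the chunk size and overlap in admin → Settings → Documents; the default splitter measures them in characters, not tokens. Raise the chunk size for long-document use cases.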
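To see why hybrid search helps, here is a toy version. It scores chunks with Okapi BM25, then merges that lexical ranking with a vector ranking using reciprocal rank fusion (RRF, k=60). The fusion method and the hard-coded vector ranking are assumptions for illustration. The caveats above describe BM25 plus a CrossEncoder reranker; this sketch stops at fusion and does not reproduce Open WebUI's retriever.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Okapi BM25 score of every doc for the query (higher = better lexical match)."""
    toks = [tokenize(d) for d in docs]
    if not toks:
        return []
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n or 1.0
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def rrf(rankings: list, k: int = 60) -> list:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to every doc index it ranks."""
    fused = Counter()
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + rank)
    return [idx for idx, _ in fused.most_common()]


def hybrid_rank(query: str, docs: list, vector_ranking: list) -> list:
    scores = bm25_scores(query, docs)
    lexical_ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return rrf([lexical_ranking, vector_ranking])


if __name__ == "__main__":
    docs = ["fan error E42 means the fan failed", "the fan is quiet",
            "reset the router to fix wifi", "cooling tips for small cases"]
    # Hypothetical vector ranking: an embedding model that ranks the exact error code third
    print(hybrid_rank("E42", docs, vector_ranking=[1, 3, 0, 2]))
```

In the demo, the only chunk containing the literal code "E42" climbs from third in the vector ranking to second after fusion. Exact identifiers, part numbers and error codes are where lexical search earns its keep.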

Pipelines — custom logic without forking

Open WebUI's Pipelines feature lets you inject pre/post hooks:

  • Filter pipelines: middleware that mutates the request on the way in (inlet) and/or the response on the way out (outlet) — e.g. scrub PII, redact secrets, add a system-prompt prefix.
  • Pipe pipelines: replace the whole model call with custom logic — e.g. route to different backends based on token count, or wrap a non-OpenAI provider.
  • Manifold pipelines: a single pipeline that exposes multiple models in the picker (multi-model aggregation).
  • Valves (not a separate pipeline type): Pydantic-typed configuration knobs that any pipeline can expose to the admin UI — use them to surface API keys, toggles, and thresholds without redeploying.
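Here is a sketch of the Pipe idea (routing by token count) with a manifold-style model list. The class loosely follows the shape of an Open WebUI pipe function: a `pipe(body)` method, plus `pipes()` to expose entries in the model picker. Everything else is hypothetical: the backend names, the 4,000-token cutoff, the four-characters-per-token estimate, and the stubbed `call_backend` helper.

```python
# Hypothetical backend names and cutoff; a real pipe would call each backend's API.
SHORT_BACKEND = "local-llama"
LONG_BACKEND = "long-context-api"
TOKEN_LIMIT = 4000


def estimate_tokens(messages: list) -> int:
    # Crude heuristic (~4 characters per token); a real router would use the model's tokenizer
    return sum(len(str(m.get("content", ""))) for m in messages) // 4


def call_backend(backend: str, body: dict) -> str:
    # Stub standing in for an HTTP call to the chosen backend
    return f"[{backend}] would answer {len(body.get('messages', []))} message(s)"


class Pipe:
    def pipes(self) -> list:
        # Manifold-style: each entry appears as its own model in the picker
        return [{"id": "auto-router", "name": "Auto (route by prompt length)"}]

    def pipe(self, body: dict) -> str:
        tokens = estimate_tokens(body.get("messages", []))
        backend = SHORT_BACKEND if tokens <= TOKEN_LIMIT else LONG_BACKEND
        return call_backend(backend, body)


if __name__ == "__main__":
    p = Pipe()
    print(p.pipe({"messages": [{"role": "user", "content": "Hi"}]}))
    print(p.pipe({"messages": [{"role": "user", "content": "x" * 20000}]}))
```

The cutoff and backend names are natural candidates for Valves, so an admin could retune the routing without redeploying.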

Example filter that adds a system-prompt prefix:

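```python
from typing import Optional

class Filter:
    def inlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        # Runs before the request reaches the model: prepend a system message
        body.setdefault("messages", []).insert(
            0, {"role": "system", "content": "Always include units with every numerical answer."}
        )
        return body
```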
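Adding an outlet hook and admin-tunable settings to that filter covers the secret-scrubbing case from the list above. The following is a sketch under stated assumptions. Open WebUI defines Valves as a pydantic `BaseModel`, but a stdlib dataclass stands in here so the example runs without dependencies. The `sk-` pattern is an illustrative guess at what a leaked API key looks like. The sketch also assumes the outlet body carries the conversation under `messages`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative pattern: "sk-" followed by 20+ key-like characters; tune for your own secrets
SECRET_RE = re.compile(r"sk-[A-Za-z0-9_-]{20,}")


class Filter:
    @dataclass
    class Valves:
        # Stand-in for pydantic Valves: settings an admin could flip without redeploying
        redact_secrets: bool = True
        system_prefix: str = "Always include units with every numerical answer."

    def __init__(self):
        self.valves = self.Valves()

    def inlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        # On the way in: prepend the configured system message
        if self.valves.system_prefix:
            body.setdefault("messages", []).insert(
                0, {"role": "system", "content": self.valves.system_prefix})
        return body

    def outlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        # On the way out: mask anything that looks like an API key
        if self.valves.redact_secrets:
            for m in body.get("messages", []):
                if isinstance(m.get("content"), str):
                    m["content"] = SECRET_RE.sub("[REDACTED]", m["content"])
        return body


if __name__ == "__main__":
    f = Filter()
    print(f.inlet({"messages": [{"role": "user", "content": "How far is Lyon from Paris?"}]}))
    print(f.outlet({"messages": [{"role": "assistant", "content": "Use key sk-abcdefghijklmnopqrstuvwx"}]}))
```

Regex redaction is a blunt instrument. It catches well-formed keys but not secrets with an unfamiliar shape, so treat it as a safety net rather than a guarantee.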

How we tested and compared

Numbers in this article reflect our own SpecPicks family deployment — Open WebUI on an Ubuntu VM, Ollama on a bare-metal RTX 5090, three active users, ~40 chats/day for three months. Pipeline patterns are cross-referenced against the Open WebUI GitHub repository (issue tracker + discussions) and community feedback on r/LocalLLaMA.

Alternatives — when Open WebUI isn't right

  • LibreChat — more ChatGPT-clone; better for multi-tenant public deployments.
  • SillyTavern — RP / character focus. Different audience.
  • Big-AGI — prettier UI, less admin surface. Good solo-use pick.
  • LM Studio — desktop app only, single user. Good dev tool; not for sharing.

Frequently asked questions

Can Open WebUI replace ChatGPT for a family?

Yes, that's its primary pitch: multi-user auth, RAG, model switching, and a mobile-friendly UI. The main gap is image generation, which isn't self-contained; you have to connect an external image backend such as ComfyUI or AUTOMATIC1111 in the admin settings.

How do I keep the family out of the Ollama admin?

Don't give non-admin accounts the "admin" role. Regular users can chat, upload documents to their own collections, and pick from enabled models; they can't install new ones or see other users' data.

Is Open WebUI audited / secure enough for small business use?

For trusted-LAN use, yes. For anything public-facing, add a proper auth layer (OIDC via Authelia, Authentik, or your identity provider). Open WebUI itself doesn't hold security certifications; treat it as "hobbyist-quality security, production-quality UI."

What's the difference between Open WebUI and Ollama's built-in webui?

Ollama itself is an inference server with a CLI and an HTTP API (ollama serve); its recent desktop app adds a basic single-user chat window, but with no auth, no multi-user accounts, and no admin layer. Open WebUI is the "production" layer on top: same Ollama backend, much more surface area.

Does Open WebUI work on Mac / Apple Silicon?

Yes — runs fine in Docker Desktop. Performance is bottlenecked by the model host (your Ollama / inference backend), not the UI container.

Sources

  1. Open WebUI GitHub repository — 133k+ stars, active issue tracker, canonical reference.
  2. LiteLLM documentation — pairing guide for using Open WebUI with multi-provider routing.
  3. r/LocalLLaMA — community deployment patterns.
  4. ComfyUI documentation — image-gen pipeline to optionally wire in.

— SpecPicks Editorial · Last verified 2026-04-21

Frequently asked questions

What are the hardware requirements for running Open WebUI?
Open WebUI itself has minimal hardware requirements as it is a lightweight Docker container. However, the performance depends on the backend model host (e.g., Ollama or LiteLLM). For optimal use, a system with a modern GPU, such as an NVIDIA RTX series, is recommended for handling inference workloads efficiently.
Can Open WebUI integrate with other AI models besides Ollama?
Yes. Besides Ollama, Open WebUI can connect to any OpenAI-compatible API (including LiteLLM and vLLM) and to Anthropic's Claude. Each connection is configured in the admin settings by entering the API URL and key, which makes it versatile for various use cases.
How does Open WebUI handle document-based RAG workflows?
Open WebUI supports RAG (Retrieval-Augmented Generation) by letting users upload documents in formats like PDF, Markdown, or TXT. It automatically chunks and embeds the content, using a local sentence-transformers model (all-MiniLM-L6-v2) by default or an Ollama-served model such as `nomic-embed-text`. For advanced use, the bundled Chroma vector store can be swapped for a dedicated vector database such as Qdrant.
Is Open WebUI suitable for public-facing deployments?
Open WebUI is designed for trusted environments like family or team use. It lacks built-in protections for public-facing deployments, such as rate-limiting or advanced abuse prevention. For public use, it is recommended to use alternatives like LibreChat, which are better suited for such scenarios.
What customization options are available in Open WebUI?
Open WebUI offers extensive customization through its Pipelines feature. Users can create pre/post-processing hooks to modify input/output, route requests to different backends, or add toggles for specific features. This flexibility allows for tailored workflows without modifying the core application.

— SpecPicks Editorial · Last verified 2026-05-20
