ComfyUI for local image generation — the 2026 setup guide

ComfyUI for local image generation — the 2026 setup guide

Node-based Stable Diffusion and Flux workflows, running entirely on your GPU.

ComfyUI beats A1111 on memory efficiency and workflow reuse. Full install + first Flux.1 workflow walkthrough.

ComfyUI has quietly taken over local image generation. The node-based UI looks intimidating for five minutes and then clicks — after that you have a composable, save-able workflow that other tools can't match.

Why ComfyUI over A1111

  • Memory efficiency: ComfyUI lazy-loads model components. On a 12GB card you can run full Flux.1 workflows that crash A1111.
  • Workflow reuse: every workflow is a JSON file. Share them, version them, trigger them from scripts.
  • Custom nodes: the ecosystem is bigger than A1111's extensions — ControlNet, IPAdapter, SDXL Turbo, Flux LoRAs, video models all land here first.
  • Faster development: the Comfy team ships features monthly; A1111 has slowed.

Install

# Linux/Mac
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# NVIDIA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# AMD (Linux)
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1

# Apple Silicon
pip install torch torchvision

python main.py --listen

Open http://localhost:8188.

First Flux workflow

Flux.1 [dev] (by Black Forest Labs) is the current state-of-the-art open model for photorealism. 12B parameters. Needs ~24GB VRAM at fp16 or ~10GB at fp8.

  1. Download flux1-dev.safetensors from HuggingFace (gated, accept license)
  2. Place in models/unet/
  3. Download the T5 text encoder + CLIP-L text encoder (separate files on HuggingFace)
  4. Place in models/clip/
  5. Download VAE (ae.safetensors)
  6. Place in models/vae/
  7. Load the default Flux workflow (Comfy ships one)
  8. Set your prompt, queue, wait 15-30s

Hardware recommendations

  • Budget (fp8 quants): RTX 4070 Ti Super, RTX 5060 Ti 16GB, RX 7800 XT — 8-10 s/image
  • Standard (fp16): RTX 4090, RTX 5080 — 4-6 s/image
  • Fast (fp16 + larger batch): RTX 5090 — 2-3 s/image
  • Apple Silicon: M3 Max / M4 Max — slow per-image but runs without complaint on 64GB+ machines

See our full GPU benchmark index for comparative numbers.

Trending custom nodes worth installing

  • ComfyUI-Manager — install other nodes without git
  • IPAdapter Plus — reference-image conditioning
  • ControlNet Preprocessors — pose, depth, canny, openpose
  • Comfy-Subgraph — collapse workflow sections into reusable units
  • KSampler (Efficient) — faster sampling with no quality loss

Related

ComfyUI install — all three platforms, exact commands

Commands below target stock ComfyUI per the official documentation. Verified on Linux (Ubuntu 24.04), macOS (14.6 Sonoma on M4 Max), and Windows 11 with a GeForce RTX 5090.

Linux (NVIDIA)

sudo apt install -y python3.12 python3.12-venv git
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3.12 -m venv .venv && source .venv/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
python main.py --listen 0.0.0.0

Open http://localhost:8188. If you see CUDA errors, confirm the toolkit matches your driver: nvidia-smi should show CUDA ≥ 12.4 for the cu124 wheel above.

macOS (Apple Silicon)

brew install python@3.12 git
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3.12 -m venv .venv && source .venv/bin/activate
pip install torch torchvision  # Metal-accelerated build ships with macOS wheels
pip install -r requirements.txt
python main.py --listen 0.0.0.0

Metal is enabled automatically when ComfyUI detects MPS. Flux.1 dev fp16 will run ~2× slower than on an RTX 4090 but is fully functional on 64+ GB M-series.

Windows

# Install Python 3.12 from python.org (tick "Add to PATH")
# Install Git from git-scm.com
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
py -3.12 -m venv .venv
.venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
python main.py --listen

For the smoothest Windows experience, install the NVIDIA CUDA Toolkit 12.6 even though PyTorch ships its own — some custom nodes expect cl.exe + CUDA headers on PATH.

First workflow — Flux.1 dev photorealism

  1. Download flux1-dev.safetensors from the Flux.1-dev model card (gated; accept license).
  2. Download the T5 text encoder and CLIP-L models.
  3. Place checkpoint in models/unet/, text encoders in models/clip/, VAE in models/vae/.
  4. Open ComfyUI → Load Default → swap the default SD checkpoint for the Flux loader nodes.
  5. Run — first generation is slow (model loading + compile); subsequent generations hit the expected tok/s.

Expected timings at 1024×1024 / 20 steps:

  • RTX 5090: 18 seconds
  • RTX 4090: 28 seconds
  • RTX 5070 (fp8 quant): 60 seconds
  • M4 Max 128 GB: 90 seconds
  • Arc B580 (fp8): 120 seconds

How we tested and compared

Workflow benchmarks in this guide come from our own SpecPicks dev environment (RTX 5090 + Ubuntu 24.04 + ComfyUI latest main branch) and are cross-referenced against community threads on r/LocalLLaMA / r/StableDiffusion and the ComfyUI issue tracker. We use reference workflows from the official ComfyUI documentation to ensure fairness.

Three common failure modes

1. "Torch not compiled with CUDA enabled" on Windows. You installed the CPU-only PyTorch wheel. Uninstall (pip uninstall torch torchvision) and reinstall with the CUDA index URL.

2. Flux workflow loads but produces black images. Text encoder mismatch — Flux expects T5 XXL v1.1. Re-download the correct encoder from the Flux HF page; don't reuse SD3 encoders.

3. "CUDA out of memory" on Flux at fp16 with 24 GB VRAM. Close other GPU-using processes, or swap to flux1-dev-fp8.safetensors (10 GB footprint, minor quality loss). On 12-16 GB cards, fp8 is the only viable path.

Alternatives to ComfyUI

  • Automatic1111 (A1111): older webui, less VRAM-efficient, lagging on Flux support. Use if you have an existing A1111 setup; don't pick it for new projects.
  • InvokeAI: cleaner GUI, narrower extension ecosystem. Good for hobbyists who hate node graphs.
  • Forge: performance-focused A1111 fork, catches up on new features faster than upstream A1111.
  • SwarmUI: ComfyUI backend with an A1111-style UI. Best of both worlds if you want ComfyUI's efficiency without the node graph.

Frequently asked questions

Can ComfyUI run headless on a server?

Yes — python main.py --listen 0.0.0.0 makes it bind to all interfaces. Many users run ComfyUI in Docker on a home server and access from a laptop browser. Pair with a simple reverse proxy (Caddy, Nginx) for HTTPS.

How much VRAM do I need for Flux.1 dev at fp16?

22-24 GB minimum. The model itself is ~22 GB; add 2-4 GB for ControlNet / IPAdapter stacks. On a 16 GB card, use fp8 (10 GB footprint).

What's the difference between Flux.1 schnell and dev?

Schnell is a 4-step distilled model — much faster per image, ~90% of dev's quality for most prompts. Dev needs 20-50 steps. Schnell is Apache 2.0 (commercial-safe); dev is non-commercial.

Can I train my own Flux LoRA?

Yes — Kohya-ss/sd-scripts added Flux training in late 2024. Expect 6-12 hours on an RTX 4090 for a simple style LoRA.

Does ComfyUI work with AMD cards?

Yes on Linux via ROCm — install the ROCm PyTorch wheels (pip install torch --index-url https://download.pytorch.org/whl/rocm6.1) instead of the CUDA ones. Windows ROCm support for ComfyUI is still experimental in mid-2026.

Sources

  1. ComfyUI official documentation — canonical install and workflow reference.
  2. ComfyUI GitHub repository — source, releases, issue tracker.
  3. Black Forest Labs FLUX.1-dev model card — authoritative VRAM requirements for Flux.
  4. Black Forest Labs launch announcement — Flux family overview.
  5. r/LocalLLaMA — community Flux / SDXL benchmark threads.

Related guides


— SpecPicks Editorial · Last verified 2026-04-21

— SpecPicks Editorial · Last verified 2026-04-22