ComfyUI on an RTX 3060 12GB: Flux and SDXL Speeds in 2026

What models fit, how fast they generate, and what 12GB lets you do for local image generation

What fits on a 12GB RTX 3060: SDXL at fp16 in 20-35s, Flux fp8 in 60-90s. Full quant matrix, hardware pairings, when to upgrade.

An RTX 3060 12GB can run both. The MSI GeForce RTX 3060 Ventus 2X 12G and ZOTAC Gaming GeForce RTX 3060 Twin Edge each ship with 12GB of VRAM, enough to run SDXL at fp16 and fp8 or GGUF builds of Flux.1 with model offloading. Public measurements place SDXL at 1024px in the 20-35 second range per image and Flux fp8 closer to 60-90 seconds per image, depending on sampler and steps.

ComfyUI has settled into the default local image-generation pipeline because it exposes the node graph that other tools hide. You can route latents through arbitrary samplers, attach ControlNets, stack LoRAs, and trade off speed against quality at every node. That same flexibility is what makes a 12GB card workable for Flux: ComfyUI lets you offload the text encoder to CPU, run the diffusion model on GPU, and recombine — moves that fail in monolithic UIs. Per the ComfyUI repository on GitHub, the project documents specific 12GB workflows for both SDXL and Flux in its examples directory.

12GB is the threshold where image generation stops feeling claustrophobic. At 8GB, every Flux workflow needs aggressive offloading; at 12GB, fp8 Flux runs cleanly and SDXL leaves room for a LoRA stack. Per the Black Forest Labs Flux model card on Hugging Face, the fp8 Flux.1 dev variant has been the standard 12GB-compatible build since late 2024 and remains the default for cards in the 3060 class.

Who this matters for: hobbyists generating images at 1024-1536px who do not need batches of eight, indie creators who want LoRA training off a single card, and developers building image pipelines without an API in the loop. If you need fast batches of high-resolution outputs, the workflow changes — 12GB is the floor, not the ceiling.

Key takeaways

  • SDXL at fp16 runs cleanly on a 12GB card; Flux.1 dev runs in fp8 or GGUF form with offloading.
  • A single 1024px SDXL image lands in the 20-35 second range; Flux fp8 lands at 60-90 seconds.
  • The WD Blue SN550 1TB NVMe keeps model cold-loads fast — relevant because ComfyUI swaps models often.
  • VRAM headroom limits batch size and LoRA stacking more than resolution.
  • A 16GB+ card buys you small batches and ControlNet pipelines on quantized Flux; unquantized Flux needs roughly 22GB, so it fits outright only on a 24GB card, which is also what serious LoRA training above SDXL size requires.
  • Cloud image APIs win on latency, not on cost-per-image at high volume.

What models fit in 12GB?

The 12GB ceiling shapes which weights you can actually load.

Model                   | File size | Approx. VRAM     | Fits 12GB?
SD 1.5 (fp16)           | ~2 GB     | ~3 GB            | yes, trivially
SDXL base (fp16)        | ~6.6 GB   | ~8 GB            | yes
SDXL + refiner (fp16)   | ~13 GB    | ~14 GB combined  | no, run sequentially
Flux.1 schnell (fp16)   | ~24 GB    | ~22 GB           | no
Flux.1 schnell (fp8)    | ~12 GB    | ~11 GB           | yes, with offload
Flux.1 dev (fp16)       | ~24 GB    | ~22 GB           | no
Flux.1 dev (fp8)        | ~12 GB    | ~11 GB           | yes, tight
Flux.1 dev (GGUF Q8)    | ~13 GB    | ~10 GB           | yes
Flux.1 dev (GGUF Q4)    | ~7 GB     | ~6 GB            | yes, with quality drop

GGUF builds are the practical alternative to unquantized Flux, which does not fit in 12GB. Q8 is nearly indistinguishable from fp8 in practice; Q4 shows visible quality loss on faces and fine text.
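
The file sizes in the table follow from simple arithmetic on parameter count and bits per weight. The sketch below is a rough estimate under stated assumptions, not a measurement: it assumes Flux.1 has about 12 billion parameters and that the GGUF Q8 and Q4 builds use roughly 8.5 and 4.5 bits per weight (the Q8_0 and Q4_0 layouts). Real GGUF files mix quantization types, and loaded VRAM differs from file size.

```python
# Rough weight-file size from parameter count and bits per weight.
# Assumptions (not from the article): Flux.1 has ~12 billion parameters;
# GGUF Q8_0 stores ~8.5 bits per weight and Q4_0 ~4.5 bits per weight.

BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "fp8": 8.0,
    "gguf_q8_0": 8.5,
    "gguf_q4_0": 4.5,
}


def weight_size_gb(params_billion: float, fmt: str) -> float:
    """Return approximate weight size in GB (10^9 bytes)."""
    bits = BITS_PER_WEIGHT[fmt]
    # billions of parameters * bytes per parameter = GB
    return params_billion * bits / 8.0


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"Flux.1 ({fmt}): ~{weight_size_gb(12.0, fmt):.2f} GB")
```

Under those assumptions the estimates come out at 24, 12, 12.75, and 6.75 GB, which lines up with the file-size column above.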

Generation benchmark table: seconds-per-image at 1024px

Public ComfyUI community measurements for the RTX 3060 12GB cluster in the ranges below. Treat them as orientation, not as a substitute for your own benchmark.

Workflow                  | Resolution | Steps   | Approx. seconds/image
SDXL base only            | 1024×1024  | 25      | 22-28
SDXL base + refiner       | 1024×1024  | 25 + 10 | 30-40
SDXL + 2 LoRAs            | 1024×1024  | 25      | 25-32
Flux.1 schnell (fp8)      | 1024×1024  | 4       | 35-50
Flux.1 dev (fp8)          | 1024×1024  | 20      | 70-95
Flux.1 dev (GGUF Q4)      | 1024×1024  | 20      | 55-75
SDXL upscale (latent x2)  | 2048×2048  | 25      | 90-120

Flux is slower by design: it is a much larger network than SDXL. The schnell variant is distilled to run in 4 steps, which is what keeps its per-image time tolerable on a 12GB card even with quantized weights.
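
To turn the seconds-per-image ranges into something easier to plan around, they can be converted to images per hour. This is only a unit conversion of the community ranges in the table; the dictionary and function names are illustrative, and the result ignores model loading and time spent picking images.

```python
# Seconds-per-image ranges copied from the benchmark table above.
SECONDS_PER_IMAGE = {
    "SDXL base only": (22, 28),
    "Flux.1 schnell (fp8)": (35, 50),
    "Flux.1 dev (fp8)": (70, 95),
}


def images_per_hour(fastest_s: float, slowest_s: float) -> tuple[float, float]:
    """Return (min, max) images per hour for a seconds-per-image range."""
    return 3600 / slowest_s, 3600 / fastest_s


if __name__ == "__main__":
    for name, (fast, slow) in SECONDS_PER_IMAGE.items():
        low, high = images_per_hour(fast, slow)
        print(f"{name}: about {low:.0f}-{high:.0f} images/hour")
```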

Quantization and offload: fp16 vs fp8 vs GGUF Flux

Flux on a 12GB card is the workflow most newcomers ask about. There are three viable paths.

Build                | Approx. VRAM | Approx. seconds/image (20 steps) | Quality vs fp16
fp16 (does not fit)  | ~22 GB       | n/a                              | reference
fp8                  | ~11 GB       | 70-90                            | indistinguishable on most prompts
GGUF Q8              | ~10 GB       | 75-95                            | indistinguishable
GGUF Q6              | ~8 GB        | 65-85                            | very minor loss
GGUF Q4              | ~6 GB        | 55-75                            | visible loss on faces, text

fp8 is the default first try on a 12GB card; GGUF Q8 buys you ~1GB of headroom for LoRAs and ControlNets at no visible quality cost. Q4 is for batch experimentation only.

How VRAM headroom limits batch size, resolution, and LoRA stacking

The same 12GB you spend on the model is the 12GB you have to spend on everything else.

Workflow add-on             | Approx. extra VRAM | Notes
One LoRA (SDXL, rank 32)    | ~0.2 GB            | trivial
ControlNet (SDXL)           | ~1.5 GB            | painful on Flux
Batch of 2 (SDXL 1024px)    | ~3 GB extra        | risky on 12GB Flux
Upscale to 1536px (latent)  | ~2 GB extra        | usually fine on SDXL
Two stacked Flux LoRAs      | ~1 GB              | usually fits on Q8 build

The practical limit on a 12GB card is one image at a time at 1024-1536px with a small stack of LoRAs. Batches and ControlNet on Flux push you to a 16GB+ card immediately.
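
The headroom reasoning above amounts to adding up a budget. The following is a minimal sketch that sums the approximate figures from the two tables and compares the total against the card. The `reserve_gb` value for desktop and driver overhead is a hypothetical placeholder, and real usage also depends on activations, resolution, and how ComfyUI offloads.

```python
# Approximate VRAM figures copied from the tables above (GB).
MODEL_VRAM_GB = {
    "sdxl_fp16": 8.0,
    "flux_fp8": 11.0,
    "flux_gguf_q8": 10.0,
    "flux_gguf_q6": 8.0,
    "flux_gguf_q4": 6.0,
}
ADDON_VRAM_GB = {
    "sdxl_lora": 0.2,
    "sdxl_controlnet": 1.5,
    "sdxl_batch_of_2": 3.0,
    "upscale_1536": 2.0,
    "two_flux_loras": 1.0,
}


def vram_budget(model, addons, card_gb=12.0, reserve_gb=0.5):
    """Return (used_gb, headroom_gb); negative headroom means it will not fit.

    reserve_gb is an assumed allowance for the desktop and driver,
    not a figure from the article.
    """
    used = MODEL_VRAM_GB[model] + sum(ADDON_VRAM_GB[a] for a in addons)
    return used, card_gb - reserve_gb - used


if __name__ == "__main__":
    for model, addons in [
        ("sdxl_fp16", ["sdxl_lora", "sdxl_controlnet"]),
        ("flux_fp8", ["two_flux_loras"]),
        ("flux_gguf_q8", ["two_flux_loras"]),
    ]:
        used, headroom = vram_budget(model, addons)
        verdict = "fits" if headroom >= 0 else "does not fit"
        print(f"{model} + {addons}: {used:.1f} GB used, {verdict}")
```

With these numbers, two Flux LoRAs overflow the budget on the fp8 build but fit on Q8, which is the same trade the quantization section describes.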

What CPU, RAM, and SSD pair well

ComfyUI is GPU-bound during generation, but the CPU and disk matter during model load and text encoding.

  • CPU. A modern 6-8 core chip is plenty. A Ryzen 7 5800X handles text encoders comfortably; a Ryzen 5 5600 is fine. Text encoding for Flux is the one CPU-bound stage if you offload the T5 encoder.
  • System RAM. 32GB is the comfortable floor. Flux's text encoder alone wants ~8GB if offloaded to CPU, and Comfy keeps multiple models cached in RAM between runs to avoid reload time. 16GB will work but forces eager swap-out.
  • Storage. ComfyUI swaps between models often. A 1TB NVMe like the WD Blue SN550 keeps model cold-loads under 10 seconds for SDXL; SATA roughly doubles those times. NVMe is the right choice here.
  • PSU. The card pulls up to ~200W under image-gen load. 550W is the floor; 650W gives margin.

Perf-per-dollar vs a 16GB/24GB card and vs cloud image APIs

The 12GB tier is the value floor; the question is whether your workload pulls you up the ladder.

Tier  | Card example                  | Approx. price        | What it adds
12GB  | RTX 3060 12GB                 | $200 used / $300 new | SDXL + Flux fp8, single images
16GB  | RTX 4060 Ti 16GB / 4080 16GB  | $450-1,100           | Flux fp8 or Q8 with headroom, ControlNet, small batches
24GB  | RTX 4090                      | $1,500+              | unquantized Flux (tight), batches of 4-8, LoRA training
Cloud | OpenAI / Stability APIs       | $0 up front          | speed; pay per image

A cloud image API costs cents per image. At a few hundred images a month the cloud is cheaper. At a few thousand a month the math flips, especially if your prompts violate hosted-content rules and force you off the cheapest tiers. The other cloud advantage is that frontier image models — particularly the closed flagships — are larger and slightly higher quality than the open Flux dev release. For a creator who needs the highest absolute fidelity per image and shoots in low volume, cloud still wins. For everyone else, local on a 12GB card has crossed the quality bar where it is no longer a compromise.
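
The point where local generation becomes cheaper can be estimated with a short break-even calculation. Everything numeric in this sketch is a placeholder to replace with your own figures: the cloud price per image, the electricity rate, and the power draw are assumptions, and only the $300 card price comes from the table above. It also ignores the rest of the PC and your time.

```python
import math


def local_cost_per_image(watts, seconds, price_per_kwh):
    """Electricity cost of one image, given GPU draw and generation time."""
    kwh = watts / 1000 * seconds / 3600
    return kwh * price_per_kwh


def break_even_images(hardware_cost, cloud_price, local_price):
    """Images needed before the hardware pays for itself; inf if it never does."""
    saving = cloud_price - local_price
    if saving <= 0:
        return math.inf
    return hardware_cost / saving


if __name__ == "__main__":
    HARDWARE_COST = 300.0   # new RTX 3060 12GB, from the table above
    CLOUD_PRICE = 0.04      # hypothetical USD per cloud image
    PRICE_PER_KWH = 0.15    # hypothetical electricity rate
    local = local_cost_per_image(watts=200, seconds=90, price_per_kwh=PRICE_PER_KWH)
    images = break_even_images(HARDWARE_COST, CLOUD_PRICE, local)
    print(f"local electricity per image: ${local:.5f}")
    print(f"break-even after about {images:.0f} images")
```

With these placeholder inputs the card pays for itself after several thousand images in total, so the monthly volume decides how many months that takes.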

Worked example: a typical SDXL session

A representative 30-minute SDXL session on the 12GB rig — drafting a hero image for an article — looks like this:

  • Boot ComfyUI, load SDXL base + a single style LoRA: ~12 seconds.
  • Generate 8 candidate images at 1024px, 28 steps, DPM++ 2M Karras sampler: ~3.5 minutes of GPU time.
  • Pick two finalists, re-run with seed variation (4 outputs each): ~1.5 minutes.
  • Latent upscale the chosen image to 1536px with a refiner pass: ~70 seconds.
  • Final export at PNG: instant.

Total: roughly six and a half minutes of load and GPU time for one polished output, with every other candidate discarded. That throughput is the reason the 12GB tier became the floor: it is the cheapest hardware where the iteration cycle feels fast enough for creative work.

Worked example: a Flux fp8 session

The same shape on Flux fp8 changes meaningfully because each image takes longer.

  • Boot ComfyUI, load Flux.1 dev fp8, offload T5 to CPU: ~25 seconds.
  • Generate 4 candidates at 1024px, 20 steps: ~6 minutes.
  • Pick one, re-run with seed variation (2 outputs): ~3 minutes.
  • Upscale via latent + lightweight detailer: ~2 minutes.

Total: ~12 minutes for one final output. Flux's prompt adherence is dramatically better than SDXL's on long natural-language prompts, so fewer iterations are usually needed; but the per-image cost is higher. On a 12GB card the right workflow is fewer, more deliberate generations rather than batches of variants.
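
The session total above is a sum of stage times, which makes it easy to re-run for your own habits. In this sketch the 90 seconds per image is derived from the article's "4 candidates in ~6 minutes"; the stage names and the helper function are illustrative only.

```python
# Each stage is (label, image_count, seconds_per_image).
# 90 s/image follows from "4 candidates ... ~6 minutes" in the list above.
FLUX_STAGES = [
    ("candidates", 4, 90),
    ("seed variations", 2, 90),
    ("upscale + detailer", 1, 120),
]


def session_seconds(load_seconds, stages):
    """Total wall-clock seconds: model load plus every generation stage."""
    return load_seconds + sum(count * each for _, count, each in stages)


if __name__ == "__main__":
    total = session_seconds(25, FLUX_STAGES)
    print(f"Flux fp8 session: {total} s (~{total / 60:.1f} min)")
```

Changing the candidate count in `FLUX_STAGES` shows how quickly extra variants add up at Flux speeds.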

Common pitfalls

  1. Loading fp16 Flux on a 12GB card. It crashes or silently falls back to disk-offload, which collapses speed to minutes per image. Use fp8 or GGUF.
  2. Leaving the text encoder on GPU. Force the T5 encoder to CPU on a 12GB Flux workflow. Saves ~3GB.
  3. Browser tabs holding VRAM. Hardware-accelerated browser tabs casually hold 1-2 GB. Close them before a Flux run.
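
A quick way to catch the third pitfall is to check how much VRAM is already in use before starting a Flux run. The sketch below assumes `nvidia-smi` is on the PATH and uses its CSV query output, which reports values in MiB when `nounits` is set. The helper names are illustrative, and only the first GPU is read.

```python
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line):
    """Parse one CSV line such as '1536, 12288' into (used_mib, total_mib)."""
    used_mib, total_mib = (int(value.strip()) for value in line.split(","))
    return used_mib, total_mib


def free_vram_gib(line):
    """Free VRAM in GiB for one nvidia-smi CSV line."""
    used_mib, total_mib = parse_memory_line(line)
    return (total_mib - used_mib) / 1024


def query_first_gpu():
    """Run nvidia-smi and return the CSV line for the first GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return result.stdout.strip().splitlines()[0]


if __name__ == "__main__":
    line = query_first_gpu()
    used_mib, _ = parse_memory_line(line)
    print(f"in use before launch: {used_mib} MiB, free: {free_vram_gib(line):.1f} GiB")
```

If the in-use figure is already above a gigabyte or so with ComfyUI closed, something else on the desktop is holding VRAM.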

When NOT to use a 12GB card for image gen

If your workflow is batch-of-eight at 1024px, ControlNet-heavy pipelines, or LoRA training above SDXL scale, the 12GB ceiling will frustrate you, and a 16-24GB card pays off fast. Likewise, if you generate fewer than a hundred images a month in total, a cloud API is cheaper and faster than amortizing a local rig.

Sharing the card with chat

A common 12GB build also runs Ollama for chat. ComfyUI loads models into VRAM when a workflow first runs and keeps them cached afterward, so running a chat model at the same time is fragile: either the chat model runs out of memory, or Flux is evicted from VRAM and reloads from disk on every image. The clean fix is to free Flux between sessions: launch ComfyUI when you need images, shut it down when you do not, and keep Ollama as the always-on resident. On a 16GB card, both can coexist with care. On 12GB, treat ComfyUI as exclusive while it is running.
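
A lighter-weight variation on shutting ComfyUI down is to ask each server to release VRAM over HTTP. This is a sketch under assumptions, not a tested recipe: it assumes ComfyUI is listening on its default port 8188 and accepts a POST to `/free`, and that Ollama on port 11434 unloads a model when sent a generate request with `keep_alive` set to 0. Check both against the versions you run; the function names and the model name in the usage comment are hypothetical.

```python
import json
import urllib.request


def _json_post(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def comfyui_free_request(base_url="http://127.0.0.1:8188"):
    """Request asking ComfyUI to unload models and free cached memory."""
    return _json_post(f"{base_url}/free", {"unload_models": True, "free_memory": True})


def ollama_unload_request(model, base_url="http://127.0.0.1:11434"):
    """Request asking Ollama to unload `model` right away (keep_alive=0)."""
    return _json_post(f"{base_url}/api/generate", {"model": model, "keep_alive": 0})


def send(request, timeout=10.0):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example usage (requires the servers to be running):
#   send(ollama_unload_request("my-chat-model"))  # before an image session
#   send(comfyui_free_request())                  # after the image session
```

Keeping request construction separate from sending makes the payloads easy to inspect before anything touches a running server.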

Bottom line

A 12GB RTX 3060 ComfyUI rig is the right call for an independent creator generating SDXL or Flux images one at a time, who wants the privacy and offline behavior of a local pipeline without spending more than ~$700 on the whole box. The card is five years old in 2026 and still occupies the floor of serious image gen because 12GB of VRAM is the threshold that fits Flux at fp8.

Citations and sources

This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported.

Frequently asked questions

Can the RTX 3060 12GB run Flux?
Yes, with caveats. Full fp16 Flux.1 dev does not fit in 12GB; you need fp8 or a GGUF-quantized build, usually with the text encoder offloaded. Flux fp8 generates a 1024px image in roughly 60-90 seconds on the 3060, depending on sampler and steps. The schnell variant is faster because it is designed to run in 4 steps rather than 20.
How long does one image take on a 3060?
For SDXL at 1024px with a typical 25-30 step sampler, public measurements cluster in the 20-35 second range per image. Flux dev fp8 at 20 steps lands closer to 70-95 seconds per image. Quantization helps mainly by letting the whole model stay in VRAM, which avoids slow offloading; it does not make each sampling step dramatically faster.
Does VRAM limit resolution or batch size?
Yes. With 12GB you generally generate one image at a time at 1024px and may need tiled upscaling for higher resolutions. Batch generation, heavy ControlNet pipelines, and large LoRA stacks push you toward a 16GB+ card. The 12GB tier is one-at-a-time creative work, not high-volume batched production.
Is a 16GB or 24GB card worth it instead?
If image generation is your main workload and you want unquantized Flux, larger batches, or heavy ControlNet pipelines, the upgrade pays off. For occasional generation and creative iteration the 3060 12GB is still the value floor. LoRA training above SDXL scale is where 24GB becomes non-negotiable.
What else do I need besides the GPU?
ComfyUI runs fine on a mid-range CPU like the Ryzen 7 5800X, and 32GB of system RAM helps when models offload. An NVMe SSD like the WD Blue SN550 keeps model swaps fast, which matters because ComfyUI workflows often swap between checkpoints and LoRAs. The PSU should be 550W or higher with an 80 Plus rating.

— SpecPicks Editorial · Last verified 2026-06-08
