ComfyUI on an RTX 3060 12GB: VRAM Tuning and Image-Gen Throughput

Item: MSI GeForce RTX 3060 Ventus 2X 12G Gaming Graphics Card - RTX 3060
Author: Mike Perry

Settings, throughput, and where 12GB is genuinely enough for SDXL, FLUX, and HiDream

By Mike Perry · Published 2026-06-04 · Last verified 2026-07-21 · 10 min read

SDXL hits ~3.8 it/s, FLUX.1-dev fits with offload, and HiDream-O1 works with the right flags. Here's the full 12GB tuning guide.

Yes — a 12GB RTX 3060 is a comfortable home for ComfyUI in 2026 for nearly every mainstream open-weights image model. SDXL-class runs flat-out without offload, HiDream-O1 distilled checkpoints fit with sensible flags, and even the larger frontier models work if you accept VAE tiling and a small speed cost. The 12GB frame buffer is the deciding factor over raw speed when you're picking a budget image-gen card.

The conventional wisdom that you need 16-24 GB to do "real" ComfyUI work has not aged well. Modern offload modes, tiled VAE, and ever-smarter memory management in the ComfyUI runtime have repeatedly pushed the floor down, and the RTX 3060 12GB — the cheapest current consumer GPU with usefully large VRAM — has become the de facto budget standard for open-weights image generation. The question isn't whether it works; the question is how to tune it so you stay above the OOM ceiling without paying a 4× speed penalty for the privilege.

This guide is the tuning manual we wish we'd had when we first set up ComfyUI on a 3060. We benchmark SDXL, FLUX-class, and the new HiDream-O1 family on a stock 3060 12GB, lay out the exact ComfyUI flags that matter on 12GB, draw the batch-size-vs-resolution tradeoff curve, and finish with where 12GB is genuinely the right answer versus when to spend up to a 16 GB or 24 GB card.

Key takeaways

SDXL at 1024² runs ~3.5-4.0 it/s on a 3060 12GB with default settings, within about 10% of the RTX 4060 8GB's ~4.1 it/s in our comparison table.
HiDream-O1-distill fits 12GB with --normalvram and tiled VAE; the full checkpoint needs --lowvram offload but works.
Batch size 2 is the practical ceiling at 1024² for SDXL; batch 4 forces offload and halves throughput.
The MSI and ZOTAC 3060 12GB cards are the ones to buy now: same chip, identical it/s in our testing.
Upgrade to 16 GB only if you live in batch size 4 or higher, or run HiDream-O1 full checkpoint daily.

What does ComfyUI need from a GPU, and where does the 3060 sit?

ComfyUI is fundamentally a graph-execution runtime for diffusion models: nodes load checkpoints, run UNet/transformer denoising loops, and run the VAE for encode/decode. Its memory cost has three components: the model weights (typically 3-8 GB for SDXL-class, 8-12 GB for FLUX-class, 15+ GB for the largest current open-weights models), the activations that grow with resolution and batch size, and the VAE that's loaded for the final decode step.
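The three-part memory cost described above can be sketched as a simple budget check. This is a minimal illustration, not how ComfyUI accounts for memory internally: the `VramBudget` class is hypothetical, the 0.4 GB margin mirrors the ~400 MB thrash zone discussed later in this guide, and the example figures are placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class VramBudget:
    """Hypothetical helper: the three VRAM components named in the text."""
    weights_gb: float       # model checkpoint resident in VRAM
    activations_gb: float   # grows with resolution and batch size
    vae_gb: float           # VAE loaded for encode/decode

    def total_gb(self) -> float:
        return self.weights_gb + self.activations_gb + self.vae_gb

    def fits(self, card_gb: float = 12.0, margin_gb: float = 0.4) -> bool:
        # Leave a safety margin below the card's nominal capacity.
        return self.total_gb() <= card_gb - margin_gb


# Placeholder numbers for illustration only, not measured values.
sdxl_like = VramBudget(weights_gb=6.5, activations_gb=3.0, vae_gb=0.5)
too_big = VramBudget(weights_gb=17.0, activations_gb=1.0, vae_gb=0.5)

print(sdxl_like.total_gb(), sdxl_like.fits())
print(too_big.total_gb(), too_big.fits())
```

When `fits()` comes back false, the options are the ones this guide covers: offload weights, tile the VAE, or shrink the batch.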

The RTX 3060 12GB brings 192-bit GDDR6 at 360 GB/s bandwidth and 12 GB of frame buffer. On absolute compute, it's slower than every Ada-Lovelace card except possibly the RTX 4050 — but it has 50% more VRAM than the 4060 8GB and matches the 4070 12GB on frame buffer. For ComfyUI workloads, that frame-buffer advantage repeatedly translates into "the 3060 finishes the job and the 4060 8GB OOMs" — a binary outcome that dominates raw it/s comparisons.
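The 360 GB/s figure follows from the bus width and the per-pin memory data rate. The sketch below reproduces it; the 15 Gbps data rate for the 3060 and the bus widths and data rates for the two Ada cards are taken from public spec sheets rather than from our own testing, so treat them as assumptions.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bits per transfer times transfers per second, over 8."""
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, per-pin data rate in Gbps), assumed from public specs.
cards = {
    "RTX 3060 12GB": (192, 15.0),
    "RTX 4060 8GB": (128, 17.0),
    "RTX 4070 12GB": (192, 21.0),
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(bus, rate):.0f} GB/s")
```

The arithmetic shows why the 4060 lands at lower bandwidth than the older 3060: its faster memory does not make up for the narrower 128-bit bus.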

How fast is SDXL / modern open-weights image gen on a 3060 12GB?

The table below shows measured iterations-per-second across the models most ComfyUI users actually run. All numbers are with stock ComfyUI on Ubuntu 24.04, CUDA 12.4, default samplers, no LoRAs, 30 denoising steps.

Model	Resolution	Batch	it/s	Wall-clock per batch (30 steps)
SDXL 1.0 base	1024×1024	1	3.8	~8 s
SDXL 1.0 base	1024×1024	2	2.1	~14 s
SDXL Turbo	512×512	1	12.4	~2.4 s
FLUX.1-schnell	1024×1024	1	1.9	~16 s
FLUX.1-dev	1024×1024	1	1.2	~25 s
HiDream-O1-distill	1024×1024	1	0.9	~33 s
HiDream-O1 full	1024×1024	1	0.4 (lowvram)	~75 s
SD 1.5	512×512	1	14.2	~2.1 s

The picture is consistent: for SDXL-class workloads, the 3060 12GB delivers throughput in the same ballpark as cards that cost twice as much, because VRAM headroom is the bottleneck. For FLUX.1-dev and the largest HiDream variants, you're paying a real speed penalty to fit the model — but you're getting an image where an 8 GB card would OOM.
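The wall-clock column in the table is just the step count divided by it/s. A small sketch of that conversion, using the measured it/s values from the table; it ignores model-load and VAE-decode time, so real end-to-end times will run somewhat longer.

```python
STEPS = 30  # denoising steps used for every row in the table

# it/s values copied from the benchmark table above.
measured_its = {
    "SDXL 1.0 base, batch 1": 3.8,
    "SDXL 1.0 base, batch 2": 2.1,
    "FLUX.1-schnell": 1.9,
    "FLUX.1-dev": 1.2,
    "HiDream-O1-distill": 0.9,
    "HiDream-O1 full (lowvram)": 0.4,
}


def wall_clock_seconds(its: float, steps: int = STEPS) -> float:
    """Seconds spent in the denoising loop for one batch."""
    return steps / its


for model, its in measured_its.items():
    print(f"{model}: ~{wall_clock_seconds(its):.0f} s")
```

Note that the batch-2 SDXL row takes longer per batch but less time per image, since the ~14 s covers two images.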

Which VRAM flags and offload modes matter on 12GB?

ComfyUI exposes several memory-management modes via CLI flags or its in-UI memory selector. On a 3060 12GB, the ones in the table below are the ones that matter day-to-day:

Flag	What it does	When to use on 12GB
`--normalvram` (default)	Keeps model in VRAM, swaps activations	SDXL at batch 1-2, no LoRA stack
`--lowvram`	Offloads model layers to RAM between steps	FLUX.1-dev, HiDream full, batch 4 SDXL
`--highvram`	Pins everything in VRAM	Don't use — you'll OOM on anything bigger than SD 1.5
`--cpu-vae`	Runs VAE decode on CPU	Combine with `--lowvram` for the largest models
Tiled VAE node	Splits VAE decode into tiles	Always-on for HiDream and FLUX.1-dev

The combination that wins most often is --normalvram plus a Tiled VAE Decode node in the workflow. That keeps SDXL workflows running flat-out while preventing the VAE-decode step from spiking memory and OOM-ing on a high-resolution output. For FLUX.1-dev and HiDream full, switch to --lowvram --cpu-vae; you lose ~30% throughput but the workflow completes reliably.
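The flag choice above can be written down as a small lookup. This is a sketch of this guide's recommendations, not part of ComfyUI: the `comfyui_flags` and `launch_command` helpers and the model-name strings are hypothetical, and the `main.py` entry point assumes a standard source checkout. Tiled VAE Decode is a workflow node, so it does not appear in the command line.

```python
# Models this guide recommends running with offload on a 12GB card.
NEEDS_OFFLOAD = {"flux.1-dev", "hidream-o1-full"}


def comfyui_flags(model: str) -> list[str]:
    """Return launch flags following this article's 12GB recommendations."""
    if model.lower() in NEEDS_OFFLOAD:
        return ["--lowvram", "--cpu-vae"]
    return ["--normalvram"]


def launch_command(model: str, entry_point: str = "main.py") -> list[str]:
    """Build an argv list; entry_point is an assumption about the install layout."""
    return ["python", entry_point, *comfyui_flags(model)]


print(" ".join(launch_command("sdxl")))
print(" ".join(launch_command("FLUX.1-dev")))
```

Keeping the decision in one place makes it easy to launch a separate ComfyUI instance per model family instead of editing flags by hand.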

Can the 3060 run the new HiDream-O1-class open-weights image models?

Yes, with caveats that depend on which checkpoint you mean. The HiDream-O1 distilled checkpoint (~8 GB at fp16, less at fp8) fits cleanly on a 3060 with --normalvram and tiled VAE, and lands around 0.9 it/s at 1024². That's slow per iteration but tolerable for the model that currently tops the Artificial Analysis open-weights image arena.

The full HiDream-O1 checkpoint is ~17 GB at fp16, which forces --lowvram mode. In that configuration, generation throughput drops to ~0.4 it/s — call it 75 seconds per 1024² image at 30 steps. That's slow enough that interactive prompt-iteration is painful, but perfectly fine for batched overnight runs. Quantized GGUF builds of HiDream-O1 are emerging that should bring the full model into the same throughput range as the distill on a 3060; watch the ComfyUI subreddit for the first stable q4 release.
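To see why quantization changes the picture, weight size scales roughly linearly with bits per weight. The sketch below applies that scaling to the ~17 GB fp16 figure; it is a back-of-the-envelope estimate under that linear assumption, and real quantized files come out somewhat larger because they also store scaling data.

```python
def scaled_weights_gb(size_gb_fp16: float, bits_per_weight: float) -> float:
    """Estimate checkpoint size at a given precision from its fp16 size."""
    return size_gb_fp16 * bits_per_weight / 16


FULL_FP16_GB = 17.0  # full HiDream-O1 checkpoint size quoted in the text

for label, bits in [("fp16", 16), ("fp8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{scaled_weights_gb(FULL_FP16_GB, bits):.2f} GB")
```

At 8 bits the full model lands near the distilled checkpoint's fp16 size, which is why a stable quantized build could remove the need for offload on 12GB.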

Spec-delta table: RTX 3060 12GB vs RTX 4060 8GB vs RTX 4070 for image gen

Card	VRAM	Bandwidth	SDXL 1024 it/s	FLUX.1-dev fits?	HiDream-O1 full fits?
RTX 3060 12GB	12 GB	360 GB/s	~3.8	yes	with `--lowvram`
RTX 4060 8GB	8 GB	272 GB/s	~4.1	OOM	OOM
RTX 4060 Ti 16GB	16 GB	288 GB/s	~4.4	yes	yes
RTX 4070 12GB	12 GB	504 GB/s	~6.2	yes	with `--lowvram`
RTX 4070 Ti Super 16GB	16 GB	672 GB/s	~8.1	yes	yes

The lesson the table teaches: at 8 GB VRAM, the 4060 is faster than the 3060 on the SDXL workloads it can run, but it falls off a cliff for FLUX.1-dev and HiDream and produces no image at all. The 4060 Ti 16GB is the natural step-up from a 3060 if you find yourself running large models daily: same compute tier as the 4060, but with the memory to actually use it. The 4070 12GB beats the 3060 on every metric except dollars per it/s. The comparison vs the RX 9070 XT walks the AMD side of the same table.

Batch size vs resolution: the 12GB tradeoff curve

Below are the largest combinations of resolution × batch that fit on a 3060 12GB at SDXL with stock settings, no LoRA, no ControlNet:

Resolution	Max batch	VRAM used at max batch
512×512	8	~9.8 GB
768×768	4	~10.1 GB
1024×1024	2	~10.6 GB
1280×1280	1	~10.2 GB
1536×1536	1 (with tiled VAE)	~11.4 GB
2048×2048	n/a (always tiled VAE)	~11.8 GB

The hard line is 12 GB: ComfyUI starts thrashing once you push within ~400 MB of that ceiling, and the resulting OOM kills the workflow. Treat each "max batch" as a limit rather than a target; running one step below it gives you a practical safety margin. If you stack a LoRA or a ControlNet onto the workflow, drop the max batch by one, since either one eats ~600 MB-1 GB depending on size.
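The table and the drop-by-one rule can be combined into a lookup. This is a sketch under stated assumptions: the `recommended_batch` helper is hypothetical, it rounds an unlisted resolution up to the next measured row, and it treats a result of 0 as "no headroom at stock settings", which is our reading of the rule rather than a measured outcome.

```python
# Square side length in pixels -> max SDXL batch from the table above.
MAX_BATCH_BY_SIDE = {512: 8, 768: 4, 1024: 2, 1280: 1, 1536: 1}


def recommended_batch(side_px: int, has_addons: bool = False) -> int:
    """Largest batch that fits, dropping by one if a LoRA or ControlNet is stacked."""
    for side in sorted(MAX_BATCH_BY_SIDE):
        if side_px <= side:
            batch = MAX_BATCH_BY_SIDE[side]
            return max(batch - 1, 0) if has_addons else batch
    raise ValueError(f"{side_px}px is beyond the measured table")


print(recommended_batch(1024))                   # stock SDXL at 1024x1024
print(recommended_batch(1024, has_addons=True))  # with a LoRA or ControlNet
print(recommended_batch(640))                    # rounds up to the 768 row
```

A result of 0 is the cue to reach for tiled VAE, a lower resolution, or an offload flag instead of a smaller batch.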

Perf-per-dollar and perf-per-watt for sustained generation

The 3060 12GB pulls 170 W TGP and delivers ~3.8 SDXL it/s. Perf-per-watt and purchase cost per unit of throughput against current consumer options:

3060 12GB: 0.022 SDXL-it/s per W, ~$60 / SDXL-it/s of card cost
4060 Ti 16GB: 0.027 SDXL-it/s per W, ~$110 / SDXL-it/s
4070 12GB: 0.038 SDXL-it/s per W, ~$95 / SDXL-it/s
4070 Ti Super 16GB: 0.041 SDXL-it/s per W, ~$110 / SDXL-it/s
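The two metrics in the list are simple ratios. The sketch below recomputes the 3060's perf-per-watt from the figures in the text and backs out the card price each dollars-per-it/s figure implies; those implied prices are derived here for illustration and are not quoted retail prices.

```python
def its_per_watt(its: float, watts: float) -> float:
    """Throughput per watt of board power."""
    return its / watts


def implied_card_price(dollars_per_its: float, its: float) -> float:
    """Card price implied by a dollars-per-it/s figure and a measured it/s."""
    return dollars_per_its * its


# (SDXL it/s, dollars per it/s) copied from the tables and list above.
cards = {
    "3060 12GB": (3.8, 60),
    "4060 Ti 16GB": (4.4, 110),
    "4070 12GB": (6.2, 95),
    "4070 Ti Super 16GB": (8.1, 110),
}

print(f"3060 12GB: {its_per_watt(3.8, 170):.3f} it/s per W")
for name, (its, dollars) in cards.items():
    print(f"{name}: implied price ~${implied_card_price(dollars, its):.0f}")
```

Seen this way, the 3060's advantage is almost entirely its purchase price, which matches the conclusion drawn from the list.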

The 3060 wins on dollars per it/s and loses on it/s per watt. For a personal-use 1-2-hour-a-day image-gen workflow, the electricity gap is rounding-error money; the up-front purchase price is where the 3060's value lives. Pair it with a fast SSD (the WD Blue SN550 NVMe is enough for most setups) to keep checkpoint swaps from becoming the bottleneck on a multi-model workflow.

Common pitfalls

Loading multiple checkpoints into the workflow. Each loaded checkpoint stays resident in VRAM until manually unloaded; loading SDXL + FLUX in the same graph OOMs on 12GB. Use a separate workflow file per model family.
Forgetting Tiled VAE on high-resolution outputs. A 1536² or 2048² SDXL workflow without Tiled VAE will OOM at the decode step, not the denoising step — confusing because the bar gets to 100% before the crash.
Leaving the default sampler at high step counts. Karras-schedule samplers at 50+ steps are wasted compute on modern SDXL checkpoints; quality plateaus at around 25-30 steps for almost every workflow.
Running ComfyUI on a desktop that's also driving 4K displays. The display compositor eats 400-800 MB of VRAM that your workflow could use. On Linux, an iGPU for the display gives you that back.
Skipping --lowvram "because it's slow." On the models that need it (FLUX.1-dev, HiDream full), --lowvram is not optional; without it you get OOM, not a slow image.
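Several of these pitfalls come down to not knowing how much VRAM is already taken before ComfyUI starts. One way to check is to query `nvidia-smi` and parse the result, sketched below. The helper names are hypothetical, the sample line is made up to show the expected format, and `query_first_gpu` is defined but not called here because it needs an NVIDIA driver installed.

```python
import subprocess

# Ask nvidia-smi for used and total memory in MiB, one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse a 'used, total' CSV line into integers (MiB)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def free_mib(line: str) -> int:
    used, total = parse_memory_line(line)
    return total - used


def query_first_gpu() -> str:
    """Run nvidia-smi and return the line for the first GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return out.splitlines()[0]


# Hypothetical sample: a desktop session already holding 612 MiB.
sample = "612, 12288"
print(f"free before launch: {free_mib(sample)} MiB")
```

Running the query on an idle desktop shows how much the display compositor is holding, which is the amount an iGPU-driven display would give back.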

When NOT to bother with a 12GB card

You run batch 4+ SDXL workflows daily. Get the 4060 Ti 16GB — the speed-per-VRAM tradeoff stops favoring 12GB once you live in large batches.
HiDream-O1 full is your primary model. 12GB makes you live with --lowvram and ~0.4 it/s. A 16 GB card lifts the offload and roughly doubles throughput.
You need real-time interactive generation (sub-1-second prompts for, say, livestream visuals). Even SDXL Turbo on a 3060 doesn't quite hit that latency; an RTX 4090 does.

Bottom line: when 12GB is enough and when to step up

For the vast majority of ComfyUI users in 2026 — hobbyists doing single-image generations, LoRA training on SD 1.5/SDXL, occasional FLUX or HiDream runs — the 3060 12GB is the right card. It's the cheapest entry into "every open-weights model I read about fits, with the right flags." Pair it with a Ryzen 7 5800X and a 1 TB NVMe like the WD Blue SN550 and you have a credible image-gen workstation under $1,200 fully built.

If you're hitting OOM repeatedly on the models you actually run, that's the signal to step up to 16 GB — and at that point the 4060 Ti 16GB or a used 3090 24GB are the cards to look at, depending on whether you also run local LLMs (the 3090's 24 GB makes it dual-purpose in a way the 4060 Ti is not). Don't upgrade for raw it/s — upgrade for VRAM headroom, because that's what the 3060 12GB occasionally runs out of.

Frequently asked questions

Is 12GB of VRAM enough for ComfyUI in 2026?

For SDXL-class and most current open-weights image models, yes, 12GB is comfortable for single-image generation at common resolutions. Tight spots appear with very large batches, high upscales, or the heaviest new checkpoints, where 12GB forces offload modes. The article's settings table shows which flags keep you inside the budget without crashing.

What ComfyUI flags help on a 3060 12GB?

Memory-management modes that control model offload and VAE tiling are the key levers on 12GB. Enabling tiled VAE and a sensible offload policy lets you push higher resolutions without out-of-memory errors, at a modest speed cost. The piece lists the specific flags and the throughput tradeoff each one carries so you tune deliberately.

Can the 3060 run the new HiDream-O1 open-weights image models?

Yes, but feasibility depends on the specific checkpoint and quant. The HiDream-O1 distilled checkpoint (~8 GB at fp16) fits a 12GB card with --normalvram and tiled VAE at around 0.9 it/s, while the ~17 GB full checkpoint needs --lowvram and drops to ~0.4 it/s. The article maps which current open-weights families fit 12GB and which realistically want a larger card.

How does the 3060 12GB compare to a 4060 8GB for image gen?

The 4060 is newer and faster per watt, but its 8GB frame buffer is the limiting factor for image generation, where VRAM headroom often matters more than raw speed. The 3060's extra 4GB lets it handle larger models and batches that make an 8GB card stumble. The spec-delta table quantifies both axes.

Do I need a fast SSD for ComfyUI?

A fast SSD does not change generation speed, but it dramatically cuts model-load and checkpoint-swap times, which matters when you juggle multiple large models. Image checkpoints and upscalers are big files, so an SSD with ample capacity keeps your workflow responsive. The article notes storage planning alongside the GPU recommendation.
