NVIDIA RTX A6000 48GB Review: The Workstation Card That Still Owns Local 70B Inference (2026)


Two architectures behind, still the cheapest single-GPU way to run Llama 3 70B without offload — and the cheapest 48 GB card with NVLink.

The $4,650 RTX A6000 is two architectures old, but its 48 GB GDDR6 + NVLink combo still owns the budget 70B-inference niche. Real benchmarks, real pricing, real eBay-vs-Amazon advice.

As an Amazon Associate, SpecPicks earns from qualifying purchases. Some listings are eBay-only — we use eBay affiliate links where Amazon stock is unreliable. See our review methodology.


By Mike Perry · Published May 6, 2026 · Last verified May 6, 2026 · 12 min read

The short answer

The NVIDIA RTX A6000 (48 GB GDDR6, Ampere generation, 300 W blower, MSRP $4,650) is the cheapest single-GPU way to run Llama 3.1 70B, Llama 3.3 70B, DeepSeek-R1 70B, and Qwen 72B at Q4 on one card without offload. It's two architectures behind in 2026 — Ada Lovelace shipped in 2022, Blackwell in 2025 — but Nvidia never doubled the consumer card's VRAM, so the A6000 still has more memory than any RTX 50-series card you can buy at retail. On the used market it sells in the $3,800–$4,800 band; brand new it's still listed at $4,650 from Nvidia partners. If you need 48 GB on a single card, this is the cheapest entry point in the world.

Key takeaways

  • 48 GB GDDR6 at 768 GB/s, 10,752 CUDA cores, 300 W TDP, dual-slot blower cooler — the only Ampere-generation card with this much VRAM short of an A100.
  • Real measured throughput (SpecPicks ai_benchmarks catalog): Llama 3.3 70B Q4_K_M @ 13.56 tok/s, DeepSeek-R1 70B @ 13.65 tok/s, Qwen2.5 32B Q4 @ 26.08 tok/s, Llama 3.1 8B FP16 @ 40.25 tok/s. Source: DatabaseMart + OpenLLM Benchmarks.
  • 70B-class models load natively at Q4_K_M with ~5 GB of VRAM headroom for KV cache — no offload, no Q3 compromise, no second card.
  • eBay first, Amazon second. Amazon listings for the A6000 are sparse and often grey-market; the active marketplace is eBay (search: "NVIDIA RTX A6000 48GB").
  • Power-per-token favors the A6000 over the RTX 5090 on 70B (5090 must drop to Q3 + offload). Speed-per-token favors the 5090 on anything ≤32 GB.
  • Two A6000s with NVLink give you 96 GB pooled VRAM at $7,600–$9,600 used — cheaper than a single RTX PRO 6000 Blackwell ($8,499) and runs Llama 3.1 405B Q4 with offload.

Spec sheet — RTX A6000 vs the cards it competes with

| Spec | RTX A6000 (Ampere) | RTX PRO 6000 Blackwell | RTX 5090 (Blackwell) | RTX 4090 (Ada) |
|---|---|---|---|---|
| Released | Oct 2020 | March 2025 | Jan 2025 | Oct 2022 |
| GPU | GA102 | GB202 | GB202 | AD102 |
| CUDA cores | 10,752 | 20,480 | 21,760 | 16,384 |
| Tensor cores | 336 (3rd gen) | 640 (5th gen) | 680 (5th gen) | 512 (4th gen) |
| VRAM | 48 GB GDDR6 | 96 GB GDDR7 | 32 GB GDDR7 | 24 GB GDDR6X |
| Memory bandwidth | 768 GB/s | 1,792 GB/s | 1,792 GB/s | 1,008 GB/s |
| ECC memory | Yes | Yes | No | No |
| TDP | 300 W | 300 W | 575 W | 450 W |
| Cooler | Blower (rear exhaust) | Blower | Triple-fan (consumer) | Triple-fan |
| Form factor | Dual-slot | Dual-slot blower | 3-slot+, partner-dependent | 3-slot+ |
| NVLink | Yes (112 GB/s) | No | No | No |
| FP4 / FP8 support | No / No | Yes / Yes | Yes / Yes | No / Yes |
| MSRP | $4,650 | $8,499 | $1,999 | $1,599 |
| Used market (Q2 2026) | $3,800–$4,800 | rare | $1,800–$2,400 | $1,200–$1,500 |

Sources: SpecPicks hardware_specs rows nvidia-rtx-a6000, nvidia-rtx-pro-6000-blackwell, nvidia-rtx-5090, nvidia-rtx-4090. Cross-checked against TechPowerUp's RTX A6000 entry and Nvidia's professional-card datasheets.

The line that matters: 48 GB at $4,650 new, $3,800+ used. Nothing else under $8,500 single-card has this much VRAM.


AI inference benchmarks — real numbers, real sources

Every figure below comes from the SpecPicks ai_benchmarks table for hardware_id=1541 (RTX A6000). Two primary sources: DatabaseMart's Ollama A6000 benchmark and OpenLLM Benchmarks' A6000 LLM inference profile.

70B-class models — the headline workload

| Model | Quant | Runtime | tok/s (gen) | VRAM used | Source |
|---|---|---|---|---|---|
| Llama 3.3 70B | Ollama default (Q4) | Ollama | 13.56 | 43.0 GB | DatabaseMart |
| Llama 3 70B | Q4_K_M | llama.cpp | 14.58 gen / 466.82 prefill | ~42 GB | OpenLLM Benchmarks |
| Llama 3 70B | Ollama default | Ollama | 14.67 | 40.0 GB | DatabaseMart |
| DeepSeek-R1 70B | Ollama default | Ollama | 13.65 | 43.0 GB | DatabaseMart |
| Llama 2 70B | Ollama default | Ollama | 15.28 | 39.0 GB | DatabaseMart |
| Qwen 72B | Ollama default | Ollama | 14.51 | 41.0 GB | DatabaseMart |

Reading the numbers: at 70B Q4_K_M the A6000 lands consistently at 13.5–15 tok/s in single-user generation. That's faster than most people read — plenty fast for an interactive chat assistant or a coding copilot, but not fast enough for a serving cluster; that's where the RTX PRO 6000 or H100 takes over. Prefill (input prompt processing) is 466 tok/s on llama.cpp Llama 3 70B, which means a 4,000-token system prompt + RAG context loads in ~9 seconds before generation starts.
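As a sanity check on those latency claims, here's the arithmetic as a short script, using the measured llama.cpp figures from the table above (the prompt and output sizes are just illustrative):

```python
# Rough end-to-end latency for one chat turn on the A6000, using the
# measured llama.cpp Llama 3 70B Q4_K_M figures from the table above.
PREFILL_TOKS_PER_S = 466.82   # prompt processing
GEN_TOKS_PER_S = 14.58        # token generation

def turn_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Seconds until the full response is finished."""
    return prompt_tokens / PREFILL_TOKS_PER_S + output_tokens / GEN_TOKS_PER_S

# A 4,000-token RAG prompt followed by a 500-token answer:
ttft = 4000 / PREFILL_TOKS_PER_S   # time to first token, ~8.6 s
total = turn_latency(4000, 500)    # ~43 s end to end
```

The takeaway: prefill dominates the wait only on very long prompts; for typical outputs, generation speed is what you feel.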

Mid-range models (14B – 34B) — A6000 vs the 5090

| Model | Quant | A6000 tok/s | RTX 5090 tok/s | Notes |
|---|---|---|---|---|
| DeepSeek-R1 32B | Q4_K_M, Ollama | 26.23 | ~50–60 (estimate; fits at Q4) | 5090 is ~2× faster on models that fit |
| Qwen 2.5 32B | Q4, Ollama | 26.08 | similar to R1-32B | |
| QwQ 32B | Q4, Ollama | 25.57 | | A6000 has KV-cache headroom for 32k context |
| Qwen 32B (gen 2) | Q4, Ollama | 27.96 | | |
| LLaVA 34B (multimodal) | Q4, Ollama | 28.67 | | A6000's 48 GB lets you run vision tower + LLM together |
| DeepSeek-R1 14B | Q4, Ollama | 48.40 | ~80–95 | |
| Phi-4 14B | Q4, Ollama | 52.62 | similar | |
| Qwen 2.5 14B | Q4, Ollama | 50.32 | similar | |
| Gemma 2 27B | Q4, Ollama | 31.59 | ~45–60 | |

The pattern: the 5090 is roughly 2× faster on any model that fits its 32 GB. The A6000 stops being slower the moment the model needs more than 32 GB — that's why the comparison matters at the 32B–70B threshold.

Small models (≤8B) — the 5090 dominates

The A6000 is fundamentally Ampere silicon: no FP4, no FP8 tensor cores, fewer SMs than the 5090. On 7–8B models the 5090's bandwidth + low-precision kernels run away with the win.

| Model | Quant | A6000 tok/s | RTX 5090 tok/s |
|---|---|---|---|
| Llama 3 8B | Q4_K_M (llama.cpp) | 102.22 | ~250–280 (community LocalLLaMA) |
| Llama 3 8B | FP16 (llama.cpp) | 40.25 | ~120 |
| Llama 2 7B | Q4_0 (llama.cpp Vulkan) | 263.63 | |

Buy a 5090 for 8B work. Buy an A6000 for 70B work. They solve different problems.

Multi-GPU on Ampere — NVLink still earns its keep

The A6000 is the last Nvidia workstation card to ship NVLink. Two A6000s with the official NVLink bridge give you:

  • 96 GB pooled VRAM addressable as one fabric
  • 112 GB/s peer-to-peer bandwidth — high enough that tensor-parallel inference actually scales (community reports of ~6–8 tok/s on Llama 3.1 405B Q4 across two cards with partial CPU offload, on llama.cpp + tensor-parallel forks)
  • 600 W combined for the two cards — comfortably under a 1,000 W PSU

For comparison: two 5090s at 32 GB each give you 64 GB of non-pooled memory. Tensor parallel works over PCIe, but 405B weights run ~810 GB at FP16 and still roughly 230 GB at Q4_K_M, so even four 5090s (128 GB) can't hold a Q4 405B without heavy offload, and the bandwidth tax across PCIe is real. The dual-A6000 path is more elegant and cheaper.
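A crude capacity planner makes the single-card vs dual-card math concrete. This is a sketch under simple assumptions: weights are params × bits ÷ 8, plus a flat per-card allowance (the `overhead_gb` value is an assumption, not a measurement) for KV cache, activations, and CUDA context:

```python
# Will a quantized model fit across N cards? Weights ≈ params × bits / 8,
# plus a per-card overhead allowance (an assumed 4 GB here) for KV cache,
# activations, and CUDA context. Approximations, not measurements.
def fits(params_b: float, bits: float, cards: int, vram_gb: float,
         overhead_gb: float = 4.0) -> bool:
    weights_gb = params_b * bits / 8   # e.g. 70B at ~4.5-bit Q4_K_M ≈ 39 GB
    return weights_gb + cards * overhead_gb <= cards * vram_gb

fits(70, 4.5, 1, 48)    # True  — the single-A6000 headline use case
fits(70, 16, 2, 48)     # False — FP16 70B (~140 GB) busts even 96 GB pooled
fits(405, 4.5, 2, 48)   # False — hence the CPU offload for dual-card 405B
```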


Synthetic + creator-workload reference points

Since the A6000 is professional silicon, synthetic and creator benchmarks are scored against the workstation field, not gaming consumer cards. From SpecPicks's synthetic_benchmarks and standard third-party suites:

| Benchmark | RTX A6000 | RTX 5090 | RTX 4090 |
|---|---|---|---|
| SPECviewperf 2020 (creo-02) | 415 | | |
| SPECviewperf 2020 (snx-04) | 460 | | |
| OctaneBench 2020 | 624 | ~1,150 (community) | 825 |
| Blender Cycles GPU (Classroom, 2.x) | 22.4 s | 8.1 s | 11.4 s |
| V-Ray 5 GPU CUDA score | 2,280 | ~5,000 | 3,720 |
| 3DMark Time Spy (graphics) | 17,140 | 38,935 | 38,066 |

In raw pixel-throughput synthetic terms, the 5090 is 2.0–2.7× faster than the A6000. The A6000 wins on memory-bound professional workloads — large Maya/Blender scenes, USD assemblies that don't fit in 32 GB, multi-buffer DCC rendering — and on workloads that need ECC memory or CUDA-only ISVs that gate on professional driver branches.

For gaming the A6000 is roughly RTX 3090 class — the same GA102 silicon at similar clocks, with less memory bandwidth. It's not a gaming card, but it'll run any DX12 title at 4K High; you're paying $4,650 for the 48 GB and ECC, not the gaming performance.

Sources: TechPowerUp RTX A6000 review, Puget Systems creator benchmarks, Nvidia A6000 datasheet.


Power, heat, and the blower cooler

The A6000 is workstation hardware in shape and behavior:

  • 300 W TDP; published sustained-inference measurements rarely show it above 320 W
  • Dual-slot blower cooler that exhausts hot air directly out the case rear bracket
  • Designed to stack adjacent in a workstation chassis without recirculating heat — two cards side-by-side run within their thermal envelope where two RTX 4090s or 5090s would suffocate each other
  • Idle noise is moderate (~36 dB at desk distance); under load the blower is loud — 48–52 dB is normal — louder than a partner triple-fan card, but that's the trade-off of the rear-exhaust blower design

If you want quiet, this card is wrong for you. If you want to put two of them in a Threadripper Pro chassis and forget about cooling, this is the reference card.

Per-watt math on Llama 3.3 70B Q4:

  • A6000: 13.56 tok/s ÷ 300 W = 0.045 tok/s/W (card only)
  • RTX 5090 on Q3_K_M (smaller quant required to fit 32 GB) at ~26 tok/s ÷ 575 W = 0.045 tok/s/W
  • RTX PRO 6000 Blackwell on Llama 70B Q4 at ~30 tok/s (extrapolated from llm-tracker numbers) ÷ 300 W = 0.100 tok/s/W

The Blackwell card is more than 2× more power-efficient than either — that's the difference an architecture and FP4 support make. But you pay $8,499 for it, and the A6000 ties the 5090 on perf-per-watt while running a model the 5090 can't load.
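The bullet math above generalizes into two useful numbers: tokens per second per watt, and energy per million tokens, which is what actually shows up on an electricity bill. Both figures below are card-only power, not wall power:

```python
# Perf-per-watt and energy-per-token from the figures above (card only).
def toks_per_watt(tps: float, watts: float) -> float:
    return tps / watts

def kwh_per_million_tokens(tps: float, watts: float) -> float:
    seconds = 1_000_000 / tps
    return watts * seconds / 3_600_000   # watt-seconds -> kWh

a6000_eff = toks_per_watt(13.56, 300)        # ~0.045 tok/s/W
pro6000_eff = toks_per_watt(30.0, 300)       # 0.100 tok/s/W
a6000_kwh = kwh_per_million_tokens(13.56, 300)    # ~6.1 kWh per 1M tokens
pro6000_kwh = kwh_per_million_tokens(30.0, 300)   # ~2.8 kWh per 1M tokens
```

At typical US residential rates, the A6000's extra ~3.3 kWh per million tokens is real but small money next to the $3,850 price gap to the Blackwell card.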


Where to buy in 2026 — eBay first, Amazon second

The A6000 was sold through Nvidia's professional channel partners — PNY, Leadtek, Lenovo, HP, Dell, Boxx, Supermicro. The retail consumer market never had it. Three years after release the supply situation looks like this:

eBay (primary channel)

  • "NVIDIA RTX A6000 48GB" search — 80–150 active listings on any given day, mix of new-old-stock from system integrators, refurbished pulls from workstation upgrades, and individual sellers
  • Typical pricing: $3,800–$4,500 for "new other / open box", $4,500–$4,900 for sealed retail-channel units, $4,900+ for sellers asking MSRP
  • Watch for: warranty (most used units ship with none), whether the power adapter is included (the card takes a single 8-pin EPS connector — yes, EPS, not PCIe — and Nvidia's retail box includes a dual PCIe 8-pin to EPS adapter), and country of origin (China-routed cards have shown up with peeled stickers)

Amazon (fallback)

  • Amazon search "RTX A6000" — typically 1–4 listings, prices $4,650–$5,500, almost always third-party sellers (not Amazon as merchant of record)
  • Pros: Amazon's return window applies even on third-party listings; Prime shipping when listed
  • Cons: stock churns weekly; same physical inventory often migrates between Amazon and eBay listings; the affiliate signal is thinner because the listings are intermittent

Direct from system integrators

  • PNY (Nvidia's primary US partner), Bizon, Comino, and Puget Systems sell the A6000 individually or as part of pre-built workstations. Pricing matches MSRP plus their margin; warranty matches Nvidia's 3-year professional spec. Slowest path but the safest one if the card is going into a billable production rig.

What you can run on a single A6000 in 2026

A practical menu of "fits at Q4 with KV-cache headroom for normal context windows":

| Model | Q4 size | Fits A6000? | Notes |
|---|---|---|---|
| Llama 3.1 8B | ~5 GB | Yes (huge headroom) | 32k context easy; 128k needs careful KV management |
| Mistral Small 22B | ~13 GB | Yes | Leaves room for two parallel sessions |
| Qwen3 32B | ~20 GB | Yes | Room for 64k context |
| DeepSeek-R1 32B | ~21 GB | Yes | Reasoning model; longer outputs eat KV cache |
| Mixtral 8×7B | ~28 GB | Yes | Active params smaller; full router fits |
| Llama 3.3 70B Q4_K_M | ~42 GB | Yes (5 GB headroom) | The bread-and-butter use case |
| DeepSeek-R1 70B Q4 | ~43 GB | Yes (4 GB headroom) | Reasoning workflow with 8k context |
| Qwen 72B Q4 | ~41 GB | Yes | |
| Llama 3.1 70B Q5_K_M | ~50 GB | No — needs 2 cards or NVLink | Bumps over the 48 GB ceiling |
| Mixtral 8×22B | ~88 GB | No — needs 2 cards | 96 GB pooled across two A6000s fits |
| Llama 3.1 405B Q4 | ~230 GB | No — multi-GPU plus offload territory | |

The rule for nearly all open-weight model work in 2026: on a single A6000, 70B at Q4 fits and 70B at Q5 doesn't. That's the single most useful sentence in this review.
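The Q4/Q5 line can be sanity-checked with bits-per-weight arithmetic. The effective-bpw values below are typical averages for llama.cpp k-quants (approximate — exact file sizes vary by model architecture):

```python
# GGUF file size ≈ params × effective bits-per-weight / 8.
# The bpw values are typical llama.cpp k-quant averages (approximate).
BPW = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "FP16": 16.0}

def model_gb(params_b: float, quant: str) -> float:
    return params_b * BPW[quant] / 8

model_gb(70, "Q4_K_M")   # ~42 GB — fits 48 GB with KV-cache headroom
model_gb(70, "Q5_K_M")   # ~50 GB — over the ceiling, matching the table
```

The half-bit difference per weight is worth ~8 GB at 70B scale, which is exactly the gap between "fits with headroom" and "doesn't load".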


When the A6000 is wrong

  • You only run 8B–22B models — buy a 5090. Twice the speed, half the price, gaming bonus.
  • You need FP8 / FP4 inference (vLLM with FP8 KV cache, TensorRT-LLM modelopt, modern serving stacks) — buy an RTX PRO 6000 Blackwell or stay with hosted APIs. Ampere's tensor cores are too old.
  • You're building a serving cluster with batched parallel requests — A6000's older FlashAttention path and lack of FP8 leave it 3–4× behind a Blackwell card on aggregate throughput.
  • Quiet operation matters — the blower is loud. A workstation in your home office will be audible from the next room.
  • You want manufacturer warranty for the next 5 years — A6000s are end-of-life in Nvidia's professional roadmap (PRO 6000 Blackwell is the active SKU). New stock dries up sometime in late 2026.

When the A6000 is right

  • Local Llama 3 / Llama 3.3 / Qwen3 / DeepSeek-R1 70B at Q4 on one card — the cheapest entry point in 2026 that doesn't require offload.
  • Dual-card 96 GB rig with NVLink for tensor-parallel 405B Q4 (with offload) or 70B at Q8 without. Two used A6000s land at $7,600–$9,600 — same money as one PRO 6000 Blackwell, more memory in aggregate, but split across two cards.
  • DCC + AI hybrid workstation — Blender, Maya, Houdini, USD pipelines that need ECC memory and 48 GB but don't need Blackwell-tier raster speed.
  • Long-context inference — 32 GB cards force aggressive KV-cache quantization at 32k+ tokens; 48 GB lets you keep the cache in FP16 even at 64k context for Qwen3 32B.
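To see why 48 GB matters for long context, here's the standard FP16 KV-cache formula. The layer/head/dim numbers below are an illustrative GQA 32B-class shape, not Qwen3's exact published config:

```python
# FP16 KV-cache footprint: 2 tensors (K and V) × layers × kv_heads
# × head_dim × 2 bytes × tokens. Config is an illustrative GQA
# 32B-class shape — an assumption, not Qwen3's exact spec.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1024**3

kv = kv_cache_gb(64, 8, 128, 64 * 1024)   # 16.0 GB at 64k context
# ~20 GB of Q4 weights + 16 GB of FP16 KV ≈ 36 GB:
# comfortable on 48 GB, impossible on a 32 GB card without quantizing the cache.
```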

Verdict

🏆 Buy the RTX A6000 if

  • You run 70B-class local LLMs as your primary workload and don't want a dual-GPU build.
  • You need NVLink for tensor-parallel multi-card scaling on a budget — there's literally no other 48 GB workstation card with NVLink under $8,000.
  • You want a professional driver branch (Studio + Quadro/RTX Workstation), ECC memory, and ISV certifications for production DCC work.
  • You can tolerate the blower noise and the Ampere generation gap in exchange for VRAM you can't get for less.

Skip the A6000 if

  • Your workloads cap at 32B parameters (5090 is faster and cheaper).
  • You need FP8/FP4 throughput (PRO 6000 Blackwell).
  • You're a researcher who reaches for dense 100B+ frequently (multi-card or H100).
  • You'd rather rent compute on Lambda or Runpod by the hour for sporadic 70B work — at $0.79/hr for an A6000 instance, you can run 70 hours/month for $55 instead of buying.
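The rent-vs-buy line deserves its own arithmetic. Using the $0.79/hr cloud rate quoted above and the low end of the used-card band:

```python
# Rent-vs-buy break-even for sporadic 70B work. $0.79/hr is the cloud
# A6000 rate quoted above; $3,800 is the low end of the used-card band.
RENT_PER_HR = 0.79
USED_CARD = 3800

def monthly_rent(hours: float) -> float:
    return hours * RENT_PER_HR

monthly_rent(70)                             # ~$55.30/month
break_even_hours = USED_CARD / RENT_PER_HR   # ~4,810 rental hours
# At 70 hrs/month, that's roughly 5.7 years before buying beats renting
# (ignoring electricity, resale value, and data-transfer friction).
```

If your 70B usage is bursty rather than daily, renting wins by a wide margin; the card makes sense when it runs most days.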

Where to buy

Prices accurate as of May 6, 2026 and subject to change. eBay listings are dynamic — re-check inventory before purchasing.

See the full RTX A6000 benchmark profile →

Compare against the RTX PRO 6000 Blackwell →

Compare against the RTX 5090 →


Frequently asked questions

Is the RTX A6000 the same chip as the RTX 3090? Both are GA102, but the A6000 has more enabled SMs (84 vs 82), 48 GB GDDR6 (vs 24 GB GDDR6X on the 3090), ECC memory, NVLink, and a 300 W blower cooler instead of a 350 W triple-fan. They share an architecture but are different products at different price points for different buyers.

Will the A6000 work with consumer motherboards? Yes — it's a standard PCIe 4.0 x16 dual-slot card. It runs on any modern desktop motherboard. The blower means it fits in restricted-airflow cases and dense multi-GPU builds where consumer triple-fan cards can't, which is part of why it stacks well in dual-card rigs.

Does the A6000 need a special PSU? A 750 W 80+ Gold PSU is sufficient for a single A6000 + Ryzen 9 9950X-class system. The card uses one EPS (CPU-style 8-pin) connector via Nvidia's included dual PCIe 8-pin to EPS adapter, not the 12VHPWR connector found on the RTX 4090/5090. This makes it more compatible with older PSUs.

Can I use the A6000 in a gaming PC? Yes, it'll work, but it's not optimized for gaming and the blower is loud. You're paying $4,650 for 48 GB of professional VRAM — that money buys an RTX 5090 + a 4070 Super for AI + gaming if you split workloads.

Is the A6000 still worth buying in 2026 with the PRO 6000 Blackwell available? Yes, if your budget is under $5,000. The PRO 6000 Blackwell is unambiguously the better card if you can afford its $8,499 MSRP. The A6000 wins on $/GB-VRAM and on the used-market entry point.

What's the difference between the RTX A6000 and the RTX 6000 Ada Generation? The RTX A6000 (this review) is Ampere, $4,650, 48 GB GDDR6. The RTX 6000 Ada is the Lovelace successor, ~$6,800, also 48 GB GDDR6. The Ada part is roughly 1.5× faster across the board (newer architecture, higher SM count, FP8 support) but pulls the same 300 W and uses the same blower form factor.

How loud is the blower? Loud. Plan on 48–52 dB at desk distance under sustained inference. If your office is currently quieter than a coffee shop, the A6000 will change that. Two A6000s side-by-side push that closer to 56 dB.

Will the A6000 still get driver updates? Yes — Nvidia's professional driver branch supports it for at least another 3 years (Studio + RTX Workstation drivers, plus the data-center Linux branches). Ampere Tesla parts (A100, A40) are still receiving updates in 2026.

Citations and sources

  • See linked references throughout the body of this article.

This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported here; performance numbers and pricing are sourced from the publications cited inline above. Hardware availability and pricing change daily — verify current stock and pricing on the linked retailer pages before purchasing.

— SpecPicks Editorial · Last verified 2026-05-06
