NVIDIA RTX A6000 48GB Review: The Workstation Card That Still Owns Local 70B Inference (2026)


Two architectures behind, still the cheapest single-GPU way to run Llama 3 70B without offload — and the cheapest 48 GB card with NVLink.

The $4,650 RTX A6000 is two architectures old, but its 48 GB GDDR6 + NVLink combo still owns the budget 70B-inference niche. Real benchmarks, real pricing, real eBay-vs-Amazon advice.

As an Amazon Associate, SpecPicks earns from qualifying purchases. Some listings are eBay-only — we use eBay affiliate links where Amazon stock is unreliable. See our review methodology.


By Mike Perry · Published May 6, 2026 · Last verified May 6, 2026 · 12 min read

The short answer

The NVIDIA RTX A6000 (48 GB GDDR6, Ampere generation, 300 W blower, MSRP $4,650) is the cheapest single-GPU way to run Llama 3.1 70B, Llama 3.3 70B, DeepSeek-R1 70B, and Qwen 72B at Q4 on one card without offload. It's two architectures behind in 2026 — Ada Lovelace shipped in 2022, Blackwell in 2025 — but Nvidia never doubled the consumer card's VRAM, so the A6000 still has more memory than any RTX 50-series card you can buy at retail. On the used market it sells in the $3,800–$4,800 band; brand new it's still listed at $4,650 from Nvidia partners. If you need 48 GB on a single card, this is the cheapest entry point in the world.

Key takeaways

  • 48 GB GDDR6 at 768 GB/s, 10,752 CUDA cores, 300 W TDP, dual-slot blower cooler — the only Ampere-generation card with this much VRAM short of an A100.
  • Real measured throughput (SpecPicks ai_benchmarks catalog): Llama 3.3 70B Q4_K_M @ 13.56 tok/s, DeepSeek-R1 70B @ 13.65 tok/s, Qwen2.5 32B Q4 @ 26.08 tok/s, Llama 3.1 8B FP16 @ 40.25 tok/s. Source: DatabaseMart + OpenLLM Benchmarks.
  • 70B-class models load natively at Q4_K_M with ~5 GB of VRAM headroom for KV cache — no offload, no Q3 compromise, no second card.
  • eBay first, Amazon second. Amazon listings for the A6000 are sparse and often grey-market; the active marketplace is eBay (search: "NVIDIA RTX A6000 48GB").
  • Power-per-token favors the A6000 over the RTX 5090 on 70B (5090 must drop to Q3 + offload). Speed-per-token favors the 5090 on anything ≤32 GB.
  • Two A6000s with NVLink give you 96 GB pooled VRAM at $7,600–$9,600 used — cheaper than a single RTX PRO 6000 Blackwell ($8,499) and runs Llama 3.1 405B Q4 with offload.

Spec sheet — RTX A6000 vs the cards it competes with

| Spec | RTX A6000 (Ampere) | RTX PRO 6000 Blackwell | RTX 5090 (Blackwell) | RTX 4090 (Ada) |
|---|---|---|---|---|
| Released | Oct 2020 | March 2025 | Jan 2025 | Oct 2022 |
| GPU | GA102 | GB202 | GB202 | AD102 |
| CUDA cores | 10,752 | 20,480 | 21,760 | 16,384 |
| Tensor cores | 336 (3rd gen) | 640 (5th gen) | 680 (5th gen) | 512 (4th gen) |
| VRAM | 48 GB GDDR6 | 96 GB GDDR7 | 32 GB GDDR7 | 24 GB GDDR6X |
| Memory bandwidth | 768 GB/s | 1,792 GB/s | 1,792 GB/s | 1,008 GB/s |
| ECC memory | Yes | Yes | No | No |
| TDP | 300 W | 300 W | 575 W | 450 W |
| Cooler | Blower (rear exhaust) | Blower | Triple-fan (consumer) | Triple-fan |
| Form factor | Dual-slot | Dual-slot blower | 3-slot+, partner-dependent | 3-slot+ |
| NVLink | Yes (112 GB/s) | No | No | No |
| FP4 / FP8 support | No / No | Yes / Yes | Yes / Yes | No / Yes |
| MSRP | $4,650 | $8,499 | $1,999 | $1,599 |
| Used market (Q2 2026) | $3,800–$4,800 | rare | $1,800–$2,400 | $1,200–$1,500 |

Sources: SpecPicks hardware_specs rows nvidia-rtx-a6000, nvidia-rtx-pro-6000-blackwell, nvidia-rtx-5090, nvidia-rtx-4090. Cross-checked against TechPowerUp's RTX A6000 entry and Nvidia's professional-card datasheets.

The line that matters: 48 GB at $4,650 new, $3,800+ used. Nothing else under $8,500 single-card has this much VRAM.


AI inference benchmarks — real numbers, real sources

Every figure below comes from the SpecPicks ai_benchmarks table for hardware_id=1541 (RTX A6000). Two primary sources: DatabaseMart's Ollama A6000 benchmark and OpenLLM Benchmarks' A6000 LLM inference profile.

70B-class models — the headline workload

| Model | Quant | Runtime | tok/s (gen) | VRAM used | Source |
|---|---|---|---|---|---|
| Llama 3.3 70B | Ollama default (Q4) | Ollama | 13.56 | 43.0 GB | DatabaseMart |
| Llama 3 70B | Q4_K_M | llama.cpp | 14.58 gen / 466.82 prefill | ~42 GB | OpenLLM Benchmarks |
| Llama 3 70B | Ollama default | Ollama | 14.67 | 40.0 GB | DatabaseMart |
| DeepSeek-R1 70B | Ollama default | Ollama | 13.65 | 43.0 GB | DatabaseMart |
| Llama 2 70B | Ollama default | Ollama | 15.28 | 39.0 GB | DatabaseMart |
| Qwen 72B | Ollama default | Ollama | 14.51 | 41.0 GB | DatabaseMart |

Reading the numbers: at 70B Q4_K_M the A6000 lands consistently at 13.5–15 tok/s in single-user generation. That's faster than most people read — plenty fast for an interactive chat assistant or a coding copilot, but not fast enough for a serving cluster; that's where the RTX PRO 6000 or H100 takes over. Prefill (input prompt processing) is 466 tok/s on llama.cpp Llama 3 70B, which means a 4,000-token system prompt + RAG context loads in ~9 seconds before generation starts.
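As a sanity check on those latency claims, here's the arithmetic as a short script, using the measured llama.cpp figures from the table above (the prompt and output sizes are just illustrative):

```python
# Rough end-to-end latency for one chat turn on the A6000, using the
# measured llama.cpp Llama 3 70B Q4_K_M figures from the table above.
PREFILL_TOKS_PER_S = 466.82   # prompt processing
GEN_TOKS_PER_S = 14.58        # token generation

def turn_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Seconds until the full response is finished."""
    return prompt_tokens / PREFILL_TOKS_PER_S + output_tokens / GEN_TOKS_PER_S

# A 4,000-token RAG prompt followed by a 500-token answer:
ttft = 4000 / PREFILL_TOKS_PER_S   # time to first token, ~8.6 s
total = turn_latency(4000, 500)    # ~43 s end to end
```

The takeaway: prefill dominates the wait only on very long prompts; for typical outputs, generation speed is what you feel.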

Mid-range models (14B – 34B) — A6000 vs the 5090

| Model | Quant | A6000 tok/s | RTX 5090 tok/s | Notes |
|---|---|---|---|---|
| DeepSeek-R1 32B | Q4_K_M, Ollama | 26.23 | ~50–60 (estimate; fits at Q4) | 5090 is ~2× faster on models that fit |
| Qwen 2.5 32B | Q4, Ollama | 26.08 | similar to R1-32B | |
| QwQ 32B | Q4, Ollama | 25.57 | | A6000 has KV-cache headroom for 32k context |
| Qwen 32B (gen 2) | Q4, Ollama | 27.96 | | |
| LLaVA 34B (multimodal) | Q4, Ollama | 28.67 | | A6000's 48 GB lets you run vision tower + LLM together |
| DeepSeek-R1 14B | Q4, Ollama | 48.40 | ~80–95 | |
| Phi-4 14B | Q4, Ollama | 52.62 | similar | |
| Qwen 2.5 14B | Q4, Ollama | 50.32 | similar | |
| Gemma 2 27B | Q4, Ollama | 31.59 | ~45–60 | |

The pattern: the 5090 is roughly 2× faster on any model that fits its 32 GB. The A6000 stops being slower the moment the model needs more than 32 GB — that's why the comparison matters at the 32B–70B threshold.

Small models (≤8B) — the 5090 dominates

The A6000 is fundamentally Ampere silicon: no FP4, no FP8 tensor cores, fewer SMs than the 5090. On 7–8B models the 5090's bandwidth + low-precision kernels run away with the win.

| Model | Quant | A6000 tok/s | RTX 5090 tok/s |
|---|---|---|---|
| Llama 3 8B | Q4_K_M (llama.cpp) | 102.22 | ~250–280 (community LocalLLaMA) |
| Llama 3 8B | FP16 (llama.cpp) | 40.25 | ~120 |
| Llama 2 7B | Q4_0 (llama.cpp Vulkan) | 263.63 | |

Buy a 5090 for 8B work. Buy an A6000 for 70B work. They solve different problems.

Multi-GPU on Ampere — NVLink still earns its keep

The A6000 is the last Nvidia workstation card to ship NVLink. Two A6000s with the official NVLink bridge give you:

  • 96 GB pooled VRAM addressable as one fabric
  • 112 GB/s peer-to-peer bandwidth — high enough that tensor-parallel inference actually scales (community reports of ~6–8 tok/s on Llama 3.1 405B Q4 across two cards with partial CPU offload, on llama.cpp + tensor-parallel forks)
  • 600 W combined for the two cards — comfortably under a 1,000 W PSU

For comparison: two 5090s at 32 GB each give you 64 GB of non-pooled memory. Tensor parallel works over PCIe, but 405B weights run ~810 GB at FP16 and still roughly 230 GB at Q4_K_M, so even four 5090s (128 GB) can't hold a Q4 405B without heavy offload, and the bandwidth tax across PCIe is real. The dual-A6000 path is more elegant and cheaper.
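A crude capacity planner makes the single-card vs dual-card math concrete. This is a sketch under simple assumptions: weights are params × bits ÷ 8, plus a flat per-card allowance (the `overhead_gb` value is an assumption, not a measurement) for KV cache, activations, and CUDA context:

```python
# Will a quantized model fit across N cards? Weights ≈ params × bits / 8,
# plus a per-card overhead allowance (an assumed 4 GB here) for KV cache,
# activations, and CUDA context. Approximations, not measurements.
def fits(params_b: float, bits: float, cards: int, vram_gb: float,
         overhead_gb: float = 4.0) -> bool:
    weights_gb = params_b * bits / 8   # e.g. 70B at ~4.5-bit Q4_K_M ≈ 39 GB
    return weights_gb + cards * overhead_gb <= cards * vram_gb

fits(70, 4.5, 1, 48)    # True  — the single-A6000 headline use case
fits(70, 16, 2, 48)     # False — FP16 70B (~140 GB) busts even 96 GB pooled
fits(405, 4.5, 2, 48)   # False — hence the CPU offload for dual-card 405B
```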


Synthetic + creator-workload reference points

Since the A6000 is professional silicon, synthetic and creator benchmarks are scored against the workstation field, not gaming consumer cards. From SpecPicks's synthetic_benchmarks and standard third-party suites:

| Benchmark | RTX A6000 | RTX 5090 | RTX 4090 |
|---|---|---|---|
| SPECviewperf 2020 (creo-02) | 415 | | |
| SPECviewperf 2020 (snx-04) | 460 | | |
| OctaneBench 2020 | 624 | ~1,150 (community) | 825 |
| Blender Cycles GPU (Classroom, 2.x) | 22.4 s | 8.1 s | 11.4 s |
| V-Ray 5 GPU CUDA score | 2,280 | ~5,000 | 3,720 |
| 3DMark Time Spy (graphics) | 17,140 | 38,935 | 38,066 |

In raw pixel-throughput synthetic terms, the 5090 is 2.0–2.7× faster than the A6000. The A6000 wins on memory-bound professional workloads — large Maya/Blender scenes, USD assemblies that don't fit in 32 GB, multi-buffer DCC rendering — and on workloads that need ECC memory or CUDA-only ISVs that gate on professional driver branches.

For gaming the A6000 is roughly RTX 3090 class — the same GA102 silicon at similar clocks, with less memory bandwidth. It's not a gaming card, but it'll run any DX12 title at 4K High; you're paying $4,650 for the 48 GB and ECC, not the gaming performance.

Sources: TechPowerUp RTX A6000 review, Puget Systems creator benchmarks, Nvidia A6000 datasheet.


Power, heat, and the blower cooler

The A6000 is workstation hardware in shape and behavior:

  • 300 W TDP; published sustained-inference measurements rarely show it above 320 W
  • Dual-slot blower cooler that exhausts hot air directly out the case rear bracket
  • Designed to stack adjacent in a workstation chassis without recirculating heat — two cards side-by-side run within their thermal envelope where two RTX 4090s or 5090s would suffocate each other
  • Idle noise is moderate (~36 dB at desk distance); under load the blower is loud — 48–52 dB is normal — louder than a partner triple-fan card, but that's the trade-off of the rear-exhaust blower design

If you want quiet, this card is wrong for you. If you want to put two of them in a Threadripper Pro chassis and forget about cooling, this is the reference card.

Per-watt math on Llama 3.3 70B Q4:

  • A6000: 13.56 tok/s ÷ 300 W = 0.045 tok/s/W (card only)
  • RTX 5090 on Q3_K_M (smaller quant required to fit 32 GB) at ~26 tok/s ÷ 575 W = 0.045 tok/s/W
  • RTX PRO 6000 Blackwell on Llama 70B Q4 at ~30 tok/s (extrapolated from llm-tracker numbers) ÷ 300 W = 0.100 tok/s/W

The Blackwell card is more than 2× more power-efficient than either — that's the difference an architecture and FP4 support make. But you pay $8,499 for it, and the A6000 ties the 5090 on perf-per-watt while running a model the 5090 can't load.
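The bullet math above generalizes into two useful numbers: tokens per second per watt, and energy per million tokens, which is what actually shows up on an electricity bill. Both figures below are card-only power, not wall power:

```python
# Perf-per-watt and energy-per-token from the figures above (card only).
def toks_per_watt(tps: float, watts: float) -> float:
    return tps / watts

def kwh_per_million_tokens(tps: float, watts: float) -> float:
    seconds = 1_000_000 / tps
    return watts * seconds / 3_600_000   # watt-seconds -> kWh

a6000_eff = toks_per_watt(13.56, 300)        # ~0.045 tok/s/W
pro6000_eff = toks_per_watt(30.0, 300)       # 0.100 tok/s/W
a6000_kwh = kwh_per_million_tokens(13.56, 300)    # ~6.1 kWh per 1M tokens
pro6000_kwh = kwh_per_million_tokens(30.0, 300)   # ~2.8 kWh per 1M tokens
```

At typical US residential rates, the A6000's extra ~3.3 kWh per million tokens is real but small money next to the $3,850 price gap to the Blackwell card.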


Where to buy in 2026 — eBay first, Amazon second

The A6000 was sold through Nvidia's professional channel partners — PNY, Leadtek, Lenovo, HP, Dell, Boxx, Supermicro. The retail consumer market never had it. Three years after release the supply situation looks like this:

eBay (primary channel)

  • "NVIDIA RTX A6000 48GB" search — 80–150 active listings on any given day, mix of new-old-stock from system integrators, refurbished pulls from workstation upgrades, and individual sellers
  • Typical pricing: $3,800–$4,500 for "new other / open box", $4,500–$4,900 for sealed retail-channel units, $4,900+ for sellers asking MSRP
  • Watch for: warranty (most used units ship with none), whether the power adapter is included (the card takes a single 8-pin EPS connector — yes, EPS, not PCIe — and Nvidia's retail box includes a dual PCIe 8-pin to EPS adapter), and country of origin (China-routed cards have shown up with peeled stickers)

Amazon (fallback)

  • Amazon search "RTX A6000" — typically 1–4 listings, prices $4,650–$5,500, almost always third-party sellers (not Amazon as merchant of record)
  • Pros: Amazon's return window applies even on third-party listings; Prime shipping when listed
  • Cons: stock churns weekly; same physical inventory often migrates between Amazon and eBay listings; the affiliate signal is thinner because the listings are intermittent

Direct from system integrators

  • PNY (Nvidia's primary US partner), Bizon, Comino, and Puget Systems sell the A6000 individually or as part of pre-built workstations. Pricing matches MSRP plus their margin; warranty matches Nvidia's 3-year professional spec. Slowest path but the safest one if the card is going into a billable production rig.

What you can run on a single A6000 in 2026

A practical menu of "fits at Q4 with KV-cache headroom for normal context windows":

| Model | Q4 size | Fits A6000? | Notes |
|---|---|---|---|
| Llama 3.1 8B | ~5 GB | Yes (huge headroom) | 32k context easy; 128k needs careful KV management |
| Mistral Small 22B | ~13 GB | Yes | Leaves room for two parallel sessions |
| Qwen3 32B | ~20 GB | Yes | Room for 64k context |
| DeepSeek-R1 32B | ~21 GB | Yes | Reasoning model; longer outputs eat KV cache |
| Mixtral 8×7B | ~28 GB | Yes | Active params smaller; full router fits |
| Llama 3.3 70B Q4_K_M | ~42 GB | Yes (5 GB headroom) | The bread-and-butter use case |
| DeepSeek-R1 70B Q4 | ~43 GB | Yes (4 GB headroom) | Reasoning workflow with 8k context |
| Qwen 72B Q4 | ~41 GB | Yes | |
| Llama 3.1 70B Q5_K_M | ~50 GB | No — needs 2 cards or NVLink | Bumps over the 48 GB ceiling |
| Mixtral 8×22B | ~88 GB | No — needs 2 cards | 96 GB pooled across two A6000s fits |
| Llama 3.1 405B Q4 | ~230 GB | No — multi-GPU plus offload territory | |

The rule for nearly all open-weight model work in 2026: on a single A6000, 70B at Q4 fits and 70B at Q5 doesn't. That's the single most useful sentence in this review.
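The Q4/Q5 line can be sanity-checked with bits-per-weight arithmetic. The effective-bpw values below are typical averages for llama.cpp k-quants (approximate — exact file sizes vary by model architecture):

```python
# GGUF file size ≈ params × effective bits-per-weight / 8.
# The bpw values are typical llama.cpp k-quant averages (approximate).
BPW = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "FP16": 16.0}

def model_gb(params_b: float, quant: str) -> float:
    return params_b * BPW[quant] / 8

model_gb(70, "Q4_K_M")   # ~42 GB — fits 48 GB with KV-cache headroom
model_gb(70, "Q5_K_M")   # ~50 GB — over the ceiling, matching the table
```

The half-bit difference per weight is worth ~8 GB at 70B scale, which is exactly the gap between "fits with headroom" and "doesn't load".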


When the A6000 is wrong

  • You only run 8B–22B models — buy a 5090. Twice the speed, half the price, gaming bonus.
  • You need FP8 / FP4 inference (vLLM with FP8 KV cache, TensorRT-LLM modelopt, modern serving stacks) — buy an RTX PRO 6000 Blackwell or stay with hosted APIs. Ampere's tensor cores are too old.
  • You're building a serving cluster with batched parallel requests — A6000's older FlashAttention path and lack of FP8 leave it 3–4× behind a Blackwell card on aggregate throughput.
  • Quiet operation matters — the blower is loud. A workstation in your home office will be audible from the next room.
  • You want manufacturer warranty for the next 5 years — A6000s are end-of-life in Nvidia's professional roadmap (PRO 6000 Blackwell is the active SKU). New stock dries up sometime in late 2026.

When the A6000 is right

  • Local Llama 3 / Llama 3.3 / Qwen3 / DeepSeek-R1 70B at Q4 on one card — the cheapest entry point in 2026 that doesn't require offload.
  • Dual-card 96 GB rig with NVLink for tensor-parallel 405B Q4 (with offload) or 70B at Q8 without. Two used A6000s land at $7,600–$9,600 — same money as one PRO 6000 Blackwell, more memory in aggregate, but split across two cards.
  • DCC + AI hybrid workstation — Blender, Maya, Houdini, USD pipelines that need ECC memory and 48 GB but don't need Blackwell-tier raster speed.
  • Long-context inference — 32 GB cards force aggressive KV-cache quantization at 32k+ tokens; 48 GB lets you keep the cache in FP16 even at 64k context for Qwen3 32B.
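To see why 48 GB matters for long context, here's the standard FP16 KV-cache formula. The layer/head/dim numbers below are an illustrative GQA 32B-class shape, not Qwen3's exact published config:

```python
# FP16 KV-cache footprint: 2 tensors (K and V) × layers × kv_heads
# × head_dim × 2 bytes × tokens. Config is an illustrative GQA
# 32B-class shape — an assumption, not Qwen3's exact spec.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1024**3

kv = kv_cache_gb(64, 8, 128, 64 * 1024)   # 16.0 GB at 64k context
# ~20 GB of Q4 weights + 16 GB of FP16 KV ≈ 36 GB:
# comfortable on 48 GB, impossible on a 32 GB card without quantizing the cache.
```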

Verdict

🏆 Buy the RTX A6000 if

  • You run 70B-class local LLMs as your primary workload and don't want a dual-GPU build.
  • You need NVLink for tensor-parallel multi-card scaling on a budget — there's literally no other 48 GB workstation card with NVLink under $8,000.
  • You want a professional driver branch (Studio + Quadro/RTX Workstation), ECC memory, and ISV certifications for production DCC work.
  • You can tolerate the blower noise and the Ampere generation gap in exchange for VRAM you can't get for less.

Skip the A6000 if

  • Your workloads cap at 32B parameters (5090 is faster and cheaper).
  • You need FP8/FP4 throughput (PRO 6000 Blackwell).
  • You're a researcher who reaches for dense 100B+ frequently (multi-card or H100).
  • You'd rather rent compute on Lambda or Runpod by the hour for sporadic 70B work — at $0.79/hr for an A6000 instance, you can run 70 hours/month for $55 instead of buying.
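The rent-vs-buy line deserves its own arithmetic. Using the $0.79/hr cloud rate quoted above and the low end of the used-card band:

```python
# Rent-vs-buy break-even for sporadic 70B work. $0.79/hr is the cloud
# A6000 rate quoted above; $3,800 is the low end of the used-card band.
RENT_PER_HR = 0.79
USED_CARD = 3800

def monthly_rent(hours: float) -> float:
    return hours * RENT_PER_HR

monthly_rent(70)                             # ~$55.30/month
break_even_hours = USED_CARD / RENT_PER_HR   # ~4,810 rental hours
# At 70 hrs/month, that's roughly 5.7 years before buying beats renting
# (ignoring electricity, resale value, and data-transfer friction).
```

If your 70B usage is bursty rather than daily, renting wins by a wide margin; the card makes sense when it runs most days.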

Where to buy

Prices accurate as of May 6, 2026 and subject to change. eBay listings are dynamic — re-check inventory before purchasing.

See the full RTX A6000 benchmark profile →

Compare against the RTX PRO 6000 Blackwell →

Compare against the RTX 5090 →


Frequently asked questions

Is the RTX A6000 the same chip as the RTX 3090? Both are GA102, but the A6000 has more enabled SMs (84 vs 82), 48 GB GDDR6 (vs 24 GB GDDR6X on the 3090), ECC memory, NVLink, and a 300 W blower cooler instead of a 350 W triple-fan. They share an architecture but are different products at different price points for different buyers.

Will the A6000 work with consumer motherboards? Yes — it's a standard PCIe 4.0 x16 dual-slot card. It runs on any modern desktop motherboard. The blower means it fits in restricted-airflow cases and dense multi-GPU builds where consumer triple-fan cards can't, which is part of why it stacks well in dual-card rigs.

Does the A6000 need a special PSU? A 750 W 80+ Gold PSU is sufficient for a single A6000 + Ryzen 9 9950X-class system. The card uses one EPS (CPU-style 8-pin) connector via Nvidia's included dual PCIe 8-pin to EPS adapter, not the 12VHPWR connector found on the RTX 4090/5090. This makes it more compatible with older PSUs.

Can I use the A6000 in a gaming PC? Yes, it'll work, but it's not optimized for gaming and the blower is loud. You're paying $4,650 for 48 GB of professional VRAM — that money buys an RTX 5090 + a 4070 Super for AI + gaming if you split workloads.

Is the A6000 still worth buying in 2026 with the PRO 6000 Blackwell available? Yes, if your budget is under $5,000. The PRO 6000 Blackwell is unambiguously the better card if you can afford its $8,499 MSRP. The A6000 wins on $/GB-VRAM and on the used-market entry point.

What's the difference between the RTX A6000 and the RTX 6000 Ada Generation? The RTX A6000 (this review) is Ampere, $4,650, 48 GB GDDR6. The RTX 6000 Ada is the Lovelace successor, ~$6,800, also 48 GB GDDR6. The Ada part is roughly 1.5× faster across the board (newer architecture, higher SM count, FP8 support) but pulls the same 300 W and uses the same blower form factor.

How loud is the blower? Loud. Plan on 48–52 dB at desk distance under sustained inference. If your office is currently quieter than a coffee shop, the A6000 will change that. Two A6000s side-by-side push that closer to 56 dB.

Will the A6000 still get driver updates? Yes — Nvidia's professional driver branch supports it for at least another 3 years (Studio + RTX Workstation drivers, plus the data-center Linux branches). Ampere Tesla parts (A100, A40) are still receiving updates in 2026.

Citations and sources

  • See linked references throughout the body of this article.

This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported here; performance numbers and pricing are sourced from the publications cited inline above. Hardware availability and pricing change daily — verify current stock and pricing on the linked retailer pages before purchasing.

— SpecPicks Editorial · Last verified 2026-05-06
