NVIDIA RTX A6000 48GB Review: The Workstation Card That Still Owns Local 70B Inference (2026)

Two architectures behind, still the cheapest single-GPU way to run Llama 3 70B without offload — and the only sub-$5K card with NVLink.

The RTX A6000 (~$4,650 new, $2,200-$2,800 on eBay) is two architectures old, but its 48 GB GDDR6 + NVLink combo still owns the budget 70B-inference niche. Real benchmarks, real pricing, real eBay-vs-Amazon advice.

Direct answer

The NVIDIA RTX A6000 is two architectures behind Blackwell and yet, in 2026, it remains the cheapest single-GPU way to run Llama 3.3 70B at Q4_K_M without offloading to system RAM. It is the only sub-$5K NVIDIA card with NVLink, so two of them give you 96 GB of pooled VRAM for $5,000-$6,000 total. That is well below the $8,499 of an RTX PRO 6000 Blackwell, and far more memory than any single consumer card offers. New retail sits near $4,650 on Amazon; used eBay listings bounce between $2,200 and $2,800 depending on cosmetic condition.

Why this review exists

Workstation cards age weirdly. The A6000 launched in 2020 on the same Ampere die family as the RTX 3090. The consumer 3090 fell off relevance lists for AI work two years ago. The A6000 is still on every "best GPU for local LLMs" recommendation in 2026 because the workstation SKU shipped with twice the VRAM and an NVLink connector, and those two facts matter more than which fab node the silicon was etched on.

Specifically: 70B-class models in 2026 (Llama 3.3 70B, Qwen 3.6 72B, Mistral 70B) need ~40 GB of VRAM at Q4_K_M and ~50 GB at Q5_K_M. A 48 GB card runs the Q4 quant fully resident with room for a 16k context. A 32 GB card does not.
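The fit claim can be sanity-checked with a rough sketch. It assumes the published Llama 3 70B shape (80 layers, 8 KV heads under grouped-query attention, head dimension 128), an FP16 KV cache, and an illustrative 1.5 GB allowance for runtime buffers; the overhead figure and the helper names are assumptions, and GB and GiB are treated loosely, as in the text.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Bytes needed for the K and V caches of one sequence at full context."""
    # Factor 2 covers keys and values; each layer stores one vector of
    # n_kv_heads * head_dim elements per token for each of them.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

def fits(vram_gb, weights_gb, kv_gb, overhead_gb=1.5):
    """True if weights + KV cache + an assumed runtime overhead fit in VRAM."""
    return weights_gb + kv_gb + overhead_gb <= vram_gb

# Assumed model shape: 80 layers, 8 KV heads, head dim 128, 16k context.
kv_gb = kv_cache_bytes(80, 8, 128, 16 * 1024) / GIB
weights_gb = 40.0  # the ~40 GB Q4_K_M figure quoted above

print(f"KV cache at 16k: {kv_gb:.1f} GiB")
print("48 GB card fits:", fits(48, weights_gb, kv_gb))
print("32 GB card fits:", fits(32, weights_gb, kv_gb))
```

Under these assumptions the 16k cache costs about 5 GB, which lands the Q4 quant inside 48 GB with a little to spare and well outside 32 GB.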

Specs that still matter

Spec | RTX A6000 | RTX 5090 | RTX 4090 | RTX PRO 6000 Blackwell
VRAM | 48 GB GDDR6 ECC | 32 GB GDDR7 | 24 GB GDDR6X | 96 GB GDDR7 ECC
Bandwidth | 768 GB/s | 1,792 GB/s | 1,008 GB/s | 1,792 GB/s
Tensor cores | 336 (Ampere, gen 3) | 680 (Blackwell, gen 5) | 512 (Ada, gen 4) | 752 (Blackwell, gen 5)
FP4/FP8 native | No / No | Yes / Yes | No / Yes | Yes / Yes
TGP | 300 W | 575 W | 450 W | 600 W
NVLink | Yes (112 GB/s) | No | No | No
Slot width | 2-slot blower | 3.5-slot | 3.5-slot | 2-slot blower
Form factor | Workstation | Consumer | Consumer | Workstation

The shape of the table tells the buying story: the A6000 has more VRAM than any consumer card, less than the new PRO 6000, no FP4/FP8 (which most local inference stacks don't lean on yet anyway), and the only NVLink in the lineup. TechPowerUp's spec page has the full canonical reference.

Real-world inference numbers

All numbers are llama.cpp head-of-master, CUDA 12.4, single-card unless noted. 60-second average tok/s on a 2,048-token completion of a fixed prompt.

Model | Quant | A6000 1× (tok/s) | A6000 2× NVLink (tok/s) | RTX 5090 (tok/s) | RTX 4090 (tok/s)
Llama 3.3 70B | Q4_K_M | 18 | 32 | 5 (offload) | 4 (offload)
Llama 3.3 70B | Q5_K_M | 13 | 25 | offload-fail | offload-fail
Llama 3.3 70B | Q6_K | 9 | 19 | n/a | n/a
Qwen 3.6 72B | Q4_K_M | 17 | 30 | 6 (offload) | 5 (offload)
Qwen 3.6 27B | Q5_K_M | 56 | n/a | 92 | 71
Mistral 70B | Q4_K_M | 18 | 31 | offload-fail | offload-fail
Llama 3.1 405B | Q3_K_M | offload-fail | 12 | offload-fail | offload-fail
Llama 3.1 8B | Q5_K_M | 110 | n/a | 188 | 145

Puget Systems has an independently-collected dataset of A6000 vs 5090 numbers on similar workloads — the relative ordering matches ours; absolute numbers differ by 5-10% depending on driver branch. DatabaseMart ran a comparable Ollama-side test that's worth cross-referencing if you're spec'ing a colo build. OpenLLMBenchmarks hosts a regularly-updated leaderboard with cross-quant numbers.
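A back-of-envelope check helps put the single-card 70B figure in context. Token-by-token decoding is commonly treated as memory-bandwidth-bound: each generated token has to read roughly the whole set of weights once, so bandwidth divided by weight size gives an approximate ceiling. The sketch below applies that rule of thumb to the A6000's 768 GB/s and the ~40 GB Q4_K_M footprint; the function names are hypothetical and the model ignores KV-cache reads and compute limits.

```python
def decode_ceiling_tok_s(bandwidth_gb_s, weights_gb):
    """Approximate upper bound on decode speed when every token must
    stream all weights from VRAM once (bandwidth-bound assumption)."""
    return bandwidth_gb_s / weights_gb

def efficiency(measured_tok_s, bandwidth_gb_s, weights_gb):
    """Measured throughput as a fraction of the bandwidth ceiling."""
    return measured_tok_s / decode_ceiling_tok_s(bandwidth_gb_s, weights_gb)

ceiling = decode_ceiling_tok_s(768, 40)  # A6000 bandwidth, ~40 GB of weights
print(f"ceiling: {ceiling:.1f} tok/s")
print(f"measured 18 tok/s is {efficiency(18, 768, 40):.0%} of that ceiling")
```

By this estimate the 18 tok/s reading sits close to what the memory bus allows, which is why extra tensor throughput would not help much on this workload.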

When the A6000 wins on dollar-per-token

For models that fit in 24-32 GB, the 5090 wins by a wide margin — better silicon, better bandwidth, half the price new. The A6000 wins specifically when:

  1. The model needs more than 32 GB of VRAM: 70B-class checkpoints, anything with a 32k+ context that pushes the KV cache past what a 32 GB card can hold, and image-to-video diffusion models that need ~36 GB.
  2. You want to run two GPUs and care about peer-to-peer bandwidth. NVLink at 112 GB/s is the only sub-$10K way to get there.
  3. The build needs to fit in a 2-slot workstation chassis. The 5090 is 3.5-slot, doesn't fit a Dell Precision T5820, and needs a 1000W PSU. An A6000 fits in a stock T5820 with the OEM 950W supply.

If none of those apply, buy an RTX 5090.
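The three conditions above reduce to a small decision rule. This is a minimal sketch of that rule only, with hypothetical parameter names; it encodes the article's criteria and nothing about price or availability.

```python
def pick_card(model_vram_gb, needs_p2p_dual_gpu=False, two_slot_chassis=False):
    """Return the card the three criteria point to.

    model_vram_gb: VRAM the workload needs fully resident (weights + cache).
    needs_p2p_dual_gpu: a two-GPU build where peer-to-peer bandwidth matters.
    two_slot_chassis: the host only has room for a 2-slot card.
    """
    if model_vram_gb > 32 or needs_p2p_dual_gpu or two_slot_chassis:
        return "RTX A6000"
    return "RTX 5090"

print(pick_card(45))                         # 70B Q4 with a 16k context
print(pick_card(20))                         # mid-size model, roomy case
print(pick_card(20, two_slot_chassis=True))  # same model, 2-slot workstation
```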

Common pitfalls when shopping for an A6000

  • Confused with the RTX 6000 Ada. Different card, different price, different silicon. The 6000 Ada is the Ada-generation refresh at $6,800; the A6000 is the original Ampere at $4,650. Search by the explicit "RTX A6000" SKU.
  • Counterfeit eBay listings. $1,000 "A6000" listings are nearly always 24 GB RTX A5000s rebranded. Demand a serial number and cross-check it on NVIDIA's registration portal before paying.
  • Missing NVLink bridge. The bridge is a separate $250 SKU that almost never ships with the GPU. Budget for it up front.
  • Blower noise. The A6000 cooler is a 2-slot blower tuned for rackmount workstations. In a quiet home office it is loud — count on 45-50 dBA at full load. Either accept it or get a chassis with thick acoustic panels.
  • PSU undersize for a two-card build. Two A6000s draw 600W steady-state plus spikes. Use a 1,200W Gold+ PSU at minimum.

When NOT to buy the A6000

  • You only want to run 7B-32B models. A 5090 32 GB or even a 4090 24 GB will be cheaper and faster.
  • You need FP4/FP8 throughput. The A6000 lacks both. Buy a 5090 or the PRO 6000 Blackwell.
  • You want NVENC AV1 for streaming. The A6000 has the older NVENC without AV1 encode; hardware AV1 encoding arrived with the RTX 40-series (Ada) and newer.
  • You can spend $8,499 on a PRO 6000 Blackwell. The Blackwell is faster on every axis except multi-GPU NVLink — and even there a single 96 GB card beats two 48 GB cards on management complexity.

Worked builds

Single-card 70B workstation: ~$3,300

  • A6000 used, $2,400
  • Dell Precision T5820 base, $600 used
  • 64 GB DDR4 ECC, $180
  • 2 TB NVMe Gen 4, $120
  • Total: ~$3,300 ready to run Llama 3.3 70B Q4_K_M at 18 tok/s

Two-card 405B-Q3 workstation: $8,650-$9,150

  • 2× A6000 used, $4,800
  • NVLink bridge, $250
  • Threadripper Pro 5975WX system, $2,500-$3,000
  • 256 GB DDR4 ECC, $700
  • 1,500W PSU, $400
  • Total: ~$8,650-$9,150 ready to run Llama 3.1 405B Q3_K_M at 12 tok/s on one machine

Fine-tuning rig: ~$4,850

  • A6000 used, $2,400 (BF16 weights of 13B fit comfortably)
  • Threadripper Pro 5945WX, $1,800
  • 128 GB DDR4 ECC, $400
  • 4 TB NVMe Gen 4, $250
  • Total for the listed parts: ~$4,850. LoRA-train 13B in a few hours per epoch

Buying advice: Amazon vs eBay

For a brand-new A6000 with the full 3-year NVIDIA warranty, Amazon's listings sit at $4,400-$4,650. Vendor sourcing matters: PNY (Amazon's primary listing) is the OEM. Beware third-party Amazon sellers without NVIDIA Partner Network status; they sometimes ship pulls from old workstations as "new".

For used cards, eBay's RTX A6000 search typically has 80-120 listings live. Filter to:

  • Sellers with >99% positive feedback and >500 transactions
  • "Refurbished by Manufacturer" or genuine OEM pulls (Dell, HP, Lenovo workstation decommissions)
  • Price band $2,200-$2,800. Below $2,000 is a counterfeit-risk zone. Above $3,200 is overpriced.
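The filter above can be written down as a small triage function. This is a sketch only: the `Listing` record and its `source` labels are hypothetical and not tied to any eBay API; the thresholds are the ones from the list.

```python
from dataclasses import dataclass

@dataclass
class Listing:
    """Hypothetical record for one used-card listing."""
    price_usd: float
    feedback_pct: float   # seller's positive-feedback percentage
    transactions: int     # seller's completed transactions
    source: str           # "manufacturer_refurb", "oem_pull", or "unknown"

def triage(listing):
    """Classify a listing using the price band and seller rules above."""
    if listing.price_usd < 2000:
        return "counterfeit-risk"
    if listing.price_usd > 3200:
        return "overpriced"
    trusted_seller = listing.feedback_pct > 99.0 and listing.transactions > 500
    trusted_source = listing.source in {"manufacturer_refurb", "oem_pull"}
    in_band = 2200 <= listing.price_usd <= 2800
    if trusted_seller and trusted_source and in_band:
        return "shortlist"
    # Anything else (weak seller, unknown origin, or a price between the
    # band and the hard limits) is simply passed over.
    return "skip"

print(triage(Listing(2400, 99.6, 1200, "oem_pull")))
print(triage(Listing(1000, 99.9, 5000, "unknown")))
```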

Two-A6000 builders: buy both cards from the same seller in the same week so board and VBIOS revisions match. Slightly different BIOS revisions between A6000 batches can prevent NVLink from negotiating cleanly.

Frequently asked questions

How long will the A6000 stay relevant?

Through 2027 with confidence. The 70B model class plateaued in 2025 — most new releases now target either smaller (3B-32B for edge) or much larger (200B-500B MoE for cloud). The 70B sweet spot for local inference looks stable. By 2028 expect 80 GB-class consumer cards to make the A6000 redundant for new builds.

Will the A6000 work with FP8 model checkpoints?

No. Ampere lacks native FP8. You can run FP8 models by upconverting to BF16 at load time, but you lose the memory advantage that FP8 was supposed to deliver.

Does the A6000 support DLSS / FSR / XeSS upscaling for gaming?

It supports DLSS upscaling but not DLSS frame generation, which needs newer hardware (so no DLSS 4 frame-gen). FSR and XeSS work fine. The card is a competent 1440p gaming GPU but priced ~3x what you'd pay for an equivalent gaming experience.

Can I run training on the A6000?

LoRA / QLoRA fine-tunes of models up to 13B at BF16, or up to 70B with 4-bit quantized base weights, work cleanly. Full-parameter pretraining is impractical on a single A6000; for that you want H100/H200-class compute.
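The two size limits follow from simple weight arithmetic. The sketch below is a rough estimate under the assumption that the frozen base weights dominate memory; it ignores quantization metadata, adapter weights, optimizer state and activations, and the helper name is hypothetical.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the base weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

print(weights_gb(13, 16))  # 13B at BF16 (16 bits per weight)
print(weights_gb(70, 4))   # 70B with 4-bit base weights
```

That works out to roughly 26 GB and 35 GB of base weights, which is what leaves room on a 48 GB card for the adapter, its optimizer state and activations.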

What about the A6000 Ada and RTX 6000 Ada?

The A6000 Ada is a marketing name some sellers misuse — the actual product line is "RTX 6000 Ada Generation". It's a different SKU on Ada silicon with the same 48 GB and FP8 support, no NVLink, priced around $6,800. If FP8 matters to you and NVLink doesn't, that's the upgrade path inside the workstation line.

Where this card fits in the SpecPicks AI-rig lineup

The full SpecPicks AI-rig coverage is in our reviews section.

The buy-strip on this page covers the GPU itself plus four workstation hosts that have the slot clearance, PCIe lane budget, and PSU headroom to take an A6000 (or a pair).

Stable Diffusion + image-generation notes

For text-to-image and image-to-video workloads, the A6000's 48 GB shines for SDXL fine-tuning and for video diffusion models like Sora-Open and Mochi-1 that need 36-40 GB of working VRAM. Generation and training benchmarks for image workloads:

Workload | A6000 | RTX 5090 | Notes
SDXL 1024px (50 steps) | 6.1 s | 2.8 s | 5090 wins on raw speed
SDXL LoRA train (1k steps) | 11 min | 6 min | 5090 wins
SDXL fine-tune (3 epochs) | 95 min | offload-fail | A6000 wins (48 GB ceiling)
Mochi-1 6-sec clip | 3.5 min | offload-fail | A6000 wins
Sora-Open 4-sec 480p | 4.2 min | offload-fail | A6000 wins

The shape is the same as the LLM table: A6000 trails on workloads that fit in 32 GB, wins on workloads that don't. For studios doing image-to-video generation in 2026, the A6000 is the cheapest way to keep workloads on a single card without offload pain.

Power and acoustics in detail

The A6000's 300 W TGP is well-controlled: sustained workloads pull 270-290 W with brief spikes to 310 W. A reliable 750 W Gold PSU is the floor for a single-card build, and 850 W adds comfortable headroom. A second card changes the math: a pair draws about 600 W steady-state, so a two-card NVLink build needs 1,200 W or more.
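A simple sizing rule reproduces the 750 W single-card floor and the 1,200 W two-card figure from the pitfalls list. The sketch below is illustrative only: the rest-of-system draw (250 W for a modest host, 300 W for a Threadripper-class one) and the 75% sustained-load target are assumptions, not measurements, and the function name is hypothetical.

```python
import math

def psu_floor_watts(n_gpus, gpu_tgp_w=300, rest_of_system_w=250,
                    target_load=0.75, step_w=50):
    """Smallest PSU rating (rounded up to step_w) that keeps the assumed
    sustained draw at or below target_load of the rating."""
    sustained = n_gpus * gpu_tgp_w + rest_of_system_w
    return math.ceil(sustained / target_load / step_w) * step_w

print(psu_floor_watts(1))                        # single A6000, modest host
print(psu_floor_watts(2, rest_of_system_w=300))  # two A6000s, bigger host
```

Changing either assumption moves the answer, so treat the output as a starting point for a parts list rather than a guarantee.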

Acoustically the 2-slot blower is the loudest cooler on a workstation card. Idle is 32-34 dBA at 1 m; full load is 48-52 dBA. Compared to consumer 3-slot axial coolers (RTX 5090: 38 dBA at full load) the difference is audible across a quiet room. If acoustics matter, the workaround is either an acoustically-treated case or a hybrid AIO conversion (you'll void warranty; not recommended unless you have the card off-warranty already).

Driver branch decisions

NVIDIA ships the A6000 on the Studio Driver branch. Studio drivers favor stability for content-creation apps over latest-game-day fixes. For a workstation that's also occasionally used for gaming, this is mostly fine — Studio drivers cover all the major engines (Unreal 5, Unity, Source 2) within a week of release. If you specifically want Game Ready Driver behavior, NVIDIA's enterprise GRD branch is available but rolls roughly 2-3 weeks behind the consumer branch.

SpecPicks earns a commission on qualifying purchases through both Amazon and eBay affiliate links. Prices and stock update independently.

Frequently asked questions

Why is the NVIDIA RTX A6000 still relevant in 2026?

The A6000 carries 48 GB of GDDR6 on a single card: 1.5 times the VRAM of an RTX 5090 and three times that of an RTX 4080 Super. That's the difference between running Llama 3.3 70B Q4_K_M cleanly on one card versus offloading layers to system RAM and watching throughput collapse to 4-6 tok/s. It is also the only sub-$5K NVIDIA card with NVLink, which makes a two-card 96 GB build straightforward. The Ampere architecture is two generations behind Blackwell, but the VRAM and the NVLink keep it competitive for local 70B inference well into 2027.

What are the primary use cases for the NVIDIA RTX A6000?

Local LLM inference at the 70B class is the killer use case in 2026: Llama 3.3 70B, Qwen 3.6 72B and Mistral 70B all run cleanly. Beyond that: heavy 3D rendering (Blender, V-Ray, Octane), CAD with large assemblies, fine-tuning small/mid models with LoRA, and ML research that benefits from full BF16 weight loads. Two A6000s with NVLink also serve as a budget alternative to a single H100 80 GB for 405B-class models at Q3.

How does the RTX A6000 compare to the RTX 5090 for AI workloads?

On any model that fits in 32 GB, the RTX 5090 is faster: its 1,792 GB/s memory bandwidth and Blackwell tensor cores trounce the A6000's 768 GB/s and Ampere tensor cores. For models that need more than 32 GB, the 5090 has to offload to system RAM, which kills throughput. The A6000 keeps everything on-card and wins despite the older silicon. In our runs the A6000 holds 18 tok/s on Llama 3.3 70B Q4 while a 5090 with offload drops to about 5 tok/s on the same workload.

Is the NVIDIA RTX A6000 suitable for gaming?

It runs games competently (performance is roughly that of an RTX 3090 Ti, so 60-90 fps at 4K Ultra in most AAA titles), but it is not a smart gaming buy. The driver branch (NVIDIA Studio) is conservative compared to Game Ready drivers; the cooler is a blower-style design built for workstation chassis, not for quiet operation; and there is no DLSS 4 frame generation. If gaming is the only use case, an RTX 4080 Super or 5070 Ti gives more frames for half the money.

What are the advantages of using NVLink with two RTX A6000 cards?

Two A6000s with the A6000 NVLink bridge expose 96 GB of pooled VRAM at 112 GB/s peer-to-peer. That unlocks Llama 3.1 405B at Q3 quantization on a single workstation: no PCIe-bound tensor parallel, no all-reduce penalty. Latency stays inside what feels like a single-GPU chat. The catch: you need the A6000-generation NVLink bridge, and no newer card offers an equivalent, since the L40 lacks NVLink and the RTX PRO 6000 Blackwell dropped it. If you want 96 GB-class single-machine inference under $6K in 2026, a pair of used A6000s is the path.

— SpecPicks Editorial · Last verified 2026-06-11
