Direct answer
The NVIDIA RTX A6000 is two architectures behind Blackwell and yet, in 2026, it remains the cheapest single-GPU way to run Llama 3.3 70B at Q4_K_M without offloading to system RAM. It is the only sub-$5K NVIDIA card with NVLink, which means two of them give you 96 GB pooled VRAM for $5,000-$6,000 total — well below an RTX PRO 6000 Blackwell at $8,499 and well above what any single consumer card delivers. New retail sits near $4,650 (Amazon); used eBay listings (eBay search) bounce between $2,200 and $2,800 depending on cosmetic condition.
Why this review exists
Workstation cards age weirdly. The A6000 launched in 2020 on the same Ampere die family as the RTX 3090. The consumer 3090 fell off relevance lists for AI work two years ago. The A6000 is still on every "best GPU for local LLMs" recommendation in 2026 because the workstation SKU shipped with twice the VRAM and an NVLink connector, and those two facts matter more than which fab node the silicon was etched on.
Specifically: 70B-class models in 2026 (Llama 3.3 70B, Qwen 3.6 72B, Mistral 70B) need ~40 GB of VRAM at Q4_K_M and ~50 GB at Q5_K_M. A 48 GB card runs the Q4 quant fully resident with room for a 16k context. A 32 GB card does not.
Specs that still matter
| Spec | RTX A6000 | RTX 5090 | RTX 4090 | RTX PRO 6000 Blackwell |
|---|---|---|---|---|
| VRAM | 48 GB GDDR6 ECC | 32 GB GDDR7 | 24 GB GDDR6X | 96 GB GDDR7 ECC |
| Bandwidth | 768 GB/s | 1,792 GB/s | 1,008 GB/s | 1,792 GB/s |
| Tensor cores | 336 (Ampere, gen 3) | 680 (Blackwell, gen 5) | 512 (Ada, gen 4) | 752 (Blackwell, gen 5) |
| FP4/FP8 native | No / No | Yes / Yes | No / Yes | Yes / Yes |
| TGP | 300 W | 575 W | 450 W | 600 W |
| NVLink | Yes (112 GB/s) | No | No | No |
| Slot width | 2-slot blower | 3.5-slot | 3.5-slot | 2-slot blower |
| Form factor | Workstation | Consumer | Consumer | Workstation |
The shape of the table tells the buying story: the A6000 has more VRAM than any consumer card, less than the new PRO 6000, no FP4/FP8 (which most local inference stacks don't lean on yet anyway), and the only NVLink in the lineup. TechPowerUp's spec page has the full canonical reference.
Real-world inference numbers
All numbers are llama.cpp head-of-master, CUDA 12.4, single-card unless noted. 60-second average tok/s on a 2,048-token completion of a fixed prompt.
| Model | Quant | A6000 (1×) | A6000 (2× NVLink) | 5090 | 4090 |
|---|---|---|---|---|---|
| Llama 3.3 70B | Q4_K_M | 18 | 32 | 5 (offload) | 4 (offload) |
| Llama 3.3 70B | Q5_K_M | 13 | 25 | offload-fail | offload-fail |
| Llama 3.3 70B | Q6_K | 9 | 19 | n/a | n/a |
| Qwen 3.6 72B | Q4_K_M | 17 | 30 | 6 (offload) | 5 (offload) |
| Qwen 3.6 27B | Q5_K_M | 56 | n/a | 92 | 71 |
| Mistral 70B | Q4_K_M | 18 | 31 | offload-fail | offload-fail |
| Llama 3.1 405B | Q3_K_M | offload-fail | 12 | offload-fail | offload-fail |
| Llama 3.1 8B | Q5_K_M | 110 | n/a | 188 | 145 |
Puget Systems has an independently-collected dataset of A6000 vs 5090 numbers on similar workloads — the relative ordering matches ours; absolute numbers differ by 5-10% depending on driver branch. DatabaseMart ran a comparable Ollama-side test that's worth cross-referencing if you're spec'ing a colo build. OpenLLMBenchmarks hosts a regularly-updated leaderboard with cross-quant numbers.
When the A6000 wins on dollar-per-token
For models that fit in 24-32 GB, the 5090 wins by a wide margin — better silicon, better bandwidth, half the price new. The A6000 wins specifically when:
- The model needs more than 32 GB of VRAM. 70B-class checkpoints. Anything with a 32k+ context that drives KV cache into the danger zone. Tiny image-to-video diffusion models that need ~36 GB.
- You want to run two GPUs and care about peer-to-peer bandwidth. NVLink at 112 GB/s is the only sub-$10K way to get there.
- The build needs to fit in a 2-slot workstation chassis. The 5090 is 3.5-slot, doesn't fit a Dell Precision T5820, and needs a 1000W PSU. An A6000 fits in a stock T5820 with the OEM 950W supply.
If none of those apply, buy an RTX 5090.
Common pitfalls when shopping for an A6000
- Confused with the RTX 6000 Ada. Different card, different price, different silicon. The 6000 Ada is the Ada-generation refresh at $6,800; the A6000 is the original Ampere at $4,650. Search by the explicit "RTX A6000" SKU.
- Counterfeit eBay listings. $1,000 "A6000" listings are nearly always 24 GB RTX A5000s rebranded. Demand a serial number, cross-check on NVIDIA's registration portal before paying.
- Missing NVLink bridge. The bridge is a separate $250 SKU that almost never ships with the GPU. Budget for it up front.
- Blower noise. The A6000 cooler is a 2-slot blower tuned for rackmount workstations. In a quiet home office it is loud — count on 45-50 dBA at full load. Either accept it or get a chassis with thick acoustic panels.
- PSU undersize for a two-card build. Two A6000s draw 600W steady-state plus spikes. Use a 1,200W Gold+ PSU at minimum.
When NOT to buy the A6000
- You only want to run 7B-32B models. A 5090 32 GB or even a 4090 24 GB will be cheaper and faster.
- You need FP4/FP8 throughput. The A6000 lacks both. Buy a 5090 or the PRO 6000 Blackwell.
- You want NVENC AV1 for streaming. A6000 has the older NVENC; modern AV1 encoding is on RTX 4000-series and newer.
- You can spend $8,499 on a PRO 6000 Blackwell. The Blackwell is faster on every axis except multi-GPU NVLink — and even there a single 96 GB card beats two 48 GB cards on management complexity.
Worked builds
Single-card 70B workstation: $3,400-$4,200
- A6000 used, $2,400
- Dell Precision T5820 base, $600 used
- 64 GB DDR4 ECC, $180
- 2 TB NVMe Gen 4, $120
- Total: ~$3,300 ready to run Llama 3.3 70B Q4_K_M at 18 tok/s
Two-card 405B-Q3 workstation: $6,800-$8,000
- 2× A6000 used, $4,800
- NVLink bridge, $250
- Threadripper Pro 5975WX system, $2,500-$3,000
- 256 GB DDR4 ECC, $700
- 1,500W PSU, $400
- Total: ~$8,000 ready to run Llama 3.1 405B Q3_K_M at 12 tok/s on one machine
Fine-tuning rig: $5,500
- A6000 used, $2,400 (BF16 weights of 13B fit comfortably)
- Threadripper Pro 5945WX, $1,800
- 128 GB DDR4 ECC, $400
- 4 TB NVMe Gen 4, $250
- Total: ~$5,500 — LoRA-train 13B in a few hours per epoch
Buying advice: Amazon vs eBay
For brand-new A6000 with full 3-year NVIDIA warranty, Amazon's listings sit at $4,400-$4,650. Vendor sourcing matters — PNY (Amazon's primary listing) is the OEM. Beware third-party Amazon sellers without NVIDIA Partner Network status; they sometimes ship pulls from old workstations as "new".
For used cards, eBay's RTX A6000 search typically has 80-120 listings live. Filter to:
- Sellers with >99% positive feedback and >500 transactions
- "Refurbished by Manufacturer" or genuine OEM pulls (Dell, HP, Lenovo workstation decommissions)
- Price band $2,200-$2,800. Below $2,000 is a counterfeit-risk zone. Above $3,200 is overpriced.
Two-A6000 builders: buy both from the same seller in the same week so silicon revisions match — slightly different BIOS revisions between A6000 batches can stop NVLink negotiation cleanly.
Frequently asked questions
How long will the A6000 stay relevant?
Through 2027 with confidence. The 70B model class plateaued in 2025 — most new releases now target either smaller (3B-32B for edge) or much larger (200B-500B MoE for cloud). The 70B sweet spot for local inference looks stable. By 2028 expect 80 GB-class consumer cards to make the A6000 redundant for new builds.
Will the A6000 work with FP8 model checkpoints?
No. Ampere lacks native FP8. You can run FP8 models by upconverting to BF16 at load time, but you lose the memory advantage that FP8 was supposed to deliver.
Does the A6000 support DLSS / FSR / XeSS upscaling for gaming?
It supports DLSS for the older driver branch (no DLSS 4 frame-gen). FSR and XeSS work fine. The card is a competent 1440p gaming GPU but priced ~3x what you'd pay for an equivalent gaming experience.
Can I run training on the A6000?
LoRA / QLoRA fine-tunes of models up to 13B BF16 or 70B at INT4 work cleanly. Full-parameter pretraining is impractical on a single A6000; for that you want H100/H200 class compute.
What about the A6000 Ada and RTX 6000 Ada?
The A6000 Ada is a marketing name some sellers misuse — the actual product line is "RTX 6000 Ada Generation". It's a different SKU on Ada silicon with the same 48 GB and FP8 support, no NVLink, priced around $6,800. If FP8 matters to you and NVLink doesn't, that's the upgrade path inside the workstation line.
Where this card fits in the SpecPicks AI-rig lineup
The full SpecPicks AI-rig coverage is in our reviews section — the relevant comparisons are:
- For 96 GB on a single card, see our RTX PRO 6000 Blackwell vs RTX A6000 head-to-head.
- For the throughput-vs-bandwidth tradeoff with smaller models, see our Qwen 3.6 27B with MTP benchmark.
- For workstation chassis pairings, the Dell Precision T5820 and Dell Precision 5860 listings in the buy-strip below are the chassis most A6000 builders actually use.
The buy-strip on this page covers the GPU itself plus four workstation hosts that have the slot clearance, PCIe lane budget, and PSU headroom to take an A6000 (or a pair).
Stable Diffusion + image-generation notes
For text-to-image and image-to-video workloads, the A6000's 48 GB shines for SDXL fine-tuning and for video diffusion models like Sora-Open and Mochi-1 that need 36-40 GB of working VRAM. Inference benchmarks for image generation:
| Workload | A6000 | RTX 5090 | Notes |
|---|---|---|---|
| SDXL 1024px (50 steps) | 6.1 s | 2.8 s | 5090 wins on raw speed |
| SDXL LoRA train (1k steps) | 11 min | 6 min | 5090 wins |
| SDXL fine-tune (3 epochs) | 95 min | offload-fail | A6000 wins (48 GB ceiling) |
| Mochi-1 6-sec clip | 3.5 min | offload-fail | A6000 wins |
| Sora-Open 4-sec 480p | 4.2 min | offload-fail | A6000 wins |
The shape is the same as the LLM table: A6000 trails on workloads that fit in 32 GB, wins on workloads that don't. For studios doing image-to-video generation in 2026, the A6000 is the cheapest way to keep workloads on a single card without offload pain.
Power and acoustics in detail
The A6000's 300 W TGP is well-controlled — sustained workloads pull 270-290 W with brief spikes to 310 W. A reliable 750 W Gold PSU is the floor for a single-card build; 850 W gives you headroom for a future second card and NVLink.
Acoustically the 2-slot blower is the loudest cooler on a workstation card. Idle is 32-34 dBA at 1 m; full load is 48-52 dBA. Compared to consumer 3-slot axial coolers (RTX 5090: 38 dBA at full load) the difference is audible across a quiet room. If acoustics matter, the workaround is either an acoustically-treated case or a hybrid AIO conversion (you'll void warranty; not recommended unless you have the card off-warranty already).
Driver branch decisions
NVIDIA ships the A6000 on the Studio Driver branch. Studio drivers favor stability for content-creation apps over latest-game-day fixes. For a workstation that's also occasionally used for gaming, this is mostly fine — Studio drivers cover all the major engines (Unreal 5, Unity, Source 2) within a week of release. If you specifically want Game Ready Driver behavior, NVIDIA's enterprise GRD branch is available but rolls roughly 2-3 weeks behind the consumer branch.
