As an Amazon Associate, SpecPicks earns from qualifying purchases. See our review methodology.
RTX 5090 vs RTX 5080: Which Should You Buy in 2026?
By SpecPicks Editorial · Published Apr 24, 2026 · Last verified Apr 24, 2026 · 11 min read
The RTX 5090 vs RTX 5080 decision in 2026 comes down to three numbers: the 5090 costs 2× the 5080 ($1,999 MSRP vs $999), has 2× the VRAM (32 GB vs 16 GB GDDR7), and draws 60% more power (575 W vs 360 W). In pure rasterization at 4K the 5090 is roughly 45–55% faster than the 5080 in our database — not the "25%" some sites quote — and the gap stretches even further in shader-bound engines at 1440p and in local-LLM workloads, where VRAM capacity and bandwidth dominate. If you game at 1440p, the 5080 is the obvious buy. If you run 32B-parameter local models, render 8K video, or need headroom for Unreal Engine 5.5 Nanite + Lumen at 4K, the 5090 is the only card worth owning.
This guide is for the buyer who has already decided they want a current-generation NVIDIA flagship and is trying to pick between the two. It is not for someone asking "should I upgrade from a 4080 Super?" (for that, read our RTX 5080 vs RTX 4080 Super comparison), and it is not for buyers on a budget — both of these cards are luxury products, and neither is the price/performance winner of the Blackwell lineup.
What you will find below: a full spec delta table, five use-case recommendations with real board partner SKUs (three 5090 picks, two 5080 picks), benchmark pulls from Gamers Nexus / Tom's Hardware / KitGuru / TechPowerUp / LocalLLaMA, a power-and-price-per-frame breakdown, an AI inference comparison spanning 0.6B through 235B parameters, and an FAQ covering the most common buying questions. By the bottom of this article you will know which one to buy — and, just as importantly, when the answer is "neither, wait for a 5080 Super or step down to a 5070 Ti."
How we tested — and where the numbers come from
Every FPS number in this article is traceable to a specific row in the SpecPicks benchmark database, which aggregates published results from Gamers Nexus, Tom's Hardware, TechPowerUp, KitGuru, TechSpot, NotebookCheck, and Overclocking.com. AI inference numbers come from LocalLLaMA-aggregated results, the official llama.cpp GitHub benchmark threads, LocalScore, Runpod's vLLM suite, and Windows Central's Ollama tests. We pull real numbers — if we only have KitGuru data for a given game, we cite KitGuru; we don't "average" across disparate test benches, because the rigs, driver versions, and scene seeds differ.
Comparison at a glance
| Pick | Best For | Key Spec | Price Range | Verdict |
|---|---|---|---|---|
| 🏆 MSI RTX 5090 Gaming Trio OC | Best Overall | 32 GB GDDR7, 575 W | $3,899 | Unmatched 4K + AI headroom |
| 💰 ASUS TUF RTX 5080 OC | Best Value | 16 GB GDDR7, 360 W | $1,399 | Full reference-spec 5080 perf at the sanest price |
| ⚡ GIGABYTE RTX 5090 WINDFORCE OC | Best 4K Ultra | 32 GB GDDR7, 575 W | $3,879 | The cheapest way into a reference-spec 5090 |
| 🎯 GIGABYTE RTX 5080 Gaming OC | Best 1440p Hi-Refresh | 16 GB GDDR7, 360 W | $1,499 | 190 FPS Cyberpunk 1440p Ultra |
| 🧪 ASUS ROG Astral RTX 5090 | Best for Local AI | 32 GB GDDR7, 575 W | $3,899 | Fits 32B Q4 and 70B Q2 on a single card |
Full spec delta — RTX 5090 vs RTX 5080
| Spec | RTX 5090 | RTX 5080 | Delta |
|---|---|---|---|
| Architecture | Blackwell (GB202) | Blackwell (GB203) | Same gen |
| CUDA cores | 21,760 | 10,752 | 5090 +102% |
| Boost clock | 2,407 MHz | 2,617 MHz | 5080 +8.7% |
| VRAM | 32 GB GDDR7 | 16 GB GDDR7 | 5090 2× |
| Memory bus | 512-bit | 256-bit | 5090 2× |
| TDP | 575 W | 360 W | 5080 -37% |
| PCIe | PCIe 5.0 x16 | PCIe 5.0 x16 | Same |
| MSRP | $1,999 | $999 | 5080 1/2 price |
| Recommended PSU | 1000 W | 850 W | — |
| Power connector | 12V-2×6 (600 W) | 12V-2×6 (600 W) | Same |
Two things jump out. First, the CUDA-core doubling is the largest gap between an ×080 and ×090 card of the same generation NVIDIA has ever shipped — the 4090 had only ~68% more cores than the 4080 (16,384 vs 9,728). Second, the 5080 actually clocks higher out of the box, which is why the gap in lightly threaded compute (some synthetic tests) closes to 10–15% even though the raw shader count is 2×. Bandwidth and VRAM capacity matter more than clock speed once you push real 4K or local-LLM workloads.
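The deltas in the spec table are plain ratios you can recompute yourself; a quick sketch using the raw numbers above:

```python
# Reproduce the "Delta" column of the spec table from the raw numbers,
# so readers can sanity-check the percentage claims themselves.
specs = {
    "cuda_cores": (21_760, 10_752),  # (RTX 5090, RTX 5080)
    "boost_mhz":  (2_407, 2_617),
    "vram_gb":    (32, 16),
    "bus_bits":   (512, 256),
    "tdp_w":      (575, 360),
}

for name, (gb202, gb203) in specs.items():
    # positive => 5090 leads; boost clock comes out negative,
    # since the 5080 ships with the higher out-of-box clock
    delta = (gb202 / gb203 - 1) * 100
    print(f"{name:10s} 5090 vs 5080: {delta:+.1f}%")
```

CUDA cores land at +102.4%, TDP at +59.7% (the "60% more power" quoted throughout), and boost clock at -8.0% from the 5090's side, which the table frames as "5080 +8.7%" from the other direction.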
Per TechPowerUp, PassMark's G3D Mark gives the 5090 38,935 points vs the 5080's 35,697 — a slim 9% gap. That synthetic result systematically undersells the 5090 versus real-world 4K gaming and AI inference because PassMark doesn't stress memory bandwidth the way a modern AAA title at 4K RT does. Don't pick a GPU based on G3D Mark.
Gaming benchmark delta — where the 5090 actually earns its $1,000 premium
The SpecPicks database has 39 benchmark rows for these two cards across AAA titles. Here are the ones where we have head-to-head numbers at the same resolution and preset:
| Game | Resolution / Preset | RTX 5080 | RTX 5090 | 5090 Lead | Source |
|---|---|---|---|---|---|
| Alan Wake 2 | 1440p Ultra Native | 87 FPS | 129 FPS | +48% | KitGuru |
| Alan Wake 2 | 4K Ultra Native | 49 FPS | 75 FPS | +53% | KitGuru |
| Cyberpunk 2077 | 1440p Ultra Native | 131 FPS | 190 FPS | +45% | KitGuru |
| Cyberpunk 2077 | 4K RT Overdrive + DLSS Q | 115 FPS | 59 FPS* | — | Tom's Hardware |
| Cyberpunk 2077 | 4K RT Ultra + DLSS Q | 50 FPS | — | — | TechSpot |
| Black Myth: Wukong | 1440p Ultra Native | 62 FPS | 130 FPS | +110% | KitGuru / GN |
| Black Myth: Wukong | 4K Ultra Native | 58 FPS | 86 FPS | +48% | GN |
| Black Myth: Wukong | 4K RT Ultra + DLSS Q | 36 FPS | — | — | KitGuru |
| Final Fantasy XVI | 1440p Ultra | 66 FPS | 92 FPS | +39% | KitGuru |
*The 5090's Cyberpunk 4K RT Overdrive result (59 FPS) looks low next to the 5080's 115 FPS at the nominally identical preset because the two rows come from different test runs with different upscaler configurations — they are not directly comparable. When both cards run the same settings, the 5090 is consistently 45–70% faster in rasterization. The Black Myth: Wukong 1440p native gap (+110%) is the most extreme we've seen — Wukong's Nanite-heavy geometry scales almost linearly with shader count, so the 5090's 2× CUDA cores pay off directly.
The short version: at 4K Ultra native rasterization, expect the 5090 to be 45–55% faster than the 5080. At 1440p native rasterization, expect 40–110% depending on the engine. With ray tracing and DLSS enabled, the gap compresses to 25–35%: upscaling drops the internal render resolution, so the 5090's extra shaders and memory bandwidth stop being the deciding factor.
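As a sanity check on those ranges, the head-to-head rows from the table above can be summarized with a geometric mean of the FPS ratios, the usual way to aggregate performance ratios across titles:

```python
# Aggregate the head-to-head FPS pairs from the table above with a
# geometric mean per resolution. Pairs are (RTX 5080 FPS, RTX 5090 FPS).
from statistics import geometric_mean

native_4k = [(49, 75), (58, 86)]                         # Alan Wake 2, Wukong
native_1440 = [(87, 129), (131, 190), (62, 130), (66, 92)]  # AW2, CP2077, Wukong, FFXVI

for label, rows in [("4K Ultra native", native_4k),
                    ("1440p Ultra native", native_1440)]:
    gm = geometric_mean([fast / slow for slow, fast in rows])
    print(f"{label}: 5090 average lead +{(gm - 1) * 100:.0f}%")
```

This lands at roughly +51% for the two 4K native rows and +58% for the four 1440p rows — consistent with the 45–55% and 40–110% ranges quoted above once the Wukong outlier is included.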
For our full per-title breakdown, see the RTX 5090 benchmark page and RTX 5080 benchmark page.
AI inference — the 5090's real knockout punch
If you run local LLMs, the head-to-head isn't close. The 5090's 32 GB of VRAM is the dividing line between "can this card run a 32B-parameter model at Q4 without offload" and "this card has to spill to system RAM and crawl." Our AI benchmarks database (17 rows across these two cards) tells a clear story:
| Model | Quant | Runtime | RTX 5080 (tok/s) | RTX 5090 (tok/s) | Source |
|---|---|---|---|---|---|
| qwen3:0.6B | — | Ollama | 47.1 | 47.1 | LocalLLaMA |
| Llama 2 7B | Q4_0 | llama.cpp (Vulkan) | — | 263.6 | llama.cpp GitHub |
| Qwen2.5-Coder-7B | FP16 | vLLM (batched) | — | 5,841 | Runpod |
| Llama 3.1 8B Instruct | Q4_K_M | llama.cpp | 44.9 | — | LocalScore |
| Llama 3.2 Vision 11B | Q4_K_M | Ollama | 120.0 | — | Windows Central |
| gemma3:12b | Q4_K_M | Ollama | 71.0 | — | Windows Central |
| deepseek-r1:14b | Q4_K_M | Ollama | 70.0 | — | Windows Central |
| Qwen2.5 14B | Q4_K_M | llama.cpp | 25.5 | — | LocalScore |
| gpt-oss:20B | MXFP4 | Ollama | 128.0 | — | Windows Central |
| Qwen3 97B | Q5 (offload) | llama.cpp | 29.0 | — | LocalLLaMA |
| Qwen3 235B | Q8 (offload) | llama.cpp | 50.0 | — | LocalLLaMA |
A few things to note. The single head-to-head row where we have both cards at the same model/quant (qwen3:0.6B on Ollama, 47.1 tok/s on both) involves a model so small that it's bottlenecked by Ollama's per-request overhead rather than GPU compute — that's a floor, not a real benchmark. The more instructive data point is the 5090's 263.6 tok/s on Llama 2 7B Q4_0 via llama.cpp's Vulkan backend (source), roughly 5.9× the 5080's 44.9 tok/s result on Llama 3.1 8B — different models and runtimes, so treat the multiple as directional rather than exact, but the bandwidth-driven gap is real.
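Why is the 263.6 tok/s figure plausible? Single-stream decode is close to memory-bandwidth-bound: each generated token must stream the whole quantized model out of VRAM, so bandwidth divided by model size gives a hard ceiling. A sketch using TechPowerUp's published bandwidth specs; the ~4 GB footprint for a 7B Q4 model is an assumed round number:

```python
# Back-of-envelope decode ceiling: memory bandwidth / model size in bytes.
# Real single-stream numbers land below this due to KV-cache reads and
# kernel overhead; batched vLLM numbers can exceed it via compute reuse.
cards = {"RTX 5090": 1792, "RTX 5080": 960}  # GB/s, published specs
model_gb = 4.0                               # ~7B model at Q4 (~0.57 bytes/param)

for name, bandwidth in cards.items():
    ceiling = bandwidth / model_gb
    print(f"{name}: <= ~{ceiling:.0f} tok/s ceiling for a ~{model_gb:.0f} GB model")
```

The measured 263.6 tok/s sits at roughly 60% of the 5090's ~448 tok/s ceiling, a believable efficiency for the Vulkan backend.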
The 5080's 16 GB VRAM ceiling is the actual problem. A Qwen2.5 14B Q4_K_M model weighs about 9 GB, so it fits on the 5080 comfortably at small context — but once you push to 32K context, KV cache eats another ~4 GB and you start OOMing. A 32B Q4_K_M model (weights ~19 GB) will not fit on a 5080 at all without CPU offload, and once you offload, tok/s drops from ~30 to ~5. The 5090's 32 GB lets you run DeepSeek-R1 32B, Qwen 32B, and even Llama 3.1 70B at Q2_K entirely on-card.
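The fits-or-OOMs arithmetic above can be sketched as a rough estimator. The 4.8 effective bits per weight for Q4_K_M, the ~1.5 GB runtime overhead, and the Qwen2.5-32B-like shape (64 layers, GQA with 8 KV heads, head_dim 128) are illustrative assumptions, not exact figures:

```python
# Rough single-GPU fit check: quantized weights + FP16 KV cache + overhead.
def fits(vram_gb, params_b, bpw, layers, kv_heads, head_dim, ctx_tokens):
    weights_gb = params_b * bpw / 8  # params in billions * bits per weight / 8
    # K and V tensors, 2 bytes each (FP16), per layer per KV head per token
    kv_gb = 2 * layers * kv_heads * head_dim * 2 * ctx_tokens / 1e9
    need_gb = weights_gb + kv_gb + 1.5  # assume ~1.5 GB runtime overhead
    return need_gb, need_gb <= vram_gb

# 32B model at Q4_K_M (~4.8 effective bits/weight), 32K context
for card, vram in [("RTX 5080", 16), ("RTX 5090", 32)]:
    need, ok = fits(vram, 32, 4.8, 64, 8, 128, 32_768)
    print(f"{card} ({vram} GB): need ~{need:.1f} GB -> {'fits' if ok else 'OOM'}")
```

This reproduces the article's numbers: ~19 GB of weights plus ~8.6 GB of FP16 KV cache at 32K context lands near 29 GB, over the 5080's 16 GB and comfortably inside the 5090's 32 GB.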
If local inference is your primary workload, stop reading and buy the 5090. For more detail on exact tok/s by quantization, see our RTX 5090 vs M4 Max for AI and How to run Qwen 3 32B on RTX 5090 guides.
Power, thermals, and price-per-frame
The 5090 draws 575 W TDP vs 360 W for the 5080 — a 60% power delta for a ~45–55% average 4K performance lift. On TDP-normalized numbers, perf-per-watt is roughly a wash at 1440p (where the 5090's lead runs 40–110% depending on the engine), slips behind at 4K native Ultra (~50% more frames for 60% more power), and falls clearly behind at 4K RT + DLSS, where the lead compresses to 25–35%. This is one of the few generations where the ×080 card matches or beats the ×090 on perf-per-watt at common settings.
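The perf-per-watt claim is easy to check from the head-to-head rows, with the caveat that TDP is a proxy: measured board power, which often sits below 575 W on the 5090 in games, would narrow the gaps.

```python
# TDP-normalized frames per watt from head-to-head rows in the table above.
TDP = {"RTX 5080": 360, "RTX 5090": 575}  # watts
rows = {
    "Cyberpunk 2077 1440p Ultra": {"RTX 5080": 131, "RTX 5090": 190},
    "Wukong 1440p Ultra":         {"RTX 5080": 62,  "RTX 5090": 130},
    "Alan Wake 2 4K Ultra":       {"RTX 5080": 49,  "RTX 5090": 75},
}
for title, fps in rows.items():
    line = ", ".join(f"{c}: {fps[c] / TDP[c]:.3f} FPS/W" for c in TDP)
    print(f"{title} -> {line}")
```

By TDP the 5080 wins Cyberpunk 1440p (~0.364 vs ~0.330 FPS/W) while the 5090 wins the shader-bound Wukong 1440p case (~0.226 vs ~0.172), hence "roughly a wash at 1440p, engine-dependent."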
Price-per-frame (using 4K Cyberpunk Ultra Native as the reference): the 5080 at $999 MSRP and ~85 FPS works out to $11.75/FPS; the 5090 at $1,999 MSRP and ~135 FPS to $14.81/FPS. The 5080 is the clear winner on paper — but MSRP is theoretical. In practice, 5090 board-partner cards are street-pricing at $3,800–$4,900 (a roughly 2× markup on MSRP driven by sustained AI-training demand), while 5080 partner cards sit at $1,399–$1,499. At street prices the gap widens to roughly $16.50/FPS for the 5080 vs $28.90/FPS for the 5090. This is the single biggest reason not to buy a 5090 for gaming in Q2 2026: the 5080 delivers roughly two-thirds of the 5090's 4K performance at ~36% of its street price.
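The dollars-per-frame arithmetic is worth re-running whenever street prices move; a sketch using the reference FPS and the April 2026 prices quoted above:

```python
# Dollars per frame at MSRP vs street pricing, using the 4K Cyberpunk
# Ultra Native reference FPS from the text.
fps = {"RTX 5080": 85, "RTX 5090": 135}
msrp = {"RTX 5080": 999, "RTX 5090": 1_999}
street = {"RTX 5080": 1_399, "RTX 5090": 3_899}  # April 2026 snapshot

for label, price in [("MSRP", msrp), ("street", street)]:
    for card in fps:
        print(f"{card} ({label}): ${price[card] / fps[card]:.2f}/FPS")
```

At MSRP the gap is $11.75 vs $14.81 per frame; at street it widens to roughly $16.46 vs $28.88.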
PSU sizing: NVIDIA's reference recommendation is 1000 W for the 5090 and 850 W for the 5080. Do not undersize — the 5090's transient power spikes can hit 700+ W for milliseconds, and a 750 W unit will trip its over-current protection. Both cards use the 12V-2×6 connector; ATX 3.1 PSUs handle it natively, and older ATX 3.0 units with 12VHPWR work fine with the included adapter — just seat the cable fully, because the melted-connector saga has not ended with Blackwell.
The five picks
🏆 Best Overall: MSI GeForce RTX 5090 Gaming Trio OC
• 32 GB GDDR7 • 512-bit bus • 575 W TDP • Triple-slot, triple-fan • 4.8★ (80 reviews)
✅ Pros
- Delivers the full RTX 5090 experience — 32 GB GDDR7, 512-bit bus, reference-spec clocks — with MSI's mature Tri Frozr 3 cooler that keeps the GB202 silicon under 72 °C in sustained 4K gaming per NotebookCheck
- Highest verified owner rating (4.8★, 80 reviews) of any RTX 5090 card currently on Amazon
- MSI's Afterburner ecosystem lets you run a +100 MHz core / +1000 MHz memory OC that adds another 4–6% without touching voltage
❌ Cons
- Triple-slot height rules out most SFF and micro-ATX builds — measure your case before ordering
- $3,900 street price is roughly 2× MSRP; if the 5080 Super launches this year, expect a 10–15% price cut on used units
The Gaming Trio OC is our pick for the buyer who wants a 5090 for everything — 4K AAA gaming, Unreal Engine 5.5 editor work, local Llama 3.1 70B Q4 inference, and 8K video editing. In our database it drives Black Myth: Wukong at 130 FPS 1440p Ultra native (Gamers Nexus) and Cyberpunk 2077 at 190 FPS 1440p Ultra native (KitGuru). Against the 5080 those are +110% and +45% respectively. The cooler is the quietest of the MSI lineup (Suprim X is marginally quieter but $1,000 more). If you have the budget and the case clearance, this is the card to buy.
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
💰 Best Value: ASUS TUF Gaming GeForce RTX 5080 OC Edition
• 16 GB GDDR7 • 256-bit bus • 360 W TDP • Military-grade caps • 4.7★ (133 reviews)
✅ Pros
- $1,399 is the sanest price for a brand-new RTX 5080 partner card — below MSI Ventus 3X (+$230) and MSI Gaming Trio (+$459) for essentially identical silicon
- ASUS TUF's 3-year warranty and military-grade capacitor spec have historically made it the most reliable mid-tier option (ASUS's own RMA data shows <1% failure rate at 24 months)
- Axial-Tech triple-fan cooler holds GPU at 65–68 °C in sustained 4K RT + DLSS workloads — well under the 83 °C Tj Max
❌ Cons
- 16 GB VRAM is the single biggest knock on any RTX 5080 — it's enough for 4K gaming in 2026 but will age faster than the 5090's 32 GB, especially for VRAM-hungry modded workloads (4K texture packs, large ComfyUI diffusion flows)
- Rasterization at 4K native Ultra sits around 49 FPS in Alan Wake 2 (KitGuru) — playable with DLSS, not without it
The TUF OC is our pick for the value-conscious enthusiast — someone who games at 1440p 240 Hz or 4K 120 Hz with DLSS, doesn't run 32B+ local models, and wants a card that will hold up for three to four years. In our benchmark set it drives Cyberpunk 2077 at 131 FPS 1440p Ultra native and Alan Wake 2 at 87 FPS 1440p Ultra native (both KitGuru) — more than enough for a 1440p 240 Hz OLED. The 5090 is 45–55% faster at 4K, but the 5080 costs only ~36% of the 5090's street price.
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
⚡ Best 4K Ultra: GIGABYTE RTX 5090 WINDFORCE OC 32G
• 32 GB GDDR7 • 512-bit bus • 575 W TDP • 4-slot WINDFORCE cooler • 4.8★ (18 reviews)
✅ Pros
- At $3,879 street, this is the cheapest way into a reference-spec RTX 5090 with 32 GB GDDR7 and full 512-bit bus — $20 below the MSI Gaming Trio OC
- WINDFORCE 4-fan cooler is over-engineered for the 575 W TDP; GIGABYTE's own testing shows 65 °C under sustained 4K load, lowest in its price tier
- 4.8★ average across 18 early-owner reviews — tied with the MSI Gaming Trio OC for highest-rated 5090 card
❌ Cons
- 4-slot physical footprint is the widest in the 5090 lineup — rules out mid-tower cases with ≤3.5-slot clearance
- GIGABYTE's RGB Fusion software is the weakest of the major AIBs; tune via native driver panel instead
Pick this card if your top priority is 4K max settings gaming with ray tracing, and you want to spend the minimum on a true 5090. It delivers the same silicon as the MSI Gaming Trio — same 32 GB GDDR7, same 512-bit bus, same 21,760 CUDA cores — at $20 less street. The WINDFORCE cooler keeps the GB202 quieter than the MSI Ventus 3X and roughly tied with the Gaming Trio. In our benchmark database it hits 75 FPS in Alan Wake 2 4K Ultra native (KitGuru) and 86 FPS in Black Myth: Wukong 4K Ultra (Gamers Nexus). Those are the highest single-card 4K rasterization numbers in our entire database as of April 2026.
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
🎯 Best 1440p High-Refresh: GIGABYTE RTX 5080 Gaming OC 16G
• 16 GB GDDR7 • 256-bit bus • 360 W TDP • Dual-slot, triple-fan • 4.4★ (247 reviews)
✅ Pros
- Highest review volume (247) of any RTX 5080 card on Amazon — the community-verified pick
- Dual-slot form factor fits every mid-tower and most mini-ITX cases — the 5080 with the most physical compatibility
- 190 FPS in Cyberpunk 2077 1440p Ultra native (KitGuru) — more than enough for a 240 Hz 1440p OLED without DLSS
❌ Cons
- 4.4★ rating is a half-star below the TUF OC (4.7★); QC variance in early batches led to two reports of fan noise under sustained load
- Factory OC is modest (+45 MHz over reference) — if you want headroom for +100 MHz manual tuning, GIGABYTE's Aorus Master is the better pick at +$200
The Gaming OC is our pick for the 1440p 240 Hz gamer — someone with a modern 1440p OLED or high-refresh IPS who wants to drive it without DLSS. At this resolution the 5080 is 40–50% faster than the 4080 Super and delivers roughly 50–70% of the 5090's frame rate depending on the engine — but the 5090 rarely pushes the 1440p experience meaningfully further, because the CPU and panel refresh, not the GPU, increasingly become the bottleneck. Spend the saved ~$2,400 on a Samsung OLED G80SD or an AMD X3D CPU upgrade. In our database this card drives Alan Wake 2 at 87 FPS 1440p Ultra native, Black Myth: Wukong at 62 FPS 1440p Ultra native, and Final Fantasy XVI at 66 FPS 1440p Ultra (all KitGuru).
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
🧪 Best for Local AI: ASUS ROG Astral RTX 5090
• 32 GB GDDR7 • 512-bit bus • 575 W TDP • Quad-fan with BTF front cooling • 4.2★ (7 reviews)
✅ Pros
- Quad-fan "BTF" cooler with a front-facing intake delivers the best sustained thermals in the 5090 lineup — reviewers report <68 °C under sustained llama.cpp Vulkan inference at full 575 W
- ASUS's premium 3-year warranty does not exclude sustained 24/7 compute use — meaningful for AI/ML buyers who run round-the-clock inference workloads
- OLED side display and Astral's 4-fan coverage shave ~3 dBA off vs the MSI Suprim SOC in sustained-load comparisons
❌ Cons
- Lowest review count in our 5090 lineup (7) — less community telemetry on long-term reliability
- Premium $3,899+ price puts it in the same bracket as the MSI Suprim SOC without a comparable cooler advantage
For the buyer whose primary workload is local LLM inference, the ROG Astral's cooler pays off where it matters: sustained multi-hour inference runs. The 5090 can run DeepSeek-R1 32B at Q4_K_M entirely on-card with ~22 GB of VRAM used, leaving ~10 GB for KV cache at 32K context. Our ai_benchmarks row for Llama 2 7B Q4_0 on the RTX 5090 via llama.cpp Vulkan hits 263.6 tok/s (source), and a Qwen2.5-Coder-7B FP16 batched inference pass on vLLM hits 5,841 tok/s (Runpod). The 5080's 16 GB forces CPU offload for anything much past 14B at Q4, with the tok/s penalties that entails. If you're choosing between the 5090 and a $10,000 RTX 6000 Ada for local inference, the Astral is the sane choice — two 5090s in one system give you 64 GB of VRAM (via tensor parallelism in vLLM) for ~$8,000.
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
What to look for in an RTX 5090 or 5080 in 2026
PSU and cable compliance
Both cards ship with the 12V-2×6 connector. If your PSU is ATX 3.1 you're set. If it's ATX 3.0 with the older 12VHPWR, use the supplied adapter and seat it all the way home — the melted-connector issue that dogged the 4090 is still occurring on Blackwell cards according to reports on r/nvidia and buildapc. Seasonic, Corsair, and be quiet! have native 12V-2×6 cables for most ATX 3.1 units; spend the $25 on the native cable and skip the adapter.
Case clearance (this is where most buyers get burned)
The 5090 board partner cards are 355–360 mm long and 3.5–4 slots wide. Verify your case supports both dimensions before ordering. A Fractal Torrent or Lian Li O11 Dynamic EVO XL handles either card fine; a mid-tower like the NZXT H5 Flow will not fit most 4-slot 5090s even if the length clears. The 5080 cards are more forgiving — most 5080s fit in any mid-tower.
VRAM and future-proofing
This is the 5090-vs-5080 crux. Through 2028, 16 GB will be fine for 4K gaming, adequate for most 7B-14B local LLMs at Q4, and marginal for 4K texture-modded workloads or 32B LLMs. Through 2030, 16 GB will start to age; 32 GB gives you a much bigger window. If you intend to keep the card for 4+ years, the 5090's VRAM premium is justified. For a 2-year upgrade cadence, the 5080 is the smarter pick.
DLSS 4 vs native — adjust your expectations
Both cards support DLSS 4 Multi Frame Generation. At 4K native Ultra the 5090 is meaningfully faster; with DLSS 4 MFG enabled, both cards hit 120+ FPS in most AAA titles and the perceived gap shrinks. If you're OK using DLSS 4 as the default, the 5080 is sufficient for 4K. If you prefer native rendering or are sensitive to generated-frame latency, the 5090's raw rasterization headroom matters more.
AIB markup reality check
NVIDIA MSRPs are $999 and $1,999. Street prices as of April 2026 are ~$1,400 and ~$3,900 — a ~40% markup on the 5080 and ~95% on the 5090. If AIB markup comes down in the back half of 2026 (historically happens once AI-training demand stabilizes), the price/performance calculus shifts back toward the 5090. As of today, it doesn't.
Secondary market and warranty transfer
Both MSI and ASUS transfer the full warranty to second owners if the original purchaser has proof of purchase. The gap left by EVGA's exit from GPUs has never been filled — NVIDIA's Founders Editions and ASUS's ROG tier are the closest equivalents for transferable warranty plus premium build. GIGABYTE warranties the original buyer only, which matters if you're buying used.
FAQ
Which GPU is better for 4K gaming — RTX 5090 or RTX 5080?
The RTX 5090 is 45–55% faster than the RTX 5080 at 4K Ultra native in our benchmark database — not the "25%" often quoted. In Alan Wake 2 4K Ultra native the 5090 hits 75 FPS vs the 5080's 49 FPS (+53%, KitGuru). In Black Myth: Wukong 4K Ultra native it's 86 FPS vs 58 FPS (+48%, Gamers Nexus). If you want reliable 4K native rendering, the 5090 is the only answer. If you're willing to use DLSS 4, the 5080 handles 4K just fine.
How much more power does the RTX 5090 draw vs the RTX 5080?
The RTX 5090's TDP is 575 W vs the RTX 5080's 360 W — a 60% delta. NVIDIA recommends a 1000 W PSU for the 5090 and 850 W for the 5080. Transient spikes on the 5090 can briefly hit 700+ W, so don't undersize — a 750 W PSU will trip OCP. Both cards use the 12V-2×6 connector.
Is the RTX 5090 worth $1,000 more than the RTX 5080?
At MSRP ($1,999 vs $999), yes — for 4K gamers, local AI workloads, or 8K content creators. At street prices ($3,900 vs $1,400), it's much harder to justify — you're paying 2.8× the money for 55% more performance. For most buyers in Q2 2026, the 5080 is the better purchase and you spend the savings on an OLED monitor, an X3D CPU upgrade, or a faster NVMe.
Can the RTX 5080 run 32B-parameter local LLMs?
Not at Q4_K_M without CPU offload. A Qwen 32B or DeepSeek-R1 32B at Q4_K_M weighs ~19 GB; the 5080's 16 GB VRAM can't hold it. You'd need to drop to Q3 or Q2 (~13 GB weights) to fit with context, at a ~12–20% quality loss per community HumanEval runs. The RTX 5090's 32 GB VRAM runs 32B Q4_K_M comfortably and fits 70B Q2_K with room for 8K context.
Should I wait for an RTX 5080 Super?
Historically NVIDIA releases "Super" refreshes 12–14 months after the base card. The 5080 launched in Q1 2025; a 5080 Super is plausible mid-to-late 2026. Leaked specs suggest a 20 GB VRAM bump and a 5–10% performance lift. If you can wait 4–6 months, waiting is rational. If you need a card now, buy a 5080 and resell if the Super drops — 5080 residual values have held ~80% MSRP so far.
Sources
- Gamers Nexus — RTX 5090 Review — https://gamersnexus.net/megacharts/gpus
- Tom's Hardware — GPU Hierarchy 2026 — https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html
- KitGuru — RTX 5080 Review and RTX 5090 Review — https://www.kitguru.net/category/components/graphic-cards/
- TechPowerUp — RTX 5090 GPU Database — https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
- llama.cpp GitHub Discussion #10879 — Vulkan backend performance on RTX 5090 — https://github.com/ggerganov/llama.cpp/discussions/10879
Related guides
- RTX 5090 vs RTX 4090 — Is the Blackwell upgrade worth it?
- RTX 5080 vs RTX 4080 SUPER — one generation apart
- RTX 5090 vs Mac Studio M4 Max for AI — which wins in 2026?
- How to run Qwen 3 32B on NVIDIA GeForce RTX 5090
— SpecPicks Editorial · Last verified Apr 24, 2026
