As an Amazon Associate, SpecPicks earns from qualifying purchases. See our review methodology.
RTX 5090 vs RTX 5080: Which Should You Buy in 2026?
By SpecPicks Editorial · Published Apr 24, 2026 · Last verified Apr 24, 2026 · 11 min read
The RTX 5090 vs RTX 5080 decision in 2026 comes down to three numbers: the 5090 costs 2× the 5080 ($1,999 MSRP vs $999), has 2× the VRAM (32 GB vs 16 GB GDDR7), and draws 60% more power (575 W vs 360 W). In pure rasterization at 4K the 5090 is roughly 45–55% faster than the 5080 in our database — not the "25%" some sites quote — and the gap stretches even further in shader-bound engines at 1440p and in local-LLM workloads, where VRAM capacity and bandwidth dominate. If you game at 1440p, the 5080 is the obvious buy. If you run 32B-parameter local models, render 8K video, or need headroom for Unreal Engine 5.5 Nanite + Lumen at 4K, the 5090 is the only card worth owning.
This guide is for the buyer who has already decided they want a current-generation NVIDIA flagship and is trying to pick between the two. It is not for someone asking "should I upgrade from a 4080 Super?" (for that, read our RTX 5080 vs RTX 4080 Super comparison), and it is not for buyers on a budget — both of these cards are luxury products, and neither is the price/performance winner of the Blackwell lineup.
What you will find below: a full spec delta table, five use-case recommendations with real board partner SKUs (three 5090 picks, two 5080 picks), benchmark pulls from Gamers Nexus / Tom's Hardware / KitGuru / TechPowerUp / LocalLLaMA, a power-and-price-per-frame breakdown, an AI inference comparison spanning 0.6B through 235B parameters, and an FAQ covering the most common buying questions. By the bottom of this article you will know which one to buy — and, just as importantly, when the answer is "neither, wait for a 5080 Super or step down to a 5070 Ti."
How we tested — and where the numbers come from
Every FPS number in this article is traceable to a specific row in the SpecPicks benchmark database, which aggregates published results from Gamers Nexus, Tom's Hardware, TechPowerUp, KitGuru, TechSpot, NotebookCheck, and Overclocking.com. AI inference numbers come from LocalLLaMA-aggregated results, the official llama.cpp GitHub benchmark threads, LocalScore, Runpod's vLLM suite, and Windows Central's Ollama tests. We pull real numbers — if we only have KitGuru data for a given game, we cite KitGuru; we don't "average" across disparate test benches, because the rigs, driver versions, and scene seeds differ.
Comparison at a glance
| Pick | Best For | Key Spec | Price Range | Verdict |
|---|---|---|---|---|
| 🏆 MSI RTX 5090 Gaming Trio OC | Best Overall | 32 GB GDDR7, 575 W | $3,899 | Unmatched 4K + AI headroom |
| 💰 ASUS TUF RTX 5080 OC | Best Value | 16 GB GDDR7, 360 W | $1,399 | Full reference-spec 5080 perf at the sanest price |
| ⚡ GIGABYTE RTX 5090 WINDFORCE OC | Best 4K Ultra | 32 GB GDDR7, 575 W | $3,879 | The cheapest way into a reference-spec 5090 |
| 🎯 GIGABYTE RTX 5080 Gaming OC | Best 1440p Hi-Refresh | 16 GB GDDR7, 360 W | $1,499 | 190 FPS Cyberpunk 1440p Ultra |
| 🧪 ASUS ROG Astral RTX 5090 | Best for Local AI | 32 GB GDDR7, 575 W | $3,899 | Fits 32B Q4 and 70B Q2 on a single card |
Full spec delta — RTX 5090 vs RTX 5080
| Spec | RTX 5090 | RTX 5080 | Delta |
|---|---|---|---|
| Architecture | Blackwell (GB202) | Blackwell (GB203) | Same gen |
| CUDA cores | 21,760 | 10,752 | 5090 +102% |
| Boost clock | 2,407 MHz | 2,617 MHz | 5080 +8.7% |
| VRAM | 32 GB GDDR7 | 16 GB GDDR7 | 5090 2× |
| Memory bus | 512-bit | 256-bit | 5090 2× |
| TDP | 575 W | 360 W | 5080 -37% |
| PCIe | PCIe 5.0 x16 | PCIe 5.0 x16 | Same |
| MSRP | $1,999 | $999 | 5080 1/2 price |
| Recommended PSU | 1000 W | 850 W | — |
| Power connector | 12V-2×6 (600 W) | 12V-2×6 (600 W) | Same |
Two things jump out. First, the CUDA-core doubling is the largest gap between an ×080 and ×090 card of the same generation NVIDIA has ever shipped — the 4090 had only ~68% more cores than the 4080 (16,384 vs 9,728). Second, the 5080 actually clocks higher out of the box, which is why the gap in lightly threaded compute (some synthetic tests) closes to 10–15% even though the raw shader count is 2×. Bandwidth and VRAM capacity matter more than clock speed once you push real 4K or local-LLM workloads.
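The deltas in the spec table are plain ratios you can recompute yourself; a quick sketch using the raw numbers above:

```python
# Reproduce the "Delta" column of the spec table from the raw numbers,
# so readers can sanity-check the percentage claims themselves.
specs = {
    "cuda_cores": (21_760, 10_752),  # (RTX 5090, RTX 5080)
    "boost_mhz":  (2_407, 2_617),
    "vram_gb":    (32, 16),
    "bus_bits":   (512, 256),
    "tdp_w":      (575, 360),
}

for name, (gb202, gb203) in specs.items():
    # positive => 5090 leads; boost clock comes out negative,
    # since the 5080 ships with the higher out-of-box clock
    delta = (gb202 / gb203 - 1) * 100
    print(f"{name:10s} 5090 vs 5080: {delta:+.1f}%")
```

CUDA cores land at +102.4%, TDP at +59.7% (the "60% more power" quoted throughout), and boost clock at -8.0% from the 5090's side, which the table frames as "5080 +8.7%" from the other direction.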
Per TechPowerUp, PassMark's G3D Mark gives the 5090 38,935 points vs the 5080's 35,697 — a slim 9% gap. That synthetic result systematically undersells the 5090 versus real-world 4K gaming and AI inference because PassMark doesn't stress memory bandwidth the way a modern AAA title at 4K RT does. Don't pick a GPU based on G3D Mark.
Gaming benchmark delta — where the 5090 actually earns its $1,000 premium
The SpecPicks database has 39 benchmark rows for these two cards across AAA titles. Here are the ones where we have head-to-head numbers at the same resolution and preset:
| Game | Resolution / Preset | RTX 5080 | RTX 5090 | 5090 Lead | Source |
|---|---|---|---|---|---|
| Alan Wake 2 | 1440p Ultra Native | 87 FPS | 129 FPS | +48% | KitGuru |
| Alan Wake 2 | 4K Ultra Native | 49 FPS | 75 FPS | +53% | KitGuru |
| Cyberpunk 2077 | 1440p Ultra Native | 131 FPS | 190 FPS | +45% | KitGuru |
| Cyberpunk 2077 | 4K RT Overdrive + DLSS Q | 115 FPS | 59 FPS* | — | Tom's Hardware |
| Cyberpunk 2077 | 4K RT Ultra + DLSS Q | 50 FPS | — | — | TechSpot |
| Black Myth: Wukong | 1440p Ultra Native | 62 FPS | 130 FPS | +110% | KitGuru / GN |
| Black Myth: Wukong | 4K Ultra Native | 58 FPS | 86 FPS | +48% | GN |
| Black Myth: Wukong | 4K RT Ultra + DLSS Q | 36 FPS | — | — | KitGuru |
| Final Fantasy XVI | 1440p Ultra | 66 FPS | 92 FPS | +39% | KitGuru |
*The 5090's Cyberpunk 4K RT Overdrive result (59 FPS) looks low next to the 5080's 115 FPS at the nominally identical preset because the two rows come from different test runs with different upscaler configurations — they are not directly comparable. When both cards run the same settings, the 5090 is consistently 45–70% faster in rasterization. The Black Myth: Wukong 1440p native gap (+110%) is the most extreme we've seen — Wukong's Nanite-heavy geometry scales almost linearly with shader count, so the 5090's 2× CUDA cores pay off directly.
The short version: at 4K Ultra native rasterization, expect the 5090 to be 45–55% faster than the 5080. At 1440p native rasterization, expect 40–110% depending on the engine. With ray tracing and DLSS enabled, the gap compresses to 25–35%: upscaling drops the internal render resolution, so the 5090's extra shaders and memory bandwidth stop being the deciding factor.
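As a sanity check on those ranges, the head-to-head rows from the table above can be summarized with a geometric mean of the FPS ratios, the usual way to aggregate performance ratios across titles:

```python
# Aggregate the head-to-head FPS pairs from the table above with a
# geometric mean per resolution. Pairs are (RTX 5080 FPS, RTX 5090 FPS).
from statistics import geometric_mean

native_4k = [(49, 75), (58, 86)]                         # Alan Wake 2, Wukong
native_1440 = [(87, 129), (131, 190), (62, 130), (66, 92)]  # AW2, CP2077, Wukong, FFXVI

for label, rows in [("4K Ultra native", native_4k),
                    ("1440p Ultra native", native_1440)]:
    gm = geometric_mean([fast / slow for slow, fast in rows])
    print(f"{label}: 5090 average lead +{(gm - 1) * 100:.0f}%")
```

This lands at roughly +51% for the two 4K native rows and +58% for the four 1440p rows — consistent with the 45–55% and 40–110% ranges quoted above once the Wukong outlier is included.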
For our full per-title breakdown, see the RTX 5090 benchmark page and RTX 5080 benchmark page.
AI inference — the 5090's real knockout punch
If you run local LLMs, the head-to-head isn't close. The 5090's 32 GB of VRAM is the dividing line between "can this card run a 32B-parameter model at Q4 without offload" and "this card has to spill to system RAM and crawl." Our AI benchmarks database (17 rows across these two cards) tells a clear story:
| Model | Quant | Runtime | RTX 5080 (tok/s) | RTX 5090 (tok/s) | Source |
|---|---|---|---|---|---|
| qwen3:0.6B | — | Ollama | 47.1 | 47.1 | LocalLLaMA |
| Llama 2 7B | Q4_0 | llama.cpp (Vulkan) | — | 263.6 | llama.cpp GitHub |
| Qwen2.5-Coder-7B | FP16 | vLLM (batched) | — | 5,841 | Runpod |
| Llama 3.1 8B Instruct | Q4_K_M | llama.cpp | 44.9 | — | LocalScore |
| Llama 3.2 Vision 11B | Q4_K_M | Ollama | 120.0 | — | Windows Central |
| gemma3:12b | Q4_K_M | Ollama | 71.0 | — | Windows Central |
| deepseek-r1:14b | Q4_K_M | Ollama | 70.0 | — | Windows Central |
| Qwen2.5 14B | Q4_K_M | llama.cpp | 25.5 | — | LocalScore |
| gpt-oss:20B | MXFP4 | Ollama | 128.0 | — | Windows Central |
| Qwen3 97B | Q5 (offload) | llama.cpp | 29.0 | — | LocalLLaMA |
| Qwen3 235B | Q8 (offload) | llama.cpp | 50.0 | — | LocalLLaMA |
A few things to note. The single head-to-head row where we have both cards at the same model/quant (qwen3:0.6B on Ollama, 47.1 tok/s on both) involves a model so small that it's bottlenecked by Ollama's per-request overhead rather than GPU compute — that's a floor, not a real benchmark. The more instructive data point is the 5090's 263.6 tok/s on Llama 2 7B Q4_0 via llama.cpp's Vulkan backend (source), roughly 5.9× the 5080's 44.9 tok/s result on Llama 3.1 8B — different models and runtimes, so treat the multiple as directional rather than exact, but the bandwidth-driven gap is real.
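Why is the 263.6 tok/s figure plausible? Single-stream decode is close to memory-bandwidth-bound: each generated token must stream the whole quantized model out of VRAM, so bandwidth divided by model size gives a hard ceiling. A sketch using TechPowerUp's published bandwidth specs; the ~4 GB footprint for a 7B Q4 model is an assumed round number:

```python
# Back-of-envelope decode ceiling: memory bandwidth / model size in bytes.
# Real single-stream numbers land below this due to KV-cache reads and
# kernel overhead; batched vLLM numbers can exceed it via compute reuse.
cards = {"RTX 5090": 1792, "RTX 5080": 960}  # GB/s, published specs
model_gb = 4.0                               # ~7B model at Q4 (~0.57 bytes/param)

for name, bandwidth in cards.items():
    ceiling = bandwidth / model_gb
    print(f"{name}: <= ~{ceiling:.0f} tok/s ceiling for a ~{model_gb:.0f} GB model")
```

The measured 263.6 tok/s sits at roughly 60% of the 5090's ~448 tok/s ceiling, a believable efficiency for the Vulkan backend.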
The 5080's 16 GB VRAM ceiling is the actual problem. A Qwen2.5 14B Q4_K_M model weighs about 9 GB, so it fits on the 5080 comfortably at small context — but once you push to 32K context, KV cache eats another ~4 GB and you start OOMing. A 32B Q4_K_M model (weights ~19 GB) will not fit on a 5080 at all without CPU offload, and once you offload, tok/s drops from ~30 to ~5. The 5090's 32 GB lets you run DeepSeek-R1 32B, Qwen 32B, and even Llama 3.1 70B at Q2_K entirely on-card.
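The fits-or-OOMs arithmetic above can be sketched as a rough estimator. The 4.8 effective bits per weight for Q4_K_M, the ~1.5 GB runtime overhead, and the Qwen2.5-32B-like shape (64 layers, GQA with 8 KV heads, head_dim 128) are illustrative assumptions, not exact figures:

```python
# Rough single-GPU fit check: quantized weights + FP16 KV cache + overhead.
def fits(vram_gb, params_b, bpw, layers, kv_heads, head_dim, ctx_tokens):
    weights_gb = params_b * bpw / 8  # params in billions * bits per weight / 8
    # K and V tensors, 2 bytes each (FP16), per layer per KV head per token
    kv_gb = 2 * layers * kv_heads * head_dim * 2 * ctx_tokens / 1e9
    need_gb = weights_gb + kv_gb + 1.5  # assume ~1.5 GB runtime overhead
    return need_gb, need_gb <= vram_gb

# 32B model at Q4_K_M (~4.8 effective bits/weight), 32K context
for card, vram in [("RTX 5080", 16), ("RTX 5090", 32)]:
    need, ok = fits(vram, 32, 4.8, 64, 8, 128, 32_768)
    print(f"{card} ({vram} GB): need ~{need:.1f} GB -> {'fits' if ok else 'OOM'}")
```

This reproduces the article's numbers: ~19 GB of weights plus ~8.6 GB of FP16 KV cache at 32K context lands near 29 GB, over the 5080's 16 GB and comfortably inside the 5090's 32 GB.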
If local inference is your primary workload, stop reading and buy the 5090. For more detail on exact tok/s by quantization, see our RTX 5090 vs M4 Max for AI and How to run Qwen 3 32B on RTX 5090 guides.
Power, thermals, and price-per-frame
The 5090 draws 575 W TDP vs 360 W for the 5080 — a 60% power delta for a ~45–55% average 4K performance lift. On TDP-normalized numbers, perf-per-watt is roughly a wash at 1440p (where the 5090's lead runs 40–110% depending on the engine), slips behind at 4K native Ultra (~50% more frames for 60% more power), and falls clearly behind at 4K RT + DLSS, where the lead compresses to 25–35%. This is one of the few generations where the ×080 card matches or beats the ×090 on perf-per-watt at common settings.
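The perf-per-watt claim is easy to check from the head-to-head rows, with the caveat that TDP is a proxy: measured board power, which often sits below 575 W on the 5090 in games, would narrow the gaps.

```python
# TDP-normalized frames per watt from head-to-head rows in the table above.
TDP = {"RTX 5080": 360, "RTX 5090": 575}  # watts
rows = {
    "Cyberpunk 2077 1440p Ultra": {"RTX 5080": 131, "RTX 5090": 190},
    "Wukong 1440p Ultra":         {"RTX 5080": 62,  "RTX 5090": 130},
    "Alan Wake 2 4K Ultra":       {"RTX 5080": 49,  "RTX 5090": 75},
}
for title, fps in rows.items():
    line = ", ".join(f"{c}: {fps[c] / TDP[c]:.3f} FPS/W" for c in TDP)
    print(f"{title} -> {line}")
```

By TDP the 5080 wins Cyberpunk 1440p (~0.364 vs ~0.330 FPS/W) while the 5090 wins the shader-bound Wukong 1440p case (~0.226 vs ~0.172), hence "roughly a wash at 1440p, engine-dependent."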
Price-per-frame (using 4K Cyberpunk Ultra Native as the reference): the 5080 at $999 MSRP and ~85 FPS works out to $11.75/FPS; the 5090 at $1,999 MSRP and ~135 FPS to $14.81/FPS. The 5080 is the clear winner on paper — but MSRP is theoretical. In practice, 5090 board-partner cards are street-pricing at $3,800–$4,900 (a roughly 2× markup on MSRP driven by sustained AI-training demand), while 5080 partner cards sit at $1,399–$1,499. At street prices the gap widens to roughly $16.50/FPS for the 5080 vs $28.90/FPS for the 5090. This is the single biggest reason not to buy a 5090 for gaming in Q2 2026: the 5080 delivers roughly two-thirds of the 5090's 4K performance at ~36% of its street price.
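The dollars-per-frame arithmetic is worth re-running whenever street prices move; a sketch using the reference FPS and the April 2026 prices quoted above:

```python
# Dollars per frame at MSRP vs street pricing, using the 4K Cyberpunk
# Ultra Native reference FPS from the text.
fps = {"RTX 5080": 85, "RTX 5090": 135}
msrp = {"RTX 5080": 999, "RTX 5090": 1_999}
street = {"RTX 5080": 1_399, "RTX 5090": 3_899}  # April 2026 snapshot

for label, price in [("MSRP", msrp), ("street", street)]:
    for card in fps:
        print(f"{card} ({label}): ${price[card] / fps[card]:.2f}/FPS")
```

At MSRP the gap is $11.75 vs $14.81 per frame; at street it widens to roughly $16.46 vs $28.88.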
PSU sizing: NVIDIA's reference recommendation is 1000 W for the 5090 and 850 W for the 5080. Do not undersize — the 5090's transient power spikes can hit 700+ W for milliseconds, and a 750 W unit will trip its over-current protection. Both cards use the 12V-2×6 connector; ATX 3.1 PSUs handle it natively, and older ATX 3.0 units with 12VHPWR work fine with the included adapter — just seat the cable fully, because the melted-connector saga has not ended with Blackwell.
The five picks
🏆 Best Overall: MSI GeForce RTX 5090 Gaming Trio OC
• 32 GB GDDR7 • 512-bit bus • 575 W TDP • Triple-slot, triple-fan • 4.8★ (80 reviews)
✅ Pros
- Delivers the full RTX 5090 experience — 32 GB GDDR7, 512-bit bus, reference-spec clocks — with MSI's mature Tri Frozr 3 cooler that keeps the GB202 silicon under 72 °C in sustained 4K gaming per NotebookCheck
- Highest verified owner rating (4.8★, 80 reviews) of any RTX 5090 card currently on Amazon
- MSI's Afterburner ecosystem lets you run a +100 MHz core / +1000 MHz memory OC that adds another 4–6% without touching voltage
❌ Cons
- Triple-slot height rules out most SFF and micro-ATX builds — measure your case before ordering
- $3,900 street price is roughly 2× MSRP; if the 5080 Super launches this year, expect a 10–15% price cut on used units
The Gaming Trio OC is our pick for the buyer who wants a 5090 for everything — 4K AAA gaming, Unreal Engine 5.5 editor work, local Llama 3.1 70B Q4 inference, and 8K video editing. In our database it drives Black Myth: Wukong at 130 FPS 1440p Ultra native (Gamers Nexus) and Cyberpunk 2077 at 190 FPS 1440p Ultra native (KitGuru). Against the 5080 those are +110% and +45% respectively. The cooler is the quietest of the MSI lineup (Suprim X is marginally quieter but $1,000 more). If you have the budget and the case clearance, this is the card to buy.
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
💰 Best Value: ASUS TUF Gaming GeForce RTX 5080 OC Edition
• 16 GB GDDR7 • 256-bit bus • 360 W TDP • Military-grade caps • 4.7★ (133 reviews)
✅ Pros
- $1,399 is the sanest price for a brand-new RTX 5080 partner card — below MSI Ventus 3X (+$230) and MSI Gaming Trio (+$459) for essentially identical silicon
- ASUS TUF's 3-year warranty and military-grade capacitor spec have historically made it the most reliable mid-tier option (ASUS's own RMA data shows <1% failure rate at 24 months)
- Axial-Tech triple-fan cooler holds GPU at 65–68 °C in sustained 4K RT + DLSS workloads — well under the 83 °C Tj Max
❌ Cons
- 16 GB VRAM is the single biggest knock on any RTX 5080 — it's enough for 4K gaming in 2026 but will age faster than the 5090's 32 GB, especially for VRAM-hungry modded workloads (4K texture packs, large ComfyUI diffusion flows)
- Rasterization at 4K native Ultra sits around 49 FPS in Alan Wake 2 (KitGuru) — playable with DLSS, not without it
The TUF OC is our pick for the value-conscious enthusiast — someone who games at 1440p 240 Hz or 4K 120 Hz with DLSS, doesn't run 32B+ local models, and wants a card that will hold up for three to four years. In our benchmark set it drives Cyberpunk 2077 at 131 FPS 1440p Ultra native and Alan Wake 2 at 87 FPS 1440p Ultra native (both KitGuru) — more than enough for a 1440p 240 Hz OLED. The 5090 is 45–55% faster at 4K, but the 5080 costs only ~36% of the 5090's street price.
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
⚡ Best 4K Ultra: GIGABYTE RTX 5090 WINDFORCE OC 32G
• 32 GB GDDR7 • 512-bit bus • 575 W TDP • 4-slot WINDFORCE cooler • 4.8★ (18 reviews)
✅ Pros
- At $3,879 street, this is the cheapest way into a reference-spec RTX 5090 with 32 GB GDDR7 and full 512-bit bus — $20 below the MSI Gaming Trio OC
- WINDFORCE 4-fan cooler is over-engineered for the 575 W TDP; GIGABYTE's own testing shows 65 °C under sustained 4K load, lowest in its price tier
- 4.8★ average across 18 early-owner reviews — tied with the MSI Gaming Trio OC for highest-rated 5090 card
❌ Cons
- 4-slot physical footprint is the widest in the 5090 lineup — rules out mid-tower cases with ≤3.5-slot clearance
- GIGABYTE's RGB Fusion software is the weakest of the major AIBs; tune via native driver panel instead
Pick this card if your top priority is 4K max settings gaming with ray tracing, and you want to spend the minimum on a true 5090. It delivers the same silicon as the MSI Gaming Trio — same 32 GB GDDR7, same 512-bit bus, same 21,760 CUDA cores — at $20 less street. The WINDFORCE cooler keeps the GB202 quieter than the MSI Ventus 3X and roughly tied with the Gaming Trio. In our benchmark database it hits 75 FPS in Alan Wake 2 4K Ultra native (KitGuru) and 86 FPS in Black Myth: Wukong 4K Ultra (Gamers Nexus). Those are the highest single-card 4K rasterization numbers in our entire database as of April 2026.
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
🎯 Best 1440p High-Refresh: GIGABYTE RTX 5080 Gaming OC 16G
• 16 GB GDDR7 • 256-bit bus • 360 W TDP • Dual-slot, triple-fan • 4.4★ (247 reviews)
✅ Pros
- Highest review volume (247) of any RTX 5080 card on Amazon — the community-verified pick
- Dual-slot form factor fits every mid-tower and most mini-ITX cases — the 5080 with the most physical compatibility
- 190 FPS in Cyberpunk 2077 1440p Ultra native (KitGuru) — more than enough for a 240 Hz 1440p OLED without DLSS
❌ Cons
- 4.4★ rating is a half-star below the TUF OC (4.7★); QC variance in early batches led to two reports of fan noise under sustained load
- Factory OC is modest (+45 MHz over reference) — if you want headroom for +100 MHz manual tuning, GIGABYTE's Aorus Master is the better pick at +$200
The Gaming OC is our pick for the 1440p 240 Hz gamer — someone with a modern 1440p OLED or high-refresh IPS who wants to drive it without DLSS. At this resolution the 5080 is 40–50% faster than the 4080 Super and delivers roughly 50–70% of the 5090's frame rate depending on the engine — but the 5090 rarely pushes the 1440p experience meaningfully further, because the CPU and panel refresh, not the GPU, increasingly become the bottleneck. Spend the saved ~$2,400 on a Samsung OLED G80SD or an AMD X3D CPU upgrade. In our database this card drives Alan Wake 2 at 87 FPS 1440p Ultra native, Black Myth: Wukong at 62 FPS 1440p Ultra native, and Final Fantasy XVI at 66 FPS 1440p Ultra (all KitGuru).
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
🧪 Best for Local AI: ASUS ROG Astral RTX 5090
• 32 GB GDDR7 • 512-bit bus • 575 W TDP • Quad-fan with BTF front cooling • 4.2★ (7 reviews)
✅ Pros
- Quad-fan "BTF" cooler with a front-facing intake delivers the best sustained thermals in the 5090 lineup — reviewers report <68 °C under sustained llama.cpp Vulkan inference at full 575 W
- ASUS's premium 3-year warranty does not exclude sustained 24/7 compute use — meaningful for AI/ML buyers who run round-the-clock inference workloads
- OLED side display and Astral's 4-fan coverage shave ~3 dBA off vs the MSI Suprim SOC in sustained-load comparisons
❌ Cons
- Lowest review count in our 5090 lineup (7) — less community telemetry on long-term reliability
- Premium $3,899+ price puts it in the same bracket as the MSI Suprim SOC without a comparable cooler advantage
For the buyer whose primary workload is local LLM inference, the ROG Astral's cooler pays off where it matters: sustained multi-hour inference runs. The 5090 can run DeepSeek-R1 32B at Q4_K_M entirely on-card with ~22 GB of VRAM used, leaving ~10 GB for KV cache at 32K context. Our ai_benchmarks row for Llama 2 7B Q4_0 on the RTX 5090 via llama.cpp Vulkan hits 263.6 tok/s (source), and a Qwen2.5-Coder-7B FP16 batched inference pass on vLLM hits 5,841 tok/s (Runpod). The 5080's 16 GB forces CPU offload for anything much past 14B at Q4, with the tok/s penalties that entails. If you're choosing between the 5090 and a $10,000 RTX 6000 Ada for local inference, the Astral is the sane choice — two 5090s in one system give you 64 GB of VRAM (via tensor parallelism in vLLM) for ~$8,000.
View on Amazon → Price sourced from Amazon.com. Last updated Apr 24, 2026. Price and availability subject to change.
What to look for in an RTX 5090 or 5080 in 2026
PSU and cable compliance
Both cards ship with the 12V-2×6 connector. If your PSU is ATX 3.1 you're set. If it's ATX 3.0 with the older 12VHPWR, use the supplied adapter and seat it all the way home — the melted-connector issue that dogged the 4090 is still occurring on Blackwell cards according to reports on r/nvidia and buildapc. Seasonic, Corsair, and be quiet! have native 12V-2×6 cables for most ATX 3.1 units; spend the $25 on the native cable and skip the adapter.
Case clearance (this is where most buyers get burned)
The 5090 board partner cards are 355–360 mm long and 3.5–4 slots wide. Verify your case supports both dimensions before ordering. A Fractal Torrent or Lian Li O11 Dynamic EVO XL handles either card fine; a mid-tower like the NZXT H5 Flow will not fit most 4-slot 5090s even if the length clears. The 5080 cards are more forgiving — most 5080s fit in any mid-tower.
VRAM and future-proofing
This is the 5090-vs-5080 crux. Through 2028, 16 GB will be fine for 4K gaming, adequate for most 7B-14B local LLMs at Q4, and marginal for 4K texture-modded workloads or 32B LLMs. Through 2030, 16 GB will start to age; 32 GB gives you a much bigger window. If you intend to keep the card for 4+ years, the 5090's VRAM premium is justified. For a 2-year upgrade cadence, the 5080 is the smarter pick.
DLSS 4 vs native — adjust your expectations
Both cards support DLSS 4 Multi Frame Generation. At 4K native Ultra the 5090 is meaningfully faster; with DLSS 4 MFG enabled, both cards hit 120+ FPS in most AAA titles and the perceived gap shrinks. If you're OK using DLSS 4 as the default, the 5080 is sufficient for 4K. If you prefer native rendering or are sensitive to generated-frame latency, the 5090's raw rasterization headroom matters more.
AIB markup reality check
NVIDIA MSRPs are $999 and $1,999. Street prices as of April 2026 are ~$1,400 and ~$3,900 — a ~40% markup on the 5080 and ~95% on the 5090. If AIB markup comes down in the back half of 2026 (historically happens once AI-training demand stabilizes), the price/performance calculus shifts back toward the 5090. As of today, it doesn't.
Secondary market and warranty transfer
Both MSI and ASUS transfer the full warranty to second owners if the original purchaser has proof of purchase. The gap left by EVGA's exit from GPUs has never been filled — NVIDIA's Founders Editions and ASUS's ROG tier are the closest equivalents for transferable warranty plus premium build. GIGABYTE warranties the original buyer only, which matters if you're buying used.
FAQ
Which GPU is better for 4K gaming — RTX 5090 or RTX 5080?
The RTX 5090 is 45–55% faster than the RTX 5080 at 4K Ultra native in our benchmark database — not the "25%" often quoted. In Alan Wake 2 4K Ultra native the 5090 hits 75 FPS vs the 5080's 49 FPS (+53%, KitGuru). In Black Myth: Wukong 4K Ultra native it's 86 FPS vs 58 FPS (+48%, Gamers Nexus). If you want reliable 4K native rendering, the 5090 is the only answer. If you're willing to use DLSS 4, the 5080 handles 4K just fine.
How much more power does the RTX 5090 draw vs the RTX 5080?
The RTX 5090's TDP is 575 W vs the RTX 5080's 360 W — a 60% delta. NVIDIA recommends a 1000 W PSU for the 5090 and 850 W for the 5080. Transient spikes on the 5090 can briefly hit 700+ W, so don't undersize — a 750 W PSU will trip OCP. Both cards use the 12V-2×6 connector.
Is the RTX 5090 worth $1,000 more than the RTX 5080?
At MSRP ($1,999 vs $999), yes — for 4K gamers, local AI workloads, or 8K content creators. At street prices ($3,900 vs $1,400), it's much harder to justify — you're paying 2.8× the money for 55% more performance. For most buyers in Q2 2026, the 5080 is the better purchase and you spend the savings on an OLED monitor, an X3D CPU upgrade, or a faster NVMe.
Can the RTX 5080 run 32B-parameter local LLMs?
Not at Q4_K_M without CPU offload. A Qwen 32B or DeepSeek-R1 32B at Q4_K_M weighs ~19 GB; the 5080's 16 GB VRAM can't hold it. You'd need to drop to Q3 or Q2 (~13 GB weights) to fit with context, at a ~12–20% quality loss per community HumanEval runs. The RTX 5090's 32 GB VRAM runs 32B Q4_K_M comfortably and fits 70B Q2_K with room for 8K context.
Should I wait for an RTX 5080 Super?
Historically NVIDIA releases "Super" refreshes 12–14 months after the base card. The 5080 launched in Q1 2025; a 5080 Super is plausible mid-to-late 2026. Leaked specs suggest a 20 GB VRAM bump and a 5–10% performance lift. If you can wait 4–6 months, waiting is rational. If you need a card now, buy a 5080 and resell if the Super drops — 5080 residual values have held ~80% MSRP so far.
Sources
- Gamers Nexus — RTX 5090 Review — https://gamersnexus.net/megacharts/gpus
- Tom's Hardware — GPU Hierarchy 2026 — https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html
- KitGuru — RTX 5080 Review and RTX 5090 Review — https://www.kitguru.net/category/components/graphic-cards/
- TechPowerUp — RTX 5090 GPU Database — https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
- llama.cpp GitHub Discussion #10879 — Vulkan backend performance on RTX 5090 — https://github.com/ggerganov/llama.cpp/discussions/10879
Related guides
- RTX 5090 vs RTX 4090 — Is the Blackwell upgrade worth it?
- RTX 5080 vs RTX 4080 SUPER — one generation apart
- RTX 5090 vs Mac Studio M4 Max for AI — which wins in 2026?
- How to run Qwen 3 32B on NVIDIA GeForce RTX 5090
— SpecPicks Editorial · Last verified Apr 24, 2026
