AMD Ryzen AI Max+ PRO 495 192GB: What the PassMark Leak Tells Us

The 192GB unified-memory APU benchmarked in PassMark — what models it unlocks, projected token throughput, and whether to wait

The Ryzen AI Max+ PRO 495 appeared in PassMark with 192GB unified memory — enough to run Llama 3 405B at q2, DeepSeek-V3 at q3 (usable quality), and Qwen2.5 72B at q8 on a single APU with no discrete GPU. If the leak is accurate, this APU clears every consumer unified-memory ceiling that matters for local LLM inference as of 2026.


By Mike Perry | Published May 2026


The AMD Ryzen AI Max+ 395 with 128GB unified memory is already the most capable single-chip local LLM platform available as of Q1 2026 — see the full AMD Ryzen AI Max+ 395 vs RTX 3060 12GB analysis. The 395 comfortably runs Llama 3 70B at q8 and DeepSeek-V3 at q2, but it runs out of room for 400B+ parameter models at anything above very low quantization.

The AI Max+ PRO 495 leaked in the PassMark baseline database with 192GB unified memory — 64GB more than the 395's maximum. The PassMark entry, first reported by Tom's Hardware, appears in the database as a pre-launch OEM test submission. Whether the number survives to retail or gets revised, 192GB is the architecture's apparent ceiling for the PRO workstation SKU.

This article breaks down what 192GB means for local LLM inference: which models it unlocks, how throughput scales, and whether you should wait for the 495 or buy a 395 now.


Key Takeaways

  • PassMark leak shows Ryzen AI Max+ PRO 495 with 192GB unified memory — 64GB over the 395's 128GB max
  • At 192GB: Llama 3 405B fits at q2, DeepSeek-V3 at q3, and Qwen2.5 72B at q8 with room for a 64K+ context KV cache
  • Projected token throughput: ~14-18 tok/s on 70B q8 (based on 395 scaling data), ~4-7 tok/s on 405B q2
  • Mini-PC availability estimate: Q3-Q4 2026, assuming AMD announced the 495 at CES 2026
  • Wait vs buy: buy a 395 now if you need 70B q8 today; wait if 400B tier is the specific use case

What Did the PassMark Leak Actually Reveal?

The PassMark baseline entry shows:

  • Processor: AMD Ryzen AI Max+ PRO 495
  • Memory: 192 GB
  • PassMark score: Not publicly disclosed (entry visible in raw database queries)
  • System class: Workstation laptop or mini-PC (not consumer segment)

Per PassMark's baseline database, these entries are submitted by hardware manufacturers during pre-launch validation testing. They're often removed within days of discovery by the manufacturer's PR team. The 192GB figure specifically aligns with AMD's LPDDR5X dual-channel design: two 96GB modules at 5500 MT/s = 192GB total, which would require next-generation LPDDR5X 96Gb-die stacking (eight 96Gb dies of 12GB each per module, or a similar arrangement).

Is 192GB architecturally plausible? Yes — per AMD's AI 300 series product page, the Strix Halo platform (which both the 395 and 495 use) supports up to 256GB in theory via the LPDDR5X controller's addressing range. The 395 shipped at 128GB max due to LPDDR5X 64Gb die availability at launch. 96Gb dies were rumored for H2 2025 production — the 495 PRO's 192GB would be the first shipping product to use them.


How Does 192GB Unified Memory Change the Local-LLM Ceiling?

Unified-memory APUs (Strix Halo architecture) access the entire memory pool at GPU bandwidth — approximately 512 GB/s on the 395, expected to be similar on the 495. The entire LLM weight matrix sits in memory that the GPU reads directly, unlike discrete-GPU builds where VRAM capacity (32GB on the RTX 5090) is the hard ceiling.

The capacity ceiling shift:

| Memory | Models that fit at q4 | Models that fit at q8 |
|---|---|---|
| 32GB (RTX 5090 VRAM) | Gemma 2 27B (~17GB) | Llama 3 8B (~8GB) |
| 128GB (AI Max+ 395) | Mixtral 8x22B (~60GB) | Llama 3 70B (~74GB) |
| 192GB (AI Max+ PRO 495) | DeepSeek-V3 (~155GB) | Mixtral 8x22B (~95GB) |

Note: q4 = 4-bit quantization; q8 = 8-bit (higher quality, more memory). DeepSeek-V3 at q5 on 192GB is tight (~145GB estimated vs 192GB available, leaving ~47GB for KV cache and system overhead). It may work with smaller context windows; 64K tokens at 70B q8 uses ~20GB for KV cache.
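A quantized model's footprint is roughly parameters × bits-per-weight, plus a few percent for embeddings and quantization scales. A minimal sketch of that rule of thumb (the ~5% overhead factor is an assumption here, which is why it lands near but not exactly on figures like the ~74GB q8 entry in the quantization matrix below):

```python
def quant_size_gb(params_b: float, bits_per_weight: float, overhead: float = 1.05) -> float:
    """Rough weight footprint in GB: params (billions) x bits per weight / 8,
    times a small multiplier for scales, embeddings, and metadata (assumed ~5%)."""
    return params_b * bits_per_weight / 8 * overhead

# Llama 3 70B at 8 bits/weight lands near the matrix's ~74GB q8 figure
print(round(quant_size_gb(70, 8.0), 1))
```

Real GGUF files deviate a bit because "q4" and friends mix bit widths across tensor types, so community-measured sizes (like the matrix below) beat the formula when they're available.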


Quantization Matrix: What Models Fit on 192GB

Based on GGUF quantization model sizes from the Hugging Face hub and LocalLLaMA community measurements (as of May 2026):

| Model | q2 size | q3 size | q4 size | q6 size | q8 size | fp16 size |
|---|---|---|---|---|---|---|
| Llama 3.1 8B | ~3GB | ~4GB | ~5GB | ~6GB | ~8GB | ~16GB |
| Llama 3.1 70B | ~28GB | ~38GB | ~43GB | ~57GB | ~74GB | ~147GB |
| Llama 3.1 405B | ~108GB | ~145GB | ~180GB | — | — | — |
| DeepSeek-V3 (671B MoE) | ~85GB | ~120GB | ~155GB | — | — | — |
| Qwen2.5 72B | ~27GB | ~37GB | ~45GB | ~58GB | ~74GB | ~148GB |
| Mixtral 8x22B | ~36GB | ~49GB | ~60GB | ~79GB | ~95GB | ~190GB |
| Gemma 2 27B | ~11GB | ~14GB | ~17GB | ~23GB | ~29GB | ~57GB |

At 192GB (with ~170GB usable after system overhead and KV cache reserve):

  • ✅ Llama 3.1 405B at q2 — fits with room for context
  • ✅ DeepSeek-V3 at q3 — fits but tight
  • ✅ Llama 3.1 70B at q8 — fits with ~100GB spare
  • ✅ Qwen2.5 72B at q8 — fits cleanly
  • ✅ Mixtral 8x22B at q8 — fits (95GB)
  • ❌ Llama 3.1 405B at q3 (~145GB) + large KV cache — marginal
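Using the matrix's size estimates and the ~170GB usable figure, the fit checks above reduce to headroom arithmetic — the headroom still has to hold the KV cache, which is why 405B q3 is called marginal despite fitting on paper:

```python
USABLE_GB = 170  # ~192GB minus system overhead and KV-cache reserve (article estimate)

# Weight sizes (GB) from the quantization matrix above
candidates = {
    "Llama 3.1 405B q2": 108,
    "DeepSeek-V3 q3": 120,
    "Llama 3.1 70B q8": 74,
    "Llama 3.1 405B q3": 145,
}

for name, weights_gb in sorted(candidates.items(), key=lambda kv: kv[1]):
    headroom = USABLE_GB - weights_gb
    print(f"{name}: {weights_gb}GB weights, {headroom}GB left for KV cache")
```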

Prefill vs Generation Token-Throughput Projections

Based on measured Ryzen AI Max+ 395 (128GB) performance at similar quantization levels, and projected scaling for the 495:

The 395 achieves approximately:

  • 16-20 tok/s generation on Llama 3 70B q4 (per LocalLLaMA benchmark threads)
  • 8-12 tok/s on DeepSeek-V3 q2 (MoE-sparse, fewer active weights per token)
  • 50-60 tok/s on Llama 3 8B q8 (small model, fully bandwidth-bound)

The 495 PRO is expected to have the same or slightly higher memory bandwidth than the 395 (same Strix Halo architecture, similar iGPU shader count). Projections for 192GB-specific models:

| Model + Quant | Expected tok/s (gen) | Notes |
|---|---|---|
| Llama 3.1 405B q2 | ~4-7 tok/s | Memory bandwidth limited; usable but slow |
| DeepSeek-V3 q3 | ~8-12 tok/s | MoE efficiency helps; better than dense 70B at q8 |
| Llama 3.1 70B q8 | ~14-18 tok/s | Core strength of the platform |
| Qwen2.5 72B q8 | ~13-17 tok/s | Similar to 70B |

Context window note: At 405B q2 with 128K context, the KV cache alone is ~30-50GB. 192GB gives enough headroom to run 405B q2 at 64K context comfortably, which is the practical use case for long-document analysis.
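For a dense model, every generated token streams the full weight matrix through the memory bus once, so bandwidth divided by weight size gives a hard ceiling on generation speed. A sanity-check sketch using the article's ~512 GB/s bandwidth figure (a projection, not a measured spec):

```python
def dense_gen_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth roofline for dense-model token generation:
    tok/s cannot exceed (bytes moved per second) / (bytes read per token)."""
    return bandwidth_gb_s / weights_gb

# 405B at q2 (~108GB of weights) on a ~512 GB/s unified-memory APU:
# ceiling lands near the low end of the ~4-7 tok/s projection above
print(round(dense_gen_ceiling(512, 108), 1))
```

MoE models like DeepSeek-V3 read only the active experts per token, which is how they land above the dense bound for their total parameter count.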


Context-Length Impact: 200K Tokens on a Single APU?

KV cache scales linearly with context length and model size. For a 70B model, each 1K tokens of context requires approximately 160MB of KV cache with an 8-bit KV cache (about 320MB at fp16). At 200K tokens, that's ~32GB (8-bit) to ~65GB (fp16) for the KV cache alone, which comfortably fits within 192GB alongside a 70B q8 model (~74GB weights).

For 405B models, the KV cache is larger (~450MB per 1K tokens in fp16, ~225MB with 8-bit KV cache compression). At 200K context, that's ~45-90GB — technically possible on 192GB but leaving minimal headroom. A realistic 200K context at 405B q2 on 192GB requires aggressive KV cache quantization (Q8_KV or Q4_KV compression).
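Both per-1K figures fall out of the standard transformer KV-cache formula. A sketch — the layer count, KV-head count, and head dimension used below are Llama 3 70B's published GQA shape, treated here as assumptions:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                ctx_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache bytes = 2 (K and V) x layers x kv_heads x head_dim
    x context length x bytes per element, converted to GB."""
    return 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_elem / 1e9

# Llama 3 70B (80 layers, 8 KV heads via GQA, head_dim 128) at 200K context
print(round(kv_cache_gb(80, 8, 128, 200_000, bytes_per_elem=1), 1))  # 8-bit KV cache
print(round(kv_cache_gb(80, 8, 128, 200_000, bytes_per_elem=2), 1))  # fp16, double that
```

Grouped-query attention (the small `kv_heads` count) is what keeps these numbers manageable; a model with full multi-head KV would need roughly 8x more.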

Bottom line: 192GB makes long-context inference for 70B models fully practical without any trade-offs. 200K+ context on 405B is technically possible but requires careful KV cache configuration.


Mini-PC Vendor Likelihood

Based on the 395's vendor trajectory (announced January 2025, mini-PCs shipping by Q3 2025), the expected vendors for the 495 PRO:

  • GMKtec EVO-X3 or equivalent: GMKtec was first to market with the 395; likely repeat for the 495 PRO. Typical price premium: $200-400 over 128GB equivalents.
  • Beelink GTR9 Ultra: Beelink followed GMKtec by 2-3 months on the 395; similar timeline expected.
  • Framework Desktop: Framework's modular desktop used the 395; the PRO variant targets workstation buyers — high likelihood of Framework adoption for repairability-focused users.
  • Corsair One: The Corsair One PRO series regularly adopts high-TDP desktop APU platforms. A Strix Halo 192GB build fits the One's form factor.

Enterprise allocation may prioritize the PRO SKU to workstation OEMs (HP ZBook, Lenovo ThinkPad P-series) before consumer mini-PC vendors get volume.


Performance-Per-Dollar + Performance-Per-Watt Math

vs AI Max+ 395 (128GB):

At projected pricing ($200-400 premium for 192GB over 128GB), the 495 PRO delivers the same tok/s throughput for 70B models (same architecture, same bandwidth) but with more capacity. The cost justification is purely about whether you need 400B+ models — if you don't, the 395 is strictly better value.

vs Threadripper PRO + RTX 5090:

| Metric | AI Max+ PRO 495 (est.) | Threadripper PRO + RTX 5090 |
|---|---|---|
| Max model size | 192GB (APU memory) | 32GB (VRAM) + system RAM offload |
| 70B q8 tok/s (gen) | ~16 tok/s | ~85 tok/s (on-VRAM) |
| 400B+ model support | Yes (q2-q3) | Only with CPU offload (slow) |
| System power (idle) | ~35-55W | ~150-250W |
| Estimated system cost | $3,000-5,000 | $8,000-12,000+ |
| Form factor | Mini-PC (~2L) | Tower workstation |

The APU wins on model capacity per dollar, power efficiency, and form factor. The discrete GPU wins on tok/s for models that fit in VRAM. For 400B+ inference specifically, there is no affordable discrete-GPU alternative — matching the 495's 192GB would take six 32GB RTX 5090s ($12,000+ in GPUs alone, with the model split across cards).
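The capacity-per-dollar claim can be made concrete. A quick sketch using the article's system-cost estimates (these are estimates, not quotes):

```python
# (estimated system cost in USD, model-addressable memory in GB) — article estimates
builds = {
    "AI Max+ PRO 495 mini-PC": (4_000, 192),
    "Threadripper PRO + RTX 5090": (10_000, 32),
}

for name, (cost_usd, mem_gb) in builds.items():
    print(f"{name}: ~${cost_usd / mem_gb:,.0f} per GB of model memory")
```

On throughput per dollar for models that fit in VRAM, the ranking flips — which is exactly the trade-off the table above describes.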


Bottom Line: Wait for the 495 or Buy a 395 Today?

Buy the 395 today if:

  • You need 70B q8 local inference now — the 395 already delivers this without compromise
  • Your primary use cases are Llama 3 70B, Qwen2.5 72B, or Mixtral 8x22B — all fit well within 128GB
  • You don't have a specific need for 400B+ models or 200K context on 400B

Wait for the 495 PRO if:

  • 400B+ parameter models (Llama 3.1 405B, full-precision DeepSeek) are your specific target
  • You need 200K+ context windows on 70B at q8 without KV cache compression
  • You're building a shared inference server where total model capacity matters
  • You're not in a hurry — mini-PC availability is Q3-Q4 2026 at earliest

The 395 is the right buy for 90% of local AI inference use cases in 2026. The 495 PRO is for the remaining 10% who specifically need to run 400B-class models or extreme context windows on a single, compact, energy-efficient system.



Last verified May 10, 2026. Leak details are unconfirmed; treat projections as estimates until official AMD announcement.


Frequently asked questions

Is the 192GB number on the AI Max+ PRO 495 confirmed?
Per Tom's Hardware's leak coverage citing the PassMark database entry, the 192GB figure appears in the 'Memory Size' field of the PassMark baseline result for a system labeled 'AMD Ryzen AI Max+ PRO 495'. PassMark baselines are submitted by OEM/ODM pre-launch test machines and are occasionally removed post-leak. The 192GB figure aligns with AMD's dual-channel LPDDR5X architecture supporting up to 192GB via two 96GB modules — a plausible configuration for a workstation-tier PRO SKU.
What models become possible at 192GB that aren't on the 395's 128GB?
Per LocalLLaMA quantization community testing, 128GB unified memory comfortably runs Llama 3 70B at q8 (~74GB), DeepSeek-V3 at q2 (~85GB), and Mixtral 8x22B at q8 (~95GB). At 192GB, you add Llama 3.1 405B at q2 (~108GB), DeepSeek-V3 at q3 (~120GB, with better quality), and Qwen2.5 72B at q8 (~74GB, with 118GB to spare for context and KV cache). The 192GB tier specifically unlocks the 400B+ parameter class at usable quantization levels.
How does this compare to a Threadripper PRO + RTX 5090 build?
Per Puget Systems benchmarks of comparable workstation rigs and projected APU throughput from 395 scaling data, a Threadripper PRO 7985WX + RTX 5090 (32GB VRAM) delivers ~3x higher token generation throughput versus the expected 395/495 APU on 70B models, because discrete GPU VRAM bandwidth (~1.7 TB/s for the 5090) far exceeds unified memory bandwidth (~512 GB/s for the 395). The APU wins on total model size (192GB vs 32GB accessible) and cost-per-GB-of-model; the discrete GPU wins on tokens-per-second for models that fit in VRAM.
When will mini-PCs with the 495 actually ship?
Per AMD's historical Strix Halo cadence (announce at CES, mini-PC vendors ship 4-8 months later), the 395 was announced January 2025 with GMKtec and Beelink mini-PCs shipping by Q3 2025. If the 495 PRO follows the same pattern from a CES 2026 announcement, mini-PC availability is plausible in Q3-Q4 2026. Framework Desktop and Corsair One-class systems may ship earlier with enterprise-prioritized PRO allocations.
What CPU cooler do these mini-PCs typically use?
Per teardown reports of existing AI Max+ 395 mini-PCs (GMKtec EVO-X2, Framework Desktop, Beelink GTR9 Pro), the APU runs under a large vapor chamber + dual-fan cooler designed for 55-65W sustained TDP. The PRO variant likely targets 45W TDP (workstation thermal discipline). Desktop mini-PC enclosures route airflow directly over the vapor chamber — adequate for sustained LLM inference without thermal throttling, unlike laptop implementations that throttle under sustained 100% iGPU load.
