AMD Ryzen AI Max+ PRO 495 192GB: What the PassMark Leak Tells Us
By Mike Perry | Published May 2026
The Ryzen AI Max+ PRO 495 appeared in PassMark with 192GB unified memory: enough to run Llama 3.1 405B at q2, DeepSeek-V3 at q3 (usable quality), and Qwen2.5 72B at q8 on a single APU without any GPU. If the leak is accurate, this APU clears every consumer unified-memory ceiling that matters for local LLM inference as of 2026.
The AMD Ryzen AI Max+ 395 with 128GB unified memory is already the most capable single-chip local LLM platform available as of Q1 2026 — see the full AMD Ryzen AI Max+ 395 vs RTX 3060 12GB analysis. The 395 comfortably runs Llama 3 70B at q8 and DeepSeek-V3 at q2, but it runs out of room for 400B+ parameter models at anything above very low quantization.
The AI Max+ PRO 495 leaked in the PassMark baseline database with 192GB unified memory, 64GB more than the 395's maximum. The PassMark entry, first reported by Tom's Hardware, appears in the database as a pre-launch OEM test submission. Whether the number survives to retail or gets revised, 192GB is the apparent memory ceiling for the PRO workstation SKU.
This article breaks down what 192GB means for local LLM inference: which models it unlocks, how throughput scales, and whether you should wait for the 495 or buy a 395 now.
Key Takeaways
- PassMark leak shows Ryzen AI Max+ PRO 495 with 192GB unified memory — 64GB over the 395's 128GB max
- At 192GB: Llama 3.1 405B fits at q2, DeepSeek-V3 at q3, and Qwen2.5 72B at q8 with room for a 64K+ context KV cache
- Projected token throughput: ~14-18 tok/s on 70B q8 (based on 395 scaling data), ~4-7 tok/s on 405B q2
- Mini-PC availability estimate: Q3-Q4 2026, assuming AMD follows the 395's announcement-to-shipping cadence
- Wait vs buy: buy a 395 now if you need 70B q8 today; wait if 400B tier is the specific use case
What Did the PassMark Leak Actually Reveal?
The PassMark baseline entry shows:
- Processor: AMD Ryzen AI Max+ PRO 495
- Memory: 192 GB
- PassMark score: Not publicly disclosed (entry visible in raw database queries)
- System class: Workstation laptop or mini-PC (not consumer segment)
Per PassMark's baseline database, these entries are submitted by hardware manufacturers during pre-launch validation testing, and they are often pulled within days of being spotted, at the manufacturer's request. The 192GB figure aligns with AMD's 256-bit LPDDR5X design: two 96GB modules of LPDDR5X-8000, which would require next-generation 96Gb dies (12GB per die, so an eight-die stack per module, or a similar configuration).
Is 192GB architecturally plausible? Yes — per AMD's AI 300 series product page, the Strix Halo platform (which both the 395 and 495 use) supports up to 256GB in theory via the LPDDR5X controller's addressing range. The 395 shipped at 128GB max due to LPDDR5X 64Gb die availability at launch. 96Gb dies were rumored for H2 2025 production — the 495 PRO's 192GB would be the first shipping product to use them.
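For readers who want to check the capacity math, here is a minimal sketch. The 96Gb die density is the rumored part discussed above; the stack height and package count are illustrative assumptions, not confirmed packaging:

```python
# Capacity sanity check for the leaked 192GB figure. The 96Gb die density
# is the rumored next-gen LPDDR5X part; the 8-die stack and 2-package
# layout are assumptions for illustration, not confirmed packaging.
GBIT_PER_DIE = 96
DIES_PER_PACKAGE = 8   # assumed stack height
PACKAGES = 2           # assumed: one 96GB module per side

gb_per_die = GBIT_PER_DIE / 8                     # 96 Gbit = 12 GB
total_gb = gb_per_die * DIES_PER_PACKAGE * PACKAGES
print(f"{gb_per_die:.0f} GB/die x {DIES_PER_PACKAGE} dies x {PACKAGES} packages = {total_gb:.0f} GB")
# -> 12 GB/die x 8 dies x 2 packages = 192 GB
```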
How Does 192GB Unified Memory Change the Local-LLM Ceiling?
Unified memory APUs (Strix Halo architecture) let the GPU address the entire memory pool at full memory bandwidth: roughly 256 GB/s from the 395's 256-bit LPDDR5X-8000 interface, with the 495 expected to match it. The entire LLM weight matrix sits in memory the GPU reads directly, unlike discrete GPU builds, where VRAM capacity (32GB on the RTX 5090) is the hard ceiling.
The capacity ceiling shift:
| Memory | Largest listed model that fits at q4 | Largest listed model that fits at q8 |
|---|---|---|
| 32GB (RTX 5090 VRAM) | Gemma 2 27B (~17GB) | Gemma 2 27B (~29GB, tight) |
| 128GB (AI Max+ 395) | Mixtral 8x22B (~60GB) | Llama 3.1 70B (~74GB) |
| 192GB (AI Max+ PRO 495) | DeepSeek-V3 (~155GB, tight) | Mixtral 8x22B (~95GB) |
Note: q4 = 4-bit quantization; q8 = 8-bit (higher quality, more memory). DeepSeek-V3 at q4 on 192GB is tight: ~155GB estimated against ~170GB usable leaves little headroom for KV cache and system overhead. It may still work with smaller context windows; for scale, 64K tokens of fp16 KV cache on a 70B model is ~20GB.
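Quant levels map roughly linearly to bytes per weight, so you can ballpark any model's footprint before downloading it. Here is a sketch calibrated against the Llama 3.1 70B row of the matrix in the next section; effective bits per weight vary by quant flavor (K-quants vs I-quants), so treat the outputs as estimates:

```python
# Ballpark GGUF weight size: params (billions) x effective bits/weight / 8.
# Effective bpw values below are calibrated to the Llama 3.1 70B sizes in
# the matrix; I-quants (as used for the 405B figures) run lower, ~2-3.5 bpw.
EFFECTIVE_BPW = {"q2": 3.2, "q3": 4.3, "q4": 4.9, "q6": 6.5, "q8": 8.5}

def weights_gb(params_b: float, quant: str) -> float:
    """Approximate in-memory weight footprint in GB."""
    return params_b * EFFECTIVE_BPW[quant] / 8

for quant in ("q2", "q4", "q8"):
    print(f"Llama 3.1 70B {quant}: ~{weights_gb(70, quant):.0f} GB")
# -> q2: ~28 GB, q4: ~43 GB, q8: ~74 GB (matches the matrix below)
```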
Quantization Matrix: What Models Fit on 192GB
Based on GGUF quantization model sizes from the Hugging Face hub and LocalLLaMA community measurements (as of May 2026):
| Model | q2 size | q3 size | q4 size | q6 size | q8 size | fp16 size |
|---|---|---|---|---|---|---|
| Llama 3.1 8B | ~3GB | ~4GB | ~5GB | ~6GB | ~8GB | ~16GB |
| Llama 3.1 70B | ~28GB | ~38GB | ~43GB | ~57GB | ~74GB | ~147GB |
| Llama 3.1 405B | ~108GB | ~145GB | ~180GB | ❌ | ❌ | ❌ |
| DeepSeek-V3 (671B MoE) | ~85GB | ~120GB | ~155GB | ❌ | ❌ | ❌ |
| Qwen2.5 72B | ~27GB | ~37GB | ~45GB | ~58GB | ~74GB | ~148GB |
| Mixtral 8x22B | ~36GB | ~49GB | ~60GB | ~79GB | ~95GB | ~190GB |
| Gemma 2 27B | ~11GB | ~14GB | ~17GB | ~23GB | ~29GB | ~57GB |
At 192GB (with ~170GB usable after system overhead and KV cache reserve):
- ✅ Llama 3.1 405B at q2 — fits with room for context
- ✅ DeepSeek-V3 at q3 — fits but tight
- ✅ Llama 3.1 70B at q8 — fits with ~100GB spare
- ✅ Qwen2.5 72B at q8 — fits cleanly
- ✅ Mixtral 8x22B at q8 — fits (95GB)
- ❌ Llama 3.1 405B at q3 (~145GB): the weights alone fit, but a large KV cache pushes past the ~170GB usable budget
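The same budget arithmetic as a small helper you can adapt: the ~170GB usable figure and model sizes are the estimates above, and the 20GB KV allowance is an illustrative assumption.

```python
# Fit check against the 495 PRO's projected budget. USABLE_GB is the
# article's ~170GB estimate after OS and iGPU overhead; the KV cache
# allowance is the caller's choice, so marginal cases show up directly.
USABLE_GB = 170
MODELS_GB = {  # estimated GGUF sizes from the matrix above
    "Llama 3.1 405B q2": 108,
    "Llama 3.1 405B q3": 145,
    "DeepSeek-V3 q3": 120,
    "Llama 3.1 70B q8": 74,
    "Mixtral 8x22B q8": 95,
}

def fits(weights_gb: float, kv_cache_gb: float) -> bool:
    """True if weights plus the requested KV cache fit in the usable pool."""
    return weights_gb + kv_cache_gb <= USABLE_GB

for name, gb in MODELS_GB.items():
    verdict = "fits" if fits(gb, kv_cache_gb=20) else "too big"
    print(f"{name} + 20GB KV: {verdict} ({USABLE_GB - gb - 20:+d} GB to spare)")
```

Note how 405B q3 comes out at just +5GB to spare with a modest 20GB KV allowance, which is exactly the marginal case flagged in the list above.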
Prefill vs Generation Token-Throughput Projections
Prefill (prompt ingestion) is compute-bound and runs far faster per token than decode; generation is memory-bandwidth-bound, and it is what the figures below measure. Projections are based on measured Ryzen AI Max+ 395 (128GB) performance at similar quantization levels, scaled to the 495:
The 395 achieves approximately:
- 16-20 tok/s generation on Llama 3 70B q4 (per LocalLLaMA benchmark threads)
- 8-12 tok/s on DeepSeek-V3 q2 (MoE-sparse, fewer active weights per token)
- 50-60 tok/s on Llama 3 8B q8 (small model, fully bandwidth-bound)
The 495 PRO is expected to have the same or slightly higher memory bandwidth than the 395 (same Strix Halo architecture, similar iGPU shader count). Projections for 192GB-specific models:
| Model + Quant | Expected tok/s (gen) | Notes |
|---|---|---|
| Llama 3.1 405B q2 | ~4-7 tok/s | Memory bandwidth limited; usable but slow |
| DeepSeek-V3 q3 | ~8-12 tok/s | MoE sparsity (~37B active of 671B) keeps per-token memory traffic low |
| Llama 3.1 70B q8 | ~14-18 tok/s | Core strength of the platform |
| Qwen2.5 72B q8 | ~13-17 tok/s | Similar to 70B |
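The DeepSeek row illustrates why a 671B-parameter model is viable at all on this hardware: generation is memory-bound, and an MoE only streams the experts active for each token. A rough traffic comparison follows, using sizes from the matrix above; routing overhead, shared layers, and cache behavior eat much of the raw advantage in practice, which is why the projections above sit closer together than the ratio suggests:

```python
# Per-token memory traffic: dense models stream all weights every token;
# MoE models stream only the active experts (~37B of DeepSeek-V3's 671B).
def gb_streamed_per_token(weights_gb: float, active_b: float, total_b: float) -> float:
    """GB of weights read from memory per generated token."""
    return weights_gb * active_b / total_b

dense_70b_q8 = gb_streamed_per_token(74, 70, 70)    # dense: everything
deepseek_q3 = gb_streamed_per_token(120, 37, 671)   # MoE: active subset
print(f"dense 70B q8:   ~{dense_70b_q8:.0f} GB/token")
print(f"DeepSeek-V3 q3: ~{deepseek_q3:.1f} GB/token "
      f"({dense_70b_q8 / deepseek_q3:.0f}x less traffic)")
```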
Context window note: At 405B q2 with 128K context, the KV cache alone runs ~33-66GB depending on KV precision (q8 vs fp16). 192GB gives enough headroom to run 405B q2 at 64K context comfortably, which is the practical use case for long-document analysis.
Context-Length Impact: 200K Tokens on a Single APU?
KV cache scales linearly with context length and with model depth and width. For Llama 3.1 70B (80 layers, 8 KV heads of dimension 128), each 1K context tokens requires approximately 330MB of KV cache in fp16, or half that with q8 KV compression. At 200K tokens, that's ~33GB for a q8 KV cache, which comfortably fits within 192GB alongside a 70B q8 model (~74GB weights).
For 405B models, the KV cache is larger (~500MB per 1K tokens in fp16, ~250MB with q8 compression). At 200K context, that's roughly 50-100GB: technically possible on 192GB, but it leaves minimal headroom next to ~108GB of q2 weights. A realistic 200K context at 405B q2 on 192GB requires aggressive KV cache quantization (q8 or q4 cache types).
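To reproduce this arithmetic for any model or context length, here is a minimal KV-cache calculator. The layer and head counts are the published Llama 3.1 configurations; the per-token cost is 2 tensors (K and V) x layers x KV heads x head dimension x bytes per element:

```python
# KV-cache footprint: 2 (K and V) x layers x kv_heads x head_dim x
# bytes/element x context tokens. Shapes are published Llama 3.1 configs.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: float = 2.0) -> float:
    """KV-cache size in GB (bytes_per_elem: fp16=2.0, q8=1.0, q4=0.5)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context / 1e9

# Llama 3.1 70B: 80 layers, 8 KV heads (GQA), head_dim 128
print(f"70B, 200K ctx, fp16: ~{kv_cache_gb(80, 8, 128, 200_000):.0f} GB")       # ~66
print(f"70B, 200K ctx, q8:   ~{kv_cache_gb(80, 8, 128, 200_000, 1.0):.0f} GB")  # ~33
# Llama 3.1 405B: 126 layers, 8 KV heads, head_dim 128
print(f"405B, 200K ctx, q8:  ~{kv_cache_gb(126, 8, 128, 200_000, 1.0):.0f} GB") # ~52
```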
Bottom line: 192GB makes long-context inference for 70B models fully practical without any trade-offs. 200K+ context on 405B is technically possible but requires careful KV cache configuration.
Mini-PC Vendor Likelihood
Based on the 395's vendor trajectory (announced January 2025, mini-PCs shipping by Q3 2025), the expected vendors for the 495 PRO:
- GMKtec EVO-X3 or equivalent: GMKtec was first to market with the 395; likely repeat for the 495 PRO. Typical price premium: $200-400 over 128GB equivalents.
- Beelink GTR9 Ultra: Beelink followed GMKtec by 2-3 months on the 395; similar timeline expected.
- Framework Desktop: Framework's modular desktop used the 395; the PRO variant targets workstation buyers — high likelihood of Framework adoption for repairability-focused users.
- Corsair One: The Corsair One PRO series regularly adopts high-TDP compact platforms. A Strix Halo 192GB build fits the One's form factor.
Enterprise allocation may prioritize the PRO SKU to workstation OEMs (HP ZBook, Lenovo ThinkPad P-series) before consumer mini-PC vendors get volume.
Performance-Per-Dollar + Performance-Per-Watt Math
vs AI Max+ 395 (128GB):
At projected pricing ($200-400 premium for 192GB over 128GB), the 495 PRO delivers the same tok/s throughput for 70B models (same architecture, same bandwidth) but with more capacity. The cost justification is purely about whether you need 400B+ models — if you don't, the 395 is strictly better value.
vs Threadripper PRO + RTX 5090:
| Metric | AI Max+ PRO 495 est. | Threadripper PRO + RTX 5090 |
|---|---|---|
| Max model size | 192GB (APU memory) | 32GB (VRAM) + system RAM offload |
| 70B q8 tok/s (gen) | ~16 tok/s | n/a: ~74GB of weights exceeds 32GB VRAM; CPU offload is slow |
| 400B+ model support | Yes (q2-q3) | Only with CPU offload (slow) |
| System power (idle) | ~35-55W | ~150-250W |
| Estimated system cost | $3,000-5,000 | $8,000-12,000+ |
| Form factor | Mini-PC (~2L) | Tower workstation |
The APU wins on model capacity per dollar, power efficiency, and form factor. The discrete GPU wins on tok/s for models that fit in VRAM. For 400B+ inference specifically, there is no affordable discrete-GPU alternative: matching 192GB of model capacity would take six RTX 5090s (roughly $12,000 in GPUs alone at MSRP, pooled over PCIe, since consumer GeForce cards no longer offer NVLink).
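The capacity-per-dollar claim in one calculation, using range midpoints from the table above; every input here is a projection, not measured pricing:

```python
# Capacity-per-dollar from the comparison table (midpoints of estimated
# ranges; all inputs are projections, not measured pricing).
systems = {
    "AI Max+ PRO 495":         {"cost_usd": 4000,  "capacity_gb": 192, "idle_w": 45},
    "Threadripper + RTX 5090": {"cost_usd": 10000, "capacity_gb": 32,  "idle_w": 200},
}
for name, s in systems.items():
    gb_per_k = s["capacity_gb"] / s["cost_usd"] * 1000
    print(f"{name}: {gb_per_k:.0f} GB of model capacity per $1,000, {s['idle_w']} W idle")
# -> 48 GB/$1,000 vs 3 GB/$1,000: a ~15x capacity-per-dollar gap
```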
Bottom Line: Wait for the 495 or Buy a 395 Today?
Buy the 395 today if:
- You need 70B q8 local inference now — the 395 already delivers this without compromise
- Your primary use cases are Llama 3 70B, Qwen2.5 72B, or Mixtral 8x22B, all of which fit well within 128GB
- You don't have a specific need for 400B+ models or 200K context on 400B
Wait for the 495 PRO if:
- 400B+ parameter models (Llama 3.1 405B, or DeepSeek-V3 above q2) are your specific target
- You need 200K+ context windows on 70B at q8 without KV cache compression
- You're building a shared inference server where total model capacity matters
- You're not in a hurry — mini-PC availability is Q3-Q4 2026 at earliest
The 395 is the right buy for 90% of local AI inference use cases in 2026. The 495 PRO is for the remaining 10% who specifically need to run 400B-class models or extreme context windows on a single, compact, energy-efficient system.
FAQ
Is the 192GB number on the AI Max+ PRO 495 confirmed? Per Tom's Hardware's leak coverage citing the PassMark database entry, the 192GB figure appears in a pre-launch OEM test submission. It aligns with AMD's Strix Halo architecture supporting up to 256GB via LPDDR5X — plausible with next-gen 96Gb-die modules.
What models become possible at 192GB that aren't on the 395's 128GB? Per LocalLLaMA quantization community testing, 192GB adds Llama 3.1 405B at q2 (~108GB), DeepSeek-V3 at q3 (~120GB), and sufficient KV cache headroom for 200K+ context windows on 70B q8 models.
How does this compare to a Threadripper PRO + RTX 5090 build? Per Puget Systems benchmarks and projected APU throughput, the discrete GPU build delivers several times higher tok/s on models that fit within its 32GB of VRAM; anything larger falls back to slow CPU offload. The 495 APU wins on total model capacity and cost per GB of inference; the discrete GPU wins on raw throughput for models that fit in VRAM.
When will mini-PCs with the 495 actually ship? Per AMD's historical Strix Halo cadence, mini-PC availability is estimated Q3-Q4 2026 if AMD announced the 495 at CES 2026. GMKtec, Beelink, and Framework Desktop are the likely first-to-market vendors.
What CPU cooler do these mini-PCs typically use? Per teardowns of AI Max+ 395 mini-PCs, the APU runs under a large vapor chamber + dual-fan setup designed for 45-65W sustained TDP. The PRO variant targets workstation thermal discipline — adequate for sustained LLM inference without throttling.
Sources
- Tom's Hardware — AMD Ryzen AI Max+ PRO 495 PassMark Leak
- AMD Ryzen AI 300 Series Product Page
- PassMark Baseline Database V11
Related Guides
- AMD Ryzen AI Max+ 395 vs RTX 3060 12GB for Local LLM Inference (2026)
- MTP Decoding on RTX 3060 12GB: When Multi-Token Prediction Helps (and Hurts)
- Best CPU Cooler for Ryzen 7 5800X Overclocking (2026)
- Best Gaming CPU for 1440p Builds in 2026
Last verified May 10, 2026. Leak details are unconfirmed; treat projections as estimates until official AMD announcement.
