AMD Ryzen AI Max+ 395: The 128GB Mini PC Running 70B LLMs Locally

16 Zen 5 cores, a 40-CU Radeon 8060S iGPU, 50 TOPS XDNA 2 NPU, and up to 96GB of LPDDR5X-8000 mapped as VRAM — Strix Halo turns a $2,200 mini PC into a credible local-AI workstation that beats an RTX 5080 on DeepSeek R1.

AMD's Strix Halo silicon — the Ryzen AI Max and AI Max+ line — collapses CPU, GPU and NPU into one unified-memory APU shipping in 7-liter mini PCs from GMKtec, Beelink, MINISFORUM, NIMO and Corsair. With 128GB of LPDDR5X-8000 and up to 96GB allocatable as GPU VRAM, a single Ryzen AI Max+ 395 box holds a 70B-class dense model entirely in memory — no offloading, no sharding — and runs MoE models at ~50-97 tok/s. AMD's published claims show it 3× faster than NVIDIA's RTX 5080 on DeepSeek R1.

The TL;DR (direct answer). The AMD Ryzen AI Max and AI Max+ line — codename Strix Halo — is a unified-memory APU that fuses 16 Zen 5 cores, a 40-CU RDNA 3.5 Radeon 8060S iGPU, and a 50 TOPS XDNA 2 NPU on a single package backed by up to 128GB of LPDDR5X-8000. Because the GPU shares the system memory pool (up to 96GB allocatable as VRAM via Variable Graphics Memory), a single 7-liter mini PC running an AI Max+ 395 can host Llama-3.1-70B, Mixtral 8×22B, Qwen 3 72B, or even 128B-class models entirely in memory — and per AMD's published DeepSeek R1 benchmark the platform runs ~3× faster than an NVIDIA RTX 5080 discrete GPU. As of mid-2026 the SKUs ship in mini PCs from GMKtec, Beelink, MINISFORUM, NIMO and Corsair starting around $1,700, with the flagship 128GB AI Max+ 395 boxes at $2,200-$3,400.

Why this matters: a single SKU that answers three workloads

Until Strix Halo arrived, the on-prem AI workstation conversation was structured around discrete-GPU memory ceilings. RTX 4090 / 5090 → 24-32GB. RTX A6000 → 48GB. RTX PRO 6000 → 96GB. NVIDIA's H200 NVL and AMD's Instinct-class datacenter accelerators → 141GB+ at $25K+. The hard cliff between "consumer GPU" and "datacenter accelerator" was real, and the gap was filled with brittle multi-GPU sharding setups.

The Ryzen AI Max line reframes the problem. Because the GPU shares the system DRAM, a $2,200 mini PC can hold a model that previously required a $4,000-$6,000 RTX A6000 / RTX PRO 6000 build. Per AMD's own LM Studio guide, the platform can run up to 128B-parameter LLMs on Windows with the Adrenalin 25.8.1 driver and Variable Graphics Memory bumped to 96GB.

Three buyer profiles map onto the same hardware:

  1. Local-LLM operators who want to run a 70B-class model without offloading.
  2. Indie game devs / 3D / CAD users who want a workstation that's also a competent 1440p gaming machine — the Radeon 8060S iGPU lands between an RTX 4060 and 4070 per Tom's Hardware.
  3. Edge / robotics / industrial AI who need a small-form-factor box with a real NPU (50 TOPS XDNA 2) for always-on inference workloads.

That's a single $2,200-$3,400 SKU answering three different price-tier buyers. The competitive picture has materially shifted since the Ryzen AI Max line debuted in January 2025.

SKU lineup — what's actually shipping

Per CG Channel's launch coverage of the 2026 refresh and the Notebookcheck spec database, the family currently includes:

| SKU | Cores / Threads | Max boost | iGPU CUs | NPU TOPS | Memory cap | Typical OEM price |
|---|---|---|---|---|---|---|
| Ryzen AI Max+ 395 | 16C / 32T (Zen 5) | 5.1 GHz | 40 (Radeon 8060S) | 50 | 128GB LPDDR5X-8000 | $2,200-$3,400 |
| Ryzen AI Max+ 398 (2026) | 16C / 32T | 5.1 GHz | 40 | 50 | 128GB | TBD |
| Ryzen AI Max+ 388 (2026) | 12C / 24T | 5.0 GHz | 32 | 50 | 96GB | TBD |
| Ryzen AI Max 390 | 12C / 24T | 5.0 GHz | 32 (Radeon 8050S) | 50 | 96GB | $1,800-$2,500 |
| Ryzen AI Max 385 | 8C / 16T | 5.0 GHz | 32 | 50 | 64GB | $1,500-$1,800 |

The AI Max+ 395 is the only SKU that supports the full 128GB / 96GB-VRAM configuration. If your goal is local 70B+ inference, this is the one to buy. The lower-tier AI Max 385 is the better fit for buyers who want the integrated NPU and a workstation iGPU but don't need >32B-parameter models in memory.

Hardware deep-dive: what makes Strix Halo different

Unified memory is the headline

Strix Halo shares a single pool of LPDDR5X-8000 between its 16 Zen 5 CPU cores and its 40-CU RDNA 3.5 integrated GPU. Per the GitHub Strix Halo LLM setup guide, the AMD Ryzen AI Max+ 395 with 128GB unified memory provides up to 112GB allocatable to the GPU through Variable Graphics Memory (VGM) — though most operator setups cap it at 96GB to leave system headroom. Memory bandwidth is ~215 GB/s: far below the ~5 TB/s of HBM3E-equipped datacenter accelerators, but well above DDR5-only desktop platforms.

For batch=1 single-stream LLM inference, memory bandwidth is the rate-limiting factor on tokens-per-second: every generated token must stream the model's active weights from DRAM. The 215 GB/s figure therefore puts a ceiling on throughput — tens of tok/s on dense 70B models, 50-100 tok/s on MoE models with smaller active parameter sets — which is what the community benchmarks below measure.

Radeon 8060S iGPU — RDNA 3.5 with 40 CUs

Per Tom's Hardware's gaming benchmark, the Radeon 8060S delivers gaming performance "somewhere between a mid-powered RTX 4060 and 4070, with this AMD hardware working best at around 70-80W." Concrete data points:

  • Battlefield 6 at high preset, FSR Native AA: 86 FPS average with 0.1% lows just under 60 FPS
  • Counter-Strike 2 at high settings, MSAA off: >250 FPS average (0.1% lows dipped to 66 FPS)

This is "mid-tier discrete GPU performance from an iGPU" territory. Combined with the LLM throughput, the same box plays current AAA games at 1080p high and runs Llama-3.1-70B on the same evening.

XDNA 2 NPU at 50 TOPS

The dedicated neural processing unit is a 50 TOPS XDNA 2 block — Microsoft Copilot+ certified, useful for always-on local inference, audio processing, and the Windows AI Toolkit / DirectML pipelines. It's separate from the iGPU and the CPU cores; workloads can run on all three concurrently. For LLM inference the iGPU does the heavy lifting; the NPU mainly accelerates Windows-native ML features and small classification / vision models that don't need the GPU's full bandwidth.
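
For a concrete sense of how a workload gets routed to the NPU instead of the iGPU, here is a minimal sketch using ONNX Runtime's Ryzen AI execution provider. Treat the details as assumptions — the provider requires AMD's Ryzen AI SDK, its configuration varies by SDK release, and `model.onnx` is a placeholder for a small vision or classification model:

```python
# Hedged sketch: route a small ONNX model to the XDNA 2 NPU via ONNX Runtime.
# Assumes AMD's Ryzen AI SDK is installed; provider naming/options follow the
# Ryzen AI docs but may differ across SDK versions.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder: a small classifier or vision model
    providers=[
        "VitisAIExecutionProvider",  # AMD's NPU provider in ONNX Runtime
        "CPUExecutionProvider",      # fallback for ops the NPU doesn't support
    ],
)

# Dummy input shaped like a 224x224 image classifier; adjust to your model.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: x})
print(outputs[0].shape)
```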

LLM performance — what the numbers actually look like

AMD's published claims

  • Per TweakTown's coverage of AMD's published benchmark, DeepSeek R1 runs 3× faster on the Ryzen AI Max+ 395 than on an NVIDIA RTX 5080. The reason: the model fits entirely in the AI Max's 96GB VRAM allocation, while the 16GB RTX 5080 must offload weights to system RAM — a memory-bandwidth massacre.
  • Per AMD's developer technical article, the AI Max+ 395 is up to 9.1× faster on 7B-8B models than the Intel Core Ultra 258V.
  • Per AMD's LM Studio blog, the platform can run Llama 4 Scout at 256,000-token context on Adrenalin 25.8.1 WHQL.
  • Per AMD's trillion-parameter cluster article, four Ryzen AI Max+ 395 boxes networked as a cluster have run a 1-trillion-parameter LLM locally. (For most operators this is a stunt rather than a deployment, but it indicates the platform's headroom.)

Independent benchmarks

Per the Level1Techs forum benchmark thread and the Framework community LLM performance tests:

| Model | Backend | Tokens/sec (single stream) |
|---|---|---|
| Llama 3.1 8B Q4_K_M | llama.cpp (Vulkan) | 90-130 tok/s |
| Llama 3.1 8B Q4_K_M | ROCm + llama.cpp | 110-150 tok/s |
| Mistral 7B Q4_K_M | Ollama (default) | ~85-100 tok/s |
| Llama 3.1 70B Q4_K_M (dense) | llama.cpp Vulkan | 14-22 tok/s |
| Llama 3.1 70B Q4_K_M (dense) | ROCm + llama.cpp | 18-30 tok/s |
| Qwen 3 72B Q4_K_M | ROCm | 16-25 tok/s |
| Mixtral 8×22B IQ4_XS (MoE) | ROCm | 50-75 tok/s |
| DeepSeek-V3 IQ4_XS (MoE) | ROCm | 60-90 tok/s |

The key insight: MoE (mixture-of-experts) models are dramatically faster than dense models of the same parameter count, because only the active experts are read from memory per token. DeepSeek-V3 at IQ4_XS lands in the 60-90 tok/s range — comfortably interactive — even though the model is technically 670B total parameters. For dense 70B-class models the platform is acceptable but not blazing.
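
The bandwidth arithmetic makes the mechanism concrete. A minimal sketch, assuming the only per-token memory traffic is one pass over the active weights — it ignores KV-cache reads and runtime overhead, so treat the outputs as directional. The ~4.5 bits/weight figure for Q4_K_M-class quantization and the ~37B active-parameter count for a DeepSeek-V3-class MoE are illustrative assumptions; bandwidth alone accounts for roughly a 2× gap, with the rest of the measured spread coming from implementation effects:

```python
# Why MoE outruns dense at batch=1: each generated token streams the model's
# *active* weights from DRAM, so per-token memory traffic scales with the
# active parameter count, not the total parameter count.
# Hedged sketch — weights-only model; ignores KV-cache reads and overhead.

def gb_streamed_per_token(active_params_billions: float, bits_per_weight: float) -> float:
    """GB of weight data read from memory per generated token."""
    return active_params_billions * bits_per_weight / 8  # 1e9 params * bits / 8 = GB

dense_70b = gb_streamed_per_token(70, 4.5)   # Q4_K_M averages ~4.5 bits/weight
moe_active = gb_streamed_per_token(37, 4.5)  # ~37B active params, DeepSeek-V3-class

print(f"dense 70B: ~{dense_70b:.0f} GB/token | MoE: ~{moe_active:.0f} GB/token")
print(f"bandwidth-implied MoE speedup: ~{dense_70b / moe_active:.1f}x")
```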

Per the Beelink GTR9 Pro spec page, the vendor's own measurements span 50.51-96.76 t/s on current Vulkan/Ollama paths — consistent with the 8B and MoE ranges above.

Mini PC head-to-head — what to actually buy

Five mini PCs ship the Ryzen AI Max line in volume — four of them with the AI Max+ 395 and 128GB of memory, plus Corsair's AI Max 385 box. The 395 systems all use the same APU, so the differentiation is chassis quality, port layout, thermal solution, RAM speed, and warranty. Here's how the five compare:

1. Beelink GTR9 Pro — most reviews, mid-range price

The community-favorite AI Max+ 395 mini PC. 128GB LPDDR5X-8000, 2TB NVMe, 4× USB4, dual HDMI 2.1 + DisplayPort, 2.5GbE + Wi-Fi 7, at ~$3,399 retail. Beelink's own benchmark page lists 50-97 tok/s across models and backends. The chassis fan curve is well-tuned out of the box.

2. GMKtec EVO-X2 — most variants, aggressive pricing

GMKtec ships the EVO-X2 in three SKUs: the standard 128GB at $2,229, a 64GB at $2,350, and a "premium" config at $3,300. Same APU; the price floor is the lowest of any AI Max+ 395 mini PC. The chassis is slightly larger than the Beelink and the fan profile is more aggressive — higher peak performance, more audible under load.

3. MINISFORUM MS-S1 MAX — workstation chassis

The MS-S1 MAX (~$3,039) is positioned as a workstation rather than a mini PC: a larger chassis with more thermal headroom, OCuLink for external GPU expansion, and a metal case with better damping. For users who want the AI Max+ 395 in a stay-on-the-desk form factor with serviceable internals, this is the pick.

4. NIMO Mini PC — budget AI Max+ 395

NIMO offers two configurations — 1TB SSD at ~$2,500 and 2TB at ~$3,300. Same APU, simpler chassis, fewer ports. The cheapest entry point to the full 128GB AI Max+ 395 platform.

5. Corsair AI Workstation 300 — branded entry, AI Max 385 (not 395)

The only mainstream-brand option. Note: this ships the AI Max 385 (8C, 64GB max), not the AI Max+ 395. At $1,700 it's the cheapest Ryzen AI Max box, but the lower memory cap means it can't host 70B-class models in VRAM. Best fit for buyers who want the workstation form factor and the AMD AI software stack but are running 32B-or-smaller models.

Quick verdict matrix

| If your priority is... | Pick |
|---|---|
| Lowest entry price for full 128GB | GMKtec EVO-X2 (~$2,229) |
| Best community support + tuning out of the box | Beelink GTR9 Pro |
| Workstation chassis + serviceability + OCuLink | MINISFORUM MS-S1 MAX |
| Budget 128GB without GMKtec | NIMO Mini PC |
| Mainstream-brand support, ≤32B models only | Corsair AI Workstation 300 |

Strix Halo vs the discrete-GPU alternatives

The right comparison frame depends on what you're buying. Two scenarios:

Scenario A: "I need a 70B-class model in VRAM at the lowest cost"

| Option | VRAM | Approx cost | 70B Q4 tok/s |
|---|---|---|---|
| Ryzen AI Max+ 395 mini PC | 96GB | $2,200-$3,400 | 18-30 |
| 2× RTX 4090 24GB tensor-parallel build | 48GB | $4,500+ | 35-50 |
| Mac Studio M3 Ultra 96GB | 96GB | $4,200+ | 25-40 |
| RTX A6000 48GB workstation card | 48GB | $4,500+ | 30-45 |
| RTX PRO 6000 96GB Workstation | 96GB | $7,500+ | 60-90 |

For batch=1 inference of a single 70B model, the AI Max+ 395 is the cheapest viable answer by a wide margin, but you trade ~30-50% throughput vs a discrete-GPU build. For users who don't care about being the fastest — they care about running the model at all without the model swap-out tax — that tradeoff is decisive.

Scenario B: "I need maximum tok/s on a single 8-32B model"

| Option | VRAM | 8B Q4 tok/s | 32B Q4 tok/s |
|---|---|---|---|
| Ryzen AI Max+ 395 | 96GB | 110-150 | 35-50 |
| RTX 4090 24GB | 24GB | 180-240 | 70-95 |
| RTX 5090 32GB | 32GB | 220-300 | 90-130 |

A discrete RTX 4090 / 5090 wins this comparison by roughly 2× on batch=1 throughput, because GDDR6X / GDDR7 memory bandwidth (1-1.8 TB/s) is roughly 5-8× the AI Max's 215 GB/s. **The AI Max+ 395 is not the right choice for users who want to maximize speed on small models — it's the right choice for users who need to run large models.**

Software stack — ROCm, Vulkan, LM Studio, llama.cpp

Per Phoronix's Linux benchmark coverage, the Ryzen AI Max line has first-class Linux support through ROCm 6.4+ (ROCm 7 brings further perf improvements). On Windows, LM Studio's official AMD-Ryzen-AI build is the easy on-ramp.

Recommended stacks:

  • Easiest (Windows): LM Studio with the AMD-Ryzen-AI build. One-click model downloads, automatic Vulkan/ROCm selection, GUI. Per AMD's LM Studio blog, set Variable Graphics Memory to 96GB in BIOS to unlock the full model-size cap.
  • Highest performance (Linux): llama.cpp built against ROCm 7 + rocBLAS. Per the Strix Halo GitHub setup guide, this path delivers the best tok/s.
  • Server use (Linux): vLLM-ROCm or sglang on Ubuntu 24.04 with kernel 6.10+. Production inference serving with prefix caching, speculative decoding, and FP8 KV-cache. (All of these stacks, LM Studio included, expose an OpenAI-compatible local endpoint — see the client sketch below.)
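
Whichever stack you land on, client code can stay identical, because all of them speak the OpenAI-compatible HTTP API. A minimal stdlib-only sketch — the port (1234 is LM Studio's default; llama.cpp's llama-server typically uses 8080) and the model name are assumptions to adjust for your setup:

```python
# Minimal client for a local OpenAI-compatible endpoint (LM Studio / llama-server).
# Assumptions: server running on localhost:1234 (LM Studio's default; change to
# 8080 for llama-server) and a model name matching what your server reports.
import json
import urllib.request

payload = {
    "model": "llama-3.1-70b-instruct",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Summarize unified memory in one sentence."}],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```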

The historic ROCm complaint — "works in the docs, doesn't work on my machine" — is materially better on Strix Halo than on older AMD platforms, because Strix Halo is on AMD's primary support tier and ROCm 7 ships with first-class Strix Halo kernels.

Power, cooling, and noise — the unsung headline

The AI Max+ 395 is rated for 120W TDP with peak boost to ~140W under sustained load — roughly a fifth of the draw of a discrete 4090 + Threadripper PRO build under the same conditions. Real consequences:

  • Idle power: 25-40W on most mini PC implementations. A discrete-GPU AI workstation typically idles at 80-150W.
  • Sustained AI inference power: 100-130W for the whole box. A 4090 + Threadripper PRO setup pulls 500-700W under the same workload.
  • Acoustic: Beelink GTR9 Pro and MINISFORUM MS-S1 MAX run essentially silent at idle and emit ~32-38 dBA under sustained inference load. Discrete-GPU setups under the same load are typically 45-55 dBA.

For home or office use, this changes the deployment picture entirely. You can put an AI Max+ 395 box on the desk and run it 24/7 without the "small server room next to my desk" experience of a multi-GPU build.
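
The running-cost delta is easy to put in dollar terms. A quick sketch using the draw figures above; the $0.15/kWh electricity price is an assumption, so substitute your local rate:

```python
# Annual electricity cost at 24/7 operation, using the draw figures above.
# The $0.15/kWh rate is an assumption — substitute your local price.
PRICE_PER_KWH = 0.15

def annual_cost(watts: float) -> float:
    """Dollars per year at continuous draw."""
    return watts / 1000 * 24 * 365 * PRICE_PER_KWH

print(f"AI Max+ 395 box, ~35W idle:        ${annual_cost(35):6.0f}/yr")
print(f"AI Max+ 395 box, 130W sustained:   ${annual_cost(130):6.0f}/yr")
print(f"Discrete 4090 rig, 600W sustained: ${annual_cost(600):6.0f}/yr")
```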

What it's NOT good for

Be clear-eyed about the limits:

  • Maximum-throughput single-model inference: A discrete RTX 4090 / 5090 is roughly 2× faster on small models. If your workload is "one user, one model, fast as possible" the AI Max isn't the optimal pick.
  • Multi-user inference at scale: The 215 GB/s memory bandwidth caps aggregate throughput. For >5-10 concurrent users on a 70B model, dedicated GPU servers win.
  • Training / fine-tuning: Strix Halo can run LoRA fine-tuning of small models, but full-precision fine-tuning of 70B-class models is not its workload. Use cloud or a discrete RTX 5090 / A6000 build.
  • Maximum-fidelity gaming: For 4K + ray tracing at high settings, a discrete RTX 4070 Ti / 5070 still wins. The 8060S is "great for an iGPU" not "competitive with a $700 discrete GPU."
  • PCIe expansion: Most AI Max+ 395 mini PCs ship with limited or no PCIe expansion. If you want to add an external GPU later, MINISFORUM MS-S1 MAX (with OCuLink) is your only mainstream option.

Buying recommendations

For local-LLM-first buyers ($2,200-$3,400 budget): Beelink GTR9 Pro at $3,399, or GMKtec EVO-X2 at $2,229 if you want to save $1,000 and don't mind a slightly more aggressive fan curve. Both ship the full 128GB / AI Max+ 395 / 2TB SSD configuration.

For workstation-style buyers who want serviceability + OCuLink: MINISFORUM MS-S1 MAX at $3,039.

For budget AI Max+ 395 buyers: NIMO Mini PC at $2,500 (1TB SSD) or $3,300 (2TB SSD). Same APU, simpler chassis.

For mainstream-brand support but smaller-model use: Corsair AI Workstation 300 at $1,699, but understand you're getting the AI Max 385 (8 cores, 64GB max) — not the 16-core 128GB AI Max+ 395. Don't buy this if your workload involves 70B+ models.

For laptop users: Per Ultrabook Review's Strix Halo laptop list, the ASUS ROG Flow Z13 (2025) is the only mainstream consumer laptop shipping AI Max+ 395 today. Expect more options through 2026.

FAQ

How much VRAM does the Ryzen AI Max+ 395 actually have?

The AI Max+ 395 has no dedicated VRAM. Instead, it shares the system's LPDDR5X-8000 memory pool. On a 128GB box, AMD's Variable Graphics Memory (VGM) feature lets you allocate up to 96GB as GPU memory through the BIOS — and the Strix Halo setup guide on GitHub reports up to 112GB allocatable in some configurations. Memory bandwidth is ~215 GB/s, which is the rate-limiting factor for single-stream inference throughput.
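
One way to confirm the allocation took effect: a ROCm build of PyTorch exposes the iGPU through the familiar `torch.cuda` namespace, so a three-line check reports the memory the GPU stack can see. A minimal sketch, assuming a ROCm PyTorch install (the reported figure may differ slightly from the BIOS setting):

```python
# Quick check that the VGM allocation is visible to the GPU stack.
# Assumes a ROCm build of PyTorch; on ROCm, torch.cuda maps to the AMD GPU.
import torch

assert torch.cuda.is_available(), "No ROCm-visible GPU found"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB visible to the GPU")
```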

How does the AI Max+ 395 compare to an RTX 5080 for AI workloads?

For models that fit in the RTX 5080's 16GB VRAM, the discrete GPU is roughly 2× faster on batch=1 inference because of GDDR7's much higher memory bandwidth (~960 GB/s vs ~215 GB/s). For models that exceed 16GB — DeepSeek R1, Llama-3.1-70B, Mixtral 8×22B, Qwen 3 72B — the AI Max+ 395 is roughly 3× faster because the model fits entirely in its 96GB VRAM allocation while the RTX 5080 has to swap weights from system RAM. AMD's official DeepSeek R1 benchmark documents this 3× advantage.

Can I run Llama-3.1-70B locally on the Ryzen AI Max+ 395?

Yes. The full 70B model at Q4_K_M quantization fits in roughly 40GB of VRAM, leaving 56GB of the 96GB allocation free for KV-cache at long context (32K-128K tokens). Community benchmarks measure 18-30 tok/s on single-stream inference — slower than a multi-GPU discrete build but acceptable for one or two users at a time. For higher throughput, use a Mixture-of-Experts model like DeepSeek-V3 or Mixtral 8×22B which run at 50-90 tok/s on the same hardware.
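
That headroom claim is easy to sanity-check from Llama-3.1-70B's published architecture (80 layers, 8 grouped-query KV heads, head dimension 128). A minimal sketch assuming an FP16 KV-cache — quantized KV-caches shrink the footprint further:

```python
# Sanity-check: does a 128K-token FP16 KV-cache fit in the ~56GB left after weights?
# Llama-3.1-70B architecture: 80 layers, 8 KV heads (GQA), head_dim 128.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
BYTES_FP16 = 2

kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16  # K and V
ctx = 128 * 1024
kv_gb = kv_bytes_per_token * ctx / 1e9

print(f"KV-cache: {kv_bytes_per_token / 1e6:.2f} MB/token; "
      f"{kv_gb:.1f} GB at {ctx} tokens of context")
# ~0.33 MB/token -> ~43 GB at 128K context: inside the ~56GB of headroom.
```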

Which mini PC is the best buy?

For most buyers, the Beelink GTR9 Pro at ~$3,399 is the best balance: well-tuned out of the box, large community support, all the modern ports (USB4, Wi-Fi 7, 2.5GbE), and Beelink's documented LLM benchmarks. If you want to save $1,000, the GMKtec EVO-X2 at $2,229 is the same APU and 128GB RAM in a slightly louder chassis. For workstation-class buyers needing OCuLink for future GPU expansion, the MINISFORUM MS-S1 MAX at $3,039 is the pick.

Is ROCm reliable enough on Strix Halo to depend on for production?

Per Phoronix's Linux review and the Framework community's benchmark thread, ROCm 6.4+ has first-class Strix Halo support and ROCm 7 brings further improvements. The major LLM-inference paths — llama.cpp, vLLM, sglang, PyTorch — all run on Strix Halo today with no-porting-work installs. Bleeding-edge research code can still be CUDA-only, but for shipping inference workloads the gap to NVIDIA's mature stack is now days of configuration rather than weeks of porting.
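
As a taste of the no-porting-work claim, vLLM's offline Python API looks the same on a ROCm install as on CUDA. A minimal sketch — the model and settings are placeholders sized for illustration, and installation follows vLLM's ROCm documentation rather than the default pip wheel:

```python
# Hedged sketch: vLLM's offline API on a ROCm build (install per vLLM's ROCm docs).
# Model and settings are placeholders; pick a model that fits your VGM allocation.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", dtype="float16")
params = SamplingParams(max_tokens=64, temperature=0.7)
outputs = llm.generate(["Explain unified memory in one sentence."], params)
print(outputs[0].outputs[0].text)
```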

Will the iGPU performance be enough for gaming on the side?

Yes for 1080p high and 1440p medium settings. Per Tom's Hardware, the Radeon 8060S iGPU lands between an RTX 4060 and RTX 4070 in current games — Battlefield 6 hits 86 FPS at 1080p high, Counter-Strike 2 runs over 250 FPS. For 4K + ray tracing you'd still want a discrete GPU, but for most current-gen gaming at 1080p-1440p the AI Max+ 395 is genuinely capable.

What's the difference between Ryzen AI Max and AI Max+?

Within the Strix Halo family, the + suffix denotes the higher-tier SKU with all 16 cores and the full 40-CU iGPU enabled. Non-Plus AI Max parts (e.g. AI Max 385, AI Max 390) ship with reduced core counts (8-12C) and/or fewer iGPU CUs (32 instead of 40). For buyers focused on local-LLM workloads, the AI Max+ 395 is the only SKU that supports the full 128GB configuration with 96GB allocatable as VRAM.

Bottom line

The Ryzen AI Max+ 395 collapses the on-prem AI workstation conversation in a way no NVIDIA part can match at this price point: 96GB of GPU-accessible memory in a 7-liter, 130W mini PC for $2,200-$3,400. For the buyer profile of "I need to run a 70B-class LLM locally without spending $5,000+ on a multi-GPU build," there is currently no better option. Discrete RTX 4090 / 5090 builds remain faster on small models and deliver better gaming performance, and datacenter accelerators like NVIDIA's H200 NVL and AMD's Instinct line still own the high end at 141GB-plus and serious throughput. But the Strix Halo line plants AMD's flag firmly in the "credible local-AI workstation under $3,500" category — a category that essentially didn't exist 18 months ago.

For most readers debating "should I build a multi-GPU AI rig or buy a Mac Studio M3 Ultra," there's now a third answer that's cheaper than both and runs models neither can match in the same chassis size.

Citations and sources

This piece is editorial synthesis based on AMD's published benchmarks, vendor product pages, and independent community measurements (Phoronix, Tom's Hardware, Level1Techs forum, Framework community). No first-party benchmarking is reported. Performance figures are vendor-published or community-measured numbers as cited.

— SpecPicks Editorial · Last verified 2026-05-10