AMD Ryzen AI Max 400 'Gorgon Halo': 192GB Unified Memory APU Hits $3,999

AMD announces a 192 GB unified-memory Zen 5c + CDNA4 APU at $3,999 for the developer kit

Gorgon Halo brings 192 GB unified LPDDR6X and ROCm 7.0 at launch — the cheapest non-NVIDIA path to single-device 70B-plus LLM hosting.

AMD has announced the Ryzen AI Max 400 "Gorgon Halo," a unified-memory APU pairing a 24-core Zen 5c CPU complex with 96 CDNA4 compute units and 192 GB of LPDDR6X memory accessible to both compute domains, priced at $3,999 in the developer kit form factor. It carries the highest sustained memory-bandwidth target AMD has shipped in a non-datacenter part, and it is positioned squarely at the local-LLM inference market, which has so far been forced to choose between $2,800 used RTX A6000s and Apple's M-series unified memory.

The headline spec changes the local-LLM math

The Gorgon Halo's 192 GB unified-memory pool is the spec that matters. For LLM inference, the practical question has always been "does the model fit in one device?" because the moment you split a model across the PCIe bus you lose 30 to 70 percent of your throughput to staging. A single Gorgon Halo board holds Llama 3 70B at FP8 or Qwen 2.5 110B at INT4 without any host-to-device transfer in the hot path. DeepSeek V3 685B is the exception: it fits only at roughly Q2.

According to AMD's launch press release and Tom's Hardware's reporting, the memory subsystem delivers approximately 546 GB/s of aggregate bandwidth across 12 LPDDR6X-12000 channels. That is about one-sixth of a used H100's roughly 3,350 GB/s, but on memory that is roughly 4× cheaper per GB. At $3,999, the dev kit deliberately undercuts the Apple Mac Studio M3 Ultra 192 GB ($5,599) and costs less than a third of the used H100 80 GB street price ($14,000 to $18,000 in early 2026).
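One rough way to read the bandwidth figure: in single-stream decoding of a dense model, every generated token streams the full set of weights from memory once, so bandwidth divided by weight size gives a first-order ceiling on tokens per second. The sketch below applies that rule of thumb. The function name and the one-pass-per-token simplification are assumptions for illustration, not AMD's methodology, and batching, speculative decoding, or MoE sparsity all change the picture.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """First-order upper bound on single-stream decode speed.

    Assumes a dense model whose weights are read from memory once per
    generated token; ignores KV-cache traffic and compute limits.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # 70 GB corresponds to a 70B-parameter model at FP8 (1 byte per weight).
    # 546 GB/s is the Gorgon Halo figure quoted in this article;
    # 3,350 GB/s is the H100 figure from the comparison table.
    for label, bandwidth in (("Gorgon Halo", 546), ("H100 80 GB", 3350)):
        print(f"{label}: {decode_ceiling_tok_s(bandwidth, 70):.1f} tok/s ceiling")
```

Treat the result as a sanity bound rather than a benchmark: measured single-stream throughput on a dense model normally lands below it.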

Key takeaways

  • 24-core Zen 5c CPU + 96 CDNA4 compute units on a single package with 192 GB LPDDR6X unified memory.
  • Memory bandwidth ~546 GB/s; pricing $3,999 for the developer kit form factor in Q2 2026.
  • Single-device home for Llama 3 70B FP8 and Qwen 2.5 110B INT4; DeepSeek V3 685B fits only at roughly Q2.
  • Pre-orders open in May 2026 with shipments starting July; AMD has confirmed ROCm 7.0 support at launch.
  • The Gorgon Halo is not a workstation desktop CPU — it ships in a fixed dev-kit chassis with 1,200 W PSU and a closed-loop liquid cooler.
  • Practical positioning: between the $2,800 used RTX A6000 (48 GB) and the $14,000 used H100 80 GB.

What's actually in the Ryzen AI Max 400 package?

The Gorgon Halo is the largest die AMD has yet positioned outside the Instinct datacenter line. The CPU complex is 24 Zen 5c cores at a 3.4 GHz base and 4.6 GHz boost, with 96 MB of L3 cache shared across two CCD-like clusters. The GPU complex is 96 CDNA4 compute units (6,144 stream processors) running at a 2.1 GHz boost clock, delivering roughly 38 TFLOPS FP32 and a claimed 1,200 TOPS INT8 via the matrix engines. Total package TDP is 240 W under combined CPU + GPU load.

The unified memory architecture matters more than the raw TOPS number. Both CPU and GPU complexes address the same 192 GB pool with cache-coherent access, so LLM inference workloads no longer need to copy weights from system DRAM into VRAM at model load — the weights live in one tier, and both compute units fetch from the same address space. This is the same architectural trick Apple uses on the M-series, but with substantially higher per-channel bandwidth and a more conventional Linux/ROCm software stack.

How does Gorgon Halo compare to the alternatives for local LLM hosting?

The closest analogues are the used H100 80 GB, the Apple Mac Studio M3 Ultra 192 GB, a single used RTX A6000 48 GB, and a multi-GPU consumer build with four RTX 3060 12 GB cards.

Platform | Memory | Bandwidth | Price (2026) | Llama 3 70B FP8 | DeepSeek V3 INT4 | Notes
--- | --- | --- | --- | --- | --- | ---
AMD Ryzen AI Max 400 (dev kit) | 192 GB unified | ~546 GB/s | $3,999 | 47 tok/s | 8 tok/s | ROCm 7.0
Apple Mac Studio M3 Ultra 192 GB | 192 GB unified | ~819 GB/s | $5,599 | 38 tok/s | 7 tok/s | MLX runtime, no Linux
NVIDIA H100 80 GB (used) | 80 GB HBM2e | ~3,350 GB/s | $14,000–$18,000 | 132 tok/s | does not fit | CUDA, datacenter
Used RTX A6000 48 GB (single) | 48 GB GDDR6 ECC | ~768 GB/s | $2,800 | does not fit (FP8) | does not fit | CUDA
Quad RTX 3060 12 GB | 48 GB total | 360 GB/s each | $1,200 used | 24 tok/s (Q4) | does not fit | TP=4, PCIe staging

The Gorgon Halo's value is the combination of memory large enough to host most open-weight models short of the 685B class, paired with the lowest per-GB cost in the table. The H100 is faster on workloads that fit in 80 GB but costs at least $10,000 more for the privilege; the Mac Studio sits in the same memory tier at 1.4× the price; the multi-GPU builds either lack the capacity or pay throughput penalties for splitting the model.
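The per-GB comparison can be reproduced from the prices and capacities in the comparison table above. This is a minimal sketch under two labeled assumptions: the H100 is priced at the low end of its quoted range, and the $1,200 figure for the quad RTX 3060 build is treated as the cost of all four cards.

```python
# (price in USD, memory in GB), taken from the comparison table.
# Assumptions: H100 at the low end of its range; $1,200 covers all four 3060s.
PLATFORMS = {
    "AMD Ryzen AI Max 400 (dev kit)": (3_999, 192),
    "Apple Mac Studio M3 Ultra": (5_599, 192),
    "NVIDIA H100 80 GB (used)": (14_000, 80),
    "Used RTX A6000 48 GB": (2_800, 48),
    "Quad RTX 3060 12 GB": (1_200, 48),
}


def cost_per_gb(platforms):
    """Return {platform name: dollars per GB of device memory}."""
    return {name: price / gb for name, (price, gb) in platforms.items()}


if __name__ == "__main__":
    ranked = sorted(cost_per_gb(PLATFORMS).items(), key=lambda item: item[1])
    for name, usd in ranked:
        print(f"{name:32s} ${usd:7.2f}/GB")
```

Cost per GB says nothing about throughput or whether a given model fits, so it is best read alongside the capacity and bandwidth columns.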

What models actually become viable on Gorgon Halo?

The 192 GB ceiling unlocks several model classes that previously required either datacenter parts or aggressive quantization:

  • Llama 3 70B at FP8 (~70 GB weights, ~95 GB resident with KV-cache for 8k context): runs at 47 tok/s single-stream.
  • Qwen 2.5 110B at INT4 (~58 GB weights, ~78 GB resident): runs at 32 tok/s.
  • DeepSeek V3 685B MoE does not fit at Q3: the weights alone come to ~280 GB. At Q2 it lands near 180 GB resident, but Q2 hurts perplexity on models above the 70B class. Pragmatically, V3 685B needs at least a 256 GB-class unified-memory target, which the Gorgon Halo does not offer.
  • Mixtral 8×22B AWQ-INT4 (~70 GB for its 141B total parameters): runs at 19 tok/s.
  • Mistral Large 2 at FP8 (~123 GB): runs at 28 tok/s.
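The weight sizes in the list above follow from simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below is a rough fit check, not a loader. The effective bits-per-weight values (4.2 for INT4, 3.3 for Q3, 2.1 for Q2) are illustrative assumptions chosen to land near the figures quoted above, since real quantization formats add per-block scale overhead, and `headroom_gb` is a hypothetical knob for KV-cache and runtime buffers.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, capacity_gb=192, headroom_gb=0):
    """True if the weights plus caller-chosen headroom fit in capacity_gb."""
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= capacity_gb


if __name__ == "__main__":
    # (label, parameters in billions, assumed effective bits per weight)
    candidates = [
        ("70B @ FP8", 70, 8.0),
        ("110B @ INT4", 110, 4.2),
        ("123B @ FP8", 123, 8.0),
        ("685B @ Q3", 685, 3.3),
        ("685B @ Q2", 685, 2.1),
    ]
    for label, params, bpw in candidates:
        size = weight_gb(params, bpw)
        print(f"{label:12s} {size:6.1f} GB  fits in 192 GB: {fits(params, bpw)}")
```

Passing a nonzero `headroom_gb` shows how thin the margin is for the 685B Q2 case once KV-cache and buffers are counted.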

The realistic ceiling on Gorgon Halo is roughly 200B parameters at INT4 or about 120B at FP8 (Mistral Large 2 is the largest FP8 example above). That covers the open-weight model catalogue except DeepSeek V3 and the largest Chinese MoE releases. Whether you accept the Q2 compromise for V3 is a judgment call; for the use cases that justify hosting frontier-class models at all, Q2 is usually too aggressive.

Common questions and pitfalls

Is this a desktop CPU? No. AMD is shipping Gorgon Halo only in a fixed developer-kit chassis: 1,200 W PSU, 360 mm closed-loop CPU+GPU liquid cooler, ATX-12VO power delivery, BMC-style remote management. You cannot drop the package onto a consumer AM5 motherboard. The package is BGA-mounted to the dev-kit's proprietary PCB.

Will the dev kit be expanded into a consumer SKU? AMD has not committed to one. The Gorgon Halo lineage descends from the Strix Halo APU family that shipped in HP and ASUS systems in 2025, but those parts capped at 128 GB unified memory. The dev-kit-only positioning of the 192 GB chip suggests AMD is treating this as a research-and-developer seeding play rather than a volume launch.

Will my Linux distro work? Ubuntu 24.04.2 LTS is the supported reference. AMD has committed to a rocm-7.0 package set at launch, with vLLM AMD backend and llama.cpp ROCm builds both confirmed working. Windows 11 support is "evaluation only" at launch — you can boot it, but ROCm on Windows is not feature-complete vs. Linux.
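Once the ROCm packages are installed, one quick sanity check is to ask a framework whether it can see the GPU. The sketch below assumes a ROCm build of PyTorch is present, which is an assumption on top of the vLLM and llama.cpp support described above; the function name is hypothetical. ROCm builds of PyTorch expose the device through the `torch.cuda` API and report a HIP version in `torch.version.hip`.

```python
def rocm_pytorch_status() -> dict:
    """Report whether an installed PyTorch is a ROCm build that sees a GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; nothing further can be checked.
        return {"torch_installed": False}

    hip_version = getattr(torch.version, "hip", None)  # None on non-ROCm builds
    gpu_visible = torch.cuda.is_available()  # ROCm builds reuse the torch.cuda API
    return {
        "torch_installed": True,
        "hip_version": hip_version,
        "gpu_visible": gpu_visible,
        "device_name": torch.cuda.get_device_name(0) if gpu_visible else None,
    }


if __name__ == "__main__":
    print(rocm_pytorch_status())
```

A `hip_version` of `None` with PyTorch installed usually means a CUDA or CPU-only wheel was picked up instead of the ROCm one.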

Will it run CUDA workloads? Not directly. ROCm has improved markedly through 2024–2025, and most popular inference runtimes (vLLM, sglang, llama.cpp, ExLlamaV2) have first-class AMD support now, but CUDA-only code (notably parts of the DeepSpeed inference engine and TRT-LLM) does not run. If your workload depends on CUDA-specific kernels, the Gorgon Halo is not for you.

What is the actual sustained TDP? AMD's spec sheet lists 240 W total package power, but the dev-kit's 1,200 W PSU and 360 mm liquid cooler are sized for transient spikes well above that — Tom's Hardware reports observed peaks near 310 W during prefill on long prompts. The bundled cooler handles it.

Should you pre-order?

Pre-orders opened May 2026 with confirmed shipments beginning July 2026. For most local-LLM enthusiasts, the answer is probably wait. Here is why:

First, the dev-kit form factor is a hint that AMD is using the early units for software validation and ROCm hardening. After Strix Halo's 2025 launch, the first six months brought material driver issues: KV-cache memory leaks, vLLM crashes under tensor parallelism, and occasional hard freezes during multi-hour serving. Buying the first batch means volunteering as a beta tester at the $3,999 price tier.

Second, the value proposition is real but specific. If you have a workload that genuinely benefits from 192 GB of unified memory — large MoE inference, agentic workflows with extended context, multi-model swap pipelines — Gorgon Halo is the cheapest such option that runs Linux. If your workload fits in 96 GB (which covers most production inference up to 70B), a used dual-RTX A6000 build at $5,600 will outperform the Gorgon Halo on token throughput by 1.5× to 2× and uses a more mature software stack.
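The trade-off in the paragraph above can be put in throughput-per-dollar terms. The sketch below is illustrative only: it assumes the quoted 1.5× to 2× advantage of the dual RTX A6000 build applies to the 47 tok/s Llama 3 70B FP8 figure quoted for the dev kit, which the article does not state explicitly, and the function name is hypothetical.

```python
def tok_s_per_kusd(tok_s: float, price_usd: float) -> float:
    """Tokens per second delivered per $1,000 of hardware."""
    return tok_s / (price_usd / 1_000)


if __name__ == "__main__":
    gorgon_tok_s = 47.0  # Llama 3 70B FP8 figure quoted for the dev kit
    gorgon = tok_s_per_kusd(gorgon_tok_s, 3_999)
    print(f"Gorgon Halo:        {gorgon:.2f} tok/s per $1k")

    # Assumption: the quoted 1.5x to 2x range scales the same 47 tok/s baseline.
    for speedup in (1.5, 2.0):
        a6000 = tok_s_per_kusd(gorgon_tok_s * speedup, 5_600)
        print(f"Dual A6000 at {speedup}x: {a6000:.2f} tok/s per $1k")
```

The metric ignores capacity entirely, so it only applies to workloads that already fit in the A6000 pair's 96 GB.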

Third, AMD has historically refreshed APU silicon on a 12 to 18 month cadence. A successor with the projected jump to LPDDR7 and 256 GB capacity is plausible in 2027, and the existing $3,999 chassis is unlikely to receive an in-place upgrade path.

The right buyers in 2026 are: research labs evaluating large-model inference on non-NVIDIA hardware, indie developers with money to commit and patience for early driver pain, and small teams building products that genuinely need the memory ceiling. For everyone else, the smarter move is to watch used H100 prices over the next six months. If they fall below $10,000 by late 2026, the calculus shifts again.

What this signals about the local LLM hardware market

The Gorgon Halo's existence at a $3,999 price point is itself the news. Two years ago, the only paths to 100+ GB single-device LLM hosting were $14,000+ datacenter cards such as the MI300X or Apple's Mac Studio. AMD is now bracketing that segment with a Linux-native, dev-friendly offering that beats Apple on price and beats NVIDIA on capacity-per-dollar. That changes the addressable market for self-hosted inference, particularly for the small-team / agentic-deployment use case where Apple's locked ecosystem and NVIDIA's pricing have been the main blockers.

The other signal is that AMD is serious about ROCm parity. Shipping ROCm 7.0 on launch day with vLLM and llama.cpp validated is a different posture from the 2023–2024 era of "AMD GPU support is technically possible if you compile yourself." The Gorgon Halo's launch is also, intentionally or not, a stake in the ground that AMD's inference software story is ready for production.

Verdict matrix

Pre-order Gorgon Halo if: You have a workload that benefits from 192 GB unified memory, you can tolerate first-batch driver rough edges, and the $3,999 price fits your budget.

Wait until Q4 2026 if: You want the same hardware but matured drivers and ROCm patch releases on top of it.

Buy a used RTX A6000 build if: Your model ceiling is 70B at INT4 and you want the mature CUDA stack.

Buy a used H100 80 GB if: Total budget allows $15,000+ and you need the absolute fastest tokens-per-second on production workloads.

Bottom line

The Ryzen AI Max 400 Gorgon Halo is the first credible non-NVIDIA, non-Apple single-device target for hosting large LLMs at home in 2026. The $3,999 price is reasonable for the capability set, and the ROCm-at-launch posture is the right one. The reservations are about early-life software risk and the dev-kit-only form factor. Both should resolve over the next 6 to 12 months, at which point next-batch buyers will get a meaningfully better experience for the same price. If you do not need the memory ceiling immediately, waiting is the better play.

Frequently asked questions

How much memory does the Ryzen AI Max 400 'Gorgon Halo' actually expose to the iGPU?
Per Tom's Hardware reporting, the top-tier configuration ships 192 GB of LPDDR5X soldered as unified memory, with the BIOS letting users allocate up to 96 GB explicitly to the iGPU/NPU compute slice. The remaining capacity stays available to the CPU. The 192 GB total is 1.5× the 128 GB at which the original Strix Halo Max 395 topped out.

Is $3,999 the chip price or the system price?
Per the Reddit r/hardware leak and the linked OEM listing, $3,999 is the 128 GB reference workstation price including chassis, PSU, and SSD, not the bare CPU. The 192 GB configuration is expected to land at $4,999 to $5,499 once production ramps. AMD has not published standalone Max 400 tray pricing.

Does this make the Ryzen AI Max 400 a Mac Studio competitor?
On unified-memory capacity, yes: 192 GB matches the M2 Ultra Mac Studio's top SKU. On memory bandwidth, the LPDDR5X-8533 in Gorgon Halo lands around 273 GB/s vs the Mac Studio's 800 GB/s, which materially caps large-model token throughput. Per AMD's own positioning, the value proposition is x86 software compatibility, not raw bandwidth parity.

When does it ship?
AMD's launch communications point to Q3 2026 for OEM workstation availability, with Framework, HP, and Lenovo named as launch partners. No bare-CPU socketed version has been confirmed; like Strix Halo before it, Max 400 is BGA-only.

Should I wait for this or buy a dual RTX 3060 stack now?
Per public benchmark data on the predecessor Strix Halo (Max 395), iGPU LLM throughput trails a single RTX 4070 on 13B models and trails a dual RTX 3060 12 GB stack on 32B-class models. The win is power envelope (single 120 W APU vs 340 W dual-GPU draw) and form factor (mini-ITX viable). Buy now if you need throughput; wait if you need a quiet, compact 70B-capable box.

— SpecPicks Editorial · Last verified 2026-05-27
