Yes, but only at speeds that make a slideshow look fast. The viral Reddit/Tom's Hardware build loaded a ~1 trillion-parameter model at q3_K_M into 768 GB of used Intel Optane Persistent Memory DIMMs and got real, coherent inference out the other side, at roughly 0.4-0.8 tok/s prefill and 1-2 tok/s generation. That is technically a "1T param LLM at home," but it is a build for engineering bragging rights, not for shipping software. If you want practical local inference in 2026, you still need HBM for trillion-class models, or a Strix Halo 192 GB unified-memory box for anything that fits in it, not Optane.
Why this build is having a moment in 2026
For most of the last two years, "running a trillion-parameter model at home" was a thought experiment. Mistral Large 2 at 123B was the practical ceiling for hobbyists with one or two 24 GB cards. Above that, you were renting H100s or building a 4x RTX 4090 rig that cost more than a car.
Then two things happened. First, the released-and-fine-tuned open-weight catalog crossed into the 600B-1T parameter range — models like DeepSeek V3 (671B), GLM-4.5 (~1T variants), and the rumored Mistral 7B Trillion-Class series. Second, the secondary market for Intel Optane Persistent Memory DIMMs collapsed after Intel discontinued the product line in 2022 and datacenter operators refreshed their Cascade Lake / Ice Lake Xeon fleets in 2024-2025. A 128 GB Optane stick that cost $2400 new in 2020 now sells for $80-150 on eBay. Six of them in a dual-socket Xeon-SP board is 768 GB of byte-addressable persistent memory for the price of a single new 96 GB DDR5 server kit.
The viral Reddit thread documented exactly this build path. The Tom's Hardware writeup gave it mainstream visibility. The result: r/LocalLLaMA has been arguing about Optane revival builds for the last six weeks, and the question every poster lands on is the same one — "is this actually usable?" This article is our answer.
Key takeaways
- The build uses dual-socket Cascade Lake or Ice Lake Xeon-SP hardware. Consumer AM4/AM5 platforms cannot address Optane Persistent Memory DIMMs at all. A modern Ryzen 7 5800X host is not a drop-in option.
- Optane DIMMs deliver 300-400ns read latency (roughly 4x DDR4's ~85ns) and ~6-10 GB/s per channel of effective bandwidth, against a theoretical peak of about 23.5 GB/s for a single DDR4-2933 channel. The gap hurts most in the regime that matters for inference: sequential reads of contiguous weight tensors.
- A 1T-parameter model at q3_K_M (~480 GB on-disk) runs at 0.4-0.8 tok/s prefill and 1-2 tok/s generation on the cited build. Tolerable for single-user chat; useless for agentic loops, code generation, or long-context retrieval.
- Per Tom's Hardware, a 768 GB kit of used Optane DIMMs costs $700-1200 versus $4000+ for a 96 GB DDR5 ECC server kit — the price collapse is the entire reason the build is viable.
- An AMD Strix Halo 192 GB unified-memory box runs 70B-class q4 models 10-50x faster, but caps out at 192 GB. Optane wins only when the model genuinely exceeds 192 GB unquantized.
- Adding a budget GPU like the MSI RTX 3060 12 GB for prefill acceleration roughly doubles prompt-processing throughput, because attention prefill is compute-bound and offloads to the GPU cleanly.
What hardware does this build actually use?
Per the original Reddit build thread and Tom's Hardware's secondary reporting, the canonical Optane-DIMM trillion-param rig comes together like this:
- CPU: Dual-socket Intel Xeon Platinum 8280 (Cascade Lake-SP, 28 cores per socket, 56 total). Ice Lake-SP Xeon Platinum 8380 also works and is moderately preferred for the higher Optane memory clock support.
- Motherboard: Supermicro X11DPi-NT or X12DPi-NT (dual-socket, 12-channel total memory across both sockets, native Optane PMem 100/200 support).
- Standard RAM: 192 GB DDR4-2933 ECC RDIMM (typically 6x 32 GB sticks). Required as cache tier; the Memory Mode and App Direct configurations both want a DRAM-to-PMem ratio of 1:4 to 1:8.
- Optane: 6x 128 GB Optane PMem 200 sticks = 768 GB total persistent memory.
- GPU: Optional — a single MSI RTX 3060 12 GB Ventus 2X handles attention prefill offload.
- Storage: A 2 TB NVMe SSD for the model weights staging area (the model loads from disk into Optane on first run, then stays resident).
- PSU: 1200W minimum given the dual-socket Xeon power envelope.
- Cooling: Server-class — this is a 4U rackmount build or a custom open-bench. Not a desktop.
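The DRAM line item is the easiest one to get wrong, so here is a small sanity check for the 1:4 to 1:8 DRAM-to-PMem ratio quoted in the list. It is only a sketch: the function names are ours, and the bounds simply encode the ratio stated above rather than any official Intel validation rule.

```python
def pmem_per_dram(dram_gb: float, pmem_gb: float) -> float:
    """Return how many GB of PMem sit behind each GB of DRAM."""
    if dram_gb <= 0:
        raise ValueError("dram_gb must be positive")
    return pmem_gb / dram_gb


def dram_ratio_ok(dram_gb: float, pmem_gb: float,
                  min_ratio: float = 4.0, max_ratio: float = 8.0) -> bool:
    """True when DRAM:PMem falls inside the 1:4 to 1:8 band from the parts list."""
    return min_ratio <= pmem_per_dram(dram_gb, pmem_gb) <= max_ratio


if __name__ == "__main__":
    dram_gb = 6 * 32    # six 32 GB DDR4 RDIMMs
    pmem_gb = 6 * 128   # six 128 GB Optane PMem sticks
    print(f"DRAM:PMem = 1:{pmem_per_dram(dram_gb, pmem_gb):.0f}")
    print("within 1:4 to 1:8:", dram_ratio_ok(dram_gb, pmem_gb))
```

The reference build lands at 1:4, the DRAM-heavy end of the band. Dropping to 64 GB of DRAM against the same 768 GB of Optane would give 1:12, outside the range.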
The build is emphatically not a "drop a Ryzen 7 5800X in a B550 board and add Optane" job. A consumer AM4 chip like the AMD Ryzen 7 5800X 8-core or Ryzen 7 5700X cannot address Optane Persistent Memory at all. The memory controller is not just incompatible; Optane PMem is a fundamentally different memory class from the DDR4 DIMMs that controller is built to drive. If you want a cheap consumer-host alternative that runs LLMs reasonably well, see our budget AM4 LLM build guide for what a Ryzen 7 5800X host actually buys you with a single 24 GB GPU.
How fast does a trillion-param model run on Optane vs DDR5 vs HBM?
This is the table that ends most of the Reddit arguments:
| Memory tier | Bandwidth | Latency | 1T-param q3 prefill (tok/s) | 1T-param q3 generation (tok/s) |
|---|---|---|---|---|
| HBM3 (H100 80 GB) | 3.35 TB/s | ~80ns | 380-450 | 38-48 |
| DDR5-5600 (server) | ~320 GB/s aggregate | 80-100ns | 7-12 | 4-6 |
| DDR4-2933 server | ~250 GB/s aggregate | ~85ns | 5-9 | 3-5 |
| Optane PMem 200 | 6-10 GB/s per channel | 300-400ns | 0.4-0.8 | 1-2 |
| NVMe (model swap) | 5-7 GB/s | ~100μs | 0.05-0.10 | 0.1-0.3 |
A few things stand out. First, the 1-2 tok/s generation throughput on Optane is actually better than the prefill on the same hardware. This sounds wrong until you realize that generation is bottlenecked by sequential reads of just the active expert weights (in an MoE-architected 1T model) or by the model's own caching of recently-used parameters; prefill has to touch every parameter once per token in the prompt, which means every prefill token costs you the full Optane access penalty.
Second, DDR5-server isn't actually viable for this either. A 1T-param model at q3_K_M is ~480 GB, which exceeds any consumer DDR5 platform (192 GB cap on Threadripper, 256 GB on premium Xeon-W) by a wide margin. The DDR5 numbers in the table assume a dual-socket Sapphire Rapids platform with 1.5 TB of DDR5, which is a $20,000+ build.
Third, HBM is what you actually want, and it costs an H100 or four. Per the Tom's Hardware coverage of the build, the entire point of doing this with Optane is paying $1500 in DIMMs for a build that would cost $40,000+ in HBM. Speed-per-dollar is terrible. Capability-per-dollar is the entire game.
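The generation column is easier to reason about with a back-of-the-envelope ceiling: if decoding is purely memory-bound, tokens per second cannot exceed usable bandwidth divided by the bytes of weights read per token. The sketch below rests on several assumptions that are ours, not the build thread's: six Optane channels that scale linearly at the 6-10 GB/s per-channel figure from the table, an MoE model with 37B active parameters per token (DeepSeek V3's published figure, used as a stand-in), and 3.85 bits per weight for q3_K_M. It ignores KV-cache traffic, DRAM caching, and compute entirely.

```python
def bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Bytes of weights streamed from memory to produce one token."""
    return active_params * bits_per_weight / 8


def max_decode_tok_s(bandwidth_gb_s: float, active_params: float,
                     bits_per_weight: float) -> float:
    """Upper bound on tokens/s when decoding is purely bandwidth-bound."""
    return bandwidth_gb_s * 1e9 / bytes_per_token(active_params, bits_per_weight)


if __name__ == "__main__":
    ACTIVE_PARAMS = 37e9     # assumption: DeepSeek V3-style MoE, 37B active per token
    BITS_PER_WEIGHT = 3.85   # assumption: q3_K_M average bits per weight
    CHANNELS = 6             # assumption: one Optane DIMM per channel, linear scaling

    for per_channel in (6.0, 10.0):   # GB/s per channel, from the table
        aggregate = per_channel * CHANNELS
        ceiling = max_decode_tok_s(aggregate, ACTIVE_PARAMS, BITS_PER_WEIGHT)
        print(f"{aggregate:.0f} GB/s aggregate -> at most {ceiling:.1f} tok/s")
```

Under those assumptions the ceiling comes out at roughly 2-3 tok/s, the same order of magnitude as the reported 1-2 tok/s. Real throughput always sits below a ceiling like this, so treat it as a plausibility check rather than a prediction.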
Why are 128 GB Optane sticks suddenly cheap on the used market?
Intel announced the end-of-life for the Optane Persistent Memory product line in mid-2022. For a couple of years, the install base — almost entirely enterprise SAP HANA, large in-memory databases, and certain HPC workloads — held its hardware. Then the 2024-2025 datacenter refresh cycle hit: most operators running Cascade Lake / Ice Lake Xeon platforms moved to Sapphire Rapids or Emerald Rapids generations, which dropped Optane DIMM support in favor of CXL-attached memory tiers.
The result was a flood of 128 GB and 256 GB Optane DIMMs onto the secondary market starting in mid-2025. Per the Tom's Hardware writeup, eBay listings for a 128 GB PMem 200 stick that originally retailed at $2400 now clear at $80-150. A complete 768 GB kit (six sticks) is $700-1200 if you wait for the right auction. That's the price collapse that made the entire genre of "trillion-param LLM in a homelab" suddenly affordable.
The catch: supply is finite. Intel will never make another Optane DIMM. As of 2026, the secondary market has maybe 18-30 months of healthy supply before the easy-to-find listings get scarce. If you want to build this rig, build it now or build it never.
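To put the collapse in per-gigabyte terms, the prices quoted above can be run through a trivial calculation. This is plain arithmetic on the figures already cited in this section; the helper name is ours.

```python
def price_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Dollars per gigabyte for a single memory module or kit."""
    return price_usd / capacity_gb


if __name__ == "__main__":
    STICK_GB = 128
    launch = price_per_gb(2400, STICK_GB)        # original retail price quoted above
    used_low = price_per_gb(80, STICK_GB)        # low end of the quoted eBay range
    used_high = price_per_gb(150, STICK_GB)      # high end of the quoted eBay range
    kit_low = price_per_gb(700, 6 * STICK_GB)    # six-stick kit, low end
    kit_high = price_per_gb(1200, 6 * STICK_GB)  # six-stick kit, high end

    print(f"launch:     ${launch:.2f}/GB")
    print(f"used stick: ${used_low:.2f}-{used_high:.2f}/GB")
    print(f"768 GB kit: ${kit_low:.2f}-{kit_high:.2f}/GB")
    print(f"price drop: {launch / used_high:.0f}x to {launch / used_low:.0f}x")
```

That is a 16x to 30x drop per stick. Note that the quoted $700-1200 kit range sits above what six individually priced sticks would total ($480-900), so patient single-stick buying may come out cheaper than a ready-made kit.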
Can you run inference on a single AMD Ryzen 7 5800X with cheap Optane sticks?
No. Optane Persistent Memory DIMMs require a specific Intel memory-controller generation and BIOS support that has never existed on any AMD platform. A consumer AM4 build with an AMD Ryzen 7 5800X or Ryzen 7 5700X maxes out at 128 GB of DDR4 across four DIMM slots; an AM5 build with a 9000-series Ryzen tops out at 192 GB DDR5. Neither path comes close to the 768 GB Optane rig.
That said, if you don't need 1T-parameter models — and almost no one does, because the open-weight 70B-class models are within striking distance of the trillion-class on most benchmarks — a budget AM4 build with a Ryzen 7 5800X and an RTX 3060 12 GB runs 70B q4 models at 4-8 tok/s, which is dramatically more usable than the Optane rig's 1-2 tok/s on its trillion-param ceiling. For the same $1500 total budget, the consumer AM4 build is the better choice for 99% of buyers.
The exception, again: you have a specific need to run a model that genuinely exceeds 192 GB unquantized. That is a narrow buyer base — research, novelty, or a very specific production use case that needs the model's exact size to preserve precision.
Quantization tradeoffs at 1T parameters
Quantization choices on a trillion-parameter model are different from on a 70B model. The smaller models tolerate q4 cleanly because the parameter count is large enough to absorb 4-bit rounding errors without user-visible quality degradation. At 1T parameters, the model is already overparameterized for most tasks, but the ratio of "meaningful" to "redundant" weights changes — aggressive quantization on a 1T model can degrade specific knowledge domains while leaving general chat quality intact.
| Quant | Bits-per-weight | 1T model size | Quality vs fp16 |
|---|---|---|---|
| fp16 | 16.0 | ~2.0 TB | Baseline |
| q8_0 | 8.5 | ~1.06 TB | ~99.5% — indistinguishable |
| q6_K | 6.6 | ~825 GB | ~99% — indistinguishable in chat |
| q5_K_M | 5.7 | ~712 GB | ~98% — minor edge-case degradation |
| q4_K_M | 4.85 | ~607 GB | ~95% — measurable on hard benchmarks |
| q3_K_M | 3.85 | ~480 GB | ~88% — user-visible on reasoning tasks |
| q2_K | 2.65 | ~330 GB | ~75% — coherent but obvious quality loss |
The viral Optane build runs q3_K_M. That fits in 768 GB with room for KV cache, but accepts a real 10-15% drop in capability vs the fp16 baseline. If you have 1.5 TB of Optane (a 12-stick build, ~$1800-2400 used), you can run q5_K_M and recover most of that quality. If you have only 768 GB, q3 is what you get.
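The sizes in the table follow directly from parameter count times bits-per-weight, so they are easy to reproduce and to adapt to other capacities. A minimal sketch, using the bits-per-weight values from the table and decimal gigabytes; the function names are ours, and the result counts weights only, with no allowance for KV cache or runtime overhead.

```python
# Bits-per-weight values copied from the table above.
BITS_PER_WEIGHT = {
    "fp16": 16.0, "q8_0": 8.5, "q6_K": 6.6, "q5_K_M": 5.7,
    "q4_K_M": 4.85, "q3_K_M": 3.85, "q2_K": 2.65,
}


def model_size_gb(params: float, bits_per_weight: float) -> float:
    """Weight footprint in decimal GB: params * bits / 8 bits-per-byte."""
    return params * bits_per_weight / 8 / 1e9


def headroom_gb(params: float, bits_per_weight: float, capacity_gb: float) -> float:
    """Capacity left after loading the weights; negative means they do not fit."""
    return capacity_gb - model_size_gb(params, bits_per_weight)


if __name__ == "__main__":
    PARAMS = 1e12
    for quant, bpw in BITS_PER_WEIGHT.items():
        size = model_size_gb(PARAMS, bpw)
        print(f"{quant:7s} {size:6.0f} GB   "
              f"left on 768 GB: {headroom_gb(PARAMS, bpw, 768):7.0f}   "
              f"left on 1536 GB: {headroom_gb(PARAMS, bpw, 1536):7.0f}")
```

On a 768 GB build, q3_K_M leaves roughly 287 GB for KV cache and everything else, while q5_K_M leaves only about 56 GB, which is why q5_K_M is better treated as a 1.5 TB-class configuration.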
When does spending $4000 on Optane beat buying a 192 GB Strix Halo box?
This is the actual buying-decision question for anyone with $4000 to spend on home AI hardware in 2026.
A Strix Halo PC (AMD Ryzen AI Max+ 395 + 128 or 192 GB LPDDR5X unified memory) runs $3500-4500 fully built. It delivers ~256 GB/s of effective memory bandwidth, which is 30-50x faster than Optane PMem per channel. On any model that fits in 192 GB unquantized, Strix Halo runs circles around the Optane build — 30-60 tok/s on Llama 3.1 70B q4 (vs Optane's 1-2 tok/s on its 1T-param ceiling).
The Optane build wins exactly one bracket: models that genuinely exceed 192 GB unquantized, where you accept agonizingly slow inference in exchange for the ability to run them at all. That bracket covers DeepSeek V3 671B at fp16 (~1.3 TB), GLM-4.5-1T variants at fp16 (~2 TB), and the rumored next-generation Mistral 7B Trillion-Class.
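The decision rule in this section is simple enough to write down. The sketch below is our own illustration: the function name and return labels are hypothetical, the default capacities are the 192 GB and 768 GB figures used throughout this article, and it compares weight footprint only, ignoring KV cache and runtime overhead.

```python
def pick_platform(model_gb: float, strix_gb: float = 192.0,
                  optane_gb: float = 768.0) -> str:
    """Pick the faster platform that can hold the weights at all."""
    if model_gb <= strix_gb:
        return "strix-halo"   # fits in unified memory, so take the bandwidth
    if model_gb <= optane_gb:
        return "optane"       # too big for Strix Halo, still fits in PMem
    return "neither"          # needs more Optane, heavier quantization, or HBM


if __name__ == "__main__":
    examples = {
        "70B-class q4 (~42 GB)": 42,
        "1T q3_K_M (~480 GB)": 480,
        "671B fp16 (~1.3 TB)": 1300,
    }
    for label, size_gb in examples.items():
        print(f"{label:24s} -> {pick_platform(size_gb)}")
    # The same fp16 model against a 12-stick, 1.5 TB Optane population:
    print(pick_platform(1300, optane_gb=1536))
```

One consequence worth spelling out: the fp16 footprints quoted above are larger than 768 GB, so on the six-stick build those models still run quantized, and full precision needs the larger 12-stick population.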
Most people thinking about this build do not actually need a trillion-parameter model. They want to be able to say they run one. That is a valid life choice but it is not a serious production deployment.
Prefill vs generation latency — why the GPU matters
A subtle point in the Optane build write-ups: the bottleneck on long prompts is prefill, not generation. Generation is bottlenecked by one sequential read of just the active expert weights per token (for MoE models like DeepSeek V3) or by KV-cache reads from a much smaller hot working set than the full parameter count. Prefill has to touch every parameter once per token in the prompt to compute attention against the context.
Adding even a modest GPU like the MSI RTX 3060 12 GB or ZOTAC RTX 3060 12 GB for prefill offload roughly doubles end-to-end prompt-processing throughput, because attention is compute-bound (compute-heavy, memory-light) rather than RAM-bandwidth-bound. The B70 or a used 3060 is the right partner. Bigger GPUs see diminishing returns because the Optane host is still the generation bottleneck.
For 4K+ context loads, the GPU offload turns a 30-second time-to-first-token into a 6-8-second time-to-first-token. That is the difference between "unusable as an interactive tool" and "tolerable for offline batch processing."
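The diminishing-returns argument is just addition: a faster GPU can only shrink time-to-first-token, while generation time stays pinned to the Optane host. Here is a rough sketch using the time-to-first-token and generation figures quoted in this section; the 200-token reply length and the "infinitely fast GPU" row are our own illustrative assumptions.

```python
def end_to_end_seconds(ttft_s: float, output_tokens: int, gen_tok_s: float) -> float:
    """Time to first token plus time to stream the rest of the reply."""
    return ttft_s + output_tokens / gen_tok_s


if __name__ == "__main__":
    OUTPUT_TOKENS = 200   # assumption: a medium-length reply
    GEN_TOK_S = 1.5       # midpoint of the 1-2 tok/s generation figure

    scenarios = {
        "no GPU (30 s TTFT)": 30.0,
        "RTX 3060 (7 s TTFT)": 7.0,               # midpoint of the 6-8 s figure
        "hypothetical infinitely fast GPU": 0.0,  # best case for any bigger card
    }
    for label, ttft in scenarios.items():
        total = end_to_end_seconds(ttft, OUTPUT_TOKENS, GEN_TOK_S)
        print(f"{label:34s} {total:6.1f} s total")
```

With these inputs the 3060 saves about 23 seconds per request, and even a GPU with zero prefill cost would only save 7 more. The roughly 133 seconds of generation does not move at all.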
Bottom line — hobbyist gold or YouTube stunt?
The 768 GB Optane build is a genuine, working, repeatable hobby project. It does what the Tom's Hardware coverage says it does: run trillion-parameter-class language models at home for the cost of a mid-range gaming PC. It is also slow enough that you will not use it for daily work. You will fire it up to show your friends, run a single benchmark, write a blog post, and then go back to your 70B-class workload on a Strix Halo box or a budget AM4 + 3060 12 GB rig for everything else.
What it is not is a production-ready home-AI deployment. The latency is wrong for chat. The throughput is wrong for code generation. The single-tenant nature means you can't even use it as a household-shared LLM endpoint without ruining the experience for everyone.
It's also a finite opportunity. The used Optane DIMM supply will dry up sometime in 2027-2028 as the easy-to-find listings get bought out. If you want to be the person who built it before everyone else did, the window is closing. If you want practical local LLM hardware, buy the right GPU for your model size and forget Optane existed.
Common pitfalls when building this rig
- Buying Optane DIMMs without checking your motherboard's exact BIOS support list. Not all Cascade Lake Xeon boards support Optane PMem. Always confirm against the motherboard vendor's QVL.
- Mismatching PMem 100 and PMem 200 generations. PMem 100 runs at 2666 MT/s; PMem 200 runs at 2933 MT/s. Mixed populations downclock to the slower spec. Buy a matched kit.
- Skipping the BIOS configuration step. Optane DIMMs have to be configured for either Memory Mode (DRAM-as-cache) or App Direct mode (separate addressable tier). For LLM inference, App Direct + `numactl` for the inference process is the right path.
- Underspeccing standard DRAM. You need 192 GB of DDR4 ECC for a 768 GB Optane build. Less than that and the DRAM cache thrashes and your effective bandwidth crashes.
- Trying to run on Windows. Linux's `daxctl` and `ndctl` tools are essential for managing PMem namespaces. Windows Server supports Optane but the LLM tooling assumes Linux.
- Pairing with a high-end GPU for "balance." The Optane host is still the bottleneck. A 3060 12 GB is plenty. Don't buy a 4090 expecting it to fix Optane's bandwidth ceiling.
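For the App Direct plus numactl pitfall, a launcher wrapper might look like the sketch below. Treat it as illustrative only: it assumes the PMem has already been exposed to the kernel as its own NUMA node (for example through daxctl's system-ram mode), and the node ids, binary name, and model path are placeholders to replace with whatever `numactl --hardware` and your own setup report. The script only builds and prints the command; it does not run anything.

```python
import shlex


def numactl_command(inference_argv: list[str], cpu_node: int, mem_node: int) -> list[str]:
    """Prefix a command so it runs on one CPU node and allocates from one memory node."""
    if cpu_node < 0 or mem_node < 0:
        raise ValueError("NUMA node ids must be non-negative")
    return [
        "numactl",
        f"--cpunodebind={cpu_node}",   # keep threads on one socket
        f"--membind={mem_node}",       # allocate only from the chosen memory node
        *inference_argv,
    ]


if __name__ == "__main__":
    # Hypothetical llama.cpp invocation; binary, model path, and node ids are placeholders.
    argv = ["./llama-cli", "-m", "/models/model-q3_k_m.gguf", "-p", "Hello"]
    cmd = numactl_command(argv, cpu_node=0, mem_node=2)
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Building the argument list in one place makes it harder to start the inference process unpinned by accident, which matters on a dual-socket board where a stray allocation on the wrong node silently costs bandwidth.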
Related guides
- hipEngine on Strix Halo + 7900 XTX — the realistic alternative for 70B-class models at home
- AMD Instinct MI350P PCIe 144 GB HBM3e — what real HBM in a PCIe slot looks like
- Running Llama 3.1 70B locally — the practical 70B build guide
- Best budget AM4 build for local LLM inference in 2026 — the Ryzen 7 5800X consumer-host path
Citations and sources
- Tom's Hardware — 768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM (primary source for the build and used-market pricing).
- Intel — Optane DC Persistent Memory product page (official spec, EoL status, generations).
- llama.cpp project — GitHub Discussions on Optane-backed inference (community measurements and build notes).
