For a 2026 local-LLM build on a budget, the AMD Ryzen 7 5800X at ~$180 used is the right CPU. You get 8 cores / 16 threads, full AVX2 support, fast single-thread performance, and enough PCIe lanes to feed a discrete GPU without bottlenecking prefill. The Ryzen 7 5700X is the value alternative; the Ryzen 5 5600G is the one to avoid, because its weaker memory controller and lower clocks hurt prefill throughput in ways that matter.
Why CPU still matters for GPU-based LLM inference
Two camps make incorrect predictions about CPU choice for local-LLM builds. The first says "doesn't matter, GPU does the work." The second says "matters enormously, more cores is more better." Both are wrong in opposite directions.
Per consolidated llama.cpp discussion benchmarks, the CPU on a GPU-based LLM build affects:
- Prefill throughput. The CPU runs the tokenization, batching, and any KV-cache management that doesn't happen on the GPU. A slower CPU adds 50–150ms to first-token latency on a 4K prompt.
- CPU offload performance. When you run a model larger than VRAM and keep some layers in system RAM, those layers run on the CPU. Core count, AVX2 vs AVX-512, and memory bandwidth all matter.
- Concurrent workload responsiveness. Most local-LLM builders also use the box for other work — IDE, browser, Docker. A stronger CPU keeps that responsive while inference is running.
- System overhead. The CPU and its platform also supply the PCIe lanes, USB controllers, and network throughput that everything else in the box relies on.
What CPU doesn't affect: tok/s on a fully-on-GPU dense model. If your 8B model fits comfortably in 12GB VRAM and never offloads, the CPU is essentially idle during decode.
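Whether a model "fits comfortably" is mostly arithmetic on weights plus KV cache. The sketch below works through it for Llama-3 8B at a 4K context. The layer and head counts are the published Llama-3 8B shape; the weight size and the overhead allowance are assumptions for illustration, not measurements from this article.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Size of the key + value cache for n_tokens of context (fp16 by default)."""
    # 2x for keys and values; one vector per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

GIB = 1024 ** 3

# Llama-3 8B shape: 32 layers, 8 KV heads (grouped-query attention), head dim 128.
kv_gib = kv_cache_bytes(32, 8, 128, n_tokens=4096) / GIB

# Assumed figures: ~4.6 GiB of q4_K_M weights and ~1 GiB for compute
# buffers and driver overhead. Adjust both for your own model file.
weights_gib = 4.6
overhead_gib = 1.0
total_gib = weights_gib + kv_gib + overhead_gib

print(f"KV cache at 4K context: {kv_gib:.2f} GiB")
print(f"Estimated VRAM use: {total_gib:.1f} GiB of 12 GiB")
```

Under those assumptions the whole working set lands at roughly half of a 12GB card, which is why the CPU has so little to do during decode in this case.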
The 5xxx Ryzen lineup in context
Three AM4 Ryzen chips dominate the budget local-LLM build conversation in 2026 because they share a platform, hit different price points, and span enough capability to bracket most use cases.
| Chip | Cores / threads | Boost clock | TDP | Approximate price (mid-2026) |
|---|---|---|---|---|
| Ryzen 7 5800X | 8C / 16T | 4.7 GHz | 105W | $170–$200 used / $230 new |
| Ryzen 7 5700X | 8C / 16T | 4.6 GHz | 65W | $140–$160 used / $200 new |
| Ryzen 5 5600G | 6C / 12T | 4.4 GHz | 65W | $110–$140 |
The 5800X and 5700X are essentially the same chip with different TDPs. The 5600G is a different beast — same Zen 3 cores but only six of them, lower clocks, and a substantially different memory controller setup because it's an APU with integrated graphics.
Key takeaways
- Best overall: Ryzen 7 5800X. 8 cores / 16 threads, fast clocks, $170–$200 used. Best prefill performance in the lineup.
- Best value: Ryzen 7 5700X. Same 8 cores, 65W TDP, ~$30 cheaper. About 5% slower on prefill in exchange for a TDP roughly 40% lower (65W vs 105W).
- Skip the Ryzen 5 5600G. Six cores is fine; the deeper problem is its weaker memory controller (DDR4-3200 vs 3600+ stable on 5800X/5700X), which costs you 10–15% on CPU-offload workloads.
- Pair with 32GB DDR4-3600 dual-channel. Memory bandwidth is the secondary bottleneck for any CPU-offload work.
- AM4 is end-of-life but still healthy. Long-term BIOS support is good; the platform isn't going to suddenly stop working in 2027.
Spec delta — Ryzen 7 5800X vs 5700X vs 5600G
| Spec | Ryzen 7 5800X | Ryzen 7 5700X | Ryzen 5 5600G |
|---|---|---|---|
| Cores / threads | 8 / 16 | 8 / 16 | 6 / 12 |
| Base clock | 3.8 GHz | 3.4 GHz | 3.9 GHz |
| Boost clock | 4.7 GHz | 4.6 GHz | 4.4 GHz |
| L3 cache | 32MB | 32MB | 16MB |
| TDP | 105W | 65W | 65W |
| Integrated GPU | No | No | Yes (Vega 7) |
| PCIe | 4.0 x24 | 4.0 x24 | 3.0 x24 |
| Memory support | DDR4-3200 official, 3600+ stable | DDR4-3200 official, 3600+ stable | DDR4-3200 (3600 less reliable) |
| AVX2 / AVX-512 | AVX2 yes / AVX-512 no | AVX2 yes / AVX-512 no | AVX2 yes / AVX-512 no |
| Process | TSMC 7nm | TSMC 7nm | TSMC 7nm |
Two things to flag from this table:
- The 5600G is PCIe 3.0 only. The other two are PCIe 4.0. This affects GPU PCIe bandwidth when prefill chews through long prompts. PCIe 3.0 x16 = ~16 GB/s; PCIe 4.0 x16 = ~32 GB/s. The 3060 12GB doesn't saturate either, but for larger cards (RTX 3090 24GB) the difference becomes visible.
- L3 cache halves on the 5600G. 16MB vs 32MB. llama.cpp's tokenizer and batch-building benefit from L3; this difference contributes to the 5600G's weaker prefill performance.
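The PCIe figures in the first point follow from the per-lane signaling rate and the 128b/130b line encoding that both generations use. A minimal sketch of that arithmetic is below; the 1 GB transfer is a made-up payload size chosen only to show the scaling.

```python
def pcie_bandwidth_gbs(gen, lanes):
    """Theoretical one-direction PCIe bandwidth in GB/s."""
    # Per-lane transfer rate in GT/s for each generation.
    rate_gt = {3: 8.0, 4: 16.0}[gen]
    # Gen 3 and 4 use 128b/130b encoding: 128 payload bits per 130 on the wire.
    return rate_gt * (128 / 130) / 8 * lanes

def transfer_ms(size_gb, gen, lanes=16):
    """Best-case time to move size_gb across the link, in milliseconds."""
    return size_gb / pcie_bandwidth_gbs(gen, lanes) * 1000

for gen in (3, 4):
    bw = pcie_bandwidth_gbs(gen, 16)
    # 1 GB is a hypothetical payload, not a measured llama.cpp transfer.
    print(f"PCIe {gen}.0 x16: {bw:.2f} GB/s, 1 GB in {transfer_ms(1.0, gen):.1f} ms")
```

These are link ceilings; real transfers land somewhat lower once protocol overhead is included.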
Real-world prefill numbers
Per public llama.cpp community benchmarks on the same RTX 3060 12GB + 32GB DDR4-3600 system swapping only the CPU, running Llama-3 8B q4_K_M with 4K prompt prefill:
| CPU | Prefill (prompt-tokens/sec) | First-token latency at 4K |
|---|---|---|
| Ryzen 7 5800X | ~2,950 | ~1.4s |
| Ryzen 7 5700X | ~2,800 | ~1.5s |
| Ryzen 5 5600G | ~2,450 | ~1.7s |
A 300ms first-token latency gap matters for interactive chat — it's the difference between feeling instant and feeling slightly laggy. For batch workloads (RAG over many documents, agent loops), the difference compounds.
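The latency column is essentially prompt length divided by prefill rate, which makes it easy to re-run for your own prompt sizes. The sketch below reproduces the table, assuming "4K" means 4,096 tokens and ignoring sampling and the first decode step.

```python
PROMPT_TOKENS = 4096  # assumption: "4K prompt" read as 4,096 tokens

# Prompt tokens/sec, copied from the table above.
prefill_tps = {
    "Ryzen 7 5800X": 2950,
    "Ryzen 7 5700X": 2800,
    "Ryzen 5 5600G": 2450,
}

def first_token_latency_s(prompt_tokens, tokens_per_sec):
    """Time spent in prefill before the first output token can be produced."""
    return prompt_tokens / tokens_per_sec

baseline = prefill_tps["Ryzen 7 5800X"]
for cpu, tps in prefill_tps.items():
    latency = first_token_latency_s(PROMPT_TOKENS, tps)
    slowdown = (1 - tps / baseline) * 100
    print(f"{cpu}: {latency:.2f} s, {slowdown:.0f}% slower than the 5800X")
```

Changing `PROMPT_TOKENS` shows how the gap scales: the absolute difference between chips grows linearly with prompt length.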
CPU offload — when the CPU actually does the work
For models that don't fit fully in 12GB VRAM, llama.cpp's -ngl flag controls how many layers stay on GPU. The remaining layers run on the CPU. This is where the 5800X opens its biggest lead.
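Running Qwen 2.5 14B q4_K_M (a tight fit in 12GB once longer contexts and their KV cache are added) on the same test rig with roughly 60% of the layers left on the CPU (the split is reported as 20 of 32 layers, although Qwen 2.5 14B has 48 transformer layers, so treat the ratio as approximate):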
| CPU | Combined throughput |
|---|---|
| Ryzen 7 5800X | 28–32 tok/s |
| Ryzen 7 5700X | 26–30 tok/s |
| Ryzen 5 5600G | 18–22 tok/s |
The 5600G's gap blows out here because the CPU is now under sustained load with the model weights streaming through L3 cache and main memory. Six cores @ 4.4 GHz with 16MB L3 simply can't keep up with eight cores @ 4.7 GHz with 32MB L3.
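Picking a value for -ngl usually starts from a rough estimate of how many layers fit in VRAM. The sketch below is one way to do that, assuming all layers are the same size. The model size, layer count, VRAM reserve, and `model.gguf` file name are hypothetical placeholders; `-m`, `-ngl`, `-c`, and `-t` are llama.cpp's model, GPU-layer, context-size, and thread flags.

```python
import math
import shlex

def gpu_layers_that_fit(model_gib, n_layers, vram_gib, reserve_gib):
    """Rough count of layers that fit on the GPU, assuming equal-sized layers."""
    per_layer_gib = model_gib / n_layers
    # reserve_gib leaves room for KV cache, compute buffers, and the desktop.
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, math.floor(usable_gib / per_layer_gib))

def llama_cli_command(model_path, n_gpu_layers, ctx_size, threads):
    """Build a llama-cli invocation as a shell-safe string."""
    args = ["llama-cli", "-m", model_path, "-ngl", str(n_gpu_layers),
            "-c", str(ctx_size), "-t", str(threads)]
    return shlex.join(args)

# Hypothetical model: 16 GiB of weights across 40 layers, on a 12 GiB card
# with 2.5 GiB held back. Eight threads matches the 5800X/5700X core count.
ngl = gpu_layers_that_fit(model_gib=16.0, n_layers=40, vram_gib=12.0, reserve_gib=2.5)
print(llama_cli_command("model.gguf", ngl, ctx_size=4096, threads=8))
```

Treat the result as a starting point: raise or lower -ngl by a layer or two while watching VRAM use, since real layers are not perfectly uniform.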
Cooling matters more than people expect
The 5800X's 105W TDP is real, and its boost behavior assumes good cooling. Per the TechPowerUp Ryzen 7 5800X specifications and community thermal data, the chip will sustain near-boost clocks on a quality air cooler like the Noctua NH-U12S or better. With a stock-class cooler or a low-clearance heatsink in a budget chassis, the 5800X thermal-throttles to roughly 5700X performance, at which point you should have bought the 5700X.
The 5700X's 65W TDP runs comfortably on any tower air cooler; even a 120mm single-tower cooler like the Noctua NH-U12S is overkill. If you want a quiet build, the 5700X is the easier sell.
The 5600G ships with a stock Wraith Stealth that's adequate for its 65W TDP but loud at sustained load. A $30 budget cooler is a worthwhile upgrade.
Worked example — building a $700–$900 local-LLM box
Take a representative 2026 budget build:
- CPU: Ryzen 7 5800X used, $180
- Cooler: Noctua NH-U12S, $80 (or budget 120mm tower for $30)
- Motherboard: MSI B550M Pro, $90
- RAM: 32GB DDR4-3600 dual-channel, $60
- GPU: RTX 3060 12GB used, $290
- Storage: 1TB NVMe SSD, $70
- PSU: 650W Gold, $80
- Case: $50
Total: ~$900. Runs Llama-3 8B q5_K_M at 55–62 tok/s with sub-2-second first-token latency on 4K prompts, and handles Qwen 2.5 14B q4_K_M at ~28 tok/s with offload. Same build with a 5700X drops total cost to ~$870 and gives up about 5% prefill performance.
Swapping to the 5600G drops total cost to ~$830, but you give up meaningful prefill performance (about 17% against the 5800X in the prefill table above) and you can't easily upgrade to a 5800X3D later without revisiting the BIOS.
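The totals in this example are simple sums, so it is easy to re-run them with whatever prices you find. The sketch below uses the part prices listed above; the $150 and $110 figures for the 5700X and 5600G are assumptions picked from the price ranges earlier in the article so that the totals match.

```python
build = {
    "CPU: Ryzen 7 5800X (used)": 180,
    "Cooler: Noctua NH-U12S": 80,
    "Motherboard: MSI B550M Pro": 90,
    "RAM: 32GB DDR4-3600": 60,
    "GPU: RTX 3060 12GB (used)": 290,
    "Storage: 1TB NVMe SSD": 70,
    "PSU: 650W Gold": 80,
    "Case": 50,
}

def swap_part(parts, prefix, label, price):
    """Return a copy of the build with the part whose name starts with prefix replaced."""
    swapped = {k: v for k, v in parts.items() if not k.startswith(prefix)}
    swapped[label] = price
    return swapped

base_total = sum(build.values())
# Assumed prices for the alternatives, taken from the ranges quoted earlier.
total_5700x = sum(swap_part(build, "CPU", "CPU: Ryzen 7 5700X (used)", 150).values())
total_5600g = sum(swap_part(build, "CPU", "CPU: Ryzen 5 5600G", 110).values())
total_budget_cooler = sum(swap_part(build, "Cooler", "Cooler: budget 120mm tower", 30).values())

print(f"5800X build: ${base_total}")
print(f"5700X build: ${total_5700x}")
print(f"5600G build: ${total_5600g}")
print(f"5800X with budget cooler: ${total_budget_cooler}")
```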
What about the 5800X3D?
The Ryzen 7 5800X3D ($240–$280 used in 2026) is the gaming-focused variant with 96MB of L3 cache (32MB on-die plus 64MB of stacked 3D V-Cache). For gaming it's clearly better than the 5800X. For LLM inference the extra cache doesn't help much because the models don't fit in cache regardless. It also runs at lower boost clocks (4.5 GHz vs 4.7 GHz), which slightly hurts prefill.
The 5800X3D is the right pick if your primary workload is gaming with LLM inference as a secondary use case. The 5800X is the right pick if LLM inference is the primary workload.
Common pitfalls
Three things bite local-LLM builders on the CPU side:
- Single-channel RAM. Using one DIMM instead of two halves your memory bandwidth and destroys CPU-offload performance. Always run dual-channel — two 16GB sticks, not one 32GB stick.
- Bargain-bin motherboard. A $70 A520 board limits you to PCIe 3.0 even with a 5800X. Spend the extra $20 on a B550 board.
- 5600G "for LLM" because it's cheap. It's cheap because it's worse at this workload. The savings disappear the first time you offload a 14B model.
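The single-channel penalty in the first pitfall can be sanity-checked from the DDR4 spec: each channel is 64 bits wide, so peak bandwidth is transfer rate times 8 bytes times channel count. The sketch below also derives a rough ceiling on decode speed for CPU-resident layers, assuming every generated token reads each of those weights once. The 2 GB of CPU-resident weights is a hypothetical figure.

```python
def ddr_bandwidth_gbs(mt_per_s, channels):
    """Theoretical peak DRAM bandwidth in GB/s."""
    # Each DDR4 channel is 64 bits (8 bytes) wide.
    return mt_per_s * 8 * channels / 1000

def cpu_decode_ceiling_tps(bandwidth_gbs, weights_on_cpu_gb):
    """Upper bound on tokens/sec for the CPU-resident part of the model."""
    # Assumes decode is bandwidth-bound and each token reads every
    # CPU-resident weight exactly once. Real throughput is lower.
    return bandwidth_gbs / weights_on_cpu_gb

WEIGHTS_ON_CPU_GB = 2.0  # hypothetical amount of the model left in system RAM

for label, mt, ch in [("DDR4-3600 dual-channel", 3600, 2),
                      ("DDR4-3600 single-channel", 3600, 1),
                      ("DDR4-3200 dual-channel", 3200, 2)]:
    bw = ddr_bandwidth_gbs(mt, ch)
    ceiling = cpu_decode_ceiling_tps(bw, WEIGHTS_ON_CPU_GB)
    print(f"{label}: {bw:.1f} GB/s peak, at most {ceiling:.1f} tok/s from CPU layers")
```

The ceiling halves with a single DIMM no matter which CPU is installed, which is why dual-channel matters more than a small clock difference between chips.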
Platform notes — AM4 in late 2026
AM4 is end-of-life as a new platform — AMD's current consumer focus is AM5 (Zen 4, Zen 5) — but the used market keeps AM4 alive and healthy for budget builds. Three things to know:
1. BIOS support is mature. Any modern B550 board ships AGESA firmware that supports the entire 5xxx lineup out of the box. If you buy an older B450 board secondhand, you may need a BIOS update before it'll POST with a Vermeer-die chip — buy from a seller who's already flashed it, or have a Zen 2 (3xxx) chip handy for the flash.
2. PCIe 4.0 requires B550 or X570. B450 caps PCIe at 3.0 for the GPU slot. For an RTX 3060 12GB, that's fine — the card doesn't saturate PCIe 4.0 on inference workloads. For a future RTX 3090 24GB upgrade, PCIe 4.0 starts to matter at large prefill prompts.
3. DDR4 is still cheap. 32GB DDR4-3600 dual-channel kits run $55–$75 in late 2026. The same capacity in DDR5 (for an AM5 build) runs $90–$130. The platform-cost gap is meaningful at the $700–$900 build tier.
The right read on AM4 for 2026: it's not the cutting edge, but it's the right value platform for a local-LLM build under $1,000. Spend the savings on more RAM or a better PSU.
When NOT to pick the 5800X
- You're building a quiet, low-power 24/7 box. Pick the 5700X. Same silicon at a 65W TDP instead of 105W.
- Your primary workload is gaming and LLM is occasional. Pick the 5800X3D.
- You're upgrading from a 3600X on a tight budget. The 5700X is a smaller financial jump.
When the 5600G can make sense
- You explicitly don't want a discrete GPU at all. The 5600G's iGPU lets you skip the dGPU. (LLM throughput will be poor — sub-5 tok/s on 7B models — but the build is functional.)
- You're inheriting a 5600G and don't want to spend. Fine, use it. Just don't pick one over the 5700X if both are options.
Verdict matrix
| If you want… | Pick |
|---|---|
| Best prefill, best offload performance | Ryzen 7 5800X |
| Best value, low TDP, quiet | Ryzen 7 5700X |
| Gaming primary, LLM secondary | 5800X3D |
| iGPU-only build, no dGPU | 5600G |
| Lowest sensible cost with a dGPU | 5700X (not the 5600G; its perf-per-dollar isn't there for LLM work) |
Bottom line
For a 2026 local-LLM build on AM4, the Ryzen 7 5800X is the best CPU pick at ~$180 used. You get 8 cores at 4.7 GHz boost, 32MB of L3, full PCIe 4.0 x24, and the strongest prefill throughput in this lineup.
If you want quieter and 65W TDP, pick the Ryzen 7 5700X and accept the 5% prefill penalty. Skip the 5600G for any serious LLM workload — its memory controller and L3 cache are the wrong shape for offload-heavy inference.
Pair whichever you pick with an RTX 3060 12GB, 32GB DDR4-3600 dual-channel, a B550 motherboard, and a 650W Gold PSU. That's a $700–$900 build that runs every 7B–9B model at q5_K_M with comfortable headroom for context and KV cache.
Citations and sources
- AMD — Ryzen 7 5800X product page
- llama.cpp GitHub — community benchmark discussions
- TechPowerUp — Ryzen 7 5800X specifications
This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported. Prices may vary; check the retailer listing for current availability.
