For local LLM storage in 2026, the best SSD is a $90 1TB NVMe Gen3 (or better) drive like the WD Blue SN550. It loads a 14B q4_K_M model in ~7 seconds versus ~30 seconds on a budget SATA SSD, and its per-GB cost is now within $0.02/GB of SATA. Stick with SATA only if your motherboard has no free M.2 slot.
Why SSD choice matters for local LLM workflows
Inference itself is GPU-bound. Once a model is loaded, the SSD is idle. But everything else in the local-LLM workflow — model swapping, GGUF downloads, KV-cache writeback for offline batching, dataset shuffling for LoRA finetune — touches storage hard.
Two concrete pain points the right SSD solves:
- Model swap latency. If you switch from a 7B coder to a 14B chat model mid-task, you read 4-9GB of weights from disk. On a Gen3 NVMe that takes 5-10 seconds; on a SATA SSD it takes 25-45 seconds; on an HDD it takes 2-5 minutes.
- Bulk model download throughput. Hugging Face hosts on CloudFront with per-connection throttling. On a 1 Gbps line the bottleneck is the network, but the moment you have multiple downloads in flight a slow SSD becomes the limit.
The choice is rarely "SSD or HDD" anymore — it is "Gen3 NVMe or SATA SSD". Both are good; the gap is workflow-dependent.
Quick answer: which SSD for which use case
| Use case | Best pick | Why |
|---|---|---|
| Solo home rig, 1-3 active models | WD Blue SN550 1TB NVMe | $90, Gen3 x4, ~2400 MB/s sequential read; fast swaps at low cost |
| Filling every M.2 slot for a model hoard | Two-NVMe stack; consider Samsung 870 EVO 250GB SATA for cold storage | Tier hot/cold |
| Motherboard with no free M.2 | Crucial BX500 1TB SATA or SanDisk Ultra 3D 1TB | SATA without breaking the bank |
| Mass model library (>5 TB) | NVMe for the working 1TB + cheap SATA SSD or NAS for cold | Cost vs swap-time tradeoff |
| RAG / vector DB on the same box | NVMe required | Random-read latency dominates |
What an LLM workload actually does to your SSD
A blunt breakdown by I/O pattern:
- Cold load: sequential read of the entire model file once. Bandwidth-bound, so higher sequential throughput means a faster load; NVMe wins hands-down.
- Mmap load: many modern runtimes (llama.cpp by default, for example) use mmap and only page in what the model actually touches. Random-read latency starts to matter for the first few seconds.
- Pre-cache before benchmark runs: `dd if=model.gguf of=/dev/null bs=8M` warms the page cache. Linear read; NVMe wins.
- KV-cache flush to disk: vLLM and some chat runtimes spill long sessions to disk. Sustained-write IOPS matter here.
- LoRA finetune dataset read: datasets pass through 3-10 epochs; sustained-read IOPS dominate.
- Embedding DB writes: small random writes when ingesting a corpus into Chroma/Qdrant; latency matters more than throughput.
For cold loads and pre-cache reads, the two patterns every user hits, bandwidth is king and NVMe wins. For LoRA dataset reads and embedding DB writes, IOPS matter and any decent SSD beats spinning rust by roughly 50x.
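The `dd` pre-cache step can also be done from Python, which makes it easy to time the read. This is a minimal sketch, assuming an OS that keeps recently read files in its page cache; the function name `warm_page_cache` and the 8 MiB chunk size (mirroring `bs=8M`) are illustrative choices, not part of any runtime.

```python
import time
from pathlib import Path


def warm_page_cache(path, chunk_bytes=8 * 1024 * 1024):
    """Read a file sequentially and discard the data, like dd to /dev/null.

    Returns (bytes_read, seconds). The OS keeps the pages it read in its
    page cache for as long as there is spare RAM.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; we only want the OS to cache.
    with open(Path(path), "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling it twice on the same file and comparing the timings gives a rough feel for cold versus cached reads, though the first call is only truly cold if the file was not already cached.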
NVMe vs SATA: the numbers that actually move
From the manufacturers' published specs:
| Drive | Type | Sequential read | Random 4K read |
|---|---|---|---|
| WD Blue SN550 1TB | NVMe Gen3 x4 | ~2400 MB/s | ~270K IOPS |
| Samsung 970 EVO Plus 1TB | NVMe Gen3 x4 | ~3500 MB/s | ~600K IOPS |
| Crucial BX500 1TB | SATA III | ~540 MB/s | ~78K IOPS |
| SanDisk Ultra 3D 1TB | SATA III | ~560 MB/s | ~95K IOPS |
| Samsung 870 EVO 250GB | SATA III | ~560 MB/s | ~98K IOPS |
Translating to model-swap time on a 14B q4_K_M (~8.4 GB on disk):
| Drive | Cold-load time | Per-month time cost @ 20 swaps/day |
|---|---|---|
| WD Blue SN550 NVMe | ~7 s | ~70 min |
| Samsung 970 EVO Plus NVMe | ~5 s | ~50 min |
| Crucial BX500 SATA | ~30 s | ~5.0 hours |
| SanDisk Ultra 3D SATA | ~29 s | ~4.8 hours |
The SATA tax is roughly 22-25 seconds per model swap, or 4-6x the NVMe load time. If you switch models occasionally, it does not matter; if you switch models every few minutes (agent workflows, multi-model RAG), it adds up to hours per month.
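The monthly figures in the table follow from simple arithmetic, sketched below. The function names are illustrative, and the 30-day month is an assumption that reproduces the table. Note that the table's cold-load times sit at roughly double the pure bandwidth floor (file size divided by rated sequential read); reading that gap as real-world overhead is an interpretation, not something the spec sheets state.

```python
def ideal_load_seconds(model_gb, seq_read_mb_s):
    """Lower bound: file size divided by the drive's rated sequential read."""
    return model_gb * 1000 / seq_read_mb_s


def monthly_swap_minutes(seconds_per_swap, swaps_per_day=20, days=30):
    """Total time spent waiting on model swaps in a month, in minutes."""
    return seconds_per_swap * swaps_per_day * days / 60


if __name__ == "__main__":
    model_gb = 8.4  # 14B q4_K_M size on disk, from the text
    # name -> (rated sequential read in MB/s, cold-load seconds from the table)
    drives = {
        "WD Blue SN550 NVMe": (2400, 7),
        "Samsung 970 EVO Plus NVMe": (3500, 5),
        "Crucial BX500 SATA": (540, 30),
        "SanDisk Ultra 3D SATA": (560, 29),
    }
    for name, (mb_s, table_s) in drives.items():
        floor_s = ideal_load_seconds(model_gb, mb_s)
        minutes = monthly_swap_minutes(table_s)
        print(f"{name}: floor {floor_s:.1f} s, table {table_s} s, "
              f"{minutes:.0f} min/month")
```

Changing `swaps_per_day` to match your own habits is the quickest way to see whether the SATA penalty is minutes or hours for you.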
The capacity-per-dollar math
As of mid-2026, the budget tiers look like:
| Drive | $/GB at 1TB | Notes |
|---|---|---|
| WD Blue SN550 1TB NVMe | ~$0.09 | Strong all-rounder |
| Crucial BX500 1TB SATA | ~$0.07 | Lowest cost per GB |
| SanDisk Ultra 3D 1TB SATA | ~$0.08 | More endurance than BX500 |
| Samsung 870 EVO 250GB SATA | ~$0.13 | Best small SATA |
NVMe is ~$0.02/GB more than SATA at 1TB. That gap is small enough that capacity is rarely the deciding factor — the bottleneck is M.2 slot count, not price.
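The per-GB figures are just price divided by capacity. A small sketch, using the approximate 1TB prices quoted in this guide ($90, $70, $80) as inputs; street prices move, so treat the numbers as placeholders:

```python
def cost_per_gb(price_usd, capacity_gb):
    """Price per gigabyte in dollars."""
    return price_usd / capacity_gb


if __name__ == "__main__":
    # Approximate 1TB prices quoted in this guide; placeholders, not live data.
    prices_usd = {
        "WD Blue SN550 1TB NVMe": 90,
        "Crucial BX500 1TB SATA": 70,
        "SanDisk Ultra 3D 1TB SATA": 80,
    }
    for name, price in prices_usd.items():
        print(f"{name}: ${cost_per_gb(price, 1000):.2f}/GB")
```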
Tiered storage: the pattern that actually scales
If your model library is going to grow past 1-2TB, do not buy one giant NVMe — use two tiers:
- Hot tier: 1-2TB Gen3 or Gen4 NVMe for the 3-5 models you actively use. Sequential read wins.
- Cold tier: 2-4TB SATA SSD or even a 4-8TB HDD for the archive. You will not load these often, and when you do, a one-time `mv` to hot is fine.
The Crucial BX500 1TB or SanDisk Ultra 3D 1TB is great for that cold tier; if you want better endurance and a known-good controller, the Samsung 870 EVO is the safer pick despite the lower capacity per drive.
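The one-time move from cold to hot can be scripted. This is a minimal sketch, assuming both tiers are mounted as ordinary directories; `promote_to_hot` and any paths you pass it are hypothetical names for illustration, not part of any runtime.

```python
import shutil
from pathlib import Path


def promote_to_hot(model_name, cold_dir, hot_dir):
    """Move one model file from the cold tier to the hot tier.

    When the tiers are on different filesystems, shutil.move copies the
    file and then deletes the source, so this is one full read of the file.
    """
    src = Path(cold_dir) / model_name
    hot = Path(hot_dir)
    dst = hot / model_name
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found in cold tier")
    hot.mkdir(parents=True, exist_ok=True)
    if dst.exists():
        raise FileExistsError(f"{dst} already exists in hot tier")
    # Refuse to start a copy that cannot fit on the hot drive.
    if src.stat().st_size > shutil.disk_usage(hot).free:
        raise OSError(f"not enough free space on hot tier for {model_name}")
    return Path(shutil.move(str(src), str(dst)))
```

The free-space check is deliberately conservative: it runs before the move, so a half-copied model never lands on a nearly full hot drive.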
SSD endurance: do you need to worry?
The honest answer: not for a single-user inference box. Even an agent loop hammering a 14B model and a 1B draft model swaps 10s of GB per day at most. Consumer SSDs are rated for 200-600 TBW (terabytes written) — at 50 GB/day that is 11-33 years.
The exception: LoRA / QLoRA finetuning writes adapter checkpoints every 100-500 steps, plus optimiser state if you checkpoint that too. Active finetune projects can hit 50-200 GB/day in writes. Even there, a 200 TBW drive lasts about 1,000 days at 200 GB/day and about 4,000 days at 50 GB/day.
Pick a drive for its read characteristics; treat endurance as a non-issue unless you are doing constant fine-tuning.
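The endurance estimates above are a single division. A minimal sketch, assuming the TBW rating is in decimal terabytes and the write rate is constant; the function names are illustrative:

```python
def endurance_days(tbw_rating_tb, writes_gb_per_day):
    """Days until the rated terabytes-written figure is reached."""
    return tbw_rating_tb * 1000 / writes_gb_per_day


def endurance_years(tbw_rating_tb, writes_gb_per_day):
    """Same estimate expressed in years of 365 days."""
    return endurance_days(tbw_rating_tb, writes_gb_per_day) / 365


if __name__ == "__main__":
    for tbw in (200, 600):
        for gb_day in (50, 200):
            print(f"{tbw} TBW at {gb_day} GB/day: "
                  f"{endurance_days(tbw, gb_day):.0f} days "
                  f"({endurance_years(tbw, gb_day):.1f} years)")
```

This treats TBW as a hard limit, which it is not; it is a warranty figure, so the result is best read as a conservative planning number.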
Common pitfalls
- Mounting models on a USB-3 enclosure to save M.2 slots. USB-3 SATA bridges cap around 380 MB/s and have terrible random-read latency. Run models from internal storage; use USB for backups only.
- Putting models on a network share. SMB and NFS over gigabit are perfectly fine for downloads, ruinous for cold loads. A 9-second NVMe load becomes a 90-second NFS load.
- Filling the drive past 90%. SLC caches shrink as drives fill; sequential read can drop 30%+ on a near-full drive.
- Picking a QLC drive for swap-heavy workloads. QLC drives (e.g. the Crucial P3 NVMe or the Samsung 870 QVO SATA) post headline numbers similar to TLC drives but fall off a cliff after the SLC cache fills. For LLM workloads stick with TLC; both the SN550 and BX500 are TLC.
- Ignoring write amplification on the boot drive. A budget SATA boot drive that also hosts your models, browser cache and swap will see materially shorter life than a dedicated model SSD.
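The 90% fill guideline is easy to check from a script. A minimal sketch using the standard library's `shutil.disk_usage`; the function names and the idea of running it against your model mount point are illustrative assumptions:

```python
import shutil


def fill_fraction(path):
    """Fraction of the filesystem holding `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def is_too_full(path, threshold=0.90):
    """True when the drive is filled past the given threshold."""
    return fill_fraction(path) > threshold


if __name__ == "__main__":
    # "." is a stand-in; point this at your model directory instead.
    print(f"fill: {fill_fraction('.'):.0%}, too full: {is_too_full('.')}")
```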
Worked example — sizing storage for a 12GB RTX 3060 build
A reasonable model hoard for a 12GB RTX 3060 in 2026:
- 5x 7-8B GGUF q4_K_M chat/coding models @ 4-5 GB each = 25 GB
- 2x 14B q4_K_M @ 8-9 GB each = 17 GB
- 1x 32B q3 (for batch jobs, partial offload) @ 13 GB = 13 GB
- Embeddings + reranker models = 3 GB
- Stable Diffusion XL + 2 LoRAs = 8 GB
- HuggingFace cache + working files = 30 GB
Total: ~100 GB. A 1TB drive gives 10x headroom for growth — exactly the sweet spot a 1TB NVMe or SATA SSD targets.
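The tally can be reproduced in a few lines, which also makes it easy to swap in your own library. The sizes below are the per-line totals from the list above; the dictionary keys are just labels.

```python
# Per-line totals in GB, taken from the list above.
hoard_gb = {
    "5x 7-8B q4_K_M chat/coding": 25,
    "2x 14B q4_K_M": 17,
    "1x 32B q3": 13,
    "embeddings + reranker": 3,
    "SDXL + 2 LoRAs": 8,
    "HuggingFace cache + working files": 30,
}

total_gb = sum(hoard_gb.values())
headroom = 1000 / total_gb  # how many times the hoard fits on a 1TB drive

if __name__ == "__main__":
    print(f"total: {total_gb} GB, headroom on 1TB: {headroom:.1f}x")
```

The exact sum is 96 GB, which the text rounds to ~100 GB.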
When SSD performance actually matters in a local LLM workflow
There is a useful mental model for when SSD speed shows up in your day. Inference itself does not touch the SSD at all — once the model is loaded into VRAM, the drive sits idle. Where you feel the SSD:
- Model swap during agent loops. An agent that picks the right tool model per task swaps weights frequently. On NVMe this is invisible; on SATA it adds a multi-second hiccup every swap.
- Cold start after reboot. Loading a 14B model from cold takes 7-10s on NVMe and 28-32s on SATA. Once a week this does not matter; if you reboot for kernel updates daily, it adds up.
- Hugging Face hub cache population. First-time pull of a new model: limited by network most of the time, but a slow SSD becomes the bottleneck on multi-stream downloads or local cache moves.
- vLLM prefix caching to disk. vLLM optionally spills KV-cache prefixes to disk for reuse across requests. NVMe random-read latency matters here; SATA degrades long-prompt latency by 10-30 ms.
- LoRA / QLoRA checkpoint writes. During training, sustained write IOPS matter. TLC drives with DRAM hold up; QLC drives without DRAM choke after the SLC cache fills.
The pattern: read-heavy bursty workloads favour NVMe; sustained writes favour TLC with DRAM. The intersection, a TLC NVMe drive with DRAM such as the Samsung 970 EVO Plus, is the sweet spot for an LLM-heavy box.
Setting up your model directory the right way
Three small choices that pay off over the life of the drive:
- Put models on a separate filesystem from your OS. A dedicated 1TB drive (NVMe or SATA) means a full Windows / Linux reinstall does not wipe your model library. Mount it as `/models` on Linux, `D:\models` on Windows.
- Use a stable directory layout. `/models/<provider>/<model-id>/<quant>.gguf` is the layout most runtimes auto-detect. Avoid spaces and special characters in filenames.
- Enable read-only mount where possible. Once you have a curated library, mounting it read-only prevents accidental delete and reduces accidental write amplification.
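The directory layout and the "no spaces or special characters" rule can be enforced with a small helper. This is a sketch under assumptions: `model_path` is a hypothetical function, the allowed character set (letters, digits, dot, underscore, hyphen) is one reasonable reading of "no special characters", and the example names are made up.

```python
import re
from pathlib import PurePosixPath

# Letters, digits, dot, underscore and hyphen only; an assumed safe set.
_SAFE_PART = re.compile(r"^[A-Za-z0-9._-]+$")


def model_path(root, provider, model_id, quant):
    """Build <root>/<provider>/<model-id>/<quant>.gguf, rejecting unsafe names."""
    for part in (provider, model_id, quant):
        # Reject spaces, slashes and other specials, plus "." and "..".
        if not _SAFE_PART.match(part) or set(part) == {"."}:
            raise ValueError(f"unsafe path component: {part!r}")
    return PurePosixPath(root) / provider / model_id / f"{quant}.gguf"


if __name__ == "__main__":
    # Hypothetical provider and model names, for illustration only.
    print(model_path("/models", "example-org", "example-14b-chat", "q4_K_M"))
```

`PurePosixPath` keeps the example independent of the host OS; on a real Windows box you would use `pathlib.Path` with a `D:\models` root instead.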
For multi-user shared boxes, an NFS or SMB share over 10GbE works for model distribution to multiple consuming machines — but never for live inference. Local NVMe always.
A note on Gen4 and Gen5 NVMe drives
We have stayed on Gen3 NVMe (the WD Blue SN550) as the recommendation because the marginal benefit of Gen4 or Gen5 NVMe for LLM workloads is small. Cold-load time on a 14B q4_K_M model (~8.4 GB) drops from ~7 seconds on a 2400 MB/s Gen3 drive to ~4 seconds on a 7000 MB/s Gen4 drive, a real but modest improvement that costs $40-$80 more per drive. For pure LLM use the Gen3 drive is still the value pick; for users who also want fast game-load times on DirectStorage titles, Gen4 makes more sense.
Bottom line — pick by your slot situation
- Free M.2 slot and a $90 budget: WD Blue SN550 1TB NVMe is the default. Fast cold loads, low cost per GB, TLC, known-good.
- No free M.2 slot: Crucial BX500 1TB SATA at ~$70 or SanDisk Ultra 3D 1TB SATA at ~$80. Either works; SanDisk has slightly better endurance.
- Cold-tier archive next to a working NVMe: Samsung 870 EVO 250GB as a small cold-tier, or another 1TB SATA for bulk.
Citations and sources
- Western Digital — WD Blue SN550 product page — sequential read/write specs and endurance.
- Crucial — BX500 product page — SATA specs, TLC NAND, endurance rating.
- Samsung — 870 EVO product page — IOPS and TBW reference for the cold-tier pick.
This piece is editorial synthesis based on publicly available information. No independent first-party benchmarking is reported.
